Assignment - Dimensionality Reduction (assignment.ipynb)
This assignment is based on the content discussed in Module 6 and uses the well-known MNIST dataset, a collection of images of handwritten digits (https://en.wikipedia.org/wiki/MNIST_database).
The dataset is provided to you as a .csv file (mnist_dataset.csv).
Learning outcomes
· Apply a Random Forest classification algorithm to the MNIST dataset
· Perform dimensionality reduction of the features using PCA and compare classification on the reduced dataset with classification on the original one
· Apply dimensionality reduction techniques: t-SNE and LLE
Questions (15 points total)
Question 1 (1 point). Load the MNIST dataset and split it into a training set and a test set (take the first 60,000 instances for training, and the remaining 10,000 for testing).
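As a starting point, the load-and-split step could be sketched as follows. The layout of mnist_dataset.csv is not specified here, so the sketch assumes one row per image with a header row and a label column named `label`; both the column name and the helper `load_and_split` are hypothetical and should be adjusted to the actual file.

```python
import pandas as pd


def load_and_split(csv_source, label_col="label", n_train=60_000):
    """Read the CSV and split by position: first n_train rows for training, the rest for testing.

    Assumption: the file has a header row and a label column named `label_col`.
    """
    df = pd.read_csv(csv_source)
    X = df.drop(columns=[label_col]).to_numpy()
    y = df[label_col].to_numpy()
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# Example call (requires the assignment file to be in the working directory):
# X_train, X_test, y_train, y_test = load_and_split("mnist_dataset.csv")
```

The split is positional rather than random because the question asks for the first 60,000 instances as the training set.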
Question 2 (2 points). Train a Random Forest classifier on the dataset and time how long it takes, then evaluate the resulting model on the test set.
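One possible way to time training and evaluate the model is sketched below. To keep the example quick and self-contained it uses scikit-learn's small bundled digits dataset (`load_digits`, 8x8 images) as a stand-in for MNIST; the 1,500-row split, the helper `fit_and_time`, and the hyperparameters are illustrative assumptions, not part of the assignment.

```python
import time

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def fit_and_time(model, X_train, y_train):
    """Fit the model and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    return time.perf_counter() - start


# Stand-in data: replace with the MNIST train/test arrays from Question 1.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = X[:1500], X[1500:], y[:1500], y[1500:]

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_seconds = fit_and_time(rf, X_train, y_train)
rf_accuracy = accuracy_score(y_test, rf.predict(X_test))
print(f"Training took {rf_seconds:.2f} s, test accuracy {rf_accuracy:.3f}")
```

Fixing `random_state` makes the accuracy reproducible between runs; the timing will still vary with the machine.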
Question 3 (4 points). Next, use PCA to reduce the dataset’s dimensionality, with an explained variance ratio of 95%. Train a new Random Forest classifier on the reduced dataset and time how long it takes. Was training much faster? Then evaluate the classifier on the test set: how does it compare to the previous classifier?
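A minimal sketch of the PCA step, again using the bundled `load_digits` data as a stand-in for MNIST (an assumption made only to keep the example fast). Passing a float between 0 and 1 as `n_components` asks scikit-learn's `PCA` to keep just enough components to reach that fraction of explained variance. The PCA is fitted on the training set only and then applied to the test set.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Stand-in data: replace with the MNIST train/test arrays from Question 1.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = X[:1500], X[1500:], y[:1500], y[1500:]

# Keep enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_train_reduced = pca.fit_transform(X_train)  # fit on training data only
X_test_reduced = pca.transform(X_test)        # reuse the fitted projection

rf_pca = RandomForestClassifier(n_estimators=100, random_state=42)
start = time.perf_counter()
rf_pca.fit(X_train_reduced, y_train)
pca_seconds = time.perf_counter() - start

pca_accuracy = accuracy_score(y_test, rf_pca.predict(X_test_reduced))
print(f"Kept {pca.n_components_} of {X_train.shape[1]} features")
print(f"Training took {pca_seconds:.2f} s, test accuracy {pca_accuracy:.3f}")
```

Whether training is faster or slower than on the original features should be read from the measured times rather than assumed, since fewer features do not guarantee a faster Random Forest.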
Question 4 (4 points). Use t-SNE to reduce the dimensionality of the MNIST dataset and show the result graphically.
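The t-SNE step might look like the sketch below. It assumes a two-dimensional embedding and a random subsample of 500 points, since t-SNE becomes slow on large inputs; the sample size, the colour map, and the use of `load_digits` as a stand-in for MNIST are all illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Stand-in data: replace with the MNIST features and labels.
X, y = load_digits(return_X_y=True)

# Subsample to keep the run time short (assumed size; adjust as needed).
rng = np.random.default_rng(42)
idx = rng.choice(len(X), size=500, replace=False)
X_sample, y_sample = X[idx], y[idx]

# Project the sample down to two dimensions.
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X_sample)

# Scatter plot of the embedding, coloured by digit label.
fig, ax = plt.subplots(figsize=(8, 6))
points = ax.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_sample, cmap="tab10", s=10)
fig.colorbar(points, ax=ax, label="digit")
ax.set_title("t-SNE embedding (subsample)")
ax.set_xlabel("t-SNE dimension 1")
ax.set_ylabel("t-SNE dimension 2")
```

In a notebook the figure is displayed at the end of the cell; in a script, `plt.show()` or `fig.savefig(...)` would be needed.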
Question 5 (4 points). Compare the t-SNE result with other dimensionality reduction methods: Locally Linear Embedding (LLE) or Multidimensional Scaling (MDS).
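For the comparison, LLE and MDS can be run on the same kind of subsample and timed side by side, as in the rough sketch below. The sample size of 300, `n_neighbors=10`, and the `load_digits` stand-in are assumptions. The `eigen_solver="dense"` setting is chosen here only because the sample is small; it would not scale to the full dataset.

```python
import time

from sklearn.datasets import load_digits
from sklearn.manifold import MDS, LocallyLinearEmbedding

# Stand-in data: replace with a subsample of the MNIST features and labels.
X, y = load_digits(return_X_y=True)
X_sample, y_sample = X[:300], y[:300]

reducers = {
    # Dense solver: robust on small samples, too expensive for large ones.
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=10, eigen_solver="dense"),
    "MDS": MDS(n_components=2, random_state=42),
}

embeddings = {}
for name, reducer in reducers.items():
    start = time.perf_counter()
    embeddings[name] = reducer.fit_transform(X_sample)
    elapsed = time.perf_counter() - start
    print(f"{name}: embedding shape {embeddings[name].shape}, {elapsed:.2f} s")
```

Each entry of `embeddings` is a two-column array that can be plotted as a scatter plot coloured by `y_sample`, which makes the cluster separation of the methods directly comparable.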