Homework 10
Reading
• Required: View https://www.youtube.com/watch?v=k3AiUhwHQ28. This is a lecture by
Suvrit Sra. He is guest lecturing in Gil Strang’s MIT course on computational methods
in machine learning. View the whole lecture as it connects to many themes we have been
discussing.
• Optional: Sauer, Section 5.1.1, on differentiation using finite difference approximations.
Problems
1. Consider f(x) = e^x. Note that f'(0) = 1. Consider the finite difference

       (f(0 + h) − f(0)) / h.                                          (1)
For h = 10^i with i = −20, −19, −18, . . . , −1, 0, calculate the finite differences. For each h,
determine how many digits in the finite difference estimate are correct (you know the true
value of the derivative is 1). Note, for example, that an estimate whose leading digits are
0.9999 is correct up to 4 digits, since 0.9999 · · · = 1.
Explain your results given finite differences and floating point error. DON'T FORGET TO
SET options(digits=16) so you can see 16 digits.
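A minimal R sketch of this computation (the loop bounds and printing format are one possible choice, not part of the assignment) might look like the following.

    # Finite-difference estimate of f'(0) for f(x) = exp(x); the true value is 1.
    options(digits = 16)                       # show 16 digits in printed output
    f <- function(x) exp(x)
    for (i in -20:0) {
      h  <- 10^i
      fd <- (f(0 + h) - f(0)) / h              # forward difference approximation
      err <- abs(fd - 1)                       # absolute error against the true derivative
      cat(sprintf("i = %3d  h = %.0e  fd = %.16f  error = %.2e\n", i, h, fd, err))
    }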
2. Consider the MNIST dataset from homework 8. Recall, in that homework, we used a logistic
regression in 784 dimensions to build a classifier for the number 3. Here, we will use PCA to
visualize the dataset and reduce its dimensionality.
(a) In order to visualize the dataset, apply a two-dimensional PCA to the dataset and
plot the coefficients for the first two principal components. Use orthogonalized power
iteration to compute the two principal components yourself, as in the sketch after this
part. (Don't forget to subtract off the mean!) Color the points according to the number
represented by the image in the sample, i.e. the value given in the first column of
mtrain.csv. (You can use the first 1000 rows since plotting 60,000 points takes a while.)
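A rough R sketch of orthogonalized power iteration on the centered data follows. It assumes mtrain.csv has no header row, the digit label in column 1, and the 784 pixel values in columns 2 through 785; the 200-iteration cap is an arbitrary choice.

    # Orthogonalized power iteration for the top-2 principal components.
    mtrain <- as.matrix(read.csv("mtrain.csv", header = FALSE))
    X <- mtrain[1:1000, 2:785]                      # first 1000 images
    X <- scale(X, center = TRUE, scale = FALSE)     # subtract off the column means
    C <- crossprod(X) / (nrow(X) - 1)               # 784 x 784 sample covariance matrix

    V <- matrix(rnorm(784 * 2), nrow = 784, ncol = 2)   # random starting vectors
    for (iter in 1:200) {
      V <- C %*% V                                  # power step
      V <- qr.Q(qr(V))                              # re-orthogonalize the two columns
    }

    scores <- X %*% V                               # coefficients of the first two PCs
    plot(scores[, 1], scores[, 2],
         col = mtrain[1:1000, 1] + 1,               # color by digit label
         pch = 16, xlab = "PC 1", ylab = "PC 2")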
(b) Apply the PCA to reduce the dimensionality of the dataset from 784 to a dimension k.
For this portion you may use your language's eigenvector functions or its PCA functions.
You don't need to apply power iteration. For several different values of k, do the following:
i. Determine the fraction of the total variance captured by the k-dimensional PCA.
ii. In the file mnist_intro the function show_image displays the image given a
vector of pixels. (For example, if the vector v contains the 784 pixels of a particular
image, then show_image(v) will display the image.) For each value of k, compute
the projection of the image (i.e. a 784-dimensional vector) onto the principal com-
ponents. (The projected image will still be a 784-dimensional vector, but it will have
k PCA coefficients, one for each principal component.) Then use show_image(v)
to compare the original image to the projected image. For what k can you begin
to discern the number in the projected image?
What value of k do you think captures the dataset well? (A sketch of the variance
fraction and projection computations follows this part.)
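A possible R sketch for parts i and ii uses prcomp for the PCA. It assumes mtrain is loaded as in the previous sketch, that show_image from mnist_intro takes a 784-vector of pixels, and that k = 50 is just an example value.

    # Variance fraction and reconstruction of an image with the first k principal components.
    Xraw <- mtrain[1:1000, 2:785]
    pca  <- prcomp(Xraw, center = TRUE, scale. = FALSE)

    k <- 50
    var_frac <- sum(pca$sdev[1:k]^2) / sum(pca$sdev^2)   # fraction of total variance in first k PCs
    cat("fraction of variance captured by", k, "components:", var_frac, "\n")

    v      <- Xraw[1, ]                                  # one original image (784 pixels)
    Vk     <- pca$rotation[, 1:k]                        # first k principal components (784 x k)
    coeffs <- as.numeric((v - pca$center) %*% Vk)        # k PCA coefficients of this image
    v_proj <- pca$center + as.numeric(Vk %*% coeffs)     # projected image, still 784-dimensional

    show_image(v)        # original image
    show_image(v_proj)   # image projected onto the first k principal components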
(c) Given your results in (b), choose a dimension k and reduce the dataset from 784 di-
mensions to k dimensions. Then, build a classifier based on the k-dimensional dataset.
Fit the logistic regression to the whole dataset using a stochastic gradient approach as
discussed in the YouTube video (mentioned above); a sketch follows this part. Use the
mtest.csv dataset to test the accuracy of your classifier. Comment on the time needed
to compute the logistic regression and its accuracy relative to what you found in hw 8.
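One way to sketch the stochastic gradient fit in R, on the k PCA coefficients with a 0/1 label for whether the digit is a 3, is shown below. The step size, epoch count, and reuse of the PCA from the previous sketch are assumptions, not part of the assignment.

    # Stochastic gradient descent for logistic regression on k-dimensional PCA features.
    # Assumes mtrain and pca from the previous sketches.
    k <- 50
    Z <- scale(mtrain[, 2:785], center = pca$center, scale = FALSE) %*% pca$rotation[, 1:k]
    y <- as.numeric(mtrain[, 1] == 3)               # 1 if the image is a 3, else 0

    sigmoid <- function(t) 1 / (1 + exp(-t))
    beta <- rep(0, k + 1)                           # intercept plus k coefficients
    step <- 0.01                                    # fixed step size (an assumption)

    for (epoch in 1:5) {
      for (i in sample(nrow(Z))) {                  # one random pass over the data per epoch
        xi   <- c(1, Z[i, ])                        # prepend 1 for the intercept
        grad <- (sigmoid(sum(beta * xi)) - y[i]) * xi   # gradient of the log loss at one sample
        beta <- beta - step * grad                  # stochastic gradient step
      }
    }

Accuracy on mtest.csv can then be checked by applying the same centering and projection to the test pixels and thresholding sigmoid(x · beta) at 0.5.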