Homework # 9
Note: We didn’t quite get through splines in Lecture 10, but
I think we did enough to allow you to at least start the problem
below. We will finish spline interpolation in Lecture 11.
Sauer Reading:
• Section 4.3.1 discusses Gram-Schmidt and the QR decomposi-
tion. (Sections 4.3.2 and 4.3.3 discuss other methods to produce
orthonormal bases, but we will not discuss these methods.)
• Section 3.4 covers splines. Sauer uses a slightly different parameterization
than we did; see his eqn XXXXXXXXXX. In his parameterization, f(x_i) = y_i is
implicit. I find our parameterization, i.e. S_i(x) = a_i x^3 + b_i x^2 + c_i x + d_i,
more intuitive.
Problems:
1. This problem provides an example of how interpolation can be
used. The attached spreadsheet provides life expectancy data
for the US population. The second column gives the probability
of death for the given age. So, for example, the probability that
a person between the ages of 20 and 21 dies is XXXXXXXXXX.
Suppose a 40 year old decides to buy life insurance. The 40 year
old will make payments of $200 every month until
death. In this problem we will consider the worth of these
payments, a quantity of interest to the insurance company. The
payoff upon death will not be considered in this problem. If we
assume (continuous time) interest rates of 5% and let m be the
number of months past age 40 that the person lives, then the
present value of the payments (how much future payments are
worth in today’s dollars) is,
PV = Σ_{i=1}^{m} 200 e^{−0.05 i/12}    (1)
Our goal is to determine the average of PV, in other words
E[PV]. For the insurance company, this is one way to measure
the revenue brought in by the policy. The difficulty is that our
data is yearly, while payments are made monthly and people
do not always die at the beginning of the month.
(a) Let L(t) be the probability that the 40 year old lives past the
age 40 + t, where t is any positive real number. Estimate
L(t) by first considering t = 0, 1, 2, XXXXXXXXXX. These values of
L(t) can be computed from the spreadsheet data. (For
example, for the 40 year old to live to 42, they must not
die between the ages 40–41 or 41–42.) For other t
values, interpolate using a cubic spline. In R you can use
the spline and splinefun commands to construct cubic
splines; see the help documentation. Graph the interpolating
cubic spline of L(t) and include the data points, i.e.
L(t) for t = 0, 1, 2, . . . .
(b) Explain why the expected (average) present value of the
payments is given by

E[PV] = Σ_{i=1}^{∞} 200 L(i/12) e^{−0.05 i/12}    (2)

In practice we can’t sum to ∞; choose an appropriate cutoff
and calculate E[PV]. A sketch of one possible R approach to
parts (a) and (b) is given below.
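The following is a minimal sketch of parts (a) and (b). It assumes the
spreadsheet has been saved as life_table.csv with a column named qx holding
the yearly death probabilities; the file name, column name, and the cutoff
are assumptions for illustration, not part of the assignment.

    # Read the life table (file and column names are assumed).
    life <- read.csv("life_table.csv")
    qx <- life$qx                        # qx[i] = P(death between ages i-1 and i)

    # Yearly survival probabilities for the 40 year old:
    # L(0) = 1 and L(t) = prod_{j=0}^{t-1} (1 - q_{40+j}) for t = 1, 2, ...
    q40 <- qx[41:length(qx)]             # rows 41 onward are ages 40-41, 41-42, ...
    t_grid <- 0:length(q40)
    L_grid <- c(1, cumprod(1 - q40))

    # Cubic spline interpolant; splinefun returns a function of t.
    L <- splinefun(t_grid, L_grid)

    # Part (a): plot the spline together with the yearly data points.
    plot(t_grid, L_grid, pch = 19, xlab = "t (years past 40)", ylab = "L(t)")
    curve(L(x), from = 0, to = max(t_grid), add = TRUE)

    # Part (b): expected present value, truncated once L(t) is essentially zero.
    months <- 1:(12 * length(q40))
    EPV <- sum(200 * L(months / 12) * exp(-0.05 * months / 12))
    EPV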
2. Below, let A be an n× k matrix. Define the span of A, written
span(A), as the span of the column vectors of A. In class we
discussed Gram-Schmidt (GS) orthogonalization. Here I just
want you to go through and finish the arguments I made in
class.
(a) Given a matrix A, write down the GS iteration that will
produce an orthogonal matrix Q with span(Q) = span(A).
(b) Prove that the GS iteration you wrote down in (a) produces
orthonormal vectors with the correct span (see Sauer if you
get stuck).
(c) Write an R function GramSchmidt(A) which returns the
matrix Q in (a). Check that your function works and compare
to the result of using R’s qr function for some nontrivial
choice of A. A reference sketch is given below.
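For reference, here is a minimal sketch of classical Gram-Schmidt in R,
assuming the columns of A are linearly independent; the function name matches
the one requested above, but you should derive and justify the iteration
yourself in parts (a) and (b).

    # Classical Gram-Schmidt: orthonormalize the columns of A one at a time.
    GramSchmidt <- function(A) {
      n <- nrow(A); k <- ncol(A)
      Q <- matrix(0, n, k)
      for (j in 1:k) {
        v <- A[, j]
        if (j > 1) {
          # Subtract the projections onto the columns already orthonormalized.
          for (i in 1:(j - 1)) v <- v - sum(Q[, i] * A[, j]) * Q[, i]
        }
        Q[, j] <- v / sqrt(sum(v^2))     # normalize
      }
      Q
    }

    # Quick check against R's QR decomposition (columns may differ by sign).
    A <- matrix(rnorm(20), 5, 4)
    Q1 <- GramSchmidt(A)
    Q2 <- qr.Q(qr(A))
    round(crossprod(Q1), 3)              # should be (close to) the identity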
3. This problem will provide some examples intended to help you
understand the condition number of a matrix and how it relates
to the determinant, which is more commonly introduced as a
measure of matrix invertibility.
(a) Let Q be an n×n orthonormal matrix. Show that ‖Qx‖ =
‖x‖. (Hint: write down what ‖Qx‖^2 is as a dot product.)
Then show that κ(Q) = 1. (Orthonormal matrices achieve
the best possible condition number.)
(b) Consider the following matrix:

M = [ a   a cos(θ)
      0   a sin(θ) ]    (3)
where a ∈ R and θ ∈ [0, π/2]. Notice that as θ → 0,
the two columns of M become closer to being linearly de-
pendent but that a simply multiplies both columns. The
determinant and condition number both provide informa-
tion on the columns forming the matrix. How does a affect
the determinant and the condition number? How does θ
affect the determinant and the condition number? Explain
why the condition number is a better measure of linear
independence than the determinant. A small numerical sketch
you can adapt is given below.
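As a rough illustration (not part of the problem), the snippet below tabulates
det(M) and the 2-norm condition number for a few values of a and θ; R’s
built-in kappa function with exact = TRUE computes the condition number from
the singular values.

    # Effect of a and theta on det(M) and the condition number kappa(M).
    M <- function(a, theta) matrix(c(a, 0, a * cos(theta), a * sin(theta)), 2, 2)

    for (a in c(0.1, 1, 10)) {
      for (theta in c(pi / 2, pi / 4, 0.01)) {
        cat(sprintf("a = %5.2f  theta = %5.3f  det = %10.4f  kappa = %10.2f\n",
                    a, theta, det(M(a, theta)), kappa(M(a, theta), exact = TRUE)))
      }
    }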
Table I. Life table for the total population: United States, 2003
(q_x = probability of dying between ages x and x+1)
Age        q_x
0-1 XXXXXXXXXX
1-2 XXXXXXXXXX
2-3 XXXXXXXXXX
3-4 XXXXXXXXXX
4-5 XXXXXXXXXX
5-6 XXXXXXXXXX
6-7 XXXXXXXXXX
7-8 XXXXXXXXXX
8-9 XXXXXXXXXX
9-10 XXXXXXXXXX
10-11 XXXXXXXXXX
11-12 XXXXXXXXXX
12-13 XXXXXXXXXX
13-14 XXXXXXXXXX
14-15 XXXXXXXXXX
15-16 XXXXXXXXXX
16-17 XXXXXXXXXX
17-18 XXXXXXXXXX
18-19 XXXXXXXXXX
19-20 XXXXXXXXXX
20-21 XXXXXXXXXX
21-22 XXXXXXXXXX
22-23 XXXXXXXXXX
23-24 XXXXXXXXXX
24-25 XXXXXXXXXX
25-26 XXXXXXXXXX
26-27 XXXXXXXXXX
27-28 XXXXXXXXXX
28-29 XXXXXXXXXX
29-30 XXXXXXXXXX
30-31 XXXXXXXXXX
31-32 XXXXXXXXXX
32-33 XXXXXXXXXX
33-34 XXXXXXXXXX
34-35 XXXXXXXXXX
35-36 XXXXXXXXXX
36-37 XXXXXXXXXX
37-38 XXXXXXXXXX
38-39 XXXXXXXXXX
39-40 XXXXXXXXXX
40-41 XXXXXXXXXX
41-42 XXXXXXXXXX
42-43 XXXXXXXXXX
43-44 XXXXXXXXXX
44-45 XXXXXXXXXX
45-46 XXXXXXXXXX
46-47 XXXXXXXXXX
47-48 XXXXXXXXXX
48-49 XXXXXXXXXX
49-50 XXXXXXXXXX
50-51 XXXXXXXXXX
51-52 XXXXXXXXXX
52-53 XXXXXXXXXX
53-54 XXXXXXXXXX
54-55 XXXXXXXXXX
55-56 XXXXXXXXXX
56-57 XXXXXXXXXX
57-58 XXXXXXXXXX
58-59 XXXXXXXXXX
59-60 XXXXXXXXXX
60-61 XXXXXXXXXX
61-62 XXXXXXXXXX
62-63 XXXXXXXXXX
63-64 XXXXXXXXXX
64-65 XXXXXXXXXX
65-66 XXXXXXXXXX
66-67 XXXXXXXXXX
67-68 XXXXXXXXXX
68-69 XXXXXXXXXX
69-70 XXXXXXXXXX
70-71 XXXXXXXXXX
71-72 XXXXXXXXXX
72-73 XXXXXXXXXX
73-74 XXXXXXXXXX
74-75 XXXXXXXXXX
75-76 XXXXXXXXXX
76-77 XXXXXXXXXX
77-78 XXXXXXXXXX
78-79 XXXXXXXXXX
79-80 XXXXXXXXXX
80-81 XXXXXXXXXX
81-82 XXXXXXXXXX
82-83 XXXXXXXXXX
83-84 XXXXXXXXXX
84-85 XXXXXXXXXX
85-86 XXXXXXXXXX
86-87 XXXXXXXXXX
87-88 XXXXXXXXXX
88-89 XXXXXXXXXX
89-90 XXXXXXXXXX
90-91 XXXXXXXXXX
91-92 XXXXXXXXXX
92-93 XXXXXXXXXX
93-94 XXXXXXXXXX
94-95 XXXXXXXXXX
95-96 XXXXXXXXXX
96-97 XXXXXXXXXX
97-98 XXXXXXXXXX
98-99 XXXXXXXXXX
99-100 XXXXXXXXXX
100 and over XXXXXXXXXX
Homework 10
Reading
• Required: View https://www.youtube.com/watch?v=k3AiUhwHQ28. This is a lecture by
Suvrit Sra, guest lecturing in Gil Strang’s MIT course on computational methods
in machine learning. View the whole lecture, as it connects to many themes we have been
discussing.
• Optional: Sauer, Sections 5.1.1, XXXXXXXXXX. Differentiation using finite difference approximations.
Problems
1. Consider f(x) = e^x. Note that f′(0) = 1. Consider the finite difference:

   (f(0 + h) − f(0)) / h.    (1)

For h = 10^i with i = −20, −19, −18, . . . , −1, 0, calculate the finite differences. For each h,
determine how many digits in the finite difference estimate are correct (you know the true
value of the derivative is 1). Note that XXXXXXXXXX is correct up to 4 digits since XXXXXXXXXX · · · = 1.
Explain your results given finite differences and floating point error. DON’T FORGET TO
SET options(digits=16) so you can see 16 digits. A sketch of the computation is given below.
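The computation itself is short; here is a minimal sketch (the explanation of
the results is the real point of the problem):

    # Finite-difference estimates of f'(0) for f(x) = exp(x); the true value is 1.
    options(digits = 16)
    i <- -20:0
    h <- 10^i
    fd <- (exp(0 + h) - exp(0)) / h
    data.frame(i = i, h = h, finite_difference = fd, error = abs(fd - 1))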
2. Consider the MNIST dataset from homework 8. Recall, in that homework, we used a logistic
regression in 784 dimensions to build a classifier for the number 3. Here, we will use PCA to
visualize and dimensionally reduce the dataset.
(a) In order to visualize the dataset, apply a two-dimensional PCA to the dataset and
plot the coefficients for the first two principal components. Use orthogonalized power
iteration to compute the two principal components yourself (don’t forget to subtract
off the mean!); a sketch of one possible implementation is given after part (c). Color
the points according to the number represented by the image in the sample, i.e. the
value given in the first column of mtrain.csv. (You can use the first 1000 rows since
plotting 60,000 points takes a while.)
(b) Apply the PCA to reduce the dimensionality of the dataset from 784 to a dimension k.
For this portion you may use your language’s eigenvector functions or its PCA functions;
you don’t need to apply power iteration. For some different values of k, do the following:
i. Determine the fraction of the total variance captured by the k-dimensional PCA.
ii. In the file mnist_intro the function show_image displays the image given a
vector of pixels. (For example, if the vector v contains the 784 pixels of a particular
image, then show_image(v) will display the image.) For each value of k, compute
the projection of the image (i.e. the 784 dimensional vector) onto the principal components.
(The projected image will still be a 784 dimensional vector, but it will have
k PCA coefficients, one for each principal component.) Then use show_image(v)
to compare the original image to the projected image. For what k can you begin
to discern the number in the projected image?
What value of k do you think captures the dataset well?
(c) Given your results in (b), choose a dimension k and reduce the dataset from 784 dimensions
to k dimensions. Then, build a classifier based on the k dimensional dataset.
Fit the logistic regression to the whole dataset using a stochastic gradient approach as
discussed in the YouTube video (mentioned above). Use the mtest.csv dataset to
test the accuracy of your classifier. Comment on the time needed to compute the logistic
regression and its accuracy relative to what you found in hw 8.
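For part (a), here is a minimal sketch of orthogonalized power iteration for the
first two principal components. It assumes X is an n × 784 matrix of pixel values
(e.g. the first 1000 rows of mtrain.csv with the label column removed) and labels
holds the digit labels; the names X, labels, top2_pcs, and the iteration count are
assumptions for illustration only.

    # Orthogonalized (block) power iteration for the top two principal components.
    top2_pcs <- function(X, iters = 200) {
      Xc <- scale(X, center = TRUE, scale = FALSE)   # subtract off the mean
      S  <- crossprod(Xc) / (nrow(Xc) - 1)           # 784 x 784 covariance matrix
      V  <- matrix(rnorm(ncol(S) * 2), ncol = 2)     # random starting block
      for (k in 1:iters) {
        V <- S %*% V                                 # power step
        V <- qr.Q(qr(V))                             # re-orthonormalize the block
      }
      V                                              # columns are the two PCs
    }

    # Coefficients (scores) for the scatter plot in part (a), colored by digit.
    # X and labels are assumed to have been built from mtrain.csv.
    V <- top2_pcs(X)
    scores <- scale(X, center = TRUE, scale = FALSE) %*% V
    plot(scores, col = labels + 1, pch = 20,
         xlab = "PC 1 coefficient", ylab = "PC 2 coefficient")

    # For part (b)(ii), the same idea projects an image v back from its k
    # PCA coefficients (with V holding k components):
    # xbar <- colMeans(X); v_proj <- xbar + V %*% crossprod(V, v - xbar)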