Homework # 9
Note: We didn’t quite get through splines in Lecture 10, but
I think we did enough to allow you to at least start the problem
below. We will finish spline interpolation in Lecture 11.
Sauer Reading:
• Section 4.3.1 discusses Gram-Schmidt and the QR decomposi-
tion. (Sections 4.3.2 and 4.3.3 discuss other methods to produce
orthonormal bases, but we will not discuss these methods.)
• Section 3.4 covers splines. Sauer uses a slightly different parameterization
than we did; see his eqn XXXXXXXXXX. In his parameterization, f(x_i) = y_i is
implicit. I find our parameterization, i.e. S_i(x) = a_i x^3 + b_i x^2 + c_i x + d_i,
more intuitive.
Problems:
1. This problem provides an example of how interpolation can be
used. The attached spreadsheet provides life expectancy data
for the US population. The second column gives the probability
of death for the given age. So, for example, the probability that
a person between the ages of 20 and 21 dies is XXXXXXXXXX.
Suppose a 40 year old decides to buy life insurance. The 40 year
old will make payments of $200 every month until
death. In this problem we will consider the worth of these
payments, a quantity of interest to the insurance company. The
payoff upon death will not be considered in this problem. If we
assume (continuous time) interest rates of 5% and let m be the
number of months past age 40 that the person lives, then the
present value of the payments (how much future payments are
worth in today’s dollars) is,
PV = Σ_{i=1}^{m} 200 e^{−0.05 i/12}    (1)
Our goal is to determine the average of PV, in other words
E[PV]. For the insurance company, this is one way to measure
the revenue brought in by the policy. The difficulty is that our
data is yearly, while payments are made monthly and people
do not always die at the beginning of the month.
(a) Let L(t) be the probability that the 40 year old lives past the
age 40 + t, where t is any positive real number. Estimate
L(t) by first considering t = 0, 1, 2, XXXXXXXXXX. These values of
L(t) can be computed from the spreadsheet data. (For
example, for the 40 year old to live to 42, they must not
die between the ages 40–41 or 41–42.) For other t
values, interpolate using a cubic spline. In R you can use
the spline and splinefun commands to construct cubic
splines; see the help documentation. Graph the interpolating
cubic spline of L(t) and include the data points, i.e.
L(t) for t = 0, 1, 2, . . . .
(b) Explain why the expected (average) present value of the
payments is given by

E[PV] = Σ_{i=1}^{∞} 200 L(i/12) e^{−0.05 i/12}    (2)

In practice we can’t sum to ∞; choose an appropriate cutoff
and calculate E[PV]. A sketch of one possible R approach to
parts (a) and (b) is given below.
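The following is a minimal sketch of parts (a) and (b). It assumes the
spreadsheet has been saved as life_table.csv with a column named qx holding
the yearly death probabilities; the file name, column name, and the cutoff
are assumptions for illustration, not part of the assignment.

    # Read the life table (file and column names are assumed).
    life <- read.csv("life_table.csv")
    qx <- life$qx                        # qx[i] = P(death between ages i-1 and i)

    # Yearly survival probabilities for the 40 year old:
    # L(0) = 1 and L(t) = prod_{j=0}^{t-1} (1 - q_{40+j}) for t = 1, 2, ...
    q40 <- qx[41:length(qx)]             # rows 41 onward are ages 40-41, 41-42, ...
    t_grid <- 0:length(q40)
    L_grid <- c(1, cumprod(1 - q40))

    # Cubic spline interpolant; splinefun returns a function of t.
    L <- splinefun(t_grid, L_grid)

    # Part (a): plot the spline together with the yearly data points.
    plot(t_grid, L_grid, pch = 19, xlab = "t (years past 40)", ylab = "L(t)")
    curve(L(x), from = 0, to = max(t_grid), add = TRUE)

    # Part (b): expected present value, truncated once L(t) is essentially zero.
    months <- 1:(12 * length(q40))
    EPV <- sum(200 * L(months / 12) * exp(-0.05 * months / 12))
    EPV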
2. Below, let A be an n× k matrix. Define the span of A, written
span(A), as the span of the column vectors of A. In class we
discussed Gram-Schmidt (GS) orthogonalization. Here I just
want you to go through and finish the arguments I made in
class.
(a) Given a matrix A, write down the GS iteration that will
produce an orthogonal matrix Q with span(Q) = span(A).
(b) Prove that the GS iteration you wrote down in (a) produces
orthonormal vectors with the correct span (see Sauer if you
get stuck).
(c) Write an R function GramSchmidt(A) which returns the
matrix Q in (a). Check that your function works and compare
to the result of using R’s qr function for some nontrivial
choice of A. A reference sketch is given below.
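For reference, here is a minimal sketch of classical Gram-Schmidt in R,
assuming the columns of A are linearly independent; the function name matches
the one requested above, but you should derive and justify the iteration
yourself in parts (a) and (b).

    # Classical Gram-Schmidt: orthonormalize the columns of A one at a time.
    GramSchmidt <- function(A) {
      n <- nrow(A); k <- ncol(A)
      Q <- matrix(0, n, k)
      for (j in 1:k) {
        v <- A[, j]
        if (j > 1) {
          # Subtract the projections onto the columns already orthonormalized.
          for (i in 1:(j - 1)) v <- v - sum(Q[, i] * A[, j]) * Q[, i]
        }
        Q[, j] <- v / sqrt(sum(v^2))     # normalize
      }
      Q
    }

    # Quick check against R's QR decomposition (columns may differ by sign).
    A <- matrix(rnorm(20), 5, 4)
    Q1 <- GramSchmidt(A)
    Q2 <- qr.Q(qr(A))
    round(crossprod(Q1), 3)              # should be (close to) the identity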
3. This problem will provide some examples intended to help you
understand the condition number of a matrix and how it relates
to the determinant, which is more commonly introduced as a
measure of matrix invertibility.
(a) Let Q be an n×n orthonormal matrix. Show that ‖Qx‖ =
‖x‖. (Hint: write down what ‖Qx‖^2 is as a dot product.)
Then show that κ(Q) = 1. (Orthonormal matrices achieve
the best possible condition number.)
(b) Consider the following matrix:

M = [ a   a cos(θ)
      0   a sin(θ) ]    (3)
where a ∈ R and θ ∈ [0, π/2]. Notice that as θ → 0,
the two columns of M become closer to being linearly de-
pendent but that a simply multiplies both columns. The
determinant and condition number both provide informa-
tion on the columns forming the matrix. How does a affect
the determinant and the condition number? How does θ
affect the determinant and the condition number? Explain
why the condition number is a better measure of linear
independence than the determinant. A small numerical sketch
you can adapt is given below.
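As a rough illustration (not part of the problem), the snippet below tabulates
det(M) and the 2-norm condition number for a few values of a and θ; R’s
built-in kappa function with exact = TRUE computes the condition number from
the singular values.

    # Effect of a and theta on det(M) and the condition number kappa(M).
    M <- function(a, theta) matrix(c(a, 0, a * cos(theta), a * sin(theta)), 2, 2)

    for (a in c(0.1, 1, 10)) {
      for (theta in c(pi / 2, pi / 4, 0.01)) {
        cat(sprintf("a = %5.2f  theta = %5.3f  det = %10.4f  kappa = %10.2f\n",
                    a, theta, det(M(a, theta)), kappa(M(a, theta), exact = TRUE)))
      }
    }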
Table I. Life table for the total population: United States, 2003
(q_x = probability of dying between ages x and x+1)
Age        q_x
0-1 XXXXXXXXXX
1-2 XXXXXXXXXX
2-3 XXXXXXXXXX
3-4 XXXXXXXXXX
4-5 XXXXXXXXXX
5-6 XXXXXXXXXX
6-7 XXXXXXXXXX
7-8 XXXXXXXXXX
8-9 XXXXXXXXXX
9-10 XXXXXXXXXX
10-11 XXXXXXXXXX
11-12 XXXXXXXXXX
12-13 XXXXXXXXXX
13-14 XXXXXXXXXX
14-15 XXXXXXXXXX
15-16 XXXXXXXXXX
16-17 XXXXXXXXXX
17-18 XXXXXXXXXX
18-19 XXXXXXXXXX
19-20 XXXXXXXXXX
20-21 XXXXXXXXXX
21-22 XXXXXXXXXX
22-23 XXXXXXXXXX
23-24 XXXXXXXXXX
24-25 XXXXXXXXXX
25-26 XXXXXXXXXX
26-27 XXXXXXXXXX
27-28 XXXXXXXXXX
28-29 XXXXXXXXXX
29-30 XXXXXXXXXX
30-31 XXXXXXXXXX
31-32 XXXXXXXXXX
32-33 XXXXXXXXXX
33-34 XXXXXXXXXX
34-35 XXXXXXXXXX
35-36 XXXXXXXXXX
36-37 XXXXXXXXXX
37-38 XXXXXXXXXX
38-39 XXXXXXXXXX
39-40 XXXXXXXXXX
40-41 XXXXXXXXXX
41-42 XXXXXXXXXX
42-43 XXXXXXXXXX
43-44 XXXXXXXXXX
44-45 XXXXXXXXXX
45-46 XXXXXXXXXX
46-47 XXXXXXXXXX
47-48 XXXXXXXXXX
48-49 XXXXXXXXXX
49-50 XXXXXXXXXX
50-51 XXXXXXXXXX
51-52 XXXXXXXXXX
52-53 XXXXXXXXXX
53-54 XXXXXXXXXX
54-55 XXXXXXXXXX
55-56 XXXXXXXXXX
56-57 XXXXXXXXXX
57-58 XXXXXXXXXX
58-59 XXXXXXXXXX
59-60 XXXXXXXXXX
60-61 XXXXXXXXXX
61-62 XXXXXXXXXX
62-63 XXXXXXXXXX
63-64 XXXXXXXXXX
64-65 XXXXXXXXXX
65-66 XXXXXXXXXX
66-67 XXXXXXXXXX
67-68 XXXXXXXXXX
68-69 XXXXXXXXXX
69-70 XXXXXXXXXX
70-71 XXXXXXXXXX
71-72 XXXXXXXXXX
72-73 XXXXXXXXXX
73-74 XXXXXXXXXX
74-75 XXXXXXXXXX
75-76 XXXXXXXXXX
76-77 XXXXXXXXXX
77-78 XXXXXXXXXX
78-79 XXXXXXXXXX
79-80 XXXXXXXXXX
80-81 XXXXXXXXXX
81-82 XXXXXXXXXX
82-83 XXXXXXXXXX
83-84 XXXXXXXXXX
84-85 XXXXXXXXXX
85-86 XXXXXXXXXX
86-87 XXXXXXXXXX
87-88 XXXXXXXXXX
88-89 XXXXXXXXXX
89-90 XXXXXXXXXX
90-91 XXXXXXXXXX
91-92 XXXXXXXXXX
92-93 XXXXXXXXXX
93-94 XXXXXXXXXX
94-95 XXXXXXXXXX
95-96 XXXXXXXXXX
96-97 XXXXXXXXXX
97-98 XXXXXXXXXX
98-99 XXXXXXXXXX
99-100 XXXXXXXXXX
100 and over XXXXXXXXXX
Homework 10
Reading
• Required: View https://www.youtube.com/watch?v=k3AiUhwHQ28. This is a lecture by
Suvrit Sra, guest lecturing in Gil Strang’s MIT course on computational methods
in machine learning. View the whole lecture, as it connects to many themes we have been
discussing.
• Optional: Sauer, Sections 5.1.1, XXXXXXXXXX. Differentiation using finite difference approximations.
Problems
1. Consider f(x) = e^x. Note that f′(0) = 1. Consider the finite difference:

   (f(0 + h) − f(0)) / h.    (1)

For h = 10^i with i = −20, −19, −18, . . . , −1, 0, calculate the finite differences. For each h,
determine how many digits in the finite difference estimate are correct (you know the true
value of the derivative is 1). Note that XXXXXXXXXX is correct up to 4 digits since XXXXXXXXXX · · · = 1.
Explain your results given finite differences and floating point error. DON’T FORGET TO
SET options(digits=16) so you can see 16 digits. A sketch of the computation is given below.
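The computation itself is short; here is a minimal sketch (the explanation of
the results is the real point of the problem):

    # Finite-difference estimates of f'(0) for f(x) = exp(x); the true value is 1.
    options(digits = 16)
    i <- -20:0
    h <- 10^i
    fd <- (exp(0 + h) - exp(0)) / h
    data.frame(i = i, h = h, finite_difference = fd, error = abs(fd - 1))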
2. Consider the MNIST dataset from homework 8. Recall, in that homework, we used a logistic
regression in 784 dimensions to build a classifier for the number 3. Here, we will use PCA to
visualize and dimensionally reduce the dataset.
(a) In order to visualize the dataset, apply a two-dimensional PCA to the dataset and
plot the coefficients for the first two principal components. Use orthogonalized power
iteration to compute the two principal components yourself (don’t forget to subtract
off the mean!); a sketch of one possible implementation is given after part (c). Color
the points according to the number represented by the image in the sample, i.e. the
value given in the first column of mtrain.csv. (You can use the first 1000 rows since
plotting 60,000 points takes a while.)
(b) Apply the PCA to reduce the dimensionality of the dataset from 784 to a dimension k.
For this portion you may use your language’s eigenvector functions or its PCA functions;
you don’t need to apply power iteration. For some different values of k, do the following:
i. Determine the fraction of the total variance captured by the k-dimensional PCA.
ii. In the file mnist_intro the function show_image displays the image given a
vector of pixels. (For example, if the vector v contains the 784 pixels of a particular
image, then show_image(v) will display the image.) For each value of k, compute
the projection of the image (i.e. the 784 dimensional vector) onto the principal components.
(The projected image will still be a 784 dimensional vector, but it will have
k PCA coefficients, one for each principal component.) Then use show_image(v)
to compare the original image to the projected image. For what k can you begin
to discern the number in the projected image?
What value of k do you think captures the dataset well?
(c) Given your results in (b), choose a dimension k and reduce the dataset from 784 dimensions
to k dimensions. Then, build a classifier based on the k dimensional dataset.
Fit the logistic regression to the whole dataset using a stochastic gradient approach as
discussed in the YouTube video (mentioned above). Use the mtest.csv dataset to
test the accuracy of your classifier. Comment on the time needed to compute the logistic
regression and its accuracy relative to what you found in hw 8.
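For part (a), here is a minimal sketch of orthogonalized power iteration for the
first two principal components. It assumes X is an n × 784 matrix of pixel values
(e.g. the first 1000 rows of mtrain.csv with the label column removed) and labels
holds the digit labels; the names X, labels, top2_pcs, and the iteration count are
assumptions for illustration only.

    # Orthogonalized (block) power iteration for the top two principal components.
    top2_pcs <- function(X, iters = 200) {
      Xc <- scale(X, center = TRUE, scale = FALSE)   # subtract off the mean
      S  <- crossprod(Xc) / (nrow(Xc) - 1)           # 784 x 784 covariance matrix
      V  <- matrix(rnorm(ncol(S) * 2), ncol = 2)     # random starting block
      for (k in 1:iters) {
        V <- S %*% V                                 # power step
        V <- qr.Q(qr(V))                             # re-orthonormalize the block
      }
      V                                              # columns are the two PCs
    }

    # Coefficients (scores) for the scatter plot in part (a), colored by digit.
    # X and labels are assumed to have been built from mtrain.csv.
    V <- top2_pcs(X)
    scores <- scale(X, center = TRUE, scale = FALSE) %*% V
    plot(scores, col = labels + 1, pch = 20,
         xlab = "PC 1 coefficient", ylab = "PC 2 coefficient")

    # For part (b)(ii), the same idea projects an image v back from its k
    # PCA coefficients (with V holding k components):
    # xbar <- colMeans(X); v_proj <- xbar + V %*% crossprod(V, v - xbar)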