COMP 3007 Final Take-home - Page 2 of 7 Due: TBD - See wall
Part 1: The method of Gradient Descent
2.1 Goal
The goal of this project is to find the zeroes of a certain class of function using (indirectly)
an algorithm known as the method of gradient descent.
2.2 Background: Linear Alge
a
This project will require very basic linear alge
a - no more than matrix multiplication.
Consider the following matrix equation:
f(x, y) =
(
3 1
2 −1
)(
x
y
)
−
(
−1
1
)
=
(
3x+ y + 1
2x− y − 1
)
If we define A :=
(
3 1
2 −1
)
, ~x :=
(
x
y
)
, and ~b :=
(
−1
1
)
then we can write this expression
as A~x−~b. Now observe that this expression is functionally equivalent to the following:
f(x, y) = 〈3x+ y + 1, 2x− y − 1〉
Thus there are two ways to write the function f(x, y) - as a either a matrix equation, or as
a typical multivariate function. Symbolab is a great tool if you are unfamiliar with matrix
multiplication. Click here to see the matrix multiplication above done in symbolab. Also,
while we can’t graph f(x, y) fully, we have some tools to analyze this function: link.
The motivation behind this project comes from an important topic in linear alge
a -
solving systems of equations. For example, we might be interested in know for which x, y the
following is true:
3x+ y = −1
2x− y = 1
This is equivalent to asking when f(x, y) = 〈0, 0〉 or equivalently A~x−~b = ~0.
2.3 Background: The Gradient
Recall that for a multivariate function h(x, y) from Rn → R, we can construct the gradient
of h, denoted ∇h. When n = 2, the gradient of h : R2 → R is:
∇h = ∇h(x, y) =
〈
∂h
∂x
,
∂h
∂y
〉
In particular, we saw this indirectly when taking taking a directional derivative of h(x, y).
Notice that the output of h(x, y) is a single number, whereas ∇h is a vector with two
components. There is another important fact about gradients which is illustrated in this
link: https:
www.desmos.com/calculato
sduapnehxo. This illustration shows the level
sets (i.e. preimages!!) of the function h(x, y) = x2 + xy + y2. It also contains a movable
point. I have also graphed the gradient of h(x, y) (which is a vector!) and translated it to
this movable point. You can change which function we take the level sets of in the link, but
you have to change both functions it in the top 2 lines. here is another example.
https:
www.symbolab.com/solve
step-by-step/%5Cbegin%7Bpmatrix%7D3%261%5C%5C%20%202%26-1%5Cend%7Bpmatrix%7D%5Cbegin%7Bpmatrix%7Dx%5C%5C%20%20y%5Cend%7Bpmatrix%7D-%5Cbegin%7Bpmatrix%7D-1%5C%5C%201%5Cend%7Bpmatrix%7D
https:
www.desmos.com/calculato
wovmgdw3om
https:
www.desmos.com/calculato
sduapnehxo
https:
www.desmos.com/calculato
azt4uaopy5
Rectangle
Rectangle
COMP 3007 Final Take-home - Page 3 of 7 Due: TBD - See wall
2.4 Gradient descent project example
2.4.1 Properties of the gradient
(i) Experiment with the the desmos link given in section 2.3 above, trying out a variety of
functions. For each function you choose, move the gradient along one or more of the
level sets. What relationships do you observe between the gradient and the level sets of
the function?
(ii) Now, using any example you chose or the one provided, drag the gradient to a point on
some level set, and zoom in to that point until the level set looks straight. What would
happen on the surface above this point if we move slightly in the direction perpendicula
to the gradient?
(iii) Using the explanation you gave in the previous step, explain why the gradient points in
the direction in which f(x, y) increases the fastest.
2.4.2 Set up
(iv) Now recall the function f(x, y) given in section 2.2 above,
f(x, y) = 〈3x+ y + 1, 2x− y − 1〉 = A~x−~
Why are we unable to take the gradient of f(x, y)?
(v) Using the fact that the length of a vector is given by ||〈a, b〉|| =
√
a2 + b2, compute
||f(x, y)||2 =: L(x, y). Note that your answer should be a function L : R2 → R.
(vi) Now compute ∇||f(x, y)||2 = ∇L(x, y). Note that your answer should be a function
∇L : R2 → R2.
(vii) Now compute AT (A~x−~b). Here AT is the transpose of A, so AT =
(
3 2
1 −1
)
. Again,
if you are unfamiliar with matrix multiplication, the link above to symbolab is a great
tool to ca
y out this calculation. Your answer should be a 2× 1 matrix (vector).
(viii) Compare the results from the previous two steps. Use this to guess the general matrix
formula for the gradient of the function ||A~x −~b||2. That is, complete the following
equation:
∇||A~x−~b||2 =???
2.4.3 Investigating the method of Gradient Descent
(ix) Now consider the general formula for the method of gradient descent for a function
g(x, y), or more formally g : R2 → R:
~an+1 = ~an − η∇g(~an)
Rectangle
Rectangle
COMP 3007 Final Take-home - Page 4 of 7 Due: TBD - See wall
Here η is a step size, and we must start with an initial location of ~a0 =
(
x0
y0
)
. The
method of gradient descent is an algorithm that generates a sequence of points that,
under the appropriate conditions, will converge to the location of a local minimum of
the function g. Give a short explanation for why this algorithm will find the location of
the minimum. You might mention what problems one might encounter. For example,
what can go wrong with η? Give as much detail as possible. It would be a great idea
to include a picture to show what this process looks like.
(x) Provided is Python code to implement the method of gradient descent for the function
L(x, y), using the initial point
(
.5
−.5
)
and a step size of η = .01. Summarize the results
of the output. The code can be run here if you are not familiar with Python.
(xi) Turning our attention back to the functions f(x, y) and L(x, y) = ||f(x, y)||2, explain
why we can’t use the method of gradient descent on f(x, y), but why we can use it on
L(x, y).
(xii) Our goal with this project is to find the (x, y) pair(s) that make the function f(x, y) =
〈0, 0〉. Explain why the results of applying the method of gradient descent on L(x, y)
gives us just that.
(xiii) Use the results of the Python code to estimate a point (x, y) that makes f(x, y) = 0.
Verify that your guess makes f(x, y) = 0.
(xiv) (bonus) What would have changed if our function did not have a solution? What would
have changed in the steps above and what would have remained the same?
https:
epl.it/join/sbpegyhx-
endtgerics
Rectangle
Rectangle
COMP 3007 Final Take-home - Page 5 of 7 Due: TBD - See wall
# Please note t ha t t h i s code i s minimal and i s on ly meant f o r demonstrat ion .
#I t i s c o r r e c t mathemat ica l ly , but not what I would use in a r e a l s i t u a t i o n .
# I t con ta ins no e
or checking , and the f unc t i on s cou ld ( shou ld )
#be wr i t t en in a g en e r a l i z e d f a sh i on . I encourage you to r ewr i t e t h i s
#more f o rma l l y i f you are f am i l i a r wi th programming .
import math
import numpy as np
def vecLength ( v ) :
tempSum = 0.0
for x in v :
tempSum += x [ 0 ]∗∗2
eturn math . s q r t (tempSum)
def f (A, b , x , y ) :
temp = np . matrix ( [ [ x ] , [ y ] ] )
eturn np . matrix (np . dot (A, temp)−b)
def grad (A, b , x , y ) :
vec = np . matrix ( [ [ x ] , [ y ] ] )
eturn np . matrix (2∗np . dot (np . t ranspose (A) , np . dot (A, vec)−b ) )
def main ( ) :
eta = .01
A = np . matrix ( [ [ 3 , 1 ] , [ 2 , − 1 ] ] )
= np . matrix ( [ [ − 1 ] , [ 1 ] ] )
x = −1
y = 1
step = 0
while vecLength ( grad (A, b , x , y ) ) > . 0 1 :
s tep += 1
x = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (0)
y = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (1)
print ( ” (x , y ) = ( ” ,x , ” , ” ,y , ” ) ” )
print ( ”The magnitude squared ( l ength ) o f the output o f f )
p r i n t ( , i . e . , | | f (x , y ) | |ˆ2=L(x , y ) at s tep ” , step , ” i s ” , vecLength ( f (A, b , x , y ) )∗∗2)
print ( ”The magnitude ( l ength ) o f the g rad i ent at s tep ” , step , ” i s ” , vecLength ( grad (A, b , x , y ) ) , ’ \n ’ )
main ( )
Rectangle