Great Deal! Get Instant \$25 FREE in Account on First Order + 10% Cashback on Every Order Order Now

# COMP 3007 Final Take-home - Page 2 of 7 Due: TBD - See wall Part 1: The method of Gradient Descent 2.1 Goal The goal of this project is to find the zeroes of a certain class of function using...

COMP 3007 Final Take-home - Page 2 of 7 Due: TBD - See wall
Part 1: The method of Gradient Descent
2.1 Goal
The goal of this project is to find the zeroes of a certain class of function using (indirectly)
an algorithm known as the method of gradient descent.
2.2 Background: Linear Alge
a
This project will require very basic linear alge
a - no more than matrix multiplication.
Consider the following matrix equation:
f(x, y) =
(
3 1
2 −1
)(
x
y
)

(
−1
1
)
=
(
3x+ y + 1
2x− y − 1
)
If we define A :=
(
3 1
2 −1
)
, ~x :=
(
x
y
)
, and ~b :=
(
−1
1
)
then we can write this expression
as A~x−~b. Now observe that this expression is functionally equivalent to the following:
f(x, y) = 〈3x+ y + 1, 2x− y − 1〉
Thus there are two ways to write the function f(x, y) - as a either a matrix equation, or as
a typical multivariate function. Symbolab is a great tool if you are unfamiliar with matrix
multiplication. Click here to see the matrix multiplication above done in symbolab. Also,
while we can’t graph f(x, y) fully, we have some tools to analyze this function: link.
The motivation behind this project comes from an important topic in linear alge
a -
solving systems of equations. For example, we might be interested in know for which x, y the
following is true:
3x+ y = −1
2x− y = 1
This is equivalent to asking when f(x, y) = 〈0, 0〉 or equivalently A~x−~b = ~0.
Recall that for a multivariate function h(x, y) from Rn → R, we can construct the gradient
of h, denoted ∇h. When n = 2, the gradient of h : R2 → R is:
∇h = ∇h(x, y) =

∂h
∂x
,
∂h
∂y

In particular, we saw this indirectly when taking taking a directional derivative of h(x, y).
Notice that the output of h(x, y) is a single number, whereas ∇h is a vector with two
components. There is another important fact about gradients which is illustrated in this
www.desmos.com/calculato
sduapnehxo. This illustration shows the level
sets (i.e. preimages!!) of the function h(x, y) = x2 + xy + y2. It also contains a movable
point. I have also graphed the gradient of h(x, y) (which is a vector!) and translated it to
this movable point. You can change which function we take the level sets of in the link, but
you have to change both functions it in the top 2 lines. here is another example.
https:
www.symbolab.com/solve
step-by-step/%5Cbegin%7Bpmatrix%7D3%261%5C%5C%20%202%26-1%5Cend%7Bpmatrix%7D%5Cbegin%7Bpmatrix%7Dx%5C%5C%20%20y%5Cend%7Bpmatrix%7D-%5Cbegin%7Bpmatrix%7D-1%5C%5C%201%5Cend%7Bpmatrix%7D
https:
www.desmos.com/calculato
wovmgdw3om
https:
www.desmos.com/calculato
sduapnehxo
https:
www.desmos.com/calculato
azt4uaopy5
Rectangle
Rectangle
COMP 3007 Final Take-home - Page 3 of 7 Due: TBD - See wall
(i) Experiment with the the desmos link given in section 2.3 above, trying out a variety of
functions. For each function you choose, move the gradient along one or more of the
level sets. What relationships do you observe between the gradient and the level sets of
the function?
(ii) Now, using any example you chose or the one provided, drag the gradient to a point on
some level set, and zoom in to that point until the level set looks straight. What would
happen on the surface above this point if we move slightly in the direction perpendicula
(iii) Using the explanation you gave in the previous step, explain why the gradient points in
the direction in which f(x, y) increases the fastest.
2.4.2 Set up
(iv) Now recall the function f(x, y) given in section 2.2 above,
f(x, y) = 〈3x+ y + 1, 2x− y − 1〉 = A~x−~
Why are we unable to take the gradient of f(x, y)?
(v) Using the fact that the length of a vector is given by ||〈a, b〉|| =

a2 + b2, compute
||f(x, y)||2 =: L(x, y). Note that your answer should be a function L : R2 → R.
(vi) Now compute ∇||f(x, y)||2 = ∇L(x, y). Note that your answer should be a function
∇L : R2 → R2.
(vii) Now compute AT (A~x−~b). Here AT is the transpose of A, so AT =
(
3 2
1 −1
)
. Again,
if you are unfamiliar with matrix multiplication, the link above to symbolab is a great
tool to ca
y out this calculation. Your answer should be a 2× 1 matrix (vector).
(viii) Compare the results from the previous two steps. Use this to guess the general matrix
formula for the gradient of the function ||A~x −~b||2. That is, complete the following
equation:
∇||A~x−~b||2 =???
2.4.3 Investigating the method of Gradient Descent
(ix) Now consider the general formula for the method of gradient descent for a function
g(x, y), or more formally g : R2 → R:
~an+1 = ~an − η∇g(~an)
Rectangle
Rectangle
COMP 3007 Final Take-home - Page 4 of 7 Due: TBD - See wall
Here η is a step size, and we must start with an initial location of ~a0 =
(
x0
y0
)
. The
method of gradient descent is an algorithm that generates a sequence of points that,
under the appropriate conditions, will converge to the location of a local minimum of
the function g. Give a short explanation for why this algorithm will find the location of
the minimum. You might mention what problems one might encounter. For example,
what can go wrong with η? Give as much detail as possible. It would be a great idea
to include a picture to show what this process looks like.
(x) Provided is Python code to implement the method of gradient descent for the function
L(x, y), using the initial point
(
.5
−.5
)
and a step size of η = .01. Summarize the results
of the output. The code can be run here if you are not familiar with Python.
(xi) Turning our attention back to the functions f(x, y) and L(x, y) = ||f(x, y)||2, explain
why we can’t use the method of gradient descent on f(x, y), but why we can use it on
L(x, y).
(xii) Our goal with this project is to find the (x, y) pair(s) that make the function f(x, y) =
〈0, 0〉. Explain why the results of applying the method of gradient descent on L(x, y)
gives us just that.
(xiii) Use the results of the Python code to estimate a point (x, y) that makes f(x, y) = 0.
Verify that your guess makes f(x, y) = 0.
(xiv) (bonus) What would have changed if our function did not have a solution? What would
have changed in the steps above and what would have remained the same?
https:
epl.it/join/sbpegyhx-
endtgerics
Rectangle
Rectangle
COMP 3007 Final Take-home - Page 5 of 7 Due: TBD - See wall
# Please note t ha t t h i s code i s minimal and i s on ly meant f o r demonstrat ion .
#I t i s c o r r e c t mathemat ica l ly , but not what I would use in a r e a l s i t u a t i o n .
# I t con ta ins no e
or checking , and the f unc t i on s cou ld ( shou ld )
#be wr i t t en in a g en e r a l i z e d f a sh i on . I encourage you to r ewr i t e t h i s
#more f o rma l l y i f you are f am i l i a r wi th programming .
import math
import numpy as np
def vecLength ( v ) :
tempSum = 0.0
for x in v :
tempSum += x [ 0 ]∗∗2
eturn math . s q r t (tempSum)
def f (A, b , x , y ) :
temp = np . matrix ( [ [ x ] , [ y ] ] )
eturn np . matrix (np . dot (A, temp)−b)
def grad (A, b , x , y ) :
vec = np . matrix ( [ [ x ] , [ y ] ] )
eturn np . matrix (2∗np . dot (np . t ranspose (A) , np . dot (A, vec)−b ) )
def main ( ) :
eta = .01
A = np . matrix ( [ [ 3 , 1 ] , [ 2 , − 1 ] ] )
= np . matrix ( [ [ − 1 ] , [ 1 ] ] )
x = −1
y = 1
step = 0
while vecLength ( grad (A, b , x , y ) ) > . 0 1 :
s tep += 1
x = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (0)
y = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (1)
print ( ” (x , y ) = ( ” ,x , ” , ” ,y , ” ) ” )
print ( ”The magnitude squared ( l ength ) o f the output o f f )
p r i n t ( , i . e . , | | f (x , y ) | |ˆ2=L(x , y ) at s tep ” , step , ” i s ” , vecLength ( f (A, b , x , y ) )∗∗2)
print ( ”The magnitude ( l ength ) o f the g rad i ent at s tep ” , step , ” i s ” , vecLength ( grad (A, b , x , y ) ) , ’ \n ’ )
main ( )
Rectangle
Answered 2 days AfterMar 11, 2022

## Solution

Lalit answered on Mar 13 2022
Ans 1:
The set of points in the domain of a function where the function is constant is another way to
visualize a 3D function in a 2D domain. As the constant in level sets changes the level curve
of the function providing the gradient vector to change. The change in gradient vector length
was observed with moving the gradient along different level sets.
The minimum value that ?�⃗� ?(? ) can achieve is −∥ ??(? ) ∥, and this occurs when unit
vector ? points in the opposite direction from ∇?(? ), in the direction of −??(? ).
Thus, −??(? ) points in the direction of greatest decrease.
Ans 2:
It was observed that the gradient vector is perpendicular to all the level curves. The level curve
is simply a cross section of the graphs of function ?(?, ?). Moving perpendicular to the gradient
vector will act as normal to the point of gradient. Gradient descent works in spaces of any
number of dimensions, even in infinite-dimensional ones. In the latter case, the search space is
typically a function space, and one calculates the Fréchet derivative of the functional to be
minimized to determine the descent direction.
Figure 1: Variation of gradient with moving the point of value.
https:
en.wikipedia.org/wiki/Function_space
https:
en.wikipedia.org/wiki/Fr%C3%A9chet_derivative
Ans3:
From ?�⃗� ?(? ) =∥ ??(? ) ∥. ?⃗⃗ ⃗ , we can see that ? is the direction vector of the gradient. Now
as the value of ??(? ) increases which is a scalar quantity with the increase in value of the level
set. Thus, the gradient points in the increasing direction in which ?(? ) increases the fastest.
Ans4:
The function ?(?) =< 3? + ? + 1,2? − ? − 1 > = ?�̅� − �̅� represent a relation a point vector
in a space with one-to-one relation between ℝ2 → ℝ2. The gradient of a point space vector is
not defined therefore we are unable to take the gradient of ?(?, ?).
Ans 5:
?(?, ?) = < 3? + ? + 1,2? − ? − 1...
SOLUTION.PDF