COMP 3007 Final Take-home - Page 2 of 7 Due: TBD - See wallPart 1: The method of Gradient Descent2.1...

Question

COMP 3007 Final Take-home - Page 2 of 7 Due: TBD - See wallPart 1: The method of Gradient Descent2.1 GoalThe goal of this project is to find the zeroes of a certain class of function using (indirectly)an algorithm known as the method of gradient descent.2.2 Background: Linear AlgeaThis project will require very basic linear algea - no more than matrix multiplication.Consider the following matrix equation:f(x, y) =(3 12 −1)(xy)−(−11)=(3x+ y + 12x− y − 1)If we define A :=(3 12 −1), ~x :=(xy), and ~b :=(−11)then we can write this expressionas A~x−~b. Now observe that this expression is functionally equivalent to the following:f(x, y) = 〈3x+ y + 1, 2x− y − 1〉Thus there are two ways to write the function f(x, y) - as a either a matrix equation, or asa typical multivariate function. Symbolab is a great tool if you are unfamiliar with matrixmultiplication. Click here to see the matrix multiplication above done in symbolab. Also,while we can’t graph f(x, y) fully, we have some tools to analyze this function: link.The motivation behind this project comes from an important topic in linear algea -solving systems of equations. For example, we might be interested in know for which x, y thefollowing is true:3x+ y = −12x− y = 1This is equivalent to asking when f(x, y) = 〈0, 0〉 or equivalently A~x−~b = ~0.2.3 Background: The GradientRecall that for a multivariate function h(x, y) from Rn → R, we can construct the gradientof h, denoted ∇h. When n = 2, the gradient of h : R2 → R is:∇h = ∇h(x, y) =〈∂h∂x,∂h∂y〉In particular, we saw this indirectly when taking taking a directional derivative of h(x, y).Notice that the output of h(x, y) is a single number, whereas ∇h is a vector with twocomponents. There is another important fact about gradients which is illustrated in thislink: https:www.desmos.com/calculatosduapnehxo. This illustration shows the levelsets (i.e. preimages!!) of the function h(x, y) = x2 + xy + y2. It also contains a movablepoint. I have also graphed the gradient of h(x, y) (which is a vector!) and translated it tothis movable point. You can change which function we take the level sets of in the link, butyou have to change both functions it in the top 2 lines. here is another example.https:www.symbolab.com/solvestep-by-step/%5Cbegin%7Bpmatrix%7D3%261%5C%5C%20%202%26-1%5Cend%7Bpmatrix%7D%5Cbegin%7Bpmatrix%7Dx%5C%5C%20%20y%5Cend%7Bpmatrix%7D-%5Cbegin%7Bpmatrix%7D-1%5C%5C%201%5Cend%7Bpmatrix%7Dhttps:www.desmos.com/calculatowovmgdw3omhttps:www.desmos.com/calculatosduapnehxohttps:www.desmos.com/calculatoazt4uaopy5RectangleRectangleCOMP 3007 Final Take-home - Page 3 of 7 Due: TBD - See wall2.4 Gradient descent project example2.4.1 Properties of the gradient(i) Experiment with the the desmos link given in section 2.3 above, trying out a variety offunctions. For each function you choose, move the gradient along one or more of thelevel sets. What relationships do you observe between the gradient and the level sets ofthe function?(ii) Now, using any example you chose or the one provided, drag the gradient to a point onsome level set, and zoom in to that point until the level set looks straight. What wouldhappen on the surface above this point if we move slightly in the direction perpendiculato the gradient?(iii) Using the explanation you gave in the previous step, explain why the gradient points inthe direction in which f(x, y) increases the fastest.2.4.2 Set up(iv) Now recall the function f(x, y) given in section 2.2 above,f(x, y) = 〈3x+ y + 1, 2x− y − 1〉 = A~x−~Why are we unable to take the gradient of f(x, y)?(v) Using the fact that the length of a vector is given by ||〈a, b〉|| =√a2 + b2, compute||f(x, y)||2 =: L(x, y). Note that your answer should be a function L : R2 → R.(vi) Now compute ∇||f(x, y)||2 = ∇L(x, y). Note that your answer should be a function∇L : R2 → R2.(vii) Now compute AT (A~x−~b). Here AT is the transpose of A, so AT =(3 21 −1). Again,if you are unfamiliar with matrix multiplication, the link above to symbolab is a greattool to cay out this calculation. Your answer should be a 2× 1 matrix (vector).(viii) Compare the results from the previous two steps. Use this to guess the general matrixformula for the gradient of the function ||A~x −~b||2. That is, complete the followingequation:∇||A~x−~b||2 =???2.4.3 Investigating the method of Gradient Descent(ix) Now consider the general formula for the method of gradient descent for a functiong(x, y), or more formally g : R2 → R:~an+1 = ~an − η∇g(~an)RectangleRectangleCOMP 3007 Final Take-home - Page 4 of 7 Due: TBD - See wallHere η is a step size, and we must start with an initial location of ~a0 =(x0y0). Themethod of gradient descent is an algorithm that generates a sequence of points that,under the appropriate conditions, will converge to the location of a local minimum ofthe function g. Give a short explanation for why this algorithm will find the location ofthe minimum. You might mention what problems one might encounter. For example,what can go wrong with η? Give as much detail as possible. It would be a great ideato include a picture to show what this process looks like.(x) Provided is Python code to implement the method of gradient descent for the functionL(x, y), using the initial point(.5−.5)and a step size of η = .01. Summarize the resultsof the output. The code can be run here if you are not familiar with Python.(xi) Turning our attention back to the functions f(x, y) and L(x, y) = ||f(x, y)||2, explainwhy we can’t use the method of gradient descent on f(x, y), but why we can use it onL(x, y).(xii) Our goal with this project is to find the (x, y) pair(s) that make the function f(x, y) =〈0, 0〉. Explain why the results of applying the method of gradient descent on L(x, y)gives us just that.(xiii) Use the results of the Python code to estimate a point (x, y) that makes f(x, y) = 0.Verify that your guess makes f(x, y) = 0.(xiv) (bonus) What would have changed if our function did not have a solution? What wouldhave changed in the steps above and what would have remained the same?https:epl.it/join/sbpegyhx-endtgericsRectangleRectangleCOMP 3007 Final Take-home - Page 5 of 7 Due: TBD - See wall# Please note t ha t t h i s code i s minimal and i s on ly meant f o r demonstrat ion .#I t i s c o r r e c t mathemat ica l ly , but not what I would use in a r e a l s i t u a t i o n .# I t con ta ins no eor checking , and the f unc t i on s cou ld ( shou ld )#be wr i t t en in a g en e r a l i z e d f a sh i on . I encourage you to r ewr i t e t h i s#more f o rma l l y i f you are f am i l i a r wi th programming .import mathimport numpy as npdef vecLength ( v ) :tempSum = 0.0for x in v :tempSum += x [ 0 ]∗∗2eturn math . s q r t (tempSum)def f (A, b , x , y ) :temp = np . matrix ( [ [ x ] , [ y ] ] )eturn np . matrix (np . dot (A, temp)−b)def grad (A, b , x , y ) :vec = np . matrix ( [ [ x ] , [ y ] ] )eturn np . matrix (2∗np . dot (np . t ranspose (A) , np . dot (A, vec)−b ) )def main ( ) :eta = .01A = np . matrix ( [ [ 3 , 1 ] , [ 2 , − 1 ] ] ) = np . matrix ( [ [ − 1 ] , [ 1 ] ] )x = −1y = 1step = 0while vecLength ( grad (A, b , x , y ) ) > . 0 1 :s tep += 1x = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (0)y = (np . matrix ( [ [ x ] , [ y ] ] ) − eta ∗grad (A, b , x , y ) ) . item (1)print ( ” (x , y ) = ( ” ,x , ” , ” ,y , ” ) ” )print ( ”The magnitude squared ( l ength ) o f the output o f f )p r i n t ( , i . e . , | | f (x , y ) | |ˆ2=L(x , y ) at s tep ” , step , ” i s ” , vecLength ( f (A, b , x , y ) )∗∗2)print ( ”The magnitude ( l ength ) o f the g rad i ent at s tep ” , step , ” i s ” , vecLength ( grad (A, b , x , y ) ) , ’ 
 ’ )main ( )Rectangle

Lalit · Accepted Answer

Ans 1:  
The set of points in the domain of a function where the function is constant is another way to 
visualize a 3D function in a 2D domain. As the constant in level sets changes the level curve 
of the function providing the gradient vector to change. The change in gradient vector length 
was observed with moving the gradient along different level sets. 
The minimum value that ?�⃗�  ?(?  ) can achieve is −∥ ??(? ) ∥, and this occurs when unit 
vector ?  points in the opposite direction from ∇?(?  ), in the direction of −??(?  ). 
Thus, −??(?  ) points in the direction of greatest decrease.
Ans 2:  
It was observed that the gradient vector is perpendicular to all the level curves. The level curve 
is simply a cross section of the graphs of function ?(?, ?). Moving perpendicular to the gradient 
vector will act as normal to the point of gradient. Gradient descent works in spaces of any 
number of dimensions, even in infinite-dimensional ones. In the latter case, the search space is 
typically a function space, and one calculates the Fréchet derivative of the functional to be 
minimized to determine the descent direction.
Figure 1: Variation of gradient with moving the point of value.
https://en.wikipedia.org/wiki/Function_space
https://en.wikipedia.org/wiki/Fr%C3%A9chet_derivative
Ans3:  
From ?�⃗�  ?(?  ) =∥ ??(? ) ∥. ?⃗⃗  ⃗ ,  we can see that ?  is the direction vector of the gradient. Now 
as the value of ??(? ) increases which is a scalar quantity with the increase in value of the level 
set. Thus, the gradient points in the increasing direction in which ?(? ) increases the fastest.
Ans4:  
The function ?(?) = = ?�̅� − �̅� represent a relation a point vector 
in a space with one-to-one relation between ℝ2 → ℝ2. The gradient of a point space vector is 
not defined therefore we are unable to take the gradient of ?(?, ?).
Ans 5:

COMP 3007 Final Take-home - Page 2 of 7 Due: TBD - See wall Part 1: The method of Gradient Descent 2.1 Goal The goal of this project is to find the zeroes of a certain class of function using...

Solution

Answer To This Question Is Available To Download

Related Questions & Answers

Submit New Assignment