5.3 Optimization Approaches: Gradient Descent and Ordinary Least Squares

Gradient Descent

Gradient Descent is a powerful optimization method used throughout statistics and machine learning. We're going to build a little intuition for it here, but if you want more, there are some links at the end of this lesson.

Let's take the following graph:


We're going to try to find the minimum value of the graph.

We will use m to represent the slope at a particular point, and 𝝰 to represent what we call the learning rate. The learning rate controls the step size, that is, how far we move along the x-axis each time we update. You'll see what that means in a minute.

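Gradient Descent is built around one update rule:

new x = old x - 𝝰 · m

In words: take the current x-value and subtract the learning rate times the slope at that point.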

Don't worry if this all feels disjointed. We're going to put it all together now.

Step 1.

Start with a random x-value.

Step 2.

For this x-value, calculate the slope of the function at that point. Calculating the slope at a single point might seem like a strange concept, but Calculus gives us a way to do it (the derivative). This slope is m.

Step 3.

𝝰 is the learning rate, and we set this beforehand. This will stay constant. Again, don't worry too much about this. If you want to know more about it, use the links below.

Step 4.

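Then we use the update rule to change the x-value: we subtract 𝝰 times m from the current x-value. This moves x downhill, in the direction opposite to the slope.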

Step 5.

We then repeat Steps 2-4 again and again until a certain threshold is reached, for example until the x-value barely changes from one repetition to the next.

Again, if you want to know more about that threshold, you can refer to the links below.
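The five steps above can be put together in a few lines of Python. This is only a minimal sketch, not the course's implementation: the example function f(x) = (x - 3)**2, the starting value, the learning rate of 0.1, and the stopping threshold are all assumptions chosen for illustration, and the names `slope` and `gradient_descent` are made up for this sketch. A fixed starting value is used instead of a random one so the result is repeatable.

```python
def slope(x):
    # Slope (derivative) of the assumed example function f(x) = (x - 3)**2.
    return 2 * (x - 3)


def gradient_descent(start_x, learning_rate=0.1, threshold=1e-6, max_steps=10_000):
    x = start_x                      # Step 1: pick a starting x-value
    for _ in range(max_steps):
        m = slope(x)                 # Step 2: slope at the current x-value
        step = learning_rate * m     # Step 3: the learning rate stays constant
        x = x - step                 # Step 4: apply the update rule
        if abs(step) < threshold:    # Step 5: stop once the steps become tiny
            break
    return x


minimum_x = gradient_descent(start_x=10.0)
print(round(minimum_x, 3))  # prints 3.0
```

The x-value settles at 3, which is where (x - 3)**2 is smallest. With a much larger learning rate the steps can overshoot the minimum, which is one reason the choice of 𝝰 matters.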

We will be applying Gradient Descent to find the optimal parameter value, the one that minimizes the cost function, for extra-simple regression. The same principle applies to every parameter of a regression model.
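To make that concrete, here is a rough sketch of Gradient Descent applied to a regression cost function. Several things here are assumptions, not taken from this lesson: the model is a line through the origin, y = w * x, with a single parameter `w`; the cost is the mean squared error; and the data points are made up so that they lie exactly on y = 2x.

```python
# Made-up data that lies exactly on the line y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def cost_slope(w):
    # Slope of the mean squared error cost, (1/n) * sum((w*x - y)**2),
    # with respect to the parameter w.
    n = len(xs)
    return (2 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))


w = 0.0                # starting guess for the parameter
learning_rate = 0.01   # kept constant throughout

for _ in range(1000):
    w = w - learning_rate * cost_slope(w)   # same update rule as before

print(round(w, 3))  # prints 2.0
```

The only change from the earlier steps is what we are updating: instead of an x-value on a graph, we update the model's parameter, and the slope comes from the cost function.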

Ordinary Least Squares

We won't go into the mechanics of how Ordinary Least Squares (OLS) works, but effectively it is a direct solution to the optimization problem for Simple Linear Regression. Instead of stepping toward the answer again and again, there is an equation for the optimal value of each parameter, which just needs to be calculated.
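For the curious, here is a small sketch of what "an equation for each parameter" can look like for a line y = slope * x + intercept. It uses the standard textbook formulas for simple linear regression; the function name `ols_fit` and the data points are made up for illustration, and you don't need to memorize any of this for the course.

```python
def ols_fit(xs, ys):
    # Standard closed-form formulas for simple linear regression.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
slope, intercept = ols_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)  # prints 2.0 1.0
```

Notice there is no loop that repeats until a threshold and no learning rate: the answer is calculated in one pass.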

At any rate, within this course, we'll be using a library to deal with all of this stuff, so don't worry about the actual working of OLS.

Now that we've understood Gradient Descent and OLS, we'll go to work on our first implementation of Linear Regression.



Articles: Analytics Vidhya (this article gets a bit more technical; a Calculus background is recommended)


Videos: 3Blue1Brown



Copyright © 2021 Code 4 Tomorrow. All rights reserved. The code in this course is licensed under the MIT License. If you would like to use content from any of our courses, you must obtain our explicit written permission and provide credit. Please contact classes@code4tomorrow.org for inquiries.