
Mastering Gradient Descent: A Practical Guide to Numerical Optimization

This article explains why algebraic methods often fail to locate critical points in real‑world problems and introduces the steepest descent (gradient descent) algorithm as a simple iterative numerical technique, detailing its initialization, update rules, step‑size considerations, and a concrete example.


Numerical Solutions to Optimization Problems

In most practical problems, algebraic methods cannot find critical points. For example, consider the following function:

The critical point equation is:

This equation cannot be solved with simple algebra, so we turn to numerical optimization methods.
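The article's original formula was not preserved in this copy. As a stand-in illustration (my example, not the author's), a function such as the following exhibits the same difficulty:

$$
f(x) = e^x + x^2, \qquad f'(x) = e^x + 2x = 0
$$

The critical-point equation $e^x + 2x = 0$ mixes an exponential with a polynomial term, so it has no closed-form algebraic solution and must be solved numerically.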

In this article we introduce a common iterative method for approximating solutions: the gradient descent (steepest descent) algorithm. The basic idea is to start from an initial guess and repeatedly apply an update rule that moves the estimate closer to a critical point.

Gradient Descent

The simplest numerical method for finding a local minimum is the steepest descent method, also called gradient descent.

This algorithm proceeds as follows:

First, choose an initial point. If the derivative at that point is zero, we are already at a critical point. Otherwise, the sign of the derivative tells us in which direction the function increases or decreases.

If the derivative is negative, moving to the right of the point will decrease the function; if the derivative is positive, moving to the left will decrease the function.
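Both cases are captured by the standard steepest-descent update, written here in its usual one-dimensional form, where $\alpha > 0$ is the step size:

$$
x_{n+1} = x_n - \alpha \, f'(x_n), \qquad \alpha > 0
$$

When $f'(x_n) > 0$ the update moves left, and when $f'(x_n) < 0$ it moves right, exactly as described above.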

By repeatedly moving according to these rules, the function value decreases until a minimum is reached (though if the function is monotonically decreasing, a minimum may never be reached). The iterative process is illustrated below:

Note that the step size must be sufficiently small, otherwise we may “skip” the location of a local minimum.
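The steps above can be sketched as a short loop. This is a minimal one-dimensional sketch, not the article's own code; the quadratic `f(x) = (x - 3)**2` is a stand-in test function, and the stopping tolerance and iteration cap are my assumptions.

```python
def gradient_descent(df, x0, step=0.01, tol=1e-8, max_iter=100_000):
    """Iterate x <- x - step * df(x) until the derivative is near zero."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:      # derivative ~ 0: we are at an approximate critical point
            break
        x -= step * g         # move against the sign of the derivative
    return x

# f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3) and its minimum at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))      # approaches 3.0
```

Note how a smaller `step` makes each move safer but requires more iterations; a `step` that is too large can overshoot the minimum, as the text warns.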

Example

Consider the following equation:

We cannot obtain a simple formula for the critical point, so we need to use steepest descent. The derivative of the function is:

Thus, the update rule is:
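The example's formula was not preserved in this copy. As a hedged sketch, the snippet below applies the generic update rule $x \leftarrow x - \alpha f'(x)$ to a stand-in transcendental function, $f(x) = e^x + x^2$ with $f'(x) = e^x + 2x$ (my choice, not the author's); the initial guess and step size are likewise assumptions.

```python
import math

def df(x):
    return math.exp(x) + 2 * x   # f'(x) for the stand-in f(x) = e^x + x^2

x = 0.0                          # initial guess
alpha = 0.1                      # step size
for _ in range(1000):
    x = x - alpha * df(x)        # update rule: x <- x - alpha * f'(x)

print(f"approximate critical point: {x:.4f}")
```

The iterates settle near x ≈ -0.3517, where the derivative e^x + 2x is essentially zero even though no algebraic formula for this point exists.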

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
