Master Classic Optimization Algorithms: Direct vs Iterative Methods Explained
This article introduces classic optimization algorithms, distinguishing direct methods that require convexity and closed‑form solutions from iterative first‑ and second‑order methods, and explains their applicability, underlying theory, and key references for solving smooth unconstrained problems.
Scene Description
Researchers have proposed various algorithms for different optimization problems, leading to the field of convex optimization. Among many algorithms, several classic ones are worth remembering; knowing their applicable scenarios helps generate solution ideas for new problems.
Problem Description
An unconstrained optimization problem is presented where the objective function L(·) is smooth. What optimization algorithms can solve it and what are their applicable scenarios?
Prerequisite Knowledge
Calculus, linear algebra, basic concepts of convex optimization.
Answer and Analysis
Classic optimization algorithms can be divided into two categories: direct methods and iterative methods.
Direct Methods
Direct methods provide the optimal solution directly but require two conditions: (1) L(·) is a convex function; (2) the problem admits a closed-form solution. Convexity means that for any x, y and any λ∈[0,1], L(λx + (1−λ)y) ≤ λL(x) + (1−λ)L(y); geometrically, the line segment between (x, L(x)) and (y, L(y)) lies on or above the graph of the function.
If L is convex and differentiable, a necessary and sufficient condition for θ* to be optimal is that the gradient vanishes at θ*, i.e., ∇L(θ*) = 0. In addition, this equation must admit a closed-form solution. Ridge regression satisfies both conditions: its objective L(θ) = ‖Xθ − y‖² + λ‖θ‖² is convex, and setting the gradient to zero yields θ* = (XᵀX + λI)⁻¹Xᵀy.
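As a minimal sketch of a direct method, the ridge regression optimum can be computed from its closed-form solution (the data and λ below are purely illustrative):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Direct method: ridge regression admits the closed-form
    solution theta* = (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)

# Tiny illustrative problem
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
theta = ridge_closed_form(X, y, lam=0.1)
```

At the returned θ the gradient of the ridge objective is zero, which is exactly the optimality condition above.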
However, these requirements limit the applicability of direct methods, so iterative methods are often used.
Iterative Methods
Iterative methods repeatedly update the estimate of the optimum: θ_{t+1} = θ_t + δ_t. They are classified into first-order and second-order methods.
First‑order Methods
First-order methods approximate L(θ_t + δ) by its first-order Taylor expansion, L(θ_t + δ) ≈ L(θ_t) + ∇L(θ_t)ᵀδ, which leads to the update δ_t = −α∇L(θ_t) for a step size α > 0. The approximation is accurate only when δ is small, which is why the step size must be kept modest.
First-order methods are also called gradient descent; they use only gradient information.
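The update rule above can be sketched as follows (the objective, step size, and iteration count are illustrative choices, not prescriptions):

```python
import numpy as np

def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """First-order method: theta_{t+1} = theta_t - lr * grad(theta_t)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

# Minimize L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3);
# the minimizer is theta = 3.
theta_star = gradient_descent(lambda t: 2 * (t - 3.0), theta0=[0.0])
```

Because each step uses only the gradient, the cost per iteration is low, but convergence can be slow on badly conditioned problems.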
Second‑order Methods
Second-order methods use a second-order Taylor expansion, L(θ_t + δ) ≈ L(θ_t) + ∇L(θ_t)ᵀδ + ½ δᵀH_t δ, where H_t is the Hessian matrix of L at θ_t. Minimizing this quadratic model over δ yields the Newton update δ_t = −H_t⁻¹∇L(θ_t).
Second‑order methods converge faster than first‑order methods but require computing and inverting the Hessian, which is costly in high dimensions and may converge to saddle points for non‑convex functions.
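A minimal sketch of the Newton update, using an illustrative quadratic objective (on a quadratic with constant Hessian, Newton's method reaches the optimum in a single step):

```python
import numpy as np

def newton(grad, hess, theta0, steps=20):
    """Second-order method: theta_{t+1} = theta_t - H^{-1} grad(theta_t)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        # Solve H * delta = grad instead of inverting the Hessian.
        theta = theta - np.linalg.solve(hess(theta), grad(theta))
    return theta

# Quadratic L(theta) = 0.5 * theta^T A theta - b^T theta,
# with gradient A theta - b and constant Hessian A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
theta_star = newton(lambda t: A @ t - b, lambda t: A, [0.0, 0.0])
```

Note that each step requires solving a linear system in the Hessian, which is the cost that makes second-order methods expensive in high dimensions.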
Further Reading
Yurii Nesterov introduced an accelerated first-order method in 1983, achieving the optimal convergence rate O(1/k²) for smooth convex problems. Broyden, Fletcher, Goldfarb, and Shanno independently proposed the quasi-Newton method now known as BFGS in 1970, which approximates the Hessian from gradient information; it was later extended to the limited-memory L-BFGS algorithm in 1989.
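In practice, L-BFGS is available in standard libraries; as an illustration, SciPy's optimizer applied to the Rosenbrock function (a common test problem chosen here only for demonstration):

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function, minimized at (1, 1)
def rosen(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

# Its gradient, supplied so the optimizer need not approximate it
def rosen_grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])

result = minimize(rosen, x0=[-1.0, 1.0], jac=rosen_grad, method="L-BFGS-B")
```

L-BFGS stores only a small history of gradient differences rather than a full Hessian approximation, which is what makes it practical for large-scale problems.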
References
[1] Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] Nesterov, Yurii. “A method of solving a convex programming problem with convergence rate O(1/k²).” Soviet Mathematics Doklady, vol. 27, no. 2, 1983.
[3] Broyden, Charles G. “The convergence of a class of double‑rank minimization algorithms: 2. The new algorithm.” IMA Journal of Applied Mathematics, vol. 6, no. 3, 1970, pp. 222‑231.
[4] Fletcher, Roger. “A new approach to variable metric algorithms.” The Computer Journal, vol. 13, no. 3, 1970, pp. 317‑322.
[5] Goldfarb, Donald. “A family of variable‑metric methods derived by variational means.” Mathematics of Computation, vol. 24, no. 109, 1970, pp. 23‑26.
[6] Shanno, David F. “Conditioning of quasi‑Newton methods for function minimization.” Mathematics of Computation, vol. 24, no. 111, 1970, pp. 647‑656.
[7] Liu, Dong C., and Jorge Nocedal. “On the limited memory BFGS method for large scale optimization.” Mathematical Programming, vol. 45, no. 1, 1989, pp. 503‑528.