
LASSO Regression Explained: Theory, Case Studies, and Python Code

This article introduces the mathematical foundations of ordinary least squares, ridge, and LASSO regression, explains why LASSO requires coordinate descent, presents two real-world case studies with data, and provides complete Python code for fitting, visualizing, and interpreting LASSO models.


Mathematical Principles Overview

Ordinary Least Squares (OLS) in multiple regression fits the parameter vector that minimizes the residual sum of squares. Ridge regression adds an L2 penalty to the loss function, which stabilizes the solution when the design matrix is ill-conditioned or not invertible (e.g. under multicollinearity). LASSO regression instead adds an L1 penalty, which encourages sparsity: some coefficients are driven exactly to zero.
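In symbols, for response y and design matrix X, the three objectives can be written as follows (a standard formulation with penalty weight λ ≥ 0, not notation taken verbatim from the source):

```latex
\hat{\beta}_{\mathrm{OLS}}   = \arg\min_{\beta}\;\|y - X\beta\|_2^2,
\qquad
\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta}\;\|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2,
\qquad
\hat{\beta}_{\mathrm{LASSO}} = \arg\min_{\beta}\;\|y - X\beta\|_2^2 + \lambda\|\beta\|_1 .
```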

The L1 penalty is the sum of the absolute values of the regression coefficients, which makes the loss function nondifferentiable wherever a coefficient equals zero. Consequently, the closed-form OLS solution does not apply, and purely gradient-based methods (gradient descent, Newton, quasi-Newton) cannot be used directly. Coordinate descent, which minimizes the objective one coefficient at a time along the coordinate axes, solves LASSO efficiently because each one-dimensional subproblem has a closed-form solution (soft thresholding).
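A minimal sketch of cyclic coordinate descent with the soft-thresholding update, using the same 1/(2m) loss scaling as scikit-learn's Lasso; the names `lasso_cd` and `soft_threshold` are illustrative, not from the source:

```python
import numpy as np

def soft_threshold(rho, lam):
    # Soft-thresholding operator: closed-form minimizer of the
    # one-dimensional LASSO subproblem.
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min_b (1/2m)*||y - Xb||^2 + lam*||b||_1."""
    m, n = X.shape
    b = np.zeros(n)
    for _ in range(n_iter):
        for j in range(n):
            # Partial residual with feature j's contribution removed.
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / m
            z = (X[:, j] ** 2).sum() / m
            b[j] = soft_threshold(rho, lam) / z
    return b

# Small demonstration on synthetic data with true coefficients [1.5, 0, -2].
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + 0.01 * rng.standard_normal(100)
print(lasso_cd(X, y, lam=0.1))
```

The estimates are shrunk toward zero relative to the true coefficients, as expected for the penalized objective.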

Before estimating LASSO coefficients, the penalty parameter must be selected. This can be done qualitatively by plotting the regularization path (analogous to a ridge trace plot) or quantitatively by cross-validation.
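The cross-validation route can be sketched with scikit-learn's LassoCV on synthetic data (the data and grid here are illustrative, not from the case studies below):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data: only the first and last of four predictors matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 4))
y = X @ np.array([2.0, 0.0, 0.0, -1.0]) + 0.1 * rng.standard_normal(80)

# 5-fold cross-validation over a log-spaced grid of candidate penalties.
model = LassoCV(alphas=np.logspace(-4, 0, 100), cv=5).fit(X, y)
print("selected alpha:", model.alpha_)
print("coefficients:", model.coef_)
```

The selected alpha minimizes the average held-out mean squared error across folds; the irrelevant middle coefficients come out near zero.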

Case 1

The dataset (Malinvaud, 1966) contains French economic data: total imports (the dependent variable) and three explanatory variables—total domestic output, stock formation, and total consumption (all in 10⁹ francs).

The task is to obtain the LASSO regression equation for this dataset.

The resulting plot traces the regularization path: as the penalty grows, the LASSO coefficients shrink and are successively driven to zero, which guides the choice of penalty. The standardized LASSO regression equation is then back‑transformed to the original scale, and the model's goodness‑of‑fit is reported, demonstrating how conveniently LASSO performs variable selection.
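The back-transformation from standardized coefficients c to the original scale (slopes bᵢ = s_y·cᵢ/s_xᵢ, intercept b₀ = μ_y − Σ bᵢ·μ_xᵢ) can be checked numerically. A sketch on synthetic data with illustrative variable names (note the s_y/s_x ratio is the same whether the standard deviation uses ddof=0 or ddof=1, so it matches either convention):

```python
import numpy as np
from sklearn.linear_model import Lasso
from scipy.stats import zscore

# Synthetic data: last column is the response, built from the first two.
rng = np.random.default_rng(2)
a = np.empty((50, 3))
a[:, :2] = rng.standard_normal((50, 2))
a[:, 2] = 3 * a[:, 0] - 2 * a[:, 1] + 5 + 0.1 * rng.standard_normal(50)

aa = zscore(a)                       # standardize every column (ddof=0)
md = Lasso(alpha=0.01).fit(aa[:, :2], aa[:, 2])
cs = md.coef_

mu = a.mean(axis=0)
s = a.std(axis=0)                    # same ddof as zscore used above
slopes = s[-1] * cs / s[:-1]         # original-scale slopes
intercept = mu[-1] - slopes @ mu[:-1]

# Predictions of the back-transformed equation equal the un-standardized
# predictions of the standardized model.
pred = intercept + a[:, :2] @ slopes
pred_std = mu[-1] + s[-1] * md.predict(aa[:, :2])
print(np.allclose(pred, pred_std))
```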

<code>import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, LassoCV
from scipy.stats import zscore

plt.rc('font', size=16)
plt.rc('text', usetex=True)  # comment out if LaTeX is not installed

a = np.loadtxt("data/economic.txt")
n = a.shape[1] - 1  # number of predictors
aa = zscore(a)        # standardize data
x = aa[:, :n]
y = aa[:, n]

b = []
kk = np.logspace(-4, 0, 100)  # range of alpha values
for k in kk:
    md = Lasso(alpha=k).fit(x, y)
    b.append(md.coef_)

st = ['s-r', '*-k', 'p-b']
for i in range(n):
    plt.plot(kk, np.array(b)[:, i], st[i])
plt.xscale('log')  # alphas span several orders of magnitude
plt.xlabel(r'$\alpha$')
plt.legend(['$x_1$', '$x_2$', '$x_3$'], fontsize=15)
plt.show()

mdcv = LassoCV(alphas=np.logspace(-4, 0, 100)).fit(x, y)
print("Optimal alpha=", mdcv.alpha_)
md0 = Lasso(alpha=0.21).fit(x, y)  # alpha read off the path plot; mdcv.alpha_ is the CV choice
cs0 = md0.coef_
print("Standardized coefficients:", cs0)

mu = np.mean(a, axis=0)
s = np.std(a, axis=0, ddof=1)
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original scale coefficients:", params)
print("R^2:", md0.score(x, y))
</code>

Case 2

When modeling Chinese private car ownership, four factors are considered: per‑capita disposable income of urban households, total urban population, annual automobile production, and total highway length. The dependent variable is private car ownership (in tens of thousands of vehicles).

Fitting ordinary least squares with all four explanatory variables yields a regression equation in which some coefficients are statistically insignificant or carry implausible negative signs, making the model hard to interpret. LASSO regression is therefore applied to obtain a more parsimonious model.
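LASSO's tendency to zero out redundant predictors can be seen on synthetic collinear data (a sketch under assumed data, not the car dataset itself):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Two nearly identical predictors: a classic multicollinearity setup.
rng = np.random.default_rng(3)
x1 = rng.standard_normal(30)
x2 = x1 + 0.01 * rng.standard_normal(30)   # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 2 * x1 + 0.05 * rng.standard_normal(30)

ols = LinearRegression().fit(X, y)
print("OLS coefficients:  ", ols.coef_)    # unstable; signs may look wrong

lasso = Lasso(alpha=0.1).fit(X, y)
print("LASSO coefficients:", lasso.coef_)  # weight concentrated on one predictor
```

OLS splits the effect between the two correlated columns in an unstable way, while the L1 penalty concentrates the weight and suppresses the redundant predictor.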

<code>import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import Lasso
from scipy.stats import zscore

# plt.rc('text', usetex=True)  # comment if LaTeX not installed

a = np.loadtxt("data/car.txt")  # 9 rows, 5 columns
n = a.shape[1] - 1
x = a[:, :n]
X = sm.add_constant(x)
md = sm.OLS(a[:, n], X).fit()
print(md.summary())

aa = zscore(a)
x = aa[:, :n]
y = aa[:, n]

b = []
kk = np.logspace(-4, 0, 100)
for k in kk:
    md = Lasso(alpha=k).fit(x, y)
    b.append(md.coef_)

st = ['s-r', '*-k', 'p-b', '^-y']
for i in range(n):
    plt.plot(kk, np.array(b)[:, i], st[i])
plt.xscale('log')  # alphas span several orders of magnitude
plt.xlabel(r'$\alpha$')
plt.legend(['$x_1$', '$x_2$', '$x_3$', '$x_4$'], fontsize=15)
plt.show()

md0 = Lasso(alpha=0.05).fit(x, y)  # alpha chosen by inspecting the path plot
cs0 = md0.coef_
print("Standardized coefficients:", cs0)

mu = a.mean(axis=0)
s = a.std(axis=0, ddof=1)
params = [mu[-1] - s[-1] * sum(cs0 * mu[:-1] / s[:-1]), s[-1] * cs0 / s[:-1]]
print("Original scale coefficients:", params)
print("R^2:", md0.score(x, y))
</code>

References

司守奎, 孙玺菁. Python Mathematical Experiments and Modeling (Python数学实验与建模).

Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
