Introduction to Lasso Regression with scikit-learn
This article provides a comprehensive guide to Lasso regression, covering its theoretical background, scikit-learn API parameters, step‑by‑step Python implementation, cross‑validation for hyper‑parameter tuning, visualization of predictions, and a discussion of its advantages over ridge regression.
The article introduces Lasso regression as an L1‑regularized linear model, explains its role in feature selection and over‑fitting mitigation, and contrasts it with ridge regression (L2 regularization).
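To make the L1-vs-L2 contrast concrete, here is a minimal sketch on synthetic data (not the article's dataset): with the same regularization strength, Lasso drives the coefficients of uninformative features exactly to zero, while Ridge only shrinks them toward zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only the first 3 are truly informative
rng = np.random.RandomState(0)
X = rng.randn(100, 10)
true_coef = np.zeros(10)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + 0.1 * rng.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso zeroes out noise features; Ridge keeps them small but non-zero
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
print(n_zero_lasso, n_zero_ridge)
```

This exact-zero behavior is what makes Lasso usable as an embedded feature-selection method.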
It then presents the key scikit-learn Lasso class parameters in a concise table: alpha (regularization strength), max_iter (maximum number of iterations), and warm_start (reuse the previous solution as the starting point for the next fit).
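The three parameters can be seen together in a short sketch (synthetic data, not the article's CSV): warm_start=True lets each refit start from the previous solution, which is a common pattern when sweeping alpha from large to small.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = 2.0 * X[:, 0] + 0.1 * rng.randn(50)

# warm_start=True reuses the previous coefficients as initialization,
# so successive fits along a decreasing alpha path converge faster
model = Lasso(alpha=1.0, max_iter=10000, warm_start=True)
for alpha in [1.0, 0.1, 0.01]:
    model.set_params(alpha=alpha)
    model.fit(X, y)
    print(alpha, int(np.sum(model.coef_ != 0)))
```

Larger alpha produces a sparser model; as alpha shrinks, more coefficients become non-zero.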
Code example 1: Importing libraries

```python
# coding=utf-8
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# train_test_split now lives in sklearn.model_selection;
# the old sklearn.cross_validation module has been removed
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso, LassoCV
```
Code example 2: Loading the data and splitting it into train and test sets

```python
data = pd.read_csv('D:\\hua.cao\\python\\20180606\\Folds5x2_pp.csv')
X = data[['AT', 'V', 'AP', 'RH']]
Y = data['PE']  # 1-D target avoids a column-vector warning from fit()
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, random_state=1)
```
Code example 3: Training a Lasso model

```python
lasso = Lasso(alpha=0.01)
lasso.fit(X_TRAIN, Y_TRAIN)
```
Code example 4: Making predictions

```python
Y_PRED = lasso.predict(X_TEST)
```
Code example 5: Inspecting the fitted model

```python
print(lasso.coef_)
print(lasso.intercept_)
```
From the printed coefficients and intercept, the fitted regression equation is PE = -1.9687·AT - 0.2393·V + 0.0566·AP - 0.1586·RH + 460.35.
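An equation string like the one above can be assembled directly from the fitted coef_ and intercept_ attributes. The sketch below uses synthetic stand-in data (the article's CSV is not bundled here), reusing the article's feature names AT, V, AP, RH for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in with coefficients roughly like the article's result
rng = np.random.RandomState(1)
feature_names = ['AT', 'V', 'AP', 'RH']
X = rng.randn(200, 4)
y = X @ np.array([-1.97, -0.24, 0.06, -0.16]) + 460.0 + 0.1 * rng.randn(200)

lasso = Lasso(alpha=0.01).fit(X, y)

# Build "PE = c1*AT c2*V ... + intercept" from the fitted attributes
terms = ' '.join(f'{c:+.4f}*{n}' for c, n in zip(lasso.coef_, feature_names))
equation = f'PE = {terms} {lasso.intercept_:+.2f}'
print(equation)
```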
Code example 6: Cross-validation with LassoCV

```python
X1 = data[['AT', 'V', 'AP', 'RH']]
Y1 = data['PE']
lassocv = LassoCV()
lassocv.fit(X1, Y1)
print(lassocv.alpha_)              # optimal regularization strength
print(lassocv.coef_)
print(np.sum(lassocv.coef_ != 0))  # number of non-zero coefficients
```
This block automatically selects the optimal alpha and reports the number of non‑zero features.
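LassoCV also accepts an explicit alpha grid and fold count rather than the defaults. A minimal sketch on synthetic data (assumed values, not from the article):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.randn(100, 8)
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(100)

# 50 candidate alphas from 1e-3 to 10, scored with 5-fold cross-validation
lassocv = LassoCV(alphas=np.logspace(-3, 1, 50), cv=5).fit(X, y)
print(lassocv.alpha_)              # alpha that minimized mean CV error
print(np.sum(lassocv.coef_ != 0))  # features surviving at that alpha
```

Tightening the grid around the selected alpha_ is a cheap way to refine the choice.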
Code example 7: Visualizing predictions

```python
fig, ax = plt.subplots()
ax.scatter(Y_TEST, Y_PRED)
ax.plot([Y_TEST.min(), Y_TEST.max()], [Y_TEST.min(), Y_TEST.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
```
A scatter plot with a diagonal reference line illustrates how closely the predicted values match the measured ones.
The article concludes by summarizing that Lasso regression shrinks some coefficients to zero, enhancing model generalization and reducing computational cost compared to ridge regression, and it references further reading on norm regularization and compressed sensing.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.