
How to Solve Multiple Linear Regression with sklearn and statsmodels in Python

This article demonstrates how to perform multiple linear regression using sklearn's LinearRegression and the statsmodels library in Python, covering both formula‑based and array‑based approaches, complete with example data, code snippets, and model evaluation details.

Model Perspective

Using LinearRegression from sklearn.linear_model

The LinearRegression class from sklearn.linear_model can solve multiple linear regression problems, but its built‑in model evaluation offers only a single metric: the score method returns the coefficient of determination R². Further statistical tests, such as the F‑test of the overall regression or t‑tests of individual coefficients, must be programmed by the user. The call format is LinearRegression().fit(X, y), where X is the matrix of independent variables (without a column of ones, since the intercept is fitted by default) and y is the vector of observations of the dependent variable.

Example

Problem

The heat released as cement sets (y) depends mainly on the content of two chemical components (x1 and x2). Given a set of observations, fit a linear regression model of y on x1 and x2.

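Data table (omitted for brevity). In the code examples, the data are read from data/cement.txt, whose three columns hold x1, x2 and y.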

Computation

Code
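A minimal sketch of the sklearn computation, assuming the data file uses the same layout as in the statsmodels examples (columns x1, x2 and y in data/cement.txt). The helper name fit_cement is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_cement(a):
    """Fit y on x1, x2 for an (n, 3) array whose columns are x1, x2, y.

    Hypothetical helper; assumes the column layout used in the statsmodels examples.
    """
    X = a[:, :2]          # independent variables, no column of ones
    y = a[:, 2]           # dependent variable
    md = LinearRegression().fit(X, y)
    r2 = md.score(X, y)   # the only built-in metric: coefficient of determination
    return md.intercept_, md.coef_, r2


# Usage with the article's data file:
# b0, (b1, b2), r2 = fit_cement(np.loadtxt("data/cement.txt"))
# print(f"y = {b0:.4f} + {b1:.4f}*x1 + {b2:.4f}*x2, R^2 = {r2:.4f}")
```

The intercept and slopes are read from the intercept_ and coef_ attributes, which is why X must not contain a column of ones here.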

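The obtained regression model has the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$, where the coefficients are the values estimated from the data.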

The model's coefficient of determination R², returned by the score method, indicates a good fit.
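R² compares the residual sum of squares with the total variation of y, $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. The sketch below computes it directly with NumPy and checks it against LinearRegression.score; the helper name r_squared and the synthetic data are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Synthetic example (not the cement data): noisy linear relationship
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 0.5 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=50)

md = LinearRegression().fit(X, y)
manual = r_squared(y, md.predict(X))
print(manual, md.score(X, y))   # the two values agree
```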

Using statsmodels library

The statsmodels library can solve regression models in two ways: formula‑based and array‑based.

Formula‑based call format:

<code>import statsmodels.api as sm
sm.formula.ols(formula, data=df).fit()</code>

where formula is a string such as "y ~ x1 + x2" and df is a DataFrame or dictionary containing the data.

Array‑based call format:

<code>import statsmodels.api as sm
sm.OLS(y, X).fit()</code>

where y is the dependent vector and X is the matrix of independent variables with a column of ones added (for example with sm.add_constant), so that the model includes an intercept.

Code

Formula‑based example:

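<code>import numpy as np
import statsmodels.api as sm

# Columns of cement.txt: x1, x2, y
a = np.loadtxt("data/cement.txt")
d = {'x1': a[:,0], 'x2': a[:,1], 'y': a[:,2]}
# Formula-based fit; the intercept is included automatically
md = sm.formula.ols('y~x1+x2', data=d).fit()
print(md.summary())
# Predicted values for the observed x1, x2
ypred = md.predict({'x1': a[:,0], 'x2': a[:,1]})</code>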

Array‑based example:

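<code>import numpy as np
import statsmodels.api as sm

# Columns of cement.txt: x1, x2, y
a = np.loadtxt("data/cement.txt")
# Prepend a column of ones so the model has an intercept
X = sm.add_constant(a[:,:2])
md = sm.OLS(a[:,2], X).fit()
print(md.params)      # intercept and slope estimates
print(md.summary2())  # alternative summary layout</code>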


Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
