
10 Essential Plots for Linear Regression with Python Code Examples

This tutorial explains ten crucial visualizations for linear regression—scatter plot, trend line, residual plot, normal probability plot, learning curve, bias‑variance tradeoff, residuals vs fitted, partial regression, leverage, and Cook's distance—each illustrated with clear Python code using scikit‑learn, matplotlib, seaborn, and statsmodels.


This article introduces ten chart types that are indispensable when learning or applying linear regression, showing how each plot helps diagnose model performance, data distribution, and the assumptions behind the model.

Scatter Plot

A scatter plot visualizes the relationship between two variables, helping to assess whether a linear model is appropriate.

from sklearn.datasets import load_diabetes

# Load the diabetes dataset
diabetes = load_diabetes()
X = diabetes.data[:, 2]  # BMI (the third feature) as the independent variable
y = diabetes.target

def simple_linear_regression(X, y):
    """Closed-form ordinary least squares for a single predictor."""
    X_mean = sum(X) / len(X)
    y_mean = sum(y) / len(y)
    # slope = covariance of X and y divided by variance of X
    numerator = sum((X - X_mean) * (y - y_mean))
    denominator = sum((X - X_mean) ** 2)
    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept

slope, intercept = simple_linear_regression(X, y)
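As a quick sanity check, the hand-rolled estimates can be compared with numpy's least-squares fit (a minimal sketch; np.polyfit returns coefficients from highest degree down):

import numpy as np

# Compare the closed-form estimates with numpy's degree-1 fit
np_slope, np_intercept = np.polyfit(X, y, deg=1)
print(f"manual:  slope={slope:.2f}, intercept={intercept:.2f}")
print(f"polyfit: slope={np_slope:.2f}, intercept={np_intercept:.2f}")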

Plot the scatter and the fitted regression line:

import matplotlib.pyplot as plt

plt.scatter(X, y, color='blue', label='Data points')
plt.plot(X, slope * X + intercept, color='red', label='Regression line')
plt.xlabel('BMI (standardized)')
plt.ylabel('Disease progression')
plt.title('Scatter plot with regression line')
plt.legend()
plt.show()

Linear Trend Line Plot

This plot adds a trend line to a scatter plot, making the overall linear relationship clearer, especially in noisy data. Seaborn's regplot fits the line and shades its 95% confidence band by default.

import seaborn as sns
import matplotlib.pyplot as plt

sns.regplot(x=X, y=y, color='red', scatter_kws={'color': 'blue', 's': 10})
plt.xlabel('BMI (standardized)')
plt.ylabel('Disease progression')
plt.title('Linear trend line plot')
plt.show()
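regplot can also fit a locally weighted (LOWESS) curve instead of a straight line, which helps reveal curvature a linear fit would hide. A minimal variation (lowess=True requires statsmodels to be installed, which later sections use anyway):

import seaborn as sns
import matplotlib.pyplot as plt

# LOWESS smoother instead of a straight line; curvature here hints
# at departures from linearity
sns.regplot(x=X, y=y, lowess=True, color='red', scatter_kws={'color': 'blue', 's': 10})
plt.title('LOWESS trend line')
plt.show()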

Residual Plot

A residual plot shows the differences between observed and predicted values, helping to detect non‑random patterns that indicate model issues.

# Residuals are the observed values minus the model's predictions
y_pred = slope * X + intercept
residuals = y - y_pred

plt.scatter(X, residuals, color='blue')
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('BMI (standardized)')
plt.ylabel('Residuals')
plt.title('Residual plot')
plt.show()
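Beyond eyeballing the scatter, a Breusch-Pagan test gives a quantitative check for non-constant residual variance. A minimal sketch via statsmodels, reusing the residuals computed above:

import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Breusch-Pagan test: a small p-value suggests heteroscedastic residuals
exog = sm.add_constant(X)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(residuals, exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")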

Normal Probability Plot

This plot checks whether residuals follow a normal distribution, a key assumption for linear regression inference.

import scipy.stats as stats
import matplotlib.pyplot as plt

stats.probplot(residuals, dist='norm', plot=plt)
plt.xlabel('Theoretical quantiles')
plt.ylabel('Ordered residuals')
plt.title('Normal probability plot')
plt.show()
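The visual check pairs well with a formal normality test. A minimal sketch using SciPy's Shapiro-Wilk test on the same residuals:

from scipy.stats import shapiro

# Shapiro-Wilk: a small p-value (e.g. < 0.05) suggests the residuals
# deviate from normality
stat, p_value = shapiro(residuals)
print(f"Shapiro-Wilk statistic={stat:.3f}, p-value={p_value:.3f}")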

Learning Curve

The learning curve displays training and cross‑validation scores as the number of training examples grows, revealing over‑ or under‑fitting.

from sklearn.model_selection import learning_curve
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

model = LinearRegression()
train_sizes, train_scores, valid_scores = learning_curve(
    model, X[:, np.newaxis], y, train_sizes=[50,100,200,300], cv=5)
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
valid_mean = np.mean(valid_scores, axis=1)
valid_std = np.std(valid_scores, axis=1)

plt.fill_between(train_sizes, train_mean-train_std, train_mean+train_std, alpha=0.1, color='r')
plt.fill_between(train_sizes, valid_mean-valid_std, valid_mean+valid_std, alpha=0.1, color='g')
plt.plot(train_sizes, train_mean, 'o-', color='r', label='Training score')
plt.plot(train_sizes, valid_mean, 'o-', color='g', label='Cross-validation score')
plt.xlabel('Training examples')
plt.ylabel('Score')
plt.title('Learning curve')
plt.legend(loc='best')
plt.show()
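Recent scikit-learn versions (1.2 or later, an assumption about your environment) ship a display helper that builds the same plot in one call:

from sklearn.model_selection import LearningCurveDisplay
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# One-call learning curve; defaults to R^2 scoring for regressors
LearningCurveDisplay.from_estimator(LinearRegression(), X[:, np.newaxis], y, cv=5)
plt.title('Learning curve (display helper)')
plt.show()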

Bias‑Variance Tradeoff Plot

This plot visualizes how model complexity affects bias and variance, guiding the choice of an appropriate model. Here complexity is varied by fitting polynomial features of increasing degree on top of the linear model.

from sklearn.model_selection import validation_curve
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
import numpy as np
import matplotlib.pyplot as plt

# Vary model complexity via the degree of the polynomial feature expansion
model = make_pipeline(PolynomialFeatures(), LinearRegression())
param_range = np.arange(1, 10)
train_scores, valid_scores = validation_curve(
    model, X[:, np.newaxis], y, param_name='polynomialfeatures__degree',
    param_range=param_range, cv=5, scoring='neg_mean_squared_error')
train_mean = -np.mean(train_scores, axis=1)  # negate back to MSE
train_std = np.std(train_scores, axis=1)
valid_mean = -np.mean(valid_scores, axis=1)
valid_std = np.std(valid_scores, axis=1)

plt.fill_between(param_range, train_mean-train_std, train_mean+train_std, alpha=0.1, color='r')
plt.fill_between(param_range, valid_mean-valid_std, valid_mean+valid_std, alpha=0.1, color='g')
plt.plot(param_range, train_mean, 'o-', color='r', label='Training error')
plt.plot(param_range, valid_mean, 'o-', color='g', label='Cross-validation error')
plt.xlabel('Polynomial degree (model complexity)')
plt.ylabel('Mean Squared Error')
plt.title('Bias-variance tradeoff plot')
plt.legend(loc='best')
plt.show()

Residuals vs Fitted Plot

This plot checks whether residuals are randomly distributed around zero across the range of fitted values.

from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

model = LinearRegression()
model.fit(X.reshape(-1,1), y)
y_pred = model.predict(X.reshape(-1,1))
residuals = y - y_pred

plt.scatter(y_pred, residuals, color='blue')
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted plot')
plt.show()
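A common variant standardizes the residuals, here via statsmodels' influence measures (an alternative to the scikit-learn fit above), so the usual +/-2 reference bands are meaningful:

import statsmodels.api as sm
import matplotlib.pyplot as plt

# Internally studentized residuals; points beyond +/-2 are unusual
sm_model = sm.OLS(y, sm.add_constant(X)).fit()
std_resid = sm_model.get_influence().resid_studentized_internal

plt.scatter(sm_model.fittedvalues, std_resid, color='blue')
plt.axhline(0, color='red', linestyle='--')
plt.axhline(2, color='gray', linestyle=':')
plt.axhline(-2, color='gray', linestyle=':')
plt.xlabel('Fitted values')
plt.ylabel('Standardized residuals')
plt.title('Standardized residuals vs Fitted')
plt.show()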

Partial Regression Plot

This plot, also called an added-variable plot, isolates the effect of a single predictor while controlling for the others, revealing its independent contribution. Because it only makes sense with multiple predictors, the example below uses all ten features of the diabetes dataset.

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Added-variable plot: regress both the target and the focus predictor
# on all remaining predictors, then plot the two sets of residuals
# against each other.
X_all = diabetes.data                  # all ten predictors
focus = 2                              # BMI column
others = sm.add_constant(np.delete(X_all, focus, axis=1))

res_y = sm.OLS(y, others).fit().resid                 # target, other effects removed
res_x = sm.OLS(X_all[:, focus], others).fit().resid   # focus predictor, ditto

plt.scatter(res_x, res_y, color='blue', s=10)
plt.xlabel('BMI residuals (given other predictors)')
plt.ylabel('Target residuals (given other predictors)')
plt.title('Partial Regression Plot')
plt.show()
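statsmodels can also build these added-variable plots directly from a fitted model. A sketch (exog_idx is passed explicitly here to skip the constant column):

import statsmodels.api as sm
import matplotlib.pyplot as plt

# Fit on all ten predictors, then draw one added-variable plot per feature
results = sm.OLS(y, sm.add_constant(diabetes.data)).fit()
sm.graphics.plot_partregress_grid(results, exog_idx=list(range(1, 11)))
plt.show()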

Leverage Plot

Leverage plots identify points that have a disproportionate influence on the fitted regression coefficients.

import statsmodels.api as sm
import statsmodels.graphics.regressionplots as rp
import matplotlib.pyplot as plt

# Fit with statsmodels so influence diagnostics are available
X_const = sm.add_constant(X)
model = sm.OLS(y, X_const).fit()

# Plots leverage against normalized residuals squared;
# the function sets its own axis labels
rp.plot_leverage_resid2(model)
plt.title('Leverage Plot')
plt.show()
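The hat (leverage) values behind this plot can also be inspected directly. A common rule of thumb (a convention, not a hard rule) flags points whose leverage exceeds 2p/n:

import numpy as np

# p = number of estimated parameters (constant + one slope), n = observations
leverage = model.get_influence().hat_matrix_diag
threshold = 2 * X_const.shape[1] / len(y)
print("High-leverage observations:", np.where(leverage > threshold)[0])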

Cook's Distance Plot

Cook's distance quantifies the influence of each observation on the overall regression fit, flagging potentially problematic points.

import statsmodels.api as sm
import matplotlib.pyplot as plt

X_const = sm.add_constant(X)
model = sm.OLS(y, X_const).fit()
influence = model.get_influence()
cook_dist = influence.cooks_distance[0]  # returns (distances, p-values); keep the distances

plt.stem(cook_dist, markerfmt=',')
plt.xlabel('Data points')
plt.ylabel("Cook's distance")
plt.title("Cook's Distance Plot")
plt.show()
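A widely used (though informal) cutoff flags observations with Cook's distance above 4/n for closer inspection:

import numpy as np

# 4/n is a convention, not a hard rule
n = len(cook_dist)
print("Potentially influential observations:", np.where(cook_dist > 4 / n)[0])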

Each of these visualizations provides a different perspective on model fit, assumptions, and data quality, enabling a comprehensive evaluation of linear regression models.

Tags: Python, statistics, model evaluation, Data Visualization, linear regression, Matplotlib, scikit-learn
Written by IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
