
Mastering Regression: Key Assumptions, Metrics, and Model Evaluation

This article explains the fundamental assumptions of linear regression, compares linear and nonlinear models, discusses multicollinearity, outliers, regularization, heteroscedasticity, VIF, stepwise regression, and reviews essential evaluation metrics such as MAE, MSE, RMSE, R² and Adjusted R².


Regression analysis provides a solid foundation for many machine learning algorithms. This article summarizes ten important regression problems and five key evaluation metrics.

Assumptions of Linear Regression

Linear regression has four assumptions:

Linearity: the relationship between independent variable (x) and dependent variable (y) should be linear.

Independence: features should be independent, minimizing multicollinearity.

Normality: residuals should follow a normal distribution.

Homoscedasticity: variance of data points around the regression line should be constant for all values.

Concept of Residuals

Residuals are the errors between predicted values and observed values, measuring the distance of data points from the regression line.

A residual plot is a good way to assess a regression model; random scatter without patterns indicates a suitable linear model.
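As a minimal sketch (assuming NumPy and synthetic data), residuals are just observed minus predicted values; plotting them against the predictions should show no visible pattern if a linear model is suitable:

```python
import numpy as np

# Toy data with a roughly linear trend plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)

# Fit a straight line and compute residuals = observed - predicted
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept
residuals = y - y_pred

# For least squares with an intercept, residuals average to ~0;
# scattering residuals against y_pred (e.g. with matplotlib) should
# show no pattern if the linear model is adequate.
print(float(residuals.mean()))
```

The residual mean being (numerically) zero is a property of least squares, not evidence of fit quality; the pattern in the residual scatter is what matters.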

Linear vs. Non‑Linear Regression Models

Both are regression models; they differ in the form of relationship they can capture.

Linear models assume a linear relationship between features and target, while non-linear models fit curves when no such linear relationship holds.

Two common ways to check linearity:

Residual plot

Scatter plot

Multicollinearity

Multicollinearity occurs when some features are highly correlated, making it difficult for the model to learn distinct patterns and degrading performance. It should be mitigated before training.
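A quick way to spot multicollinearity is a pairwise correlation matrix. A sketch (assuming NumPy; the synthetic features are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                      # independent feature

# Pairwise Pearson correlations; |r| close to 1 flags multicollinearity
corr = np.corrcoef(np.vstack([x1, x2, x3]))
print(round(float(corr[0, 1]), 3))  # high: x1 and x2 are collinear
print(round(float(corr[0, 2]), 3))  # near 0: x3 is independent
```

Correlation only catches pairwise collinearity; the VIF discussed below also detects a feature that is a combination of several others.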

Impact of Outliers

Outliers are data points far from the average range.

Outliers pull the best‑fit line toward them, increasing error rates and resulting in high MSE.
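The pull of a single outlier on the fitted line can be demonstrated directly (a sketch assuming NumPy; the data are synthetic):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0            # perfectly linear data
y_out = y.copy()
y_out[-1] += 50.0            # one large outlier

def fit_mse(x, y):
    # Fit a line by least squares and return the mean squared error
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    return float(np.mean((y - pred) ** 2))

print(fit_mse(x, y))      # ~0: clean data fits exactly
print(fit_mse(x, y_out))  # much larger: the outlier drags the line
```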

MSE and MAE

MSE (Mean Squared Error) measures the squared difference between actual and predicted values; MAE (Mean Absolute Error) measures the absolute difference. MSE penalizes large errors more heavily, while MAE is more robust to outliers.

L1 and L2 Regularization

When data are scarce, basic linear regression tends to overfit; L1 (Lasso) and L2 (Ridge) regularization help mitigate this.

L1 adds the absolute values of the coefficients as a penalty, which can drive some coefficients exactly to zero, effectively performing feature selection.

L2 adds the squared magnitude of coefficients as a penalty, shrinking large coefficients.

Both are useful when training data are limited, variance is high, features outnumber observations, or multicollinearity exists.
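L2 (ridge) regression has a closed-form solution, w = (XᵀX + λI)⁻¹Xᵀy, which makes the shrinkage effect easy to see. A minimal NumPy sketch (the data and penalty strength λ = 10 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 5
X = rng.normal(size=(n, p))
true_w = np.array([5.0, 0.0, 3.0, 0.0, 0.0])
y = X @ true_w + rng.normal(0, 0.1, size=n)

def ridge(X, y, lam):
    # Closed form: w = (X^T X + lam * I)^(-1) X^T y
    # The L2 penalty lam * ||w||^2 shrinks coefficient magnitudes.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

w_ols = ridge(X, y, 0.0)     # lam = 0 recovers ordinary least squares
w_reg = ridge(X, y, 10.0)    # ridge shrinks the coefficients
print(np.linalg.norm(w_reg) < np.linalg.norm(w_ols))  # True
```

L1 (lasso) has no closed form and is usually solved iteratively (e.g. coordinate descent); in practice both are available off the shelf, for instance as `Ridge` and `Lasso` in scikit-learn.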

Heteroscedasticity

Heteroscedasticity means the variance of data points around the best‑fit line varies across the range, leading to uneven residual dispersion and unreliable predictions. Plotting residuals helps detect it.

Large variance differences often arise from features with vastly different scales (e.g., a column ranging from 1 to 100 000).
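Heteroscedasticity shows up as residual spread that changes across the range of the predictor. A sketch with simulated noise whose scale grows with x (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 200)
# Noise whose standard deviation grows with x -> heteroscedastic data
y = 2.0 * x + rng.normal(0, 0.2 * x)

slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# Compare residual spread in the lower vs. upper half of x
lo = float(resid[:100].std())
hi = float(resid[100:].std())
print(lo < hi)  # True: widening dispersion is the signature of heteroscedasticity
```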

Variance Inflation Factor (VIF)

VIF quantifies how well a variable can be predicted from the other variables. A high VIF (commonly above 5 or 10) indicates strong multicollinearity; such variables are candidates for removal.
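Concretely, VIFᵢ = 1 / (1 − Rᵢ²), where Rᵢ² comes from regressing feature i on all the other features. A sketch implementing this directly in NumPy (in practice `variance_inflation_factor` from statsmodels does the same job):

```python
import numpy as np

def vif(X):
    """VIF_i = 1 / (1 - R^2_i), regressing column i on the other columns."""
    n, p = X.shape
    out = []
    for i in range(p):
        others = np.delete(X, i, axis=1)
        A = np.column_stack([others, np.ones(n)])  # include an intercept
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid = X[:, i] - A @ coef
        r2 = 1.0 - resid.var() / X[:, i].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)  # collinear with x1
x3 = rng.normal(size=100)             # independent
X = np.column_stack([x1, x2, x3])
print(vif(X).round(1))  # x1 and x2 get large VIFs; x3 stays near 1
```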

Stepwise Regression

Stepwise regression iteratively adds or removes predictors based on statistical significance, aiming to minimize error between observed and predicted values while efficiently handling high‑dimensional data.
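Forward stepwise selection, the simplest variant, can be sketched as a greedy search (NumPy assumed; this selects by residual sum of squares rather than p-values, which statistical packages typically use):

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward stepwise selection: repeatedly add the feature
    that most reduces the residual sum of squares."""
    n, p = X.shape
    chosen = []
    for _ in range(k):
        best, best_rss = None, np.inf
        for j in range(p):
            if j in chosen:
                continue
            cols = chosen + [j]
            A = np.column_stack([X[:, cols], np.ones(n)])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best, best_rss = j, rss
        chosen.append(best)
    return chosen

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))
y = 4.0 * X[:, 2] + 2.0 * X[:, 5] + rng.normal(0, 0.1, size=100)
print(forward_select(X, y, 2))  # picks the informative columns 2 and 5
```

Backward elimination works the same way in reverse, starting from all predictors and dropping the least useful one at each step.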

Evaluation Metrics

Using a regression example (predicting salary from work experience), the following metrics are introduced:

Mean Absolute Error (MAE)

MAE is the average absolute difference between actual and predicted values; lower values indicate better models.

Advantages: easy to interpret, same unit as output, relatively robust to outliers.

Disadvantages: the absolute value is not differentiable at zero, which complicates its use as a gradient-based loss function.

Mean Squared Error (MSE)

MSE squares the differences before averaging; it is differentiable everywhere, making it suitable as a loss function.

Advantages: differentiable, usable as loss.

Disadvantages: units are squared, harder to interpret; sensitive to outliers.

Root Mean Squared Error (RMSE)

RMSE is the square root of MSE, restoring the original unit while still being sensitive to outliers.

The choice among MAE, MSE, and RMSE depends on the problem context.
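The three metrics side by side on the salary example (a sketch assuming NumPy; the salary figures are hypothetical, in thousands):

```python
import numpy as np

# Hypothetical salaries (actual vs. predicted), in thousands
y_true = np.array([30.0, 45.0, 50.0, 60.0, 80.0])
y_pred = np.array([32.0, 44.0, 55.0, 58.0, 75.0])

err = y_true - y_pred
mae = float(np.mean(np.abs(err)))   # same unit as salary
mse = float(np.mean(err ** 2))      # squared units, penalizes large errors
rmse = float(np.sqrt(mse))          # back to salary units
print(mae, mse, round(rmse, 2))     # 3.0 11.8 3.44
```

Note how RMSE (3.44) exceeds MAE (3.0): squaring weights the two larger errors (5 thousand each) more heavily.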

R² Score

R² measures goodness of fit and is at most 1. An R² of 0 means the model performs no better than predicting the mean; 1 means a perfect fit; negative values indicate performance worse than predicting the mean.

R² can increase or stay constant as more features are added, even if they are irrelevant.

Adjusted R² Score

Adjusted R² accounts for the number of predictors, penalizing the inclusion of irrelevant features and providing a more reliable measure of model performance.
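Both scores follow directly from their formulas: R² = 1 − SS_res/SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A sketch on the same hypothetical salary data (NumPy assumed):

```python
import numpy as np

def r2_scores(y_true, y_pred, p):
    """R^2 and adjusted R^2 for a model with p predictors."""
    ss_res = float(np.sum((y_true - y_pred) ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    n = len(y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes each added predictor via the n - p - 1 term
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

y_true = np.array([30.0, 45.0, 50.0, 60.0, 80.0])
y_pred = np.array([32.0, 44.0, 55.0, 58.0, 75.0])
r2, adj = r2_scores(y_true, y_pred, p=1)
print(round(r2, 3), round(adj, 3))  # 0.957 0.943
```

Adding an irrelevant predictor can only raise R², but it shrinks the n − p − 1 denominator, so adjusted R² will fall unless the new feature genuinely improves the fit.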

References:

https://mp.weixin.qq.com/s/Sx1lf2Ia6FPblTQdC7-fdg

Tags: machine learning, metrics, regression, model evaluation, linear regression
Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
