Understanding Machine Learning vs Deep Learning and a Practical sklearn Regression Tutorial
This article explains the difference between machine learning and deep learning, compares ML algorithms with traditional logic code, introduces the scikit‑learn library, demonstrates data preprocessing, model training with RandomForestRegressor, and shows how to build a voting regressor for disease progression prediction using Python.
Distinguishing Machine Learning and Deep Learning
Machine learning is a branch of artificial intelligence that relies on manually extracted features, much like an expert appraising a vase by eye; deep learning instead discovers useful representations from raw data automatically, more like a spectrometer measuring carbon isotopes.
For example, predicting a diabetic patient's blood sugar level starts with expert‑chosen features such as height and weight; deep learning would ingest raw video recordings and learn the relevant patterns itself.
Machine Learning Algorithms vs Traditional Logic Code
Traditional if/else logic is static and must be rewritten when data patterns change, while machine‑learning models adapt their parameters automatically to new data without code modifications.
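To make this contrast concrete, here is a minimal sketch (the task, data, and threshold are invented for illustration): a hard-coded rule must be edited by hand when the data shifts, while a fitted model moves its decision boundary on its own when retrained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rule-based version: the threshold 140 is frozen in code and must be
# hand-edited whenever the underlying data pattern changes.
def rule_based_flag(glucose):
    return glucose > 140

# Learned version: the boundary is fitted from labeled examples, so
# retraining on new data adapts it without touching the code.
X = np.array([[100], [120], [135], [150], [165], [180]])
y = np.array([0, 0, 0, 1, 1, 1])  # 1 = flagged as high
model = LogisticRegression().fit(X, y)

print(rule_based_flag(150))        # rule says: flagged
print(model.predict([[150]])[0])   # model decides from the fitted boundary
```

If the population changes, only the training data fed to `fit` changes; the rule-based function would need a code rewrite.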
Old ML algorithms need hand‑crafted features, are lightweight, and work well on small datasets (e.g., license‑plate recognition). Modern deep‑learning models learn features autonomously, handle large‑scale tasks, and require more compute.
Introduction to sklearn
The sklearn (scikit‑learn) library provides utilities for data preprocessing, feature engineering, model selection, and evaluation, including many classic algorithms such as clustering, regression, and ensemble methods.
Clustering algorithms automatically group data without supervision; examples include spectral clustering, density‑based clustering, K‑means, and hierarchical clustering.
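As a small illustration of unsupervised grouping, the sketch below (with made-up 2-D points) runs sklearn's K-means on two well-separated blobs; no labels are ever provided, yet the algorithm recovers the grouping.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious blobs of 2-D points; no labels are given to the algorithm.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # cluster ids; which blob gets 0 vs 1 is arbitrary
```

The first three points end up in one cluster and the last three in the other; only the arbitrary cluster ids may differ between runs.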
Practical: Voting Regressor for Disease Progress Prediction
Data Collection and Description
A public diabetes dataset contains 442 samples, each with 10 input features (AGE, SEX, BMI, BP, and blood-serum measurements S1-S6) and one target variable Y, a quantitative measure of disease progression one year after baseline.
Index   Name   Example
1       AGE    59
2       SEX    2
3       BMI    32.1
4       BP     101
5       S1     157
6       S2     93.2
7       S3     38
8       S4     4
9       S5     4.8598
10      S6     87
Output  Y      151
Data Preprocessing
Features are mean‑centered and scaled by standard deviation to make them comparable across different units.
import numpy as np

datasets = np.loadtxt("diabetes.txt")
X = datasets[:, :10]  # inputs
y = datasets[:, -1]   # output

# Center each feature, then divide by its standard deviation
X_mean = np.mean(X, axis=0)
X_centered = X - X_mean
std = np.std(X_centered, axis=0)
scaled_X = X_centered / std

# Rescale each column to unit Euclidean norm
scale = np.sqrt(np.sum(scaled_X**2, axis=0))
scaled_X = scaled_X / scale

Model Training
A RandomForestRegressor from sklearn is trained on the scaled data.
from sklearn.ensemble import RandomForestRegressor
reg_rf = RandomForestRegressor(random_state=1)
reg_rf.fit(scaled_X, y)

Prediction
For a new patient, the same preprocessing is applied before calling predict:

x_new = np.array([[49, 1, 31.1, 110, 154, 95.2, 33, 4, 4.6692, 97]])
scaled_x_new = (x_new - X_mean) / std / scale
pred_rf = reg_rf.predict(scaled_x_new)
# pred_rf => array([213.26])

Voting Regressor
Multiple regressors (GradientBoostingRegressor, RandomForestRegressor, LinearRegression) are combined using VotingRegressor to improve robustness.
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
reg1 = GradientBoostingRegressor(random_state=1)
reg1.fit(scaled_X, y)
reg2 = RandomForestRegressor(random_state=1)
reg2.fit(scaled_X, y)
reg3 = LinearRegression()
reg3.fit(scaled_X, y)
reg = VotingRegressor([('gb', reg1), ('rf', reg2), ('lr', reg3)])
reg.fit(scaled_X, y)
pred = reg.predict(scaled_x_new)

Conclusion
Traditional machine‑learning algorithms are lightweight AI solutions suitable for tasks with relatively stable features; they require domain expertise for feature engineering. Deep learning excels when raw data is abundant and feature extraction is complex, but both approaches have their place.
As an aside, predicting lottery numbers with machine learning is infeasible: the draws are purely random, so there is no deterministic pattern for any model to learn.