Common Machine Learning Algorithms for Data Prediction with Python Code Examples
This article introduces ten widely used machine learning algorithms for data prediction, explains their core concepts, and provides complete Python code snippets using scikit‑learn and related libraries to help readers implement regression, classification, and time‑series forecasting tasks.
Using machine learning algorithms for data prediction is a common task in data analysis. The examples below use tiny toy datasets purely to keep the code short; real projects need substantially more data for the train/test splits and accuracy numbers to be meaningful.
1. Linear Regression
Explanation: Linear regression is a basic regression algorithm suitable for predicting continuous variables.
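Before the scikit‑learn example, it can help to see what "fitting" means here: ordinary least squares solves for the coefficients directly. A minimal sketch with NumPy, using a tiny synthetic dataset assumed for illustration:

```python
# Conceptual sketch: ordinary least squares, solved as a least-squares
# problem (equivalent to the normal equation w = (X^T X)^{-1} X^T y).
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # toy feature (assumed)
y = np.array([3.0, 6.0, 9.0, 12.0])          # toy target: y = 3x exactly

# Add an intercept column of ones.
Xb = np.hstack([np.ones((len(X), 1)), X])

# lstsq is numerically more stable than inverting X^T X explicitly.
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print("intercept:", coef[0], "slope:", coef[1])
```

Because the toy target is exactly linear, the recovered slope is 3 and the intercept 0; scikit‑learn's `LinearRegression` below computes the same solution.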
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Assume feature matrix X and target y
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [3, 6, 9, 12]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Linear Regression MSE:", mse)

2. Logistic Regression
Explanation: Logistic regression is a common classification algorithm for binary problems.
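The "logistic" part is the sigmoid function, which squashes a linear score into a probability for the positive class. A quick illustrative sketch (the scores below are arbitrary example values):

```python
# Logistic regression models P(y=1 | x) = sigmoid(w·x + b).
import math

def sigmoid(z):
    # Maps any real score into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Large positive scores map near 1, large negative scores near 0,
# and a score of 0 maps to exactly 0.5.
print(sigmoid(4.0), sigmoid(-4.0), sigmoid(0.0))
```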
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and binary target y
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Logistic Regression Accuracy:", accuracy)

3. Decision Tree
Explanation: Decision trees are tree‑structured models for classification and regression, easy to interpret.
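The interpretability claim can be demonstrated directly: scikit‑learn can print the learned if/then rules. A small sketch on the same toy data used throughout this article (feature names `f0`/`f1` are made up for display):

```python
# Print the learned decision rules of a fitted tree.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# export_text renders the tree as human-readable threshold rules.
print(export_text(tree, feature_names=["f0", "f1"]))
```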
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and target y for classification
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Decision Tree Accuracy:", accuracy)

4. Random Forest
Explanation: Random forest is an ensemble learning method that combines multiple decision trees to improve prediction accuracy.
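The "combines multiple trees" point is visible in the API: `n_estimators` sets how many trees are grown, and each fitted tree can be inspected individually. A sketch on toy data (the value 50 is illustrative, not a recommendation):

```python
# A random forest averages the votes of many bootstrapped trees.
from sklearn.ensemble import RandomForestClassifier

X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# The individual trees live in forest.estimators_.
print("number of trees:", len(forest.estimators_))
```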
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and target y for classification
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Random Forest Accuracy:", accuracy)

5. Support Vector Machine
Explanation: SVM is a powerful classifier that maximizes the margin between classes, suitable for linear and non‑linear problems.
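One practical caveat worth knowing: SVMs are sensitive to feature scale, so standardizing features before the classifier is a common pattern. A sketch using a pipeline (this is a supplement to the example below, not part of it):

```python
# Standardize features, then fit an RBF-kernel SVM, in one pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
# The pipeline applies the same scaling automatically at predict time.
print(clf.predict([[1, 2], [4, 8]]))
```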
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and target y for classification
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = SVC()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("SVM Accuracy:", accuracy)

6. K‑Nearest Neighbors
Explanation: K‑NN classifies a sample based on the majority label of its K nearest neighbors.
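The majority-vote idea is simple enough to write by hand, which makes the algorithm concrete before reaching for `KNeighborsClassifier`. A sketch with k=3 on the toy data (the query points are made up):

```python
# Hand-rolled k-NN: find the k closest training points, take the
# majority label.
from collections import Counter
import math

train_X = [(1, 2), (2, 4), (3, 6), (4, 8)]
train_y = [0, 0, 1, 1]

def knn_predict(x, k=3):
    # Sort training points by Euclidean distance to the query point.
    dists = sorted(
        (math.dist(x, p), label) for p, label in zip(train_X, train_y)
    )
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1.5, 3.0)))
```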
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and target y for classification
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = KNeighborsClassifier()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("KNN Accuracy:", accuracy)

7. Naive Bayes
Explanation: Naive Bayes applies Bayes' theorem with the assumption of feature independence for classification.
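The underlying rule is just Bayes' theorem: the posterior is proportional to likelihood times prior. A tiny arithmetic sketch with made-up numbers, to show what the classifier computes per class:

```python
# P(class | x) ∝ P(x | class) * P(class); all numbers are illustrative.
prior = {0: 0.5, 1: 0.5}        # assumed class priors
likelihood = {0: 0.2, 1: 0.6}   # assumed P(x | class) for an observed x

# Unnormalized posteriors, then normalize so they sum to 1.
unnorm = {c: likelihood[c] * prior[c] for c in prior}
total = sum(unnorm.values())
posterior = {c: unnorm[c] / total for c in unnorm}
print(posterior)
```

GaussianNB in the example below does the same computation, with each per-feature likelihood modeled as a Gaussian.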
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and target y for classification
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = GaussianNB()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Naive Bayes Accuracy:", accuracy)

8. Neural Network (MLP)
Explanation: A multilayer perceptron (MLP) is a feedforward neural network suited to a wide range of classification and regression tasks.
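Like SVMs, MLPs work much better on scaled inputs, and in practice you usually set the hidden-layer size and iteration budget explicitly rather than relying on defaults. A sketch (the specific values here are illustrative; `lbfgs` is chosen only because it converges quickly on tiny data):

```python
# Scale features, then fit a small MLP with explicit hyperparameters.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                  max_iter=2000, random_state=0),
)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```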
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and target y for classification
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = MLPClassifier()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Neural Network Accuracy:", accuracy)

9. XGBoost
Explanation: XGBoost is a gradient‑boosted tree algorithm that combines many decision trees to improve performance.
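If the `xgboost` package is not installed, the same gradient-boosting idea, trees fitted sequentially, each correcting the errors of the previous ensemble, can be tried with scikit‑learn's built-in `GradientBoostingClassifier`. This is a stand-in for illustration, not XGBoost itself, and the hyperparameter values are arbitrary:

```python
# Gradient boosting with scikit-learn as an XGBoost-style stand-in.
from sklearn.ensemble import GradientBoostingClassifier

X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]

gb = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1,
                                random_state=0).fit(X, y)
print(gb.predict([[1, 2], [4, 8]]))
```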
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Assume feature matrix X and target y for classification
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
y = [0, 0, 1, 1]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit model
model = XGBClassifier()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("XGBoost Accuracy:", accuracy)

10. Time‑Series Forecasting (ARIMA)
Explanation: Time‑series prediction models future values from temporal dependencies in the data. ARIMA combines three pieces: the series' own past values (AR), differencing to remove trend (I), and past forecast errors (MA).
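The "I" in ARIMA(1, 1, 1) means the series is differenced once before the AR and MA terms are fitted. A quick sketch of first differencing with pandas, on a short synthetic trending series assumed for illustration:

```python
# First differencing turns a trending level series into period-to-period
# changes, which is what the AR/MA terms of ARIMA(p, 1, q) model.
import pandas as pd

series = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0])  # synthetic trend
diffed = series.diff().dropna()  # change from one step to the next
print(diffed.tolist())
```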
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
# Assume a CSV file with columns "date" and "value"
data = pd.read_csv("data.csv")
data["date"] = pd.to_datetime(data["date"])
data.set_index("date", inplace=True)
# Build and fit ARIMA model
model = ARIMA(data["value"], order=(1, 1, 1))
model_fit = model.fit()
# Forecast the next 10 points (predict with start=len(data), end=len(data)+10 would return 11 values)
future_values = model_fit.forecast(steps=10)
print("Future 10 predictions:", future_values)

These examples cover common data‑prediction algorithms and tasks, allowing readers to choose suitable models for their specific analysis needs.
Test Development Learning Exchange