Overview of Common Machine Learning Models: Characteristics, Advantages, and Disadvantages
This article provides a concise overview of fifteen widely used machine learning models—including decision trees, random forests, k‑means, KNN, EM, linear and logistic regression, Naive Bayes, Apriori, Boosting, GBDT, SVM, neural networks, HMM, and CRF—detailing their features, strengths, weaknesses, and typical application scenarios.
Decision Tree
Features
Suitable for small datasets; uses hierarchical variables or decision nodes to classify instances such as credit‑reliable vs. unreliable users.
Example Scenarios
Rule‑based credit assessment, horse‑race outcome prediction.
Advantages
Simple computation, strong interpretability, handles missing attributes, tolerates irrelevant features, and evaluates diverse characteristics effectively.
Disadvantages
Prone to over‑fitting (mitigated by pruning or ensemble methods like Random Forest).
Applicable Data Types
Both numeric and nominal attributes.
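The split criterion behind a decision tree can be sketched in a few lines: pick the threshold that maximizes information gain (entropy reduction). A minimal sketch with hypothetical credit data:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(data, threshold):
    """Entropy reduction from splitting (value, label) pairs at threshold."""
    parent = entropy([y for _, y in data])
    left = [y for x, y in data if x <= threshold]
    right = [y for x, y in data if x > threshold]
    n = len(data)
    return parent - len(left) / n * entropy(left) - len(right) / n * entropy(right)

# Hypothetical data: (income, credit label)
data = [(25, "unreliable"), (30, "unreliable"), (40, "reliable"), (60, "reliable")]
gain, threshold = max((info_gain(data, t), t) for t, _ in data[:-1])
```

A full tree simply applies this selection recursively to each resulting subset.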
CART Classification and Regression Trees
Uses Gini index for split decisions; classification trees for categorical targets, regression trees for continuous targets.
Advantages
Highly flexible, supports misclassification costs, can incorporate prior probabilities, automatic cost‑complexity pruning, produces easy‑to‑understand rules, and achieves high accuracy.
Disadvantages
Requires multiple scans and sorts of the dataset, making it inefficient on large data; the related C4.5 algorithm is only suitable for datasets that can reside in memory.
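The Gini index that CART uses for split decisions is easy to state directly; a split is chosen to minimize the weighted Gini impurity of the two children:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a candidate split, weighted by child sizes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```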
Random Forest
Features
Accuracy comparable to AdaBoost; robust to errors and outliers; performance depends on the strength of, and correlation between, the base classifiers; sensitive to the number of attributes considered at each split (commonly log₂(M)+1 of the M available attributes).
Example Scenarios
User churn analysis, risk assessment.
Advantages
Less prone to over‑fitting, fast on large databases, handles missing values well, provides intrinsic variable‑importance estimates, balances errors on imbalanced data, and offers proximity measures useful for outlier detection and visualization.
Disadvantages
May over‑fit noisy classification/regression problems; attribute‑level importance can be unreliable when attributes have differing levels.
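The two sources of randomness in a random forest, bootstrap sampling of rows and a random attribute subset at each split, can be sketched as helper functions (the subset size follows the log₂(M)+1 rule mentioned above):

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

def bootstrap(rows):
    """Sample len(rows) rows with replacement (one tree's training set)."""
    return [random.choice(rows) for _ in rows]

def attribute_subset(attributes):
    """Random subset of size log2(M) + 1, per the common rule of thumb."""
    m = int(math.log2(len(attributes))) + 1
    return random.sample(attributes, m)

def majority_vote(predictions):
    """Final forest prediction: majority vote over the individual trees."""
    return Counter(predictions).most_common(1)[0][0]
```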
k‑means Clustering
Features
May not reach global optimum; results depend on initial centroid selection; uses mean‑square error as the dispersion metric.
Advantages
Simple and fast; computational complexity O(n k t) where n is sample size, k is number of clusters, and t is iterations.
Disadvantages
Requires numeric centroids, sensitive to the choice of k and initial points, unsuitable for clusters of vastly different sizes, and highly sensitive to noise and outliers.
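The assign-then-recompute loop is all there is to k‑means; the sketch below works on 1‑D data and makes the sensitivity to initialization visible (the first k points are used as naive initial centroids):

```python
def kmeans_1d(xs, k, iters=10):
    """Minimal 1-D k-means; results depend on the naive initial centroids."""
    centroids = xs[:k]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        # Update step: each centroid moves to its cluster mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)
```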
K‑Nearest Neighbors (KNN)
Features
No explicit training phase; prediction by majority vote; key elements are k value, distance metric, and decision rule.
Advantages
Simple, works for both classification and regression, handles non‑linear boundaries, and is relatively robust to outliers.
Disadvantages
Requires pre‑defining k; naive prediction costs O(n) per query over n training samples; can be biased toward majority classes in imbalanced datasets.
Common Algorithm
kd‑tree for efficient neighbor search.
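The three key elements above (k, a distance metric, majority vote) fit in one function; this brute-force 1‑D sketch is what a kd‑tree would accelerate:

```python
def knn_predict(train, query, k=3):
    """Classify by majority vote among the k nearest training points (1-D)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)
```

For example, with training pairs of (value, label), a query falls to whichever class dominates its neighborhood.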
Expectation‑Maximization (EM)
Features
E‑step computes the expected value of hidden variables given current parameters; M‑step maximizes the expected likelihood to update parameters.
Advantages
More stable and accurate than k‑means.
Disadvantages
Computationally intensive, slow convergence, and sensitive to initial parameter guesses.
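The E‑step/M‑step alternation can be shown concretely for a two-component Gaussian mixture with known, equal variances (a deliberately simplified sketch; real EM also updates variances and mixing weights):

```python
import math

def em_two_gaussians(xs, mu1, mu2, sigma=1.0, iters=20):
    """EM for a two-component Gaussian mixture with known, equal variances."""
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        resp = []
        for x in xs:
            p1 = math.exp(-((x - mu1) ** 2) / (2 * sigma ** 2))
            p2 = math.exp(-((x - mu2) ** 2) / (2 * sigma ** 2))
            resp.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted mean updates
        mu1 = sum(r * x for r, x in zip(resp, xs)) / sum(resp)
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / sum(1 - r for r in resp)
    return mu1, mu2
```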
Linear Regression
Features
The least‑squares parameters admit a closed‑form (normal‑equation) solution.
Advantages
Simple and has a closed‑form solution.
Disadvantages
Poor fit for complex data; tends to under‑fit.
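For a single feature, the closed-form least-squares solution reduces to two sums (covariance over variance):

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # slope = cov(x, y) / var(x); intercept follows from the means
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x
```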
Logistic Regression
Features
Based on the logistic distribution; optimization methods include iterative scaling, gradient descent, and quasi‑Newton approaches.
Advantages
Simple, low computational cost, and minimal storage requirements.
Disadvantages
Prone to under‑fitting and limited accuracy.
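Of the optimization methods listed above, gradient descent is the simplest to sketch; for a 1‑D feature the update rule is just the prediction error times the input:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic(data, lr=0.5, epochs=200):
    """Stochastic gradient ascent on the log-likelihood (1-D feature)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:  # y in {0, 1}
            p = sigmoid(w * x + b)
            w += lr * (y - p) * x  # gradient of the log-likelihood
            b += lr * (y - p)
    return w, b
```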
Naive Bayes
Features
Uses prior knowledge to compute posterior probabilities; assumes conditional independence of features.
Example Scenarios
Sentiment analysis, consumer segmentation.
Advantages
Performs well on small datasets, supports multi‑class problems, and offers fast classification.
Disadvantages
The conditional‑independence assumption rarely holds in practice and can reduce accuracy; overall performance is not guaranteed to be high.
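The prior-times-likelihood computation, with Laplace smoothing and the independence assumption applied per word, can be sketched for a toy sentiment task (the example documents are hypothetical):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (token list, label). Returns priors, counts, vocabulary."""
    labels = [y for _, y in docs]
    priors = {y: c / len(docs) for y, c in Counter(labels).items()}
    counts = {y: Counter() for y in priors}
    for tokens, y in docs:
        counts[y].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return priors, counts, vocab

def predict_nb(model, tokens):
    priors, counts, vocab = model
    def log_posterior(y):
        total = sum(counts[y].values())
        # Laplace smoothing avoids zero probabilities for unseen words
        return math.log(priors[y]) + sum(
            math.log((counts[y][w] + 1) / (total + len(vocab))) for w in tokens)
    return max(priors, key=log_posterior)
```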
Apriori
Features
Two‑stage frequent‑itemset generation; iteratively removes items with support below a threshold.
Advantages
Easy to implement.
Disadvantages
Slow on large datasets, generates many candidate sets, and incurs high I/O due to repeated full‑dataset scans.
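The two-stage, level-wise structure (generate candidates, prune by support) looks like this in miniature; every support check rescans all transactions, which is exactly the I/O cost noted above:

```python
def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining; prunes below min_support each pass."""
    def support(itemset):
        # Each call rescans every transaction — the source of Apriori's I/O cost
        return sum(itemset <= t for t in transactions) / len(transactions)

    items = {i for t in transactions for i in t}
    frequent, k = [], 1
    current = [frozenset([i]) for i in items]
    while current:
        kept = [s for s in current if support(s) >= min_support]
        frequent.extend(kept)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets
        current = list({a | b for a in kept for b in kept if len(a | b) == k + 1})
        k += 1
    return frequent
```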
Boosting
Features
Adjusts sample weights during learning and combines multiple classifiers based on performance.
Advantages
Low generalization error, high classification accuracy, and requires few hyper‑parameters.
Disadvantages
Sensitive to outliers.
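The weight-adjustment step is the heart of boosting; one AdaBoost round computes the classifier's weight from its error and reweights the samples so that misclassified ones count more next round (this is also why outliers keep attracting weight):

```python
import math

def adaboost_reweight(weights, correct):
    """One AdaBoost round: classifier weight alpha, then sample reweighting.

    correct[i] is True if sample i was classified correctly this round."""
    err = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak classifier
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    z = sum(new)  # normalize so the weights remain a distribution
    return [w / z for w in new], alpha
```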
Gradient Boosted Decision Trees (GBDT / MART)
Features
Two variants: residual version and gradient version (the latter is more widely used).
Advantages
Handles both linear and non‑linear relationships and is less prone to over‑fitting.
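The residual version is the easier one to sketch: each round fits a weak learner (here a single regression stump) to the current residuals, which for squared loss coincide with the negative gradient:

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump, minimizing squared error."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def gbdt_fit(xs, ys, rounds=10, lr=0.5):
    """Each round fits a stump to the residuals and adds a shrunken correction."""
    pred = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        t, lm, rm = fit_stump(xs, residuals)
        pred = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, pred)]
    return pred
```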
Support Vector Machine (SVM)
Features
Maps data from a low‑dimensional input space to a higher‑dimensional feature space (implicitly, via a kernel function) so that the classes become linearly separable.
Advantages
Enables non‑linear classification and regression, yields low generalization error, and is relatively interpretable.
Disadvantages
Sensitive to choice of kernel function and hyper‑parameters.
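Setting kernels aside, the linear case can be sketched with subgradient descent on the hinge loss (a Pegasos-style simplification for a 1‑D feature, not a full SVM solver):

```python
def train_linear_svm(data, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the regularized hinge loss (linear SVM, 1-D)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:  # y in {-1, +1}
            if y * (w * x + b) < 1:  # inside the margin: hinge subgradient
                w += lr * (y * x - lam * w)
                b += lr * y
            else:  # outside the margin: only the regularizer pulls on w
                w -= lr * lam * w
    return w, b
```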
Neural Networks
Features
Inspired by brain structure; composed of artificial neurons.
Advantages
Back‑Propagation (BP) networks provide strong non‑linear fitting, simple learning rules, robustness, memory and self‑learning capabilities, and good parallelism; Radial Basis Function (RBF) networks offer optimal approximation, no local minima, fast convergence, and excellent classification performance.
Disadvantages
Models are not interpretable, require sufficient data, and are sensitive to initialization.
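The BP learning rule can be shown in its smallest form: a single sigmoid neuron trained with the delta rule (error times the sigmoid derivative) on an OR gate. A full BP network applies the same chain-rule step layer by layer:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_neuron(data, lr=1.0, epochs=2000):
    """Delta-rule training of one sigmoid neuron on squared error."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            out = sigmoid(w[0] * x1 + w[1] * x2 + b)
            delta = (y - out) * out * (1 - out)  # error * sigmoid derivative
            w[0] += lr * delta * x1
            w[1] += lr * delta * x2
            b += lr * delta
    return w, b
```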
Hidden Markov Model (HMM)
Features
Double stochastic process consisting of a hidden Markov chain and observable output functions.
Example Scenarios
Facial expression analysis, weather forecasting.
Advantages
Solves sequence labeling problems.
Disadvantages
Relies on homogeneous Markov and observation independence assumptions, which may introduce bias.
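Decoding, finding the most likely hidden-state sequence, is the classic HMM task, solved by the Viterbi algorithm. A minimal dictionary-based sketch (the weather example in the test uses illustrative probabilities):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an observation sequence."""
    # V[s]: probability of the best path ending in state s at the current step
    V = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        new_V, new_path = {}, {}
        for s in states:
            # Best predecessor for state s at this step
            prob, prev = max((V[p] * trans_p[p][s] * emit_p[s][o], p)
                             for p in states)
            new_V[s] = prob
            new_path[s] = path[prev] + [s]
        V, path = new_V, new_path
    best = max(states, key=lambda s: V[s])
    return path[best]
```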
Conditional Random Field (CRF)
Features
Discriminative, undirected graphical model.
Advantages
Globally normalized probability avoids label‑bias, flexible feature design, and does not require strict independence assumptions.
Disadvantages
High training cost and computational complexity.