An Overview of Automated Machine Learning (AutoML): Definitions, Algorithms, Frameworks, and Open Challenges
This article provides a comprehensive overview of Automated Machine Learning (AutoML), covering its definition, objectives, research areas, hyperparameter optimization methods, pipeline construction, major CASH algorithms, open-source frameworks such as AutoSklearn and NNI, practical case studies, and current open research challenges.
Automated Machine Learning (AutoML) is introduced as an end‑to‑end automation of the machine‑learning workflow, aiming to reduce the expertise and time required for model building by automatically selecting algorithms, tuning hyper‑parameters, and constructing pipelines.
The article answers four key questions: what AutoML is, why it is needed, what problems it can solve, and what its research scope covers, highlighting pain points such as complex model‑building pipelines, heavy reliance on expert knowledge, and a shortage of skilled practitioners.
Problem formalization is presented through three definitions: a machine‑learning pipeline configuration, the pipeline‑creation problem, and pipeline performance evaluation, emphasizing the black‑box nature of the optimization task.
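The pipeline‑creation problem described above is typically posed as a black‑box optimization. A generic reconstruction of the objective (the notation here is illustrative, not copied from the article) is:

```latex
g^{\star} \in \operatorname*{arg\,min}_{g \in G,\ \lambda \in \Lambda}
\ \frac{1}{k} \sum_{i=1}^{k}
\mathcal{L}\!\left(g_{\lambda},\, D^{(i)}_{\mathrm{train}},\, D^{(i)}_{\mathrm{valid}}\right)
```

where $g$ ranges over candidate pipeline structures $G$, $\lambda$ over their hyper‑parameter space $\Lambda$, and $\mathcal{L}$ is the validation loss averaged over $k$ cross‑validation folds. Because $\mathcal{L}$ has no usable analytic form or gradients, the task is black‑box by nature.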
Various CASH (Combined Algorithm Selection and Hyper‑parameter optimization) algorithms are reviewed, including grid search, random search, Bayesian optimization (SMAC, Gaussian processes, TPE), evolutionary methods, particle‑swarm optimization, and ε‑greedy multi‑armed‑bandit approaches, along with their respective strengths and weaknesses.
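The simplest of these, random search, already captures the shape of the CASH problem: jointly sample an algorithm and its hyper‑parameters, evaluate a black‑box validation loss, and keep the best configuration. A minimal stdlib‑only sketch (the search space and toy objective are invented for illustration):

```python
import random

def random_search(objective, space, n_iter=50, seed=0):
    """Minimal random-search CASH sketch: sample an algorithm plus its
    hyper-parameters, keep the configuration with the lowest loss."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_iter):
        algo = rng.choice(list(space))                    # algorithm selection
        cfg = {name: rng.uniform(lo, hi)                  # hyper-parameter sampling
               for name, (lo, hi) in space[algo].items()}
        loss = objective(algo, cfg)                       # black-box evaluation
        if loss < best_loss:
            best_cfg, best_loss = (algo, cfg), loss
    return best_cfg, best_loss

# Toy stand-in for "train the model, score it on a validation set":
# the SVM branch is minimized near C = 1, the tree branch never beats it.
space = {"svm": {"C": (0.01, 10.0)}, "tree": {"max_depth": (1.0, 16.0)}}
def toy_loss(algo, cfg):
    if algo == "svm":
        return (cfg["C"] - 1.0) ** 2
    return 0.51 + 0.01 * cfg["max_depth"]

best, loss = random_search(toy_loss, space, n_iter=200)
```

Grid search replaces the sampling line with exhaustive enumeration, while Bayesian methods such as SMAC or TPE replace it with a model‑guided proposal; the surrounding loop stays the same.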
Two categories of ML‑pipeline synthesis methods are described: fixed‑shape pipelines that reduce search space but may limit performance on complex data, and variable‑shape pipelines that are more flexible but computationally expensive.
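The trade‑off between the two categories is easiest to see in the fixed‑shape case, where the search space collapses to the cross product of a few component lists. A small illustrative sketch (slot names and components are hypothetical):

```python
from itertools import product

# Fixed-shape template: every candidate pipeline has exactly these three
# slots, so the search space is the cross product of the component lists.
TEMPLATE = ["imputer", "scaler", "estimator"]
COMPONENTS = {
    "imputer":   ["mean", "median"],
    "scaler":    ["standard", "minmax", "none"],
    "estimator": ["logreg", "random_forest"],
}

def enumerate_fixed_pipelines():
    """Yield every pipeline the fixed template admits."""
    for combo in product(*(COMPONENTS[slot] for slot in TEMPLATE)):
        yield dict(zip(TEMPLATE, combo))

pipelines = list(enumerate_fixed_pipelines())  # 2 * 3 * 2 = 12 candidates
```

Variable‑shape methods drop the fixed template and instead grow pipeline graphs (e.g. via genetic programming, as in TPOT), which admits far more structures at a much higher evaluation cost.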
Open‑source AutoML frameworks are compared, focusing on AutoSklearn (Scikit‑learn based, SMAC optimizer, meta‑learning, ensemble selection) and Microsoft NNI (supports both NAS and hyper‑parameter search, flexible tuners, assessors, and training platforms). Their advantages, limitations, and practical deployment considerations are outlined.
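For a sense of how NNI is driven in practice: an experiment declares its hyper‑parameter search space as a JSON document that the chosen tuner samples from. A minimal example (the parameter names are illustrative; the `_type` values shown are NNI's standard sampling distributions):

```json
{
  "learning_rate": {"_type": "loguniform", "_value": [1e-5, 1e-1]},
  "num_layers":    {"_type": "choice",     "_value": [2, 4, 8]},
  "dropout":       {"_type": "uniform",    "_value": [0.0, 0.5]}
}
```

The training script then retrieves a sampled configuration from the tuner and reports its validation metric back, so swapping tuners or training platforms requires no change to the model code.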
Real‑world case studies demonstrate the effectiveness of AutoML: an XGBoost regression model achieved a 23.5% reduction in validation RMSE, and a DNN classification model improved AUC by 0.32% compared with manual tuning.
The article concludes with a discussion of open research problems in AutoML, including smarter optimization techniques, fully automated pipelines, reproducibility and interpretability, efficiency and scalability, lifelong learning, and standardized benchmarks.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.