
An Overview of Automated Machine Learning (AutoML): Definitions, Algorithms, Frameworks, and Open Challenges

This article provides a comprehensive overview of Automated Machine Learning (AutoML), covering its definition, objectives, research areas, hyperparameter optimization methods, pipeline construction, major CASH algorithms, open-source frameworks such as AutoSklearn and NNI, practical case studies, and current open research challenges.

JD Tech Talk

Automated Machine Learning (AutoML) is introduced as an end‑to‑end automation of the machine‑learning workflow, aiming to reduce the expertise and time required for model building by automatically selecting algorithms, tuning hyper‑parameters, and constructing pipelines.

The article answers four key questions: what AutoML is, why it is needed, what problems it can solve, and the scope of its research, highlighting challenges such as complex model‑building pipelines, heavy reliance on expert knowledge, and a shortage of skilled practitioners.

Problem formalization is presented through three definitions: a machine‑learning pipeline configuration, the pipeline‑creation problem, and pipeline performance evaluation, emphasizing the black‑box nature of the optimization task.
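The black‑box optimization task described above is commonly written as the standard CASH formulation, sketched here with assumed notation: a set of candidate algorithms $\mathcal{A} = \{A^{(1)}, \dots, A^{(R)}\}$, each with its own hyper‑parameter space $\Lambda^{(j)}$, evaluated by cross‑validated loss:

```latex
A^{*}_{\lambda^{*}} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}}
\frac{1}{k} \sum_{i=1}^{k}
\mathcal{L}\!\left(A^{(j)}_{\lambda},\ D_{\mathrm{train}}^{(i)},\ D_{\mathrm{valid}}^{(i)}\right)
```

The loss $\mathcal{L}$ is observed only by training and validating a candidate, with no gradient or closed form available, which is why the problem is treated as black‑box optimization.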

Various CASH (Combined Algorithm Selection and Hyper‑parameter Optimization) algorithms are reviewed, including grid search, random search, Bayesian optimization (SMAC, Gaussian Process, TPE), evolutionary methods, particle‑swarm optimization, and ε‑greedy multi‑armed bandit approaches, with discussion of their strengths and weaknesses.
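The simplest of these methods, random search, can be sketched in a few lines. The snippet below is a minimal stand‑alone illustration, not any framework's implementation: the objective function and search ranges are made up, with the objective acting as a stand‑in for validation loss.

```python
import random


def evaluate(config):
    """Toy black-box objective standing in for validation loss.

    Minimized at learning_rate=0.1, num_trees=100 (a made-up optimum);
    in real AutoML this would train and validate a model.
    """
    return (config["learning_rate"] - 0.1) ** 2 + (config["num_trees"] - 100) ** 2 / 1e4


def random_search(n_trials, seed=0):
    """Sample n_trials configurations uniformly and keep the best one seen."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {
            "learning_rate": rng.uniform(0.001, 1.0),
            "num_trees": rng.randint(10, 500),
        }
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(n_trials=200)
print(best_config, best_loss)
```

Grid search replaces the uniform sampling with an exhaustive sweep over a fixed lattice; Bayesian methods such as SMAC and TPE replace it with a surrogate model that proposes the next configuration based on all previous evaluations.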

Two categories of ML‑pipeline synthesis methods are described: fixed‑shape pipelines that reduce search space but may limit performance on complex data, and variable‑shape pipelines that are more flexible but computationally expensive.
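The fixed‑shape case can be made concrete with a small sketch: every candidate pipeline has the same stages, and the search only decides which component fills each slot. The stage and component names below are illustrative, not taken from any particular framework.

```python
from itertools import product

# A fixed-shape pipeline: three stages in a fixed order; the search space
# is just the choice of component for each slot (names are illustrative).
SLOTS = {
    "imputer": ["mean", "median"],
    "scaler": ["standard", "minmax", "none"],
    "classifier": ["logistic_regression", "random_forest", "svm"],
}


def enumerate_pipelines(slots):
    """Yield every concrete pipeline in the fixed-shape search space."""
    names = list(slots)
    for combo in product(*(slots[name] for name in names)):
        yield dict(zip(names, combo))


pipelines = list(enumerate_pipelines(SLOTS))
print(len(pipelines))  # 2 * 3 * 3 = 18 candidate pipelines
```

Fixing the shape keeps the space small and enumerable (here, 18 candidates); variable‑shape synthesis instead searches over the pipeline structure itself (how many stages, in what order), which is far more flexible but makes the space grow combinatorially.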

Open‑source AutoML frameworks are compared, focusing on AutoSklearn (Scikit‑learn based, SMAC optimizer, meta‑learning, ensemble selection) and Microsoft NNI (supports both NAS and hyper‑parameter search, flexible tuners, assessors, and training platforms). Their advantages, limitations, and practical deployment considerations are outlined.

Real‑world case studies demonstrate the effectiveness of AutoML: an XGBoost regression model achieved a 23.5% reduction in validation RMSE, and a DNN classification model improved AUC by 0.32% compared with manual tuning.

The article concludes with a discussion of open research problems in AutoML, including smarter optimization techniques, fully automated pipelines, reproducibility and interpretability, efficiency and scalability, lifelong learning, and standardized benchmarks.

Tags: Case Study, Machine Learning, AutoML, Neural Architecture Search, Hyperparameter Optimization, Open-source Frameworks
Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.
