Why FLAML Is the Fast, Lightweight AutoML Framework You Should Try
This article introduces Microsoft's FLAML, a fast and lightweight AutoML library. It explains FLAML's design principles, cost‑aware search strategy, key observations and properties, and experimental results, and provides practical code examples for integrating FLAML into Python machine‑learning workflows.
Overview
AutoML has achieved many successes in recent machine‑learning competitions, and FLAML is an efficient, lightweight AutoML framework developed by Microsoft. FLAML (Fast and Lightweight AutoML Library) was introduced in a 2021 Microsoft research paper and quickly became an officially recommended automated tuning library for LightGBM.
Related Work
For background on AutoML, see the article "AutoML Framework Overview" (https://zhuanlan.zhihu.com/p/212512984) and the original paper (https://arxiv.org/pdf/1911.04706.pdf). Additional references include a KDnuggets article on FLAML + Ray Tune and a Microsoft Research stand‑up video.
Summary of Current Work
FLAML differs from other state‑of‑the‑art AutoML frameworks by focusing on lightweight search: it incorporates hyper‑parameters, learner choice, and sample size into a unified cost model that accounts for CPU time and cross‑validation overhead. The framework's core contribution is a cost‑aware search strategy that both shrinks the effective search space and accelerates the search.
Observations
Increasing sample size reduces the gap between test error and validation error, and cross‑validation yields smaller gaps than hold‑out under fixed conditions.
With fixed sample size, increasing model complexity does not always lower loss to the minimum.
Training cost grows roughly in proportion to both sample size and complexity‑related hyper‑parameters (e.g., the number of trees in a tree ensemble).
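The third observation can be made concrete with a toy cost model. This is an illustrative sketch only, not FLAML's internal cost function; the `unit_cost` constant and the linear form are assumptions for demonstration.

```python
# Toy cost model illustrating the observation above: training cost grows
# roughly linearly with both sample size and a complexity hyper-parameter
# such as the number of trees.

def training_cost(sample_size: int, n_trees: int, unit_cost: float = 1e-6) -> float:
    """Estimated cost of fitting a tree ensemble: one unit of work
    per (sample, tree) pair."""
    return unit_cost * sample_size * n_trees

# Doubling either factor roughly doubles the estimated cost.
base = training_cost(10_000, 100)
assert abs(training_cost(20_000, 100) / base - 2.0) < 1e-9
assert abs(training_cost(10_000, 200) / base - 2.0) < 1e-9
```

Under this model, starting with small samples and few trees (as FLAML does) keeps early trials cheap, leaving budget for larger configurations later.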
Properties
Choose sample size and model complexity jointly: large sample with complex model, small sample with simple model.
Use cross‑validation only for small sample sizes or when the cost budget is large; otherwise hold‑out suffices.
Allow all learners to compete fairly by considering resampling strategy and sample size in the cost model.
Select the configuration with the lowest cost when errors are comparable, avoiding excessive time on marginal improvements.
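The last property, preferring the cheapest configuration when errors are comparable, can be sketched as a simple selection rule. The tolerance value and the tuple format here are illustrative assumptions, not FLAML's actual interface.

```python
# Sketch of the "cheapest among comparable" rule: when several
# configurations reach similar validation error, prefer the one with the
# lowest cost, avoiding excessive time on marginal improvements.

def pick_config(results, tolerance=0.005):
    """results: list of (validation_error, cost, config) tuples.
    Returns the cheapest config whose error is within `tolerance`
    of the best observed error."""
    best_error = min(error for error, _, _ in results)
    comparable = [r for r in results if r[0] <= best_error + tolerance]
    return min(comparable, key=lambda r: r[1])[2]

results = [
    (0.101, 120.0, "large-model"),   # marginally better, much costlier
    (0.104,  15.0, "small-model"),   # comparable error, cheap
    (0.150,   5.0, "tiny-model"),    # cheap but clearly worse
]
print(pick_config(results))  # → small-model
```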
Design Overview
FLAML’s design emphasizes speed and efficiency by avoiding exhaustive global search. It sequentially selects a resampling strategy, learner, and hyper‑parameters, then evaluates validation error and cost. This loop repeats until a time limit is reached.
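The loop described above can be sketched in a few lines. All function names here (`choose_resampling`, `propose`, `evaluate`) are hypothetical stand-ins for illustration, not FLAML's actual internals.

```python
import time

# Minimal sketch of FLAML's sequential loop: fix a resampling strategy,
# then repeatedly propose a (learner, hyper-parameters, sample size)
# configuration, evaluate validation error and cost, and keep the best,
# stopping when the time budget runs out.

def automl_loop(time_budget, choose_resampling, propose, evaluate):
    resampling = choose_resampling()
    best = (float("inf"), None)          # (validation_error, config)
    deadline = time.monotonic() + time_budget
    while time.monotonic() < deadline:
        learner, hyperparams, sample_size = propose(best)
        error, cost = evaluate(resampling, learner, hyperparams, sample_size)
        if error < best[0]:
            best = (error, (learner, hyperparams, sample_size))
    return best
```

In the real framework the `propose` step is where the cost-aware search strategy lives; the sketch only shows the control flow.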
Search Strategy
The framework introduces Estimated Cost for Improvement (ECI) to guide search. ECI estimates the cost required to achieve the next loss improvement, considering three cases: (1) improving loss with the current learner and sample size, (2) increasing sample size for the current learner, and (3) switching to a new learner.
Search Detailed Procedure
Select an appropriate resampling strategy (cross‑validation or hold‑out) based on dataset size and time budget.
Choose a learner using ECI‑based probabilities, ensuring fair competition among learners.
Search hyper‑parameters and sample size with a randomized direct search method, starting with small sample sizes and expanding as training progresses.
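The ECI-based learner choice in step 2 can be sketched as follows. The inverse-proportional formula is an assumption for illustration; the paper defines ECI and the selection probabilities more precisely.

```python
# Illustrative sketch of ECI-guided learner selection: each learner's
# chance of being picked is inversely proportional to its estimated cost
# for the next improvement, so cheap learners are tried more often while
# expensive ones still keep a nonzero chance to compete.

def selection_probabilities(eci: dict) -> dict:
    """Map learner -> ECI into learner -> selection probability ∝ 1/ECI."""
    inverse = {name: 1.0 / cost for name, cost in eci.items()}
    total = sum(inverse.values())
    return {name: weight / total for name, weight in inverse.items()}

probs = selection_probabilities({"lgbm": 2.0, "xgboost": 4.0, "rf": 8.0})
# The cheapest learner (lgbm) gets the largest share, but every learner
# retains a nonzero selection probability.
```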
Example
An illustration (from the Microsoft stand‑up) shows FLAML’s iterative process, where early iterations use simple models and small samples, then gradually increase model complexity (e.g., LightGBM trees) and sample size as loss improves.
FLAML Comparison Experiments
FLAML was evaluated on 53 datasets (39 classification, 14 regression). Compared with Auto‑sklearn, H2O AutoML, TPOT, a cloud‑based AutoML service, and HpBandSter, FLAML achieved comparable or better results within the same time budget. Notably, FLAML trained for 1 minute outperformed the other libraries trained for 10 minutes on 62–83% of datasets, and FLAML trained for 10 minutes outperformed the others trained for 1 hour on 72–89% of datasets.
Implementation
FLAML follows the scikit‑learn API. A basic usage example:
<code># A minimal FLAML run: tune LightGBM for 60 seconds.
from flaml import AutoML

automl = AutoML()
automl.fit(X_train=X_train, y_train=y_train, time_budget=60, estimator_list=['lgbm'])
print('Best ML model:', automl.model)
print('Best hyperparameter config:', automl.best_config)</code>
Custom learners can be defined by subclassing existing estimators and providing a custom search space. Example for a customized XGBoost learner:
<code># Create an XGBoost learner class with a customized search space.
from flaml.model import XGBoostSklearnEstimator
from flaml import tune

class MyXGB(XGBoostSklearnEstimator):
    '''XGBoostSklearnEstimator with a customized search space.'''

    @classmethod
    def search_space(cls, data_size, **params):
        upper = min(2**15, int(data_size))
        return {
            'n_estimators': {
                'domain': tune.lograndint(lower=4, upper=upper),
                'low_cost_init_value': 4,
            },
            'max_leaves': {
                'domain': tune.lograndint(lower=4, upper=upper),
                'low_cost_init_value': 4,
            },
        }

# Use CFO in FLAML to tune XGBoost with the custom learner.
from flaml import AutoML

automl = AutoML()
automl.add_learner(learner_name='my_xgboost', learner_class=MyXGB)

# Quick 15-second run with the custom learner and the CFO search method.
automl.fit(X_train=X_train, y_train=y_train, time_budget=15,
           estimator_list=['my_xgboost'], hpo_method='cfo')

# A longer run configured through a settings dict.
settings = {
    "time_budget": 240,
    "metric": 'r2',
    "estimator_list": ['my_xgboost'],
    "task": 'regression',
    "log_file_name": 'houses_experiment.log',
    "hpo_method": 'cfo',
    "seed": 7654321,
}
automl.fit(X_train=X_train, y_train=y_train, **settings)</code>
Because FLAML adheres to the scikit‑learn API, it can be integrated into scikit‑learn pipelines for further convenience.
Practical insights from the GuanYuan Data Tech Team