Clustering-Based Global LSTM Models for Large-Scale Time Series Forecasting
The paper proposes clustering thousands of related time series and training a separate global LSTM model for each cluster. This reduces heterogeneity within each model's training set while still leveraging shared information across series, and it improves forecasting accuracy over per-series models, as shown in extensive experiments on the CIF2016 and NN5 datasets.
Many industries generate thousands of time series that need forecasting, making it impractical to train a separate model for each series. The authors first cluster similar series and then build a global LSTM model for each cluster, which increases training data and exploits information from similar series.
The methodology extracts interpretable features using the tsmeasures function from the anomalous‑acm R package, applies STL for seasonal‑trend decomposition, stabilizes variance with a logarithmic transform, and normalizes windows before feeding them to LSTM networks. Clustering algorithms evaluated include K‑Means, PAM, DBSCAN, and Snob.
Sliding‑window multi‑input‑multi‑output (MIMO) forecasting is used, with window sizes chosen based on forecast horizon and seasonality. Hyper‑parameters of the LSTM are tuned via Bayesian optimization using the bayesian‑optimization Python package.
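The sliding-window MIMO setup can be sketched as follows: every input window of `input_size` past observations is mapped to the next `horizon` observations in a single multi-output prediction (function name and signature are illustrative, not from the paper):

```python
import numpy as np

def sliding_windows(series, input_size, horizon):
    """Build (input, output) pairs for MIMO forecasting: each window of
    `input_size` points predicts the following `horizon` points at once."""
    series = np.asarray(series, dtype=float)
    n = len(series) - input_size - horizon + 1
    X = np.stack([series[i:i + input_size] for i in range(n)])
    Y = np.stack([series[i + input_size:i + input_size + horizon] for i in range(n)])
    return X, Y
```

With MIMO, the LSTM emits the whole forecast horizon in one shot, avoiding the error accumulation of recursive one-step-ahead forecasting; the paper ties `input_size` to the forecast horizon and the seasonal period.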
Experiments on the CIF2016 and NN5 competition datasets compare the proposed clustered‑LSTM approach against baseline global LSTM, ETS, ARIMA, and Theta models. Evaluation metrics include mean sMAPE, median sMAPE, mean MASE, and ranking measures.
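The two headline metrics can be written out directly; these follow the standard definitions of sMAPE (in percent) and MASE (forecast MAE scaled by the in-sample seasonal-naive MAE), which the paper's exact variants are assumed to match:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of 2|f - a| / (|a| + |f|)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(2.0 * np.abs(f - a) / (np.abs(a) + np.abs(f)))

def mase(actual, forecast, insample, m=1):
    """MASE: forecast MAE divided by the in-sample MAE of the
    seasonal-naive method with period m (m=1 gives the naive walk)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    ins = np.asarray(insample, float)
    scale = np.mean(np.abs(ins[m:] - ins[:-m]))
    return np.mean(np.abs(a - f)) / scale
```

Mean and median sMAPE are then aggregated across all series in a dataset; MASE below 1 means the model beats the seasonal-naive benchmark on average.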
Results show that clustering consistently improves accuracy over the baseline global LSTM, achieving top‑ranked performance on CIF2016 and competitive rankings on NN5. The study also highlights the trade‑off between the number of clusters and information loss, and demonstrates reduced training time due to model parallelism.
In conclusion, global LSTM models combined with time‑series clustering provide a powerful solution for large‑scale forecasting, effectively leveraging cross‑series similarity while mitigating the negative impact of heterogeneity.