
Mastering Time Series Forecasting: From Moving Averages to Transformers

Time series forecasting is essential across weather, finance, and commerce. Time-series work spans classification, clustering, anomaly detection, and above all prediction. This article covers problem definitions, evaluation metrics, traditional methods, machine-learning approaches, deep-learning models such as TFT, and emerging AutoML tools, with practical insights and best practices throughout.

GuanYuan Data Tech Team

Time Series Problem Definition and Classification

Time series are data points ordered by time. Common tasks include classification, clustering, anomaly detection, and especially prediction. This article focuses on forecasting.

Forecasting is widely used in weather, traffic, finance, sales, medicine, and system load. A McKinsey study of AI use cases ranks time-series data as the second most valuable data type.

Google’s internal time‑series forecasting use cases illustrate the breadth of applications.

Evaluating Time Series Forecasts

Regression metrics such as MAE and MSE are sensitive to the scale of the target, so a series with large values dominates any aggregate. Scale-independent metrics such as SMAPE and WMAPE express error relative to the actual values (SMAPE is bounded between 0% and 200%; WMAPE is the total absolute error divided by the total absolute actuals), which makes forecasts comparable across series of different magnitudes.
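As a concrete illustration, here is a minimal NumPy sketch of both metrics (exact denominator conventions vary slightly between implementations):

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE, in percent: mean of |F - A| / ((|A| + |F|) / 2).
    Bounded between 0% and 200%."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return np.mean(np.abs(forecast - actual) / denom) * 100

def wmape(actual, forecast):
    """Weighted MAPE, in percent: total absolute error divided by total
    absolute actuals, so larger series carry proportionally more weight."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.sum(np.abs(forecast - actual)) / np.sum(np.abs(actual)) * 100

y_true, y_pred = [100, 200, 300], [110, 190, 330]
print(round(smape(y_true, y_pred), 2), round(wmape(y_true, y_pred), 2))
# → 8.06 8.33
```

WMAPE is often the more business-friendly of the two, because it weights each point by its actual magnitude rather than treating every observation equally.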

Time Series Forecasting Validation

Cross‑validation for time series must respect temporal order: split the data chronologically (e.g., train on Jan‑Jun, validate on Jul; then train on Feb‑Jul, validate on Aug, etc.). This mirrors real‑world deployment.
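A minimal sketch of such a chronological split in plain Python (`sliding_splits` is a hypothetical helper written for this example, not a library function):

```python
def sliding_splits(n_samples, train_size, test_size):
    """Yield (train, test) index lists in chronological order: a fixed-length
    training window followed by a validation window, slid forward each fold."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size

# 12 monthly points (Jan..Dec), 6-month training window, 1-month validation
for train, test in sliding_splits(12, 6, 1):
    print(train[0], train[-1], "->", test)  # first fold: 0 5 -> [6] (Jan-Jun -> Jul)
```

scikit-learn's `TimeSeriesSplit` implements the expanding-window variant of the same idea, where the training set grows with each fold instead of sliding.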

Traditional Time Series Methods

Moving Average (MA) is a strong baseline: the simple MA averages the last n observations, and weighted and exponential moving averages extend this idea. In pandas, rolling and ewm implement them; SQL window functions can also be used.
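A short pandas sketch of both averages (the values are made up for illustration):

```python
import pandas as pd

s = pd.Series([10, 12, 13, 12, 15, 16, 18, 17],
              index=pd.date_range("2023-01-01", periods=8, freq="D"))

sma = s.rolling(window=3).mean()          # simple moving average (last 3 days)
ema = s.ewm(span=3, adjust=False).mean()  # exponential moving average

# Naive one-step forecast: next value = latest moving average
print(sma.iloc[-1])  # mean of [16, 18, 17] = 17.0
```

The equivalent SQL uses a window clause such as `AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)`.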

ARIMA combines autoregressive (AR) and moving-average (MA) components. Auto-ARIMA tools (e.g., pmdarima) automate order selection. Variants include SARIMA, ARIMAX, ARCH, and GARCH. These models require fitting each series individually, which can be costly at scale.
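To make the AR idea concrete, here is a plain-NumPy toy that fits an AR(p) model by ordinary least squares (`fit_ar` is a hypothetical helper for illustration, not a substitute for pmdarima's full ARIMA machinery):

```python
import numpy as np

def fit_ar(y, p):
    """Fit y_t = c + a1*y_{t-1} + ... + ap*y_{t-p} by least squares;
    returns the coefficient vector [c, a1, ..., ap]."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Row t holds [1, y[t-1], ..., y[t-p]] for t = p .. n-1
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - k : n - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta

y = [1, 2, 4, 8, 16, 32]              # exactly y_t = 2 * y_{t-1}
beta = fit_ar(y, p=1)
next_val = beta[0] + beta[1] * y[-1]  # one-step-ahead forecast
print(round(next_val))                # → 64
```

Real ARIMA adds the differencing (I) and moving-average (MA) terms on top of this, plus the order-selection search that Auto-ARIMA automates.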

Prophet (Facebook) uses an additive model (trend + seasonality + holidays) and provides probabilistic forecasts. It is user‑friendly but still needs per‑series fitting.

Machine Learning Methods

Most winning Kaggle forecasting solutions rely on gradient‑boosted trees (e.g., LightGBM, XGBoost). The workflow transforms a time series into a tabular regression problem using a sliding window: historical values become features (lag features) and the target is the future value.
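A pandas sketch of the sliding-window transform (toy data; the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"sales": [20, 22, 21, 25, 27, 30, 29, 33]},
                  index=pd.date_range("2023-01-02", periods=8, freq="W"))

# Lagged values become features; the next period's value becomes the target.
n_lags = 3
for k in range(1, n_lags + 1):
    df[f"lag_{k}"] = df["sales"].shift(k)
df["target"] = df["sales"].shift(-1)

supervised = df.dropna()  # keep rows with a full history window and a target
print(supervised.shape)   # (4, 5): any tabular regressor can now fit this
```

Once in this shape, the problem is ordinary tabular regression, and LightGBM or XGBoost can be applied directly.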

Key parameters include historical window size, prediction horizon (gap), and prediction window length. Feature engineering distinguishes static categorical features (e.g., product ID) from dynamic features (lag values, date-derived attributes). Advanced tools like tsfresh automate feature extraction.
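Date-derived dynamic features are easy to generate with pandas' `.dt` accessor; a minimal sketch:

```python
import pandas as pd

dates = pd.Series(pd.date_range("2023-12-29", periods=5, freq="D"))
features = pd.DataFrame({
    "dayofweek": dates.dt.dayofweek,            # 0 = Monday
    "month": dates.dt.month,
    "is_weekend": dates.dt.dayofweek >= 5,
    "is_month_start": dates.dt.is_month_start,
})
print(features.iloc[3].tolist())  # 2024-01-01: [0, 1, False, True]
```

Holiday flags and promotion calendars extend the same pattern; tsfresh goes further by extracting hundreds of statistical features from the raw series automatically.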

Deep Learning Methods

RNN/GRU/LSTM models can predict multiple steps but often require careful tuning. Seq2Seq architectures with attention were used in the Web Traffic Forecasting competition. WaveNet applies dilated causal convolutions but performed worse than RNNs in our tests.

DeepAR (Amazon) outputs probabilistic forecasts but is less stable than tree models in practice. Temporal Fusion Transformers (TFT) combine attention with variable-selection networks and can match or exceed GBDT performance, though training cost remains high. The dataset definition below, using the pytorch-forecasting library, shows the inputs such a model consumes:

<code>training = TimeSeriesDataSet(
    data[lambda x: x.date <= training_cutoff],
    time_idx=...,  # column name of time of observation
    target=...,    # column name of target to predict
    group_ids=[...],  # column name(s) for timeseries IDs
    max_encoder_length=max_encoder_length,  # how much history to use
    max_prediction_length=max_prediction_length,  # how far to predict into future
    static_categoricals=[...],
    static_reals=[...],
    time_varying_known_categoricals=[...],
    time_varying_known_reals=[...],
    time_varying_unknown_categoricals=[...],
    time_varying_unknown_reals=[...],
)
</code>

AutoML for Time Series

Libraries such as Auto_TS and AutoTS automate model selection, hyper-parameter tuning, and validation for time-series tasks. However, the most effective pipelines still rely on strong feature engineering (e.g., via tsfresh) combined with GBDT models.

Future directions include better handling of concept drift, prior shift, and covariate shift, as well as research on data augmentation and pre‑training for time‑series data.

Tags: GBDT, Machine Learning, deep learning, time series forecasting, metrics, AutoML, Prophet