
TimeHF: A Billion‑Scale Time Series Forecasting Model Guided by Human Feedback

The JD Supply Chain algorithm team introduces TimeHF, a billion‑parameter time‑series large model that leverages RLHF to boost demand‑forecast accuracy by over 10%, detailing dataset construction, the PCTLM architecture, a custom RLHF framework (TPO), and extensive SOTA experimental results.

JD Tech

JD's supply‑chain algorithm team has unveiled TimeHF, the industry's first self‑developed billion‑parameter pure time‑series model trained with reinforcement learning from human feedback (RLHF), improving sales‑forecast accuracy by more than 10% and reducing demand uncertainty.

Traditional forecasting methods such as ARIMA, Prophet, LSTM, and TCN struggle with complex patterns and zero‑shot generalization, while existing time‑series LLM adaptations (e.g., GPT4TS, TimesFM) lack breakthrough performance due to low‑quality public datasets and unsuitable RLHF pipelines.

To address these issues, the team built a high‑quality dataset of 15 billion samples by mixing JD internal sales data, public datasets (Monash, TSLib), and synthetic data, applying rigorous cleaning, deduplication, diversity ranking, and data‑mixing ratio tuning.
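The deduplication step can be illustrated with a minimal sketch. The function below is hypothetical (the paper does not publish its cleaning code): it normalizes each series so that re-scaled copies collapse to the same fingerprint, then hashes a rounded copy to drop duplicates.

```python
import hashlib
import numpy as np

def dedup_series(series_list, n_digits=4):
    """Drop duplicate series by hashing a rounded, normalized copy.

    Normalizing before hashing lets near-identical series (e.g. the
    same series under a different scale) collapse to one entry.
    """
    seen, kept = set(), []
    for s in series_list:
        arr = np.asarray(s, dtype=float)
        std = arr.std()
        norm = (arr - arr.mean()) / std if std > 0 else arr - arr.mean()
        key = hashlib.sha1(np.round(norm, n_digits).tobytes()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept
```

Here `[2, 4, 6, 8]` is a scaled copy of `[1, 2, 3, 4]`, so only one of the pair survives; a genuinely different series is kept.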

The proposed PCTLM (Patch Convolutional Timeseries Large Model) partitions the input series into overlapping patches, projects them through a convolutional encoder, and applies grouped‑query attention (GQA) with rotary position embedding (RoPE) to capture cross‑patch dependencies.
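The patching step can be sketched as follows. This is an illustrative toy, not the paper's implementation: `patch_len`, `stride`, and the single-matrix "conv" projection (a convolution whose kernel spans the whole patch) are assumptions for clarity.

```python
import numpy as np

def make_patches(series, patch_len=16, stride=8):
    """Split a 1-D series into overlapping patches (stride < patch_len).

    Returns an array of shape (num_patches, patch_len); each patch is
    then projected to the model dimension by the patch encoder.
    """
    x = np.asarray(series, dtype=float)
    num_patches = (len(x) - patch_len) // stride + 1
    idx = stride * np.arange(num_patches)[:, None] + np.arange(patch_len)
    return x[idx]

def conv_project(patches, weights):
    """Toy stand-in for the convolutional patch encoder: one linear map
    per patch, i.e. the simplest conv with a patch-wide kernel.

    weights: (d_model, patch_len) -> output (num_patches, d_model).
    """
    return patches @ weights.T
```

With `stride < patch_len` the patches overlap, so local patterns near patch boundaries are seen by two adjacent tokens before attention mixes them globally.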

Because conventional RLHF frameworks (PPO, RLOO) are ill‑suited to continuous‑output time‑series models, the team designed TPO (Timeseries Policy Optimization), an RLHF scheme that adds preference‑based pairwise data, a probabilistic output head modeling each prediction as N(μ, 1), and an advantage function computed against baseline rewards, while retaining an MSE term to preserve forecasting performance.
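The ingredients above can be combined into a loss sketch. This is an assumed form, not the paper's exact objective: it scales the Gaussian log‑likelihood of the target trajectory by a baseline‑relative advantage, then adds the MSE anchor described in the text.

```python
import numpy as np

def gaussian_logpdf(y, mu):
    # log N(y; mu, 1) — predictions are modeled as N(mu, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2

def tpo_loss(mu, target, reward, baseline_reward, mse_weight=1.0):
    """Sketch of a TPO-style objective (illustrative, not the paper's).

    advantage = reward - baseline_reward; the policy term weights the
    Gaussian log-likelihood by the advantage, and an MSE term anchors
    point-forecast quality so RLHF does not degrade accuracy.
    """
    advantage = reward - baseline_reward
    policy_term = -advantage * gaussian_logpdf(target, mu).mean()
    mse_term = mse_weight * np.mean((mu - target) ** 2)
    return policy_term + mse_term
```

A trajectory whose reward beats the baseline gets a positive advantage, so minimizing the loss raises its likelihood; a below‑baseline trajectory is pushed down, mirroring how preference pairs steer the policy.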

Extensive experiments on public benchmarks show that the SFT‑plus‑TPO‑trained PCTLM outperforms GPT4TS and five strong baselines (PatchTST, Autoformer, iTransformer, DLinear, Informer) in MAE, achieving state‑of‑the‑art results across most datasets.

Deployed on JD’s supply‑chain system for 20,000 SKUs, the model delivers automated replenishment with markedly higher prediction accuracy than the previous online solution.

For full technical details, see the paper “TimeHF: Billion‑Scale Time Series Models Guided by Human Feedback” (https://arxiv.org/abs/2501.15942).

Big Data · deep learning · Large Language Models · supply chain · time series forecasting · RLHF
Written by JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
