
A Billion-Scale Pure Time Series Large Model: PCTLM with SFT and TPO for Forecasting

This article presents a pioneering billion‑parameter pure time‑series large model (PCTLM) trained on a 1.5‑billion‑sample dataset, introduces a novel RLHF framework (TPO) for time‑series forecasting, and demonstrates state‑of‑the‑art performance across multiple public benchmarks, surpassing existing models such as GPT4TS.

JD Tech Talk

1. Introduction

Time series forecasting is a core technology for supply chain management, energy scheduling, and financial risk control. Traditional methods (ARIMA, Prophet) and early deep learning models (LSTM, TCN) struggle to capture complex patterns and generalize zero‑shot; progress is further limited by low‑quality training datasets and the lack of effective RLHF schemes for large time‑series models.

Recent adaptations of large language models (LLM) to time‑series prediction (e.g., GPT4TS, TimesFM) have not achieved breakthrough results due to dataset quality issues and the absence of suitable RLHF frameworks.

2. Technical Solution

The supply‑chain algorithm team built the industry’s first billion‑scale pure time‑series model, achieving SOTA on several public datasets.

Data: a 1.5‑billion‑sample high‑quality dataset constructed via time‑slice, data‑ratio, and synthetic data generation paradigms.

Model: the universal PCTLM model patches time‑series data, enhances cross‑patch information capture, and incorporates grouped attention with temporal position encoding.

Domain optimization: a novel RLHF scheme for time‑series models, implemented as the TPO reinforcement‑learning framework.

2.1 Dataset Construction

Existing public time‑series datasets are limited in size and diversity. To address this, we combined JD self‑operated sales data, public datasets, and synthetic data, applying quality filtering, deduplication, diversity sorting, and data‑ratio strategies, resulting in approximately 1.5 billion samples—the largest in the field.

Base Data

JD dataset: multi‑category sales data from the past three years, covering about 1.2 billion samples.

Public datasets: Monash and TSLib collections, expanded via random time‑point slicing, totaling ~20 million samples.

Synthetic dataset: generated from simple model predictions and custom trend, seasonality, and noise components, adding ~400 million samples.
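The trend + seasonality + noise recipe for synthetic data can be sketched as follows. This is a minimal illustration, not the paper's actual generators: the component ranges, the weekly/monthly/yearly period choices, and the `synthesize_series` helper are all assumptions.

```python
import numpy as np

def synthesize_series(length=365, seed=0):
    """Generate one synthetic sales-like series from trend, seasonality,
    and noise components (illustrative ranges, not the paper's)."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    trend = rng.uniform(-0.05, 0.05) * t                      # mild linear trend
    period = rng.choice([7, 30, 365])                         # weekly/monthly/yearly cycle
    seasonality = rng.uniform(1.0, 10.0) * np.sin(2 * np.pi * t / period)
    noise = rng.normal(0.0, rng.uniform(0.1, 1.0), size=length)
    base = rng.uniform(10.0, 100.0)                           # base sales level
    return np.clip(base + trend + seasonality + noise, 0.0, None)  # sales are non-negative

series = synthesize_series()                                  # one year of daily values
```

Varying the sampled trend slope, period, and noise scale per series is one straightforward way to grow a diverse synthetic pool.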

Data Cleaning

Labeling: each sample is annotated with length, average sales, zero‑sale ratio, etc.

Quality filtering: remove short or highly noisy series.

Deduplication: cluster within random groups and retain top‑N samples per cluster.

Diversity sorting: reorder batches to maximize feature variety.

Data ratio: synthetic 20%, public 4%, JD 76%, with further adjustment of per‑dimension ratios.
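A minimal sketch of how the 76/4/20 source mix could be drawn per training batch, assuming three pre‑cleaned sample pools. The `sample_training_batch` helper and the batch size are hypothetical; the article does not describe the actual sampling machinery.

```python
import numpy as np

def sample_training_batch(jd, public, synthetic, batch_size=1024, seed=0):
    """Draw one batch honoring the source ratios stated in the article:
    76% JD sales, 4% public, 20% synthetic (sampling details assumed)."""
    rng = np.random.default_rng(seed)
    ratios = {"jd": 0.76, "public": 0.04, "synthetic": 0.20}
    pools = {"jd": jd, "public": public, "synthetic": synthetic}
    batch = []
    for name, pool in pools.items():
        n = int(round(batch_size * ratios[name]))   # per-source quota
        idx = rng.choice(len(pool), size=n, replace=True)
        batch.extend(pool[i] for i in idx)
    rng.shuffle(batch)                              # mix sources within the batch
    return batch
```

With the default size of 1024, the quotas round to 778 JD, 41 public, and 205 synthetic samples.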

2.2 Model Design – PCTLM

PCTLM (Patch Convolutional Timeseries Large Model) uses overlapping patches and a masked‑encoder architecture. Convolutional layers capture cross‑patch information, and a grouped attention mechanism with rotary position embedding (RoPE) reduces computational cost.
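The overlapping‑patch idea can be illustrated with a small NumPy sketch: a strided sliding window cuts the series into patches, and a stride smaller than the patch length makes adjacent patches share values, which is one way cross‑patch information is exposed. The patch size, stride, embedding width, and the fixed random projection (standing in for PCTLM's learned layers) are all illustrative assumptions.

```python
import numpy as np

def patch_embed(x, patch_len=16, stride=8, d_model=256, seed=0):
    """Overlapping patch embedding: each length-`patch_len` window is
    linearly projected to `d_model` dimensions. stride < patch_len means
    neighboring patches overlap by (patch_len - stride) points."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, patch_len ** -0.5, (patch_len, d_model))   # stand-in projection
    n_patches = (len(x) - patch_len) // stride + 1
    patches = np.stack([x[i * stride : i * stride + patch_len]
                        for i in range(n_patches)])                # (n_patches, patch_len)
    return patches @ W                                             # (n_patches, d_model)

tokens = patch_embed(np.random.default_rng(1).normal(size=512))
# a 512-point series yields (512 - 16) // 8 + 1 = 63 patch tokens
```

In the real model this strided windowing plus projection is what a strided convolution computes, followed by grouped attention over the resulting patch tokens.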

2.3 Training Scheme – RLHF

Standard RL methods for text LLMs cannot be directly applied to pure time‑series models because outputs are continuous values and loss functions differ. We propose TPO (Timeseries Policy Optimization), a reinforcement‑learning framework tailored for time‑series models.

Input: augment original time‑series with good/bad prediction pairs to guide fine‑tuning toward better forecasts.

Probability‑output component: model predictions are treated as samples from N(μ, 1) to enable probability computation and KL‑divergence calculation.

Advantage function: based on REINFORCE‑style baseline reward differences, encouraging predictions closer to the preferred (good) outcomes.

Time‑series loss: combine MSE with RL loss to preserve forecasting accuracy while preventing over‑fitting during fine‑tuning.
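Putting the four pieces above together, a toy version of the TPO objective might look like the following, under the article's stated assumptions: predictions are read as means of N(μ, 1), a REINFORCE‑style advantage compares the preferred ("good") trajectory against the "bad" baseline, and an MSE term anchors the model to the ground truth. The mixing weight `alpha` and the exact reward shaping are hypothetical choices, not the paper's.

```python
import numpy as np

def tpo_loss(pred, good, bad, target, alpha=0.5):
    """Toy TPO objective: RL preference term + MSE anchor.
    pred/good/bad/target: arrays of shape (batch, horizon)."""
    # Gaussian log-likelihood of each trajectory under N(pred, 1), up to a constant
    logp_good = -0.5 * np.sum((good - pred) ** 2, axis=-1)
    logp_bad = -0.5 * np.sum((bad - pred) ** 2, axis=-1)
    # REINFORCE-style advantage with the bad trajectory's reward as baseline
    advantage = logp_good - logp_bad
    rl_loss = -np.mean(advantage * logp_good)       # push toward the preferred outcome
    mse = np.mean((pred - target) ** 2)             # standard forecasting loss
    return alpha * rl_loss + (1 - alpha) * mse
```

When the prediction sits exactly between the good and bad trajectories, the advantage is zero and only the MSE term contributes, which is the anchoring behavior the combined loss is meant to provide.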

3. Model Effectiveness

On public benchmarks, the SFT + TPO‑enhanced PCTLM outperforms GPT4TS and five leading full‑shot time‑series deep learning methods (PatchTST, Autoformer, iTransformer, DLinear, Informer), achieving the lowest MAE in most cases.

4. Conclusion

We introduce a complete training pipeline (PCTLM + SFT + TPO) for time‑series large models, delivering the first billion‑scale pure time‑series model with superior zero‑shot performance over GPT4TS and supervised baselines. The proposed RLHF scheme (TPO) outperforms existing RLHF frameworks such as PPO and RLOO. Deployed in JD's supply‑chain system, the model now serves 20,000 SKUs, significantly improving automatic replenishment accuracy.

For more details, see the paper: https://arxiv.org/abs/2501.15942


Tags: Big Data, deep learning, time series forecasting, large language model, RLHF, PCTLM, TPO
Written by JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.