
Time Series Forecasting: Tools, Models, and Lessons from Ctrip

This article outlines Ctrip's approach to time-series forecasting. It covers the background, common tools (factor-based models, traditional statistical methods such as ARIMA, and machine-learning techniques including tree models and neural networks), and shares practical lessons on data splitting, feature engineering, model stability, and evaluation.

Ctrip Technology

Background

With the rapid growth of big data, many domains—including natural sciences, social sciences, industrial engineering, and fintech—accumulate massive datasets, among which time‑series data (ordered by timestamps) play a crucial role. Predicting future states of such series supports cash‑flow forecasting in finance, revenue and inventory forecasting in retail, order and service volume forecasting in tourism, as well as weather and population density predictions.

Ctrip faces similar time-series forecasting problems, such as predicting order volume, call volume, and visitor flow. The following sections describe the methods applied and the insights gained.

Common Time‑Series Forecasting Tools

Time‑series forecasting methods can be grouped into three categories:

Factor‑based models derived from business‑domain understanding.

Traditional statistical models such as mean regression, ARIMA, and exponential smoothing (e.g., Holt‑Winters).

Machine‑learning models, including tree‑based algorithms and neural networks.

1. Factor‑Based Models

These models excel when historical data are limited or when strong business interpretability is required. They serve as baselines and help calibrate black‑box outputs. Example cases include:

Visitor‑flow prediction: Decompose the forecast into a baseline volume, a weekly pattern factor, and an event impact factor, adjusting for known events.

New international business line: Identify key business‑related factors via feature importance from tree models, then forecast those factors with statistical methods.
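As an illustration, the visitor-flow decomposition above can be sketched in a few lines. All factor values and names below are hypothetical placeholders, not Ctrip's actual parameters:

```python
# A minimal sketch of a factor-based forecast:
# predicted volume = baseline * weekly-pattern factor * event factor.
# The factor values are illustrative only.

WEEKLY_FACTOR = {0: 0.9, 1: 0.85, 2: 0.85, 3: 0.9, 4: 1.0, 5: 1.3, 6: 1.2}  # Mon..Sun

def factor_forecast(baseline, weekday, event_factor=1.0):
    """Combine a baseline volume with a weekly pattern and a known-event adjustment."""
    return baseline * WEEKLY_FACTOR[weekday] * event_factor

# Saturday with a promotion expected to lift traffic by 20%: ~15600
print(factor_forecast(10_000, weekday=5, event_factor=1.2))
```

Because every term has a business meaning, such a model is easy to explain and to recalibrate when frontline teams report a new event.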

2. Traditional Statistical Models

Common techniques include mean regression, ARIMA, and exponential smoothing. They are low‑complexity and fast but may underperform when external influences (marketing campaigns, natural disasters) affect the series. Nevertheless, they remain valuable as baselines, for anomaly detection, as components of ensemble models, and for providing reasonable prediction ranges to stabilize black‑box outputs.
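For intuition, simple exponential smoothing, the building block of the Holt-Winters family mentioned above, fits in a few lines (a minimal sketch; in practice one would use a library such as statsmodels):

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: the level is a weighted blend of the
    newest observation and the previous level; recent data dominate.
    Returns the final level, which serves as the one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

Its low complexity is exactly why such models make good baselines and sanity checks for black-box outputs.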

3. Machine‑Learning Models

Time‑series forecasting can be framed as a regression problem. Two major families are used:
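This regression framing can be sketched as follows: each window of recent observations becomes a feature vector, and the next observation becomes the target (the helper name is illustrative):

```python
def to_supervised(series, n_lags=3):
    """Frame a univariate series as a regression problem: the previous
    n_lags values are the features, the next value is the target."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])  # lag window as features
        y.append(series[i])             # next value as target
    return X, y

X, y = to_supervised([1, 2, 3, 4, 5], n_lags=3)
# X = [[1, 2, 3], [2, 3, 4]], y = [4, 5]
```

Once the series is in this shape, any regressor, tree-based or neural, can be trained on it.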

Tree Models

Algorithms such as XGBoost or LightGBM allow easy incorporation of categorical features (e.g., seasonality flags, holiday indicators). Effective feature engineering includes:

Discrete time features (year, month, day, hour, weekday, day‑of‑year, week‑of‑year).

Binary time flags (holiday, weekend, adjusted workday).

Sliding‑window aggregations (past X‑day mean, variance, max, quartiles, skewness).

Predictions from other statistical models (ARIMA, SARIMA, exponential smoothing) as features.

Cross‑line business data (e.g., pre‑sale metrics for after‑sale forecasts).
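A minimal sketch of the first three feature groups, using only the standard library (function names and the window length are illustrative):

```python
from datetime import date
import statistics

def calendar_features(d, holidays=frozenset()):
    """Discrete time features plus binary flags for a single date."""
    return {
        "year": d.year, "month": d.month, "day": d.day,
        "weekday": d.weekday(),
        "day_of_year": d.timetuple().tm_yday,
        "week_of_year": d.isocalendar()[1],
        "is_weekend": int(d.weekday() >= 5),
        "is_holiday": int(d in holidays),
    }

def window_features(history, window=7):
    """Sliding-window aggregations over the last `window` observations."""
    w = history[-window:]
    return {"mean": statistics.fmean(w), "std": statistics.pstdev(w), "max": max(w)}
```

Outputs from statistical models and cross-line signals would be joined onto the same feature table before training the tree model.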

Neural Networks

Models such as CNN, RNN, LSTM, and GRU are applied when abundant sequential data are available. LSTM, for instance, can integrate external variables (weather, holidays) and automatically extract temporal patterns. Multi-task learning—predicting multiple time slots or related series jointly—has been shown to improve generalization and accuracy.

Practical Experiences and Reflections

Train‑test split: Preserve temporal order; test data must follow the training period to avoid leakage.
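A minimal sketch of such a split (the helper name is illustrative): the test set is always the most recent slice, never a random sample, so no future information leaks into training:

```python
def temporal_split(series, test_size):
    """Split a time series while preserving temporal order:
    the last `test_size` points form the test set."""
    return series[:-test_size], series[-test_size:]

train, test = temporal_split(list(range(10)), test_size=3)
# train = [0, 1, ..., 6], test = [7, 8, 9]
```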

Leveraging frontline expertise: Encode domain knowledge (e.g., known anomalies, event impacts) into cleaning rules or calibration knowledge bases.

Using future‑looking signals: Incorporate reservation or browsing data that reflect upcoming user intent.

Ensuring output stability: Build interpretable prediction ranges to calibrate black‑box outputs and reduce outliers.

Retraining frequency: Adopt sliding‑window or expanding‑window strategies to retrain models as new data arrive, balancing computational cost and accuracy.
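The two retraining strategies can be sketched as a single walk-forward generator (names and defaults are illustrative):

```python
def retraining_windows(n, initial=5, step=1, sliding=None):
    """Yield (train_range, test_index) pairs for walk-forward retraining.
    sliding=None gives an expanding window (train on all history so far);
    an integer caps the training window at that many recent points."""
    for end in range(initial, n, step):
        start = 0 if sliding is None else max(0, end - sliding)
        yield (start, end), end  # train on [start, end), predict index `end`

expanding = list(retraining_windows(8, initial=5))
sliding = list(retraining_windows(8, initial=5, sliding=3))
```

A longer sliding window raises computational cost but captures more history; the trade-off depends on how quickly the business dynamics drift.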

Model evaluation: Assess models based on accuracy, maintainability, interpretability, and stability, tailored to each project’s requirements.

Trust in historical data: Older data may be down‑weighted as business dynamics evolve; recent data receive higher importance.
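One simple way to encode this down-weighting, assuming exponential decay (the decay rate is illustrative), is to pass sample weights to the learner:

```python
def recency_weights(n, decay=0.95):
    """Exponentially down-weight older observations: the most recent
    point gets weight 1.0, and each step back multiplies by `decay`.
    The result can be passed as sample weights when fitting a model."""
    return [decay ** (n - 1 - i) for i in range(n)]

w = recency_weights(4, decay=0.5)
# oldest -> newest: [0.125, 0.25, 0.5, 1.0]
```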

Tags: machine learning, data analysis, forecasting, time series, ARIMA, Ctrip
Written by Ctrip Technology

Official Ctrip Technology account, sharing and discussing growth.