Time Series Forecasting: Tools, Models, and Lessons from Ctrip
This article outlines Ctrip's approach to time-series forecasting. It surveys the background, common tools (factor-based models; traditional statistical methods such as ARIMA; and machine-learning techniques, including tree models and neural networks), and shares practical lessons on data splitting, feature engineering, model stability, and evaluation.
Background
With the rapid growth of big data, many domains—including natural sciences, social sciences, industrial engineering, and fintech—accumulate massive datasets, among which time‑series data (ordered by timestamps) play a crucial role. Predicting future states of such series supports cash‑flow forecasting in finance, revenue and inventory forecasting in retail, order and service volume forecasting in tourism, as well as weather and population density predictions.
Ctrip also faces time‑series forecasting challenges such as order volume, call volume, and visitor flow predictions, and the following sections describe the methods and insights applied.
Common Time‑Series Forecasting Tools
Time‑series forecasting methods can be grouped into three categories:
Factor‑based models derived from business‑domain understanding.
Traditional statistical models such as mean regression, ARIMA, and exponential smoothing (e.g., Holt‑Winters).
Machine‑learning models, including tree‑based algorithms and neural networks.
1. Factor‑Based Models
These models excel when historical data are limited or when strong business interpretability is required. They serve as baselines and help calibrate black‑box outputs. Example cases include:
Visitor‑flow prediction: Decompose the forecast into a baseline volume, a weekly pattern factor, and an event impact factor, adjusting for known events.
New international business line: Identify key business‑related factors via feature importance from tree models, then forecast those factors with statistical methods.
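The visitor-flow decomposition above can be sketched as a product of a baseline, a weekly pattern factor, and an event impact factor. The factor values below are illustrative assumptions; in practice they would be estimated from historical data and business knowledge.

```python
# A minimal sketch of a factor-based visitor-flow forecast.
# All factor values here are hypothetical, for illustration only.

def forecast_visitors(baseline, weekday_factor, event_factor=1.0):
    """Decompose the forecast: baseline * weekly pattern * event impact."""
    return baseline * weekday_factor * event_factor

# Hypothetical weekly pattern (Mon=0 .. Sun=6): weekends run higher.
WEEKDAY_FACTORS = {0: 0.9, 1: 0.85, 2: 0.9, 3: 0.95, 4: 1.05, 5: 1.2, 6: 1.15}

# A Saturday with a promotional event assumed to lift traffic by 30%.
saturday = forecast_visitors(10_000, WEEKDAY_FACTORS[5], event_factor=1.3)
```

Because every factor is visible and business-meaningful, such a model is easy to explain and to adjust when a new event is announced.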
2. Traditional Statistical Models
Common techniques include mean regression, ARIMA, and exponential smoothing. They are low‑complexity and fast but may underperform when external influences (marketing campaigns, natural disasters) affect the series. Nevertheless, they remain valuable as baselines, for anomaly detection, as components of ensemble models, and for providing reasonable prediction ranges to stabilize black‑box outputs.
3. Machine‑Learning Models
Time‑series forecasting can be framed as a regression problem. Two major families are used:
Tree Models
Algorithms such as XGBoost or LightGBM allow easy incorporation of categorical features (e.g., seasonality flags, holiday indicators). Effective feature engineering includes:
Discrete time features (year, month, day, hour, weekday, day‑of‑year, week‑of‑year).
Binary time flags (holiday, weekend, adjusted workday).
Sliding‑window aggregations (past X‑day mean, variance, max, quartiles, skewness).
Predictions from other statistical models (ARIMA, SARIMA, exponential smoothing) as features.
Cross‑line business data (e.g., pre‑sale metrics for after‑sale forecasts).
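The first three feature families above can be sketched with pandas; the column names, window length, and sample data are illustrative assumptions.

```python
# A minimal sketch of the feature engineering above, assuming a
# daily order-count series indexed by date. Names are illustrative.
import pandas as pd

dates = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"orders": [50, 52, 48, 60, 75, 80, 55, 53, 49, 62]},
                  index=dates)

# Discrete time features.
df["weekday"] = df.index.weekday
df["day_of_year"] = df.index.dayofyear

# Binary time flag: weekend indicator.
df["is_weekend"] = (df["weekday"] >= 5).astype(int)

# Sliding-window aggregations over the past 3 days.
# shift(1) ensures the window sees only strictly past values (no leakage).
window = df["orders"].shift(1).rolling(3)
df["mean_3d"] = window.mean()
df["max_3d"] = window.max()
```

The `shift(1)` before `rolling` matters: without it, the window at day t would include day t's own target value, leaking the answer into the features.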
Neural Networks
Models such as CNN, RNN, LSTM, and GRU are applied when abundant sequential data are available. LSTM, for instance, can integrate external variables (weather, holidays) and automatically extract temporal patterns. Multi-task learning, i.e. predicting multiple time slots or related series jointly, has been shown to improve generalization and accuracy.
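The LSTM setup described above can be sketched as follows, assuming PyTorch; the layer sizes and the choice of extra inputs (a holiday flag and a weather variable alongside the series itself) are illustrative.

```python
# A minimal sketch of an LSTM regressor for one-step-ahead forecasting.
# Hidden size, window length, and input features are hypothetical.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (batch, time, features), e.g. [volume, holiday flag, temperature]
        out, _ = self.lstm(x)
        # Predict the next value from the hidden state at the last time step.
        return self.head(out[:, -1, :])

model = LSTMForecaster()
pred = model(torch.randn(4, 14, 3))  # 4 series, 14-day windows, 3 features
```

Extending the `Linear` head to several outputs turns this into the multi-task variant mentioned above, predicting several future time slots jointly.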
Practical Experiences and Reflections
Train‑test split: Preserve temporal order; test data must follow the training period to avoid leakage.
Leveraging frontline expertise: Encode domain knowledge (e.g., known anomalies, event impacts) into cleaning rules or calibration knowledge bases.
Using future‑looking signals: Incorporate reservation or browsing data that reflect upcoming user intent.
Ensuring output stability: Build interpretable prediction ranges to calibrate black‑box outputs and reduce outliers.
Retraining frequency: Adopt sliding‑window or expanding‑window strategies to retrain models as new data arrive, balancing computational cost and accuracy.
Model evaluation: Assess models based on accuracy, maintainability, interpretability, and stability, tailored to each project’s requirements.
Trust in historical data: Older data may be down‑weighted as business dynamics evolve; recent data receive higher importance.
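Two of the points above, the temporal train-test split and the down-weighting of older data, can be sketched together; the half-life value is an illustrative assumption.

```python
# A minimal sketch of a leakage-free temporal split plus
# exponential recency weighting. The half-life is hypothetical.

def temporal_split(series, test_size):
    """Preserve temporal order: the last `test_size` points are the test set."""
    return series[:-test_size], series[-test_size:]

def recency_weights(n, half_life=30):
    """Sample weights that halve every `half_life` steps into the past;
    the newest point gets weight 1.0."""
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]

history = list(range(100))  # stand-in for 100 daily observations
train, test = temporal_split(history, test_size=14)
weights = recency_weights(len(train))
```

Weights like these can be passed to most tree-model training APIs as per-sample weights, so recent behaviour dominates the fit without discarding older data outright.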
Ctrip Technology
The official Ctrip Technology account: sharing, exchange, and growth.