Call Center Volume Forecasting and Staffing Optimization at Ctrip: From Data Cleaning to V2.0 Predictive System
This article describes Ctrip's call‑center staffing challenge and walks through data cleaning, trend analysis, feature engineering, the initial ARIMAX‑Fourier model (V1.0) and its limitations, and the improved V2.0 solution, which combines TBATS, ARIMA‑modeled residuals, and XGBoost to reach an average prediction accuracy of 89.5% in back‑testing.
The project addresses Ctrip's call‑center human‑resource scheduling by forecasting inbound call volume a week in advance, aiming to balance service quality with labor cost reduction.
Background: Call volume is influenced by industry‑specific external factors such as marketing strategies, economic cycles, seasons, weather, and holidays, so the predictive models must incorporate these variables.
Data Pipeline: Historical daily call counts are uploaded via a web portal, stored in MySQL, and synchronized to a Hive data warehouse for model input, replacing manual Excel‑based processes.
Trend Analysis: The data exhibits clear time‑series characteristics: strong seasonal patterns (yearly, monthly, weekly, hourly), a year‑over‑year decline as customers shift to app self‑service, dips during holidays, and spikes caused by extreme weather or flight disruptions.
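One simple way to inspect the weekly pattern mentioned above is to average call volume by day of the week. The sketch below is illustrative only: the function name `weekday_profile` and the toy numbers are assumptions, not taken from Ctrip's pipeline.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def weekday_profile(daily_calls):
    """Return the mean call volume per weekday (0 = Monday ... 6 = Sunday)."""
    buckets = defaultdict(list)
    for day, calls in daily_calls.items():
        buckets[day.weekday()].append(calls)
    return {wd: mean(vals) for wd, vals in sorted(buckets.items())}


# Toy data (made-up numbers): two weeks starting on Monday 2018-03-05,
# with quieter weekends.
start = date(2018, 3, 5)
toy = {
    start + timedelta(days=i): (500 if i % 7 >= 5 else 1000)
    for i in range(14)
}
profile = weekday_profile(toy)
```

A flat profile would suggest little weekly seasonality; a pronounced weekday/weekend gap suggests the model needs a weekly component.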
Feature Engineering: External factors (X) such as ticket orders, weather, seasonal peaks, and holidays are engineered alongside the target series (Y) to enrich the model.
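As a rough illustration of how such external factors might be turned into a feature row, consider the sketch below. The feature names, the one‑day order lag, and the single‑entry holiday calendar are all assumptions made for this example; the article does not list the exact features used.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar with a single illustrative entry
# (Qingming Festival 2018); a real calendar would cover every holiday.
HOLIDAYS = {date(2018, 4, 5)}


def build_features(day, orders_by_day, bad_weather_days, lag_days=1):
    """Build one feature row (X) for the given calendar day."""
    return {
        "weekday": day.weekday(),                  # 0 = Monday
        "month": day.month,                        # coarse seasonal signal
        "is_weekend": int(day.weekday() >= 5),
        "is_holiday": int(day in HOLIDAYS),
        "bad_weather": int(day in bad_weather_days),
        # Ticket orders placed lag_days earlier; None if unknown.
        "orders_lag": orders_by_day.get(day - timedelta(days=lag_days)),
    }


row = build_features(
    date(2018, 4, 5),
    orders_by_day={date(2018, 4, 4): 12000},  # made-up order count
    bad_weather_days=set(),
)
```

Rows like this, one per day, form the X matrix that sits alongside the call‑volume target Y.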
V1.0 Model: An ARIMAX model augmented with Fourier terms was deployed. It performed adequately on regular days but struggled with holidays and extreme weather, for two reasons. First, Fourier components have fixed periods, so they cannot capture lunar‑calendar holidays whose Gregorian dates shift from year to year. Second, external variables enter the model linearly, which limits its expressive power.
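To make the Fourier limitation concrete, here is a minimal sketch of how Fourier regressors are typically generated for a seasonal period. The function name and parameters are assumptions for illustration, not the V1.0 implementation.

```python
import math


def fourier_terms(t, period, order):
    """Return [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..order."""
    terms = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


# Weekly regressors with two harmonics for day index 3.
weekly = fourier_terms(3, period=7, order=2)

# The same regressors recur exactly one period later.
next_week = fourier_terms(10, period=7, order=2)
```

Because the terms repeat exactly every `period` steps, they suit effects that recur on a fixed schedule. A holiday such as Spring Festival falls on a different Gregorian date each year, so no fixed period lines up with it.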
V2.0 Model: To overcome the V1.0 shortcomings, the pipeline applies a Box‑Cox transformation, fits a TBATS model, models the TBATS residuals with ARIMA, and finally corrects the remaining residuals with an XGBoost regression model (gradient‑boosted regression trees) that uses the engineered external features. This hybrid approach captures both complex seasonality and non‑linear relationships.
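The sketch below shows the Box‑Cox transform, its inverse, and one plausible way the stage outputs could be recombined. It assumes the three stages are additive on the transformed scale; the article does not spell out how the stages are combined, and the stage forecasts here are made‑up placeholder numbers rather than outputs of real TBATS, ARIMA, or XGBoost models.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inv_boxcox(z, lam):
    """Inverse Box-Cox transform, back to the original scale."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transform")
    return base ** (1.0 / lam)


def combine_forecast(seasonal_fc, arima_resid_fc, tree_correction, lam):
    """Sum the stage outputs on the transformed scale, then back-transform.

    Assumption: stages are additive after the Box-Cox transform.
    """
    return [
        inv_boxcox(s + a + x, lam)
        for s, a, x in zip(seasonal_fc, arima_resid_fc, tree_correction)
    ]


# Placeholder stage forecasts for a single day (illustrative values only).
forecast = combine_forecast([18.0], [0.5], [-0.5], lam=0.5)
```

Fitting each later stage on the residuals of the previous one lets the seasonal model absorb the regular structure while the tree model handles what the external features explain.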
Evaluation: Back‑testing from 2018‑03‑05 to 2018‑07‑10 shows progressive improvements: STL 73.9%, ARIMAX+Fourier 78.9%, TBATS 79.6%, XGBoost 82.7%, and the combined XGB+TBATS 89.5% average prediction accuracy.
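The article does not define "prediction accuracy". A common convention, assumed here purely for illustration, is one minus the mean absolute percentage error (MAPE). The values in the example are made up and unrelated to the reported results.

```python
def prediction_accuracy(actual, predicted):
    """Return 1 - MAPE for paired actual and predicted values.

    Assumption: accuracy is defined as 1 - MAPE; the source does not
    state which metric was used.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 1.0 - sum(errors) / len(errors)


# Illustrative values only.
score = prediction_accuracy([100, 200], [90, 220])
```

Under this definition, each day contributes its relative error equally, so a miss on a low‑volume holiday weighs as much as a miss on a busy weekday.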
Conclusion: In this back‑test, tree‑based regression outperformed traditional time‑series methods once strong feature engineering was applied, and the hybrid model performed best. The broader lesson for time‑series forecasting in other industries is that incorporating business‑driven features and combining complementary models is key to high‑accuracy predictions.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.