Time Series Analysis and ARIMA Modeling Practice with Python
This article introduces time series fundamentals, classification, and challenges for internet businesses, then provides a step‑by‑step Python tutorial on ARIMA modeling—including data loading, stationarity testing, differencing, ACF/PACF analysis, AIC‑based order selection, model training, prediction, error evaluation, exogenous variable integration, and diagnostic checks.
Time series analysis is an important branch of statistics that studies patterns over time to forecast future values, applicable to stock prices, sales, rainfall, and other domains.
Time series can be classified by stationarity, indicator type, and time attribute, with period indicators being additive and point indicators non‑additive.
For internet companies, business volume forecasting faces challenges such as periodic effects, holidays, regional differences, inventory constraints, and external factors.
The article focuses on ARIMA modeling, introducing ARMA components (AR(p) and MA(q)) and how differencing transforms a non‑stationary series into a stationary one for ARMA application.
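As a quick illustration of the AR and MA components (a hedged sketch; the coefficients below are arbitrary, not from the article), statsmodels can simulate an ARMA process directly:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Lag polynomials include the lag-0 coefficient of 1; AR signs are negated
ar = np.array([1, -0.6])  # x_t = 0.6 * x_{t-1} + ...
ma = np.array([1, 0.4])   # ... + e_t + 0.4 * e_{t-1}

process = ArmaProcess(ar, ma)
print(process.isstationary)  # True: the AR root lies outside the unit circle

sample = process.generate_sample(nsample=200)
print(sample.shape)  # (200,)
```

Simulated draws like this are a convenient way to see what a stationary ARMA series looks like before fitting one to real data.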
Practical steps using Python:
Step 1 – Load data:
import pandas as pd

# Load the GBK-encoded CSV, index by date, and cast the count column to float
df = pd.read_csv('testdata.csv', encoding='gbk', index_col='ddate')
df.index = pd.to_datetime(df.index)
df['cnt'] = df['cnt'].astype(float)

Step 2 – Test stationarity with the Augmented Dickey–Fuller test:
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # The second element of the result is the p-value; p < 0.05 suggests stationarity
    dftest = adfuller(timeseries, autolag='AIC')
    return dftest[1]

Step 3 – Difference the series to achieve stationarity, then re-run the ADF test.
Step 4 – Plot ACF and PACF to get initial hints for p and q.
Step 5 – Determine optimal (p,q) by minimizing AIC over a grid:
from statsmodels.tsa.arima.model import ARIMA  # current statsmodels API

pmax = qmax = 8  # assumed search bounds, wide enough to cover the chosen orders
for p in range(1, pmax + 1):
    for q in range(1, qmax + 1):
        try:
            model = ARIMA(endog=df['cnt'], order=(p, 1, q))
            results = model.fit()  # the legacy fit(disp=-1) argument was removed in current statsmodels
            print('ARIMA p:{} q:{} - AIC:{}'.format(p, q, results.aic))
        except Exception:
            continue  # some orders fail to converge; skip them

The minimum AIC suggests p=7 and q=7 for the first-order differenced series.
Step 6 – Train the ARIMA(7,1,7) model, generate predictions, and compute error rate (≈8.58%).
Step 7 – Incorporate exogenous variables (e.g., holidays, week identifiers) to improve accuracy, reducing error to about 1.77%.
Step 8 – Model diagnostics using residual QQ‑plot and Durbin‑Watson test (value ≈1.99) confirm normality and lack of autocorrelation.
Conclusion: Proper data preprocessing, model selection, and inclusion of relevant external factors are crucial for reliable time‑series forecasting.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.