
Time Series Analysis with Python: Complete ARIMA Modeling Workflow

This tutorial walks through the full Python-based ARIMA modeling process for time‑series analysis, covering data loading, stationarity and white‑noise tests, model order selection, parameter estimation, diagnostic checks, and future forecasting with detailed code examples.

Python Programming Learning Circle

This article presents a step-by-step Python workflow for building and evaluating an ARIMA model: plotting the series, testing for stationarity with a unit-root test, testing for white noise, selecting the model order, estimating parameters, running diagnostics, and forecasting.

Time Series Analysis Concept

Time series analysis is the branch of statistics that studies sequences of data points collected over time, such as stock indices, price indices, GDP, and sales volumes.

Basic Modeling Steps

Key steps include data import, visualizing the series, testing for stationarity, selecting model order, fitting the model, evaluating parameters, and making predictions.

Import Modules

<code>import warnings
warnings.filterwarnings("ignore")  # silences all warnings, including convergence warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# (the line above is a Jupyter magic; remove it when running as a plain script)
import statsmodels.api as sm
from arch.unitroot import ADF
from statsmodels.graphics.api import qqplot
from statsmodels.stats.diagnostic import acorr_ljungbox

plt.style.use('ggplot')
pd.set_option('display.float_format', lambda x: '%.5f' % x)
np.set_printoptions(precision=5, suppress=True)

# Use a font that can render Chinese characters in plot labels
plt.rcParams['font.family'] = ['sans-serif']
plt.rcParams['font.sans-serif'] = ['SimHei']
</code>

Load Data

<code># "年份" ("year") is the name of the date column in the spreadsheet
data = pd.read_excel("data.xlsx", index_col="年份", parse_dates=True)
data.head()
</code>

Stationarity Test

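First- and second-order differences are computed and plotted alongside the original series:

<code># Assign the differences directly: pandas realigns on the index,
# so a trailing dropna() would have no effect here
data["diff1"] = data["xt"].diff(1)
data["diff2"] = data["diff1"].diff(1)
data1 = data.loc[:, ["xt", "diff1", "diff2"]]
data1.plot(subplots=True, figsize=(18, 12), title="Differenced series")
</code>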
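
To make the effect of differencing concrete, here is a minimal sketch on a made-up series (the values are illustrative and are not the tutorial's data). It shows that each difference introduces one leading NaN and that a quadratic trend becomes constant after two differences:

```python
import pandas as pd

# Hypothetical series with a quadratic trend: 0, 1, 4, 9, 16, 25
xt = pd.Series([float(t ** 2) for t in range(6)])

diff1 = xt.diff(1)      # first difference: NaN, 1, 3, 5, 7, 9
diff2 = diff1.diff(1)   # second difference: NaN, NaN, 2, 2, 2, 2

print(pd.DataFrame({"xt": xt, "diff1": diff1, "diff2": diff2}))
```

The leading NaN values are why later tests call dropna() on the differenced column before using it.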

The Augmented Dickey‑Fuller test is then performed:

<code>print("Unit root test:\n")
# Null hypothesis: the series has a unit root (is non-stationary)
print(ADF(data["diff1"].dropna()))
</code>

The test statistic is -3.156 with a p-value of 0.023. Since the p-value is below 0.05, the unit-root null hypothesis is rejected at the 5% level, indicating that the first-difference series is stationary.

White‑Noise Test

<code>from statsmodels.stats.diagnostic import acorr_ljungbox
# Ljung-Box (and, with boxpierce=True, Box-Pierce) statistics for lags 1 to 11
acorr_ljungbox(data["diff1"].dropna(), lags=list(range(1, 12)), boxpierce=True)
</code>

All p-values are below the significance level, so the null hypothesis of no autocorrelation is rejected: the differenced series is not white noise and contains structure worth modeling.

Model Order Selection

Autocorrelation and partial autocorrelation plots suggest candidate models ARIMA(1,1,0), ARIMA(1,1,1), and ARIMA(0,1,1).

Model Optimization

<code># Pass the order by keyword: in recent statsmodels the second positional argument is exog
arma_mod20 = sm.tsa.ARIMA(data["xt"], order=(1, 1, 0)).fit()
arma_mod30 = sm.tsa.ARIMA(data["xt"], order=(0, 1, 1)).fit()
arma_mod40 = sm.tsa.ARIMA(data["xt"], order=(1, 1, 1)).fit()
# Collect the information criteria; lower values are better
values = [[arma_mod20.aic, arma_mod20.bic, arma_mod20.hqic],
          [arma_mod30.aic, arma_mod30.bic, arma_mod30.hqic],
          [arma_mod40.aic, arma_mod40.bic, arma_mod40.hqic]]
df = pd.DataFrame(values, index=["ARIMA(1,1,0)", "ARIMA(0,1,1)", "ARIMA(1,1,1)"],
                  columns=["AIC", "BIC", "HQIC"])
print(df)
</code>

The AIC/BIC comparison favors the ARIMA(0,1,1) model, which has the lowest values among the three candidates.

Parameter Estimation

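<code># statsmodels.tsa.arima_model was removed in statsmodels 0.13; use its replacement
from statsmodels.tsa.arima.model import ARIMA
# trend="t" adds a drift term, the counterpart of the constant that the
# legacy class fitted to the differenced series by default
model = ARIMA(data["xt"], order=(0, 1, 1), trend="t")
result = model.fit()
print(result.summary())
</code>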

The summary reports a constant of 4.9956 and an MA(1) coefficient (ma.L1.D.xt) of 0.6710, both with p-values below 0.05, so both terms are statistically significant.

Model Diagnostics

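If the model is adequate, its residuals should behave like white noise, which can be checked with a Ljung-Box test and a QQ plot. The QQ plot is produced as follows:

<code>resid = result.resid
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111)
# fit=True standardizes the residuals; line='q' draws a reference line through the quartiles
fig = qqplot(resid, line='q', ax=ax, fit=True)
</code>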

The QQ plot indicates that the residuals are approximately normally distributed, which supports model adequacy.

Model Prediction

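<code># Forecasts are returned on the original scale. With the legacy arima_model API,
# add typ='levels' to get levels instead of differences.
pred = result.predict('1988', '1990', dynamic=True)
print(pred)
plt.figure(figsize=(12, 8))
plt.xticks(rotation=45)
plt.plot(data.xt, label="observed")
plt.plot(pred, label="forecast")
plt.legend()
plt.show()
</code>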

The forecasts for 1988, 1989, and 1990 are 278.36, 283.35, and 288.35, respectively, and the plotted forecast is consistent with the trend of the original series.

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
