Master VAR Modeling: Theory, Workflow, and Full Python Implementation
This guide explains the theory behind Vector Autoregression (VAR) models, outlines the complete modeling workflow—including data preparation, stationarity and cointegration testing, lag order selection, parameter estimation, stability diagnostics, and impulse‑response and variance‑decomposition analysis—and provides a full Python implementation with code examples.
VAR Modeling Steps
Traditional time‑series models such as ARIMA only use the series' own lags, whereas a VAR model also incorporates lagged values of other relevant factors, combining multivariate regression with lag analysis.
Basic VAR formulation is followed by extensions such as SVAR, CVAR, VECM, all belonging to the VAR family.
Modeling Procedure
The typical workflow includes:
Plot the series of N factors and compute correlation coefficients to assess linear relationships (note that zero correlation does not rule out nonlinear dependence).
Conduct stationarity tests (ADF) on each factor; if any factor is non‑stationary, difference all series until they become stationary.
Perform cointegration tests (e.g., Engle‑Granger) between the dependent variable yₜ and each explanatory factor after all series pass the ADF test; drop factors that fail.
Select lag order using information criteria (AIC, BIC) and likelihood‑ratio tests, typically preferring the smallest AIC/BIC and the largest LR value.
After determining the lag order, estimate the parameters and check their statistical significance, then test parameter stability (AR roots must lie inside the unit circle; CUSUM test should not reject the null of stability).
Stability‑checked models can be used for impulse‑response analysis (with Cholesky orthogonalized residuals) and variance‑decomposition to assess the impact of each factor on the target variable.
Implementing VAR in Python
The following Python code demonstrates the entire process, from data loading to model estimation, diagnostic testing, impulse‑response plotting, and variance‑decomposition.
<code># Model‑related packages
import statsmodels.api as sm
import statsmodels.stats.diagnostic
# Plotting packages
import matplotlib.pyplot as plt
# Other packages
import pandas as pd
import numpy as np
data = pd.read_excel('data/gong0809.xlsx', index_col=0)
read = data['read']
share = data['share']
favorite = data['favorite']
</code> <code># Plot series correlation
fig = plt.figure(figsize=(12,8))
plt.plot(read,'r',label='read')
plt.plot(share,'g',label='share')
plt.grid(True)
plt.axis('tight')
plt.legend(loc=0)
plt.ylabel('Price')
plt.show()
</code> <code># ADF unit‑root test
maxlags = 10
adfResult = sm.tsa.stattools.adfuller(read, maxlags)
output = pd.DataFrame(index=['Test Statistic Value','p-value','Lags Used','Number of Observations Used','Critical Value(1%)','Critical Value(5%)','Critical Value(10%)'],
columns=['value'])
output['value']['Test Statistic Value'] = adfResult[0]
output['value']['p-value'] = adfResult[1]
output['value']['Lags Used'] = adfResult[2]
output['value']['Number of Observations Used'] = adfResult[3]
output['value']['Critical Value(1%)'] = adfResult[4]['1%']
output['value']['Critical Value(5%)'] = adfResult[4]['5%']
output['value']['Critical Value(10%)'] = adfResult[4]['10%']
</code> <code># Cointegration test
result = sm.tsa.stattools.coint(read, favorite)
</code> <code># Model estimation and lag order selection (using VARMAX)
varLagNum = 5
data1 = data.copy()
data1 = data1.astype('float64')
orgMod = sm.tsa.VARMAX(data1, order=(varLagNum,0), trend='ct', exog=None)
fitMod = orgMod.fit(maxiter=1000, disp=False)
print(fitMod.summary())
resid = fitMod.resid
result = {'fitMod': fitMod, 'resid': resid}
</code> <code># CUSUM stability test
# Null hypothesis: stable (no drift); alternative: unstable
result = statsmodels.stats.diagnostic.breaks_cusumolsresid(resid)
</code> <code># Impulse‑response functions (Cholesky orthogonalized)
ax = fitMod.impulse_responses(terms, orthogonalized=True).plot(figsize=(12,8))
plt.show()
</code> <code># Variance decomposition
md = sm.tsa.VAR(dataFrame)
re = md.fit(2)
fevd = re.fevd(10)
print(fevd.summary())
fevd.plot(figsize=(12,16))
plt.show()
</code>Reference: https://blog.csdn.net/mooncrystal123/article/details/86736397
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.