How to Build and Forecast an ARMA Model in Python Using Sunspot Data
Learn to load the sunspots.csv dataset, fit an ARMA(4,2) model with statsmodels, evaluate it using AIC/BIC, visualize original and predicted values, and forecast the 1989 sunspot count, all with step‑by‑step Python code and plots.
ARMA Model Solution in Python
Using the sunspots.csv file, we build an appropriate ARMA model to predict the number of sunspots in 1989. The statsmodels.api.tsa.ARMA() function is used for fitting.
Code
<code>import pandas as pd, numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.rc('font',family='SimHei'); plt.rc('font',size=16)
d=pd.read_csv('sunspots.csv',usecols=['counts'])
md=sm.tsa.ARMA(d,(9,1)).fit()
years=np.arange(1700,1989) # known observation years
dhat=md.predict()
plt.plot(years[-20:],d.values[-20:],'o-k')
plt.plot(years[-20:],dhat.values[-20:],'P--')
plt.legend(('Original observations','Predicted values'))
plt.show()
dnext=md.predict(d.shape[0],d.shape[0])
print(dnext) # show next period prediction
</code>Below is a complete modeling procedure.
Step 1: Plot the original data series to confirm stationarity.
Step 2: Use AIC and BIC criteria to select ARMA(4,2), fit the model with Python, and obtain residuals and their distribution.
Step 3: Using the fitted model, predict the 1989 sunspot count as 139, and compare with the original data.
Code
<code>import pandas as pd, numpy as np
import statsmodels.api as sm
import matplotlib.pylab as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plt.rc('axes',unicode_minus=False)
plt.rc('font',family='SimHei'); plt.rc('font',size=16)
d=pd.read_csv('sunspots.csv'); dd=d['counts']
years=d['year'].values.astype(int)
plt.plot(years,dd.values,'-*'); plt.figure()
ax1=plt.subplot(121); plot_acf(dd,ax=ax1,title='Autocorrelation')
ax2=plt.subplot(122); plot_pacf(dd,ax=ax2,title='Partial Autocorrelation')
for i in range(1,6):
for j in range(1,6):
md=sm.tsa.ARMA(dd,(i,j)).fit()
print([i,j,md.aic,md.bic])
zmd=sm.tsa.ARMA(dd,(4,2)).fit()
print(zmd.summary()) # display model information
residuals = pd.DataFrame(zmd.resid)
fig, ax = plt.subplots(1,2)
residuals.plot(title="Residuals", ax=ax[0])
residuals.plot(kind='kde', title='Density', ax=ax[1])
plt.legend(''); plt.ylabel('')
dhat=zmd.predict(); plt.figure()
plt.plot(years[-20:],dd.values[-20:],'o-k')
plt.plot(years[-20:],dhat.values[-20:],'P--')
plt.legend(('Original observations','Predicted values'))
dnext=zmd.predict(d.shape[0],d.shape[0])
print(dnext) # show next period prediction
plt.show()
</code>Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.