
Time Series Forecasting and Anomaly Detection for API Traffic Using Seasonal Decomposition and ARIMA

The article presents a complete workflow for predicting next‑day API request volumes by exploring per‑minute traffic data, handling missing values, applying seasonal decomposition, training an ARIMA model on the trend component, and generating confidence intervals to flag anomalous spikes.

Python Programming Learning Circle

Company platforms expose many APIs (account query, release, red‑packet, etc.) that log per‑minute access counts, resulting in 1440 records per day; the goal is to forecast the next day's traffic using historical data and trigger alerts when actual traffic deviates significantly from the prediction.

Data exploration uses a seven‑day sample (10,080 minutes) loaded into a DataFrame data with columns date (minute timestamp) and count (access count). Initial plots reveal abrupt drops to zero caused by ETL‑generated placeholder values.

The data is loaded first; the zero placeholders are then treated as missing values and filled by averaging the surrounding points:

import pandas as pd

data = pd.read_csv(filename)   # columns: date, count
print('size: ', data.shape)
print(data.head())
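The averaging fill can be sketched as follows; this is a minimal illustration rather than the article's exact code, using a tiny hand-made series in place of the real file. For a single missing point, linear interpolation between the neighbours equals their mean; for a run of missing points it ramps between them:

```python
import numpy as np
import pandas as pd

# Per-minute sample where 0 marks an ETL placeholder, not real traffic
idx = pd.date_range("2024-01-01", periods=6, freq="1min")
counts = pd.Series([100.0, 110.0, 0.0, 0.0, 120.0, 118.0], index=idx)

# Treat zeros as missing, then fill each gap by interpolating linearly
# between the nearest valid points on either side
filled = counts.replace(0.0, np.nan).interpolate(method="linear")
print(filled.tolist())
```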

Key characteristics identified for modeling:

Strong daily seasonality with higher activity in afternoons/evenings.

Frequent spikes and drops, requiring smoothing before modeling.

Different APIs may exhibit vastly different patterns, so the model must be adaptable.
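The daily-seasonality claim can be sanity-checked numerically with the autocorrelation at a one-day lag; a hedged sketch on synthetic per-minute data (the series here is illustrative, not the article's sample):

```python
import numpy as np
import pandas as pd

# Synthetic two-day per-minute series with a daily cycle plus noise
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=2 * 1440, freq="1min")
daily = 100 + 50 * np.sin(2 * np.pi * np.arange(len(idx)) / 1440)
ts = pd.Series(daily + rng.normal(0, 5, len(idx)), index=idx)

# A strong autocorrelation at lag 1440 (one day) indicates daily seasonality
print(round(ts.autocorr(lag=1440), 3))
```

A spike near 1.0 at lag 1440, dwarfing nearby lags, is the signature that justifies a seasonal period of 1440 in the decomposition below.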

Preprocessing

1. Split the first six days as training data and the seventh day as test data.

class ModelDecomp(object):
    def __init__(self, file, test_size=1440):
        self.ts = self.read_data(file)          # full seven-day series
        self.test_size = test_size
        self.train_size = len(self.ts) - self.test_size
        self.train = self.ts[:self.train_size]  # first six days
        self.test = self.ts[-self.test_size:]   # seventh day

2. Smooth the training series: difference it, flag outlier differences beyond 1.5×IQR, and replace each run of outliers with values linearly interpolated between the surrounding points:

from datetime import timedelta
import numpy as np

def _diff_smooth(self, ts):
    dif = ts.diff().dropna()  # first-order difference series
    td = dif.describe()
    high = td['75%'] + 1.5 * (td['75%'] - td['25%'])  # upper fence
    low = td['25%'] - 1.5 * (td['75%'] - td['25%'])   # lower fence
    forbid_index = dif[(dif > high) | (dif < low)].index
    i = 0
    while i < len(forbid_index) - 1:
        n = 1  # length of the current run of consecutive outliers
        start = forbid_index[i]
        # walk forward while the flagged minutes are consecutive
        while i + n < len(forbid_index) and forbid_index[i + n] == start + timedelta(minutes=n):
            n += 1
        i += n - 1
        end = forbid_index[i]
        # replace the run with a linear ramp between its neighbours
        value = np.linspace(ts[start - timedelta(minutes=1)],
                            ts[end + timedelta(minutes=1)], n)
        ts[start:end] = value
        i += 1
    return ts

self.train = self._diff_smooth(self.train)
draw_ts(self.train)
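The 1.5×IQR fences used above can be illustrated on a toy series; this standalone sketch is independent of the class and only demonstrates how a spike shows up as a pair of outlier differences:

```python
import pandas as pd

# Toy series with one abrupt spike at position 5
ts = pd.Series([100, 102, 101, 103, 102, 500, 104, 103, 105, 104], dtype=float)

dif = ts.diff().dropna()
td = dif.describe()
iqr = td['75%'] - td['25%']
high = td['75%'] + 1.5 * iqr   # upper fence
low = td['25%'] - 1.5 * iqr    # lower fence

# the jumps entering and leaving the spike both breach the fences
outliers = dif[(dif > high) | (dif < low)]
print(outliers.index.tolist())  # → [5, 6]
```

Note that a single spike produces two flagged differences (up then down), which is why the class code groups consecutive flagged minutes into runs before interpolating.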

3. Decompose the smoothed series into trend, seasonal, and residual components using statsmodels:

from statsmodels.tsa.seasonal import seasonal_decompose

# freq is the seasonal period (1440 minutes = one day); newer statsmodels
# versions name this argument period instead of freq
decomposition = seasonal_decompose(self.ts, freq=1440, two_sided=False)
self.trend = decomposition.trend
self.seasonal = decomposition.seasonal
self.residual = decomposition.resid
decomposition.plot()

The additive model assumes observed = trend + seasonal + residual. Only the trend component is modeled further.

Modeling

Train an ARIMA model on the trend part:

from statsmodels.tsa.arima_model import ARIMA

def trend_model(self, order):
    self.trend.dropna(inplace=True)
    train = self.trend[:len(self.trend) - self.test_size]
    # note: this attribute shadows the method name after the first call
    self.trend_model = ARIMA(train, order).fit(disp=-1, method='css')

Predict the next day's trend, then add back the seasonal pattern and confidence bounds derived from the residual distribution:

# confidence band width derived from the residual distribution (1x IQR)
d = self.residual.describe()
delta = d['75%'] - d['25%']
self.low_error, self.high_error = (d['25%'] - 1 * delta, d['75%'] + 1 * delta)

def predict_new(self):
    n = self.test_size
    self.pred_time_index = pd.date_range(start=self.train.index[-1],
                                         periods=n + 1, freq='1min')[1:]
    self.trend_pred = self.trend_model.forecast(n)[0]
    self.add_season()

def add_season(self):
    self.train_season = self.seasonal[:self.train_size]
    values, low_conf_values, high_conf_values = [], [], []
    for i, t in enumerate(self.pred_time_index):
        trend_part = self.trend_pred[i]
        # mean seasonal value observed at this time of day in training
        season_part = self.train_season[self.train_season.index.time == t.time()].mean()
        predict = trend_part + season_part
        low_bound = predict + self.low_error
        high_bound = predict + self.high_error
        values.append(predict)
        low_conf_values.append(low_bound)
        high_conf_values.append(high_bound)
    self.final_pred = pd.Series(values, index=self.pred_time_index, name='predict')
    self.low_conf = pd.Series(low_conf_values, index=self.pred_time_index, name='low_conf')
    self.high_conf = pd.Series(high_conf_values, index=self.pred_time_index, name='high_conf')

Evaluation

Apply the pipeline to the sample file, plot the original series, predictions, and confidence intervals, and compute RMSE:

import numpy as np
import matplotlib.pyplot as plt

md = ModelDecomp(file=filename, test_size=1440)
md.decomp(freq=1440)
md.trend_model(order=(1, 1, 3))
md.predict_new()

pred = md.final_pred
test = md.test

plt.subplot(211)
plt.plot(md.ts)                                # full original series
plt.subplot(212)
pred.plot(color='blue', label='Predict')       # forecast
test.plot(color='red', label='Original')       # actual seventh day
md.low_conf.plot(color='grey', label='low')    # lower confidence bound
md.high_conf.plot(color='grey', label='high')  # upper confidence bound
plt.legend(loc='best')
plt.title('RMSE: %.4f' % np.sqrt(sum((pred.values - test.values) ** 2) / test.size))
plt.show()

The resulting RMSE is about 462.8, which is acceptable given the magnitude of the raw counts; two abrupt spikes in the test set exceed the confidence bounds and are correctly flagged as anomalies.
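The alerting step itself reduces to a bounds check against the confidence band; a minimal sketch, where the series names mirror those produced by the class but the values are illustrative:

```python
import pandas as pd

# Actual traffic vs. predicted bounds for five minutes
idx = pd.date_range("2024-01-08", periods=5, freq="1min")
test = pd.Series([500, 520, 1400, 510, 30], index=idx)
low_conf = pd.Series([400] * 5, index=idx)
high_conf = pd.Series([700] * 5, index=idx)

# Any point outside [low_conf, high_conf] is flagged as anomalous
anomalies = test[(test < low_conf) | (test > high_conf)]
print(anomalies.tolist())  # → [1400, 30] (the spike and the drop)
```

In production, each flagged minute would trigger an alert rather than just being collected.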

Conclusion

The core idea for any periodic API traffic series is to decompose the signal, model the trend, re‑assemble with seasonal and residual components, and define confidence intervals for anomaly detection; the approach can be adapted to other APIs with different patterns.

Tags: Python, Anomaly Detection, Data Preprocessing, Forecasting, Time Series, ARIMA
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
