
Time Series Forecasting and Anomaly Detection for API Traffic Using Seasonal Decomposition and ARIMA

The article presents a complete workflow for predicting next‑day API request volumes by exploring per‑minute traffic data, handling missing values, applying seasonal decomposition, training an ARIMA model on the trend component, and generating confidence intervals to flag anomalous spikes.

Python Programming Learning Circle

Company platforms expose many APIs (account query, release, red‑packet, etc.) that log per‑minute access counts, resulting in 1440 records per day; the goal is to forecast the next day's traffic using historical data and trigger alerts when actual traffic deviates significantly from the prediction.

Data exploration uses a seven‑day sample (10,080 minutes) loaded into a DataFrame data with columns date (minute timestamp) and count (access count). Initial plots reveal abrupt drops to zero caused by ETL‑generated placeholder values.

The data is loaded first; the zero placeholders are then treated as missing values and filled by averaging the surrounding points:

import pandas as pd

data = pd.read_csv(filename)   # columns: date, count
print('size: ', data.shape)
print(data.head())
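The averaging fill can be sketched as follows; this is a minimal illustration rather than the article's exact code, using a tiny hand-made series in place of the real file. For a single missing point, linear interpolation between the neighbours equals their mean; for a run of missing points it ramps between them:

```python
import numpy as np
import pandas as pd

# Per-minute sample where 0 marks an ETL placeholder, not real traffic
idx = pd.date_range("2024-01-01", periods=6, freq="1min")
counts = pd.Series([100.0, 110.0, 0.0, 0.0, 120.0, 118.0], index=idx)

# Treat zeros as missing, then fill each gap by interpolating linearly
# between the nearest valid points on either side
filled = counts.replace(0.0, np.nan).interpolate(method="linear")
print(filled.tolist())
```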

Key characteristics identified for modeling:

Strong daily seasonality with higher activity in afternoons/evenings.

Frequent spikes and drops, requiring smoothing before modeling.

Different APIs may exhibit vastly different patterns, so the model must be adaptable.
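The daily-seasonality claim can be sanity-checked numerically with the autocorrelation at a one-day lag; a hedged sketch on synthetic per-minute data (the series here is illustrative, not the article's sample):

```python
import numpy as np
import pandas as pd

# Synthetic two-day per-minute series with a daily cycle plus noise
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=2 * 1440, freq="1min")
daily = 100 + 50 * np.sin(2 * np.pi * np.arange(len(idx)) / 1440)
ts = pd.Series(daily + rng.normal(0, 5, len(idx)), index=idx)

# A strong autocorrelation at lag 1440 (one day) indicates daily seasonality
print(round(ts.autocorr(lag=1440), 3))
```

A spike near 1.0 at lag 1440, dwarfing nearby lags, is the signature that justifies a seasonal period of 1440 in the decomposition below.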

Preprocessing

1. Split the first six days as training data and the seventh day as test data.

class ModelDecomp(object):
    def __init__(self, file, test_size=1440):
        self.ts = self.read_data(file)          # full seven-day series
        self.test_size = test_size
        self.train_size = len(self.ts) - self.test_size
        self.train = self.ts[:self.train_size]  # first six days
        self.test = self.ts[-self.test_size:]   # seventh day

2. Smooth the training series: difference it, flag outlier differences beyond 1.5×IQR, and replace each run of outliers with values linearly interpolated between the surrounding points:

from datetime import timedelta
import numpy as np

def _diff_smooth(self, ts):
    dif = ts.diff().dropna()  # first-order difference series
    td = dif.describe()
    high = td['75%'] + 1.5 * (td['75%'] - td['25%'])  # upper fence
    low = td['25%'] - 1.5 * (td['75%'] - td['25%'])   # lower fence
    forbid_index = dif[(dif > high) | (dif < low)].index
    i = 0
    while i < len(forbid_index) - 1:
        n = 1  # length of the current run of consecutive outliers
        start = forbid_index[i]
        # walk forward while the flagged minutes are consecutive
        while i + n < len(forbid_index) and forbid_index[i + n] == start + timedelta(minutes=n):
            n += 1
        i += n - 1
        end = forbid_index[i]
        # replace the run with a linear ramp between its neighbours
        value = np.linspace(ts[start - timedelta(minutes=1)],
                            ts[end + timedelta(minutes=1)], n)
        ts[start:end] = value
        i += 1
    return ts

self.train = self._diff_smooth(self.train)
draw_ts(self.train)
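The 1.5×IQR fences used above can be illustrated on a toy series; this standalone sketch is independent of the class and only demonstrates how a spike shows up as a pair of outlier differences:

```python
import pandas as pd

# Toy series with one abrupt spike at position 5
ts = pd.Series([100, 102, 101, 103, 102, 500, 104, 103, 105, 104], dtype=float)

dif = ts.diff().dropna()
td = dif.describe()
iqr = td['75%'] - td['25%']
high = td['75%'] + 1.5 * iqr   # upper fence
low = td['25%'] - 1.5 * iqr    # lower fence

# the jumps entering and leaving the spike both breach the fences
outliers = dif[(dif > high) | (dif < low)]
print(outliers.index.tolist())  # → [5, 6]
```

Note that a single spike produces two flagged differences (up then down), which is why the class code groups consecutive flagged minutes into runs before interpolating.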

3. Decompose the smoothed series into trend, seasonal, and residual components using statsmodels:

from statsmodels.tsa.seasonal import seasonal_decompose

# freq is the seasonal period (1440 minutes = one day); newer statsmodels
# versions name this argument period instead of freq
decomposition = seasonal_decompose(self.ts, freq=1440, two_sided=False)
self.trend = decomposition.trend
self.seasonal = decomposition.seasonal
self.residual = decomposition.resid
decomposition.plot()

The additive model assumes observed = trend + seasonal + residual. Only the trend component is modeled further.

Modeling

Train an ARIMA model on the trend part:

from statsmodels.tsa.arima_model import ARIMA

def trend_model(self, order):
    self.trend.dropna(inplace=True)
    train = self.trend[:len(self.trend) - self.test_size]
    # note: this attribute shadows the method name after the first call
    self.trend_model = ARIMA(train, order).fit(disp=-1, method='css')

Predict the next day's trend, then add back the seasonal pattern and confidence bounds derived from the residual distribution:

# confidence band width derived from the residual distribution (1x IQR)
d = self.residual.describe()
delta = d['75%'] - d['25%']
self.low_error, self.high_error = (d['25%'] - 1 * delta, d['75%'] + 1 * delta)

def predict_new(self):
    n = self.test_size
    self.pred_time_index = pd.date_range(start=self.train.index[-1],
                                         periods=n + 1, freq='1min')[1:]
    self.trend_pred = self.trend_model.forecast(n)[0]
    self.add_season()

def add_season(self):
    self.train_season = self.seasonal[:self.train_size]
    values, low_conf_values, high_conf_values = [], [], []
    for i, t in enumerate(self.pred_time_index):
        trend_part = self.trend_pred[i]
        # mean seasonal value observed at this time of day in training
        season_part = self.train_season[self.train_season.index.time == t.time()].mean()
        predict = trend_part + season_part
        low_bound = predict + self.low_error
        high_bound = predict + self.high_error
        values.append(predict)
        low_conf_values.append(low_bound)
        high_conf_values.append(high_bound)
    self.final_pred = pd.Series(values, index=self.pred_time_index, name='predict')
    self.low_conf = pd.Series(low_conf_values, index=self.pred_time_index, name='low_conf')
    self.high_conf = pd.Series(high_conf_values, index=self.pred_time_index, name='high_conf')

Evaluation

Apply the pipeline to the sample file, plot the original series, predictions, and confidence intervals, and compute RMSE:

import numpy as np
import matplotlib.pyplot as plt

md = ModelDecomp(file=filename, test_size=1440)
md.decomp(freq=1440)
md.trend_model(order=(1, 1, 3))
md.predict_new()

pred = md.final_pred
test = md.test

plt.subplot(211)
plt.plot(md.ts)                                # full original series
plt.subplot(212)
pred.plot(color='blue', label='Predict')       # forecast
test.plot(color='red', label='Original')       # actual seventh day
md.low_conf.plot(color='grey', label='low')    # lower confidence bound
md.high_conf.plot(color='grey', label='high')  # upper confidence bound
plt.legend(loc='best')
plt.title('RMSE: %.4f' % np.sqrt(sum((pred.values - test.values) ** 2) / test.size))
plt.show()

The resulting RMSE is about 462.8, which is acceptable given the magnitude of the raw counts; two abrupt spikes in the test set exceed the confidence bounds and are correctly flagged as anomalies.
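The alerting step itself reduces to a bounds check against the confidence band; a minimal sketch, where the series names mirror those produced by the class but the values are illustrative:

```python
import pandas as pd

# Actual traffic vs. predicted bounds for five minutes
idx = pd.date_range("2024-01-08", periods=5, freq="1min")
test = pd.Series([500, 520, 1400, 510, 30], index=idx)
low_conf = pd.Series([400] * 5, index=idx)
high_conf = pd.Series([700] * 5, index=idx)

# Any point outside [low_conf, high_conf] is flagged as anomalous
anomalies = test[(test < low_conf) | (test > high_conf)]
print(anomalies.tolist())  # → [1400, 30] (the spike and the drop)
```

In production, each flagged minute would trigger an alert rather than just being collected.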

Conclusion

The core idea for any periodic API traffic series is to decompose the signal, model the trend, re‑assemble with seasonal and residual components, and define confidence intervals for anomaly detection; the approach can be adapted to other APIs with different patterns.

Tags: Python, Anomaly Detection, Data Preprocessing, Forecasting, Time Series, ARIMA
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
