Time Series Feature Engineering Techniques in Python
This article explains how to extract a variety of date‑time based features—including date, time, lag, rolling, expanding, and domain‑specific attributes—from a time‑series dataset using pandas, and discusses proper validation strategies for building reliable forecasting models.
In business analytics, time is a core component for analyzing sales, revenue, profit, and growth, and for making forecasts; handling time‑sensitive data requires careful feature engineering.
The article begins with a brief introduction to time‑series concepts, emphasizing that observations are captured at regular intervals and each point depends on its predecessors.
Using a sample dataset named Train_SU63ISt.csv that contains a datetime column and a count column, the data is loaded and the datetime column is converted to a proper datetime type:
<code>import pandas as pd
data = pd.read_csv('Train_SU63ISt.csv')
data['Datetime'] = pd.to_datetime(data['Datetime'], format='%d-%m-%Y %H:%M')
data.dtypes</code>Several date‑related features are then derived, such as year, month, day, day of week (numeric and name):
<code>data['year'] = data['Datetime'].dt.year
data['month'] = data['Datetime'].dt.month
data['day'] = data['Datetime'].dt.day
data['dayofweek_num'] = data['Datetime'].dt.dayofweek
data['dayofweek_name'] = data['Datetime'].dt.day_name()  # .dt.weekday_name was removed in pandas 1.0
data.head()</code>Time‑related features like hour and minute are extracted similarly:
<code>data['Hour'] = data['Datetime'].dt.hour
data['minute'] = data['Datetime'].dt.minute
data.head()</code>Lag features are created by shifting the target variable, allowing the model to use past values as predictors:
<code>data['lag_1'] = data['Count'].shift(1)
data['lag_2'] = data['Count'].shift(2)
data['lag_3'] = data['Count'].shift(3)
data['lag_4'] = data['Count'].shift(4)
data['lag_5'] = data['Count'].shift(5)
data['lag_6'] = data['Count'].shift(6)
data['lag_7'] = data['Count'].shift(7)
data = data[['Datetime', 'lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5', 'lag_6', 'lag_7', 'Count']]
data.head(10)</code>Rolling‑window statistics (e.g., a moving average over the previous 7 observations) are computed to capture recent trends:
<code>data['rolling_mean'] = data['Count'].rolling(window=7).mean()
data = data[['Datetime', 'rolling_mean', 'Count']]
data.head(10)</code>Expanding‑window features, which consider all data up to the current point, are generated with the expanding() method:
<code>data['expanding_mean'] = data['Count'].expanding(2).mean()
data = data[['Datetime', 'Count', 'expanding_mean']]
data.head(10)</code>The article also discusses domain‑specific features, such as incorporating holidays or weather information, and stresses the importance of understanding the problem context to design effective features.
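A minimal sketch of such domain‑specific features, using a synthetic hourly datetime column in place of the article's dataset; the holiday date below is illustrative, not drawn from the source:

```python
import pandas as pd

# Synthetic stand-in for the article's hourly data
data = pd.DataFrame({
    'Datetime': pd.date_range('2014-01-01', periods=48, freq='h'),
})

# Weekend indicator: dayofweek is 5 (Saturday) or 6 (Sunday)
data['is_weekend'] = (data['Datetime'].dt.dayofweek >= 5).astype(int)

# Holiday indicator from a hand-maintained list of dates (hypothetical)
holiday_dates = pd.to_datetime(['2014-01-01']).date
data['is_holiday'] = data['Datetime'].dt.date.isin(holiday_dates).astype(int)

print(data[['Datetime', 'is_weekend', 'is_holiday']].head())
```

In practice the holiday list would come from a calendar source appropriate to the data's region, and weather features would be joined in from an external dataset keyed on the same timestamps.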
For model validation, a time‑ordered split is recommended: the last three months of the 25‑month dataset are held out as a validation set, while the earlier data is used for training, ensuring that future information is never leaked into the model.
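The split can be sketched as follows, here on a synthetic 25‑month series standing in for the article's dataset; the three‑month cutoff mirrors the hold‑out described above:

```python
import pandas as pd

# Synthetic monthly series: 25 months of observations
data = pd.DataFrame({
    'Datetime': pd.date_range('2012-09-01', periods=25, freq='MS'),
    'Count': range(25),
})

# Hold out the last 3 months; everything earlier is training data.
# Splitting on time (not at random) keeps future rows out of training.
cutoff = data['Datetime'].max() - pd.DateOffset(months=3)
train = data[data['Datetime'] <= cutoff]
valid = data[data['Datetime'] > cutoff]

print(len(train), len(valid))
```

Because every training timestamp precedes every validation timestamp, the validation score approximates how the model would behave when forecasting genuinely unseen periods.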
Finally, the article concludes that mastering these feature‑engineering techniques transforms a time‑series forecasting problem into a supervised learning task, enabling the use of regression models like linear regression or random forests with greater confidence.
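The supervised framing can be illustrated on synthetic data: a lag feature becomes the predictor and the current value the target, with a plain least‑squares fit standing in for the regression models mentioned above.

```python
import numpy as np
import pandas as pd

# Synthetic trending series standing in for the Count column
rng = np.random.default_rng(0)
counts = pd.Series(50 + np.arange(100) + rng.normal(0, 2, 100))

# Shift once to create a lag-1 predictor, then drop the row
# that has no past value to learn from
frame = pd.DataFrame({'lag_1': counts.shift(1), 'y': counts}).dropna()

# Ordinary least squares on (lag_1 -> y): the forecasting problem
# is now a standard regression task
slope, intercept = np.polyfit(frame['lag_1'], frame['y'], 1)
print(round(slope, 2))
```

On a steadily trending series like this one, the fitted slope sits near 1, since each value closely tracks its predecessor; richer models simply swap in more features (more lags, rolling means, calendar indicators) and a stronger regressor.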
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.