Time Series Feature Engineering Techniques in Python
This article explains how to extract a variety of date‑time based features—including date, time, lag, rolling, expanding, and domain‑specific attributes—from a time‑series dataset using pandas, and discusses proper validation strategies for building reliable forecasting models.
In business analytics, time is a core component for analyzing sales, revenue, profit, and growth, and for making forecasts; handling time‑sensitive data requires careful feature engineering.
The article begins with a brief introduction to time‑series concepts, emphasizing that observations are captured at regular intervals and each point depends on its predecessors.
Using a sample dataset named Train_SU63ISt.csv that contains a datetime column and a count column, the data is loaded and the datetime column is converted to a proper datetime type:
<code>import pandas as pd
data = pd.read_csv('Train_SU63ISt.csv')
data['Datetime'] = pd.to_datetime(data['Datetime'], format='%d-%m-%Y %H:%M')
data.dtypes</code>Several date‑related features are then derived, such as year, month, day, day of week (numeric and name):
<code>data['year'] = data['Datetime'].dt.year
data['month'] = data['Datetime'].dt.month
data['day'] = data['Datetime'].dt.day
data['dayofweek_num'] = data['Datetime'].dt.dayofweek
data['dayofweek_name'] = data['Datetime'].dt.day_name()  # .dt.weekday_name was removed in pandas 1.0
data.head()</code>Time‑related features like hour and minute are extracted similarly:
<code>data['Hour'] = data['Datetime'].dt.hour
data['minute'] = data['Datetime'].dt.minute
data.head()</code>Lag features are created by shifting the target variable, allowing the model to use past values as predictors:
<code>data['lag_1'] = data['Count'].shift(1)
data['lag_2'] = data['Count'].shift(2)
data['lag_3'] = data['Count'].shift(3)
data['lag_4'] = data['Count'].shift(4)
data['lag_5'] = data['Count'].shift(5)
data['lag_6'] = data['Count'].shift(6)
data['lag_7'] = data['Count'].shift(7)
data = data[['Datetime', 'lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5', 'lag_6', 'lag_7', 'Count']]
data.head(10)</code>Rolling‑window statistics (e.g., a moving average over the previous 7 observations) are computed to capture recent trends:
<code>data['rolling_mean'] = data['Count'].rolling(window=7).mean()
data = data[['Datetime', 'rolling_mean', 'Count']]
data.head(10)</code>Expanding‑window features, which consider all data up to the current point, are generated with the expanding() method:
<code>data['expanding_mean'] = data['Count'].expanding(2).mean()
data = data[['Datetime', 'Count', 'expanding_mean']]
data.head(10)</code>The article also discusses domain‑specific features, such as incorporating holidays or weather information, and stresses the importance of understanding the problem context to design effective features.
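A minimal sketch of such domain‑specific features, using a synthetic hourly datetime column in place of the article's dataset; the holiday date below is illustrative, not drawn from the source:

```python
import pandas as pd

# Synthetic stand-in for the article's hourly data
data = pd.DataFrame({
    'Datetime': pd.date_range('2014-01-01', periods=48, freq='h'),
})

# Weekend indicator: dayofweek is 5 (Saturday) or 6 (Sunday)
data['is_weekend'] = (data['Datetime'].dt.dayofweek >= 5).astype(int)

# Holiday indicator from a hand-maintained list of dates (hypothetical)
holiday_dates = pd.to_datetime(['2014-01-01']).date
data['is_holiday'] = data['Datetime'].dt.date.isin(holiday_dates).astype(int)

print(data[['Datetime', 'is_weekend', 'is_holiday']].head())
```

In practice the holiday list would come from a calendar source appropriate to the data's region, and weather features would be joined in from an external dataset keyed on the same timestamps.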
For model validation, a time‑ordered split is recommended: the last three months of the 25‑month dataset are held out as a validation set, while the earlier data is used for training, ensuring that future information is never leaked into the model.
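The split can be sketched as follows, here on a synthetic 25‑month series standing in for the article's dataset; the three‑month cutoff mirrors the hold‑out described above:

```python
import pandas as pd

# Synthetic monthly series: 25 months of observations
data = pd.DataFrame({
    'Datetime': pd.date_range('2012-09-01', periods=25, freq='MS'),
    'Count': range(25),
})

# Hold out the last 3 months; everything earlier is training data.
# Splitting on time (not at random) keeps future rows out of training.
cutoff = data['Datetime'].max() - pd.DateOffset(months=3)
train = data[data['Datetime'] <= cutoff]
valid = data[data['Datetime'] > cutoff]

print(len(train), len(valid))
```

Because every training timestamp precedes every validation timestamp, the validation score approximates how the model would behave when forecasting genuinely unseen periods.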
Finally, the article concludes that mastering these feature‑engineering techniques transforms a time‑series forecasting problem into a supervised learning task, enabling the use of regression models like linear regression or random forests with greater confidence.
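The supervised framing can be illustrated on synthetic data: a lag feature becomes the predictor and the current value the target, with a plain least‑squares fit standing in for the regression models mentioned above.

```python
import numpy as np
import pandas as pd

# Synthetic trending series standing in for the Count column
rng = np.random.default_rng(0)
counts = pd.Series(50 + np.arange(100) + rng.normal(0, 2, 100))

# Shift once to create a lag-1 predictor, then drop the row
# that has no past value to learn from
frame = pd.DataFrame({'lag_1': counts.shift(1), 'y': counts}).dropna()

# Ordinary least squares on (lag_1 -> y): the forecasting problem
# is now a standard regression task
slope, intercept = np.polyfit(frame['lag_1'], frame['y'], 1)
print(round(slope, 2))
```

On a steadily trending series like this one, the fitted slope sits near 1, since each value closely tracks its predecessor; richer models simply swap in more features (more lags, rolling means, calendar indicators) and a stronger regressor.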
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.