
Transforming Time Series Data into Supervised Learning Datasets with Pandas

This tutorial explains how to convert single‑variable and multivariate time‑series data into supervised learning formats using pandas shift() and a custom series_to_supervised() function, covering one‑step, multi‑step, and sequence forecasting examples with complete Python code.

Python Programming Learning Circle

Machine-learning methods such as deep learning can be applied to time-series forecasting, but the series must first be reframed as a supervised-learning problem: a set of samples, each pairing an input sequence (X) with an output sequence (y).
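To make the X-y framing concrete before bringing in pandas, here is a minimal plain-Python sketch. `make_pairs()` is a hypothetical helper, not part of pandas or of the function introduced later. It slides a window of `n_lags` past values along the series and pairs each window with the value that follows it.

```python
def make_pairs(series, n_lags=1):
    """Hypothetical helper: pair each window of n_lags past values with the next value."""
    pairs = []
    for end in range(n_lags, len(series)):
        x = series[end - n_lags:end]  # the n_lags observations just before position `end`
        y = series[end]               # the observation at `end` is the target
        pairs.append((x, y))
    return pairs


for x, y in make_pairs([10, 20, 30, 40, 50], n_lags=2):
    print(x, "->", y)
```

Running it prints `[10, 20] -> 30`, `[20, 30] -> 40` and `[30, 40] -> 50`. Five observations yield only three samples because the first two values have no complete window of history. The pandas approach below handles the same edge effect with NaN values.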

The pandas shift() function is the key tool for this transformation. A positive shift moves values down the index, so each row holds an earlier observation: a lag (historical) feature. A negative shift moves values up, so each row holds a later observation: a forecast feature. Combining such columns produces the required X-y pairs.

Start with a DataFrame that holds a single series of ten values:

<code>from pandas import DataFrame

df = DataFrame()
df['t'] = [x for x in range(10)]
print(df)</code>

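Printing it shows a single column t holding the values 0 through 9. Adding a lag column with df['t-1'] = df['t'].shift(1) produces the following. The first lag is NaN because no earlier observation exists, and that NaN is also why the column is stored as floats: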

<code>   t  t-1
0  0  NaN
1  1  0.0
2  2  1.0
3  3  2.0
4  4  3.0
5  5  4.0
6  6  5.0
7  7  6.0
8  8  7.0
9  9  8.0</code>

A negative shift creates a forecast column instead: df['t+1'] = df['t'].shift(-1) places the next value in each row, giving the future values needed as prediction targets. Here the last row is NaN because no later observation exists.
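Putting both shifts together gives a complete one-step framing. The sketch below adds a lag and a forecast column to the same ten-value series, drops the incomplete rows at either end, and splits the result into input and output arrays. The names `framed`, `X` and `y` are illustrative.

```python
from pandas import DataFrame

df = DataFrame({'t': list(range(10))})
df['t-1'] = df['t'].shift(1)   # lag feature: previous value (NaN in the first row)
df['t+1'] = df['t'].shift(-1)  # forecast target: next value (NaN in the last row)

# Rows at either end contain NaN after shifting, so drop them
framed = df.dropna()

X = framed[['t-1', 't']].values  # inputs: previous and current observation
y = framed['t+1'].values         # output: the next observation
print(framed)
```

Eight rows survive. Row 1 pairs the inputs 0 and 1 with the target 2, and row 8 pairs 7 and 8 with the target 9.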

The core utility is the series_to_supervised() function, which automatically builds a supervised‑learning DataFrame from any time‑series array. Its parameters are:

data : list or 2‑D NumPy array of observations (required)

n_in : number of lag observations to use as inputs (default 1)

n_out : number of future observations to predict (default 1)

dropnan : whether to remove rows containing NaN values (default True)

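The function concatenates shifted copies of the data side by side. It names input columns varN(t-k) and output columns varN(t) and varN(t+m), where N is the variable's position, optionally drops the rows left incomplete by shifting, and returns the assembled DataFrame.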

<code>from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """Frame a time series as a supervised learning dataset.
    data: sequence of observations as a list or NumPy array
          (one value per step, or one column per variable)
    n_in: number of lag observations used as input (X)
    n_out: number of observations used as output (y)
    dropnan: drop rows that contain NaN values after shifting
    Returns: pandas DataFrame framed for supervised learning
    """
    df = DataFrame(data)
    # Count variables from the DataFrame so that plain lists, lists of lists,
    # and 1-D or 2-D NumPy arrays are all handled correctly
    n_vars = df.shape[1]
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n_out-1)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together side by side
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values introduced by shifting
    if dropnan:
        agg.dropna(inplace=True)
    return agg</code>

One‑step prediction example:

<code>values = [x for x in range(10)]
data = series_to_supervised(values)
print(data)</code>

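The output uses var1(t-1) as the input and var1(t) as the target. The first row is dropped because its lag is NaN, which leaves nine rows. In these rows var1(t-1) runs from 0.0 to 8.0 and var1(t) from 1 to 9.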

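Multi-step forecasting, for example using two past steps to predict two future steps, only requires calling series_to_supervised(values, 2, 2). This produces the columns var1(t-2), var1(t-1), var1(t), and var1(t+1). The first two rows and the last row contain NaN and are dropped, leaving seven samples.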
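To see what the function does for this call, the same frame can be assembled by hand with shift() and concat(). This sketch hard-codes n_in=2 and n_out=2. Splitting the lag columns from the var1(t) and var1(t+1) columns is one reasonable way to separate the inputs from the two-step target.

```python
from pandas import DataFrame, concat

values = list(range(10))
df = DataFrame(values)

# Two lag inputs and two forecast outputs, built from shifted copies of the series
framed = concat([df.shift(2), df.shift(1), df, df.shift(-1)], axis=1)
framed.columns = ['var1(t-2)', 'var1(t-1)', 'var1(t)', 'var1(t+1)']
framed = framed.dropna()  # removes the first two rows and the last row

# Inputs are the lag columns; the two-step target is var1(t) and var1(t+1)
X = framed[['var1(t-2)', 'var1(t-1)']].values
y = framed[['var1(t)', 'var1(t+1)']].values
print(framed)
```

The seven remaining rows run from index 2 to 8. The first row pairs the inputs 0 and 1 with the targets 2 and 3.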

For multivariate series, the same function works unchanged. Build a DataFrame with columns ob1 and ob2, then pass its values (a 2-D array with one column per variable) to series_to_supervised(). Each row of the result then contains both variables at t-1 as inputs and both variables at t as outputs, labelled var1 and var2 in column order.
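The snippet below sketches that multivariate call end to end. The ob1 and ob2 values are made up for illustration. series_to_supervised() is repeated in condensed form, with the same logic and column names as the listing above, so that the snippet runs on its own. Filtering column names on `'(t-'` is an assumed convention for separating inputs from outputs, and it relies on the varN naming scheme.

```python
from pandas import DataFrame, concat


def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # Condensed copy of the function above so this snippet is self-contained
    df = DataFrame(data)
    n_vars = df.shape[1]
    cols, names = [], []
    for i in range(n_in, 0, -1):  # lag inputs t-n_in ... t-1
        cols.append(df.shift(i))
        names += ['var%d(t-%d)' % (j + 1, i) for j in range(n_vars)]
    for i in range(n_out):  # outputs t, t+1, ... t+n_out-1
        cols.append(df.shift(-i))
        suffix = '(t)' if i == 0 else '(t+%d)' % i
        names += ['var%d%s' % (j + 1, suffix) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    return agg.dropna() if dropnan else agg


# Illustrative two-variable series
raw = DataFrame({'ob1': list(range(10)), 'ob2': list(range(50, 60))})
data = series_to_supervised(raw.values)  # 2-D array: one column per variable
print(data)

# Lag columns are inputs; everything else is the target
input_cols = [c for c in data.columns if '(t-' in c]
X = data[input_cols].values
y = data.drop(columns=input_cols).values
```

The result has nine rows and the columns var1(t-1), var2(t-1), var1(t), var2(t). The first sample pairs the inputs 0 and 50 with the targets 1 and 51.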

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
