
Time-Series Forecasting Augmentation: Frequency, Decomposition, and Patch Methods Compared

The article examines the challenges of augmenting data for time-series forecasting, reviews mainstream techniques (frequency-domain, decomposition-based, and patch-based methods), and reports experiments in which Temporal Patch Shuffle (TPS) achieves the best performance across long-term forecasting, short-term forecasting, and classification tasks.

Data Party THU

May 26, 2026


Why Classification‑Oriented Augmentation Fails for Forecasting

Classic augmentation techniques such as jittering, scaling, window warping, and permutation were designed for classification, where the label is a static class that survives perturbation of the input. In forecasting, the label is the continuous segment that follows the input; perturbing only the input breaks input-target alignment and degrades performance.

Data‑Label Consistency: A Necessary Condition

Let the look-back window be x and the prediction horizon be y. The object to augment is the concatenated sequence s = x ∥ y, not x alone. The augmentation $\mathcal{A}$ is applied to s, and the result is then split back into $(\tilde{x}, \tilde{y})$. This preserves the continuity between input and target.

$$ s = x \,\|\, y, \qquad \tilde{s} = \mathcal{A}(s), \qquad (\tilde{x}, \tilde{y}) = \mathrm{Split}(\tilde{s}) $$

Figure 1 illustrates the pipeline: the look‑back window and horizon are concatenated, augmented, then split to keep input‑target alignment.
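The pipeline can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the article: the helper name `augment_pair` and the toy `random_gain` augmentation are hypothetical, and sequences are assumed to be arrays of shape (time, channels).

```python
import numpy as np

def augment_pair(x, y, augment, rng):
    """Apply `augment` to s = x || y, then split back into (x_tilde, y_tilde)."""
    s = np.concatenate([x, y], axis=0)          # shape (L + H, C)
    s_tilde = augment(s, rng)
    assert s_tilde.shape == s.shape             # augmentation must keep the length
    return s_tilde[: len(x)], s_tilde[len(x):]

def random_gain(s, rng):
    # Hypothetical toy augmentation: one random gain for the whole sequence.
    return s * rng.uniform(0.9, 1.1)

rng = np.random.default_rng(0)
x = np.arange(8.0).reshape(8, 1)                # look-back window: 0..7
y = np.arange(8.0, 12.0).reshape(4, 1)          # horizon: 8..11
x_aug, y_aug = augment_pair(x, y, random_gain, rng)
```

Because the gain is drawn once for the concatenated sequence, the target still continues the input at the same scale. Applying `random_gain` to `x` alone would leave `y` at the original scale and introduce a jump at the boundary.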

Taxonomy of Forecasting Augmentation Methods

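Effective recent methods fall into three streams: frequency-domain, decomposition-based, and controlled signal-level manipulation. The grouping below lists the remaining signal-level methods under "Other" and gives the patch-based TPS its own entry.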

Frequency‑based: RobustTAD, FreqMask, FreqMix, WaveMask, WaveMix, Dominant Shuffle.

Decomposition‑based: STAug.

Other: wDBA, MBB, Upsample.

Patch‑based: TPS.

Frequency‑Domain Methods

RobustTAD

RobustTAD applies a discrete Fourier transform to the concatenated sequence, perturbs selected frequency bands, and transforms the result back to the time domain. The perturbation either replaces amplitudes with samples from a Gaussian distribution or adds a small offset to the phases. Although RobustTAD was originally proposed for anomaly detection, its amplitude-perturbation variant is the one used in multivariate forecasting experiments.
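A rough sketch of the amplitude-perturbation variant follows. The article does not give the parameters of the Gaussian, so several choices here are assumptions: the band is passed as a half-open index range, the Gaussian uses the mean and standard deviation of the original amplitudes in that band, and sampled amplitudes are made non-negative with an absolute value. The function name `perturb_amplitude` is hypothetical.

```python
import numpy as np

def perturb_amplitude(s, band, rng, sigma_scale=1.0):
    """Replace rFFT amplitudes in band = (lo, hi) with Gaussian samples; keep phases."""
    S = np.fft.rfft(s)
    amp, phase = np.abs(S), np.angle(S)
    lo, hi = band
    seg = amp[lo:hi]
    # Assumption: sample around the band's own amplitude statistics.
    amp[lo:hi] = np.abs(rng.normal(seg.mean(), sigma_scale * seg.std(), size=seg.shape))
    return np.fft.irfft(amp * np.exp(1j * phase), n=len(s))

rng = np.random.default_rng(0)
s = rng.standard_normal(64)                     # stands in for a 1-D s = x || y
out = perturb_amplitude(s, band=(3, 10), rng=rng)
```

Bins outside the band are left untouched, so the perturbation stays confined to the selected frequencies.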

FreqMask and FreqMix

Both start with a real FFT of the concatenated sequence:

$$ s = x \,\|\, y, \qquad S = \mathrm{rFFT}(s) $$

FreqMask applies a binary mask M that zeroes out selected frequencies:

$$ \tilde{S} = M \odot S, \qquad \tilde{s} = \mathrm{irFFT}(\tilde{S}) $$

FreqMix mixes the spectra of two sequences using the same mask:

$$ \tilde{S} = M \odot S_1 + (1 - M) \odot S_2, \qquad \tilde{s} = \mathrm{irFFT}(\tilde{S}) $$

These operations are simple, but they lose temporal localization: each Fourier coefficient spans the whole window, so masking or mixing a frequency alters it at every time step.
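Both operations map directly onto `numpy.fft.rfft` and `numpy.fft.irfft`. The sketch below assumes arrays of shape (time, channels) and an independent random mask per frequency bin and channel controlled by a hypothetical `rate` parameter; the original methods may draw their masks differently.

```python
import numpy as np

def freq_mask(s, rate, rng):
    """FreqMask sketch: zero out a random subset of rFFT bins (time on axis 0)."""
    S = np.fft.rfft(s, axis=0)
    M = rng.random(S.shape) >= rate             # True = keep this bin
    return np.fft.irfft(M * S, n=s.shape[0], axis=0)

def freq_mix(s1, s2, rate, rng):
    """FreqMix sketch: take masked-out bins from a second sequence instead of zero."""
    S1 = np.fft.rfft(s1, axis=0)
    S2 = np.fft.rfft(s2, axis=0)
    M = rng.random(S1.shape) >= rate
    return np.fft.irfft(M * S1 + (~M) * S2, n=s1.shape[0], axis=0)

rng = np.random.default_rng(0)
s1 = rng.standard_normal((16, 2))               # two concatenated sequences x || y
s2 = rng.standard_normal((16, 2))
masked = freq_mask(s1, 0.3, rng)
mixed = freq_mix(s1, s2, 0.3, rng)
```

With `rate = 0` both functions return the first sequence unchanged, and with `rate = 1` `freq_mix` returns the second sequence, which is a quick sanity check on the mask convention.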

WaveMask and WaveMix

Wavelet transforms retain time localization. The discrete wavelet transform (DWT) decomposes s into multi‑level coefficients W^{(l)}. Masking or mixing is applied per level:

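$$
\begin{aligned}
s &= x \,\|\, y \\
W &= \mathrm{WaveDec}(s) = \{W^{(1)}, W^{(2)}, \dots, W^{(L+1)}\} \\
\text{WaveMask:}\quad \tilde{W}^{(l)} &= M^{(l)} \odot W^{(l)} \\
\text{WaveMix:}\quad \tilde{W}^{(l)} &= M^{(l)} \odot W^{(l)}_1 + (1 - M^{(l)}) \odot W^{(l)}_2 \\
\tilde{s} &= \mathrm{WaveRec}(\tilde{W})
\end{aligned}
$$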

WaveMask removes selected wavelet coefficients; WaveMix swaps coefficients between two sequences. Experiments show they outperform all baselines on 12 out of 16 prediction horizons.
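To keep the example dependency-free, the sketch below swaps in a hand-written Haar transform for the wavelet decomposition; this is a simplification, and a wavelet library with other wavelet families would normally take its place. The names `haar_dec`, `haar_rec`, `wave_mask`, and `wave_mix` are hypothetical, the input is a 1-D sequence whose length is divisible by 2^levels, and the per-coefficient random mask with a `rate` parameter is an assumption.

```python
import numpy as np

def haar_dec(s, levels):
    """Return [detail_1, ..., detail_L, approx_L], i.e. L + 1 coefficient arrays."""
    a, coeffs = np.asarray(s, dtype=float), []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        coeffs.append((even - odd) / np.sqrt(2))    # detail coefficients
        a = (even + odd) / np.sqrt(2)               # approximation for next level
    coeffs.append(a)
    return coeffs

def haar_rec(coeffs):
    """Invert haar_dec."""
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def wave_mask(s, levels, rate, rng):
    # Zero out a random subset of coefficients at every level.
    return haar_rec([w * (rng.random(w.shape) >= rate) for w in haar_dec(s, levels)])

def wave_mix(s1, s2, levels, rate, rng):
    # Per level, keep s1's coefficient where the mask is True, else take s2's.
    mixed = []
    for w1, w2 in zip(haar_dec(s1, levels), haar_dec(s2, levels)):
        m = rng.random(w1.shape) >= rate
        mixed.append(np.where(m, w1, w2))
    return haar_rec(mixed)

rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(32), rng.standard_normal(32)
masked = wave_mask(s1, levels=3, rate=0.3, rng=rng)
mixed = wave_mix(s1, s2, levels=3, rate=0.3, rng=rng)
```

Each wavelet coefficient covers a limited stretch of time, so a masked or swapped coefficient changes the signal only locally, unlike a masked Fourier bin.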

Dominant Shuffle

Dominant Shuffle selects the top‑k dominant frequencies from the FFT and shuffles only those components:

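$$
\begin{aligned}
S &= \mathrm{FFT}(s) \\
\Omega_k &= \text{indices of the top-}k\text{ dominant frequencies} \\
\tilde{S}_{\Omega_k} &= \mathrm{Shuffle}(S_{\Omega_k}) \\
\tilde{s} &= \mathrm{IFFT}(\tilde{S})
\end{aligned}
$$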

This conservative approach avoids over‑perturbing the spectrum. In the unified TPS benchmark it is not the strongest method.
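The four steps can be sketched as follows for a 1-D sequence. Two choices here are assumptions rather than details from the article: the real FFT is used so that the output stays real-valued, and the DC component is excluded from the ranking so that the mean is preserved. The function name `dominant_shuffle` is hypothetical.

```python
import numpy as np

def dominant_shuffle(s, k, rng):
    """Permute the k largest-magnitude non-DC rFFT components of a 1-D sequence."""
    S = np.fft.rfft(s)
    mag = np.abs(S)
    mag[0] = 0.0                                # assumption: leave the DC bin alone
    top = np.argsort(mag)[-k:]                  # indices of the k dominant bins
    S_new = S.copy()
    S_new[top] = S[rng.permutation(top)]        # shuffle only those components
    return np.fft.irfft(S_new, n=len(s))

rng = np.random.default_rng(0)
t = np.arange(64)
s = (3 * np.sin(2 * np.pi * 2 * t / 64)
     + 2 * np.sin(2 * np.pi * 5 * t / 64)
     + 1 * np.sin(2 * np.pi * 9 * t / 64))
out = dominant_shuffle(s, k=3, rng=rng)
```

In this toy signal the three sinusoids are the dominant components, so the shuffle reassigns their amplitudes and phases among bins 2, 5, and 9 while every other bin is left as it was.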

Decomposition‑Based Method: STAug

STAug applies Empirical Mode Decomposition (EMD) to two sequences, obtains intrinsic mode functions (IMFs), and recombines them with mixup‑style interpolation weights sampled from a uniform distribution. The resulting sequence mixes temporal features from both inputs.

STAug inherits the high memory consumption of EMD. The TPS experiments could not evaluate it on the ECL and Traffic datasets because of GPU memory limits, a limitation also noted in the original paper.

Beyond Frequency: wDBA, MBB, Upsample

wDBA aligns sequences with Dynamic Time Warping (DTW) and averages them to synthesize new samples; it yields high-quality data but is computationally expensive.

MBB decomposes a series into trend, seasonality, and residual via STL, then bootstraps blocks from the residual.

Upsample simply selects a contiguous segment and linearly interpolates it back to the original length, acting as a strong non-frequency baseline. In the TPS benchmark, Upsample consistently ranks near the top, though TPS still outperforms it overall.
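Of these three, Upsample is simple enough to sketch directly with `numpy.interp`. The `ratio` parameter (the fraction of the sequence kept as the segment) and the uniformly random start position are assumptions for illustration, and the function name `upsample` is hypothetical.

```python
import numpy as np

def upsample(s, ratio, rng):
    """Stretch a random contiguous segment of s (time, channels) to the full length."""
    T = s.shape[0]
    seg_len = min(T, max(2, int(round(ratio * T))))
    start = rng.integers(0, T - seg_len + 1)    # upper bound is exclusive
    seg = s[start:start + seg_len]
    old_t = np.linspace(0.0, 1.0, seg_len)      # segment time axis
    new_t = np.linspace(0.0, 1.0, T)            # target time axis
    cols = [np.interp(new_t, old_t, seg[:, c]) for c in range(s.shape[1])]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
s = np.arange(20.0).reshape(20, 1)              # a linear ramp as x || y
out = upsample(s, ratio=0.5, rng=rng)
```

On the ramp, a 10-step segment is stretched over 20 steps, so the output is still a ramp but with a shallower slope. With `ratio = 1.0` the sequence is returned unchanged.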

From Image Patches to Time Patches

Patch‑based augmentation is mature in computer vision (e.g., PatchShuffle, PatchMix) because images have spatial redundancy. Time series lack such redundancy; naive non‑overlapping patches create hard boundaries and break input‑target alignment. Therefore, patch‑based ideas must be adapted carefully for the temporal domain.

Temporal Patch Shuffle (TPS)

The TPS pipeline is straightforward:

1. Concatenate the look-back window and horizon into one continuous sequence to enforce data-label consistency.

2. Extract overlapping temporal patches using patch length p and stride s (here s is the stride, not the concatenated sequence defined earlier). Overlap ensures smooth transitions after reconstruction.

3. Compute a variance score for each patch (across all channels). Low-variance patches contain fewer structural features and are safer to shuffle.

4. Select the proportion α of patches with the lowest variance and randomly permute their positions; the rest remain unchanged.

5. Reconstruct the sequence by placing each (possibly shuffled) patch back and averaging overlapping regions to smooth discontinuities.

6. Split the reconstructed sequence back into the augmented input and target.

Algorithm 1 (TPS) has three hyper‑parameters: patch length p, stride s, and shuffle proportion α. Instead of exhaustive grid search, the authors evaluate ~20 candidate configurations on a validation set.
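The steps above can be sketched in NumPy as follows. This is an illustrative reading of the description, not the authors' implementation. Assumptions: the variance score pools over time and channels within a patch, the number of shuffled patches is α times the patch count rounded to the nearest integer, and any tail positions not covered by a patch keep their original values. The function name `temporal_patch_shuffle` is hypothetical.

```python
import numpy as np

def temporal_patch_shuffle(seq, patch_len, stride, alpha, rng):
    """TPS sketch on seq of shape (T, C); returns an array of the same shape."""
    T = seq.shape[0]
    starts = np.arange(0, T - patch_len + 1, stride)
    patches = np.stack([seq[i:i + patch_len] for i in starts])   # (N, p, C)

    # Variance score per patch; shuffle only the lowest-variance fraction alpha.
    scores = patches.var(axis=(1, 2))
    n_shuffle = int(round(alpha * len(starts)))
    chosen = np.argsort(scores)[:n_shuffle]
    order = np.arange(len(starts))
    order[chosen] = rng.permutation(chosen)      # permute positions among chosen
    shuffled = patches[order]

    # Overlap-add reconstruction, averaging where patches overlap.
    out = np.zeros_like(seq, dtype=float)
    count = np.zeros((T, 1))
    for slot, i in enumerate(starts):
        out[i:i + patch_len] += shuffled[slot]
        count[i:i + patch_len] += 1
    uncovered = count[:, 0] == 0                 # assumption: keep original tail
    out[uncovered] = seq[uncovered]
    count[uncovered] = 1
    return out / count

rng = np.random.default_rng(0)
x = rng.standard_normal((96, 2))                 # look-back window
y = rng.standard_normal((24, 2))                 # horizon
s_cat = np.concatenate([x, y], axis=0)           # step 1: concatenate
s_aug = temporal_patch_shuffle(s_cat, patch_len=16, stride=8, alpha=0.7, rng=rng)
x_aug, y_aug = s_aug[:96], s_aug[96:]            # step 6: split
```

With `alpha = 0` no patch moves and overlap averaging reproduces the input exactly, which makes the function easy to sanity-check before tuning p, the stride, and α on a validation set.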

Ablation Studies

Key findings from the ablations:

Data‑label consistency is decisive; augmenting only the input while keeping the target unchanged causes the largest performance drop.

Overlapping patches are crucial; replacing them with non‑overlapping patches degrades results noticeably.

Variance-aware ordering provides a modest gain; its benefit disappears when all patches are shuffled (α = 1.0).

Temporal-domain shuffling outperforms the frequency-domain variants tested, indicating that direct time-domain manipulation is the more effective choice for patch shuffling.

Higher shuffle ratios (0.7–1.0) generally yield stronger, stable improvements across datasets.

Overall, the experiments emphasize that augmentation must inject controlled randomness that respects the signal’s temporal structure.

Long‑Term Forecasting Evaluation

TPS was evaluated on nine long‑term forecasting datasets using five recent backbones: TSMixer, DLinear, PatchTST, TiDE, and LightTS. TPS achieved the best average MSE on every backbone and the highest win‑rate. Improvements over the strongest competing augmentation ranged from 2.08% to 10.51%, with LightTS showing the largest gain.

Short‑Term Traffic Forecasting

On four short-term traffic datasets (PeMS-03, PeMS-04, PeMS-07, and PeMS-08) with PatchTST as the backbone, TPS again delivered the strongest augmentation performance, with MSE improvements of 7.14%, 2.34%, 0.00%, and 4.26% respectively. Even on the dataset with zero gain, TPS did not degrade performance.

Extension to Time‑Series Classification

TPS adapts smoothly to classification: the concatenation step is omitted, and shuffling is performed at the sample level rather than batch level. On 30 univariate UCR datasets (MiniRocket) and 10 multivariate UEA datasets (MultiRocket), TPS achieved the highest average accuracy among compared augmentations, improving accuracy by 0.50% and 1.10% respectively.

Summary

TPS’s advantage stems from three combined factors: it avoids costly decomposition steps, does not indiscriminately perturb the entire spectrum, and crucially preserves input‑target alignment. By applying variance‑aware, overlapping patch shuffling directly in the time domain, TPS delivers consistent, architecture‑agnostic gains across long‑term, short‑term, and classification tasks, establishing a new state‑of‑the‑art benchmark for time‑series prediction augmentation.


Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
