Time-Series Forecasting Augmentation: Frequency, Decomposition, and Patch Methods Compared

This article examines why augmentation is difficult for time-series forecasting, reviews the mainstream techniques (frequency-domain, decomposition, and patch-based methods), and reports experiments in which Temporal Patch Shuffle (TPS) achieves the best average performance across long-term forecasting, short-term forecasting, and classification tasks.

Data Party THU

Why Classification‑Oriented Augmentation Fails for Forecasting

Classic augmentation techniques such as jittering, scaling, window warping, and permutation were designed for classification where the label is static. In forecasting the label is a future continuous segment; perturbing only the input breaks the input‑target alignment, causing performance drops.

Data‑Label Consistency: A Necessary Condition

Let the look‑back window be x and the prediction horizon be y. The training object is the concatenated sequence s = x ∥ y, not x alone. Augmentation must be applied to s before splitting back into (\tilde{x}, \tilde{y}). This preserves the continuity between input and target.

s = x ∥ y, \tilde{s} = \mathcal{A}(s), (\tilde{x}, \tilde{y}) = Split(\tilde{s})

Figure 1 illustrates the pipeline: the look‑back window and horizon are concatenated, augmented, then split to keep input‑target alignment.
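
The concatenate, augment, split pipeline can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the original work: the helper name `augment_consistently` and the stand-in augmentation `scale` are hypothetical, and the time axis is assumed to come first.

```python
import numpy as np

def augment_consistently(x, y, augment, rng):
    """Apply `augment` to s = x || y, then split back into (x_tilde, y_tilde)."""
    s = np.concatenate([x, y], axis=0)        # shape (L + H, C), time axis first
    s_tilde = augment(s, rng)
    assert s_tilde.shape == s.shape            # the augmentation must keep the length
    return s_tilde[: len(x)], s_tilde[len(x):]

def scale(s, rng):
    # Hypothetical stand-in augmentation: one random amplitude factor
    # shared by the whole concatenated sequence.
    return s * rng.uniform(0.8, 1.2)

rng = np.random.default_rng(0)
x = np.arange(8, dtype=float).reshape(-1, 1)      # look-back window
y = np.arange(8, 12, dtype=float).reshape(-1, 1)  # prediction horizon
x_t, y_t = augment_consistently(x, y, scale, rng)
```

Because the factor is drawn once for s, the input and the target are scaled identically, which is exactly the alignment that perturbing x alone would break.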

Taxonomy of Forecasting Augmentation Methods

Effective recent methods fall into three streams: frequency‑domain, decomposition‑based, and controlled signal‑level manipulation. The list below groups the compared methods, with the patch‑based TPS listed separately.

Frequency‑based: RobustTAD, FreqMask, FreqMix, WaveMask, WaveMix, Dominant Shuffle.

Decomposition‑based: STAug.

Other: wDBA, MBB, Upsample.

Patch‑based: TPS.

Frequency‑Domain Methods

RobustTAD

RobustTAD applies a discrete Fourier transform to the concatenated sequence, perturbs selected frequency bands, and transforms the result back to the time domain. The perturbation can replace amplitudes with samples from a Gaussian distribution or add a small offset to phases. Although originally proposed for anomaly detection, the amplitude‑perturbation variant is used in multivariate forecasting experiments.

FreqMask and FreqMix

Both start with a real FFT of the concatenated sequence:

s = x ∥ y, S = rFFT(s)

FreqMask applies a binary mask M that zeroes out selected frequencies:

S̃ = M ⊙ S, \tilde{s} = irFFT(S̃)

FreqMix mixes the spectra S₁ and S₂ of two sequences using the same kind of binary mask:

S̃ = M ⊙ S₁ + (1 - M) ⊙ S₂, \tilde{s} = irFFT(S̃)

These operations are simple, but every Fourier coefficient spans the whole window, so the perturbation cannot be localized in time.
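
A minimal NumPy sketch of the two formulas follows. It assumes a random Bernoulli mask with a single `rate` parameter, which is an illustrative choice; the way the original methods select the masked frequencies is not specified in this article.

```python
import numpy as np

def freq_mask(s, rate, rng):
    """Zero out a random subset of rFFT bins of s, shape (T, C)."""
    S = np.fft.rfft(s, axis=0)
    M = rng.random(S.shape) >= rate           # True = keep the bin (assumed mask rule)
    return np.fft.irfft(M * S, n=s.shape[0], axis=0)

def freq_mix(s1, s2, rate, rng):
    """Take masked bins from s2 and the remaining bins from s1."""
    S1 = np.fft.rfft(s1, axis=0)
    S2 = np.fft.rfft(s2, axis=0)
    M = rng.random(S1.shape) >= rate
    return np.fft.irfft(M * S1 + (~M) * S2, n=s1.shape[0], axis=0)

rng = np.random.default_rng(0)
s1 = rng.standard_normal((48, 2))             # two concatenated sequences x || y
s2 = rng.standard_normal((48, 2))
masked = freq_mask(s1, rate=0.2, rng=rng)
mixed = freq_mix(s1, s2, rate=0.2, rng=rng)
```

Passing `n=s.shape[0]` to `irfft` keeps the output length equal to the input length for both even and odd T.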

WaveMask and WaveMix

Wavelet transforms retain time localization. The discrete wavelet transform (DWT) decomposes s into multi‑level coefficients W^{(l)}. Masking or mixing is applied per level:

s = x ∥ y
W = WaveDec(s) = {W^{(1)}, W^{(2)}, …, W^{(L+1)}}
WaveMask: \tilde{W}^{(l)} = M^{(l)} ⊙ W^{(l)}
WaveMix:  \tilde{W}^{(l)} = M^{(l)} ⊙ W^{(l)}_1 + (1 - M^{(l)}) ⊙ W^{(l)}_2
s̃ = WaveRec(\tilde{W})

WaveMask zeroes out selected wavelet coefficients; WaveMix swaps coefficients between two sequences. Experiments show that the two methods outperform all baselines on 12 out of 16 prediction horizons.
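
The per-level masking and mixing can be illustrated with a hand-written Haar transform. This is a sketch under stated assumptions: the Haar wavelet, the single shared `rate`, and the helper names `haar_dec` and `haar_rec` are choices made here to keep the example self-contained, and the series length must be divisible by 2^levels. In practice a wavelet library such as PyWavelets would replace the two helpers.

```python
import numpy as np

def haar_dec(s, levels):
    """Multi-level orthonormal Haar DWT of a 1-D array: [cA_L, cD_L, ..., cD_1]."""
    coeffs, a = [], np.asarray(s, dtype=float)
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        coeffs.append((even - odd) / np.sqrt(2))   # detail coefficients
        a = (even + odd) / np.sqrt(2)              # approximation coefficients
    coeffs.append(a)
    return coeffs[::-1]                            # L + 1 coefficient arrays

def haar_rec(coeffs):
    """Inverse of haar_dec."""
    a = coeffs[0]
    for d in coeffs[1:]:
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def wave_mask(s, levels, rate, rng):
    # A fresh random mask M^(l) is drawn for every level.
    W = haar_dec(s, levels)
    return haar_rec([w * (rng.random(w.shape) >= rate) for w in W])

def wave_mix(s1, s2, levels, rate, rng):
    W1, W2 = haar_dec(s1, levels), haar_dec(s2, levels)
    mixed = [np.where(rng.random(w1.shape) >= rate, w1, w2)
             for w1, w2 in zip(W1, W2)]
    return haar_rec(mixed)

rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(16), rng.standard_normal(16)
masked = wave_mask(s1, levels=3, rate=0.3, rng=rng)
mixed = wave_mix(s1, s2, levels=3, rate=0.3, rng=rng)
```

Unlike a Fourier bin, each wavelet coefficient covers a limited stretch of the series, so a masked or swapped coefficient changes only that stretch.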

Dominant Shuffle

Dominant Shuffle selects the top‑k dominant frequencies from the FFT and shuffles only those components:

S = FFT(s)
Ω_k = indices of top‑k dominant frequencies
S̃_{Ω_k} = Shuffle(S_{Ω_k})
s̃ = IFFT(S̃)

This conservative approach avoids over‑perturbing the spectrum. In the unified TPS benchmark it is not the strongest method.
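
The four steps above can be sketched as follows. The sketch makes two assumptions that are not taken from the source: it uses the real FFT so that the output is guaranteed to be real, and it excludes the DC term so that the series mean is preserved.

```python
import numpy as np

def dominant_shuffle(s, k, rng):
    """Shuffle the k largest-magnitude non-DC rFFT components of a 1-D series."""
    S = np.fft.rfft(s)
    mag = np.abs(S)
    mag[0] = -np.inf                       # assumption: never select the DC term
    top_k = np.argsort(mag)[-k:]           # indices of the k dominant frequencies
    S_tilde = S.copy()
    S_tilde[top_k] = S[rng.permutation(top_k)]   # permute only those components
    return np.fft.irfft(S_tilde, n=len(s))

rng = np.random.default_rng(0)
t = np.arange(64)
s = np.sin(2 * np.pi * 3 * t / 64) + 0.5 * np.sin(2 * np.pi * 7 * t / 64)
out = dominant_shuffle(s, k=2, rng=rng)
```

All other coefficients are left untouched, which is what makes the method conservative compared with masking or mixing the whole spectrum.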

Decomposition‑Based Method: STAug

STAug applies Empirical Mode Decomposition (EMD) to two sequences, obtains intrinsic mode functions (IMFs), and recombines them with mixup‑style interpolation weights sampled from a uniform distribution. The resulting sequence mixes temporal features from both inputs.

STAug inherits the high memory consumption of EMD; the TPS experiments could not evaluate it on the ECL and Traffic datasets due to GPU memory limits, as also noted in the original paper.

Beyond Frequency: wDBA, MBB, Upsample

wDBA aligns sequences with Dynamic Time Warping (DTW) and averages them to synthesize new samples; it yields high‑quality data but is computationally expensive. MBB decomposes a series into trend, seasonality, and residual via STL, then bootstraps blocks from the residual. Upsample simply selects a contiguous segment and linearly interpolates it back to the original length, acting as a strong non‑frequency baseline. In the TPS benchmark, Upsample consistently ranks near the top, though TPS still outperforms it overall.
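
Of the three, Upsample is simple enough to sketch directly. The version below is an illustration for a 1-D series; the `ratio` parameter and the uniformly random start position are assumptions, not details given in the source.

```python
import numpy as np

def upsample(s, ratio, rng):
    """Crop a contiguous segment covering `ratio` of s and stretch it to len(s)."""
    T = len(s)
    seg_len = max(2, int(round(ratio * T)))
    start = rng.integers(0, T - seg_len + 1)       # random crop position (assumed)
    segment = s[start : start + seg_len]
    old_grid = np.linspace(0.0, 1.0, seg_len)
    new_grid = np.linspace(0.0, 1.0, T)
    return np.interp(new_grid, old_grid, segment)  # linear interpolation

rng = np.random.default_rng(0)
ramp = np.arange(32, dtype=float)
stretched = upsample(ramp, ratio=0.5, rng=rng)
```

As with the other methods, the function would be applied to the concatenated sequence and the result split back into input and target.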

From Image Patches to Time Patches

Patch‑based augmentation is mature in computer vision (e.g., PatchShuffle, PatchMix) because images have spatial redundancy. Time series lack such redundancy; naive non‑overlapping patches create hard boundaries and break input‑target alignment. Therefore, patch‑based ideas must be adapted carefully for the temporal domain.

Temporal Patch Shuffle (TPS)

The TPS pipeline is straightforward:

Concatenate look‑back window and horizon into a continuous sequence to enforce data‑label consistency.

Extract overlapping temporal patches using patch length p and stride s (here s denotes the stride, not the concatenated sequence defined earlier). Overlap ensures smooth transitions after reconstruction.

Compute a variance score for each patch (across all channels). Low‑variance patches contain fewer structural features and are safer to shuffle.

Select the lowest‑variance proportion α of patches and randomly permute their positions; the rest remain unchanged.

Reconstruct the sequence by placing each (possibly shuffled) patch back and averaging overlapping regions to smooth discontinuities.

Split the reconstructed sequence back into the augmented input and target.

Algorithm 1 (TPS) has three hyper‑parameters: patch length p, stride s, and shuffle proportion α. Instead of exhaustive grid search, the authors evaluate ~20 candidate configurations on a validation set.
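
The pipeline can be approximated by the NumPy sketch below. It is not the authors' Algorithm 1: how the variance is aggregated, how a tail not covered by any patch is handled, and the fact that one sequence is processed at a time are all assumptions made for illustration.

```python
import numpy as np

def tps(seq, patch_len, stride, alpha, rng):
    """Sketch of Temporal Patch Shuffle on seq with shape (T, C)."""
    T = seq.shape[0]
    starts = np.arange(0, T - patch_len + 1, stride)
    patches = np.stack([seq[i : i + patch_len] for i in starts])   # (N, p, C)

    # Variance score per patch, taken over time steps and channels.
    scores = patches.var(axis=(1, 2))
    n_shuffle = int(round(alpha * len(starts)))
    chosen = np.argsort(scores)[:n_shuffle]        # lowest-variance patches

    # Permute the chosen patches among their own positions only.
    order = np.arange(len(starts))
    order[chosen] = rng.permutation(chosen)
    shuffled = patches[order]

    # Put patches back and average wherever they overlap.
    out = np.zeros_like(seq, dtype=float)
    count = np.zeros((T, 1))
    for start, patch in zip(starts, shuffled):
        out[start : start + patch_len] += patch
        count[start : start + patch_len] += 1
    covered = count[:, 0] > 0
    out[covered] /= count[covered]
    out[~covered] = seq[~covered]                  # assumed: uncovered tail is kept
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((96, 3))                   # look-back window
y = rng.standard_normal((24, 3))                   # prediction horizon
s = np.concatenate([x, y], axis=0)
s_tilde = tps(s, patch_len=16, stride=8, alpha=0.7, rng=rng)
x_tilde, y_tilde = s_tilde[:96], s_tilde[96:]
```

With alpha = 0 no patch moves and the averaging step returns the original sequence, which is a convenient sanity check for the reconstruction.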

Ablation Studies

Key findings from the ablations:

Data‑label consistency is decisive; augmenting only the input while keeping the target unchanged causes the largest performance drop.

Overlapping patches are crucial; replacing them with non‑overlapping patches degrades results noticeably.

Variance‑aware ordering provides a modest gain; its benefit disappears when all patches are shuffled (α = 1.0).

Temporal‑domain shuffling outperforms frequency‑domain variants, confirming that direct time‑domain manipulation is most effective.

Higher shuffle ratios (0.7–1.0) generally yield stronger, stable improvements across datasets.

Overall, the experiments emphasize that augmentation must inject controlled randomness that respects the signal’s temporal structure.

Long‑Term Forecasting Evaluation

TPS was evaluated on nine long‑term forecasting datasets using five recent backbones: TSMixer, DLinear, PatchTST, TiDE, and LightTS. TPS achieved the best average MSE on every backbone and the highest win‑rate. Improvements over the strongest competing augmentation ranged from 2.08% to 10.51%, with LightTS showing the largest gain.

Short‑Term Traffic Forecasting

On four short‑term traffic datasets (PeMS‑03, 04, 07, 08) using PatchTST as the backbone, TPS again delivered the strongest augmentation performance, with MSE improvements of 7.14%, 2.34%, 0.00%, and 4.26% respectively. On PeMS‑07, where the gain was zero, TPS still did not degrade performance.

Extension to Time‑Series Classification

TPS adapts smoothly to classification: the concatenation step is omitted, and shuffling is performed at the sample level rather than batch level. On 30 univariate UCR datasets (MiniRocket) and 10 multivariate UEA datasets (MultiRocket), TPS achieved the highest average accuracy among compared augmentations, improving accuracy by 0.50% and 1.10% respectively.

Summary

TPS’s advantage stems from three combined factors: it avoids costly decomposition steps, does not indiscriminately perturb the entire spectrum, and crucially preserves input‑target alignment. By applying variance‑aware, overlapping patch shuffling directly in the time domain, TPS delivers consistent, architecture‑agnostic gains across long‑term, short‑term, and classification tasks, establishing a new state‑of‑the‑art benchmark for time‑series prediction augmentation.
