
Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting

Pyraformer introduces a pyramidal attention mechanism that captures long-range dependencies in time-series data with linear time and space complexity, achieving state-of-the-art forecasting accuracy on multiple real-world datasets at lower computational cost, as demonstrated in the extensive experiments reported at ICLR 2022.


Artificial intelligence is rapidly reducing costs and being applied to everyday work, decision support, risk management, and operational optimization. In this context, Ant Group and Shanghai Jiao Tong University co-authored the paper "Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting," which was accepted as an oral presentation at ICLR 2022 (oral acceptance rate 1.6%). The paper has been cited over 846 times.

The ICLR 2022 program chair praised the work for proposing a multi-resolution pyramidal attention mechanism that captures long-range dependencies with linear time and space complexity, providing extensive experimental and ablation evidence that it consistently outperforms the state of the art.

Pyraformer addresses the core challenge of time‑series forecasting: building a model that is both powerful and lightweight enough to capture short‑ and long‑range temporal patterns. By introducing a Pyramid Attention Module (PAM) that aggregates features across multiple resolutions, the maximum signal‑propagation path length becomes constant (O(1)), while both time and memory costs grow linearly with sequence length.

Key contributions include:

Design of a pyramidal attention mechanism with cross‑scale and intra‑scale connections, enabling compact multi‑resolution representations.

Theoretical proof that the longest path length can be reduced to O(1) while maintaining O(L) time and space complexity.

Extensive experiments on real‑world datasets (Electricity, Wind, App Flow, etc.) showing superior accuracy in both single‑step and long‑range multi‑step forecasting, with lower computational overhead compared to Transformer, Longformer, Reformer, Informer, and other sparse‑attention variants.
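To make the pyramidal graph concrete, here is a minimal sketch (not the authors' implementation) of how a PAM-style attention mask could be built. The hyperparameter names are illustrative: `C` finer-scale children per coarser node, `A` intra-scale neighbors, and `S` scales; the official CUDA/TVM code is in the linked repository.

```python
import numpy as np

def pyramid_mask(L, C=4, A=3, S=3):
    """Boolean attention mask for a pyramidal graph (illustrative sketch).

    L: length of the finest (input) sequence
    C: number of finer-scale children per coarser node
    A: size of the intra-scale neighborhood window (odd)
    S: number of scales in the pyramid
    Nodes are laid out scale by scale: scale 0 first, then scale 1, etc.
    """
    sizes = [max(1, L // C**s) for s in range(S)]   # nodes per scale
    offsets = np.cumsum([0] + sizes)                # start index of each scale
    N = offsets[-1]
    mask = np.zeros((N, N), dtype=bool)

    for s in range(S):
        n, off = sizes[s], offsets[s]
        for i in range(n):
            u = off + i
            # intra-scale connections: A nearest neighbors at the same scale
            lo, hi = max(0, i - A // 2), min(n, i + A // 2 + 1)
            mask[u, off + lo: off + hi] = True
            # cross-scale connections: child <-> parent at the next scale
            if s + 1 < S:
                parent = offsets[s + 1] + min(i // C, sizes[s + 1] - 1)
                mask[u, parent] = True
                mask[parent, u] = True
    return mask
```

Because every node keeps at most `A + 1 + C` connections, the number of allowed query-key pairs grows linearly in `L`, while any two time steps are connected through at most a few hops up and down the pyramid, which is what yields the constant maximum path length.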

Experimental results demonstrate that Pyraformer achieves the lowest NRMSE and ND scores while using the fewest Q‑K attention pairs, reducing them by up to 96.6% relative to full attention. In long‑range forecasting on ETTh1 and ETTm1, Pyraformer reduces MSE by 24.8%–28.9% compared to Informer.
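The scale of the Q-K pair savings can be sketched with a simple upper-bound count. The hyperparameters below are illustrative assumptions, not the paper's exact settings, so the resulting percentage is only of the same order as the reported 96.6%.

```python
def attention_pairs(L, C=4, A=3, S=3):
    """Upper bound on query-key pairs in a pyramidal graph vs. full attention.

    Assumes C children per coarser node, A intra-scale neighbors,
    and S scales (illustrative values, not the paper's exact settings).
    """
    sizes = [max(1, L // C**s) for s in range(S)]   # nodes per scale
    N = sum(sizes)                                  # total nodes across scales
    # each node attends to at most A same-scale neighbors, 1 parent, C children
    pyramid = N * (A + 1 + C)
    full = L * L
    return pyramid, full, 1 - pyramid / full
```

For a length-720 history, the pyramidal graph needs only a few thousand pairs against roughly half a million for full attention, a reduction well above 90%.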

Efficiency is further validated by a custom CUDA kernel for PAM, implemented with TVM, which shows faster computation and lower GPU memory usage than ProbSparse and full attention as sequence length grows.

For readers interested in the original work, the paper can be accessed at https://iclr.cc/virtual/2022/oral/6828, and the code repository is available at https://github.com/ant-research/Pyraformer.

Tags: deep learning, time series forecasting, ICLR 2022, low complexity, Pyraformer, pyramidal attention
Written by

AntTech

Technology is the core driver of Ant's future creation.
