
DeWu Technology
Mar 13, 2024 · Artificial Intelligence

Extending Context Length in LLaMA Models: Structures, Challenges, and Techniques

The article reviews LLaMA’s Transformer and RoPE architecture, explains why its context windows (4K‑128K tokens) are limited, and evaluates industry‑proven extension techniques—including linear, NTK‑aware, and YaRN interpolation plus LongLoRA sparse attention—while addressing memory and quadratic‑cost challenges and presenting a KubeAI workflow for fine‑tuning and deployment.
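As a taste of the techniques the article surveys, linear position interpolation rescales RoPE position indices so that an extended sequence is squeezed back into the position range the model saw during training. The sketch below is a minimal NumPy illustration (the function name, dimensions, and scale factor are illustrative assumptions, not the article's code):

```python
import numpy as np

def rope_angles(positions, dim=64, base=10000.0, scale=1.0):
    """Rotation angles for RoPE; scale < 1 implements linear position interpolation.

    Illustrative sketch only -- names and defaults are assumptions.
    """
    # inv_freq[i] = base^(-2i/dim): one rotation frequency per 2-D pair
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # Linear interpolation multiplies positions by scale = L_train / L_target,
    # compressing extended positions into the trained [0, L_train) range
    return np.outer(np.asarray(positions) * scale, inv_freq)

# Example: a model trained on 4K tokens extended to an 8K window
orig_len, new_len = 4096, 8192
angles_plain = rope_angles(np.arange(new_len))
angles_interp = rope_angles(np.arange(new_len), scale=orig_len / new_len)

# Interpolated angles are exactly the plain angles scaled by L_train / L_target,
# so position new_len - 1 behaves like position (new_len - 1) / 2 at train time
assert np.allclose(angles_interp, angles_plain * (orig_len / new_len))
```

NTK-aware and YaRN interpolation refine this idea by scaling high- and low-frequency components differently instead of applying one uniform factor, which the body of the article covers in detail.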

Tags: AI · Context Extension · LLaMA · RoPE
17 min read