Tagged articles
4 articles
Data Party THU
Feb 4, 2026 · Artificial Intelligence

How Sakana AI Redefines Long-Context Transformers: DroPE, REPO, and FwPKM Explained

This article analyzes Sakana AI's three recent papers that challenge traditional Transformer long‑sequence handling by removing positional embeddings, reconstructing position awareness, and adding a fast‑weight external memory, showing how each approach improves ultra‑long text understanding.

Memory Mechanism · Positional Embedding · Transformer
12 min read
Huawei Cloud Developer Alliance
Oct 25, 2023 · Artificial Intelligence

Unlocking GLM & ChatGLM: Deep Dive into MindSpore Large‑Model Techniques

The MindSpore Season 2 open class surveys the architectures from GLM to ChatGLM, positional‑embedding strategies, and optimizations for stable training, gives step‑by‑step instructions for deploying large language models with Ascend, ModelArts, and MindSpore Transformers, and previews upcoming multimodal remote‑sensing sessions.

Artificial Intelligence · ChatGLM · GLM
6 min read
Code DAO
Dec 8, 2021 · Artificial Intelligence

Understanding Compact Transformers: Build and Train Vision & NLP Models on a Personal PC

This article walks through the design of Compact Transformers, explaining scaled dot‑product self‑attention, positional embeddings, multi‑head attention, and Vision Transformer architecture, and provides full PyTorch code so readers can train lightweight CV and NLP classifiers on a single PC.

Compact Transformers · Patch Embedding · Positional Embedding
19 min read