Tagged articles

15 articles

Page 1 of 1

Apr 30, 2026 · Artificial Intelligence

Turning Transformers into Mamba: How Apple Linearized Inference Costs

Apple introduced a two‑step cross‑architecture distillation method that converts costly quadratic‑time Transformers into cheaper linear‑time Mamba models, preserving most of the original performance while dramatically reducing inference cost.

AI researchLinear AttentionMamba

0 likes · 8 min read

Turning Transformers into Mamba: How Apple Linearized Inference Costs

Machine Learning Algorithms & Natural Language Processing

Apr 22, 2026 · Artificial Intelligence

Turning Transformers into Mamba: A Cross‑Architecture Distillation That Linearizes Inference Cost

The article presents a two‑step cross‑architecture distillation method that replaces the quadratic softmax attention of Transformers with a learned linear attention and then maps it onto a Mamba backbone, achieving near‑teacher performance while reducing inference cost to linear time.

Cross‑ArchitectureLinear AttentionMamba

0 likes · 8 min read

Turning Transformers into Mamba: A Cross‑Architecture Distillation That Linearizes Inference Cost

Machine Heart

Apr 22, 2026 · Artificial Intelligence

Apple Turns Transformers into Mamba with Linear‑Cost Distillation

Apple proposes a two‑step cross‑architecture distillation that converts expensive, high‑performing Transformers into cheaper, nearly equally strong Mamba models by first replacing softmax attention with learned linear attention (Hedgehog) and then embedding this intermediate form into Mamba, achieving comparable perplexity and downstream task performance with far lower inference cost.

Artificial IntelligenceLinear AttentionMamba

0 likes · 7 min read

Apple Turns Transformers into Mamba with Linear‑Cost Distillation

Bighead's Algorithm Notes

Dec 30, 2025 · Artificial Intelligence

MaGNet: Dual‑Hypergraph Mamba Network for Time‑Causal and Global Stock Trend Forecasting

MaGNet introduces a three‑component architecture—MAGE block with bidirectional Mamba, adaptive gating and sparse MoE, 2‑D spatio‑temporal attention, and a dual hypergraph framework (time‑causal and global probability hypergraphs)—that outperforms 17 baselines on six major stock indices in both prediction accuracy and risk‑adjusted returns.

Financial AIHypergraphMaGNet

0 likes · 14 min read

MaGNet: Dual‑Hypergraph Mamba Network for Time‑Causal and Global Stock Trend Forecasting

Instant Consumer Technology Team

Aug 20, 2025 · Artificial Intelligence

Nvidia Unveils Nemotron‑Nano‑9B‑v2: Tiny Open‑Source LLM with Switchable Reasoning

Nvidia’s newly released Nemotron‑Nano‑9B‑v2, a 9‑billion‑parameter open‑source LLM optimized for a single Nvidia A10 GPU, introduces a toggleable reasoning mode and budget controls, delivering up to six‑fold speed gains, multilingual support, and strong benchmark results across various tasks.

AI inferenceMambaNVIDIA

0 likes · 5 min read

Nvidia Unveils Nemotron‑Nano‑9B‑v2: Tiny Open‑Source LLM with Switchable Reasoning

Data Party THU

Aug 5, 2025 · Artificial Intelligence

Why State Space Models May Outperform Transformers: A Deep Dive

The article provides a comprehensive technical analysis of state space models (SSM) versus Transformers, covering their core mechanisms, three essential design factors, training efficiency, scaling behavior, tokenization debates, and experimental evidence that highlights the trade‑offs and potential advantages of SSMs in modern AI systems.

MambaState Space ModelTransformer

0 likes · 21 min read

Why State Space Models May Outperform Transformers: A Deep Dive

AIWalker

Jun 24, 2025 · Artificial Intelligence

Mamba-Adaptor Merges Adaptor‑T and Adaptor‑S to Revolutionize Vision Tasks with State‑of‑the‑Art Benchmarks

The paper introduces Mamba-Adaptor, a plug‑and‑play module combining Adaptor‑T and Adaptor‑S to overcome causal computation, long‑range forgetting, and spatial modeling limits of visual Mamba models, delivering top‑ranked results on ImageNet and COCO across multiple downstream tasks.

AdaptorMambaState Space Model

0 likes · 25 min read

Mamba-Adaptor Merges Adaptor‑T and Adaptor‑S to Revolutionize Vision Tasks with State‑of‑the‑Art Benchmarks

AI Frontier Lectures

Jun 7, 2025 · Artificial Intelligence

Can MaIR’s Locality‑Preserving Mamba Boost Image Restoration?

The article presents MaIR, a locality‑ and continuity‑preserving Mamba‑based model for image restoration, detailing its three‑stage architecture, novel scanning strategy, loss functions, experimental results on super‑resolution and denoising, and ablation studies, with links to the arXiv paper and source code.

DenoisingMambacomputer vision

0 likes · 5 min read

Can MaIR’s Locality‑Preserving Mamba Boost Image Restoration?

AI Frontier Lectures

Jun 3, 2025 · Artificial Intelligence

How MaIR Advances Image Restoration with a Locality‑Preserving Mamba Architecture

The article presents MaIR, a Mamba‑based image restoration model that preserves locality and continuity, detailing its architecture, scanning strategies, loss functions, experimental results on super‑resolution and denoising, and an ablation study, while providing links to the arXiv paper and GitHub source code.

DenoisingMambacomputer vision

0 likes · 5 min read

How MaIR Advances Image Restoration with a Locality‑Preserving Mamba Architecture

AI Frontier Lectures

May 10, 2025 · Artificial Intelligence

Can the ‘Canon’ Layer Unlock New Limits in Large Language Models?

A new study introduces the lightweight “Canon” layer for large language models, showing how it improves information flow, inference depth, and scalability across Transformers, linear attention, and state‑space architectures, while offering a controlled synthetic pre‑training benchmark for deeper architectural analysis.

AI researchMambaModel architecture

0 likes · 11 min read

Can the ‘Canon’ Layer Unlock New Limits in Large Language Models?

AI Frontier Lectures

Mar 14, 2025 · Artificial Intelligence

Do Vision Models Really Need Mamba? A Deep Dive into MambaOut

This article critically examines the MambaOut paper, analyzing whether state‑space‑based Mamba token mixers are necessary for vision tasks, presenting two hypotheses, describing the construction of MambaOut models without SSM, and reporting extensive ImageNet, COCO and ADE20K experiments that reveal when Mamba is beneficial.

MambaState Space ModelToken Mixer

0 likes · 17 min read

Do Vision Models Really Need Mamba? A Deep Dive into MambaOut

AIWalker

Mar 11, 2025 · Artificial Intelligence

MobileMamba: Lightweight Multi‑Receptive‑Field Backbone Beats Existing Mamba Models

MobileMamba introduces a three‑stage, lightweight backbone with a multi‑receptive‑field feature‑interaction module that combines wavelet‑enhanced Mamba, multi‑kernel depthwise convolutions, and redundant‑mapping reduction, delivering up to 83.6% ImageNet Top‑1 accuracy while running 21× faster than LocalVim and 3.3× faster than EfficientVMamba.

CNNMambaMobileMamba

0 likes · 10 min read

MobileMamba: Lightweight Multi‑Receptive‑Field Backbone Beats Existing Mamba Models

AIWalker

Mar 10, 2025 · Artificial Intelligence

HSR-Mamba Solves Mamba’s HSISR Issue with Dual Strategies, Beats Prior Methods

HSR-Mamba introduces a contextual spatial‑spectral state‑space model that tackles Mamba's limitations in hyperspectral image super‑resolution through a local partition mechanism and a global spectral rearrangement strategy, achieving significantly higher PSNR, SSIM and SAM scores than existing approaches while using fewer parameters and FLOPs.

Dual strategyHSI super-resolutionMamba

0 likes · 25 min read

HSR-Mamba Solves Mamba’s HSISR Issue with Dual Strategies, Beats Prior Methods

NewBeeNLP

Apr 2, 2024 · Artificial Intelligence

Jamba: How AI21 Labs Merged Mamba and Transformer for 3× Faster 128k Contexts

Jamba, a hybrid Mamba‑Transformer model from AI21 Labs, combines state‑space and attention layers with Mixture‑of‑Experts to deliver up to three times the throughput of comparable 52‑billion‑parameter LLMs on 128k context windows while maintaining high output quality and low memory usage.

JambaLLMMamba

0 likes · 6 min read

Jamba: How AI21 Labs Merged Mamba and Transformer for 3× Faster 128k Contexts

NewBeeNLP

Mar 4, 2024 · Artificial Intelligence

A Curated Tour of Mamba Papers: 25 Cutting‑Edge State‑Space Model Innovations

This article presents a GitHub‑hosted collection of 25 recent research papers on Mamba and its variants, summarizing each work’s core contributions across sequence modeling, vision, medical imaging, graph analysis, and multimodal tasks, and highlighting their performance gains over prior methods.

MambaSequence Modelingcomputer vision

0 likes · 13 min read

A Curated Tour of Mamba Papers: 25 Cutting‑Edge State‑Space Model Innovations