Author

Machine Learning Algorithms & Natural Language Processing

Focused on frontier AI technologies, empowering AI researchers' progress.

322

Articles

Likes

240

Views

Comments

Latest from Machine Learning Algorithms & Natural Language Processing

100 recent articles max

Machine Learning Algorithms & Natural Language Processing

May 20, 2026 · Artificial Intelligence

MLNLP 2026 Symposium: Top AI Scholars from Qiyuan Lab, BIT, Tsinghua & Alibaba Reveal New Agent and Table Research

The MLNLP 2026 academic symposium on May 31 will feature leading AI researchers from Qiyuan Lab, Beijing Institute of Technology, Tsinghua University and Alibaba presenting cutting‑edge work on autonomous agents, table intelligence, multi‑agent learning environments, and the future of general agents.

AI ConferenceChinaLarge Language Models

0 likes · 8 min read

MLNLP 2026 Symposium: Top AI Scholars from Qiyuan Lab, BIT, Tsinghua & Alibaba Reveal New Agent and Table Research

Machine Learning Algorithms & Natural Language Processing

May 20, 2026 · Artificial Intelligence

How 800 Data Points Halve LLM Chain‑of‑Thought Length and Boost Accuracy

The ICLR‑2026 paper introduces LCPO, a lightweight preference‑optimization technique that uses only 800 curated examples and 50 training steps to cut large‑model chain‑of‑thought generation length by about 50% while maintaining or even improving answer accuracy, dramatically reducing training and inference costs.

Efficient InferenceLCPOLarge Language Models

0 likes · 8 min read

How 800 Data Points Halve LLM Chain‑of‑Thought Length and Boost Accuracy

Machine Learning Algorithms & Natural Language Processing

May 20, 2026 · Artificial Intelligence

Can 99% Sparse Transformers Run Faster? Insights from the ‘Attention Is All You Need’ Authors

The paper shows that applying lightweight L1 regularization can make over 99% of FFN activations zero, and by using a new tile‑wise ELLPACK (TwELL) format together with a hybrid routing scheme, inference speed improves up to 30% while memory usage drops over 24% and energy consumption is reduced, all with negligible impact on downstream task performance.

CUDAGPU optimizationHybrid Routing

0 likes · 8 min read

Can 99% Sparse Transformers Run Faster? Insights from the ‘Attention Is All You Need’ Authors

Machine Learning Algorithms & Natural Language Processing

May 20, 2026 · Artificial Intelligence

Composer 2.5 Narrows the Gap to Claude Opus 4.7 with Ten‑Fold Cost Savings

Composer 2.5, the latest AI‑coding model from Cursor, claims near‑par performance with Claude 4.7 Opus and GPT‑5.5 while delivering up to ten‑times higher efficiency and a pricing model of $0.5 per M input tokens and $2.5 per M output tokens, backed by novel reinforcement‑learning tricks, massive synthetic data, and a custom Muon optimizer with dual‑grid HSDP architecture.

AI programmingComposer 2.5HSDP

0 likes · 13 min read

Composer 2.5 Narrows the Gap to Claude Opus 4.7 with Ten‑Fold Cost Savings

Machine Learning Algorithms & Natural Language Processing

May 20, 2026 · Artificial Intelligence

How New LLM Architectures Like Gemma 4 and DeepSeek V4 Cut Long‑Context Costs

The article surveys recent open‑weight LLM releases—Gemma 4, Laguna XS.2, ZAYA1‑8B and DeepSeek V4—detailing how KV‑cache sharing, per‑layer embeddings, layer‑wise attention budgeting, compressed convolutional attention and manifold‑constrained hyper‑connections dramatically reduce memory and compute for ultra‑long contexts while preserving model quality.

Attention optimizationKV CacheLLM

0 likes · 25 min read

How New LLM Architectures Like Gemma 4 and DeepSeek V4 Cut Long‑Context Costs

Machine Learning Algorithms & Natural Language Processing

May 19, 2026 · Artificial Intelligence

Dynamic Memory Forest: Precise Long‑Dialogue Tracking for Highly Coherent Responses

The paper introduces the Dynamic Memory Forest (DMF) framework, inspired by human memory consolidation and growth, which transforms fragmented long‑term dialogue histories into structured memory trees, enabling entropy‑driven walks and grafting mechanisms that markedly improve coherence and efficiency of LLM responses.

Dynamic Memory ForestEntropy-Driven WalkLLM Memory

0 likes · 11 min read

Dynamic Memory Forest: Precise Long‑Dialogue Tracking for Highly Coherent Responses

Machine Learning Algorithms & Natural Language Processing

May 19, 2026 · Artificial Intelligence

From P(y|x) to P(y): Reinforcement Learning in Pre‑train Space Unlocks Endogenous Reasoning

The paper introduces PreRL, which removes the input condition to directly optimize the reasoning trajectory (P(y)) of large language models, and combines it with standard RL in Dual Space RL (DSRL), achieving consistent gains on math and out‑of‑distribution benchmarks, faster training, and richer reasoning behaviors.

DSRLLarge Language ModelsPreRL

0 likes · 11 min read

From P(y|x) to P(y): Reinforcement Learning in Pre‑train Space Unlocks Endogenous Reasoning

Machine Learning Algorithms & Natural Language Processing

May 17, 2026 · Artificial Intelligence

Why This Open‑Source Claude Code Pipeline Has Earned 6.4k Stars for AI‑Powered Paper Writing

The article presents the open‑source ARS (academic‑research‑skills) pipeline that stitches together four Claude Code skills—research, writing, review, and orchestration—detailing its agent architecture, citation verification, integrity gates, anti‑flattery mechanisms, three‑layer data isolation, cost, token usage, and installation steps.

AI writingClaudeLLM

0 likes · 10 min read

Why This Open‑Source Claude Code Pipeline Has Earned 6.4k Stars for AI‑Powered Paper Writing

Machine Learning Algorithms & Natural Language Processing

May 17, 2026 · Artificial Intelligence

How to Build Agentic Factual SFT and Mid‑Train Datasets: Query Selection, Trajectory Generation, and Tool Usage

This article outlines a systematic approach for creating agentic factual SFT and Mid‑train data, covering the definition of training goals, query filtering, two‑layer classification and labeling, trajectory format, differences between Mid‑train and SFT, a practical synthesis pipeline, and common pitfalls to avoid.

SFTagentic AIdata synthesis

0 likes · 11 min read

How to Build Agentic Factual SFT and Mid‑Train Datasets: Query Selection, Trajectory Generation, and Tool Usage

Machine Learning Algorithms & Natural Language Processing

May 16, 2026 · Artificial Intelligence

Token Superposition Training Accelerates LLM Pre‑training 2.5× Without Changing Architecture

Token Superposition Training (TST) speeds up large‑language‑model pre‑training by up to 2.5× without altering model architecture or compute budget, using a superposition phase that averages token embeddings into bags and predicts groups of tokens, followed by a standard recovery phase, as demonstrated on 10B‑parameter MoE and smaller models.

LLM pretrainingMCE lossMoE

0 likes · 10 min read

Token Superposition Training Accelerates LLM Pre‑training 2.5× Without Changing Architecture