Tagged articles

690 articles

Apr 5, 2026 · Industry Insights

How Datus AI Is Redefining Data Engineering with an Open‑Source Data Agent

This article examines Datus AI’s open‑source Data Engineering Agent, detailing its architecture, interactive context engineering, evaluation results, and future roadmap, and explains how it tackles the challenges of scaling AI‑driven data workflows.

AI agents · NL2SQL · Open Source

0 likes · 20 min read

Machine Heart

Apr 5, 2026 · Artificial Intelligence

Cut Token Costs by 68% with Dynamic Multi‑Agent Collaborative Coding

The paper introduces AgentConductor, a 3‑billion‑parameter orchestrator that generates adaptive YAML‑based multi‑agent topologies and dynamically re‑plans when code errors occur, achieving a 14.6% accuracy gain and up to 68% lower token cost than existing static agent pipelines.

AgentConductor · LLM code generation · YAML topology

0 likes · 9 min read

AI Engineer Programming

Apr 5, 2026 · Artificial Intelligence

How Kimi, Cursor, and Chroma Use Reinforcement Learning to Train Agent Models

The article analyzes three recent technical reports—Moonshot AI's Kimi K2.5, Cursor's Composer 2, and Chroma's Context‑1—detailing how each system trains agent models with reinforcement learning, parallel orchestration, self‑summarization, and self‑editing, and highlights shared methodological themes and performance gains.

Chroma Context-1 · Cursor Composer · Kimi

0 likes · 19 min read

Machine Learning Algorithms & Natural Language Processing

Apr 4, 2026 · Artificial Intelligence

Why the Best SFT Checkpoint May Hurt RL Performance: Adaptive Early‑Stop Loss (AESL) for LLM Cold‑Start

The paper reveals that over‑optimizing supervised fine‑tuning (SFT) for large language models can diminish their reinforcement‑learning (RL) potential, proposes an Adaptive Early‑Stop Loss (AESL) that balances accuracy and output diversity during cold‑start, and demonstrates across multiple LLMs that AESL consistently yields superior RL results.

AI training · Adaptive Early‑Stop Loss · LLM

0 likes · 11 min read

Machine Heart

Apr 3, 2026 · Artificial Intelligence

Beyond Token Entropy: ReLaX Uses Latent Dynamics to Rethink Exploration‑Exploitation in LLM RL

The paper introduces ReLaX, a framework that shifts focus from token‑level entropy to the latent‑space dynamics of large models, employing Koopman operators and a Dynamic Spectral Divergence metric to quantitatively guide exploration‑exploitation balance, and demonstrates state‑of‑the‑art performance on both pure‑text and multimodal RL benchmarks.

Koopman operator · ReLaX · dynamic spectral divergence

0 likes · 12 min read

Machine Heart

Apr 2, 2026 · Artificial Intelligence

HSImul3R: Bridging Perception and Simulation for Physics‑Ready 3D Human‑Scene Interaction

HSImul3R introduces a physics‑in‑the‑loop reconstruction pipeline that closes the perception‑simulation gap by jointly optimizing human motion and scene geometry, leveraging reinforcement learning, direct simulation‑reward optimization, and a new HSIBench dataset to produce simulation‑ready 3D human‑scene interactions.

3D Reconstruction · DSRO · HSIBench

0 likes · 12 min read

Machine Heart

Apr 2, 2026 · Artificial Intelligence

Breaking the Multi‑Robot Barrier: Sequential World‑Model Decomposition (ICLR 2026)

SeqWM introduces a sequential causal decomposition of joint dynamics, allowing each robot to model its marginal contribution conditioned on prior agents, which simplifies world‑model learning, enables intent‑sharing planning via MPPI, and achieves superior performance in challenging simulation benchmarks and real‑robot tests.

MPPI · SeqWM · model-based RL

0 likes · 7 min read

Lao Guo's Learning Space

Apr 1, 2026 · Artificial Intelligence

Humans Achieve 100% While Top AI Models Score Below 0.4% on ARC‑AGI‑3 Benchmark

In the ARC‑AGI‑3 test, 486 random humans solved all 150+ game‑based puzzles with a perfect 100% success rate in a median of 7.4 minutes, whereas leading models such as GPT‑5, Claude Opus 4.6, Gemini 3.1 Pro and Grok 4.20 managed at most 0.37%, exposing a stark gap in meta‑cognitive reasoning.

AGI · ARC-AGI-3 · benchmark

0 likes · 9 min read

Bighead's Algorithm Notes

Mar 31, 2026 · Artificial Intelligence

Top AI-Driven Quantitative Finance Papers from AAAI 2026

This article curates and summarizes recent AI research papers presented at AAAI 2026 that advance quantitative finance, covering controllable market generation, LLM‑powered alpha factor mining, risk‑aware multi‑agent portfolio management, foundation models for market data, and reinforcement‑learning trading policies.

AI · Financial Market Simulation · Meta Learning

0 likes · 12 min read

Machine Heart

Mar 31, 2026 · Artificial Intelligence

Can LLM Judges Be Trusted? TrustJudge Leverages Full Probability Distributions

LLM judges often produce contradictory scores and non‑transitive preferences; the TrustJudge framework replaces discrete scoring with distribution‑sensitive scoring and likelihood‑aware aggregation, dramatically reducing both score‑comparison and pairwise‑transitivity inconsistencies across multiple model families, improving accuracy and even serving as a reward signal for RL training.

LLM evaluation · Reward Modeling · TrustJudge

0 likes · 12 min read

Shi's AI Notebook

Mar 30, 2026 · Artificial Intelligence

AI Daily Digest March 30, 2026: Open‑Source Tools, Model Releases, and Research Highlights

The March 30 AI daily digest curates recent open‑source voice‑input and TypeScript libraries, new development workflows, a 30B‑parameter model that runs on 24 GB GPUs, and NVIDIA's PivotRL research, which reduces reinforcement‑learning rollouts while matching end‑to‑end performance, all with concrete benchmarks and links.

AI tools · Agent workflow · Open Source

0 likes · 13 min read

Machine Heart

Mar 30, 2026 · Artificial Intelligence

Proactive Interaction for Video Multimodal Models: MMDuet2 & ProactiveVideoQA

This article surveys the ICLR 2026 papers ProactiveVideoQA and MMDuet2, detailing how video multimodal large models can decide when to reply autonomously, the PAUC benchmark for evaluating timeliness and accuracy, a reinforcement‑learning training pipeline that requires no precise timestamps, and experimental findings on data construction, frame‑sampling density, and SOTA performance.

MMDuet2 · PAUC · Proactive Interaction

0 likes · 17 min read

Bighead's Algorithm Notes

Mar 29, 2026 · Artificial Intelligence

How MetaTrader Uses Reinforcement Learning to Boost Trading Strategy Generalization

The article reviews the MetaTrader method, which formulates sequential portfolio optimization as a partially offline reinforcement‑learning problem, introduces a double‑layer RL algorithm and a conservative TD objective to improve out‑of‑distribution generalization, and demonstrates superior performance on CSI‑300 and NASDAQ‑100 datasets compared with existing baselines.

Financial Trading · MetaTrader · OOD data augmentation

0 likes · 15 min read

DataFunSummit

Mar 29, 2026 · Artificial Intelligence

How Code Intelligence Is Evolving: From Foundation Models to Repository‑Level Agents

This article reviews the rapid evolution of code intelligence, covering the history of code foundation models, reinforcement‑learning optimizations, scaling‑law insights, the LoopCoder architecture, rigorous multi‑level evaluation suites, and the emergence of repository‑level code agents, while highlighting open‑source contributions such as Qwen‑Coder.

code evaluation · code intelligence · reinforcement learning

0 likes · 15 min read

Machine Heart

Mar 29, 2026 · Artificial Intelligence

Scaling World Model Dynamics to Over a Thousand Steps in Two ICLR Papers

The article reviews two ICLR papers by Haoxin Lin that advance world‑model dynamics from single‑step bootstrapping to any‑step direct prediction, introduce structured uncertainty via backtracking, and achieve stable full‑horizon roll‑outs of over a thousand steps, dramatically improving both online and offline reinforcement‑learning performance.

any-step prediction · dynamics modeling · full-horizon rollout

0 likes · 16 min read

PaperAgent

Mar 29, 2026 · Industry Insights

From Reasoning to Agentic Thinking: How Harnesses Are Redefining AI Development

The article examines the shift from traditional reasoning‑based large‑language‑model pipelines to agentic, harness‑driven AI systems, outlining the definition of a harness, its engineering challenges, architectural components, and the broader implications for training, reinforcement learning, and future research directions.

AI Harness · Infrastructure · Intelligent agents

0 likes · 16 min read

Bighead's Algorithm Notes

Mar 26, 2026 · Artificial Intelligence

Paper Reading: ArchetypeTrader – A Reinforcement‑Learning Framework for Selecting and Optimizing Crypto Trading Strategies

The article reviews the ArchetypeTrader framework, which addresses market‑segmentation and demonstration‑data issues in crypto‑currency reinforcement learning by discovering discrete trading archetypes, selecting them via a hierarchical RL agent, and refining actions with a regret‑aware adapter, achieving superior profit and risk‑adjusted returns across multiple markets.

cryptocurrency trading · hierarchical reinforcement learning · regret-aware optimization

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Mar 25, 2026 · Artificial Intelligence

Scaling Multimodal Reinforcement Learning with NVIDIA Isaac Lab and TiledCamera

This article explains how to use NVIDIA Isaac Lab and the TiledCamera component to run large‑scale, multimodal reinforcement learning on GPU clusters, covering environment setup, noVNC visualization, command‑line execution, distributed training with torchrun, and performance analysis across multiple GPU configurations.

GPU scaling · NVIDIA Isaac Lab · TiledCamera

0 likes · 12 min read

Bighead's Algorithm Notes

Mar 24, 2026 · Artificial Intelligence

How an Interactive Imitation‑Learning Agent Framework Trains Robust Trading Strategies

The article analyzes the simulation‑reality gap in algorithmic trading and proposes an interactive market simulator that combines a pool of imitation‑learning agents, an action‑synthesis network, and a DDPG‑based reinforcement‑learning trader, showing superior robustness and downside protection on QQQ data.

Agent-Based Modeling · DDPG · Financial AI

0 likes · 16 min read

SuanNi

Mar 24, 2026 · Artificial Intelligence

How Memento‑Skills Enables Self‑Evolving LLMs Without Fine‑Tuning

Memento‑Skills is a framework that freezes LLM parameters while an external skill library iteratively reads, writes, and refines capabilities, achieving up to 116% accuracy gains on the GAIA and HLE benchmarks and demonstrating scalable self‑evolution without costly model fine‑tuning.

LLM · reinforcement learning · self-evolution

0 likes · 11 min read

Machine Learning Algorithms & Natural Language Processing

Mar 22, 2026 · Artificial Intelligence

NS-Diff: Adding a Physics Engine to Diffusion Models for Fluid and Rigid‑Body Dynamics

The CVPR 2026 paper introduces NS‑Diff, a physics‑guided video diffusion framework that combines a noise‑robust dynamics detector, a physical‑condition latent injection module, and reinforcement‑learning optimization to reduce jerk error by 43% and fluid divergence by 33%, achieving superior physical realism and visual quality across multiple benchmarks.

CVPR 2026 · NS‑Diff · Navier-Stokes

0 likes · 13 min read

DataFunTalk

Mar 22, 2026 · Artificial Intelligence

Why Cursor’s Composer 2 Beats Claude Opus 4.6 in Performance and Price

Cursor’s new Composer 2 programming model outperforms Claude Opus 4.6 on benchmarks like Terminal‑Bench 2.0 and SWE‑bench Multilingual, while slashing token costs to $0.5/M input and $2.5/M output, thanks to a novel self‑summary reinforcement‑learning technique that enables efficient long‑context processing.

AI · Large Language Model · pricing

0 likes · 8 min read

PaperAgent

Mar 22, 2026 · Artificial Intelligence

Can LLM Agents Self‑Evolve Without Retraining? Inside Memento‑Skills

The article analyzes the Memento‑Skills framework, which treats external memory as executable skills to enable deployment‑time continual learning for frozen LLM agents, detailing its read‑write reflective loop, skill‑as‑memory design, behavior‑trained skill router, experimental validation on GAIA and HLE benchmarks, and theoretical guarantees without gradient updates.

AI · Agent · LLM

0 likes · 9 min read

AI Engineering

Mar 21, 2026 · Industry Insights

Is Cursor’s Composer 2 Powered by Kimi? The Truth Is More Complex

A developer uncovered that Cursor’s Composer 2 actually runs on the Kimi K2.5 model with reinforcement learning, prompting a rapid licensing dispute that ended with official confirmation and highlights the opaque yet collaborative nature of today’s open AI model ecosystem.

AI model licensing · Composer 2 · Cursor

0 likes · 4 min read

Bighead's Algorithm Notes

Mar 20, 2026 · Artificial Intelligence

Weekly Quantitative Finance Paper Summaries (Mar 14‑Mar 20, 2026)

This article compiles abstracts of four recent AI‑driven quantitative finance papers, covering an autonomous factor‑investing framework, a program‑level factor‑mining system, an adaptive regime‑aware stock‑price predictor with reinforcement learning, and a comprehensive analysis of AI agents in financial markets.

AI agents · factor investing · large language models

0 likes · 10 min read

Machine Learning Algorithms & Natural Language Processing

Mar 20, 2026 · Artificial Intelligence

Cursor’s Composer 2 Beats Claude Opus 4.6 with ‘Ankle‑Cut’ Pricing via New Reinforcement‑Learning Method

Cursor’s newly released Composer 2 model surpasses Claude Opus 4.6 on benchmarks such as Terminal‑Bench 2.0, offers dramatically lower token pricing, and achieves these gains by introducing a novel self‑summary reinforcement‑learning technique that compresses long‑context tasks while preserving critical information.

Composer 2 · Cursor · LLM

0 likes · 9 min read

AI Explorer

Mar 20, 2026 · Industry Insights

Key AI Breakthroughs and Market Moves on March 20, 2026

On March 20, 2026, Alibaba’s Qwen 3.5‑Max topped the LMArena blind test, OpenAI bought Astral to boost AI coding, Zhejiang University released a real‑time 4D world model, Meta’s Agent leaked data, and a series of AI‑driven innovations spanning Nvidia, robotics, and drug discovery reshaped the industry.

AI · AI design tools · AI hardware

0 likes · 7 min read

Machine Learning Algorithms & Natural Language Processing

Mar 19, 2026 · Artificial Intelligence

From Solving to Evolving: How RETROAGENT Gives AI Agents Real Retrospective Learning

The article analyzes the RETROAGENT framework, showing how its dual intrinsic feedback and memory‑buffer mechanisms enable LLM agents to move beyond solving tasks toward continual evolution, and presents benchmark results that demonstrate significant performance gains and strong test‑time adaptation across four challenging environments.

LLM agents · RETROAGENT · dual intrinsic feedback

0 likes · 7 min read

Machine Learning Algorithms & Natural Language Processing

Mar 19, 2026 · Artificial Intelligence

From Language Modeling to World Modeling: Limits of Large Language Models

In this talk, Li Yixia of Southern University of Science and Technology examines large language models as textual world models, defines a three‑layer evaluation framework, and shows through experiments that fine‑tuned models improve next‑state prediction and agent performance but face limits tied to behavior coverage and environment complexity.

Evaluation Framework · agent performance · large language models

0 likes · 4 min read

Xiaomi Tech

Mar 18, 2026 · Artificial Intelligence

Xiaomi Unveils MiMo-V2-TTS: Giving Agents a Voice with Soul

Xiaomi introduces MiMo-V2-TTS, a self‑developed large speech‑synthesis model that combines a custom audio tokenizer, a multi‑codebook architecture, massive pre‑training on over a hundred million hours of data, and multi‑dimensional reinforcement learning to deliver fine‑grained style control, dialect support, role‑play, and high‑quality singing, aiming to give AI agents expressive, human‑like voices.

audio tokenizer · large model · multilingual TTS

0 likes · 6 min read

AI Explorer

Mar 17, 2026 · Artificial Intelligence

RISE Enables Breakthrough in Vision‑Language‑Action Learning for Embodied AI

The article examines the limitations of vision‑language‑action (VLA) models in real‑world tasks, explains how the RISE technique from Hong Kong University uses internal simulation, reflection and imagination to cut training costs by an order of magnitude, and discusses its implications for future embodied AI.

Embodied AI · RISE · VLA

0 likes · 6 min read

Machine Learning Algorithms & Natural Language Processing

Mar 15, 2026 · Artificial Intelligence

Is RL Dead in LLM Post-Training? MIT’s RandOpt Challenges Traditional Methods

The MIT‑CSAIL paper introduces RandOpt, a single‑step, gradient‑free, fully parallel post‑training algorithm that adds Gaussian noise to pretrained LLM weights and ensembles the results, achieving or surpassing PPO/GRPO performance by exploiting dense "neural thickets" that emerge as model scale grows.

LLM · RandOpt · Scaling Law

0 likes · 12 min read

SuanNi

Mar 12, 2026 · Artificial Intelligence

How OpenClaw‑RL Turns Everyday Interactions into Self‑Evolving AI

OpenClaw‑RL, a new reinforcement‑learning framework from Princeton, captures hidden evaluative and instructional signals in daily user interactions, converts them into real‑time training data, and uses a decoupled asynchronous architecture with binary RL and online policy distillation to achieve superior performance in both personal‑device and cloud‑scale scenarios.

AI Feedback · Asynchronous Architecture · Online Distillation

0 likes · 10 min read

AIWalker

Mar 12, 2026 · Artificial Intelligence

BeautyGRPO: RL‑Driven Realistic Portrait Retouching Ends Over‑Beautification (CVPR 2026)

The paper introduces BeautyGRPO, a reinforcement‑learning framework that combines a fine‑grained preference dataset (FRPref‑10K) with Dynamic Path Guidance to balance aesthetic enhancement and high‑fidelity preservation in portrait retouching, achieving superior metrics and user preference over existing SFT and RL models.

AI aesthetics · CVPR 2026 · dynamic path guidance

0 likes · 11 min read

Didi Tech

Mar 12, 2026 · Artificial Intelligence

How STAPO Improves Large‑Model Fine‑Tuning by Silencing Spurious Tokens

The STAPO (Spurious‑Token‑Aware Policy Optimization) algorithm, introduced by Tsinghua University's iDLab and Didi's Deep Sea Lab, tackles policy‑entropy instability and performance oscillation in reinforcement‑learning fine‑tuning of large models by mathematically analyzing token collision probability, defining spurious tokens, and applying a Silencing Spurious Tokens mechanism that yields state‑of‑the‑art results on multiple math‑reasoning benchmarks.

AI safety · STAPO · fine-tuning

0 likes · 7 min read

DataFunTalk

Mar 11, 2026 · Artificial Intelligence

Agent Lightning: Decoupling Optimizers to Empower AI Agents via Reinforcement Learning

Agent Lightning, an open‑source system from Microsoft Research Asia, introduces a novel optimizer‑agent disaggregation architecture that enables any AI agent to benefit from reinforcement learning, offering non‑intrusive experience capture, programmable pipelines, and flexible signal passing, while addressing real‑world challenges of scalability, multi‑step tasks, and zero‑code integration.

Agent Lightning · Experience Capture · Learning Systems

0 likes · 21 min read

DataFunSummit

Mar 10, 2026 · Artificial Intelligence

How Agent Lightning Redefines AI Agent Learning with Optimizer‑Agent Decoupling

The article explores the paradigm shift toward AI agents in 2025, detailing the open‑source Agent Lightning project’s architecture, non‑intrusive experience capture, programmable pipelines, and experimental results that demonstrate its ability to enable reinforcement learning for any agent with minimal code changes.

Agent Lightning · Machine Learning · Open‑source Framework

0 likes · 20 min read

PaperAgent

Mar 10, 2026 · Artificial Intelligence

How MemSifter Delivers High‑Precision, Low‑Cost Long‑Term Memory for LLMs

MemSifter introduces a lightweight agent that outsources memory retrieval for large language models, using a Think‑and‑Rank pipeline and a task‑result‑oriented reinforcement‑learning training paradigm to achieve superior retrieval accuracy and efficiency across eight benchmark tasks while keeping inference overhead minimal.

Agent · Efficiency · LLM

0 likes · 13 min read

AI Explorer

Mar 6, 2026 · Artificial Intelligence

AReaL: Lightning‑Fast Asynchronous RL Engine for Building High‑Performance LLM Agents

AReaL, an open‑source, fully asynchronous reinforcement‑learning platform co‑developed by Tsinghua University and Ant Group, dramatically speeds up training of complex LLM agents, offering a simple, stable, and hardware‑flexible solution for developers seeking industrial‑grade AI agents.

AI infrastructure · AReaL · Asynchronous Training

0 likes · 7 min read

Tencent Cloud Developer

Mar 5, 2026 · Artificial Intelligence

20 Cutting‑Edge RAG Optimization Techniques: From Semantic Chunking to Self‑RAG

This article systematically presents twenty practical RAG (Retrieval‑Augmented Generation) optimization methods—covering semantic chunking, chunk‑size evaluation, context‑enhanced retrieval, query transformation, re‑ranking, feedback loops, multimodal and graph RAG, hierarchical retrieval, HyDE, Self‑RAG and reinforcement‑learning‑enhanced RAG—each with clear Python code examples, advantages, limitations and ideal use‑cases.

AI · LLM · RAG

0 likes · 57 min read

Kuaishou Tech

Mar 4, 2026 · Artificial Intelligence

How LLMs Are Revolutionizing Reinforcement Learning for Recommendation Systems

This survey examines the emerging LLM‑RL collaborative recommendation paradigm, outlining its research background, five main collaboration patterns, standardized evaluation protocols, and the key challenges and future directions for building smarter, more robust recommender systems.

Artificial Intelligence · LLM · Recommendation Systems

0 likes · 14 min read

Woodpecker Software Testing

Mar 4, 2026 · Artificial Intelligence

Deep Dive into Adversarial Testing Performance Optimization for AI Systems

The article examines Adversarial Testing Performance Optimization (ATPO) as a new industrial-quality paradigm, detailing how adversarial samples expose hidden performance bottlenecks across AI pipelines, presenting three typical adversarial loads with corresponding optimization targets, common implementation pitfalls, and emerging intelligent approaches using reinforcement learning and digital twins.

AI pipelines · Digital Twin · Performance Optimization

0 likes · 8 min read

PaperAgent

Mar 3, 2026 · Artificial Intelligence

How CharacterFlywheel Scales Engaging LLMs: 15 Iterations of Production Optimization

The article presents CharacterFlywheel, a 15‑generation flywheel methodology that iteratively improves social‑dialogue LLMs in production using data‑driven reward models, rejection sampling, and a mix of SFT, DPO, and RL, with detailed experiments and best‑practice insights.

AI safety · LLM optimization · Reward Modeling

0 likes · 12 min read

PaperAgent

Mar 2, 2026 · Artificial Intelligence

SKILLRL: Boosting LLM Agents with Skill Distillation and Recursive Evolution

SKILLRL introduces a novel framework that transforms raw LLM agent trajectories into compact, reusable skills via experience‑driven distillation, hierarchical skill banks, and recursive skill evolution, achieving up to 90% success on ALFWorld and 73% on WebShop while reducing token usage by over 10% compared to memory‑based baselines.

LLM agents · SKILLRL · hierarchical skill bank

0 likes · 10 min read

AI Explorer

Mar 2, 2026 · Artificial Intelligence

OpenSandbox: Alibaba’s Open‑Source AI Sandbox for Secure, Scalable Agent Execution

OpenSandbox, an open‑source sandbox platform from Alibaba, offers a unified, secure, and extensible execution environment for AI agents, code execution, and reinforcement‑learning workloads, leveraging Docker and high‑performance Kubernetes runtimes, with multi‑language SDKs and fine‑grained network controls.

AI agents · AI sandbox · Docker

0 likes · 7 min read

Xiaomi Tech

Mar 2, 2026 · Artificial Intelligence

How Xiaomi’s Tactile‑Enabled Robot Graduates from Lab to Automotive Assembly Line

The article details Xiaomi Robotics' transition of its VLA‑based robot with TacRefineNet tactile perception from laboratory experiments to a real automotive factory, achieving a 90.2% dual‑side success rate over three hours while meeting a 76‑second production cycle, and explains the end‑to‑end data‑driven control, multimodal sensing, whole‑body motion strategy, failure cases, and open resources.

TacRefineNet · VLA model · Xiaomi Robotics

0 likes · 8 min read

AI Frontier Lectures

Feb 28, 2026 · Artificial Intelligence

Can Reinforcement Learning Revolutionize Text-to-3D Generation? A Deep Dive

This article presents a systematic investigation of applying reinforcement learning to text‑to‑3D generation, detailing reward design, algorithm selection, a new 3D benchmark, a hierarchical GRPO framework, extensive ablations, and the resulting performance gains and limitations.

AI research · Generative Models · reinforcement learning

0 likes · 13 min read

Baobao Algorithm Notes

Feb 24, 2026 · Artificial Intelligence

The Bitter Lesson of Building Agentic RL in Terminal Environments

This article recounts the challenges of moving from single‑step RL with verifiable rewards to multi‑step agentic reinforcement learning in terminal environments, detailing infrastructure design, asynchronous pipelines, data quality checks, masking strategies, curriculum training, chunk‑based optimization, and practical lessons learned from large‑scale experiments.

Asynchronous Training · Credit Assignment · Environment Augmentation

0 likes · 33 min read

Machine Learning Algorithms & Natural Language Processing

Feb 23, 2026 · Artificial Intelligence

System Engineering Behind Billions of Parameters: Insider Training Details from Seven Top AI Labs

This article systematically dissects the engineering decisions behind frontier large‑language‑model training—covering architecture choices, attention variants, optimizer evolution, data‑curation strategies, scaling‑law insights, and post‑training SFT/RL pipelines—based on open‑source reports from seven leading AI laboratories.

Mixture of Experts · large language models · model training

0 likes · 26 min read

HyperAI Super Neural

Feb 19, 2026 · Artificial Intelligence

World Model & VLA Breakthroughs: Top Papers from NVIDIA, ByteDance, Tsinghua and Others

This roundup highlights six recent embodied AI papers that advance world models and vision‑language‑action (VLA) techniques, covering DreamDojo's massive first‑person video model, LingBot‑World simulator, Agent World Model generator, BagelVLA, ACoT‑VLA, and the closed‑loop World‑VLA‑Loop framework.

Embodied AI · Synthetic Environments · Vision-Language-Action

0 likes · 8 min read

Machine Learning Algorithms & Natural Language Processing

Feb 18, 2026 · Artificial Intelligence

Microsoft’s 671B LLM Unifies Offline Ad Tasks—Can It Cut Compute Costs?

Microsoft’s AdNanny replaces a forest of specialized offline models with a single 671 B LLM, using a three‑stage data factory to generate reasoning‑rich corpora, dynamic task re‑weighting, RL‑based metric alignment, and a hybrid 31‑pipeline‑parallel architecture that halves compute cost while boosting performance on core ad‑ranking tasks.

AdNanny · LLM · dynamic weighting

0 likes · 9 min read

Old Zhang's AI Learning

Feb 16, 2026 · Artificial Intelligence

Qwen3.5 Deep Dive: Multimodal Architecture, Benchmarks, and Deployment Guide

This article provides a detailed analysis of Qwen3.5, covering its multimodal MoE design, massive inference speedups, extensive benchmark results against GPT‑5.2, Claude 4.5 Opus and Gemini‑3 Pro, RL scaling strategies, training infrastructure innovations, and practical usage via API and local deployment.

FP8 training · Large Language Model · Multimodal AI

0 likes · 13 min read

Machine Learning Algorithms & Natural Language Processing

Feb 15, 2026 · Artificial Intelligence

Embedding Error Correction into the Policy Space: How Search‑R2 Redefines Search‑Enhanced Reasoning

The Search‑R2 framework integrates error detection, localization, and correction into a reinforcement‑learning loop for search‑enhanced reasoning, achieving notably larger accuracy gains on difficult multi‑hop QA tasks than baseline methods, even when those baselines receive higher sampling budgets.

Error Correction · Multi-hop QA · Search-Enhanced Reasoning

0 likes · 15 min read

PaperAgent

Feb 15, 2026 · Artificial Intelligence

Why Memory Is the Next Critical Infrastructure for AI Agents

This survey reviews over 200 papers to propose a three‑dimensional classification framework for foundation‑agent memory, analyzes paradigm shifts from model‑centric to utility‑centric AI, and outlines memory substrates, cognitive mechanisms, operation strategies, learning paradigms, evaluation metrics, applications, and future research directions.

AI agents · Agent Architecture · Memory Mechanisms

0 likes · 10 min read

Top Architect

Feb 14, 2026 · Artificial Intelligence

Why Test‑Time Compute Is the Next Breakthrough for Large Language Models

The article explains how reasoning‑oriented large language models shift the focus from training‑time resources to test‑time computation, detailing scaling laws, verification techniques, reinforcement‑learning pipelines such as DeepSeek‑R1, and methods for distilling reasoning abilities into smaller, consumer‑grade models.

inference compute · large language models · model distillation

0 likes · 19 min read

Baidu Intelligent Cloud Tech Hub

Feb 12, 2026 · Artificial Intelligence

Deploying GLM-5 on Baidu Kunlun P800 XPU with vLLM‑Kunlun Plugin

This article explains how the new GLM-5 large model is adapted to Baidu's Kunlun P800 XPU, covers the asynchronous reinforcement learning framework Slime and optimization techniques such as INT8 quantization and tensor parallelism, and provides step‑by‑step deployment commands using the open‑source vLLM‑Kunlun plugin.

AI acceleration · GLM-5 · INT8 Quantization

0 likes · 6 min read

Open Source Tech Hub

Feb 12, 2026 · Artificial Intelligence

How GLM-5 Advances AI with Bigger Scale, Sparse Attention, and Agent Capabilities

GLM-5, a new large language model with 744 B parameters and 28.5 T tokens of training data, introduces DeepSeek sparse attention and an asynchronous RL system called slime, delivering strong benchmark gains on complex system engineering, long‑horizon agent tasks, and surpassing many open‑source competitors.

AI · GLM-5 · Large Language Model

0 likes · 6 min read

DeWu Technology

Feb 11, 2026 · Artificial Intelligence

How Generative Models Transform Re‑ranking Architecture for Faster, More Diverse Recommendations

This article examines the evolution of re‑ranking systems from traditional pointwise models to a two‑stage generation‑evaluation framework, compares autoregressive and non‑autoregressive generative approaches, details inference speed optimizations with GPU and model‑server upgrades, and outlines a future end‑to‑end sequence generation architecture enhanced by reinforcement learning and contrastive learning.

AI · Generative Models · Inference Optimization

0 likes · 14 min read

Ximalaya Technology Team

Feb 11, 2026 · Artificial Intelligence

How Ximalaya Used Generative AI to Revolutionize Audio Recommendations

This article details Ximalaya's journey from traditional multi‑stage recommendation pipelines to generative AI‑driven models, covering business challenges, architectural and model differences, phased deployments, knowledge distillation, semantic ID encoding, decoder‑only strategies, extensive offline and online evaluations, and future research directions.

Encoder-Decoder · Knowledge Distillation · Recommendation Systems

0 likes · 24 min read

Machine Learning Algorithms & Natural Language Processing

Feb 10, 2026 · Artificial Intelligence

Why Self‑Distillation Is the 2026 Keyword for Continual Learning in Large Models

At the start of 2026, self‑distillation dominates the most cited LLM papers, offering a teacher‑free way for large models to continually acquire new knowledge while preserving existing capabilities.

Reasoning · continual learning · large language models

0 likes · 9 min read

AI Frontier Lectures

Feb 10, 2026 · Artificial Intelligence

Can an 8B Model Outperform GPT‑4 in Faithfulness Detection? Inside FaithLens

FaithLens is an 8‑billion‑parameter model that surpasses GPT‑4.1 and other large models on twelve hallucination‑detection benchmarks while providing high‑quality natural‑language explanations, thanks to a novel data‑synthesis pipeline, three‑dimensional filtering, and rule‑based reinforcement learning.

Efficient Inference · LLM hallucination · explainable AI

0 likes · 12 min read

AI Frontier Lectures

Feb 10, 2026 · Artificial Intelligence

How SE‑Bench Uncovers the Hidden Challenges of Knowledge Internalization in Self‑Evolving AI

The paper introduces SE‑Bench, a code‑based benchmark that isolates knowledge internalization by obfuscating NumPy APIs, and uses it to reveal the Open‑Book paradox, the RL gap, and the potential of self‑play for true self‑evolution in large language models.

AI · SE-Bench · knowledge internalization

0 likes · 17 min read

PaperAgent

Feb 7, 2026 · Artificial Intelligence

Can 13 Parameters Match Full‑Scale Fine‑Tuning? TinyLoRA’s RL Breakthrough

TinyLoRA, a Meta‑proposed method that fine‑tunes Qwen2.5‑7B with only 13 trainable parameters (26 bytes), achieves 91% accuracy on GSM8K under reinforcement learning, revealing that ultra‑low‑parameter RL can rival full‑scale supervised fine‑tuning.

GSM8K · Qwen2.5 · TinyLoRA

0 likes · 7 min read

AI Frontier Lectures

Feb 6, 2026 · Artificial Intelligence

Can Merging Text‑Only and Grounded Visual Reasoning Unlock Better Vision‑Language Models?

The paper introduces Mixture‑of‑Visual‑Thoughts (MoVT), a context‑adaptive reasoning paradigm that integrates pure‑text and visually‑grounded inference modes within a single model, and presents the two‑stage AdaVaR training framework with a novel AdaGRPO reinforcement‑learning algorithm to automatically select the optimal mode for each visual‑language task, achieving consistent gains across eight benchmarks and surpassing strong baselines including GPT‑4o.

AdaVaR · Mixture-of-Visual-Thoughts · adaptive inference

0 likes · 16 min read

HyperAI Super Neural

Feb 6, 2026 · Artificial Intelligence

Latest Advances in AI Agents: PaperBanana, SDPO, Lumine, Idea2Story, and Insight Agents

This weekly roundup highlights five recent AI agent papers—PaperBanana for automated academic illustration, SDPO's self‑distillation reinforcement learning, Lumine's open‑world generalist agent, Idea2Story's pipeline for turning research ideas into narratives, and Insight Agents' fast e‑commerce insights—showcasing diverse breakthroughs in multi‑agent frameworks, self‑feedback learning, and real‑world deployment.

AI agents · automated scientific narrative · multi-agent systems

0 likes · 8 min read

Alimama Tech

Feb 5, 2026 · Artificial Intelligence

Can Few-Shot Reinforcement Learning Supercharge Budget-Constrained Auto-Bidding?

This paper introduces ABPlanner, a few‑shot, context‑aware budget planner that enhances budget‑constrained auto‑bidding in online advertising by hierarchically allocating budgets across short‑term stages and training a sequential decision‑maker with deep reinforcement learning, achieving significant gains in simulated and real‑world A/B tests.

auto-bidding · budget allocation · few-shot learning

0 likes · 13 min read

Baobao Algorithm Notes

Feb 4, 2026 · Artificial Intelligence

Efficient Long-Sequence Modeling: Linear & Sparse Attention, MegaKernels, RL Tricks

This article reviews recent 2025 advances in long‑sequence LLM inference, covering Kimi Linear attention, DuoAttention and DeepSeek Sparse Attention, MegaKernel and MPK designs for kernel‑level efficiency, reinforcement‑learning rollout optimizations, and the Tawa deep‑learning compiler framework.

Deep Learning Compiler · LLM optimization · Linear Attention

0 likes · 22 min read

Baobao Algorithm Notes

Feb 4, 2026 · Artificial Intelligence

Mastering Reinforcement Learning: From Basics to Advanced Agentic RL Techniques

This comprehensive guide walks through reinforcement learning fundamentals, MDP modeling, value functions, Bellman equations, and key algorithms such as Q‑learning, REINFORCE, PPO, DPO, and GRPO, then contrasts LLM‑RL with Agentic‑RL and surveys leading industry frameworks and real‑world applications.

Artificial Intelligence · LLM · Machine Learning

0 likes · 42 min read

Xiaomi Tech

Feb 3, 2026 · Artificial Intelligence

Xiaomi’s AI Research Secures Spots on ICLR 2026 – Papers and Key Findings

The International Conference on Learning Representations (ICLR) 2026 accepted multiple Xiaomi papers covering multimodal reasoning, reinforcement learning, GUI agents, autonomous driving, audio generation and benchmark design, each presenting novel frameworks, data‑centric training tricks and strong experimental results that advance the state of the art.

Audio Generation · Autonomous Driving · ICLR 2026

0 likes · 17 min read

Baidu Geek Talk

Feb 2, 2026 · Artificial Intelligence

How Cloud AI Infra Powers the Next Wave of Embodied Intelligence

This article outlines the rapid rise of embodied intelligence, the explosion of Vision‑Language‑Action (VLA) research, and how cloud‑based AI infrastructure—including multi‑level IaaS, data pipelines, dual‑system model designs, and reinforcement‑learning workflows—addresses emerging scaling and deployment challenges.

VLA · multimodal models · reinforcement learning

0 likes · 13 min read

DaTaobao Tech

Feb 2, 2026 · Operations

How Policy Regularization Boosts Deep Reinforcement Learning for Large‑Scale Inventory Management

This article presents DeepStock, a deep reinforcement learning framework with policy regularization that integrates classic inventory heuristics, achieving 7% turnover reduction and multi‑million cost savings across millions of SKU‑warehouse pairs in Alibaba's self‑operated ecosystem.

Operations Research · deep learning · industrial AI

0 likes · 18 min read

Data Party THU

Jan 31, 2026 · Artificial Intelligence

Can LLMs Learn While Being Tested? Inside the TTT-Discover Breakthrough

The article examines the Test‑Time Training to Discover (TTT‑Discover) approach, which applies reinforcement learning during inference to let large language models continuously improve on single test problems, and reports strong results across mathematics, GPU kernel optimization, algorithm design, and biology.

AI research · LLM · Scientific Discovery

0 likes · 9 min read

JD Tech

Jan 31, 2026 · Artificial Intelligence

How JD's 9N‑LLM Engine Powers Scalable Generative Recommendation at Massive Scale

This article details JD Retail's 9N‑LLM unified training framework that tackles the massive data, hardware heterogeneity, and algorithmic challenges of generative recommendation by integrating TensorFlow and PyTorch, supporting GPU/NPU, and delivering high‑throughput sample processing, sparse/dense optimization, and flexible reinforcement‑learning capabilities.

GPU/NPU · Ray · large-scale AI

0 likes · 26 min read

JD Retail Technology

Jan 30, 2026 · Artificial Intelligence

How JD’s 9N‑LLM Engine Powers Scalable Generative Recommendation at Industrial Scale

The article details JD Retail’s 9N‑LLM unified training engine, which supports TensorFlow and PyTorch, GPU and NPU, and both traditional and generative recommendation scenarios. It explains the engine's architecture, high‑throughput sample engine, distributed sparse embedding system, five‑stage pipeline, UniAttention accelerator, and reinforcement‑learning capabilities, which together enable terabyte‑scale data, billion‑scale dense parameters, and efficient RL training for real‑world recommendation services.

GPU/NPU · UniAttention · distributed training

0 likes · 26 min read

DaTaobao Tech

Jan 30, 2026 · Artificial Intelligence

Human‑like LLM Replies for Live Digital Hosts: ASR‑Based Style Transfer and Reward Modeling

This article proposes an ASR‑driven pipeline that creates high‑quality AI‑reply vs. human‑like reply pairs, trains a rewrite model and a reward model, and uses GRPO reinforcement learning to generate natural, helpful, and less AI‑sounding responses in digital‑human live streaming, achieving 92% accuracy and 97% helpfulness while improving user experience.

ASR data · LLM · Qwen

0 likes · 20 min read

AI Engineering

Jan 30, 2026 · Artificial Intelligence

Why Letting LLMs Argue Improves Their Reasoning Quality

Google’s recent study of over 8,000 reasoning tasks shows that advanced LLMs like DeepSeek‑R1 spontaneously develop multiple internal “expert” personas that debate, and that activating a discovered “social switch” dramatically raises accuracy, revealing that engineered conflict can enhance AI reasoning.

AI debate · Feature Control · LLM

0 likes · 8 min read

PaperAgent

Jan 30, 2026 · Artificial Intelligence

How LLM‑in‑Sandbox Turns Large Models into General‑Purpose Agents Without Extra Training

The LLM‑in‑Sandbox framework places large language models inside a virtual machine that provides external tool access, persistent storage, and code execution, yielding up to a 24.2% performance boost across six benchmark tasks without additional training, and it scales from zero‑shot to reinforcement‑learning‑enhanced agents while remaining cost‑effective.

Efficiency · LLM · agentic AI

0 likes · 6 min read

Meituan Technology Team

Jan 29, 2026 · Artificial Intelligence

How LongCat‑Flash‑Thinking‑2601 Achieves Real‑World Generalization for Agents

LongCat‑Flash‑Thinking‑2601, a 560‑billion‑parameter MoE model, combines environment expansion, multi‑environment RL, systematic noise training, a heavy‑thinking reasoning mode, and Zigzag sparse attention to deliver strong benchmark performance and robust real‑world agent capabilities.

Environment Expansion · Large Language Model · Open Source

0 likes · 14 min read

Alibaba Cloud Developer

Jan 28, 2026 · Artificial Intelligence

How We Built a High‑Performance AI Rental Advisor with One‑Model Tool‑Use and Reinforcement Learning

This article details the design, challenges, and performance gains of an AI‑driven rental recommendation system that replaces a multi‑agent architecture with a single LLM using dynamic tool‑use, introduces a two‑stage reinforcement‑learning pipeline, and achieves sub‑second latency and higher accuracy for complex rental scenarios.

AI recommendation · Large Language Model · System Architecture

0 likes · 19 min read

PaperAgent

Jan 25, 2026 · Artificial Intelligence

How Deep GraphRAG Solves Retrieval’s Three‑Way Dilemma with Hierarchical Search

Deep GraphRAG tackles the three‑fold dilemma of traditional Retrieval‑Augmented Generation by introducing hierarchical global‑to‑local retrieval, a beam‑search dynamic reordering that cuts latency, and a DW‑GRPO reinforcement‑learning module that adaptively weights rewards, achieving near‑state‑of‑the‑art performance with up to 86% faster inference.

AI research · GraphRAG · Hierarchical Retrieval

0 likes · 5 min read

Meituan Technology Team

Jan 23, 2026 · Artificial Intelligence

How EvoCUA Set a New Open‑Source SOTA for Computer‑Use Agents with Evolutionary Learning

EvoCUA, a native computer‑use agent from Meituan, combines a verifiable data‑synthesis engine, a sandbox infrastructure that scales to tens of thousands of instances, and an experience‑driven learning paradigm to overcome data‑scaling and feedback challenges, achieving a 56.7% success rate on the OSWorld benchmark and surpassing previous open‑source models.

AI agent · Computer Use · OSWorld

0 likes · 27 min read

Tencent Advertising Technology

Jan 22, 2026 · Artificial Intelligence

How Tencent’s Bidding Algorithms Evolved from GMPC to GRB: A Deep Dive into Generative RL for Ads

The article reviews the 2025 evolution of Tencent advertising’s bidding system—from the second‑generation GMPC control algorithm through the third‑generation MRB reinforcement‑learning model to the fourth‑generation generative RL GRB—detailing architectural upgrades, multi‑channel modeling, training pipelines, and experimental gains, and outlines the 2026 AI‑agent roadmap.

Advertising · Generative Models · Online Learning

0 likes · 15 min read

Tencent Cloud Developer

Jan 20, 2026 · Artificial Intelligence

From Transformers to Agents: A Complete Timeline of Large Language Model Evolution

This article traces the evolution of large language models from the 2017 Transformer breakthrough through successive milestones such as BERT, GPT‑3, RLHF alignment, multimodal extensions, open‑source alternatives, and the rise of retrieval‑augmented generation, AI agents, and emerging protocols that shape modern AI applications.

Open-source models · RAG · large language models

0 likes · 44 min read

PaperAgent

Jan 19, 2026 · Artificial Intelligence

How Reinforcement Learning Can Boost LLM Reasoning by Shaping Token Distributions

Recent research shows that applying reinforcement learning to large language models can dramatically improve reasoning performance, but its effectiveness depends on the token distribution produced during pre‑training, prompting a novel rewrite of cross‑entropy as a single‑step policy gradient with controllable entropy parameters.

LLM · RL · Token Distribution

0 likes · 6 min read

PaperAgent

Jan 16, 2026 · Artificial Intelligence

How a 4B Model Beats 30B Giants: Inside AgentCPM-Explore’s SOTA Performance

AgentCPM-Explore, a 4‑billion‑parameter open‑source model, achieves state‑of‑the‑art results on long‑range exploration tasks, matching or surpassing larger 8B and even 30B models, thanks to a full‑stack infrastructure, novel training tricks, and extensive benchmark evaluations across eight agent‑centric datasets.

Agent · AgentCPM-Explore · Large Language Model

0 likes · 10 min read

Xiaohongshu Tech REDtech

Jan 15, 2026 · Information Security

How Hi-Guard Improves Trustworthy Multimodal Content Moderation with Policy‑Aligned Reasoning

The Hi-Guard framework transforms content moderation by aligning multimodal models with policy rules through hierarchical prompting, a structured taxonomy, and soft‑margin reinforcement learning, achieving significant gains in accuracy, precision, recall, and explainability for large‑scale user‑generated content platforms.

Multimodal AI · content moderation · explainability

0 likes · 9 min read

Amap Tech

Jan 14, 2026 · Artificial Intelligence

How ArenaRL Enables Open‑World Travel Agents to Learn via Comparative Reinforcement Learning

Gaode Maps and Tongyi DeepResearch unveil ArenaRL, an open‑domain reinforcement‑learning framework that replaces absolute scoring with relative ranking, uses self‑play and a linear‑complexity tournament, and demonstrates measurable gains on POI ranking and complex travel‑planning tasks.

ArenaRL · open-domain · ranking

0 likes · 8 min read

Bighead's Algorithm Notes

Jan 11, 2026 · Artificial Intelligence

FinRpt: A Multi‑Agent Framework for Automatic Generation and Evaluation of Stock Research Reports

FinRpt introduces a novel multi‑agent pipeline that builds a high‑quality equity research report (ERR) dataset from six financial data sources, defines a comprehensive 11‑metric evaluation suite, and demonstrates that supervised‑fine‑tuned and reinforcement‑learned LLM agents significantly outperform single LLM baselines in both accuracy and efficiency.

FinRpt · LLM · dataset

0 likes · 14 min read

AI Engineering

Jan 10, 2026 · Artificial Intelligence

Teaching LLMs to Manage Memory Autonomously, Dropping Manual Rules

Alibaba's new AgeMem framework turns long‑term and short‑term memory management for large language model agents into a learnable reinforcement‑learning task, replacing handcrafted rules with a three‑stage training process and achieving significant benchmark gains.

AgeMem · GRPO · LLM

0 likes · 9 min read

Bighead's Algorithm Notes

Jan 8, 2026 · Artificial Intelligence

Alpha‑R1: Reinforcement‑Learning‑Driven Large‑Model Alpha Factor Selection

Alpha‑R1 integrates reinforcement learning with an 8‑billion‑parameter LLM to jointly process price and news data, creating context‑aware factor embeddings that outperform traditional quantitative and generic LLM baselines on CSI 300 and CSI 1000 portfolios, demonstrating robust alpha‑decay resistance and zero‑shot generalization.

Financial AI · Large Language Model · alpha factor selection

0 likes · 16 min read

Amap Tech

Jan 8, 2026 · Artificial Intelligence

How AI Powers Fancy Video Generation for Real‑World POI Scenes

This article details the AI techniques behind Gaode's "Street Ranking" project, explaining the Fancy video concept, the dual training and production pipelines, and the use of SFT, reinforcement learning, MoE‑LoRA, distribution‑matching distillation, and quality‑filtering to achieve 25× faster generation with high aesthetic fidelity.

AI video generation · Multimodal · distillation

0 likes · 24 min read

Tencent Advertising Technology

Jan 8, 2026 · Artificial Intelligence

How Tencent Boosted Ad Experience by Up to 20% Using Reinforcement‑Learning‑Based Ranking

Tencent's ad tech team redesigned its ad ranking system by adding a parallel user‑experience‑optimized pipeline and evolving from manual CEM tuning to DDPG‑based reinforcement learning, achieving 10‑20% improvements in CTR, repeat‑view rates, and other experience metrics while maintaining overall spend.

Advertising · multi-objective optimization · ranking

0 likes · 17 min read

Data Party THU

Jan 7, 2026 · Artificial Intelligence

Why the Common KL Penalty in LLM RL Training Is Biased—and How to Fix It

A recent study reveals that the widely used KL regularization in LLM reinforcement learning with verifiable rewards (RLVR) is mathematically biased, leading to unstable training and poorer generalization, and shows that moving the KL term back to the reward with a simple K1 estimator can boost out‑of‑domain performance by up to 20%.

AI research · KL regularization · LLM training

0 likes · 10 min read

Bighead's Algorithm Notes

Jan 6, 2026 · Artificial Intelligence

FinRS: A Risk‑Sensitive Trading Framework for Real‑World Financial Markets

FinRS integrates hierarchical market analysis, dual decision agents, and multi‑time‑scale reward feedback to enable risk‑aware multi‑stage trading, achieving higher cumulative returns, better Sharpe ratios, and lower maximum drawdowns than existing LLM‑based and reinforcement‑learning baselines across diverse stocks.

FinRS · LLM · financial markets

0 likes · 14 min read

Bighead's Algorithm Notes

Jan 4, 2026 · Artificial Intelligence

How VTA Combines Large‑Model Reasoning for Precise and Explainable Stock Time‑Series Forecasting

The VTA framework integrates large language model reasoning with textual annotation of technical indicators, employs a Time‑GRPO reinforcement‑learning objective and multi‑stage joint conditional training, and achieves state‑of‑the‑art accuracy and expert‑rated interpretability on US, Chinese and European stock datasets.

LLM · Time-series · VTA

0 likes · 19 min read

PaperAgent

Dec 29, 2025 · Artificial Intelligence

Unveiling Bottom‑up Policy Optimization: Boosting LLM Reasoning with Internal Strategies

This article introduces Bottom‑up Policy Optimization (BuPO), a novel reinforcement‑learning framework that treats large language models as collections of internal layer and modular policies, revealing distinct inference entropy patterns in Llama and Qwen‑3 and demonstrating superior performance on complex mathematical reasoning benchmarks.

AI research · Bottom-up Optimization · Internal Policy

0 likes · 10 min read

Data Party THU

Dec 28, 2025 · Artificial Intelligence

How Causal Reinforcement Learning Is Shaping Robust, Explainable AI

This comprehensive survey examines the emerging field of Causal Reinforcement Learning, classifies its core techniques, introduces eleven benchmark environments, evaluates four novel algorithms, and outlines challenges and future research directions for building robust, generalizable, and interpretable AI systems.

AI Robustness · algorithm evaluation · benchmark environments

0 likes · 12 min read

DataFunTalk

Dec 25, 2025 · Artificial Intelligence

How DeepAgent Redefines General AI Reasoning with Scalable Toolsets

DeepAgent, a new end‑to‑end reasoning agent, integrates autonomous thinking, dynamic tool search, and execution to handle over 16,000 APIs, embodied tasks, and research assistance, achieving state‑of‑the‑art performance on benchmarks like TMDB, ToolBench, ALFWorld, WebShop, and GAIA.

Large Language Model · Memory Management · Reasoning

0 likes · 15 min read

Tencent Advertising Technology

Dec 25, 2025 · Artificial Intelligence

How RAVEN Leverages Reinforcement Reasoning for Precise Ad Video Violation Grounding

RAVEN is a reinforcement‑reasoning framework that combines curriculum learning with hierarchical rewards to enable multimodal large language models to accurately locate and classify violation segments in advertisement videos, even under noisy, large‑scale industrial data.

Advertising · Curriculum Learning · multimodal LLM

0 likes · 17 min read

PaperAgent

Dec 23, 2025 · Artificial Intelligence

CATArena: A Competitive Benchmark That Turns Agent Scoring into Evolutionary Learning

CATArena introduces a tournament‑style evaluation framework where AI agents iteratively code, compete, and improve across classic board games, using three‑dimensional quantitative scores to measure strategy programming, global learning, and generalization, and reveals how different LLM‑based agents learn and adapt over multiple rounds.

AI Benchmark · Agent Evaluation · CATArena

0 likes · 8 min read
