Author

PaperAgent

Daily updates, analyzing cutting-edge AI research papers

202

Articles

Likes

170

Views

Comments

Latest from PaperAgent


PaperAgent

Mar 6, 2026 · Artificial Intelligence

Which Frontier AI Model Leads 2026? GPT‑5.4 vs Opus 4.6 vs Gemini 3.1 Pro

A detailed 2026 benchmark comparison shows GPT‑5.4 excelling in knowledge work and native computer use, Gemini 3.1 Pro dominating inference at the lowest price, and Opus 4.6 leading software‑engineering tasks, while highlighting distinct pricing tiers, context‑window sizes, and the need for multi‑model routing.

AI benchmarks · GPT-5.4 · Gemini 3.1 Pro

0 likes · 12 min read


PaperAgent

Mar 6, 2026 · Artificial Intelligence

BeyondSWE: Rethinking Code Agent Benchmarks with Real‑World Multi‑Repo Challenges

BeyondSWE expands code‑agent evaluation beyond single‑repo bug fixing by introducing four realistic scenarios, scaling to 246 repositories and 500 samples, revealing a sharp performance drop for top models and highlighting the nuanced impact of search‑augmented agents like SearchSWE.

AI evaluation · BeyondSWE · SearchSWE

0 likes · 6 min read


PaperAgent

Mar 5, 2026 · Artificial Intelligence

Bridging Agent Runtime and RL: Inside the Claw‑R1 Training Framework

Claw‑R1, a new reinforcement‑learning framework from the USTC Cognitive Intelligence Lab, integrates the OpenClaw Agent Runtime with RL training to enable agents to learn directly in real environments, addressing the gap between simulated tasks and true tool‑calling, multi‑step reasoning, and stable long‑task execution.

AI infrastructure · Claw-R1 · OpenClaw

0 likes · 10 min read


PaperAgent

Mar 4, 2026 · Artificial Intelligence

How Doubao-Seed-2.0 Redefines Native Multimodal Agents and Coding

Doubao-Seed-2.0 showcases a native multimodal architecture that unifies vision and language, delivers state‑of‑the‑art visual‑language performance, and dramatically improves code generation for front‑end, bug‑fixing, and research‑assistant tasks, illustrating the shift toward truly functional AI agents.

AI Research Assistant · Doubao · agent models

0 likes · 9 min read


PaperAgent

Mar 3, 2026 · Artificial Intelligence

How CharacterFlywheel Scales Engaging LLMs: 15 Iterations of Production Optimization

The article presents CharacterFlywheel, a 15‑generation flywheel methodology that iteratively improves social‑dialogue LLMs in production using data‑driven reward models, rejection sampling, and a mix of SFT, DPO, and RL, with detailed experiments and best‑practice insights.

AI safety · LLM optimization · Reward Modeling

0 likes · 12 min read


PaperAgent

Mar 3, 2026 · Information Security

What 11 Critical Security Flaws Were Uncovered in OpenClaw AI Agents?

A comprehensive study of the OpenClaw framework reveals eleven severe security vulnerabilities in multi‑agent AI systems, ranging from over‑reactive data deletion to identity‑spoofing attacks, resource‑exhaustion loops, and covert manipulation, highlighting systemic social‑coherence failures and the need for robust agent governance.

AI Agents · LLM Security · OpenClaw

0 likes · 14 min read


PaperAgent

Mar 2, 2026 · Artificial Intelligence

SKILLRL: Boosting LLM Agents with Skill Distillation and Recursive Evolution

SKILLRL introduces a novel framework that transforms raw LLM agent trajectories into compact, reusable skills via experience‑driven distillation, hierarchical skill banks, and recursive skill evolution, achieving up to 90% success on ALFWorld and 73% on WebShop while reducing token usage by over 10% compared to memory‑based baselines.

LLM agents · SKILLRL · hierarchical skill bank

0 likes · 10 min read


PaperAgent

Mar 1, 2026 · Artificial Intelligence

How On-Policy Context Distillation Enables LLMs to Retain Experience Forever

On-Policy Context Distillation (OPCD) compresses transient in‑context knowledge into LLM parameters, allowing models to permanently retain problem‑solving experience without ground‑truth labels; the article details the OPCD framework, training steps, teacher‑student configurations, and experimental results on math, games, and system‑prompt tasks, highlighting its advantages over traditional context distillation.

Artificial Intelligence · Knowledge Distillation · LLM

0 likes · 8 min read


PaperAgent

Feb 27, 2026 · Artificial Intelligence

How DualPath Eliminates Storage Bandwidth Bottlenecks in Agentic LLM Inference

This article analyzes the DualPath architecture that redesigns KV‑Cache data paths to overcome storage‑NIC saturation in Prefill‑Decode LLM systems, presenting theoretical proofs, detailed engineering solutions, and extensive offline and online benchmarks that demonstrate up to 2.25× performance gains.

DualPath · LLM inference · Performance Optimization

0 likes · 9 min read


PaperAgent

Feb 27, 2026 · Artificial Intelligence

How HyperRAG Uses N‑ary Hypergraphs to Overcome Binary KG Limitations

HyperRAG introduces an n‑ary hypergraph retrieval framework that replaces binary knowledge‑graph triples with hyperedges, addressing semantic fragmentation and path‑explosion while delivering superior accuracy and efficiency across multiple closed‑ and open‑domain QA benchmarks.

HyperRAG · Hypergraph · LLM Retrieval

0 likes · 6 min read
