Author

PaperAgent

Daily updates, analyzing cutting-edge AI research papers

202

Articles

Likes

170

Views

Comments

Latest from PaperAgent

100 recent articles max

PaperAgent

May 20, 2026 · Artificial Intelligence

AutoTTS Shows How AI Agents Can Outperform Human‑Designed Test‑Time Scaling Strategies

The paper “LLMs Improving LLMs” introduces AutoTTS, an environment in which a Claude‑based explorer agent automatically searches for test‑time scaling policies, achieving up to 69.5% token savings and superior accuracy on unseen models, all at a cost of $39.9 and 160 minutes, with no LLM calls during evaluation.

AutoTTS, Claude, LLM agents

0 likes · 7 min read

PaperAgent

May 19, 2026 · Artificial Intelligence

Why Long-Term Memory Needs Vision: How MemEye Evaluates Multimodal Agent Recall

MemEye is a multimodal memory benchmark that tests agents across eight real‑world scenarios, measuring visual evidence granularity and reasoning depth, and reveals that captions fall short for fine‑grained visual recall, highlighting the need for true visual memory in long‑term AI agents.

AI Agents, MemEye, benchmark

0 likes · 4 min read

PaperAgent

May 18, 2026 · Artificial Intelligence

How MemWeaver Combines Behavioral and Cognitive Memory to Rebuild LLM Personalization

MemWeaver introduces a hierarchical memory that fuses behavior‑level and cognition‑level user signals, enabling large language models to generate more personalized content across multiple tasks, with extensive experiments, ablations, and an efficient incremental update mechanism demonstrating superior performance over strong baselines.

LLM personalization, LaMP benchmark, behavioral memory

0 likes · 12 min read

PaperAgent

May 17, 2026 · Artificial Intelligence

Turning LLMs into CT Scans: How Alibaba’s Safe‑SAIL Makes AI Decision Black Boxes Transparent

The paper introduces Safe‑SAIL, a Sparse Autoencoder Interpretation Framework for LLMs that provides pre‑explanation metrics, a segment‑level simulation to cut evaluation cost, and a 1,758‑feature safety database, enabling transparent analysis and interactive debugging of large language model safety decisions.

Interpretability, LLM, Safety

0 likes · 12 min read

PaperAgent

May 16, 2026 · Artificial Intelligence

A First Systematic Survey of Agent Skills: Taxonomy, Techniques, and Applications

This survey analyzes the emerging field of Agent Skills, defining a formal skill model, categorizing acquisition pathways, detailing retrieval strategies, and outlining a five‑stage evolution process, while highlighting large‑scale skill repositories and their implications for AI product design.

AI Agents, Agent Skills, Skill Evolution

0 likes · 9 min read

PaperAgent

May 15, 2026 · Artificial Intelligence

How a 0.6B Model Beats GPT‑5.2 at Agent Privacy – Introducing MemPrivacy

The article analyzes the long‑standing privacy dilemma of cloud‑based agents, presents MemPrivacy’s three‑stage de‑identification framework and four‑level privacy taxonomy, details its two‑phase training with the MemPrivacy‑Bench dataset, and shows benchmark results where a 0.6B model outperforms GPT‑5.2 while keeping latency under 0.5 seconds.

Agent, MemPrivacy, benchmark

0 likes · 11 min read

PaperAgent

May 14, 2026 · Artificial Intelligence

New Paradigm for LLM Alignment: Insights from Two Recent Anthropic Papers

Anthropic's two May papers reveal that simple SFT/RLHF is insufficient for safe LLMs; inserting a model‑spec mid‑training stage and synthetic‑document fine‑tuning dramatically reduces agentic misalignment, improves data efficiency, and enables models to reason about values before acting.

Agentic Misalignment, Anthropic, LLM alignment

0 likes · 13 min read

PaperAgent

May 13, 2026 · Artificial Intelligence

One-for-All Multi-Agent Collaboration: Adaptive Cross-Task Topology Design

The paper introduces OFA-MAS, a one‑for‑all multi‑agent system that learns a universal topology designer using task‑aware graph encoding and a Mixture‑of‑Experts generator, achieving superior performance, OOD generalization, robustness, and efficiency across six major benchmarks.

LLM, Mixture of Experts, Task-Aware Graph Encoder

0 likes · 14 min read

PaperAgent

May 11, 2026 · Artificial Intelligence

SkillOS: How Skill Governance Powers Self‑Evolving AI Agents

SkillOS addresses the one‑off nature of current LLM agents by introducing a closed‑loop system where a trainable Skill Curator continuously extracts, updates, and manages reusable skills from execution traces, leading to measurable gains in success rates, efficiency, and cross‑task generalization.

Grouped Task Streams, LLM agents, Meta-Strategy Skills

0 likes · 10 min read

PaperAgent

May 9, 2026 · Artificial Intelligence

How Anthropic’s Natural Language Autoencoders Open the LLM Black Box

Anthropic’s Natural Language Autoencoders (NLA) translate high‑dimensional LLM activation vectors into readable text, using an Activation Verbalizer and Reconstruction module trained via RL to maximize Fraction of Variance Explained, and reveal internal planning, language bias, tool‑call hallucinations, and hidden reasoning across multiple Claude models.

Activation Verbalizer, Anthropic, Claude

0 likes · 9 min read
