Tagged articles

8 articles

Nov 14, 2025 · Artificial Intelligence

Can GPT‑5.1’s Core Features Set a New Benchmark for Model Performance?

The article provides an in‑depth analysis of GPT‑5.1, highlighting its enhanced emotional conversation, stronger instruction‑following, superior code generation and physics simulation, and the new adaptive reasoning mechanism with two model variants, while comparing concrete test results against GPT‑5.

GPT-5.1 · adaptive reasoning · conversation

0 likes · 9 min read

Baidu Tech Salon

Oct 24, 2025 · Artificial Intelligence

How Wenxin X1.1 Tops China’s LLMs on the New SuperCLUE-CPIF Benchmark

The newly released SuperCLUE-CPIF benchmark shows Baidu’s Wenxin X1.1 achieving the highest score among Chinese large language models, ahead of competitors such as DeepSeek‑V3.2‑Exp‑Thinking and Hunyuan‑T1, with notable advantages in precise instruction following and complex task handling.

AI evaluation · Large Language Models · Wenxin X1.1

0 likes · 4 min read

Meituan Technology Team

Aug 28, 2025 · Artificial Intelligence

How Meeseeks Redefines LLM Instruction-Following Evaluation

Meeseeks, a new benchmark released by Meituan’s M17 team, systematically evaluates large language models’ instruction‑following ability with a three‑tier framework, multi‑round self‑correction, and extensive real‑world data, revealing performance gaps among models such as OpenAI o‑series, Claude, DeepSeek and Qwen2.5.

AI · LLM evaluation · Meeseeks

0 likes · 13 min read

AntTech

Jun 4, 2025 · Artificial Intelligence

LLaDA and LLaDA‑V: Large Language Diffusion Models and Their Multimodal Extensions

This article presents the LLaDA series of diffusion‑based large language models, explains how their generative‑modeling principle yields language intelligence comparable to autoregressive models, and details the multimodal LLaDA‑V architecture, training methods, experimental results, and broader implications for AI research.

Generative Modeling · Large Language Models · diffusion models

0 likes · 10 min read

Baobao Algorithm Notes

May 26, 2025 · Artificial Intelligence

Why Do Reasoning LLMs Lose Instruction-Following Ability? A Deep Dive into Recent Findings

This article compares two recent papers that investigate why large reasoning models such as Llama and Qwen show degraded instruction‑following performance when using chain‑of‑thought prompting, analyzing attention patterns, training effects, and proposed mitigation strategies.

LLM · attention · chain-of-thought

0 likes · 11 min read

AI Frontier Lectures

May 24, 2025 · Artificial Intelligence

When Chain‑of‑Thought Backfires: Why More Reasoning Can Hurt LLM Accuracy

A recent study from Harvard, Amazon and NYU shows that chain‑of‑thought (CoT) prompting can significantly reduce large language models' ability to follow strict instructions. It introduces a new "constraint attention" metric and four mitigation strategies to restore performance.

Chain-of-Thought · LLM · Prompt Engineering

0 likes · 11 min read

Ops Development & AI Practice

Apr 5, 2025 · Artificial Intelligence

Why Do LLMs Follow Instructions So Well? Unpacking the Secrets

This article explains the concept of instruction‑following in large language models, compares early and modern LLMs, details the training techniques that enable it, highlights its importance, offers practical prompting tips, and discusses current challenges and future directions.

AI · LLM · Machine Learning

0 likes · 10 min read

Kuaishou Tech

Jul 23, 2024 · Artificial Intelligence

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models

This paper introduces Parrot, a system that improves the multi-turn instruction-following capabilities of large language models (LLMs) through context-aware preference optimization (CaPO) and synthetic data generation, achieving significant performance improvements with limited training data.

CaPO · Large Language Models · NLP

0 likes · 9 min read
