Tagged articles
8 articles
Fun with Large Models
Nov 14, 2025 · Artificial Intelligence

Can GPT‑5.1’s Core Features Set a New Benchmark for Model Performance?

An in‑depth analysis of GPT‑5.1 covering its more emotionally aware conversation, stronger instruction following, improved code generation and physics simulation, and a new adaptive reasoning mechanism offered in two model variants, with concrete test results compared against GPT‑5.

GPT-5.1 · adaptive reasoning · conversation
9 min read
Baidu Tech Salon
Oct 24, 2025 · Artificial Intelligence

How Wenxin X1.1 Tops China’s LLMs on the New SuperCLUE-CPIF Benchmark

The recently released SuperCLUE-CPIF benchmark shows Baidu’s Wenxin X1.1 achieving the highest score among Chinese large language models, ahead of competitors such as DeepSeek‑V3.2‑Exp‑Thinking and Hunyuan‑T1, with notable advantages in precise instruction following and complex task handling.

AI evaluation · Wenxin X1.1 · benchmark
4 min read
Meituan Technology Team
Aug 28, 2025 · Artificial Intelligence

How Meeseeks Redefines LLM Instruction-Following Evaluation

Meeseeks, a new benchmark released by Meituan’s M17 team, systematically evaluates the instruction‑following ability of large language models using a three‑tier framework, multi‑round self‑correction, and extensive real‑world data, revealing performance gaps among models such as the OpenAI o‑series, Claude, DeepSeek, and Qwen2.5.

AI · LLM evaluation · Meeseeks
13 min read
AntTech
Jun 4, 2025 · Artificial Intelligence

LLaDA and LLaDA‑V: Large Language Diffusion Models and Their Multimodal Extensions

This article presents the LLaDA series of diffusion‑based large language models, explains how their generative‑modeling principle yields language intelligence comparable to autoregressive models, and details the multimodal LLaDA‑V architecture, training methods, experimental results, and broader implications for AI research.

Generative Modeling · diffusion models · instruction following
10 min read
AI Frontier Lectures
May 24, 2025 · Artificial Intelligence

When Chain‑of‑Thought Backfires: Why More Reasoning Can Hurt LLM Accuracy

A recent study from Harvard, Amazon and NYU shows that chain‑of‑thought (CoT) prompting can significantly reduce large language models' ability to follow strict instructions. The study introduces a new "constraint attention" metric and proposes four mitigation strategies to restore performance.

Chain-of-Thought · LLM · instruction following
11 min read
Kuaishou Tech
Jul 23, 2024 · Artificial Intelligence

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models

This paper introduces Parrot, a system that improves the multi-turn instruction-following ability of large language models (LLMs) through context-aware preference optimization (CaPO) and synthetic data generation, achieving significant performance improvements with limited training data.

CaPO · NLP · data synthesis
9 min read