Tagged articles
3 articles
Page 1 of 1
James' Growth Diary
James' Growth Diary
May 25, 2026 · Artificial Intelligence

Practical Agent Performance Tuning: Slash Latency 75%, Cut Token Costs 71%, Boost Throughput 217%

The article walks through a systematic performance map of LangChain agents and demonstrates concrete latency, token‑usage, and concurrency optimizations—streaming responses, Redis caching, model routing, prompt trimming, context summarisation, dynamic tool selection, parallel graph nodes and batch processing—showing real‑world gains of up to 75% lower latency, 71% fewer tokens and a 217% throughput increase.

Agent OptimizationLangChainLangGraph
0 likes · 30 min read
Practical Agent Performance Tuning: Slash Latency 75%, Cut Token Costs 71%, Boost Throughput 217%
AntTech
AntTech
Apr 23, 2026 · Artificial Intelligence

Ling-2.6-flash: Faster Response, Stronger Execution, and Higher Token Efficiency for Agent Workloads

Ling-2.6-flash is a 104B‑parameter Instruct model that uses a mixed‑linear architecture and token‑efficiency optimizations to achieve up to 340 tokens/s inference speed, 4× higher throughput than comparable models, and ten‑fold lower token consumption on Agent benchmarks, while maintaining SOTA performance.

Agent OptimizationLLMbenchmark
0 likes · 15 min read
Ling-2.6-flash: Faster Response, Stronger Execution, and Higher Token Efficiency for Agent Workloads
Alibaba Cloud Developer
Alibaba Cloud Developer
Oct 31, 2025 · Artificial Intelligence

Why AI Agents Fail and 10 Proven Ways to Make Them Reliable

This article shares the practical lessons learned from building Alibaba Cloud’s digital employee "YunXiaoEr Aivis", explaining why large‑language‑model agents often miss expectations and presenting ten concrete strategies—ranging from clear prompt design to memory management—that dramatically improve multi‑agent reliability.

AI AgentsAgent OptimizationContext Engineering
0 likes · 29 min read
Why AI Agents Fail and 10 Proven Ways to Make Them Reliable