Tagged articles

30 articles

May 28, 2026 · Artificial Intelligence

How Claude Code Prompt Caching Cuts AI Costs by Up to 90% and Boosts Efficiency

Prompt Caching in Anthropic's Claude Code replaces repeated processing of identical prompt prefixes with a prefix‑hash cache, slashing input‑token costs by up to 90%, reducing first‑token latency by 79%, and improving throughput, while preserving model output exactly as if no cache were used.

AI Engineering · Cache Invalidation · Cache Metrics

30 min read

Architect

May 26, 2026 · Artificial Intelligence

Why Claude’s 17 New Capabilities Matter: Moving Agents from Chat to Real Workflows

Claude’s latest suite of 17 capabilities (Projects, Memory, Artifacts, Chrome, Cowork, Skills, and more) reframes the agent from a simple chat assistant into a workflow component, prompting architects to evaluate context ingestion, output auditability, permission boundaries, process repeatability, and cost controls before deployment.

AI agents · Claude · Context Engineering

26 min read

AI Engineering

May 19, 2026 · Artificial Intelligence

Claude Adds Prompt Cache Diagnostics to Pinpoint Token Cost Spikes

Claude's new Prompt Cache Diagnostics feature lets developers see exactly why a cache miss occurred and how many tokens were wasted. The article covers the required beta header, Python examples, supported miss reasons, limitations, and privacy guarantees to help optimize token costs.

AI development · API Diagnostics · Anthropic

9 min read

SuanNi

May 4, 2026 · Artificial Intelligence

Why Prompt Caching Is Everything for Claude Code

The article explains how Claude Code achieves extreme speed and low cost by building its architecture around a static prompt prefix, detailing the mechanics of prompt caching, safe model and tool switching, plan‑mode tooling, deferred loading, and cache‑safe context compression.

AI agents · Anthropic · Cache Optimization

10 min read

AI Tech Publishing

May 1, 2026 · Artificial Intelligence

5 Counterintuitive Design Principles for Prompt Caching in Claude Code

The article details five counterintuitive design principles for Claude Code's prompt caching—optimizing prompt layout, using message‑based updates, never switching models or tools mid‑conversation, safely compressing context, and monitoring cache health—backed by concrete examples and up to 90% cost savings.

AI Engineering · Cache Optimization · Claude Code

10 min read

Architect

Apr 30, 2026 · Artificial Intelligence

How Hermes Agent’s Memory System Fixes the Layered Misconception in OpenClaw

The article dissects Hermes Agent’s four‑layer memory architecture—hot memory, session search, skills, and optional Honcho—explaining how each layer’s cost and purpose differ from OpenClaw’s approach, and why careful placement of facts, history, procedures, and user models leads to more stable, cache‑aware agents.

Agent Memory · Context Management · Hermes Agent

25 min read

Architect

Apr 24, 2026 · Artificial Intelligence

How Hermes Agents Self‑Evolve: What Should Remain After a Task?

The article examines Hermes Agent’s three‑layer memory system—facts, session retrieval, and process assets—detailing how Skills are created, stored, patched, and secured at runtime, and argues that reliable self‑evolution requires disciplined versioning, evaluation, and access controls rather than unchecked automatic skill generation.

AI Skills · Hermes Agent · Process Assets

21 min read

AI Architecture Hub

Apr 24, 2026 · Artificial Intelligence

How Claude Code Achieves a 92% Prompt Caching Hit Rate with Three Unbreakable Engineering Rules

Claude Code’s prompt‑caching delivers a 92% hit rate, slashing a 50‑round agent session cost from $6 to $1.15 by separating stable prefixes from dynamic tails, using a three‑layer cache architecture, exact token‑sequence matching, and three strict engineering rules that keep the cache hot and reliable.

Cache Hit Rate · Claude Code · Cost Reduction

13 min read

AI Architecture Hub

Apr 23, 2026 · Artificial Intelligence

Why Prompt Caching Is Critical: Lessons from Building Claude Code

Prompt caching, a prefix‑matching technique that reuses computation from earlier requests sharing the same prompt prefix, proved essential for Claude Code’s low latency and cost. The article details counter‑intuitive practices such as placing static prompt content first, updating information via messages, avoiding mid‑session model or tool changes, and keeping context forks cache‑safe.

AI Engineering · Cache Optimization · Claude Code

10 min read

Architect

Apr 21, 2026 · Artificial Intelligence

Why a 92% Prompt Cache Hit Rate Slashes LLM Costs: A Deep Dive into Context Engineering

The article dissects Anthropic's Prompt Caching mechanism, explaining how long‑running AI agents reach a 92% cache‑hit rate, which sharply reduces prefill costs, by separating stable from dynamic context, managing TTL and look‑back limits, and applying seven practical engineering checks.

AI agents · Cache Hit Rate · Claude

22 min read

AI Tech Publishing

Apr 20, 2026 · Artificial Intelligence

How Claude Code Achieves 92% Prompt Cache Hit Rate and Cuts Costs by 81% – A Deep Dive

This article explains the mechanics of prompt caching for large language models, breaks down static versus dynamic context, details how the KV cache works and how it is priced, and shows how a 30‑minute Claude Code programming session reached a 92% cache hit rate that cut inference costs by 81%, concluding with three production‑grade design rules.

AI agents · Anthropic API · Claude Code

13 min read

Tencent Cloud Developer

Apr 15, 2026 · Artificial Intelligence

How Hermes Agent’s Skills System Enables Self‑Learning AI Agents

This article provides an in‑depth technical analysis of Hermes Agent’s Skills closed‑loop system, detailing its lifecycle from experience extraction and knowledge storage to intelligent retrieval, conditional activation, progressive disclosure, security scanning, and self‑improvement, while comparing it to academic prototypes like Voyager.

AI agent · Hermes Agent · Prompt Caching

27 min read

Machine Heart

Apr 13, 2026 · Artificial Intelligence

What’s the Underlying Logic of Coding Agents and Why Do Claude Code Variants Outperform Others?

The article dissects coding agents by outlining their six core components, explaining how an agent harness orchestrates model inference, repository context, prompt caching, tool validation, context compression, structured memory, and bounded sub‑agents, and shows why these architectural choices give Claude Code a performance edge over plain LLMs.

Agent Harness · Context Compression · LLM

22 min read

AI Tech Publishing

Apr 6, 2026 · Artificial Intelligence

Six Core Components of a Coding Agent Explained with Code

The article systematically breaks down the six essential building blocks of a programming agent—live repository context, prompt shape and cache reuse, structured tool access and validation, context reduction, structured session memory, and bounded sub‑agent delegation—illustrated with a Mini Coding Agent implementation and comparisons to Claude Code, Codex, and OpenClaw.

Coding Agent · Context Compression · LLM

15 min read

AI Programming Lab

Apr 5, 2026 · Artificial Intelligence

Do You Really Understand Tokens? A Deep Dive Starting from a Claude Code Session

The article explains what tokens are, how different models tokenize text, the role of token embeddings, positional encoding, self‑attention, KV cache, and why output tokens cost far more than input tokens, while also covering pricing differences and prompt‑caching savings across major LLM providers.

KV Cache · LLM pricing · Large Language Model

13 min read

Machine Heart

Apr 1, 2026 · Artificial Intelligence

Claude Code Source Leak: Inside the Accidental Open‑Source Release and New Buddy Feature

The accidental exposure of Claude Code’s TypeScript source via an npm source‑map mishap sparked a rapid community deep‑dive that uncovered anti‑distillation safeguards, a hidden Buddy pet, extensive prompt‑caching logic, undercover mode, auto‑compaction thresholds, and broader engineering trade‑offs, while Anthropic and its founder responded to the slip.

AI agents · Claude Code · Prompt Caching

20 min read

Shi's AI Notebook

Mar 27, 2026 · Artificial Intelligence

Decoding Prompt Caching: From PagedAttention Mechanics to Cost‑Saving Practices

The article explains how Prompt Caching leverages vLLM's PagedAttention and block‑level hashing to reuse KV cache across identical prefixes, dramatically cutting LLM inference latency and cost, and provides concrete engineering tips for maximizing cache hit rates.

Hashing · KV Cache · LLM inference

7 min read

Architect

Mar 18, 2026 · Artificial Intelligence

Why Prompt Caching Is More Than a Cost‑Saving Trick: It Shapes Agent Architecture

The article argues that prompt caching is not merely a way to reduce token costs but a fundamental mechanism that forces developers to redesign context management for long‑running AI agents, turning caching considerations into core architectural decisions.

Context Engineering · Prompt Caching · large language models

25 min read

DataFunTalk

Mar 15, 2026 · Artificial Intelligence

How OpenClaw v2026.3.7 Boosts Enterprise AI Agent Efficiency and Cuts Costs

The OpenClaw v2026.3.7 upgrade introduces webhook compatibility fixes, typing‑feedback support, a 33% prompt‑caching cost reduction, smarter model routing with domestic model integration, and persistent bindings for container deployments, making the platform far more suitable for enterprise AI agent scenarios.

AI agents · Container Deployment · Model routing

10 min read

Shuge Unlimited

Mar 14, 2026 · Artificial Intelligence

Free OpenClaw Guide: Building an Automated AI Content Creation Workflow (Worth ¥5,000)

This step‑by‑step tutorial shows how to install and configure the self‑hosted OpenClaw AI gateway, set up essential skills, connect Feishu, and create two practical workflows—curtain image generation and viral content dissection—while covering API setup, budgeting, troubleshooting, and pricing considerations.

AI workflow · Feishu bot · OpenClaw

22 min read

High Availability Architecture

Mar 12, 2026 · Artificial Intelligence

How Claude Code Hits 92% Prompt Cache Rate and Slashes AI Agent Costs by 81%

This article explains the prompt‑caching mechanism used by Claude Code, showing how separating static prefixes from dynamic tails and leveraging KV‑tensor caching reduces the O(n²) complexity of transformer pre‑fill to O(n), achieving a 92% cache hit rate and up to 81% cost savings in long‑running AI agent sessions.

AI agents · Claude · Cost Reduction

12 min read

Code Mala Tang

Mar 9, 2026 · Artificial Intelligence

How Claude’s New Prompt Caching Cuts Token Costs by 90% for Long‑Running Agents

Claude’s API now automatically caches static parts of prompts—system instructions, tool definitions, and context—so repeated calls reuse these sections at only 10% of the standard token price, dramatically reducing costs for multi‑turn agents, but developers must manage prefixes and avoid cache‑breaking changes.

Claude API · Cost Reduction · LLM engineering

15 min read

AI Engineer Programming

Mar 7, 2026 · Artificial Intelligence

Prompt Caching, Tool Design, and Agent Architecture: Insights from Claude Code

The article explains the stages of LLM inference, how the KV cache and vLLM's PagedAttention enable cross‑request prompt caching, and practical guidelines for prompt ordering, immutable caching, and robust tool design that together shape efficient and reliable AI agent architectures.

Agent Architecture · LLM · Prompt Caching

18 min read

AI Code to Success

Mar 1, 2026 · Artificial Intelligence

How Prompt Caching Supercharges Long‑Running AI Agents: 5 Practical Lessons

This article explains how Claude Code’s Prompt Caching technique dramatically reduces latency and cost for long‑running AI agents, and shares five hard‑won engineering practices—including prompt layout, message‑based updates, avoiding mid‑conversation model or tool changes, and safe context forking—to help developers build efficient, cache‑friendly AI applications.

Context Management · Prompt Caching · System Design

10 min read

AI Insight Log

Feb 27, 2026 · Artificial Intelligence

Claude Code Prompt‑Caching Bug Drained Quotas—Anthropic’s Hotfix and Architecture Reveal

A prompt‑caching bug in Claude Code caused users' quotas to deplete rapidly, prompting Anthropic to ship an emergency hotfix in version 2.1.62, reset rate limits, and publicly disclose the core architecture and five counter‑intuitive caching rules for building reliable AI agents.

AI agents · Agent Architecture · Anthropic

12 min read

AI Waka

Feb 24, 2026 · Artificial Intelligence

How Claude’s New Auto‑Caching Cuts API Token Costs by 90%

By adding a single field to Claude API requests, developers can automatically cache static prompt parts so that cached input tokens are billed at just 10% of the normal input price, dramatically lowering expenses for multi‑turn AI agents.

AI agents · Claude API · Cost Reduction

13 min read

PaperAgent

Feb 1, 2026 · Artificial Intelligence

Why Clawdbot Burns Millions of Tokens and How to Slash Its Costs

The article provides a deep technical breakdown of the token consumption patterns of the OpenClaw (formerly Clawdbot) AI agent, identifies four major architectural token black holes, explains why they are hard to avoid, and offers concrete mitigation strategies such as prompt caching, workflow engines, context compaction, tool pruning, and model routing to dramatically reduce operational costs.

AI agents · Cost Reduction · Prompt Caching

12 min read

JavaGuide

Nov 19, 2025 · Artificial Intelligence

Spring AI 1.1 Released: Explosive New Features for Java AI Development

Spring AI 1.1.0 arrives with a major overhaul, adding out‑of‑the‑box Model Context Protocol support, five‑mode prompt caching that can cut LLM costs by up to 90%, reasoning APIs, recursive advisors, a broadened model ecosystem, enhanced vector‑store and chat‑memory options, and richer observability integrations.

AI integration · Java · MCP

9 min read

Instant Consumer Technology Team

Oct 10, 2025 · Artificial Intelligence

Why Does Claude Code Burn Tokens So Fast? A Deep Dive into Costs and Optimization

A developer recounts two days of using the Claude Code plugin for VS Code, discovers that it consumed a shocking 57 million tokens costing over $30, analyzes the breakdown, compares it with Copilot and Windsurf, and shares practical tips to curb token consumption and avoid rate limits.

AI coding tools · Claude · Prompt Caching

12 min read

Baobao Algorithm Notes

Oct 17, 2024 · Artificial Intelligence

How Contextual Retrieval Slashes RAG Failures by Up to 67% and Cuts Costs

Anthropic’s Contextual Retrieval augments traditional RAG with contextual embeddings and BM25, reducing retrieval failure rates by 49% (up to 67% with reranking), improving accuracy across domains, and lowering latency and cost through Claude’s prompt‑caching feature.

AI · BM25 · Contextual Retrieval

11 min read