Tagged articles

44 articles

May 31, 2026 · Artificial Intelligence

How Claude Code, Codex, and OpenCode Can Cut Token Usage by Up to 80%

The article breaks down why input tokens account for 70‑90% of LLM costs and provides concrete, platform‑specific techniques (file filtering, context compression, documentation‑driven prompts, memory caching, plan mode, output trimming, and model switching) that together can reduce token consumption by 20‑90% across Claude Code, Codex, and OpenCode.

AI coding assistants · Claude Code · Codex

0 likes · 10 min read

java1234

May 28, 2026 · Artificial Intelligence

Cut Claude Code Token Costs by 80% with OpenWolf

OpenWolf, an open-source middleware for Claude Code, can slash token consumption by up to 80% by using a project map, learning memory, token ledger, bug memory, and six lifecycle hooks, all without changing your existing Claude CLI workflow.

AI tools · CLI · Claude Code

0 likes · 8 min read

AI Architecture Path

May 24, 2026 · Artificial Intelligence

How agentmemory Fixes Claude Code Forgetting and Slashes Token Usage by 92%

The article explains how the open‑source agentmemory system solves common AI‑coding assistant pain points—session forgetfulness, repetitive context feeding, and high token costs—by providing automatic, cross‑tool persistent memory, hybrid retrieval, and a zero‑dependency deployment that reduces token consumption by 92% while offering detailed benchmarks and configuration guides.

AI agent · MCP · agentmemory

0 likes · 15 min read

AI Architecture Path

May 23, 2026 · Artificial Intelligence

Claude Code Controls the Browser with Playwright and Chrome DevTools MCP

The article compares Playwright MCP and Chrome DevTools MCP, explains their core differences, token consumption, waiting mechanisms, and tool capabilities, and provides step‑by‑step installation, configuration, and practical scenarios, showing how combining snapshot‑based analysis with these tools lets Claude Code efficiently automate browsers while avoiding common pitfalls such as token exhaustion and unstable execution.

AI automation · Chrome DevTools MCP · Claude Code

0 likes · 11 min read

AI Engineering

May 19, 2026 · Artificial Intelligence

Claude Adds Prompt Cache Diagnostics to Pinpoint Token Cost Spikes

Claude's new Prompt Cache Diagnostics feature lets developers see exactly why a cache miss occurs and how many tokens were wasted, providing beta‑header usage, Python examples, supported miss reasons, limitations, and privacy guarantees to help optimize token costs.

AI development · API Diagnostics · Anthropic

0 likes · 9 min read

AI Engineering

May 18, 2026 · Artificial Intelligence

Stop Throwing Money at AI: 10 Open‑Source Tools Cut Claude Code Tokens by 80% and Slash Large Projects by 49×

The article reviews ten open‑source utilities that dramatically reduce token consumption for AI coding assistants—cutting up to 80% of Claude Code tokens, saving hundreds of dollars, and shrinking large‑project token usage by as much as 49‑fold through output compression, command‑log filtering, and selective code‑base context.

AI coding · Claude Code · Open Source

0 likes · 14 min read

AI Architecture Hub

May 15, 2026 · Artificial Intelligence

Unlock Claude's Full Potential: 18 Essential Steps

Most Claude users only tap 10% of its capabilities; this guide walks you through 18 concrete steps—creating persistent projects, crafting custom instructions, treating Claude as a thinking partner, controlling token usage, and more—to transform it into a personalized, high‑performance assistant.

AI assistant · AI productivity · Claude

0 likes · 15 min read

Senior Brother's Insights

May 14, 2026 · Artificial Intelligence

7 Practical Tips to Slash Claude Code Token Usage by 80%

This article analyzes why token waste in Claude Code stems mainly from bloated context rather than verbose prompts and presents seven concrete techniques—including model selection, CLAUDE.md management, Subagent usage, precise file targeting, early compacting, context diagnostics, and restrained tool integration—to reduce token consumption by up to 80% while preserving workflow efficiency.

AI coding assistant · Claude Code · Compact command

0 likes · 14 min read

Su San Talks Tech

May 13, 2026 · Artificial Intelligence

Cut Claude Code Token Costs by Up to 89% with the Open‑Source RTK CLI

RTK is a high‑performance CLI proxy that filters and compresses command output before it reaches Claude Code’s 200k‑token LLM context, reducing token consumption by 60‑90% and cutting costs up to 89%, with step‑by‑step installation and usage instructions provided.

CLI · Claude Code · LLM

0 likes · 5 min read

IT Services Circle

May 6, 2026 · Artificial Intelligence

How to Cut Large‑Model Token Usage by Over 90%

The article analyses why AI Skills waste massive token counts, demonstrates a pure‑Skill implementation that costs $10 and 12 minutes, then shows a code‑plus‑model hybrid that reduces runtime to 17 seconds, API calls to one, and cost to $0.004, saving more than 99% of tokens.

Claude · OpenRouter · Playwright

0 likes · 19 min read

Frontend AI Walk

Apr 30, 2026 · Artificial Intelligence

Master AI Coding with Matt Pocock Skills: From Deep Alignment to Architecture in One Workflow

This guide walks developers through installing and using Matt Pocock Skills—a lightweight, composable set of AI‑agent commands that provide deep alignment, shared language, feedback loops, architecture reviews and token‑saving modes to turn "vibe coding" into repeatable, high‑quality delivery.

AI coding · Documentation · Test‑Driven Development

0 likes · 19 min read

DeWu Technology

Apr 29, 2026 · Information Security

How a General AI Agent Powers Scalable Gateway Route Security Audits

The article presents a practical AI‑driven security audit system for gateway routes that uses a layered “general Agent + business Skill” design, combines batch AI filtering with human verification, achieves full‑coverage, minute‑level detection, and reduces token costs by over 95% through multiple optimizations.

AI agent · API Security · MCP Tool

0 likes · 15 min read

DevOps Coach

Apr 27, 2026 · Artificial Intelligence

Can You Cut Claude Code’s Token Usage by 75%? A Simple Plugin Shows How

The article demonstrates that Claude Code’s verbose responses waste hundreds of tokens, but a free “caveman” plugin can slash token consumption by up to 75% while preserving answer quality, backed by benchmark data and a research paper on concise replies.

Claude · LLM cost reduction · caveman plugin

0 likes · 6 min read

IoT Full-Stack Technology

Apr 27, 2026 · Artificial Intelligence

Cut Token Usage by Up to 80% in Claude Code, Codex, and OpenCode

The article explains how to dramatically reduce token consumption in Claude Code, OpenAI's Codex, and the open‑source OpenCode by tightly controlling input, trimming context, filtering files, leveraging tools, caching, and model selection, offering concrete commands, configuration files, and a ten‑step checklist that can cut usage by up to 80%.

AI coding assistant · Claude · Codex

0 likes · 11 min read

AI Waka

Apr 26, 2026 · Artificial Intelligence

Unlocking Reliable AI Agents: A Deep Dive into Harness Engineering

The article examines why raw LLMs fail as autonomous coding agents and introduces Harness Engineering, a disciplined scaffold of prompts, tools, context policies, hooks, and sub‑agents that mitigates context corruption, long‑task collapse, and security risks while cutting token costs by up to 50%.

AI agent · Harness Engineering · LLM safety

0 likes · 14 min read

MeowKitty Programming

Apr 25, 2026 · Backend Development

When Connecting Java to AI, More Tools Aren’t Always Better: Dynamic Tool Discovery Is the New Hotspot

The article explains why loading a Java AI agent with dozens of tools hurts token efficiency and accuracy, and how Spring AI’s dynamic tool discovery—implemented via ToolSearchToolCallAdvisor—lets models fetch only the needed tools per turn, saving up to 64% of tokens and simplifying tool governance for large Java back‑ends.

AI agents · Backend Integration · Dynamic Tool Discovery

0 likes · 7 min read

Code Mala Tang

Apr 25, 2026 · Cloud Native

Why MCP Still Matters: Finding the Optimal Path for Agents to Connect to External Systems

The article compares direct API calls, CLI tools, and the Model Context Protocol (MCP) for agent integration, explains MCP's token overhead, presents two token‑reduction strategies, and outlines design principles for building high‑availability MCP servers to maximize agent utility.

AI agents · CLI · Cloud Native

0 likes · 13 min read

IoT Full-Stack Technology

Apr 25, 2026 · Artificial Intelligence

How to Cut Claude Code, Codex, and OpenCode Token Usage by Up to 80%

The article breaks down why input tokens dominate cost (70‑90%), then details platform‑specific techniques—file filtering, context compression, documentation‑driven prompts, memory management, plan mode, output trimming, and model switching—that together can reduce Claude Code, Codex, and OpenCode token consumption by 60‑90%, with a practical 10‑step checklist.

AI coding assistants · Claude Code · Codex

0 likes · 11 min read

Machine Heart

Apr 22, 2026 · Artificial Intelligence

Honor YOYO Claw: The First ‘Shrimp‑Ready’ Laptop That Cuts Token Use by 50%

Honor’s new YOYO Claw technology embeds pre‑configured AI agents into MagicBook laptops, eliminating setup friction, halving token consumption compared with OpenClaw, and delivering device‑level security and multi‑device ecosystem benefits for everyday users.

AI agents · Hardware integration · Honor

0 likes · 13 min read

macrozheng

Apr 16, 2026 · Operations

Cut Token Costs by 90% with RTK: A High‑Performance CLI Proxy for Claude Code

This article introduces RTK, a high‑performance CLI proxy that filters and compresses command output before it reaches Claude Code's 200k‑token LLM context, reducing token consumption by 60‑90% and improving inference speed, with step‑by‑step installation and usage instructions.

CLI · Claude Code · LLM

0 likes · 4 min read

AI Architecture Path

Apr 14, 2026 · Artificial Intelligence

Cut AI Coding Assistant Token Use by 75% with Caveman’s Minimalist Output

Caveman is an open‑source plugin for AI coding assistants that removes redundant phrasing, cutting output tokens by up to 75% and speeding responses threefold, while preserving code blocks, error messages, and technical terms, and offering multiple intensity levels and specialized commands to streamline development workflows.

AI assistant · CLI tool · Open Source

0 likes · 11 min read

ArcThink

Apr 13, 2026 · Artificial Intelligence

Why Your Claude Code Quota Drains Fast and How to Save Up to 90% of Tokens

A typical Claude Code session spends 98% of its tokens on input rather than generated code, so most of the budget is wasted on context, file reads, and system prompts; this article explains the billing model, common waste patterns, monitoring tools, and a four‑layer optimization pyramid that can cut token usage by 50‑90%.

AI coding · Claude Code · Cost Management

0 likes · 23 min read

AI Architecture Path

Apr 13, 2026 · Industry Insights

How RTK Cuts AI Coding Token Costs by 90%: A Deep Dive

RTK (Rust Token Killer) is a lightweight, zero‑intrusion CLI proxy that filters noisy terminal output for AI coding assistants, achieving up to 99% compression of irrelevant data and reducing token consumption by more than 90%, thereby lowering costs and boosting developer productivity.

AI programming · CLI tool · Open Source

0 likes · 10 min read

AsiaInfo Technology: New Tech Exploration

Apr 9, 2026 · Artificial Intelligence

How OAG Shrinks a Million‑Token Ontology to 11% While Keeping LLM Reasoning Power

This article presents the OAG (Ontology‑Augmented Generation) architecture, which uses a three‑stage pipeline of semantic filtering, graph‑based path pruning, and format conversion to cut the token count of enterprise‑scale ontologies by up to 89% while limiting inference accuracy loss to around 3% and adding only ~240 ms latency.

AI agents · LLM · graph algorithms

0 likes · 21 min read

Senior Tony

Apr 5, 2026 · Artificial Intelligence

How to Impress Interviewers with Smart Token‑Optimization Strategies for LLMs

The article explains why simply switching to cheaper large language models fails in interviews and outlines five practical techniques—prompt simplification, context management, output control, model tiering, and caching—to reduce token consumption while preserving answer quality.

Caching · Interview Tips · LLM

0 likes · 5 min read

SuanNi

Apr 3, 2026 · Artificial Intelligence

How Progressive Disclosure Cuts AI Agent Token Bloat by 90% and Enables Self‑Generated Skills

Google's Agent Development Kit introduces a Progressive Disclosure architecture that splits skill knowledge into three lazy‑loaded layers, dramatically reducing token consumption and improving response quality while also supporting four skill‑building modes, including a meta‑skill that lets agents generate new skills on the fly.

AI agent · Agent Development Kit · Meta Skill

0 likes · 17 min read

Architect's Journey

Apr 1, 2026 · Artificial Intelligence

Agentic OS Explained: Can Alibaba Cloud’s AI‑Agent OS Be the Windows for Agents?

Agentic OS, Alibaba Cloud’s first operating system built for AI agents, tackles traditional OS limitations—high onboarding barriers, lengthy training, instability, weak security, and coordination complexity—through a three‑layer design, pre‑packaged Skills that cut token usage by over 30%, a one‑command Copilot Shell deployment, and a comprehensive security core, reshaping the compute paradigm toward agent‑centric workloads.

AI agent · Agentic OS · Cloud Computing

0 likes · 10 min read

Yunqi AI+

Mar 25, 2026 · R&D Management

How to Build a Code Review Agent Skill: From Skeleton to Cost‑Effective Localization (Part 2)

This article walks through the complete process of creating a Code Review Skill for AI agents, covering skeleton definition, architecture‑ and coding‑rule derivation, business‑logic checks, unit‑test standards, context routing, token‑consumption analysis, cost‑optimisation tips, and how to extend the pattern to other skills.

Agent Skill · COLA Architecture · Code Review

0 likes · 16 min read

Java Architecture Diary

Mar 23, 2026 · Artificial Intelligence

How Rust Token Killer Cuts AI Coding Token Costs by 90% in Seconds

The article explains how the Rust Token Killer (RTK) tool filters out unnecessary CLI output, dramatically reducing token consumption for AI code assistants by up to 89%, extending session length threefold, and provides quick installation and usage instructions.

AI · CLI · Claude Code

0 likes · 6 min read

Tencent Cloud Developer

Mar 17, 2026 · Artificial Intelligence

Why Anthropic Skips Function Calling: Inside the 5 Skill Execution Modes

This article dissects Anthropic's Skill framework, revealing how it drives AI agents through five distinct execution modes—pure prompt injection, script execution, library calls, progressive document loading, and workflow orchestration—while avoiding function‑calling registration and optimizing token usage.

AI · Agent · Function Calling

0 likes · 32 min read

Black & White Path

Mar 12, 2026 · Artificial Intelligence

How to Cut Token Costs When Using OpenClaw Agents

This guide shares practical ways to reduce token consumption in OpenClaw by monitoring agent actions, stopping runaway tasks, trimming oversized markdown configurations, applying concise agent rules, and leveraging free models for testing, helping users halve their AI expenses.

AI agents · OpenClaw · agent rules

0 likes · 8 min read

Rare Earth Juejin Tech Community

Mar 11, 2026 · Artificial Intelligence

How to Build a Cost‑Efficient Multi‑AI Team with Claude Code

This article details a hands‑on experiment that turns Claude Code into a virtual AI team—splitting project‑manager, designer, programmer and QA roles into separate agents, using file‑based communication, strict CLAUDE.md contracts, and token‑saving techniques such as timestamp checks and model‑specific task routing.

AI multi‑agent · Claude Code · file-based communication

0 likes · 22 min read

Code Mala Tang

Mar 9, 2026 · Artificial Intelligence

How Claude’s New Prompt Caching Cuts Token Costs by 90% for Long‑Running Agents

Claude’s API now automatically caches static parts of prompts—system instructions, tool definitions, and context—so repeated calls reuse these sections at only 10% of the standard token price, dramatically reducing costs for multi‑turn agents, but developers must manage prefixes and avoid cache‑breaking changes.

Claude API · Cost Reduction · LLM engineering

0 likes · 15 min read

Java Backend Technology

Mar 5, 2026 · Artificial Intelligence

How to Slash AI Token Costs: MCP vs Skill and 6 Proven Optimization Techniques

This article explains the fundamental differences between web session tokens and AI tokens, compares MCP and Skill token consumption, presents pricing formulas for major models, and offers practical strategies—including prompt compression, context management, and dynamic toolsets—to dramatically reduce AI token expenses.

Artificial Intelligence · Cost Management · MCP

0 likes · 16 min read

Efficient Ops

Mar 2, 2026 · Artificial Intelligence

Deploy OpenClaw: Your Multi‑Channel AI Agent Gateway Made Easy

OpenClaw is an AI agent gateway that supports WhatsApp, Telegram, Discord and other platforms, offering a quick curl‑based installation, a guided configuration wizard, extensible Skills system, token‑saving plugins, and operational tools for DevOps and SRE tasks.

AI agent · Installation · Multi-Channel Messaging

0 likes · 6 min read

AI Engineering

Mar 2, 2026 · Artificial Intelligence

How Context Mode Cuts 98% of Context Tokens for AI Development Tools

Context Mode inserts a sandbox and SQLite‑FTS5 retrieval layer between Claude Code and tool outputs, shrinking typical tool data from megabytes to a few hundred bytes and reducing overall context usage by 98%, extending session time from about 30 minutes to three hours.

AI tooling · Claude · Context Mode

0 likes · 4 min read

AI Waka

Feb 24, 2026 · Artificial Intelligence

How Claude’s New Auto‑Caching Cuts API Token Costs by 90%

By adding a single field to Claude API requests, developers can automatically cache static prompt parts, reducing token billing to just 10% of the original cost and dramatically lowering expenses for multi‑turn AI agents.

AI agents · Claude API · Cost Reduction

0 likes · 13 min read

Fun with Large Models

Feb 10, 2026 · Artificial Intelligence

Building LangChain Agent Skills from Scratch to Cut Token Usage and Boost Tool Accuracy

The article presents a step‑by‑step design and implementation of a Claude‑style Skills mechanism for LangChain agents, using a double‑layer tool architecture, state‑driven dynamic filtering, and middleware interception to load only relevant tools, dramatically reducing token consumption and improving decision quality and response speed.

Agent Skills · Dynamic Loading · LangChain

0 likes · 15 min read

Shuge Unlimited

Feb 9, 2026 · Artificial Intelligence

Claude-Mem Saves 95% Tokens and Offers Unlimited Memory – 25.8K‑Star GitHub Project

The article analyzes the "memory loss" problem of AI coding assistants, introduces the open‑source Claude‑Mem project that adds a three‑layer progressive‑disclosure architecture and AI‑driven semantic compression, and shows how it reduces token usage by 95%, boosts tool‑call limits twenty‑fold, and improves developer workflow.

AI coding assistant · claude-mem · memory retrieval

0 likes · 18 min read

PaperAgent

Feb 1, 2026 · Artificial Intelligence

Why Clawdbot Burns Millions of Tokens and How to Slash Its Costs

The article provides a deep technical breakdown of the OpenClaw (formerly Clawdbot) AI agent’s token consumption patterns, identifies four major architectural token‑black‑holes, explains why they are hard to avoid, and offers concrete mitigation strategies such as prompt caching, workflow engines, context compaction, tool pruning, and model routing to dramatically reduce operational costs.

AI agents · Cost Reduction · Prompt Caching

0 likes · 12 min read

AI Engineering

Jan 20, 2026 · Artificial Intelligence

How mcpx Cuts Token Overhead in MCP Tool Calls for Local LLMs

The article explains how mcpx reduces MCP tool definition tokens from tens of thousands to a few hundred by discovering tools at execution time, improving accuracy and speed for local large language models while preserving prompt cache integrity.

Anthropic · MCP · Tool Calling

0 likes · 6 min read

PaperAgent

Jan 8, 2026 · Artificial Intelligence

How Cursor’s Dynamic Context Cuts Agent Token Use by 47%

Cursor’s new dynamic context feature lets its coding agents treat long tool outputs as files and selectively load only needed data, reducing total token consumption by 46.9% while improving response quality through techniques like file‑based tool responses, conversation‑history summarization, Agent Skills standards, efficient MCP tool loading, and treating terminal sessions as files.

AI agents · Cursor · LLM tooling

0 likes · 8 min read

AI Insight Log

Jan 7, 2026 · Artificial Intelligence

How Cursor’s Dynamic Context Discovery Cuts Token Usage by Nearly 47%

Cursor’s new Dynamic Context Discovery mechanism reduces token consumption by 46.9% by externalizing long outputs, preserving full chat history, loading skills on demand, slimming the tool catalog, and syncing terminal output to the file system, dramatically improving cost and focus for AI agents.

Context Engineering · Cursor · Dynamic Context Discovery

0 likes · 6 min read

Programmer DD

Nov 14, 2025 · Artificial Intelligence

Can TOON Format Cut LLM Token Costs by Up to 60%?

This article explains how the TOON data‑serialization format reduces token usage and improves accuracy for large language model calls compared with traditional JSON, provides benchmark results, outlines scenarios where TOON is advantageous or unsuitable, and shows Java integration examples.

Java · LLM · TOON

0 likes · 6 min read
