Tagged articles
44 articles
Su San Talks Tech
May 31, 2026 · Artificial Intelligence

How Claude Code, Codex, and OpenCode Can Cut Token Usage by Up to 80%

The article explains why input tokens account for 70‑90% of LLM costs and presents concrete, platform‑specific techniques (file filtering, context compression, documentation‑driven prompts, memory caching, plan mode, output trimming, and model switching) that together can reduce token consumption by 20‑90% across Claude Code, Codex, and OpenCode.

AI coding assistants · Claude Code · Codex
0 likes · 10 min read
java1234
May 28, 2026 · Artificial Intelligence

Cut Claude Code Token Costs by 80% with OpenWolf

OpenWolf, an open-source middleware for Claude Code, can slash token consumption by up to 80% by using a project map, learning memory, token ledger, bug memory, and six lifecycle hooks, all without changing your existing Claude CLI workflow.

AI tools · CLI · Claude Code
0 likes · 8 min read
AI Architecture Path
May 24, 2026 · Artificial Intelligence

How agentmemory Fixes Claude Code Forgetting and Slashes Token Usage by 92%

The article explains how the open‑source agentmemory system solves common AI‑coding assistant pain points—session forgetfulness, repetitive context feeding, and high token costs—by providing automatic, cross‑tool persistent memory, hybrid retrieval, and a zero‑dependency deployment that reduces token consumption by 92% while offering detailed benchmarks and configuration guides.

AI agent · MCP · agentmemory
0 likes · 15 min read
AI Architecture Path
May 23, 2026 · Artificial Intelligence

Claude Code Controls the Browser with Playwright and Chrome DevTools MCP

The article compares Playwright MCP and Chrome DevTools MCP, explains their core differences, token consumption, waiting mechanisms, and tool capabilities, and provides step‑by‑step installation, configuration, and practical scenarios, showing how combining snapshot‑based analysis with these tools lets Claude Code efficiently automate browsers while avoiding common pitfalls such as token exhaustion and unstable execution.

AI automation · Chrome DevTools MCP · Claude Code
0 likes · 11 min read
AI Engineering
May 19, 2026 · Artificial Intelligence

Claude Adds Prompt Cache Diagnostics to Pinpoint Token Cost Spikes

Claude's new Prompt Cache Diagnostics feature lets developers see exactly why a cache miss occurs and how many tokens were wasted, providing beta‑header usage, Python examples, supported miss reasons, limitations, and privacy guarantees to help optimize token costs.

AI development · API Diagnostics · Anthropic
0 likes · 9 min read
AI Engineering
May 18, 2026 · Artificial Intelligence

Stop Throwing Money at AI: 10 Open‑Source Tools Cut Claude Code Tokens by 80% and Slash Large Projects by 49×

The article reviews ten open‑source utilities that dramatically reduce token consumption for AI coding assistants—cutting up to 80% of Claude Code tokens, saving hundreds of dollars, and shrinking large‑project token usage by as much as 49‑fold through output compression, command‑log filtering, and selective code‑base context.

AI coding · Claude Code · Open Source
0 likes · 14 min read
AI Architecture Hub
May 15, 2026 · Artificial Intelligence

Unlock Claude's Full Potential: 18 Essential Steps

Most Claude users only tap 10% of its capabilities; this guide walks you through 18 concrete steps—creating persistent projects, crafting custom instructions, treating Claude as a thinking partner, controlling token usage, and more—to transform it into a personalized, high‑performance assistant.

AI assistant · AI productivity · Claude
0 likes · 15 min read
Senior Brother's Insights
May 14, 2026 · Artificial Intelligence

7 Practical Tips to Slash Claude Code Token Usage by 80%

This article analyzes why token waste in Claude Code stems mainly from bloated context rather than verbose prompts and presents seven concrete techniques—including model selection, CLAUDE.md management, Subagent usage, precise file targeting, early compacting, context diagnostics, and restrained tool integration—to reduce token consumption by up to 80% while preserving workflow efficiency.

AI coding assistant · Claude Code · Compact command
0 likes · 14 min read
Su San Talks Tech
May 13, 2026 · Artificial Intelligence

Cut Claude Code Token Costs by Up to 89% with the Open‑Source RTK CLI

RTK is a high‑performance CLI proxy that filters and compresses command output before it reaches Claude Code’s 200k‑token LLM context, reducing token consumption by 60‑90% and cutting costs up to 89%, with step‑by‑step installation and usage instructions provided.

CLI · Claude Code · LLM
0 likes · 5 min read
IT Services Circle
May 6, 2026 · Artificial Intelligence

How to Cut Large‑Model Token Usage by Over 90%

The article analyses why AI Skills waste massive token counts, demonstrates a pure‑Skill implementation that costs $10 and 12 minutes, then shows a code‑plus‑model hybrid that reduces runtime to 17 seconds, API calls to one, and cost to $0.004, saving more than 99% of tokens.

Claude · OpenRouter · Playwright
0 likes · 19 min read
Frontend AI Walk
Apr 30, 2026 · Artificial Intelligence

Master AI Coding with Matt Pocock Skills: From Deep Alignment to Architecture in One Workflow

This guide walks developers through installing and using Matt Pocock Skills—a lightweight, composable set of AI‑agent commands that provide deep alignment, shared language, feedback loops, architecture reviews and token‑saving modes to turn "vibe coding" into repeatable, high‑quality delivery.

AI coding · Documentation · Test‑Driven Development
0 likes · 19 min read
DeWu Technology
Apr 29, 2026 · Information Security

How a General AI Agent Powers Scalable Gateway Route Security Audits

The article presents a practical AI‑driven security audit system for gateway routes that uses a layered “general Agent + business Skill” design, combines batch AI filtering with human verification, achieves full coverage and minute‑level detection, and reduces token costs by over 95% through multiple optimizations.

AI agent · API Security · MCP Tool
0 likes · 15 min read
DevOps Coach
Apr 27, 2026 · Artificial Intelligence

Can You Cut Claude Code’s Token Usage by 75%? A Simple Plugin Shows How

The article demonstrates that Claude Code’s verbose responses waste hundreds of tokens, but a free “caveman” plugin can slash token consumption by up to 75% while preserving answer quality, backed by benchmark data and a research paper on concise replies.

Claude · LLM cost reduction · caveman plugin
0 likes · 6 min read
IoT Full-Stack Technology
Apr 27, 2026 · Artificial Intelligence

Cut Token Usage by Up to 80% in Claude Code, Codex, and OpenCode

The article explains how to dramatically reduce token consumption in Claude Code, OpenAI's Codex, and the open‑source OpenCode by tightly controlling input, trimming context, filtering files, leveraging tools, caching, and model selection, offering concrete commands, configuration files, and a ten‑step checklist that can cut usage by up to 80%.

AI coding assistant · Claude · Codex
0 likes · 11 min read
AI Waka
Apr 26, 2026 · Artificial Intelligence

Unlocking Reliable AI Agents: A Deep Dive into Harness Engineering

The article examines why raw LLMs fail as autonomous coding agents and introduces Harness Engineering, a disciplined scaffold of prompts, tools, context policies, hooks, and sub‑agents that mitigates context corruption, long‑task collapse, and security risks while cutting token costs by up to 50%.

AI agent · Harness Engineering · LLM safety
0 likes · 14 min read
MeowKitty Programming
Apr 25, 2026 · Backend Development

When Connecting Java to AI, More Tools Aren’t Always Better: Dynamic Tool Discovery Is the New Hotspot

The article explains why loading a Java AI agent with dozens of tools hurts token efficiency and accuracy, and how Spring AI’s dynamic tool discovery—implemented via ToolSearchToolCallAdvisor—lets models fetch only the needed tools per turn, saving up to 64% of tokens and simplifying tool governance for large Java back‑ends.

AI agents · Backend Integration · Dynamic Tool Discovery
0 likes · 7 min read
IoT Full-Stack Technology
Apr 25, 2026 · Artificial Intelligence

How to Cut Claude Code, Codex, and OpenCode Token Usage by Up to 80%

The article breaks down why input tokens dominate cost (70‑90%), then details platform‑specific techniques—file filtering, context compression, documentation‑driven prompts, memory management, plan mode, output trimming, and model switching—that together can reduce Claude Code, Codex, and OpenCode token consumption by 60‑90%, with a practical 10‑step checklist.

AI coding assistants · Claude Code · Codex
0 likes · 11 min read
AI Architecture Path
Apr 14, 2026 · Artificial Intelligence

Cut AI Coding Assistant Token Use by 75% with Caveman’s Minimalist Output

Caveman is an open‑source plugin for AI coding assistants that removes redundant phrasing, cutting output tokens by up to 75% and speeding responses threefold, while preserving code blocks, error messages, and technical terms, and offering multiple intensity levels and specialized commands to streamline development workflows.

AI assistant · CLI tool · Open Source
0 likes · 11 min read
ArcThink
Apr 13, 2026 · Artificial Intelligence

Why Your Claude Code Quota Drains Fast and How to Save Up to 90% of Tokens

A typical Claude Code session spends 98% of its tokens on input rather than generated code, so most of the budget is wasted on context, file reads, and system prompts; this article explains the billing model, common waste patterns, monitoring tools, and a four‑layer optimization pyramid that can cut token usage by 50‑90%.

AI coding · Claude Code · Cost Management
0 likes · 23 min read
AI Architecture Path
Apr 13, 2026 · Industry Insights

How RTK Cuts AI Coding Token Costs by 90%: A Deep Dive

RTK (Rust Token Killer) is a lightweight, zero‑intrusion CLI proxy that filters noisy terminal output for AI coding assistants, achieving up to 99% compression of irrelevant data and reducing token consumption by more than 90%, thereby lowering costs and boosting developer productivity.

AI programming · CLI tool · Open Source
0 likes · 10 min read
AsiaInfo Technology: New Tech Exploration
Apr 9, 2026 · Artificial Intelligence

How OAG Shrinks a Million‑Token Ontology to 11% While Keeping LLM Reasoning Power

This article presents the OAG (Ontology‑Augmented Generation) architecture, which uses a three‑stage pipeline of semantic filtering, graph‑based path pruning, and format conversion to cut the token count of enterprise‑scale ontologies by up to 89% while limiting inference accuracy loss to around 3% and adding only ~240 ms of latency.

AI agents · LLM · graph algorithms
0 likes · 21 min read
Senior Tony
Apr 5, 2026 · Artificial Intelligence

How to Impress Interviewers with Smart Token‑Optimization Strategies for LLMs

The article explains why simply switching to cheaper large language models fails in interviews and outlines five practical techniques—prompt simplification, context management, output control, model tiering, and caching—to reduce token consumption while preserving answer quality.

Caching · Interview Tips · LLM
0 likes · 5 min read
SuanNi
Apr 3, 2026 · Artificial Intelligence

How Progressive Disclosure Cuts AI Agent Token Bloat by 90% and Enables Self‑Generated Skills

Google's Agent Development Kit introduces a Progressive Disclosure architecture that splits skill knowledge into three lazy‑loaded layers, dramatically reducing token consumption and improving response quality while also supporting four skill‑building modes, including a meta‑skill that lets agents generate new skills on the fly.

AI agent · Agent Development Kit · Meta Skill
0 likes · 17 min read
Architect's Journey
Apr 1, 2026 · Artificial Intelligence

Agentic OS Explained: Can Alibaba Cloud’s AI‑Agent OS Be the Windows for Agents?

Agentic OS, Alibaba Cloud’s first operating system built for AI agents, tackles traditional OS limitations—high onboarding barriers, lengthy training, instability, weak security, and coordination complexity—through a three‑layer design, pre‑packaged Skills that cut token usage by over 30%, a one‑command Copilot Shell deployment, and a comprehensive security core, reshaping the compute paradigm toward agent‑centric workloads.

AI agent · Agentic OS · Cloud Computing
0 likes · 10 min read
Yunqi AI+
Mar 25, 2026 · R&D Management

How to Build a Code Review Agent Skill: From Skeleton to Cost‑Effective Localization (Part 2)

This article walks through the complete process of creating a Code Review Skill for AI agents, covering skeleton definition, architecture‑ and coding‑rule derivation, business‑logic checks, unit‑test standards, context routing, token‑consumption analysis, cost‑optimisation tips, and how to extend the pattern to other skills.

Agent Skill · COLA Architecture · Code Review
0 likes · 16 min read
Tencent Cloud Developer
Mar 17, 2026 · Artificial Intelligence

Why Anthropic Skips Function Calling: Inside the 5 Skill Execution Modes

This article dissects Anthropic's Skill framework, revealing how it drives AI agents through five distinct execution modes—pure prompt injection, script execution, library calls, progressive document loading, and workflow orchestration—while avoiding function‑calling registration and optimizing token usage.

AI · Agent · Function Calling
0 likes · 32 min read
Black & White Path
Mar 12, 2026 · Artificial Intelligence

How to Cut Token Costs When Using OpenClaw Agents

This guide shares practical ways to reduce token consumption in OpenClaw by monitoring agent actions, stopping runaway tasks, trimming oversized markdown configurations, applying concise agent rules, and leveraging free models for testing, helping users halve their AI expenses.

AI agents · OpenClaw · agent rules
0 likes · 8 min read
Rare Earth Juejin Tech Community
Mar 11, 2026 · Artificial Intelligence

How to Build a Cost‑Efficient Multi‑AI Team with Claude Code

This article details a hands‑on experiment that turns Claude Code into a virtual AI team—splitting project‑manager, designer, programmer and QA roles into separate agents, using file‑based communication, strict CLAUDE.md contracts, and token‑saving techniques such as timestamp checks and model‑specific task routing.

AI multi‑agent · Claude Code · file-based communication
0 likes · 22 min read
Code Mala Tang
Mar 9, 2026 · Artificial Intelligence

How Claude’s New Prompt Caching Cuts Token Costs by 90% for Long‑Running Agents

Claude’s API now automatically caches static parts of prompts—system instructions, tool definitions, and context—so repeated calls reuse these sections at only 10% of the standard token price, dramatically reducing costs for multi‑turn agents, but developers must manage prefixes and avoid cache‑breaking changes.

Claude API · Cost Reduction · LLM engineering
0 likes · 15 min read
Java Backend Technology
Mar 5, 2026 · Artificial Intelligence

How to Slash AI Token Costs: MCP vs Skill and 6 Proven Optimization Techniques

This article explains the fundamental differences between web session tokens and AI tokens, compares MCP and Skill token consumption, presents pricing formulas for major models, and offers practical strategies—including prompt compression, context management, and dynamic toolsets—to dramatically reduce AI token expenses.

Artificial Intelligence · Cost Management · MCP
0 likes · 16 min read
Efficient Ops
Mar 2, 2026 · Artificial Intelligence

Deploy OpenClaw: Your Multi‑Channel AI Agent Gateway Made Easy

OpenClaw is an AI agent gateway that supports WhatsApp, Telegram, Discord and other platforms, offering a quick curl‑based installation, a guided configuration wizard, extensible Skills system, token‑saving plugins, and operational tools for DevOps and SRE tasks.

AI agent · Installation · Multi-Channel Messaging
0 likes · 6 min read
AI Engineering
Mar 2, 2026 · Artificial Intelligence

How Context Mode Cuts 98% of Context Tokens for AI Development Tools

Context Mode inserts a sandbox and SQLite‑FTS5 retrieval layer between Claude Code and tool outputs, shrinking typical tool data from megabytes to a few hundred bytes and reducing overall context usage by 98%, extending session time from about 30 minutes to three hours.

AI tooling · Claude · Context Mode
0 likes · 4 min read
AI Waka
Feb 24, 2026 · Artificial Intelligence

How Claude’s New Auto‑Caching Cuts API Token Costs by 90%

By adding a single field to Claude API requests, developers can automatically cache static prompt parts, reducing token billing to just 10% of the original cost and dramatically lowering expenses for multi‑turn AI agents.

AI agents · Claude API · Cost Reduction
0 likes · 13 min read
Fun with Large Models
Feb 10, 2026 · Artificial Intelligence

Building LangChain Agent Skills from Scratch to Cut Token Usage and Boost Tool Accuracy

The article presents a step‑by‑step design and implementation of a Claude‑style Skills mechanism for LangChain agents, using a double‑layer tool architecture, state‑driven dynamic filtering, and middleware interception to load only relevant tools, dramatically reducing token consumption and improving decision quality and response speed.

Agent Skills · Dynamic Loading · LangChain
0 likes · 15 min read
Shuge Unlimited
Feb 9, 2026 · Artificial Intelligence

Claude-Mem Saves 95% Tokens and Offers Unlimited Memory – 25.8K‑Star GitHub Project

The article analyzes the "memory loss" problem of AI coding assistants, introduces the open‑source Claude‑Mem project that adds a three‑layer progressive‑disclosure architecture and AI‑driven semantic compression, and shows how it reduces token usage by 95%, boosts tool‑call limits twenty‑fold, and improves developer workflow.

AI coding assistant · claude-mem · memory retrieval
0 likes · 18 min read
PaperAgent
Feb 1, 2026 · Artificial Intelligence

Why Clawdbot Burns Millions of Tokens and How to Slash Its Costs

The article provides a deep technical breakdown of the OpenClaw (formerly Clawdbot) AI agent’s token consumption patterns, identifies four major architectural token‑black‑holes, explains why they are hard to avoid, and offers concrete mitigation strategies such as prompt caching, workflow engines, context compaction, tool pruning, and model routing to dramatically reduce operational costs.

AI agents · Cost Reduction · Prompt Caching
0 likes · 12 min read
AI Engineering
Jan 20, 2026 · Artificial Intelligence

How mcpx Cuts Token Overhead in MCP Tool Calls for Local LLMs

The article explains how mcpx reduces MCP tool definition tokens from tens of thousands to a few hundred by discovering tools at execution time, improving accuracy and speed for local large language models while preserving prompt cache integrity.

Anthropic · MCP · Tool Calling
0 likes · 6 min read
PaperAgent
Jan 8, 2026 · Artificial Intelligence

How Cursor’s Dynamic Context Cuts Agent Token Use by 47%

Cursor’s new dynamic context feature lets its coding agents treat long tool outputs as files and selectively load only needed data, reducing total token consumption by 46.9% while improving response quality through techniques like file‑based tool responses, conversation‑history summarization, Agent Skills standards, efficient MCP tool loading, and treating terminal sessions as files.

AI agents · Cursor · LLM tooling
0 likes · 8 min read
AI Insight Log
Jan 7, 2026 · Artificial Intelligence

How Cursor’s Dynamic Context Discovery Cuts Token Usage by Nearly 47%

Cursor’s new Dynamic Context Discovery mechanism reduces token consumption by 46.9% by externalizing long outputs, preserving full chat history, loading skills on demand, slimming the tool catalog, and syncing terminal output to the file system, dramatically improving cost and focus for AI agents.

Context Engineering · Cursor · Dynamic Context Discovery
0 likes · 6 min read
Programmer DD
Nov 14, 2025 · Artificial Intelligence

Can TOON Format Cut LLM Token Costs by Up to 60%?

This article explains how the TOON data‑serialization format reduces token usage and improves accuracy for large language model calls compared with traditional JSON, provides benchmark results, outlines scenarios where TOON is advantageous or unsuitable, and shows Java integration examples.

Java · LLM · TOON
0 likes · 6 min read