
How to Cut Claude Code, Codex, and OpenCode Token Usage by Up to 80%

The article breaks down why input tokens dominate cost (70‑90%), then details platform‑specific techniques—file filtering, context compression, documentation‑driven prompts, memory management, plan mode, output trimming, and model switching—that together can reduce Claude Code, Codex, and OpenCode token consumption by 60‑90%, with a practical 10‑step checklist.

IoT Full-Stack Technology

Apr 25, 2026


Source: 我是程序汪

Token‑consumption principles

Cost formula: Total cost = Input Tokens × input price + Output Tokens × output price

Input Tokens (70%–90%): commands, conversation history, project files, tool outputs, system prompts

Output Tokens (10%–30%): code, explanations, logs returned by the AI

Largest black hole: automatic project‑file reading, often 80% of a single interaction’s input
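The cost formula above can be written as a short calculation. This is a minimal sketch: the per-million-token prices and the token counts are hypothetical placeholders chosen for illustration, not the actual rates of any model.

```python
def total_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Total cost = input tokens x input price + output tokens x output price."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def input_share(input_tokens: int, output_tokens: int,
                input_price_per_m: float, output_price_per_m: float) -> float:
    """Fraction of the total cost that comes from input tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    return input_cost / total_cost(input_tokens, output_tokens,
                                   input_price_per_m, output_price_per_m)


# Hypothetical prices (USD per million tokens) and hypothetical token counts.
cost = total_cost(150_000, 5_000, input_price_per_m=3.0, output_price_per_m=15.0)
share = input_share(150_000, 5_000, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"cost=${cost:.3f}, input share={share:.0%}")
```

Plugging in real prices and the token counts from an actual session shows whether input or output dominates, and therefore which of the techniques that follow is worth trying first.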

Platform‑specific token‑saving methods

Claude Code (most common, largest optimisation space)

File filtering – .claudeignore : create at project root, syntax identical to .gitignore. Example content:

# Dependencies & builds (largest black hole)
node_modules/

dist/
build/
.next/
__pycache__/

# Lock files / logs
*.lock
package-lock.json
*.log

# VCS / IDE
.git/
.idea/
.vscode/

# Resources / cache
*.png
*.jpg
*.svg
*.ico
.cache/
coverage/

Effect: a single interaction drops from ~150 k tokens to ~60 k (≈60% reduction).
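To see what a given ignore file would exclude before relying on it, the effect can be estimated offline. The sketch below is an assumption-laden illustration, not how any tool actually reads ignore files: the matcher covers only a small subset of .gitignore syntax (directory names and file-name globs, with no negation, anchoring, or `**`), and the file sizes are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def load_patterns(ignore_text: str) -> list[str]:
    """Parse ignore-file text, skipping blank lines and # comments."""
    lines = (line.strip() for line in ignore_text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified matcher: 'name/' excludes a directory anywhere in the path;
    any other pattern is a glob tested against the file name only."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            return True
    return False


def size_before_after(files: dict[str, int], patterns: list[str]) -> tuple[int, int]:
    """Return the total size of all files and of those that survive filtering."""
    before = sum(files.values())
    after = sum(size for path, size in files.items()
                if not is_ignored(path, patterns))
    return before, after


# Hypothetical project: path -> size in characters.
files = {
    "node_modules/react/index.js": 90_000,
    "src/app/page.tsx": 40_000,
    "pnpm-lock.yaml": 15_000,
    "yarn.lock": 5_000,
}
patterns = load_patterns("# deps\nnode_modules/\n\n*.lock\n*.log\n")
before, after = size_before_after(files, patterns)
print(f"{before} -> {after} characters ({1 - after / before:.0%} less)")
```

In this made-up project `pnpm-lock.yaml` survives the filter because `*.lock` matches only names that end in `.lock`; lock files with other extensions need their own entries.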

Context compression – /compact :

Manual: invoke /compact at logical checkpoints (e.g., after completing a feature).

Command‑guided: pass instructions to /compact stating what to keep, e.g. /compact keep code changes and file paths, discard analysis steps.

Auto‑compression: turn on the Auto-compact option in /config; in one example, 25 k tokens of history shrink to 3 k (≈88% saved).

Documentation‑driven prompt – CLAUDE.md : place at project root to describe stack, directory layout, and commands, avoiding exploratory cat/find/grep calls. Example:

# Project Overview
Next.js 14 + TypeScript + Prisma + PostgreSQL SaaS

# Directory Structure
src/app/       # App Router
src/components/ # Components
src/lib/       # Utilities
src/server/    # Server code

# Development Commands
pnpm dev
pnpm build

Effect: reduces useless token usage by >30%.

Memory management – /memory :

Store:

/memory The project uses Next.js 14 + TypeScript; API conventions are in docs/api.md

View: /memory list

Delete: /memory delete [key]

Effect: avoids repeated pasting of configuration, saving >40% of repeated input.

Plan mode – Shift+Tab : AI first produces an execution plan; after confirmation it runs, preventing wasted exploration. Effect: saves >20% of useless tokens.

Output trimming:

Enable "trim tool output" to drop ANSI colors, progress bars, empty lines.

Long‑output truncation: keep only error stacks and failure cases. Example: npm test output reduced from 25 k to 2.5 k tokens (≈90% saved).
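The two trimming steps above can be approximated with a small filter applied to command output before it is pasted into a session. This is a sketch under stated assumptions, not the tool's built-in trimming: the function names and the failure markers (`FAIL`, `Error`) are hypothetical and should be adapted to the test runner in use.

```python
import re

# Matches ANSI CSI escape sequences such as colour codes ("\x1b[31m").
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def trim_tool_output(text: str) -> str:
    """Drop ANSI colours, intermediate progress-bar frames, and empty lines."""
    kept = []
    for line in text.split("\n"):
        # Progress bars redraw one line with "\r"; keep only the final frame.
        line = line.rstrip("\r").split("\r")[-1]
        line = ANSI_RE.sub("", line).rstrip()
        if line:
            kept.append(line)
    return "\n".join(kept)


def keep_failures(text: str, markers: tuple[str, ...] = ("FAIL", "Error")) -> str:
    """Keep only lines that contain one of the failure markers."""
    return "\n".join(line for line in text.split("\n")
                     if any(marker in line for marker in markers))


raw = ("\x1b[32mPASS\x1b[0m test_a\n"
       "\n"
       "10%\r50%\r100%\n"
       "\x1b[31mFAIL\x1b[0m test_b\n"
       "  AssertionError: expected 2, got 3\n")
trimmed = trim_tool_output(raw)
print(keep_failures(trimmed))
```

Running the cleanup first and the failure filter second keeps the markers easy to match, because colour codes no longer sit inside the lines being searched.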

Model switching – /model :

Simple tasks (syntax, small functions): /model haiku (lowest price).

Complex tasks (architecture, multi‑file): /model sonnet.

Very complex: /model opus (use only when necessary).

Effect: per‑task cost reduction of 30%–80%.
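The tiering rule above can be made explicit as a small decision function. The sketch below is illustrative only: the `Task` fields and the thresholds (more than one file, more than ten files) are assumptions introduced here, not values from the source or from Claude Code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    files_touched: int          # how many files the change is expected to edit
    needs_architecture: bool    # design or cross-module reasoning required


def pick_model(task: Task) -> str:
    """Return the tier name to pass to /model. Thresholds are assumptions."""
    if task.needs_architecture and task.files_touched > 10:
        return "opus"      # very complex: use only when necessary
    if task.needs_architecture or task.files_touched > 1:
        return "sonnet"    # complex: architecture or multi-file work
    return "haiku"         # simple: syntax fixes, small functions


print(pick_model(Task(files_touched=1, needs_architecture=False)))
```

Writing the rule down, even this crudely, makes the default the cheapest tier and forces a deliberate reason for moving up.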

Codex (GitHub Copilot – IDE‑centric)

IDE configuration – limit context files: VS Code → Settings → GitHub Copilot → Max File Context → set to 3–5 files. Effect: input reduced by >50%.

Command simplification: replace verbose natural‑language prompts with concise comment‑driven snippets. Bad prompt:

Help me write a backend user-login endpoint using Node.js + Express, including JWT verification, password hashing, and error handling

Good prompt:

// Node.js Express login endpoint, JWT, bcrypt

Effect: input reduced by >40%.

Disable unnecessary features: turn off auto‑completion, real‑time suggestions, and full‑project indexing when they are not needed, to cut the tokens consumed by continuous background scanning.

File‑by‑file development: develop one function per file, avoid large cross‑file logic, manually copy needed snippets instead of auto‑reading. Effect: context size reduced by >60%.

OpenCode (open‑source / self‑hosted, highly configurable)

Precise context limits – config.json (set input_limit according to the model's capability):

{
  "model": {
    "name": "deepseek-v3",
    "input_limit": 128000,
    "output_limit": 80000
  }
}

Effect: fully utilizes context, avoids automatic truncation and duplicate requests, saving >30%.
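A quick pre-flight check against the configured limit shows whether a request would exceed it. The sketch below only reads the fields from the example configuration; how OpenCode itself interprets them is not confirmed here, and `would_truncate` and the token estimates are hypothetical.

```python
import json

# Same fields as the example configuration, as strict JSON.
CONFIG_TEXT = """
{
  "model": {
    "name": "deepseek-v3",
    "input_limit": 128000,
    "output_limit": 80000
  }
}
"""


def would_truncate(config: dict, estimated_input_tokens: int) -> bool:
    """True if the estimated input exceeds the configured input limit."""
    return estimated_input_tokens > config["model"]["input_limit"]


config = json.loads(CONFIG_TEXT)
for estimate in (60_000, 150_000):
    status = "would be truncated" if would_truncate(config, estimate) else "fits"
    print(f"{estimate} tokens: {status}")
```

Strict JSON parsers such as `json.loads` reject `//` comments, so notes about a value belong outside the file unless the tool is known to accept comments.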

File filtering – .opencodeignore : same syntax as .claudeignore, excludes dependencies, builds, logs, resource files.

Context management – manual history clearing: periodically run /clear to reset context and prevent multi‑task history buildup; use separate sessions for different functionalities. Effect: saves >50% of useless context.
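A rough model explains why clearing history and splitting sessions helps. The sketch assumes that each request resends the entire prior conversation as input, that every turn adds the same number of tokens, and that no prompt caching or compaction is in effect; real sessions will differ.

```python
def cumulative_input(turn_sizes: list[int]) -> int:
    """Total input tokens sent over a session in which every request
    resends all earlier turns plus the new one."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the new turn joins the running history
        total += history  # the whole history is sent as input
    return total


# Hypothetical workload: 20 turns of 1,000 tokens each.
one_long = cumulative_input([1_000] * 20)
two_short = 2 * cumulative_input([1_000] * 10)
print(one_long, two_short)
```

Under these assumptions the cumulative input grows roughly with the square of the number of turns, so two 10-turn sessions send about half the input of one 20-turn session carrying the same work.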

Model choice – low‑cost open models:

Simple tasks: Qwen 7B, Llama 3 8B (local or cheap API).

Complex tasks: DeepSeek V3, Qwen Max (switch as needed).

Effect: per‑task price drops 70%–95%.

Compact comparison of optimisation dimensions

File filtering: .claudeignore / .opencodeignore vs IDE max‑file setting – saves 60%–80%.

Context compression: /compact (Claude) vs manual /clear (OpenCode) – saves 50%–88%.

Documentation‑driven prompt: CLAUDE.md vs custom README – saves 30%–50%.

Memory solidification: /memory vs global config – saves 40%–60%.

Plan mode: Shift+Tab vs manual task breakdown – saves 20%–40%.

Output trimming: tool‑output trimming and log truncation – saves 70%–90%.

Model switching: /model haiku/sonnet/opus vs manual plugin change – saves 30%–80%.

Context upper limit management: auto‑manage via /config (Claude) vs precise config.json (OpenCode) – saves >30%.

Practical 10‑step token‑saving checklist

Create .claudeignore or .opencodeignore at the project root and copy the template.

Add a CLAUDE.md describing the tech stack, directory layout, and commands.

Enable auto‑compression (Claude: /config Auto-compact).

For long dialogs, manually run /compact at logical breakpoints.

Store project conventions with /memory to avoid repeated input.

Use Plan Mode (Shift+Tab) for complex tasks – plan before execution.

Switch models per task: simple → /model haiku, complex → /model sonnet, very complex → /model opus.

Disable unnecessary auto‑features (real‑time suggestions, full‑project scans).

Separate development into distinct sessions or files to prevent history bloat.

Regularly check token usage (/usage) to locate new black holes.

Key reminders

Input is the core cost driver: prioritize trimming file reads, context size, and command length.

Prefer over‑exclusion to under‑exclusion: excluded files can be pasted manually, which is cheaper than automatic scanning.

Timely cleanup: long dialogs and multi‑task sessions require compression or clearing to avoid history inflation.

Model matching: select the appropriate tier for each task instead of defaulting to the highest‑end model.
