5 Essential Tools to Install Before Building an AI Agent
The article outlines five critical setup steps: privacy with direnv and a secret manager, token handling via litellm or portkey, context management using uv and git commits, visibility through mitmproxy, and rigorous evaluation with inspect-ai. Across 347 runs, the author reports that these steps cut token waste by 68.3%, reduced costs by 92.5%, and raised evaluation pass rates to 94.2%.
Before starting any new agentic AI project, the author recommends a five-step toolchain that tackles the 2026-era pain points of secret leakage, cost overruns, and debugging difficulty, backed by data from the 347 runs summarized above.
1. Privacy – direnv + real secret manager
Install direnv and connect it to a team password manager (e.g., 1Password CLI via op run, Doppler, Infisical, or Vault). direnv loads directory-specific environment variables when you cd into a project and unloads them when you leave, so credentials never need to sit in plain-text files. This prevents common leaks: API keys committed to git history, credentials propagating through shell history, shared .env files syncing via Dropbox, and keys left on a stolen laptop.
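On the consuming side, the agent code only needs to read the credential from the process environment and fail fast when it is absent. The following is a minimal sketch that assumes direnv or the secret manager CLI has already exported the value; the helper names require_secret and mask, and the variable name used in the usage note, are hypothetical and do not come from the original article.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected credential is absent from the environment."""


def require_secret(name: str) -> str:
    """Return a secret from the environment, failing fast if it is missing.

    Assumes direnv or a secret manager CLI injected the value; nothing is
    read from a plain-text .env file.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"{name} is not set; run the agent from the project directory "
            "so direnv can load it"
        )
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

A call such as require_secret("ANTHROPIC_API_KEY") then either returns the key or stops the program with a clear message, and mask keeps the full value out of log lines.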
2. Token – litellm or portkey as a model proxy
Route all calls to providers such as Anthropic, OpenAI, Google, Mistral, or local models through a proxy with a single unified URL (litellm or portkey). The proxy offers prompt-hash caching (cutting bills 30-60%), automatic fallback on rate limits (e.g., Sonnet → Opus → GPT → local backup), budget caps that stop a single call from spending $200, model-routing rules that send cheap tasks to Haiku and expensive ones to Opus, and PII redaction before each request. Together these sharply reduce unexpected spend.
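To make three of those proxy features concrete, here is a small pure-Python sketch of prompt-hash caching, a rate-limit fallback chain, and a spend cap. It is an illustration of the ideas only, not how litellm or portkey implement them. The class TinyProxy, the exceptions, and the provider callable are all hypothetical, and the cap here is a simplified cumulative one checked before each uncached call.

```python
import hashlib
import json
from typing import Callable

# Hypothetical provider call: takes (model, prompt), returns (text, cost in USD).
ProviderCall = Callable[[str, str], tuple[str, float]]


class RateLimited(Exception):
    """Stand-in for a provider's rate-limit error."""


class BudgetExceeded(Exception):
    """Raised when the spend cap has already been reached."""


class TinyProxy:
    def __init__(self, call: ProviderCall, fallbacks: list[str], budget_usd: float):
        self.call = call
        self.fallbacks = fallbacks      # models tried in order
        self.budget_usd = budget_usd    # cumulative spend cap
        self.spent_usd = 0.0
        self.cache: dict[str, str] = {}

    @staticmethod
    def prompt_hash(prompt: str) -> str:
        # Hash a canonical JSON encoding so identical prompts share a key.
        payload = json.dumps({"prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self.prompt_hash(prompt)
        if key in self.cache:
            return self.cache[key]      # cache hit: no provider call, no cost
        if self.spent_usd >= self.budget_usd:
            raise BudgetExceeded(f"spent {self.spent_usd:.2f} USD of {self.budget_usd:.2f}")
        for model in self.fallbacks:
            try:
                text, cost = self.call(model, prompt)
            except RateLimited:
                continue                # try the next model in the chain
            self.spent_usd += cost
            self.cache[key] = text
            return text
        raise RateLimited("every model in the fallback chain was rate limited")
```

Keying the cache on the prompt alone means a cached answer may come from any model in the chain; a stricter variant would include the model name in the hash.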
3. Context – uv + git commit per successful eval
Replace pip+venv with uv, a Python package manager that is 10-100× faster than pip. After each evaluation suite passes, commit the change with a message that records the model version and pass rate. Each such commit captures uv.lock (an exact dependency snapshot), the exact prompt and code state, the pairing of model version to pass rate, and a rollback point, which together provide a compliance trail and reproducible debugging evidence.
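One way to keep those commit messages uniform is to generate them. The sketch below is an assumption about what such a helper might look like: the message format, the function names, and the model name in the usage note are invented for illustration. It defaults to a dry run that only returns the git commands it would execute.

```python
import subprocess


def eval_commit_message(model_version: str, passed: int, total: int) -> str:
    """Build a commit message recording the model version and pass rate."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    rate = 100.0 * passed / total
    return f"eval pass: model={model_version} pass_rate={rate:.1f}% ({passed}/{total})"


def commit_eval(model_version: str, passed: int, total: int,
                dry_run: bool = True) -> list[list[str]]:
    """Stage uv.lock plus tracked changes and commit them.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be inspected first.
    """
    message = eval_commit_message(model_version, passed, total)
    commands = [
        ["git", "add", "uv.lock"],             # exact dependency snapshot
        ["git", "commit", "-a", "-m", message],  # -a stages tracked changes
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

For example, eval_commit_message("example-model-v1", 47, 50) produces "eval pass: model=example-model-v1 pass_rate=94.0% (47/50)", which makes it straightforward to search git log for the last commit at a given pass rate.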
4. Visibility – mitmproxy in front of every LLM call
Deploy mitmproxy as an intercepting proxy so every request and response is visible. It reveals silent retries, the full prompt (including any accidentally embedded credentials), the model's raw output before your code reacts, exact token cost per call, and hidden responses that may indicate prompt injection. Without this layer, developers tend to assume the agent behaved correctly without ever verifying it.
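mitmproxy can be extended with Python addons, so the logging described above can be sketched as a small script loaded with mitmdump -s. The class name LLMTrafficLogger and the helper extract_usage are hypothetical. The sketch assumes the provider returns a non-streaming JSON body containing a top-level "usage" object, which is common but not guaranteed; streaming responses would simply yield an empty usage record.

```python
import json


def extract_usage(body: bytes) -> dict[str, int]:
    """Pull integer token counts out of a JSON response body, if present."""
    try:
        data = json.loads(body)
    except ValueError:  # also covers undecodable bytes
        return {}
    usage = data.get("usage") if isinstance(data, dict) else None
    if not isinstance(usage, dict):
        return {}
    return {k: v for k, v in usage.items() if isinstance(v, int)}


class LLMTrafficLogger:
    """mitmproxy addon that records one line per completed HTTP exchange."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def response(self, flow) -> None:
        # mitmproxy calls this hook once the full response has been received.
        record = {
            "url": flow.request.pretty_url,
            "status": flow.response.status_code,
            "request_bytes": len(flow.request.content or b""),
            "usage": extract_usage(flow.response.content or b""),
        }
        self.records.append(record)
        print(json.dumps(record))


# mitmproxy discovers addons through this module-level list.
addons = [LLMTrafficLogger()]
```

Counting one record per exchange also surfaces silent retries: the same URL appearing several times for a single agent step is a sign that the client library retried behind the scenes.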
5. Evals – inspect‑ai framework
Adopt the open‑source inspect‑ai evaluation framework (used by Anthropic, DeepMind, and the UK AI Safety Institute). It runs the same task on five different models, compares scores side‑by‑side, includes high‑risk behavior tests (lying, tool misuse), provides a proper evaluation structure for tool‑using agents, ensures reproducible scoring with a fixed eval seed, and generates a numeric pass/fail signal. Record every odd behavior, boundary condition, or configuration change in a /lessons.md file for future reference.
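The two ideas that matter most here, a fixed seed for reproducibility and a numeric pass/fail signal compared across models, can be shown without the framework itself. The sketch below deliberately does not use inspect-ai's API; every function name, the model labels, and the threshold in the usage note are hypothetical and only illustrate the bookkeeping.

```python
import random


def pick_samples(dataset: list[str], k: int, seed: int) -> list[str]:
    """Choose the same k samples on every run by fixing the seed."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    return rng.sample(dataset, k)


def pass_rates(results: dict[str, list[bool]]) -> dict[str, float]:
    """Turn per-model lists of pass/fail outcomes into percentages."""
    return {
        model: round(100.0 * sum(outcomes) / len(outcomes), 1)
        for model, outcomes in results.items()
        if outcomes  # skip models with no recorded outcomes
    }


def gate(rates: dict[str, float], threshold: float) -> dict[str, bool]:
    """Reduce each model's pass rate to a single pass/fail signal."""
    return {model: rate >= threshold for model, rate in rates.items()}
```

With outcomes such as {"model-a": [True, True, False, True], "model-b": [False, False, True, True]}, pass_rates returns 75.0 and 50.0, and gate with a threshold of 60.0 passes only model-a.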
By wiring these five components together and maintaining the lessons file, a new agentic system can be operational in two days instead of two months, with measurable improvements in security, cost efficiency, and reliability.
High Availability Architecture
Official account for High Availability Architecture.
