Is RAG Dead? How Grep Is Making a Comeback in LLM‑Powered Code Search
This article investigates the claim that Retrieval‑Augmented Generation (RAG) is obsolete by dissecting Claude Code’s grep‑driven search architecture, benchmarking its performance against traditional vector‑based retrieval, comparing it with Cursor and OpenAI Codex, and analyzing the trade‑offs of multi‑round agentic search.
Over the past year, headlines such as "RAG is dead?" have proliferated, while new‑generation agents like Claude Code and Codex explicitly abandon embeddings, indexes, and vector databases, relying instead on LLM‑driven Grep . This article conducts a deep technical investigation of that claim, using the leaked Claude Code source as a primary artifact.
1. Grep in Claude Code
Boris Cherny, the creator of Claude Code, has repeatedly stated that early versions used RAG with a local vector DB but quickly switched to agentic search because it performed better. The official Anthropic blog confirms that Claude Code loads code into context via Grep and Glob rather than any embedding step.
“Early versions of Claude Code used RAG + a local vector db, but we found pretty quickly that agentic search generally works better.” – Boris Cherny (X/Twitter)
The source reveals no embedding‑related code; instead, the GrepTool wraps ripgrep ( rg), a Rust‑based, multi‑threaded, .gitignore‑aware search engine.
2. LLM‑Driven Multi‑Round Search Loop
Claude Code’s core mechanism is a loop: the user query and the list of available tools are sent to the LLM; the LLM either returns a textual answer or a tool‑call request. When a tool is called, its result is appended to the conversation history and the LLM is invoked again with the updated context. The loop ends when the LLM decides the information is sufficient, or when a hard limit (max rounds, budget, user interrupt, or permission error) is hit.
The loop treats all tools equally – the LLM may call any combination of GrepTool, GlobTool, FileReadTool, or AgentTool in a single turn.
2.1 Core Tools
GrepTool : invokes rg to perform regular‑expression search.
GlobTool : matches file paths using glob patterns.
FileReadTool : reads specific line ranges from a file via Node.js fs.
AgentTool : spawns a child agent (type Explore) that runs its own search loop with an isolated context, returning only a summary.
2.2 GrepTool Output Modes
files_with_matches (default): returns only matching file paths. Example pattern "class.*Transport" yields cli/transports/WebSocketTransport.ts and cli/transports/SSETransport.ts. Because only filenames are returned, a subsequent Read is usually required.
content : returns the matching line plus -C 5 lines of surrounding context. This is sufficient for tasks like checking a constant value or function signature without a full file read.
count : returns the number of matches per file, useful for quickly estimating keyword density.
All modes respect a head_limit of 250 results to prevent context overflow.
3. Performance of Ripgrep‑Based Search
Although each Grep round scans the entire project, ripgrep’s five‑layer filtering makes it fast enough for typical developer projects:
Layer 1: .gitignore pruning – skips entire directory trees.
Layer 2: path restriction – limits the traversal root.
Layer 3: glob file‑type filter.
Layer 4: Binary detection – skips non‑text files.
Layer 5: Actual regex matching.
In a leaked Claude Code snapshot (4,471 files), a search limited to bridge/ reduced the candidate set from 4,471 to 32 files before regex matching.
Benchmarks on the Claude Code codebase (≈4,500 files, 950 k lines) show: TOOL_VERBS (low‑frequency term): ripgrep 0.09 s vs GNU grep 2.55 s → 28× faster. async.*generator (regex): 0.10 s vs 3.30 s → 33× faster. import.*from (high‑frequency term): 0.10 s vs 2.45 s → 25× faster.
Ripgrep achieves this through SIMD‑vectorized matching, Boyer‑Moore skipping, OS page‑cache reuse, memory‑mapped I/O, and multi‑threaded file processing.
4. Industry Comparison and Design Philosophy
4.1 Cursor’s Dual‑Index Architecture
Cursor combines a semantic embedding index (generated by tree‑sitter + Merkle‑tree sync, stored in Turbopuffer) with a trigram inverted index called Instant Grep . The trigram index accelerates exact‑match searches, while the embedding index enables semantic retrieval.
4.2 Codex’s Shell‑Based Search
OpenAI’s Codex CLI lacks dedicated search tools; it relies on a generic shell tool that can execute rg, find, cat, etc. All search logic is expressed as shell commands, giving maximum flexibility but requiring the LLM to parse unstructured command output.
4.3 Trade‑offs
Zero‑index (Claude Code, Codex) : No pre‑processing, zero startup cost, no index staleness, but each round adds to the LLM context.
Pre‑indexed (Cursor) : Higher upfront cost, better scalability for large codebases, but requires maintenance and incurs storage overhead.
Both Claude Code and Codex converge on the same conclusion: for local developer projects (tens to a few hundred MB), a pure Grep approach is faster, cheaper, and simpler.
5. Cost Management in Claude Code
Three mechanisms mitigate context‑token explosion:
Prompt cache : Identical prefixes across rounds are cached, charging only ~10 % of the full price for repeated tokens.
Auto‑compaction : When the conversation approaches the context window limit, Claude Code summarizes older turns and replaces them with a concise summary.
Sub‑agent isolation : The Explore child agent processes raw Grep results in its own context; only the final summary is returned to the main dialogue.
6. When Grep Fails and When Embedding Helps
Milvus engineers criticize Grep‑only retrieval for token bloat and lack of semantic understanding. In code‑search tasks, however, identifiers are precise anchors, and studies such as GrepRAG: An Empirical Study and Optimization of Grep‑Like Retrieval for Code Completion (ISSTA ’26) show that a single‑round Grep outperforms embedding‑based RAG on benchmarks like CrossCodeEval and RepoEval_Updated.
For natural‑language QA, Grep alone struggles because queries and answers often use different vocabularies. Query expansion via an LLM (e.g., rewriting a question into a set of keywords) can improve recall, but embedding‑based semantic search still outperforms pure Grep in those scenarios.
7. Conclusions
RAG as a paradigm (retrieval → context → generation) is alive; what is dying is the assumption that code search must rely on pre‑built embedding indexes. Claude Code and Codex demonstrate that LLM‑driven Grep, combined with smart multi‑round loops and context‑management tricks, provides a fast, zero‑maintenance alternative for typical developer codebases. For larger repositories or semantic‑rich natural‑language queries, hybrid or indexed solutions remain necessary.
Overall, the choice between zero‑index Grep and pre‑indexed RAG depends on data size, latency tolerance, and token‑budget constraints.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
