How Karpathy’s LLM‑Wiki Turns LLMs into a Self‑Growing Personal Knowledge Base
The article critiques traditional RAG‑based knowledge bases for lacking persistence, then details Karpathy’s LLM‑wiki approach that incrementally builds a structured, cross‑linked Markdown wiki through three layers, three core operations, and lightweight indexing, enabling continuous, low‑cost knowledge accumulation.
Problems with Traditional RAG
Current mainstream knowledge bases such as ChatGPT file upload and NotebookLM rely on Retrieval‑Augmented Generation (RAG): upload documents, ask questions, and the LLM retrieves relevant fragments to generate answers. Karpathy argues this approach has a fatal flaw—no accumulation. Each query forces the LLM to re‑search and re‑assemble knowledge from raw documents, leaving no lasting record.
Karpathy’s LLM‑Wiki Idea
The core idea is to treat the LLM not as a search engine but as a programmer maintaining a Markdown wiki—a structured, inter‑linked collection of files that persists and compounds over time. When a new source is added, the LLM reads it, extracts key points, and integrates them into the existing wiki by updating entity pages, revising topic summaries, and flagging contradictions.
Three‑Layer Architecture
Raw Sources – Immutable collection of original papers, articles, images, and data files. The LLM only reads this layer.
The Wiki – LLM‑generated Markdown directory containing summaries, entity pages, concept pages, comparative analyses, and overviews. The LLM owns and writes this layer.
The Schema – Rule files that define the wiki’s organization, conventions, and workflows (e.g., CLAUDE.md for Claude Code, AGENTS.md for Codex). This configuration guides the LLM’s disciplined maintenance.
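As a rough illustration of the three layers, the sketch below scaffolds a directory for them. The directory names raw/ and wiki/ and the contents of the schema stub are assumptions made for this example; only raw/assets/ and the CLAUDE.md file name appear in the article, and Karpathy's own layout may differ.

```python
from pathlib import Path

# Hypothetical layout: "raw" and "wiki" are illustrative names, not a
# prescribed structure. raw/assets/ and CLAUDE.md are named in the article.
LAYOUT_DIRS = ["raw", "raw/assets", "wiki"]

# Illustrative schema stub; a real schema would be written and refined by hand.
SCHEMA_STUB = """# Wiki schema (illustrative stub)
- raw/ is read-only: never edit or delete files there.
- wiki/ is owned by the LLM: one Markdown page per source, entity, or concept.
- Flag contradictions between sources instead of silently overwriting claims.
"""


def scaffold(root: Path, schema_name: str = "CLAUDE.md") -> list[Path]:
    """Ensure the three layers exist under root and return their paths."""
    paths = []
    for rel in LAYOUT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    schema = root / schema_name
    if not schema.exists():  # never overwrite a schema the user has edited
        schema.write_text(SCHEMA_STUB, encoding="utf-8")
    paths.append(schema)
    return paths
```

Calling scaffold(Path("my-wiki")) twice is safe: existing directories are left alone and an edited schema file is not replaced.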
Three Core Operations
Ingest – Add a new raw file, let the LLM read it, discuss key takeaways, write a summary page, update the index, and modify related entity and concept pages. One source may affect 10‑15 wiki pages. Karpathy prefers a step‑by‑step ingest with human guidance.
Query – Pose questions to the wiki. The LLM searches relevant pages, synthesizes a cited answer, and can output in various formats such as Markdown pages, comparison tables, Marp slides, or matplotlib charts. High‑quality answers can be stored back as new wiki pages, creating a compounding knowledge asset.
Lint – Periodic health checks in which the LLM scans the wiki for contradictions, outdated statements, orphan pages, and missing cross‑references, and suggests new research directions, keeping the wiki coherent as it grows.
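Part of the lint operation is mechanical and can be sketched without an LLM. The example below finds orphan pages and links to missing pages, assuming pages link to each other with Obsidian-style [[Page Name]] links; the function name lint_links and the report format are hypothetical. Detecting contradictions or outdated statements still needs the LLM and is not covered here.

```python
import re
from pathlib import Path

# Assumption: Obsidian-style links, optionally with a heading ([[Page#Section]])
# or an alias ([[Page|alias]]). Only the page name is captured.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def lint_links(
    wiki_dir: Path, ignore: frozenset[str] = frozenset({"index", "log"})
) -> dict[str, list[str]]:
    """Report pages with no inbound links and links to pages that do not exist."""
    pages = {p.stem: p for p in wiki_dir.rglob("*.md")}
    inbound = {name: 0 for name in pages}
    broken = []
    for name, path in sorted(pages.items()):
        if name in ignore:
            # A catalog page links to everything, so counting its links
            # would hide every orphan.
            continue
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target not in pages:
                broken.append(f"{name} -> {target}")
            elif target != name:  # self-links do not count as inbound
                inbound[target] += 1
    orphans = sorted(n for n, c in inbound.items() if c == 0 and n not in ignore)
    return {"orphans": orphans, "broken": broken}
```

A report like this could be handed to the LLM as a starting point for the lint pass rather than replacing it.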
Index and Log Files
index.md is a content‑oriented directory listing every wiki page with a link, a one‑sentence summary, and optional metadata (date, source count). It is updated on each ingest and enables efficient navigation without a vector‑based RAG infrastructure, performing well for ~100 sources and hundreds of pages.
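To make the shape of index.md concrete, here is a minimal sketch that builds one from the pages in a wiki directory. In the workflow described above the LLM itself maintains the index; this script only illustrates the format. The entry layout (a [[link]] followed by a colon and a summary) and the rule that the summary is the first sentence of the page body are assumptions for the example.

```python
from pathlib import Path


def first_sentence(text: str) -> str:
    """Return the first sentence of the first non-heading, non-blank body line."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":  # skip YAML front-matter if present
        try:
            lines = lines[lines.index("---", 1) + 1:]
        except ValueError:
            pass  # unterminated front-matter: fall through and scan everything
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            head, sep, _ = line.partition(". ")
            return head + "." if sep else line
    return "(no summary yet)"


def build_index(wiki_dir: Path, skip: tuple[str, ...] = ("index", "log")) -> str:
    """Return index text: one line per page with a link and a short summary."""
    lines = ["# Index", ""]
    for page in sorted(wiki_dir.glob("*.md")):
        if page.stem in skip:
            continue  # do not list the index or the log inside the index
        summary = first_sentence(page.read_text(encoding="utf-8"))
        lines.append(f"- [[{page.stem}]]: {summary}")
    return "\n".join(lines) + "\n"
```

Because the result is a single flat file, the LLM can read it in full before deciding which pages to open, which is what lets the approach work without vector search at this scale.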
log.md is a time‑oriented append‑only record of all operations (ingest, query, lint). A typical entry looks like ## [2026-04-02] ingest | Article Title. Simple Unix tools can extract recent activity, e.g., grep "^## \[" log.md | tail -5, providing a chronological view of wiki evolution.
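The log format above is simple enough to write and read programmatically. The following sketch appends entries in that format and returns the most recent ones, roughly mirroring the grep and tail pipeline. The function names append_entry and recent are hypothetical, and the regular expression is stricter than the grep pattern because it also expects the operation and title fields.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Matches headers such as: ## [2026-04-02] ingest | Article Title
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def append_entry(log: Path, op: str, title: str, day: Optional[date] = None) -> str:
    """Append one header line to the log (append-only) and return it."""
    header = f"## [{(day or date.today()).isoformat()}] {op} | {title}"
    with log.open("a", encoding="utf-8") as fh:
        fh.write(header + "\n")
    return header


def recent(log: Path, n: int = 5) -> list[tuple[str, str, str]]:
    """Return the last n entries as (date, operation, title) tuples."""
    entries = []
    for line in log.read_text(encoding="utf-8").splitlines():
        match = ENTRY.match(line)
        if match:  # free-form notes under an entry are ignored
            entries.append(match.groups())
    return entries[-n:] if n > 0 else []
```

For example, append_entry(Path("log.md"), "ingest", "Article Title") writes a header dated today, and recent(Path("log.md")) returns up to five of the latest entries in chronological order.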
Practical Construction
Install Claude Code and Obsidian, create a new directory, and let Claude Code automatically compile raw files into the wiki. While the LLM updates the wiki in real time, the user browses results in Obsidian, using the graph view to follow links and see updates.
Karpathy offers an analogy for the setup: Obsidian is the IDE, the LLM is the programmer, and the wiki is the codebase.
Why This Works
The most labor‑intensive part of maintaining a knowledge base is not reading or thinking but the ongoing upkeep: updating cross‑references, keeping summaries current, and reconciling new data with old conclusions. Human‑maintained wikis often become too costly to sustain. An LLM does not tire of this bookkeeping and can update the 10‑15 pages that a single ingest may touch in one pass, driving maintenance cost toward zero.
Human effort is limited to source selection, guiding analysis, asking good questions, and interpreting results. The LLM performs the rest.
The concept echoes Vannevar Bush’s 1945 Memex vision of a personal, linked information system, with the LLM solving the historic maintenance problem.
Optional Tools
qmd – a local Markdown search engine supporting BM25/vector hybrid search and LLM re‑ranking, useful when the wiki outgrows simple file indexing.
Obsidian Web Clipper – converts web articles to Markdown for quick ingestion.
Local image download settings – configure Obsidian to store attachments in a fixed folder (e.g., raw/assets/) and bind a shortcut for bulk downloading.
Obsidian’s graph view – visualizes the overall wiki structure.
Marp – Markdown‑based slide format for turning wiki content into presentations.
Dataview plugin – queries YAML front‑matter added by the LLM to generate dynamic tables and lists.
References
Original article URL: https://mp.weixin.qq.com/s/ueCIydLLACyqGP5SrAhpjQ
Gist with example prompts: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f