Build Your Own Personal LLM Wiki (No RAG) – A Step‑by‑Step Guide
This article walks through how to assemble a personal LLM‑driven wiki using Claude Code, Obsidian, and the open‑source qmd search engine, detailing the directory layout, CLAUDE.md workflow spec, ingestion/query/health‑check cycles, and how the approach can be extended for enterprise knowledge bases.
Karpathy’s LLM‑Wiki concept
"I recently discovered a useful pattern: use an LLM to build a personal knowledge base for each research direction. I put the raw material in a raw/ folder and let the LLM gradually ‘compile’ it into a wiki (a collection of .md files). The wiki contains summaries, backlinks, and categorises concepts as separate pages that reference each other. Obsidian is my front‑end IDE. Crucially, all content in the wiki is written and maintained by the LLM; I hardly touch it directly."
When the knowledge base grows (Karpathy’s example: ~100 articles, 400 k words), it can support complex Q&A.
Karpathy built a tiny search engine over the wiki, which the LLM can call as a tool.
Periodic “health checks” let the LLM spot contradictions, fill gaps and discover new connections.
Future direction mentioned: synthetic data + fine‑tuning to embed knowledge into model weights.
Implementation
Directory layout
llm-wiki/
├── raw/ ← original material (read‑only)
├── wiki/ ── symlink ──→ Obsidian Vault/llm-wiki/
└── CLAUDE.md ← workflow specification
Key design: wiki/ is a symlink into an Obsidian vault, so the wiki benefits from graph view, bidirectional links and full‑text search while staying inside the project.
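The layout above can be scaffolded with a few lines of Python. This is a minimal sketch, not the author's setup script: the function name `scaffold`, the vault path and the placeholder CLAUDE.md content are assumptions chosen for illustration.

```python
import os
import tempfile
from pathlib import Path


def scaffold(project: Path, vault_wiki: Path) -> None:
    """Create raw/, CLAUDE.md and a wiki/ symlink that points into the vault."""
    (project / "raw").mkdir(parents=True, exist_ok=True)
    vault_wiki.mkdir(parents=True, exist_ok=True)

    link = project / "wiki"
    if not link.exists() and not link.is_symlink():
        # wiki/ is only a pointer; the real files live inside the vault.
        os.symlink(vault_wiki, link, target_is_directory=True)

    spec = project / "CLAUDE.md"
    if not spec.exists():
        # Placeholder content (assumption); the real spec is written separately.
        spec.write_text("# Workflow specification\n", encoding="utf-8")


if __name__ == "__main__":
    # Demo in a throwaway directory; substitute real paths for actual use.
    root = Path(tempfile.mkdtemp())
    scaffold(root / "llm-wiki", root / "Obsidian Vault" / "llm-wiki")
    print(sorted(p.name for p in (root / "llm-wiki").iterdir()))
```

The function is safe to run twice because every step checks for existing files first. On Windows, creating symlinks may require developer mode or elevated privileges.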
Wiki internal structure
wiki/
├── index.md ← one line per page: link + summary
├── log.md ← operation log
├── concepts/ ← one .md per core concept
├── entities/ ← pages for people, projects, tools, companies
├── sources/ ← summary page for each raw source
└── outputs/ ← query results, analyses, comparison tables
Specification file CLAUDE.md
Defines three operations for Claude:
Ingest : When a file appears in raw/, Claude discusses the key points, writes a summary in sources/, updates index.md, creates/updates related concepts/ and entities/ pages, and appends an entry to log.md.
Query : Claude reads index.md to locate relevant pages, synthesises an answer, cites the exact pages, and stores valuable answers in outputs/.
Lint (health check) : Periodically scans for contradictory information, orphan pages, missing concept pages, and outdated content.
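The article describes these three operations but does not reproduce the CLAUDE.md file itself. The sketch below writes one possible version of it; the wording of `SPEC` and the helper name `write_spec` are assumptions for illustration, not the author's actual file.

```python
from pathlib import Path

# Hypothetical wording: only the three operations come from the article;
# the exact phrasing and headings below are illustrative assumptions.
SPEC = """\
# LLM wiki workflow

raw/ is read-only source material. wiki/ is written and maintained by the agent.

## Ingest
When asked to process a file in raw/:
1. Discuss the key points with the user.
2. Write a summary page in wiki/sources/.
3. Create or update related pages in wiki/concepts/ and wiki/entities/.
4. Update wiki/index.md (one line per page: link + summary).
5. Append an entry to wiki/log.md.

## Query
1. Read wiki/index.md to locate relevant pages.
2. Synthesise an answer and cite the exact pages used.
3. Store valuable answers in wiki/outputs/.

## Lint
Scan for contradictory information, orphan pages, missing concept pages
and outdated content, then report the findings.
"""


def write_spec(project: Path) -> Path:
    """Write the spec to <project>/CLAUDE.md and return its path."""
    path = project / "CLAUDE.md"
    path.write_text(SPEC, encoding="utf-8")
    return path
```

Keeping the spec as numbered steps makes each operation easy to audit against what the agent actually did in log.md.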
Search engine (qmd)
Open‑source qmd (by Tobias Lütke) provides three local search modes:
BM25 keyword search – qmd search "keyword" (millisecond latency, no model required).
Vector semantic search – qmd vsearch "description" (requires an embedding model).
Hybrid + re‑ranking – qmd query "question" (BM25 + vector + LLM re‑ranking, best quality).
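For scripting, the three modes can be wrapped in a thin Python helper. This is a sketch under assumptions: it uses only the three subcommands listed above, passes no extra flags, and assumes qmd is installed, on PATH, and has already indexed the wiki. The names `MODES`, `qmd_command` and `run_qmd` are invented for this example.

```python
import subprocess

# Mode name (our own label) -> qmd subcommand, as listed in the article.
MODES = {"keyword": "search", "semantic": "vsearch", "hybrid": "query"}


def qmd_command(text: str, mode: str = "hybrid") -> list[str]:
    """Build the argv for one of the three qmd search modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["qmd", MODES[mode], text]


def run_qmd(text: str, mode: str = "hybrid") -> str:
    """Run qmd and return its stdout; raises if qmd exits with an error."""
    result = subprocess.run(
        qmd_command(text, mode), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the query as a list element rather than a shell string avoids quoting problems when a question contains quotes or spaces.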
Toolchain
Claude Code – LLM agent that reads files, writes wiki pages and answers queries.
Obsidian – Front‑end IDE that displays the wiki, provides graph view and bidirectional links.
qmd – Fully offline local search engine (BM25 + vector + LLM re‑ranking).
CLAUDE.md – Text file that encodes the ingest / query / lint workflow for the agent.
Workflow examples
Ingest new knowledge
Place an article, paper or PDF into raw/ and issue the command:
"Process raw/this‑paper.pdf"
Claude discusses the key points, creates a summary page in sources/, updates index.md, creates or updates related concept and entity pages, and records the operation in log.md. A single source can trigger updates to 10‑15 wiki pages.
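In the article Claude performs the bookkeeping itself. The sketch below only illustrates what the index.md and log.md updates amount to on disk. The line formats (`- [[page]]: summary` for the index, a dated line for the log) and the function name `record_ingest` are assumptions, since the article does not specify them.

```python
from datetime import date
from pathlib import Path


def record_ingest(wiki: Path, source_name: str, page: str, summary: str) -> None:
    """Add one index line for a new page and append one line to the log."""
    index = wiki / "index.md"
    log = wiki / "log.md"

    existing = index.read_text(encoding="utf-8") if index.exists() else ""
    if f"[[{page}]]" not in existing:
        # Keep the index at one line per page; skip pages already listed.
        with index.open("a", encoding="utf-8") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(f"- [[{page}]]: {summary}\n")

    # The log is append-only: every ingest leaves a trace, even a repeated one.
    with log.open("a", encoding="utf-8") as fh:
        fh.write(f"- {date.today().isoformat()} ingest raw/{source_name} -> {page}\n")
```

A call such as `record_ingest(Path("wiki"), "this-paper.pdf", "sources/this-paper", "Summary of the paper")` would add one index line and one log line.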
Cross‑note query
Ask Claude a question, e.g.
"What conclusions have I drawn about RAG and vector databases?"
"What are my core views on agent architectures?"
"Which AI tools have I used and what are their pros and cons?"
Claude reads index.md, pulls the relevant pages, synthesises an answer, cites the exact pages, and stores high‑value answers in outputs/.
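The first step, locating pages through index.md, is done by the LLM in the article. As a rough stand-in, the sketch below ranks index lines by naive keyword overlap with the question. It assumes index lines of the form `- [[page]]: summary`; that format, the `INDEX_LINE` pattern and the `find_pages` name are all hypothetical.

```python
import re

# Assumed index line format: "- [[sources/some-page]]: one-line summary"
INDEX_LINE = re.compile(r"^- \[\[(?P<page>[^\]]+)\]\]:\s*(?P<summary>.*)$")


def find_pages(index_text: str, question: str) -> list[str]:
    """Return pages whose index line shares words with the question, best first."""
    # Ignore very short words such as "a", "is", "my".
    words = {w for w in re.findall(r"[a-z0-9]+", question.lower()) if len(w) > 2}
    scored = []
    for line in index_text.splitlines():
        m = INDEX_LINE.match(line.strip())
        if not m:
            continue
        text = (m["page"] + " " + m["summary"]).lower()
        score = len(words & set(re.findall(r"[a-z0-9]+", text)))
        if score:
            scored.append((score, m["page"]))
    scored.sort(key=lambda item: -item[0])  # stable sort: ties keep index order
    return [page for _, page in scored]
```

Keyword overlap misses synonyms and paraphrases, which is exactly the gap that the LLM reading the index, or qmd's semantic modes, is meant to close.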
Health check
Run the lint operation:
"lint"
Claude checks for contradictions, isolated pages, missing concept pages and stale information.
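Two of these checks, isolated pages and missing pages, are purely structural and can be approximated without an LLM by following Obsidian-style `[[wikilinks]]`. The sketch below is one way to do that; the function names are invented here, and pages are matched by file name only, so two pages with the same name in different folders would collide. Contradictions and stale information still need the LLM.

```python
import re
from pathlib import Path

# Captures the target part of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def _name(target: str) -> str:
    """Reduce a link target such as 'concepts/rag.md' to the bare page name 'rag'."""
    name = target.strip().rsplit("/", 1)[-1]
    return name[:-3] if name.endswith(".md") else name


def lint_links(wiki: Path) -> dict[str, set[str]]:
    """Report pages that no other page links to and link targets without a page."""
    pages = {p.stem: p for p in wiki.rglob("*.md")}
    bookkeeping = {"index", "log"}
    linked: set[str] = set()
    for name, path in pages.items():
        if name in bookkeeping:
            continue  # index.md links to every page, so it must not count
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            if _name(target) != name:  # a self-link does not rescue an orphan
                linked.add(_name(target))
    return {
        "orphans": set(pages) - linked - bookkeeping,
        "missing": linked - set(pages),
    }
```

The result could be handed to Claude as a starting list, so the lint pass spends its effort on judgment calls rather than on link counting.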
Comparison with Karpathy’s original idea
Search : qmd offers a complete hybrid search (BM25 + vector + LLM re‑ranking) out‑of‑the‑box, which is stronger than Karpathy’s simple home‑grown engine.
Workflow codification : CLAUDE.md turns the ad‑hoc “talk to the LLM” process into a repeatable specification.
Obsidian integration : Symlinking wiki/ into an Obsidian vault makes bidirectional links and graph view native.
Enterprise adaptation
Adjusted architecture
Enterprise LLM Wiki
├── raw/ ← meeting notes, design docs, technical proposals, external material
├── wiki/
│ ├── concepts/ ← internal frameworks, business terminology
│ ├── entities/ ← systems, services, teams, people
│ ├── projects/ ← project pages (new)
│ ├── decisions/ ← ADR records (new)
│ └── runbooks/ ← operational manuals (new)
├── CLAUDE.md ← team collaboration spec
└── .git/ ← version control (new)
Enterprise value
Onboarding : New hires can ask “What is our messaging architecture?” and receive an answer directly from the wiki.
Decision provenance : Architecture reviews are ingested; future decisions can reference the historical rationale.
Cross‑team knowledge sharing : Results from one team become instantly visible to another, avoiding duplicated effort.
Automatic documentation : New RFCs, post‑mortems or technical docs automatically update related concept and entity pages.
Enterprise challenges
Access control : Hierarchical permission layers are required to restrict page visibility.
Collaboration : Multiple contributors need conflict resolution and versioning; Git provides a natural solution.
Content review : LLM‑generated pages must be manually verified before publishing.
Privacy & security : Sensitive corporate knowledge must stay on‑premise; pairing a locally deployed LLM (in place of a hosted model) with qmd, which runs fully offline, satisfies this requirement.
Conclusion
Karpathy’s vision treats the LLM as a knowledge compiler, maintainer and analyst. By wiring together Claude Code, Obsidian, qmd and a concise CLAUDE.md spec, you can reproduce the paradigm in an afternoon. The core invariant is:
Raw material lives forever in raw/ ; compiled knowledge lives forever in wiki/ ; the LLM is the sole editor.
