Turning Technical Books into Claude Code Skills: Unlocking Internal Documentation as Reusable Skills
The article introduces the open‑source "book-to-skill" tool that compiles PDFs or EPUBs into Claude Code skills, explains its on‑demand loading architecture, compares it with raw PDF retrieval and RAG, and provides detailed implementation steps, performance numbers, and practical usage guidelines.
Tool Overview
Give the tool a PDF or EPUB and it compiles the book into a skill under ~/.claude/skills/<slug>/. It generates structured files: SKILL.md (core framework + chapter index) and per‑chapter markdown files, plus a glossary, a patterns list, and a cheatsheet.
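The Python sketch below writes a skill directory with this shape. The file names (chapters/ch05.md, glossary.md, patterns.md, cheatsheet.md) and the helper write_skill_layout are hypothetical, and the tool's actual naming may differ. The YAML frontmatter with name and description follows the standard Claude Code skill format.

```python
import tempfile
from pathlib import Path


def write_skill_layout(root: Path, slug: str, skill_md: str,
                       chapters: dict[str, str], extras: dict[str, str]) -> Path:
    """Write SKILL.md, one file per chapter summary, and extra reference files.

    Hypothetical layout: <root>/<slug>/SKILL.md, chapters/<id>.md, <extra>.md.
    """
    skill_dir = root / slug
    (skill_dir / "chapters").mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(skill_md, encoding="utf-8")
    for chapter_id, summary in chapters.items():
        (skill_dir / "chapters" / f"{chapter_id}.md").write_text(summary, encoding="utf-8")
    for name, body in extras.items():  # glossary, patterns, cheatsheet
        (skill_dir / f"{name}.md").write_text(body, encoding="utf-8")
    return skill_dir


if __name__ == "__main__":
    skill_md = (
        "---\n"
        "name: designing-data-intensive-apps\n"
        "description: Frameworks and trade-offs from DDIA; read chapter files on demand.\n"
        "---\n"
        "# Core framework\n...\n# Chapter index\n- ch05: replication\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        out = write_skill_layout(
            Path(tmp), "designing-data-intensive-apps", skill_md,
            chapters={"ch05": "# Replication\n..."},
            extras={"glossary": "...", "patterns": "...", "cheatsheet": "..."},
        )
        # List every generated markdown file relative to the skill directory.
        for path in sorted(p.relative_to(out) for p in out.rglob("*.md")):
            print(path)
```

Only SKILL.md needs to be small and self-describing. The other files are read only when a question calls for them.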
On‑Demand Loading
A 400‑page technical book is roughly 200 K tokens. Loading the whole text each conversation is costly. The skill loads only the ~4 K‑token SKILL.md by default; a chapter file (~1 K tokens) is read only when a specific topic is queried, dramatically reducing token consumption.
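The following small Python sketch works through the arithmetic behind that claim. It uses the figures above: 200 K tokens for the full book, about 4 K for SKILL.md, and about 1 K per chapter file. Real counts vary by book and tokenizer.

```python
# Per-conversation context cost, using the article's approximate figures.
FULL_BOOK_TOKENS = 200_000  # ~400-page technical book loaded whole
SKILL_MD_TOKENS = 4_000     # core framework + chapter index, always loaded
CHAPTER_TOKENS = 1_000      # one distilled chapter file, loaded on demand


def tokens_loaded(chapters_opened: int, on_demand: bool) -> int:
    """Tokens placed in context for one conversation."""
    if not on_demand:
        return FULL_BOOK_TOKENS
    return SKILL_MD_TOKENS + chapters_opened * CHAPTER_TOKENS


if __name__ == "__main__":
    for n in (0, 1, 3):
        full = tokens_loaded(n, on_demand=False)
        lazy = tokens_loaded(n, on_demand=True)
        print(f"{n} chapter(s): {lazy:,} vs {full:,} tokens")
```

With one chapter opened, the on-demand path puts 5,000 tokens in context instead of 200,000. That is 40 times fewer.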
Difference from Direct PDF Ingestion
Direct PDF ingestion performs keyword retrieval and returns page ranges (e.g., “pages 117‑135 mention replication”). The skill extracts the author’s framework, naming conventions, and mental models, so a query about data replication returns a concise summary of the three replication models and their applicable scenarios.
Not a Retrieval‑Augmented Generation (RAG) System
RAG searches across many documents at query time. The skill instead compiles a single book once, capturing its structure up front, which makes it ideal for deep, frequent reference to one source.
Technical Implementation
The project consists of two files:
SKILL.md – skill definition, ≈530 lines.
scripts/extract.py – text extraction script, ≈830 lines.
Supported source formats: PDF, EPUB, DOCX, TXT, Markdown, reStructuredText, AsciiDoc, HTML, RTF, MOBI/AZW/AZW3.
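A format list like this is typically handled by dispatching on the file extension. The sketch below illustrates the idea. The extension-to-family mapping and the pick_extractor helper are hypothetical and not taken from extract.py. Treating Markdown, reStructuredText and AsciiDoc as plain text is also an assumption.

```python
from pathlib import Path

# Hypothetical mapping from file extension to an extractor family.
EXTRACTORS = {
    ".pdf": "pdf",
    ".epub": "epub",
    ".docx": "docx",
    ".txt": "plain", ".md": "plain", ".rst": "plain", ".adoc": "plain",
    ".html": "html", ".htm": "html",
    ".rtf": "rtf",
    ".mobi": "kindle", ".azw": "kindle", ".azw3": "kindle",
}


def pick_extractor(path: str) -> str:
    """Return the extractor family for a file, based on its (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix or '(no extension)'}") from None


if __name__ == "__main__":
    print(pick_extractor("~/books/clean-code.epub"))
```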
PDF extraction has two modes:
text mode: a pdftotext → PyPDF2 → pdfminer fallback chain; finishes in seconds and suits plain‑text books.
technical mode: uses Docling to preserve tables and code blocks; on a 103‑page book it takes about 164 s and retains 48 tables and 36 code blocks.
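The text-mode chain can be read as a fallback. Try the fastest extractor first, and move to the next one if it is missing, fails, or returns nothing. The minimal Python sketch below assumes that behaviour. The function names are hypothetical, and extract.py may order or filter things differently. The library calls themselves are public APIs: pdftotext writing to stdout when the output file is "-", PyPDF2's PdfReader, and pdfminer.six's extract_text.

```python
import shutil
import subprocess
from typing import Callable, Optional

Extractor = Callable[[str], str]


def via_pdftotext(path: str) -> str:
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # An output file of "-" makes pdftotext write the text to stdout.
    result = subprocess.run(["pdftotext", "-layout", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def via_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader  # lazy import: a missing package just skips this step
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)


def via_pdfminer(path: str) -> str:
    from pdfminer.high_level import extract_text  # provided by pdfminer.six
    return extract_text(path)


def extract_text_mode(path: str, chain: Optional[list[Extractor]] = None) -> str:
    """Return the output of the first extractor that succeeds with non-empty text."""
    if chain is None:
        chain = [via_pdftotext, via_pypdf2, via_pdfminer]
    errors = []
    for extractor in chain:
        try:
            text = extractor(path)
        except Exception as exc:  # missing tool or package, or a parse failure
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text.strip():
            return text
        errors.append(f"{extractor.__name__}: empty output")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

A call such as extract_text_mode("book.pdf") returns the first usable text. An empty result from every stage, as with a scanned PDF that has no text layer, surfaces as a single error listing why each extractor was skipped.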
Benchmark on a 103‑page technical book:
Extractor | Time | Tokens | Tables | Code blocks
pdftotext | 0.1 s | 27 K | 0 | 0
Docling | 164 s | 27 K (+1.2 %) | 48 | 36
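To get similar counts on your own books, one option is to convert with Docling, export Markdown, and count structures in the result. In the sketch below, docling_markdown uses Docling's DocumentConverter. count_structures is a hypothetical helper that counts multi-column pipe tables by their separator row and counts fenced code blocks. It is an assumption and may not match how the benchmark above was measured.

```python
import re

FENCE_MARKERS = ("`" * 3, "~" * 3)  # openers/closers of fenced code blocks
# Separator row of a multi-column pipe table, e.g. "|---|:---:|".
TABLE_SEPARATOR = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)+\|?\s*$")


def docling_markdown(path: str) -> str:
    """Convert a document with Docling and export it as Markdown (pip3 install docling)."""
    from docling.document_converter import DocumentConverter  # lazy: heavy dependency
    return DocumentConverter().convert(path).document.export_to_markdown()


def count_structures(markdown: str) -> tuple[int, int]:
    """Return (tables, fenced code blocks) found in Markdown text."""
    tables = code_blocks = 0
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            if not in_code:
                code_blocks += 1  # count each block once, at its opening fence
            in_code = not in_code
            continue
        if not in_code and TABLE_SEPARATOR.match(line):
            tables += 1  # table-like lines inside code blocks are ignored
    return tables, code_blocks
```

For example, count_structures(docling_markdown("book.pdf")) returns a (tables, code_blocks) pair. Running the same counter on pdftotext output shows what a text-only extraction loses.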
Compilation Workflow
Document file
│
▼
User selects: technical book or plain‑text book
│
├── technical → Docling (preserve tables & code)
└── plain‑text → pdftotext chain (fast)
│
▼
extract.py outputs full text + metadata
│
▼
Claude analyzes structure (title, author, chapters, TOC)
│
▼
Generate per‑chapter summaries (800–1,200 tokens each)
│
▼
Add glossary, patterns, cheatsheet
│
▼
Create SKILL.md (core framework + index)
│
▼
Write to ~/.claude/skills/<slug>/
For books larger than 50 K tokens (≈130 pages), the tool slices the book by chapter using grep + sed and loads only the chapter it needs, cutting token cost by roughly an order of magnitude.
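The tool does this slicing with grep + sed. The Python sketch below shows the same idea under two assumptions: tokens are estimated at roughly four characters each, and chapters start with lines such as "Chapter 5 ...". Both the heuristic and the heading pattern are hypothetical and would need adjusting for each book.

```python
import re

SLICE_THRESHOLD_TOKENS = 50_000  # threshold quoted in the article
# Hypothetical heading pattern; real books vary ("Chapter 5", "5. Replication", ...).
CHAPTER_HEADING = re.compile(r"^(?:Chapter|CHAPTER)\s+\d+\b.*$", re.MULTILINE)


def estimate_tokens(text: str) -> int:
    """Rough rule of thumb: about 4 characters per token for English prose."""
    return len(text) // 4


def slice_by_chapter(text: str) -> list[str]:
    """Split text at chapter headings; books under the threshold are returned whole."""
    if estimate_tokens(text) <= SLICE_THRESHOLD_TOKENS:
        return [text]
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    if not starts:
        return [text]
    # Keep any front matter before the first heading as its own slice.
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]
```

Each slice can then be summarized on its own, so the compiler never needs the whole book in context at once.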
Installation
mkdir -p ~/.claude/skills/book-to-skill/scripts
curl -o ~/.claude/skills/book-to-skill/SKILL.md https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/SKILL.md
curl -o ~/.claude/skills/book-to-skill/scripts/extract.py https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/scripts/extract.py
Running the Compiler
# PDF – auto‑derive skill name
/book-to-skill ~/Downloads/designing-data-intensive-applications.pdf
# EPUB – specify custom name
/book-to-skill ~/books/clean-code.epub clean-code
# Full path and custom name
/book-to-skill /tmp/ddd-evans.pdf domain-driven-design
Using the Skill
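Each invocation takes an optional argument. With no argument, the skill loads the core framework. A chapter id opens that chapter, and anything else is treated as a topic. In practice Claude reads SKILL.md and chooses the file itself. The hypothetical resolve function below only makes that lookup explicit, assuming a chapters/chNN.md layout and a topic-to-file index.

```python
import re
from typing import Optional

# Hypothetical: chapter ids look like "ch5" or "ch05".
CHAPTER_ARG = re.compile(r"^ch(\d{1,2})$")


def resolve(argument: Optional[str], topic_index: dict[str, str]) -> str:
    """Return the skill file to read for a given invocation argument."""
    if argument is None or not argument.strip():
        return "SKILL.md"  # no argument: core framework and index only
    arg = argument.strip().lower()
    match = CHAPTER_ARG.match(arg)
    if match:
        return f"chapters/ch{int(match.group(1)):02d}.md"
    # Otherwise treat it as a topic; fall back to SKILL.md if the index has no entry.
    return topic_index.get(arg, "SKILL.md")


if __name__ == "__main__":
    index = {"replication": "chapters/ch05.md"}
    for arg in (None, "replication", "ch05"):
        print(arg, "->", resolve(arg, index))
```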
/designing-data-intensive-apps # load core framework
/designing-data-intensive-apps replication # query a specific topic
/designing-data-intensive-apps ch05 # open chapter 5
Dependencies
Plain‑text books: pdftotext (poppler‑utils).
Technical books: pip3 install docling.
EPUB handling (optional): pip3 install ebooklib beautifulsoup4.
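Before compiling, it can help to check which backends are installed. This sketch uses only the standard library: shutil.which and importlib.util.find_spec. Note that beautifulsoup4 is imported as bs4. The check_dependencies helper is illustrative and not part of book-to-skill.

```python
import importlib.util
import shutil


def check_dependencies() -> dict[str, bool]:
    """Report which extraction backends are available on this machine."""
    return {
        "pdftotext (poppler-utils)": shutil.which("pdftotext") is not None,
        "docling": importlib.util.find_spec("docling") is not None,
        "ebooklib": importlib.util.find_spec("ebooklib") is not None,
        "beautifulsoup4": importlib.util.find_spec("bs4") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{'ok' if ok else 'missing':8} {name}")
```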
Design Principles
Density over completeness: a 1 000‑token summary is more useful than a 10 000‑token verbatim excerpt.
Practitioner perspective: output statements such as “use Y when X” instead of merely citing chapter locations.
Front‑load SKILL.md: put the most important content at the top, because Claude Code truncates from the end.
Never copy raw text: chapter files are distilled, which respects copyright and keeps information density high.
Token‑cost estimate: the compiler reports input/output token counts and the estimated monetary cost before proceeding.
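A pre-flight estimate of this kind is simple arithmetic, sketched below. The sketch assumes the whole extracted text is read once as input. It also assumes about 1,000 output tokens per chapter, the middle of the 800–1,200 range above, plus a fixed allowance for SKILL.md and the extras. The per-million-token prices are placeholders to replace with current pricing, not real rates, and estimate_compile_cost is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    input_tokens: int
    output_tokens: int
    cost_usd: float


def estimate_compile_cost(book_tokens: int, chapters: int,
                          usd_per_m_input: float, usd_per_m_output: float,
                          tokens_per_summary: int = 1_000,
                          fixed_output: int = 6_000) -> Estimate:
    """Hypothetical estimate: read the book once, write chapter summaries plus extras."""
    output = chapters * tokens_per_summary + fixed_output  # summaries + SKILL.md, glossary, etc.
    cost = book_tokens / 1e6 * usd_per_m_input + output / 1e6 * usd_per_m_output
    return Estimate(book_tokens, output, round(cost, 4))


if __name__ == "__main__":
    # Placeholder prices; substitute the current rates for your model.
    print(estimate_compile_cost(200_000, 12, usd_per_m_input=3.0, usd_per_m_output=15.0))
```

Take a 200 K-token book with 12 chapters at placeholder prices of $3 and $15 per million input and output tokens. The estimate comes to 18,000 output tokens and about $0.87.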
Suitable Scenarios
Repeatedly referenced technical books (e.g., DDIA, Clean Code, DDD).
Internal company documentation not present in Claude’s training data.
Newly published books that Claude has not yet ingested.
Non‑English technical books with limited training coverage.
Unsuitable Scenarios
Search across dozens of books – better suited for RAG.
Books Claude already knows well (e.g., Python Cookbook) – limited benefit.
Scanned PDFs without a text layer – require OCR first.
Limitations
The tool’s ceiling depends on Claude’s extraction quality; deep logical chains or heavy context dependencies may not be fully captured in a 1 000‑token chapter summary, though most framework‑oriented books work well.
Conclusion
book-to-skill extracts a book’s core structure, splits it by chapter, and installs it as a Claude Code skill. Compared with raw PDF ingestion, it offers on‑demand loading, predictable token usage, and more reliable answers. The project consists of two files (≈1 300 lines total) and has over 2 300 GitHub stars.
GitHub repository: https://github.com/virgiliojr94/book-to-skill
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
