Choosing the Right Retrieval Strategy: Full‑Text vs Vector vs Graph Search

This article breaks down the underlying logic, ideal scenarios, benchmark data, decision trees, and real‑world case studies for full‑text (BM25), vector, and graph retrieval, showing why hybrid approaches dominate production while each technique has distinct strengths and trade‑offs.

James' Growth Diary

1. Full‑Text Retrieval: Inverted Index + BM25, the "literal precision" champion

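In Elasticsearch, a query like "refund policy 2024" is tokenized, each term is looked up in the inverted index to fetch its posting list, the matching documents are scored with BM25, and the top hits are returned. By default a match query accepts documents that contain any of the terms; documents that contain all of them usually score highest.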

User input: "refund policy 2024"
→ Tokens: ["refund", "policy", "2024"]
→ Inverted index lookup:
   "refund" → [doc3, doc7, doc12, doc18]
   "policy" → [doc1, doc3, doc9, doc12]
   "2024" → [doc5, doc12, doc20]
→ Merge candidates + BM25 ranking
→ Result: doc12 (matches all three terms, highest score) > doc3 (matches two) > …
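
The lookup step in this example can be sketched with a toy in-memory inverted index. This is only an illustration, not how Elasticsearch is implemented: the `tokenize` and `matchCounts` helpers are hypothetical, and the posting lists are copied from the example.

```typescript
// Toy inverted index: term -> posting list (IDs of documents containing the term).
// Posting lists are copied from the example; real engines store them compressed on disk.
const invertedIndex = new Map<string, string[]>([
  ["refund", ["doc3", "doc7", "doc12", "doc18"]],
  ["policy", ["doc1", "doc3", "doc9", "doc12"]],
  ["2024", ["doc5", "doc12", "doc20"]],
]);

// Hypothetical tokenizer: lowercase, then split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
}

// Count how many query terms each candidate document contains.
function matchCounts(query: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const term of tokenize(query)) {
    for (const docId of invertedIndex.get(term) ?? []) {
      counts.set(docId, (counts.get(docId) ?? 0) + 1);
    }
  }
  return counts;
}

// doc12 appears in all three posting lists, doc3 in two, every other document in one.
const counts = matchCounts("refund policy 2024");
```

Counting matched terms is only a stand-in for ranking; the actual ordering comes from the BM25 score.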

BM25 scoring combines three ideas: term‑frequency saturation, higher weight for rare terms, and document‑length normalization.
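
To make the three ideas concrete, here is a minimal BM25 sketch. It assumes the Lucene-style IDF and the common defaults `k1 = 1.2` and `b = 0.75`; the `Doc` type, the `bm25Scores` function, and the sample documents are made up for illustration.

```typescript
interface Doc {
  id: string;
  tokens: string[];
}

function bm25Scores(query: string[], docs: Doc[], k1 = 1.2, b = 0.75): Map<string, number> {
  const total = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.tokens.length, 0) / total;
  const scores = new Map<string, number>();

  for (const term of Array.from(new Set(query))) {
    const df = docs.filter(d => d.tokens.includes(term)).length;
    if (df === 0) continue;
    // Rare terms get a higher weight (inverse document frequency).
    const idf = Math.log(1 + (total - df + 0.5) / (df + 0.5));
    for (const d of docs) {
      const tf = d.tokens.filter(t => t === term).length;
      if (tf === 0) continue;
      // Document-length normalization: longer-than-average documents are penalized.
      const lengthNorm = 1 - b + b * (d.tokens.length / avgLen);
      // Term-frequency saturation: the gain from repeating a term flattens out as tf grows.
      const termScore = (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
      scores.set(d.id, (scores.get(d.id) ?? 0) + termScore);
    }
  }
  return scores;
}

const docs: Doc[] = [
  { id: "a", tokens: ["refund", "policy", "2024", "update"] },
  { id: "b", tokens: ["refund", "status", "tracking", "page"] },
  { id: "c", tokens: ["shipping", "rates", "by", "region"] },
];
// "a" matches all three query terms, "b" only the common term "refund", "c" none.
const scores = bm25Scores(["refund", "policy", "2024"], docs);
```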

// Full‑text search with the official Elasticsearch JavaScript client (TypeScript)
import { Client } from "@elastic/elasticsearch";

const client = new Client({ node: "http://localhost:9200" });

// Shape of the indexed documents (assumed; adjust to the actual index mapping)
interface KnowledgeDoc {
  title: string;
  content: string;
}

async function fullTextSearch(query: string, indexName: string) {
  const response = await client.search<KnowledgeDoc>({
    index: indexName,
    query: {
      multi_match: {
        query,
        fields: ["content^2", "title^3"], // title gets the higher boost
        type: "best_fields",
        fuzziness: "AUTO" // tolerate minor typos
      }
    },
    highlight: { fields: { content: { fragment_size: 150 } } }
  });
  // With the v8 client the response body is returned directly, so hits live at response.hits.hits
  return response.hits.hits.map(hit => ({
    score: hit._score,
    content: hit._source?.content,
    highlight: hit.highlight?.content
  }));
}

Scenarios where full‑text shines:

Exact term queries such as product model iPhone 15 Pro Max, legal clause numbers, or personal names.

Documents containing many specialized terms or abbreviations (e.g., ICD‑10, RFC 7231).

Low‑latency requirements: BM25 is 5‑10× faster than vector search and uses an order of magnitude fewer resources.

When full‑text fails: Queries like "my phone is dead" cannot match a document that says "device battery exhausted" because there is no lexical overlap.

2. Vector Retrieval: Semantic matching in a high‑dimensional space

Vector retrieval semantic space diagram

Vector search maps text to a numeric vector; sentences with similar meaning end up close in the space.

"phone is dead" → [0.82, -0.31, 0.47, …] (1536‑dim)
"device battery exhausted" → [0.79, -0.28, 0.51, …] (1536‑dim)
Cosine similarity = 0.94 → high enough to retrieve.

This explains why semantic search can find content expressed with different wording.
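
Cosine similarity itself is a short computation. The sketch below uses only the three components shown in the example, so it will not reproduce the 0.94 figure, which refers to the full 1536‑dimensional vectors; `cosineSimilarity` is an illustrative helper, not a library call.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Truncated to the three components shown in the example.
const phoneDead = [0.82, -0.31, 0.47];
const batteryExhausted = [0.79, -0.28, 0.51];
const similarity = cosineSimilarity(phoneDead, batteryExhausted);
```

Vectors pointing in nearly the same direction score close to 1, orthogonal vectors score 0, and the vectors' magnitudes do not affect the result.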

// LangChain.js vector retrieval example (TypeScript)
import { OpenAIEmbeddings } from "@langchain/openai";
import { Milvus } from "@langchain/community/vectorstores/milvus";

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
const vectorStore = await Milvus.fromExistingCollection(embeddings, {
  collectionName: "knowledge_base",
  clientConfig: { address: "localhost:19530" }
});

async function semanticSearch(query: string, topK = 5) {
  const results = await vectorStore.similaritySearchWithScore(query, topK);
  return results
    // Assumes a higher‑is‑better similarity score (e.g. cosine); with a distance metric such as L2, invert this filter
    .filter(([_, score]) => score > 0.7)
    .map(([doc, score]) => ({ content: doc.pageContent, score, metadata: doc.metadata }));
}

// Max‑Marginal‑Relevance to diversify results (an optional vector store method; check that your store implements it)
async function diverseSearch(query: string) {
  return await vectorStore.maxMarginalRelevanceSearch(query, {
    k: 8,
    fetchK: 20,
    lambda: 0.6 // 0 = max diversity, 1 = max relevance
  });
}
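
What the `lambda` parameter trades off can be seen in a minimal greedy MMR sketch. This is not the LangChain.js implementation; `cosine`, `mmrSelect`, and the two‑dimensional vectors are hypothetical and chosen only to make the effect visible.

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy MMR: repeatedly pick the candidate maximizing
//   lambda * relevance(query, doc) - (1 - lambda) * max similarity to already selected docs.
// Returns candidate indices in pick order.
function mmrSelect(queryVec: number[], candidates: number[][], k: number, lambda: number): number[] {
  const selected: number[] = [];
  const remaining = candidates.map((_, i) => i);
  while (selected.length < k && remaining.length > 0) {
    let bestPos = 0;
    let bestScore = -Infinity;
    remaining.forEach((idx, pos) => {
      const relevance = cosine(queryVec, candidates[idx]);
      const redundancy = selected.length === 0
        ? 0
        : Math.max(...selected.map(s => cosine(candidates[idx], candidates[s])));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPos = pos;
      }
    });
    selected.push(remaining.splice(bestPos, 1)[0]);
  }
  return selected;
}

const queryVec = [1, 0];
const candidates = [
  [1, 0.1],   // 0: highly relevant
  [1, 0.12],  // 1: near-duplicate of candidate 0
  [0.5, -0.9] // 2: less relevant, but different from candidate 0
];
const relevanceOnly = mmrSelect(queryVec, candidates, 2, 1);   // keeps the near-duplicate
const diversified = mmrSelect(queryVec, candidates, 2, 0.5);   // swaps it for candidate 2
```

With `lambda = 1` the second pick is the near-duplicate; lowering `lambda` penalizes similarity to what has already been selected, so the different document wins.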

Best use cases for vector retrieval:

RAG Q&A where the user asks "how to return a product" but the document uses "refund procedure".

Multilingual search.

Fuzzy intent understanding.

Common pitfalls:

Exact number or ID lookup – vectors are not reliable.

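Negation handling – embeddings for "with X" and "without X" tend to sit close together, so a query that excludes a term may still surface documents about it.

Very long documents – compressing a long text into a single vector blurs its details; split it into chunks before embedding.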

3. Graph Retrieval: Multi‑hop relational reasoning

Graph retrieval multi‑hop reasoning diagram

Full‑text and vector search treat each document as an isolated node, ignoring relationships. Graph retrieval enables reasoning over edges.

Vector search for "LangGraph author OpenAI partnership" → no direct hit.
Graph reasoning path:
LangGraph → belongs to → LangChain → founder → Harrison Chase
Harrison Chase → involved in → [Project A, Project B, Project C]
Project B → partners with → OpenAI
→ found after following 4 relationships.

// LangChain.js + Neo4j graph retrieval example (TypeScript)
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
import { GraphCypherQAChain } from "langchain/chains/graph_qa/cypher";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

const graph = await Neo4jGraph.initialize({
  url: "bolt://localhost:7687",
  username: "neo4j",
  password: "your-password",
  database: "knowledge"
});

const llm = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });

// cypherPrompt must be a prompt template, not a plain string;
// the chain fills in {schema} and {question} at run time
const cypherPrompt = PromptTemplate.fromTemplate(`
  You are a Neo4j expert. Convert the user question to a Cypher MATCH/RETURN query only.
  Nodes: Person, Organization, Project, Technology
  Relationships: CREATED, WORKS_FOR, COLLABORATES_WITH, USES, BELONGS_TO
  Schema: {schema}
  Question: {question}
`);

const chain = GraphCypherQAChain.fromLLM({
  llm,
  graph,
  returnIntermediateSteps: true,
  cypherPrompt
});

const result = await chain.invoke({ query: "What projects does the creator of LangGraph participate in?" });
// With returnIntermediateSteps enabled, the first step holds the generated Cypher
console.log("Generated Cypher:", result.intermediateSteps[0].query);
console.log("Answer:", result.result);

Scenarios only graph retrieval handles well:

Multi‑hop reasoning (e.g., supplier‑of‑supplier analysis).

Relationship‑centric Q&A for enterprise knowledge graphs.

Path discovery (e.g., "what intermediate skills are needed to go from Technology A to B").

Impact analysis (e.g., "changing module X affects which downstream services").
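
Path discovery and impact analysis both reduce to traversing edges. The sketch below runs a breadth‑first search over a toy in-memory edge list built from the LangGraph example; in Neo4j the same question would be expressed as a Cypher path pattern. The `Edge` type, `findPath`, and the `maxHops` limit are assumptions for illustration.

```typescript
type Edge = { from: string; relation: string; to: string };

// Edges taken from the LangGraph example; Project A/B/C are placeholders there too.
const edges: Edge[] = [
  { from: "LangGraph", relation: "belongs to", to: "LangChain" },
  { from: "LangChain", relation: "founder", to: "Harrison Chase" },
  { from: "Harrison Chase", relation: "involved in", to: "Project A" },
  { from: "Harrison Chase", relation: "involved in", to: "Project B" },
  { from: "Harrison Chase", relation: "involved in", to: "Project C" },
  { from: "Project B", relation: "partners with", to: "OpenAI" },
];

// Breadth-first search along directed edges.
// Returns the shortest chain of edges from start to goal, or null if none exists within maxHops.
function findPath(start: string, goal: string, maxHops = 5): Edge[] | null {
  if (start === goal) return [];
  const visited = new Set<string>([start]);
  let frontier: { node: string; path: Edge[] }[] = [{ node: start, path: [] }];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: { node: string; path: Edge[] }[] = [];
    for (const { node, path } of frontier) {
      for (const edge of edges) {
        if (edge.from !== node || visited.has(edge.to)) continue;
        const extended = [...path, edge];
        if (edge.to === goal) return extended;
        visited.add(edge.to);
        next.push({ node: edge.to, path: extended });
      }
    }
    frontier = next;
  }
  return null;
}

const path = findPath("LangGraph", "OpenAI");
const route = path ? path.map(e => `${e.relation} → ${e.to}`).join(", ") : "no path";
```

Bounding the search with `maxHops` keeps traversal cost predictable, which matters because the number of reachable nodes can grow quickly with each additional hop.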

4. Side‑by‑side comparison of the three retrieval types

The comparison below rates the four approaches on recall, multi‑hop reasoning, build cost, query latency, and explainability.

Exact‑term recall: Full‑text ★★★★★, Vector ★★☆☆☆, Graph ★★★☆☆, Hybrid ★★★★☆.

Semantic recall: Full‑text ★★☆☆☆, Vector ★★★★★, Graph ★★★☆☆, Hybrid ★★★★☆.

Multi‑hop reasoning: Full‑text ★☆☆☆☆, Vector ★☆☆☆☆, Graph ★★★★★, Hybrid ★★★☆☆.

Build cost: Full‑text low, Vector medium (needs embeddings), Graph high (needs relationship extraction), Hybrid medium‑high.

Query latency: Full‑text <50 ms, Vector 50‑200 ms, Graph 200‑500 ms, Hybrid 100‑300 ms.

Explainability: Full‑text ★★★★★, Vector ★★☆☆☆, Graph ★★★★☆, Hybrid ★★★☆☆.

Conclusion: Graph retrieval offers unique multi‑hop reasoning but costs an order of magnitude more to build and maintain.

5. Decision tree for choosing a retrieval method

Retrieval selection decision tree

Exact term / ID / jargon: Use full‑text (Elasticsearch/BM25).

"Meaning" rather than exact words: Use vector (Milvus/Weaviate).

Entity relationship reasoning: Use graph (Neo4j).

All of the above: Use hybrid (multi‑source recall + RRF re‑ranking).
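
The decision tree can be approximated in code as a heuristic router. The sketch below is one possible encoding, not a production classifier: both regular expressions and the `chooseStrategy` function are hypothetical and would need tuning against real query logs.

```typescript
type Strategy = "fulltext" | "vector" | "graph" | "hybrid";

// Hypothetical signal for exact identifiers: codes such as "RFC 7231" or "ICD-10",
// long digit runs, or quoted phrases.
const ID_PATTERN = /\b[A-Z]{2,}[- ]?\d+\b|\b\d{4,}\b|"[^"]+"/;

// Hypothetical signal for relationship questions.
const RELATION_PATTERN = /\b(reports? to|owned by|depends on|supplier of|partners? with|affects?)\b/i;

function chooseStrategy(query: string): Strategy {
  const wantsExact = ID_PATTERN.test(query);
  const wantsRelations = RELATION_PATTERN.test(query);
  if (wantsExact && wantsRelations) return "hybrid"; // several needs at once
  if (wantsRelations) return "graph";                // entity relationship reasoning
  if (wantsExact) return "fulltext";                 // exact term / ID / jargon
  return "vector";                                   // default: match on meaning
}

const exactLookup = chooseStrategy("RFC 7231 caching rules");
const fuzzyIntent = chooseStrategy("my phone is dead");
const relational = chooseStrategy("documents owned by people reporting to Jane");
const mixed = chooseStrategy("what depends on module PAY-204");
```

A router like this only decides which retrievers to call; when its signals are unreliable, running the hybrid pipeline for every query is the safer default.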

6. Engineering a hybrid retrieval pipeline (three‑way recall + RRF merge)

Hybrid retrieval architecture diagram

By the author's estimate, about 80 % of production systems adopt a hybrid approach. The implementation runs full‑text, vector, and (optionally) graph searches in parallel, merges the ranked lists with Reciprocal Rank Fusion (RRF), and optionally applies a reranker to the merged candidates.

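// Three‑way recall + RRF merge + optional rerank (TypeScript)
// Assumes fullTextSearch and semanticSearch from the earlier sections are in scope.
import { CohereRerank } from "@langchain/cohere";

interface SearchResult {
  content: string;
  score: number;
  source: "fulltext" | "vector" | "graph";
  metadata?: Record<string, unknown>;
}

// RRF: every list contributes 1 / (k + rank) per document, with rank starting at 1.
function reciprocalRankFusion(results: SearchResult[][], k = 60): SearchResult[] {
  const scoreMap = new Map<string, { score: number; doc: SearchResult }>();
  for (const resultList of results) {
    resultList.forEach((doc, rank) => {
      // Dedup key: first 100 characters of the content; a stable document ID is safer if one exists
      const key = doc.content.slice(0, 100);
      const rrfScore = 1 / (k + rank + 1); // rank is 0‑based here
      const entry = scoreMap.get(key);
      if (entry) {
        entry.score += rrfScore;
      } else {
        scoreMap.set(key, { score: rrfScore, doc });
      }
    });
  }
  return Array.from(scoreMap.values())
    .sort((a, b) => b.score - a.score)
    .map(item => item.doc);
}

// Keyword heuristic: does the query ask about relationships between entities?
function needsRelationalReasoning(query: string): boolean {
  const keywords = ["superior", "subordinate", "owner", "belongs to", "manages", "related to", "depends on", "affects", "supplier", "partner"];
  const normalized = query.toLowerCase();
  return keywords.some(kw => normalized.includes(kw));
}

async function hybridSearch(query: string, topK = 5) {
  const tasks: Promise<SearchResult[]>[] = [
    // "knowledge_base" is a placeholder index name
    fullTextSearch(query, "knowledge_base").then(r =>
      r.map(d => ({ content: d.content ?? "", score: d.score ?? 0, source: "fulltext" as const }))),
    semanticSearch(query).then(r => r.map(d => ({ ...d, source: "vector" as const })))
  ];
  if (needsRelationalReasoning(query)) {
    // graphSearch is not defined in this article; it is assumed to wrap the graph chain
    // and return { content, score }[]
    tasks.push(
      graphSearch(query).then(r => r.map(d => ({ ...d, source: "graph" as const })))
    );
  }
  const allResults = await Promise.all(tasks);
  const merged = reciprocalRankFusion(allResults);
  if (merged.length > topK) {
    // Rerank the top 20 fused candidates; topN caps how many documents come back.
    // The Cohere API key is read from the environment.
    const reranker = new CohereRerank({ model: "rerank-multilingual-v3.0", topN: topK });
    const reranked = await reranker.compressDocuments(
      merged.slice(0, 20).map(r => ({ pageContent: r.content, metadata: r.metadata ?? {} })),
      query
    );
    return reranked.map(d => ({ content: d.pageContent, metadata: d.metadata }));
  }
  // Return the same shape on both branches
  return merged.slice(0, topK).map(r => ({ content: r.content, metadata: r.metadata ?? {} }));
}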

7. Common pitfalls encountered during selection

Myth: Vector search solves everything – Replacing Elasticsearch with a vector store dropped exact‑ID accuracy from 95 % to 60 % because vectors do not preserve exact lexical matches.

Myth: Graph retrieval means moving everything into Neo4j – Migrating an entire knowledge base to Neo4j wasted three months; 80 % of queries were simple document look‑ups better served by ES + vector.

Myth: Hybrid = simple score addition – Adding BM25 scores (e.g., 12.5) to cosine similarity (e.g., 0.87) mixes incomparable units; RRF solves this by using rank only.

Stale graph data – Relationships without timestamps can produce outdated answers; include timestamps and filter by recency.

Embedding model change without re‑indexing – Switching from text-embedding-ada-002 to text-embedding-3-small without recomputing old vectors creates incompatible spaces and breaks similarity.
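
The score‑addition pitfall is easy to reproduce with made-up numbers. The sketch below ranks the same two result lists once by summing raw scores and once by RRF with `k = 60`; the `Hit` type, the `rankBy` helper, and all scores are illustrative, with 12.5 and 0.87 taken from the example in the list.

```typescript
type Hit = { id: string; score: number };

// Made-up result lists: BM25 scores are unbounded, cosine similarities stay below 1.
const bm25Hits: Hit[] = [
  { id: "A", score: 12.5 },
  { id: "B", score: 11.9 },
  { id: "C", score: 3.1 },
];
const vectorHits: Hit[] = [
  { id: "C", score: 0.87 },
  { id: "A", score: 0.85 },
  { id: "D", score: 0.42 },
];

// Sum a per-hit contribution across lists and return document IDs, best first.
function rankBy(lists: Hit[][], contribution: (hit: Hit, rank: number) => number): string[] {
  const totals = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, i) => {
      totals.set(hit.id, (totals.get(hit.id) ?? 0) + contribution(hit, i + 1));
    });
  }
  return Array.from(totals.entries())
    .sort((x, y) => y[1] - x[1])
    .map(([id]) => id);
}

// Raw addition: the BM25 scale dominates, so the order simply mirrors the BM25 list.
const byRawSum = rankBy([bm25Hits, vectorHits], hit => hit.score);

// RRF: only the 1-based rank matters, so each retriever has equal influence.
const byRrf = rankBy([bm25Hits, vectorHits], (_hit, rank) => 1 / (60 + rank));
```

Under raw addition, document C stays behind B even though it is the vector retriever's top hit and appears in both lists; under RRF it moves ahead of B because agreement between retrievers is rewarded regardless of score scale.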

8. Industry case studies: how leading products pick retrieval strategies

Notion AI

Primary: OpenAI text-embedding-3-large vector search.

Fallback: Postgres tsvector full‑text for exact page titles and block IDs.

No graph search – the content hierarchy is a simple tree.

Logic: Users often know the topic but not the exact wording, which suits vector search; exact titles are covered by the full‑text fallback.

Perplexity

Primary: BM25 full‑text for real‑time web search (queries are usually keyword combos like "Tesla Q1 2024 earnings").

Supplement: Vector index for "related‑question" recommendation.

Logic: Fast, precise keyword search for answers; semantic vectors for discovery.

Glean (enterprise search)

Full‑text per data source (Slack, Jira, Confluence, etc.) for exact identifiers.

Vector for cross‑source semantic matching.

Graph for employee‑relationship and document‑reference graphs, enabling multi‑hop queries like "documents owned by people reporting to Jane".

Logic: Diverse enterprise queries demand all three modalities.

Cursor / GitHub Copilot (code search)

Full‑text for exact symbol names (function, variable, import path).

Vector for semantic matching of comments and docstrings.

Structured (AST‑based) graph search for "go‑to‑definition" style queries.

Logic: Code has its own semantic graph; pure text is insufficient.

9. Summary

Full‑text (BM25) is unbeatable for exact term matching, low latency, and interpretability.

Vector retrieval excels at "meaning‑based" matching for FAQs, document understanding, and multilingual scenarios.

Graph retrieval uniquely handles multi‑hop relational reasoning but incurs high build and maintenance cost.

Production systems almost always adopt a hybrid pipeline: full‑text + vector recall, RRF merge, optional rerank.

Choose the technique that matches the business problem first; avoid chasing hype.

Next article: "Observability for production‑grade agents" – how LangSmith tracks every LLM call and quantifies RAG retrieval quality.
