Understanding Embeddings and Vector Databases for LLM Applications
This article explains what embeddings and vector databases are and how embeddings are generated with models such as OpenAI's Ada. It covers why embeddings enable semantic search, how they help work around large language model token limits, and walks through a practical workflow for retrieving relevant document chunks using cosine similarity.
Vector databases and embeddings have become hot topics in the AI field. Companies such as Pinecone have raised significant funding, and firms like Shopify, Brex, and Hubspot already use these technologies in their AI applications.
An embedding is a multi‑dimensional array of numbers that can represent any item—text, music, video, etc. This article focuses on text embeddings.
Embeddings are created by sending text to an embedding model (e.g., OpenAI’s Ada), which returns a vector that can be stored for later use.
These vectors enable semantic search because they capture meaning, allowing similarity‑based queries such as finding related concepts like “man”, “king”, “woman”, and “queen” in a vector space.
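The "man/king/woman/queen" relationship can be made concrete with cosine similarity, the standard way of comparing two embedding vectors. Below is a minimal sketch using made-up 3-dimensional vectors (real embeddings from a model like Ada have around 1,536 dimensions); the numbers are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only.
man = [1.0, 0.2, 0.0]
woman = [1.0, 0.9, 0.0]
king = [1.0, 0.2, 0.9]
queen = [1.0, 0.9, 0.9]

# The classic analogy: king - man + woman lands near queen.
analogy = [k - m + w for k, m, w in zip(king, man, woman)]
print(cosine_similarity(analogy, queen))  # very close to 1.0
print(cosine_similarity(analogy, man))    # noticeably lower
```

A similarity near 1.0 means two vectors point in almost the same direction, i.e. the texts they represent are semantically close.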
For a more intuitive illustration, imagine a child looking for similar toys (e.g., a toy car and a toy bus) based on the shared concept of transportation; this is semantic similarity.
Embeddings are especially valuable for large language models (LLMs) because LLMs have context-window limits (e.g., roughly 4k tokens for GPT-3.5 and up to 32k for GPT-4). By embedding large documents and retrieving only the most relevant chunks, we can stay within these limits.
A typical workflow is:
Split a large document (e.g., a PDF) into chunks.
Generate an embedding vector for each chunk using a model.
Store the vector and its associated text chunk in a database.
When a user asks a question, the query is also embedded, and cosine similarity is used to find the most relevant chunk vectors.
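The whole workflow above can be sketched in a few lines. To keep the example self-contained, the `embed` function below is a toy bag-of-words stand-in for a real embedding model (in practice you would call a model such as OpenAI's Ada here); the vocabulary and chunks are invented for illustration:

```python
import math
from collections import Counter

VOCAB = ["cat", "dog", "car", "bus", "road", "pet"]

def embed(text):
    # Stand-in for a real embedding model: a word-count vector
    # over a tiny fixed vocabulary.
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: split the document into chunks and store
# (vector, chunk) pairs.
chunks = [
    "the cat is a pet and the dog is a pet",
    "the car drives on the road",
    "the bus drives on the road",
]
store = [(embed(c), c) for c in chunks]

# Step 4: embed the query and rank chunks by cosine similarity.
def top_k(query, k=2):
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(top_k("is a cat a pet"))
```

A dedicated vector database (e.g., Pinecone) plays the role of `store` here, with indexing structures that make this nearest-neighbor lookup fast at scale.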
Example data structure (simplified; each embedding vector is stored alongside its text chunk):

[
  { "vector": [1, 2, 3, 34], "text": "text chunk 1" },
  { "vector": [2, 3, 4, 56], "text": "text chunk 2" },
  { "vector": [4, 5, 8, 23], "text": "text chunk 3" },
  ...
]

After retrieving the top-k similar chunks, they are combined with a prompt and fed to the LLM, for instance:
Known context: text chunk 1, text chunk 2, text chunk 3.
User question: "What did they say about xyz?"
Please answer based on the given context.

If the LLM cannot answer from the context, it should respond honestly: "I cannot answer this question."
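Assembling the retrieved chunks and the user question into that final prompt is plain string formatting. A minimal sketch, where the template wording is illustrative rather than any fixed API:

```python
def build_prompt(chunks, question):
    # Combine the retrieved chunks and the user question into a
    # single prompt string for the LLM.
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Known context:\n"
        f"{context}\n\n"
        f'User question: "{question}"\n'
        "Please answer based only on the given context. "
        'If the context is insufficient, reply "I cannot answer this question."'
    )

prompt = build_prompt(
    ["text chunk 1", "text chunk 2", "text chunk 3"],
    "What did they say about xyz?",
)
print(prompt)
```

The instruction to admit "I cannot answer this question" is what keeps the model grounded in the retrieved context instead of guessing.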
This demonstrates how embeddings and vector search empower LLMs to provide chat‑like capabilities over arbitrary data sources, without being a form of fine‑tuning.