Pinecone Vector Database and Embedding Model Summary from DeepLearning.AI’s AI Course
This article reviews the author’s hands‑on experience with Pinecone’s serverless vector database and with embedding and generation models such as all‑MiniLM‑L6‑v2, text‑embedding‑ada‑002, clip‑ViT‑B‑32, and gpt‑3.5‑turbo‑instruct, showing how they apply to semantic search, RAG, recommendation, hybrid search, and facial‑similarity tasks through Python code examples.
Introduction
The author recently completed Andrew Ng’s DeepLearning.AI course “Building Applications with Vector Databases” and used Pinecone for semantic search, retrieval‑augmented generation (RAG), recommendation systems, hybrid search, and facial similarity tasks, sharing practical insights and code snippets.
Pinecone Overview
Pinecone is the vector database used throughout the course; the author notes that Pinecone now offers a serverless cloud service (ServerlessSpec) with $100 of free credit, which they used, and stresses that the mention is not promotional.
Models Used
The workflow starts with OpenAI APIs for text generation and embeddings, moves to open‑source models on Hugging Face, and includes the latest large models such as GPT‑4, Gemini, and DALL‑E. The author emphasizes the importance of combining multiple large models in AI projects.
Embedding Models
For semantic search, the author employed the Hugging Face all-MiniLM-L6-v2 sentence‑transformer, a lightweight 384‑dimensional model suitable for clustering and similarity tasks. They also used OpenAI’s text-embedding-ada-002 (1536 dimensions) for RAG, noting its richer feature representation.
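Before wiring in the model, it helps to see what “similarity” means operationally: semantic search is a nearest‑neighbor lookup over embedding vectors. A toy sketch with stand‑in vectors (real all‑MiniLM‑L6‑v2 embeddings are 384‑dimensional; the sentences and numbers here are illustrative, not from the course):

```python
import numpy as np

# Toy sketch of semantic search as nearest-neighbor lookup by cosine similarity.
# The 3-d vectors below are stand-ins; real all-MiniLM-L6-v2 embeddings are 384-d.
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.1, 0.9, 0.0])
corpus = {
    "Tokyo is the most populous city.": np.array([0.2, 0.8, 0.1]),
    "I enjoy hiking on weekends.": np.array([0.9, 0.0, 0.3]),
}

# Rank corpus sentences by similarity to the query vector
best = max(corpus, key=lambda s: cosine_similarity(query_vec, corpus[s]))
print(best)
```

A vector database like Pinecone performs the same comparison, but at scale and with approximate‑nearest‑neighbor indexing instead of a brute‑force `max`.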
# Install dependencies
pip install -U sentence-transformers
# Import and instantiate model
import torch
from sentence_transformers import SentenceTransformer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

# Generate embedding
query = 'which city is the most populated in the world?'
embedding = model.encode(query)
print(embedding.shape)
print(embedding)

The author also describes using clip-ViT-B-32 from the sentence‑transformers library for multimodal (image‑text) search on a fashion product dataset, and employing BM25 for sparse vector retrieval via the pinecone-text library.
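For hybrid search, BM25’s sparse vectors are typically blended with dense embeddings using an alpha weight before querying. A sketch of that scaling, a common Pinecone pattern (the helper name and toy vectors are illustrative, not code from the course):

```python
# Alpha-weighted scaling for hybrid search: alpha=1.0 is pure dense (semantic),
# alpha=0.0 is pure sparse (BM25); values in between mix the two signals.
def hybrid_scale(dense, sparse, alpha):
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    scaled_dense = [v * alpha for v in dense]
    return scaled_dense, scaled_sparse

dense = [0.2, 0.4, 0.6]                              # toy dense embedding
sparse = {"indices": [3, 17], "values": [0.8, 0.5]}  # toy BM25 sparse vector
d, s = hybrid_scale(dense, sparse, alpha=0.5)
print(d, s)
```

The scaled dense and sparse vectors are then passed together in a single index query, so one alpha knob controls the balance between keyword and semantic relevance.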
from pinecone_text.sparse import BM25Encoder
bm25 = BM25Encoder()
bm25.fit(metadata['productDisplayName'])

For facial similarity search, DeepFace’s Facenet model is used, achieving up to 97% accuracy in face recognition tasks.
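Once face embeddings are extracted, matching again reduces to a vector comparison, commonly a distance threshold. A toy sketch (the vectors and threshold are stand‑ins, not DeepFace output; real Facenet embeddings are 128‑dimensional):

```python
import math

# Toy sketch: face matching as a distance threshold over embeddings.
# The 3-d vectors and the threshold below are illustrative stand-ins.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

THRESHOLD = 0.8  # illustrative; real thresholds are model-specific
child = [0.1, 0.5, 0.2]
candidates = {"parent_a": [0.2, 0.4, 0.25], "stranger": [0.9, 0.1, 0.8]}
matches = {name: euclidean(child, vec) < THRESHOLD for name, vec in candidates.items()}
print(matches)
```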
pip install deepface
from deepface import DeepFace

MODEL = 'Facenet'
embedding = DeepFace.represent(img_path=child_base, model_name=MODEL)[0]['embedding']
print(embedding)

Generation Models
In the RAG pipeline, retrieved documents are fed to gpt-3.5-turbo-instruct to generate a final answer. The author shows how to build a prompt that concatenates retrieved contexts and passes it to the model.
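The generation call itself is not shown in the snippet below, so here is a hedged sketch of how the assembled prompt could be sent to gpt-3.5-turbo-instruct with the OpenAI v1 client (the stubbed context, the API‑key guard, and `max_tokens` value are illustrative; in the article the contexts come from the Pinecone query):

```python
import os

# Build a RAG prompt from retrieved contexts, then generate an answer.
# `contexts` is stubbed here; in the pipeline it comes from the vector index.
contexts = ["The Berlin Wall divided Berlin from 1961 to 1989."]
query = "what is the berlin wall?"
prompt = (
    "Answer the question based on the context below.\n\n"
    "Context:\n" + "\n---\n".join(contexts)
    + f"\n\nQuestion: {query}\nAnswer:"
)

if os.environ.get("OPENAI_API_KEY"):  # the completion call needs a key
    from openai import OpenAI
    client = OpenAI()
    res = client.completions.create(
        model="gpt-3.5-turbo-instruct", prompt=prompt, max_tokens=200
    )
    print(res.choices[0].text)
else:
    print(prompt)
```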
query = "write an article titled: what is the berlin wall?"
# `embed` is the OpenAI embeddings API response for the query (text-embedding-ada-002)
res = index.query(vector=embed.data[0].embedding, top_k=3, include_metadata=True)
contexts = [x['metadata']['text'] for x in res['matches']]
prompt = ("Answer the question based on the context below.\n\nContext:\n" + "\n---\n".join(contexts) + f"\n\nQuestion: {query}\nAnswer:")
print(prompt)

Conclusion
The course helped the author understand how different large‑model APIs and open‑source embeddings can be combined for various AI applications, from semantic search to multimodal and facial similarity retrieval, highlighting the practical value of vector databases like Pinecone.
References
Building Applications with Vector Databases – DeepLearning.AI