
Vector Retrieval and Product Quantization with Faiss

This article explains the challenges of large‑scale vector retrieval, compares Faiss index types such as brute‑force, graph‑based and product quantization, and details how product quantization works, its memory‑speed trade‑offs, hierarchical quantization, and practical hyper‑parameter tuning.

Zhuanzhuan Tech

1. Vector Retrieval Problem

With the rise of neural networks, embedding techniques are widely used in search, image, and natural language processing. In industrial scenarios we often need to retrieve relevant text, image, or video items from millions or billions of candidates within a few milliseconds. The three main concerns are speed, memory usage, and accuracy; speed is mandatory, while we also aim to improve accuracy and reduce memory. Faiss, an open‑source vector search library from Facebook AI, provides many index structures that enable millisecond‑level queries on billion‑scale datasets. Understanding the different Faiss index implementations is crucial for selecting the right trade‑off.

2. Faiss Index Types

Faiss index types can be divided into brute‑force, product quantization, locality‑sensitive hashing, and graph‑based methods. In practice, brute‑force offers 100 % recall but unacceptable latency, while graph‑based methods approach brute‑force recall with slightly better speed but require large memory for index construction. Product quantization provides a good balance of speed and memory for very large candidate sets, though its recall is lower than graph‑based methods.

| Index | Recall | Latency |
| --- | --- | --- |
| Brute‑force | Optimal | Poor |
| Graph‑based | Good | Good |
| Product quantization | Fair | Good |

3. Product Quantization

3.1 Vector Quantization

Before introducing product quantization, we review vector quantization. Vector quantization maps a continuous‑valued vector from Euclidean space to one of a finite set of discrete codewords (the codebook). The mapping is typically learned by k‑means clustering; in Faiss this idea underlies IndexIVFFlat, where the k‑means centroids partition the database into inverted lists.

1. Train k‑means on the dataset to obtain a codebook of centroids.
2. Assign each vector the ID of its nearest centroid; the cluster ID is the quantized representation.
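The two steps can be sketched in plain numpy. This is a minimal illustration of vector quantization, not Faiss's implementation; the function names `train_codebook` and `quantize` are our own.

```python
import numpy as np

def train_codebook(x, k, iters=10, seed=0):
    """Plain k-means: returns k centroid vectors (the codebook)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest centroid (squared L2)
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        # move each centroid to the mean of its assigned vectors
        for j in range(k):
            pts = x[assign == j]
            if len(pts):
                centroids[j] = pts.mean(0)
    return centroids

def quantize(x, centroids):
    """Map each vector to the ID of its nearest codeword."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)
```

Each vector is then stored as a single integer ID instead of D floats; the price is the reconstruction error to its codeword.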

3.2 Product Quantization

Product quantization splits a vector into multiple sub‑vectors and quantizes each sub‑vector independently. If a D‑dimensional vector is divided into M segments, each segment has dimension D/M. After training, the global codebook is the Cartesian product of the sub‑codebooks, which gives product quantization its name.
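The encode/decode cycle can be sketched as follows, assuming the M per‑segment codebooks have already been trained (e.g., by running k‑means on each segment); `pq_encode` and `pq_decode` are illustrative names, and D must be divisible by M.

```python
import numpy as np

def pq_encode(x, codebooks):
    """codebooks: (M, k, D/M). Returns (N, M) integer codes."""
    M = len(codebooks)
    subs = np.split(x, M, axis=1)            # M sub-vectors of dim D/M
    codes = [((s[:, None, :] - cb[None, :, :]) ** 2).sum(-1).argmin(1)
             for s, cb in zip(subs, codebooks)]
    return np.stack(codes, axis=1)

def pq_decode(codes, codebooks):
    """Reconstruct approximate vectors by concatenating codewords."""
    return np.hstack([cb[codes[:, m]] for m, cb in enumerate(codebooks)])
```

A vector is thus stored as M small integers, and the implicit global codebook is the Cartesian product of the M sub‑codebooks.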

3.3 Memory Consumption Comparison

Assume a D‑dimensional vector split into M segments with k codewords per segment. A single flat codebook with the same representational capacity would need k^M centroids of dimension D, which is infeasible for realistic k and M. Product quantization instead stores M codebooks of k centroids each, i.e. M · k · (D/M) = k · D floats in total, dramatically reducing memory while keeping the same number of representable codes.
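A quick back‑of‑envelope with typical values (D = 128, M = 8, k = 256 are common choices, not taken from the original text) makes the gap concrete:

```python
D, M, k = 128, 8, 256           # vector dim, segments, codewords per segment
# Product quantization: M codebooks, each with k centroids of dim D/M
pq_floats = M * k * (D // M)    # 32768 floats, ~128 KB at fp32
# A flat codebook with the same representational capacity would need
# k**M full-dimensional centroids:
vq_centroids = k ** M           # 256**8 ~ 1.8e19 centroids -- infeasible
print(pq_floats, vq_centroids)
```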

3.4 Application of Product Quantization in Faiss

Faiss implements a hierarchical quantization scheme: a coarse quantizer followed by a fine‑grained product quantizer. The coarse stage quickly narrows down candidate clusters, and the fine stage refines the search within those clusters, greatly reducing query time while preserving accuracy.

The coarse‑plus‑fine quantization accelerates queries but may cause some recall loss; careful hyper‑parameter tuning is required to balance speed and accuracy.

During a query, the two‑level process runs as follows: first compute distances from the query to the coarse centroids and select the closest clusters; then, within those clusters, compute distances against the stored residual vectors (each database vector minus its coarse centroid) using the product quantizer. The amount of computation is governed by the number of coarse centroids and by how many clusters are examined at query time (nprobe in Faiss).
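The query path can be sketched in numpy as asymmetric distance computation (ADC) over residuals. This is a didactic sketch of the IVF‑PQ idea, not Faiss's optimized code; all names are illustrative, and it assumes the codes were trained on residuals.

```python
import numpy as np

def ivf_pq_search(q, coarse, lists, codes, codebooks, nprobe=4, topk=5):
    """Two-level query: coarse quantizer + PQ on residuals (ADC).

    coarse    : (nlist, D) coarse centroids
    lists     : dict cluster_id -> array of vector ids in that cluster
    codes     : (N, M) PQ codes of the residuals x - coarse[assign]
    codebooks : (M, k, D/M) per-segment codebooks trained on residuals
    """
    M = codebooks.shape[0]
    # 1) distances from the query to all coarse centroids
    dc = ((coarse - q) ** 2).sum(1)
    cand_ids, cand_dists = [], []
    for c in np.argsort(dc)[:nprobe]:          # 2) probe nprobe clusters
        r = q - coarse[c]                      # residual of the query
        subs = np.split(r, M)
        # 3) lookup table: distance from each residual segment
        #    to every codeword of that segment
        lut = np.stack([((cb - s) ** 2).sum(1)
                        for s, cb in zip(subs, codebooks)])   # (M, k)
        for i in lists.get(int(c), []):
            # 4) approximate distance = sum of M table lookups
            cand_ids.append(int(i))
            cand_dists.append(lut[np.arange(M), codes[i]].sum())
    order = np.argsort(cand_dists)[:topk]
    return [cand_ids[j] for j in order]
```

The key saving is step 3: per probed cluster the query does only M · k distance computations to build the table, after which scoring each candidate costs just M lookups and adds.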

3.5 Hyper‑Parameters

When building a PQ index, hyper‑parameters such as the number of coarse clusters, the number of sub‑vectors, and the bits per sub‑vector heavily influence recall and latency. Different PQ variants (e.g., OPQ, PQ with PCA) can also be explored.
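A rough budget function helps reason about these knobs before building anything. The parameter names mirror Faiss's IndexIVFPQ (nlist, nprobe, m, nbits), but the cost model itself is a simplified assumption of ours (fp32 centroids, uniform cluster sizes, ignoring IDs and overheads), not a Faiss formula.

```python
def ivfpq_budget(n, d, nlist, nprobe, m, nbits):
    """Rough memory/compute estimate for an IVF-PQ index.

    n, d   : dataset size and vector dimension
    nlist  : coarse clusters; nprobe: clusters scanned per query
    m      : sub-vectors per vector; nbits: bits per sub-vector code
    """
    k = 2 ** nbits
    code_bytes = n * m * nbits // 8                       # compressed vectors
    codebook_bytes = (nlist * d + m * k * (d // m)) * 4   # fp32 centroids
    # per query: nlist coarse distances + LUT build + per-candidate lookups
    ops = nlist * d + m * k * (d // m) + (n * nprobe // nlist) * m
    return code_bytes + codebook_bytes, ops
```

Raising nprobe or m improves recall at the cost of the last term; raising nlist shrinks each scanned list but makes the coarse stage more expensive, which is why these values are usually tuned together against a recall target.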

