Vector Retrieval and Product Quantization with Faiss
This article explains the challenges of large‑scale vector retrieval, compares Faiss index types such as brute‑force, graph‑based and product quantization, and details how product quantization works, its memory‑speed trade‑offs, hierarchical quantization, and practical hyper‑parameter tuning.
1. Vector Retrieval Problem
With the rise of neural networks, embedding techniques are widely used in search, image, and natural language processing. In industrial scenarios we often need to retrieve relevant text, image, or video items from millions or billions of candidates within a few milliseconds. The three main concerns are speed, memory usage, and accuracy; speed is mandatory, while we also aim to improve accuracy and reduce memory. Faiss, an open‑source vector search library from Facebook AI, provides many index structures that enable millisecond‑level queries on billion‑scale datasets. Understanding the different Faiss index implementations is crucial for selecting the right trade‑off.
2. Faiss Index Types
Faiss index types can be divided into brute‑force, product quantization, locality‑sensitive hashing, and graph‑based methods. In practice, brute‑force search offers 100 % recall but unacceptable latency at scale, while graph‑based methods (e.g., HNSW) come close to brute‑force recall at much lower latency, at the cost of substantial memory, since the graph must be stored alongside the raw vectors. Product quantization offers the best balance of speed and memory for very large candidate sets, though its recall is lower than that of graph‑based methods.
Index                  Recall     Latency
Brute‑force            Optimal    Poor
Graph‑based            Good       Good
Product Quantization   Fair       Good
3. Product Quantization
3.1 Vector Quantization
Before introducing product quantization, we review vector quantization. Vector quantization maps a continuous‑valued vector in Euclidean space to one of a finite set of discrete codewords (the codebook). The mapping is typically learned by k‑means clustering; in Faiss, this k‑means quantizer is the coarse quantizer underlying IndexIVFFlat.
1. Train k‑means on the dataset to obtain a codebook of cluster centroids and a cluster ID for each vector.
2. Use each vector's cluster ID as its quantized representation.
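The two steps can be sketched in numpy with a hand‑rolled k‑means (the function name and data here are illustrative, not part of the Faiss API):

```python
import numpy as np

def kmeans(data, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(n_iter):
        # assign each vector to its nearest centroid (squared L2)
        assign = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
        # recompute centroids; keep the old one if a cluster goes empty
        for j in range(k):
            if (assign == j).any():
                centroids[j] = data[assign == j].mean(axis=0)
    return centroids, assign

rng = np.random.default_rng(42)
xb = rng.standard_normal((1000, 16)).astype("float32")

codebook, codes = kmeans(xb, k=8)   # step 1: train the codebook
# step 2: each 16-dim vector is now represented by one small integer ID
print(codes[:5], codebook.shape)
```

Each vector is now stored as a single cluster ID plus a shared (8, 16) codebook, rather than 16 floats per vector.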
3.2 Product Quantization
Product quantization splits a vector into multiple sub‑vectors and quantizes each sub‑vector independently. If a D‑dimensional vector is divided into M segments, each segment has dimension D/M. After training, the global codebook is the Cartesian product of the sub‑codebooks, which gives product quantization its name.
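A minimal numpy sketch of PQ encoding, assuming D = 16, M = 4 segments, and 8 centroids per sub‑codebook (values and the small k‑means helper are illustrative):

```python
import numpy as np

def kmeans(data, k, n_iter=20, seed=0):
    """Tiny k-means returning only the centroids."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(n_iter):
        assign = ((data[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = data[assign == j].mean(0)
    return centroids

D, M, K = 16, 4, 8            # dimension, segments, centroids per segment
d_sub = D // M                # each sub-vector has dimension D/M = 4
rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, D)).astype("float32")

# train one independent codebook per segment
codebooks = [kmeans(xb[:, i*d_sub:(i+1)*d_sub], K) for i in range(M)]

# encode: each vector becomes M small integers, one per segment
codes = np.stack([
    ((xb[:, i*d_sub:(i+1)*d_sub][:, None] - codebooks[i][None]) ** 2)
        .sum(-1).argmin(1)
    for i in range(M)
], axis=1)
print(codes.shape)  # each vector is now M = 4 codes instead of 16 floats
```

The M sub‑codebooks jointly describe K**M = 8**4 = 4096 distinct regions of the space, even though only M × K = 32 centroids were trained.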
3.3 Memory Consumption Comparison
Consider a common configuration: D = 128, M = 8 segments, and 8 bits (256 centroids) per segment. Product quantization stores M sub‑codebooks of 256 centroids of dimension D/M, i.e. 256 × 128 = 32,768 floats in total, and encodes each vector in just M bytes. A single full‑vector codebook with the same representational capacity would need 256^8 centroids, which is entirely infeasible to train or store. PQ thus reduces memory dramatically while keeping a comparably fine partition of the space.
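The arithmetic for that configuration (D = 128, M = 8, 8 bits per sub‑code, a common but illustrative choice) works out as follows:

```python
D, M, nbits = 128, 8, 8          # typical PQ configuration
K = 2 ** nbits                   # 256 centroids per sub-codebook

# PQ codebooks: M tables of K centroids of dimension D/M (float32 values)
pq_codebook_floats = M * K * (D // M)   # = K * D = 32,768 floats total

# A single full-vector codebook with equivalent resolution would need
# K**M centroids of dimension D -- about 1.8e19 centroids, infeasible.
vq_equiv_centroids = K ** M

# Per-vector storage: M sub-codes of nbits bits vs. D float32 values
pq_bytes_per_vec = M * nbits // 8        # 8 bytes per vector
raw_bytes_per_vec = D * 4                # 512 bytes per vector

print(pq_codebook_floats, vq_equiv_centroids,
      pq_bytes_per_vec, raw_bytes_per_vec)
```

A 64× compression per vector (512 bytes down to 8) is what makes billion‑scale in‑memory indexes practical.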
3.4 Application of Product Quantization in Faiss
Faiss implements a hierarchical quantization scheme: a coarse quantizer followed by a fine‑grained product quantizer. The coarse stage quickly narrows down candidate clusters, and the fine stage refines the search within those clusters, greatly reducing query time while preserving accuracy.
The coarse‑plus‑fine quantization accelerates queries but may cause some recall loss; careful hyper‑parameter tuning is required to balance speed and accuracy.
During a query, the two‑level process is: first, compute distances from the query to all coarse centroids and select the few nearest clusters to probe; then, within each probed cluster, compute the residual between the query and that cluster's centroid, and compare the residual against the PQ codes stored in the cluster. The amount of computation therefore depends on the number of coarse centroids and on how many clusters are examined at query time.
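The two‑level query can be sketched in numpy. The centroids, codebooks, and codes below are random stand‑ins for trained values; the point is the control flow: coarse ranking, residual computation, and scoring codes via an M × K lookup table (the standard asymmetric‑distance trick) rather than reconstructing vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K, nlist = 16, 4, 8, 4
d_sub = D // M
# stand-ins for trained parameters: coarse centroids + residual codebooks
coarse = rng.standard_normal((nlist, D)).astype("float32")
codebooks = rng.standard_normal((M, K, d_sub)).astype("float32")
# database: each vector is stored as (coarse cluster ID, M sub-codes)
db_cluster = rng.integers(0, nlist, 100)
db_codes = rng.integers(0, K, (100, M))

q = rng.standard_normal(D).astype("float32")

# stage 1: rank coarse centroids, keep the nprobe closest clusters
nprobe = 2
coarse_d = ((coarse - q) ** 2).sum(1)
probed = coarse_d.argsort()[:nprobe]

best_id, best_d = -1, np.inf
for c in probed:
    r = q - coarse[c]                   # residual w.r.t. this centroid
    # stage 2: one (M, K) table of sub-distances per cluster, then each
    # stored code is scored with just M table lookups
    table = ((r.reshape(M, 1, d_sub) - codebooks) ** 2).sum(-1)
    for i in np.flatnonzero(db_cluster == c):
        d = table[np.arange(M), db_codes[i]].sum()
        if d < best_d:
            best_id, best_d = i, float(d)
print(best_id, best_d)
```

Only vectors inside the probed clusters are scored, which is where the speedup over exhaustive PQ scanning comes from.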
3.5 Hyper‑Parameters
When building a PQ index, hyper‑parameters heavily influence recall and latency: the number of coarse clusters (nlist), the number of clusters probed per query (nprobe), the number of sub‑vectors (m), and the bits per sub‑code (nbits). Larger nlist and nprobe improve recall at the cost of latency; larger m and nbits improve accuracy at the cost of memory. PQ variants such as OPQ (which rotates the space before splitting) or PQ combined with a PCA pre‑transform are also worth exploring.
4. References
Huang J.‑T., Sharma A., Sun S., et al. Embedding‑based Retrieval in Facebook Search. KDD 2020.
Faiss Wiki: https://github.com/facebookresearch/faiss/wiki
Semantic Vector Retrieval – ANN Search: https://mp.weixin.qq.com/s/YOnzPcQiaoNf1aSImNOA5g
Faiss PQ Inverted Index Implementation: https://zhuanlan.zhihu.com/p/34363377
A technical learning platform for the Zhuanzhuan R&D Center and industry partners, sharing frontline experience and cutting‑edge topics. Follow the public accounts "Zhuanzhuan Technology", "Big Zhuanzhuan FE", and "Zhuanzhuan QA" for more practical content.