Evolution and Underlying Principles of the Billion‑Scale Image Search System at Youpai Image Manager
This article describes the two‑generation evolution of Youpai Image Manager's billion‑scale image search system, explaining the mathematical representation of images, the limitations of MD5, the first‑generation pHash‑ElasticSearch solution, and the second‑generation CNN‑Milvus approach for robust, large‑scale visual similarity search.
The Youpai Image Manager serves tens of millions of users and manages hundreds of billions of images, requiring a fast solution to locate images by content. The author independently designed and implemented a two‑generation image‑search service, with the first generation launched in early 2019 and the second in 2020.
Fundamentals
What Is an Image?
An image is a grid of pixels and can be represented as a matrix in which each element corresponds to one pixel.
Mathematical Representation
Images can be expressed as matrices; binary images use 0/1 values, while RGB images use three 8‑bit channels, e.g., (R, G, B) where each component ranges from 0 to 255.
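The matrix view can be made concrete with a minimal NumPy sketch (the tiny 2x2 "image" below is illustrative, not from the original system):

```python
import numpy as np

# A 2x2 RGB "image": each pixel is an (R, G, B) triple of 8-bit values.
img = np.array([
    [[255, 0, 0], [0, 255, 0]],        # red, green
    [[0, 0, 255], [255, 255, 255]],    # blue, white
], dtype=np.uint8)

print(img.shape)   # (2, 2, 3): height x width x channels
print(img[0, 0])   # [255 0 0] -- the red pixel

# A binary image is just a 0/1 matrix, e.g. thresholding at half of 3*255.
binary = (img.sum(axis=2) > 382).astype(np.uint8)
print(binary)      # [[0 0] [0 1]] -- only white exceeds the threshold
```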
Technical Challenges for Image Search
Finding the original image requires exact pixel matching (e.g., MD5), but compression, watermarks, or minor edits break MD5 comparison. Therefore, the system must extract a comparable representation (feature) and compute similarity between these features.
Represent the image as a computer‑processable data structure.
Enable efficient similarity computation.
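MD5's fragility is easy to demonstrate: flipping a single byte (as re-encoding or watermarking inevitably does) produces a completely unrelated digest. The byte strings below are illustrative stand-ins for real image files:

```python
import hashlib

original = b"\xff\xd8\xff\xe0 fake JPEG bytes"
recompressed = b"\xff\xd8\xff\xe0 fake JPEG bytez"  # one byte changed, as by re-encoding

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(recompressed).hexdigest()

print(h1 == h2)  # False: the digests share nothing, though the inputs are nearly identical
```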
First‑Generation Search System
Feature Extraction – Perceptual Hash (pHash)
The first generation uses the perceptual hash (pHash) algorithm, which condenses the whole image through a series of transformations (downscaling, a discrete cosine transform, and thresholding the low-frequency coefficients against their median) into a compact hash that stays close for visually similar images.
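A minimal pHash sketch, assuming the image has already been downscaled to a 32x32 grayscale matrix (a real pipeline would use an image library for that step). The DCT concentrates the image's low-frequency structure in the top-left corner; thresholding the 8x8 low-frequency block against its median yields a 64-bit hash:

```python
import numpy as np

def dct_matrix(n):
    # DCT-II basis matrix (unscaled; only coefficient signs relative to the median matter).
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n))

def phash(gray32):
    """64-bit perceptual hash of a 32x32 grayscale image (float array, 0..255)."""
    d = dct_matrix(32)
    freq = d @ gray32 @ d.T          # 2-D DCT: low frequencies land top-left
    low = freq[:8, :8].flatten()     # keep the 8x8 low-frequency block
    bits = (low > np.median(low)).astype(int)
    return int("".join(map(str, bits)), 2)

gradient = np.tile(np.arange(32, dtype=float), (32, 1))
print(hex(phash(gradient)))
```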
Similarity Computation – Hamming Distance
Similarity between two pHash values is measured by the Hamming distance; a smaller distance indicates higher similarity.
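Hamming distance over two 64-bit hashes is a one-liner, since XOR leaves a 1 exactly where the bits disagree:

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

h1 = 0b1011_0110
h2 = 0b1001_0111
print(hamming(h1, h2))  # 2: the hashes differ at two bit positions
```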
To scale to billions of images, each 64-bit pHash is split into eight 8-bit segments; by the pigeonhole principle, two hashes within a small Hamming distance must agree on most of their segments, so a candidate is considered similar when at least five segments are identical. This segment matching is implemented with ElasticSearch term queries combined with minimum_should_match to filter candidates.
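The segment scheme can be sketched as follows. The field names (seg0..seg7) and the query shape are assumptions about the index layout, not the production mapping, but the should-clauses with minimum_should_match mirror the approach described above:

```python
def split_hash(phash_hex):
    """Split a 16-hex-char (64-bit) pHash into 8 one-byte segments."""
    return [phash_hex[i:i + 2] for i in range(0, 16, 2)]

def build_query(phash_hex, min_match=5):
    """ElasticSearch bool query: a document is a candidate if at least
    min_match of its 8 hash segments (hypothetical fields seg0..seg7)
    exactly match the query's segments."""
    segments = split_hash(phash_hex)
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {f"seg{i}": seg}} for i, seg in enumerate(segments)
                ],
                "minimum_should_match": min_match,
            }
        }
    }

q = build_query("f09e36a1c45b72d8")
print(len(q["query"]["bool"]["should"]))  # 8 term clauses, one per segment
```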
First‑Generation Summary
pHash provides simple computation and tolerance to compression, watermarks, and noise.
ElasticSearch leverages existing infrastructure, avoiding extra cost.
Limitation: pHash is sensitive to global changes (e.g., adding a black border).
Second‑Generation Search System
Feature Extraction – Convolutional Neural Network (CNN)
The second generation adopts a CNN (specifically VGG16) to extract a 512‑dimensional feature vector for each image, offering better robustness and discriminative power.
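The 512 dimensions come from the channel count of VGG16's last convolutional block: pooling its spatial grid globally collapses the feature map to one 512-d vector per image. The sketch below uses a random array as a stand-in for that feature map (a real pipeline would obtain it from a pretrained VGG16 via a deep-learning framework):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for VGG16's final conv-block output: a 7x7 spatial grid of 512 channels.
feature_map = rng.random((7, 7, 512))

# Global max pooling over the spatial dimensions -> one 512-d vector.
vector = feature_map.max(axis=(0, 1))

# L2-normalize so that an inner product between vectors equals cosine similarity.
vector /= np.linalg.norm(vector)
print(vector.shape)  # (512,)
```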
Vector Search Engine
Feature vectors are stored and searched with the open‑source vector engine Milvus, which efficiently handles high‑dimensional similarity queries at scale.
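The question Milvus answers is top-k nearest-neighbor search over normalized vectors. A brute-force NumPy version makes the semantics concrete; Milvus answers the same query over billions of vectors using approximate indexes (e.g., IVF or HNSW) instead of a full scan. The data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
# 10k stored image vectors, L2-normalized (a stand-in for the Milvus collection).
db = rng.random((10_000, 512)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

# A query that is a slightly perturbed copy of item 42 (a "near-duplicate" image).
query = db[42] + 0.01 * rng.random(512).astype(np.float32)
query /= np.linalg.norm(query)

# Brute-force cosine similarity, then take the 5 highest-scoring items.
scores = db @ query
topk = np.argsort(scores)[::-1][:5]
print(topk[0])  # 42: the near-duplicate ranks first
```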
Second‑Generation Summary
Combines CNN for rich feature extraction with Milvus for fast vector similarity search.
Provides superior support for large‑scale, content‑based image retrieval.
Related Articles
The author previously published two related posts: "Overview of Image Search System" and "Engineering Practice of Image Search System" (links omitted).