
Evolution and Underlying Principles of the Billion‑Scale Image Search System at Youpai Image Manager

This article describes the two‑generation evolution of Youpai Image Manager's billion‑scale image search system, explaining the mathematical representation of images, the limitations of MD5, the first‑generation pHash‑ElasticSearch solution, and the second‑generation CNN‑Milvus approach for robust, large‑scale visual similarity search.

System Architect Go

The Youpai Image Manager serves tens of millions of users and manages hundreds of billions of images, requiring a fast solution to locate images by content. The author independently designed and implemented a two‑generation image‑search service, with the first generation launched in early 2019 and the second in 2020.

Fundamentals

What Is an Image?

An image is a grid of pixels and can be represented as a matrix in which each element corresponds to one pixel.

Mathematical Representation

Images can be expressed as matrices; binary images use 0/1 values, while RGB images use three 8‑bit channels, e.g., (R, G, B) where each component ranges from 0 to 255.
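As a minimal illustration (using NumPy as a stand-in for any matrix library), a 2×2 RGB image is just a 2×2×3 array of 8-bit values, and a binary image is a plain 0/1 matrix:

```python
import numpy as np

# A 2x2 RGB image: one matrix element per pixel, three 8-bit channels each.
img = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],      # red pixel,  green pixel
        [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
    ],
    dtype=np.uint8,
)

print(img.shape)  # (2, 2, 3): height x width x channels
print(img[0, 0])  # [255   0   0] -- the (R, G, B) triple of the top-left pixel

# A binary image is the degenerate case: a 0/1 matrix.
mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
```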

Technical Challenges for Image Search

Finding an exact original is possible with pixel-level digests such as MD5, but compression, watermarking, or minor edits change the bytes and break the comparison entirely. The system must therefore extract a comparable representation (a feature) from each image and compute similarity between those features.
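The fragility of digest matching is easy to demonstrate: flipping a single bit of the underlying data, as any re-encode or watermark would, yields a completely unrelated MD5. A sketch using Python's standard hashlib:

```python
import hashlib

original = bytes(range(256)) * 4  # stand-in for raw image bytes
edited = bytearray(original)
edited[0] ^= 1                    # flip one bit, e.g. a one-pixel tweak

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(bytes(edited)).hexdigest()

print(h1 == h2)  # False: the digests carry no similarity signal at all
```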

Represent the image as a computer‑processable data structure.

Enable efficient similarity computation.

First‑Generation Search System

Feature Extraction – Perceptual Hash (pHash)

The first generation uses the perceptual hash (pHash) algorithm, which condenses the whole image, through a series of transformations such as downscaling, grayscale conversion, and a discrete cosine transform, into a compact hash that remains close for visually similar images.
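A minimal NumPy-only sketch of the idea: resizing to 32×32 grayscale is assumed to happen upstream, and the DCT/median-threshold recipe below is the textbook variant of pHash, not necessarily Youpai's production implementation:

```python
import numpy as np

def phash(gray, hash_size=8, img_size=32):
    """Textbook pHash sketch: 2-D DCT of a 32x32 grayscale image, keep the
    low-frequency corner, threshold against its median -> a 64-bit hash."""
    a = np.asarray(gray, dtype=np.float64)
    assert a.shape == (img_size, img_size), "resize/grayscale assumed upstream"

    # DCT-II basis matrix; the 2-D transform is C @ A @ C.T
    n = np.arange(img_size)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * img_size))
    dct = C @ a @ C.T

    low = dct[:hash_size, :hash_size]        # low-frequency 8x8 corner
    bits = (low > np.median(low)).flatten()  # 1 where above the median
    return int("".join("1" if b else "0" for b in bits), 2)

# Identical inputs hash identically; small edits flip only a few bits.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32))
```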

Similarity Computation – Hamming Distance

Similarity between two pHash values is measured by the Hamming distance; a smaller distance indicates higher similarity.
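For integer-valued hashes the Hamming distance is a one-liner, counting the set bits of the XOR (a sketch, identical in spirit to any production implementation):

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(h1 ^ h2).count("1")

# Two 8-bit examples: 0b10110100 vs 0b10010110 differ in 2 positions.
print(hamming_distance(0b10110100, 0b10010110))  # 2
```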

To scale to billions of images, each pHash is split into eight segments, and two images are considered candidate matches if at least five segments are identical: a Hamming distance of at most three can corrupt at most three segments, so at least five must still match exactly. This segment matching is implemented with Elasticsearch term queries combined with minimum_should_match to filter candidates.
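A sketch of the segment split and the resulting Elasticsearch query body. The `seg0`…`seg7` field names and the hex encoding are illustrative assumptions; only the `bool`/`should`/`minimum_should_match` structure comes from the article:

```python
def split_segments(phash_hex: str, n_segments: int = 8) -> list:
    """Split a 64-bit pHash (16 hex chars) into 8 two-char segments."""
    step = len(phash_hex) // n_segments
    return [phash_hex[i * step:(i + 1) * step] for i in range(n_segments)]

def build_es_query(phash_hex: str, min_match: int = 5) -> dict:
    """Bool query: one term clause per segment on its own keyword field;
    minimum_should_match keeps only candidates sharing >= 5 segments."""
    segments = split_segments(phash_hex)
    return {
        "query": {
            "bool": {
                "should": [
                    {"term": {f"seg{i}": seg}} for i, seg in enumerate(segments)
                ],
                "minimum_should_match": min_match,
            }
        }
    }

q = build_es_query("a1b2c3d4e5f60718")
print(len(q["query"]["bool"]["should"]))  # 8
```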

First‑Generation Summary

pHash provides simple computation and tolerance to compression, watermarks, and noise.

ElasticSearch leverages existing infrastructure, avoiding extra cost.

Limitation: pHash is sensitive to global changes (e.g., adding a black border).

Second‑Generation Search System

Feature Extraction – Convolutional Neural Network (CNN)

The second generation adopts a CNN (specifically VGG16) to extract a 512‑dimensional feature vector for each image, offering better robustness and discriminative power.
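A sketch using the standard Keras VGG16 application. The article only states that VGG16 yields 512-dimensional features; the `pooling="max"` choice (which collapses the final 7×7×512 feature map into one 512-dim vector) and the L2 normalization below are assumptions for illustration:

```python
import numpy as np

def l2_normalize(vec: np.ndarray) -> np.ndarray:
    """Unit-normalize so cosine similarity becomes a plain dot product."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def extract_features(image_path: str) -> np.ndarray:
    """Return a 512-dim descriptor for one image (requires TensorFlow)."""
    # Imported lazily so the helper above works without TensorFlow installed.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing import image

    # include_top=False drops the ImageNet classifier head; pooling="max"
    # reduces the final 7x7x512 feature map to a single 512-dim vector.
    model = VGG16(weights="imagenet", include_top=False, pooling="max")
    img = image.load_img(image_path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return l2_normalize(model.predict(batch)[0])
```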

Vector Search Engine

Feature vectors are stored and searched with the open-source vector search engine Milvus, which handles high-dimensional similarity queries efficiently at scale.
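Conceptually, Milvus answers top-k nearest-neighbor queries over these vectors. A brute-force NumPy sketch of that operation on a toy database (at billion scale, Milvus replaces this linear scan with approximate indexes):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in database of 1000 L2-normalized 512-dim feature vectors.
db = rng.standard_normal((1000, 512))
db /= np.linalg.norm(db, axis=1, keepdims=True)

def search(query: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Indices of the top_k most similar vectors by cosine similarity."""
    q = query / np.linalg.norm(query)
    sims = db @ q                      # cosine similarity via dot product
    return np.argsort(-sims)[:top_k]   # highest similarity first

# A stored vector's nearest neighbor is itself.
print(search(db[7])[0])  # 7
```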

Second‑Generation Summary

Combines CNN for rich feature extraction with Milvus for fast vector similarity search.

Provides superior support for large‑scale, content‑based image retrieval.

Related Articles

The author previously published two related posts: "Overview of Image Search System" and "Engineering Practice of Image Search System" (links omitted).

Tags: CNN, Elasticsearch, Milvus, vector search, image search, pHash
Written by

System Architect Go

Programming, architecture, application development, message queues, middleware, databases, containerization, big data, image processing, machine learning, AI, personal growth.
