Building a Reverse Image Search Engine with Geometric Distance, ResNet Feature Embeddings, Clustering, and Milvus Vector Database
This article walks through implementing a reverse image search system, starting with simple pixel‑based geometric distance, then improving accuracy using ResNet‑derived feature embeddings, accelerating queries with K‑means clustering, and finally deploying a Milvus vector database for fast, scalable similarity retrieval.
In many search engines a built‑in "search by image" feature simplifies finding similar pictures; this tutorial explains how to build such a system from scratch.
1. Simple geometric distance approach – Images are resized to a common size, then the Euclidean (geometric) distance between the target and each source image is computed. The distance = ((target - source) ** 2).sum() formula is used, and the n smallest distances are returned as results. This works for simple, low‑complexity datasets (e.g., cartoon avatars) but suffers from high computational cost and poor semantic matching.
import os
import cv2
import random
import numpy as np
base_path = r"G:\datasets\lbxx"
files = [os.path.join(base_path, file) for file in os.listdir(base_path)]
target_path = random.choice(files)
target = cv2.imread(target_path)
h, w, _ = target.shape
distances = []
for file in files:
    source = cv2.imread(file)
    # skip files that OpenCV failed to decode
    if not isinstance(source, np.ndarray):
        continue
    source = cv2.resize(source, (w, h))
    # cast to a signed type first: uint8 subtraction would wrap around
    distance = ((target.astype(np.int64) - source.astype(np.int64)) ** 2).sum()
    distances.append((file, distance))
distances = sorted(distances, key=lambda x: x[-1])[:6]
imgs = list(map(lambda x: cv2.imread(x[0]), distances))
result = np.hstack(imgs)
cv2.imwrite("result.jpg", result)
2. Limitations of pixel‑level comparison – Pixels carry no semantic information (style, objects, color tone), and images differ in size, so pixel distances are an unreliable similarity measure; on top of that, every query requires an O(n) scan over the entire dataset.
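A quick sanity check makes the problem concrete: a uniform brightness shift, which leaves the content untouched, already produces a large pixel distance. A toy sketch (the arrays stand in for images):

```python
import numpy as np

# two "images" with identical content, one slightly brighter
img = np.full((100, 100, 3), 120, dtype=np.int64)
brighter = img + 10   # same picture, +10 brightness on every channel

# pixel-level squared distance treats this as a big mismatch
distance = ((img - brighter) ** 2).sum()
print(distance)   # 10**2 * 100 * 100 * 3 = 3_000_000
```

Meanwhile, two semantically unrelated images with similar average tone can score lower than this, which is exactly the failure mode feature embeddings address.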
3. Improvement 1: Feature embeddings – Replace raw pixels with deep‑learning features extracted by a pre‑trained ResNet‑50 model (outputting a 7×7×2048 tensor). These embeddings capture abstract visual concepts, are lower‑dimensional than raw pixels, and are more robust to variations.
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
w, h = 224, 224
encoder = ResNet50(include_top=False)
base_path = r"G:\datasets\lbxx"
files = [os.path.join(base_path, file) for file in os.listdir(base_path)]
target_path = random.choice(files)
target = cv2.resize(cv2.imread(target_path), (w, h))
target = encoder(preprocess_input(target[None]))
distances = []
for file in files:
    source = cv2.imread(file)
    if not isinstance(source, np.ndarray):
        continue
    source = cv2.resize(source, (w, h))
    source = encoder(preprocess_input(source[None]))
    distance = np.sum((target - source) ** 2)
    distances.append((file, distance))
distances = sorted(distances, key=lambda x: x[-1])[:6]
imgs = list(map(lambda x: cv2.imread(x[0]), distances))
result = np.hstack(imgs)
cv2.imwrite("result.jpg", result)
4. Improvement 2: Clustering for speed – Compute embeddings for all images once, store them, and run K‑means (e.g., 500 clusters). At query time, find the nearest cluster center to the target embedding, then perform a linear scan only within that cluster, cutting each query from a scan over all n images down to k distance computations against the cluster centers plus a scan over roughly n/k images in the matched cluster.
from sklearn.cluster import KMeans
import pickle, joblib
# Assume `embeddings` is a list of dicts {"filepath": ..., "embedding": tf.reshape(...)}
X = [item['embedding'] for item in embeddings]
kmeans = KMeans(n_clusters=500)
kmeans.fit(X)
preds = kmeans.predict(X)
for item, pred in zip(embeddings, preds):
    item['cluster'] = pred
joblib.dump(kmeans, 'kmeans.pkl')
with open('embeddings.pkl', 'wb') as f:
    pickle.dump(embeddings, f)
After clustering, the nearest cluster is located by comparing the target embedding to kmeans.cluster_centers_; a second linear scan inside that cluster then yields the final top‑k similar images.
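The query step described above can be sketched end to end. This is a self-contained toy version: random 8‑dimensional vectors stand in for the ResNet embeddings, and 10 clusters stand in for the article's 500; `search_in_cluster` and the filenames are illustrative names, not part of the original code:

```python
import numpy as np
from sklearn.cluster import KMeans

# toy stand-in for the stored ResNet embeddings: 200 random 8-dim vectors
rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 8))
embeddings = [{'filepath': f'img_{i}.jpg', 'embedding': v}
              for i, v in enumerate(vectors)]

# cluster once, during indexing (the article uses 500 clusters; 10 here)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(vectors)
for item, pred in zip(embeddings, kmeans.predict(vectors)):
    item['cluster'] = int(pred)

def search_in_cluster(target_embedding, top_k=6):
    """Nearest-cluster lookup followed by a linear scan inside that cluster."""
    # 1) locate the nearest center via kmeans.predict (compares to cluster_centers_)
    cluster_id = int(kmeans.predict(target_embedding[None])[0])
    # 2) linear scan only over the members of that cluster
    candidates = [item for item in embeddings if item['cluster'] == cluster_id]
    scored = [(item['filepath'],
               float(np.sum((target_embedding - item['embedding']) ** 2)))
              for item in candidates]
    # 3) return the top-k smallest distances
    return sorted(scored, key=lambda x: x[1])[:top_k]

# querying with a vector that is in the index should return it first, at distance 0
results = search_in_cluster(vectors[0])
```

Only one cluster's members are scored per query, which is where the speedup over the full linear scan comes from.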
5. Vector database (Milvus) integration – To avoid recomputing distances and to support large‑scale retrieval, the embeddings are stored in Milvus. The workflow: install Milvus via Docker, create a collection with a FLOAT_VECTOR field, optionally reduce dimensionality with PCA (e.g., flattening the 7×7×2048 ResNet output, 100,352 values, down to 2048), insert the embeddings, and run an L2‑based similarity search.
# Install Milvus
wget https://github.com/milvus-io/milvus/releases/download/v2.2.11/milvus-standalone-docker-compose.yml -O docker-compose.yml
sudo docker-compose up -d
# Connect and create collection
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
connections.connect(host='127.0.0.1', port='19530')
def create_milvus_collection(name, dim):
    if utility.has_collection(name):
        utility.drop_collection(name)
    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=True),
        FieldSchema(name='filepath', dtype=DataType.VARCHAR, description='filepath', max_length=512),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim),
    ]
    schema = CollectionSchema(fields=fields, description='reverse image search')
    collection = Collection(name=name, schema=schema)
    index_params = {'metric_type': 'L2', 'index_type': "IVF_FLAT", 'params': {"nlist": 2048}}
    collection.create_index(field_name="embedding", index_params=index_params)
    return collection
collection = create_milvus_collection('images', 2048)
# Insert embeddings (after optional PCA)
for item in embeddings:
    collection.insert([[item['filepath']], [item['embedding']]])
# Search
search_params = {"metric_type": "L2", "params": {"nprobe": 10}, "offset": 5}
results = collection.search(data=[target[0]], anns_field='embedding', param=search_params, output_fields=['filepath'], limit=10, consistency_level="Strong")
The final section demonstrates the end‑to‑end search: load the target image, extract its ResNet embedding, apply the same PCA transformation used during indexing, query Milvus, retrieve the file paths of the nearest images, load them, and stitch them together for visual inspection.
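The "same PCA transformation" requirement is worth making concrete: the reducer fitted at indexing time must be reused verbatim at query time, otherwise the query vector lands in a different space than the stored ones. A minimal sketch with synthetic data (dimensions shrunk from 100,352 → 2048 down to 64 → 8 so it runs quickly; all names are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# stand-ins for flattened ResNet embeddings (the article's real ones are 7*7*2048 = 100352 dims)
index_embeddings = rng.normal(size=(300, 64))

# fit PCA once, during indexing, and keep the fitted object (e.g., with joblib.dump)
pca = PCA(n_components=8).fit(index_embeddings)
reduced_index = pca.transform(index_embeddings)          # these vectors go into Milvus

# at query time: transform (never re-fit!) the query embedding with the same PCA
query_embedding = index_embeddings[0]
reduced_query = pca.transform(query_embedding[None])[0]  # this vector is passed to collection.search
```

Re-fitting PCA on the query alone, or on a different sample, would rotate the projection axes and silently break every distance comparison against the indexed vectors.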
Conclusion – By progressively replacing raw pixel comparison with deep feature embeddings, clustering, and a dedicated vector database, the reverse image search pipeline becomes both semantically richer and orders of magnitude faster, making it suitable for real‑world image retrieval tasks.