Understanding Image Similarity: Image Hashing and Feature-Based Methods
This article explains why simple MD5 checks cannot assess image similarity and introduces two major approaches—image hashing and image feature extraction—detailing their algorithms, practical performance, and how to compare images efficiently using Hamming distance and indexing techniques.
Determining image similarity cannot rely on simple MD5 checks: MD5 only detects exact byte-for-byte matches, so any scaling, watermarking, or added noise produces a completely different digest even when the images look nearly identical.
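This avalanche behavior is easy to demonstrate. The snippet below is a minimal sketch, using a synthetic byte buffer rather than a real image file, that flips a single byte and compares the resulting MD5 digests:

```python
import hashlib

# Two "images" (raw byte buffers) that differ in a single byte,
# e.g. one pixel slightly brightened by noise.
original = bytes([10, 20, 30, 40]) * 16
modified = bytearray(original)
modified[0] += 1  # one-byte change

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(bytes(modified)).hexdigest()

print(h1 == h2)  # False: the digests carry no similarity signal
```

Even though the two buffers are 99% identical, the digests differ in roughly half their bits, which is exactly why MD5 cannot rank "how similar" two images are.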
The article introduces image hashing as a way to obtain a compact representation of an image through a series of transformations. Using the Average Hash (aHash) algorithm as an example, the process reduces the image to 8×8 pixels, converts it to grayscale, computes the average gray value, binarizes each pixel against that average, and assembles the resulting bits into a 64-bit hash. Other hashes supported by OpenCV's img_hash module, such as PHash, MarrHildrethHash, RadialVarianceHash, BlockMeanHash, and ColorMomentHash, trade off robustness against computational cost. A performance chart (included in the article as an image) shows how the various hashes resist transformations such as watermarking, noise, rotation, scaling, JPEG compression, Gaussian blur, and contrast changes.
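The aHash steps above can be sketched in a few lines. This is a pure-Python illustration that assumes the resize-to-8×8 and grayscale-conversion steps have already been done (a real pipeline would use OpenCV or Pillow for those); the `average_hash` helper name is ours, not from the article:

```python
def average_hash(pixels):
    """Compute a 64-bit aHash from an 8x8 grayscale image.

    `pixels` is a flat list of 64 gray values (0-255), assumed to be
    the result of the resize + grayscale steps described above.
    """
    avg = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # Binarize each pixel against the average gray value.
        bits = (bits << 1) | (1 if p >= avg else 0)
    return bits

# A synthetic 8x8 image: left half dark, right half bright.
img = [(255 if x >= 4 else 0) for y in range(8) for x in range(8)]
print(f"{average_hash(img):016x}")  # → 0f0f0f0f0f0f0f0f
```

The hash encodes the coarse light/dark layout of the image, which is why it survives scaling and mild noise: those transformations barely move pixels relative to the global average.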
Similarity between two images can be measured by the Hamming distance of their hash values, i.e. the number of differing bits. For large-scale datasets, the article suggests building an inverted index (e.g., in Elasticsearch) and splitting the 64-bit hash into eight 8-bit blocks: by the pigeonhole principle, two hashes within Hamming distance 7 must agree on at least one block, so the index only needs to compute full distances for candidates that share a block, drastically reducing the number of comparisons.
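Both ideas fit in a short sketch. The helper names below (`hamming`, `blocks`) are illustrative; in practice the `(position, value)` pairs would become keys in the inverted index:

```python
def hamming(h1, h2):
    """Number of differing bits between two 64-bit hashes."""
    return bin(h1 ^ h2).count("1")

def blocks(h):
    """Split a 64-bit hash into eight (position, 8-bit value) index keys.

    If two hashes differ in fewer than 8 bits, the pigeonhole principle
    guarantees at least one of the eight blocks is identical, so an
    inverted index keyed on these pairs retrieves every candidate
    within Hamming distance 7 without a full scan.
    """
    return [(i, (h >> (8 * i)) & 0xFF) for i in range(8)]

a = 0x0F0F0F0F0F0F0F0F
b = 0x0F0F0F0F0F0F0F1F  # one bit flipped
print(hamming(a, b))  # → 1
shared = set(blocks(a)) & set(blocks(b))
print(len(shared))    # → 7 (seven of eight blocks still match)
```

Only hashes that collide on at least one block key need an exact Hamming-distance check, which turns a linear scan into a handful of index lookups.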
Beyond global similarity, the article discusses local similarity via image features. Features correspond to high-frequency components of an image, regions where intensity changes abruptly, and corner detection identifies such points. Basic corner detectors struggle with scale changes; SIFT (Scale-Invariant Feature Transform) addresses this, followed by SURF (a faster variant) and the patent-free ORB algorithm, whose binary descriptors are 256 bits (32 bytes). Each keypoint has a descriptor, a multi-dimensional vector used for matching: binary descriptors such as ORB's are compared by Hamming distance, and two images are judged similar when the number of sufficiently close matches exceeds a threshold.
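The matching step can be sketched as a brute-force nearest-neighbour search over binary descriptors. The toy 16-bit descriptors, the `match_count` helper, and the threshold values below are illustrative assumptions, not taken from the article (real ORB descriptors are 256-bit and are usually matched with OpenCV's `BFMatcher`):

```python
def match_count(desc_a, desc_b, max_distance=40):
    """Count keypoints in image A whose nearest descriptor in image B
    lies within `max_distance` Hamming distance (an assumed threshold)."""
    matches = 0
    for d in desc_a:
        best = min(bin(d ^ e).count("1") for e in desc_b)
        if best <= max_distance:
            matches += 1
    return matches

# Toy 16-bit "descriptors"; real ORB descriptors are 256-bit.
img1 = [0b1010101010101010, 0b1111000011110000]
img2 = [0b1010101010101011, 0b0000111100001111]
print(match_count(img1, img2, max_distance=2))  # → 1
```

If the match count clears an application-specific threshold, the two images are declared locally similar, even when their global hashes disagree (e.g., one image is a cropped region of the other).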
In conclusion, exact pixel‑level matches represent identical images, while image hashing captures overall similarity and feature‑based methods capture local similarity, together providing a comprehensive approach to image similarity assessment.
System Architect Go