Approaches to Fuzzy Search on Encrypted Data: From Naïve to Advanced Methods
This article reviews three categories of techniques for performing fuzzy queries on encrypted data—naïve (memory‑based), conventional (database‑level encryption or tokenization), and advanced algorithmic solutions—detailing their implementation ideas, advantages, drawbacks, and practical recommendations.
Encryption is widely used to protect sensitive information such as passwords, phone numbers, and credit card numbers, but traditional fuzzy search (e.g., SQL LIKE) does not work directly on ciphertext. This article explores how to enable fuzzy queries on reversibly encrypted fields.
Three broad categories of solutions are presented:
Naïve approaches (dubbed "沙雕做法"): loading all data into memory for decryption and fuzzy matching, or maintaining a plaintext tag table alongside ciphertext. These are simple but scale poorly and compromise security.
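The in-memory variant of the naïve approach can be sketched as follows. All names here are illustrative, and `decrypt` is a Base64 placeholder standing in for the real cipher — the point is only to show why this scales poorly: every row must be pulled out of the database and decrypted before a single match can be evaluated.

```java
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;

public class NaiveFuzzySearch {
    // Placeholder "decryption": Base64-decode stands in for the real cipher (assumption).
    static String decrypt(String ciphertext) {
        return new String(Base64.getDecoder().decode(ciphertext));
    }

    // Load every ciphertext row, decrypt it in memory, then fuzzy-match the keyword.
    // Cost is O(rows) decryptions per query, and all plaintext transits the app's heap.
    static List<String> fuzzySearch(List<String> encryptedRows, String keyword) {
        return encryptedRows.stream()
                .map(NaiveFuzzySearch::decrypt)
                .filter(plain -> plain.contains(keyword))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                Base64.getEncoder().encodeToString("13800138000".getBytes()),
                Base64.getEncoder().encodeToString("13912345678".getBytes()));
        System.out.println(fuzzySearch(rows, "0013")); // matches the first row only
    }
}
```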
Conventional approaches ("常规做法"): implementing encryption/decryption functions in the database and modifying fuzzy conditions to decrypt before matching, or tokenizing the plaintext into fixed‑length substrings, encrypting each token, and storing them in auxiliary columns for indexed LIKE searches. These balance implementation effort, security, and performance, though they may increase storage due to ciphertext expansion.
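The tokenization scheme above can be sketched in Java. This is a minimal illustration, not the article's exact implementation: `encryptToken` uses Base64 as a stand-in for a real deterministic cipher, and the class/method names are assumptions. A fuzzy query for a full-width keyword then becomes a plain `LIKE '%<encrypted keyword>%'` against the auxiliary column.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class TokenIndex {
    // Slide a fixed-width window over the plaintext ("ningyu1" -> ning, ingy, ngyu, gyu1).
    static List<String> tokenize(String plain, int width) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + width <= plain.length(); i++) {
            tokens.add(plain.substring(i, i + width));
        }
        return tokens;
    }

    // Placeholder token encryption (assumption): a real system would use a
    // deterministic cipher here so identical tokens always yield identical ciphertext.
    static String encryptToken(String token) {
        return Base64.getEncoder().encodeToString(token.getBytes());
    }

    // Value stored in the auxiliary column: each token encrypted, then concatenated.
    // This is where the storage overhead from ciphertext expansion comes from.
    static String indexColumn(String plain, int width) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokenize(plain, width)) sb.append(encryptToken(t));
        return sb.toString();
    }

    // Query side: SQL would express this as WHERE aux_col LIKE '%<encrypted keyword>%'.
    // Production schemes align the comparison to the fixed token width to rule out
    // accidental matches spanning two adjacent encrypted tokens.
    static boolean matches(String auxColumn, String keyword) {
        return auxColumn.contains(encryptToken(keyword));
    }
}
```

Usage: `matches(indexColumn("ningyu1", 4), "ngyu")` returns true, while a keyword whose encrypted form appears in no token returns false.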
Advanced approaches ("超神做法"): designing new algorithms (e.g., order‑preserving encryption, Bloom‑filter based schemes, FMES, Hill cipher variants) that allow fuzzy matching directly on ciphertext without excessive length growth. These require deep cryptographic expertise and custom integration.
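To give a flavor of the Bloom-filter direction, here is a rough candidate-filtering sketch. It is a toy, not any cited paper's scheme: the two hash functions, the 256-bit filter size, and the 2-gram width are all assumptions chosen for brevity (real schemes derive bit positions from keyed hashes so the filter leaks nothing about the plaintext). The filter is stored alongside the ciphertext; a row whose filter passes every gram of the keyword is a candidate, to be confirmed by decryption.

```java
import java.util.BitSet;

public class BloomIndex {
    private final BitSet bits = new BitSet(256);

    // Two cheap hash functions (assumption; real schemes use keyed HMACs per gram).
    private int h1(String s) { return Math.floorMod(s.hashCode(), 256); }
    private int h2(String s) { return Math.floorMod(s.hashCode() * 31 + 7, 256); }

    void add(String gram)           { bits.set(h1(gram)); bits.set(h2(gram)); }
    boolean mightContain(String g)  { return bits.get(h1(g)) && bits.get(h2(g)); }

    // Build a filter over all 2-grams of the plaintext, stored next to the ciphertext.
    static BloomIndex of(String plain) {
        BloomIndex idx = new BloomIndex();
        for (int i = 0; i + 2 <= plain.length(); i++) idx.add(plain.substring(i, i + 2));
        return idx;
    }

    // A row is a candidate match only if every 2-gram of the keyword might be present.
    // Bloom filters admit false positives but never false negatives, so candidates
    // still need a final check after decryption.
    boolean candidate(String keyword) {
        for (int i = 0; i + 2 <= keyword.length(); i++) {
            if (!mightContain(keyword.substring(i, i + 2))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(BloomIndex.of("13800138000").candidate("0013")); // true
    }
}
```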
Examples illustrate the storage impact: under DES, the 11-byte plaintext 13800138000 becomes the 24-byte Base64 ciphertext HE9T75xNx6c5yLmS5l4r6Q==, a 2.18× increase. Tokenization typically uses a window of four English characters (or two Chinese characters), e.g., the string ningyu1 is split into ning, ingy, ngyu, gyu1.
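The 11-byte → 24-byte expansion is easy to reproduce with the JDK's DES cipher. The key "8bytekey" below is an arbitrary illustrative value, so the ciphertext bytes will differ from the article's example, but the arithmetic is the same: PKCS5 padding rounds 11 bytes up to 16, and Base64 turns 16 bytes into 24 characters.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DesExpansion {
    // DES/ECB/PKCS5Padding, Base64-encoded; the 8-byte key is illustrative only.
    static String encrypt(String plain, byte[] keyBytes) throws Exception {
        SecretKey key = SecretKeyFactory.getInstance("DES")
                .generateSecret(new DESKeySpec(keyBytes));
        Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(ct);
    }

    public static void main(String[] args) throws Exception {
        String plain = "13800138000";                      // 11 bytes
        String ct = encrypt(plain, "8bytekey".getBytes()); // padded to 16 bytes
        System.out.println(ct.length());                   // 24
    }
}
```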
The article also lists public references from major e‑commerce platforms (Taobao, Alibaba, Pinduoduo, JD) and academic papers describing similar techniques, including Bloom‑filter based fuzzy search and Lucene‑based encrypted indexing.
Conclusion: Naïve methods are discouraged for production use; conventional token-based schemes are recommended for most scenarios due to their simplicity and acceptable performance; advanced algorithmic solutions are suitable when specialized security or performance requirements justify the additional research effort.
Architecture Digest
Focused on Java backend development, covering application architecture at top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.