
Approaches to Fuzzy Query on Encrypted Data

This article examines why encrypted data is unfriendly to fuzzy search, categorizes three implementation strategies—naïve, conventional, and advanced—analyzes their advantages and drawbacks, and recommends practical solutions for secure yet searchable encrypted fields.

Sohu Tech Products

To protect data, developers often encrypt sensitive fields such as passwords, phone numbers, addresses, and credit-card information. Encrypted values, however, cannot be matched directly by a fuzzy query, because a fragment of the plaintext does not appear as a fragment of the ciphertext. This article explores how to support fuzzy queries on reversibly encrypted data.

How to Perform Fuzzy Queries on Encrypted Data

The author classifies existing solutions into three groups:

Naïve approaches (dubbed "silly" methods)

Conventional approaches (widely used, balancing performance and storage)

Advanced approaches (algorithm‑level, high‑end solutions)

Naïve Approaches

1. Load all records into memory, decrypt them, and perform fuzzy matching in application code.

2. Create a plaintext mapping table (a "tag" table) and query the tags to locate the encrypted records.

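These methods work only for very small datasets. Loading and decrypting every row consumes memory in proportion to the table size, and a plaintext mapping table keeps searchable plaintext next to the ciphertext, which undermines the encryption. Ciphertext is also larger than plaintext: encrypting the 11-byte phone number 13800138000 with DES and Base64-encoding the result yields the 24-character string HE9T75xNx6c5yLmS5l4r6Q==, so holding many such rows quickly exhausts memory.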
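The size quoted above can be checked with a little arithmetic. The sketch below assumes DES with its 8-byte block, PKCS#5 padding, and Base64 encoding of the raw ciphertext. The article does not state the mode or padding, so those are assumptions, and the helper names are hypothetical.

```python
import base64

DES_BLOCK_BYTES = 8  # DES operates on 8-byte blocks


def padded_len(n_bytes: int, block: int = DES_BLOCK_BYTES) -> int:
    """Length after PKCS#5 padding, which always adds 1..block bytes."""
    return (n_bytes // block + 1) * block


def base64_len(n_bytes: int) -> int:
    """Length of the padded Base64 text for n_bytes of binary data."""
    return 4 * ((n_bytes + 2) // 3)


PLAINTEXT = "13800138000"
SAMPLE = "HE9T75xNx6c5yLmS5l4r6Q=="  # ciphertext string quoted in the article

raw_bytes = padded_len(len(PLAINTEXT.encode("utf-8")))  # 11 -> 16
encoded_chars = base64_len(raw_bytes)                   # 16 -> 24
expansion = encoded_chars / len(PLAINTEXT)              # 24 / 11

print(raw_bytes, encoded_chars, round(expansion, 2))
print(len(SAMPLE), len(base64.b64decode(SAMPLE)))
```

Under these assumptions the 11-byte plaintext becomes 16 bytes of raw ciphertext and 24 Base64 characters, which agrees with the sample string and with a growth factor of about 2.18.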

Conventional Approaches

1. Implement the encryption and decryption functions inside the database and rewrite the fuzzy-search condition to decrypt on the fly, e.g., decode(key) like '%partial%'. This is easy to adopt, but the condition cannot use an index, so every row is decrypted and scanned, and the algorithms available in the database may not match the ones the application uses.

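2. Tokenize the plaintext into fixed-length substrings, encrypt each token, and store the results in an auxiliary column. A query encrypts the search term the same way and then runs key like '%partial%' against that column, where partial is the encrypted term; the term must therefore be at least one token long. This method increases storage (ciphertext expands, e.g., an 11-byte plaintext becomes a 24-byte ciphertext, a 2.18× growth, and every token is stored), but the condition runs on a stored column, so it can benefit from database indexing in a way that on-the-fly decryption cannot.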
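A minimal sketch of the tokenization idea follows. It is not the article's implementation: the token length of 4, the comma-separated layout of the auxiliary column, and all function names are assumptions. toy_encrypt is a deliberately insecure deterministic stand-in (XOR with a key-derived stream), used only so the example runs with the standard library; a real system would substitute the deterministic cipher it already uses for the field.

```python
import hashlib

TOKEN_LEN = 4            # hypothetical token length
KEY = b"demo-key-only"   # hypothetical key, for illustration only


def toy_encrypt(key: bytes, text: str) -> str:
    """Insecure deterministic stand-in for a real cipher; returns hex."""
    data = text.encode("utf-8")
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).hex()


def tokens(text: str, n: int = TOKEN_LEN) -> list[str]:
    """All overlapping substrings of length n (a sliding window)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_token_column(key: bytes, plaintext: str) -> str:
    """Value for the auxiliary column: encrypted tokens joined by commas."""
    return ",".join(toy_encrypt(key, t) for t in tokens(plaintext))


def matches(key: bytes, token_column: str, term: str) -> bool:
    """True if every encrypted token of the search term is stored."""
    if len(term) < TOKEN_LEN:
        raise ValueError("search term is shorter than one token")
    stored = set(token_column.split(","))
    return all(toy_encrypt(key, t) in stored for t in tokens(term))


column = build_token_column(KEY, "13800138000")
print(matches(KEY, column, "0013"))  # "0013" is a substring of the number
print(matches(KEY, column, "9999"))  # "9999" is not
```

In a SQL database the membership test would correspond to a LIKE condition on the auxiliary column. For terms longer than one token the check only confirms that each token occurs somewhere, not that the tokens are adjacent, so candidate rows would still need to be decrypted and verified.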

Both conventional methods are suitable when security requirements are moderate and query performance is not critical.
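For comparison, the first conventional approach can be imitated with SQLite, whose Python binding lets the application register a custom SQL function. This is only a sketch under assumptions: the users table, the phone_enc column, and the decrypt function name are made up for illustration, and the cipher is an insecure toy stand-in rather than a real algorithm.

```python
import hashlib
import sqlite3

KEY = b"demo-key-only"  # hypothetical key, for illustration only


def _xor(data: bytes) -> bytes:
    """Insecure toy transform: XOR with a key-derived stream."""
    stream = hashlib.shake_256(KEY).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def toy_encrypt(text: str) -> str:
    return _xor(text.encode("utf-8")).hex()


def toy_decrypt(cipher_hex: str) -> str:
    return _xor(bytes.fromhex(cipher_hex)).decode("utf-8")


conn = sqlite3.connect(":memory:")
# Expose the decryption routine to SQL under the name "decrypt".
conn.create_function("decrypt", 1, toy_decrypt)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT)")
conn.execute("CREATE INDEX idx_phone_enc ON users (phone_enc)")
conn.executemany(
    "INSERT INTO users (id, phone_enc) VALUES (?, ?)",
    [(1, toy_encrypt("13800138000")), (2, toy_encrypt("13912345678"))],
)

# Decrypt on the fly inside the fuzzy-search condition.
sql = "SELECT id FROM users WHERE decrypt(phone_enc) LIKE ?"
rows = conn.execute(sql, ("%0013%",)).fetchall()

# The last column of each plan row describes the access path.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, ("%0013%",))]
print(rows)
print(plan)
```

The query finds the matching row, but the plan reports a scan even though phone_enc is indexed, because the condition is evaluated on the decrypted value of every row.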

Advanced Approaches

These involve designing new algorithms that preserve order and enable fuzzy matching without excessive ciphertext growth, such as order‑preserving encryption, Bloom‑filter‑based schemes, or specialized cryptographic constructions like FMES. They typically require deep cryptographic expertise and are referenced in academic papers and technical blogs.
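To give a flavor of the Bloom-filter direction, here is a generic sketch and not any of the published schemes: each record gets a small bit array built from keyed hashes (HMAC) of its character bigrams, and a query tests whether all bits for the search term's bigrams are set. The filter size, hash count, bigram length, and every name below are assumptions chosen for illustration.

```python
import hashlib
import hmac

M_BITS = 256             # hypothetical filter size in bits
K_HASHES = 3             # hypothetical number of hash positions per n-gram
NGRAM = 2                # hypothetical n-gram length (bigrams)
KEY = b"demo-key-only"   # hypothetical key, for illustration only


def ngrams(text: str, n: int = NGRAM) -> set[str]:
    """Distinct overlapping substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def positions(key: bytes, gram: str) -> list[int]:
    """K_HASHES keyed bit positions for one n-gram."""
    out = []
    for i in range(K_HASHES):
        digest = hmac.new(key, f"{i}:{gram}".encode("utf-8"), hashlib.sha256).digest()
        out.append(int.from_bytes(digest[:8], "big") % M_BITS)
    return out


def build_filter(key: bytes, plaintext: str) -> int:
    """Bloom filter for one record, held as an integer bit mask."""
    bits = 0
    for gram in ngrams(plaintext):
        for pos in positions(key, gram):
            bits |= 1 << pos
    return bits


def may_contain(key: bytes, bits: int, term: str) -> bool:
    """False means definitely no match; True means a possible match."""
    if len(term) < NGRAM:
        raise ValueError("search term is shorter than one n-gram")
    return all((bits >> pos) & 1 for g in ngrams(term) for pos in positions(key, g))


record_filter = build_filter(KEY, "13800138000")
print(may_contain(KEY, record_filter, "0013"))
```

A Bloom filter never misses a true match but can report false positives, so rows flagged by may_contain would still be decrypted and checked. The stored filter is a fixed size regardless of how many tokens the value has, which is the attraction over storing every encrypted token.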

Examples of external references include:

Alibaba, Taobao, Pinduoduo, and JD.com encrypted field search specifications.

Research on Bloom‑filter‑enhanced encrypted fuzzy search.

Lucene‑based encrypted search implementations.

Conclusion

The naïve methods are discouraged except for tiny datasets. Conventional approach 2 (tokenization + encryption) offers a good trade‑off between security, storage cost, and query performance and is strongly recommended. Advanced algorithmic solutions are worthwhile only when specialized expertise is available.
