Information Security 10 min read

Approaches to Fuzzy Search on Encrypted Data

This article examines why encrypted data hinders fuzzy queries and compares three categories of solutions—naïve in‑memory decryption, conventional database‑level techniques, and advanced algorithmic methods—highlighting their trade‑offs in security, performance, and storage overhead.

Architecture Digest
Architecture Digest
Architecture Digest
Approaches to Fuzzy Search on Encrypted Data

We know encrypted data is not friendly to fuzzy queries; this article explores implementation ideas for performing fuzzy search on encrypted data.

How to Perform Fuzzy Search on Encrypted Data

Three categories of approaches are presented: "silly", "conventional", and "advanced".

Silly Approaches

Load all data into memory, decrypt it, then apply fuzzy matching algorithms.

Create a plaintext tag table that maps to ciphertext and query the tag table for fuzzy matches.

These methods work only for very small datasets; with larger volumes they quickly cause Out of memory errors. A table in the original article shows memory consumption ranging from 22 MB for 1 million records to over 2 GB for 100 million records.

Conventional Approaches

Two widely used techniques are described:

Implement decryption functions directly in the database and use queries such as decode(key) like '%partial%' .

Tokenize the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query with key like '%partial%' .

Method 1 is easy to implement but cannot leverage database indexes, leading to poor performance. Method 2 adds storage overhead (encrypted tokens increase data size) but enables index‑based searches. Common reversible algorithms like AES and DES are mentioned as suitable choices.

Advanced (Super‑God) Approaches

These solutions involve deeper algorithmic research, such as designing new reversible encryption schemes, Bloom‑filter‑based searchable encryption, and adaptations of the Hill cipher ( Hill ) and FMES. Papers on "encrypted fuzzy search using BloomFilter", "cloud‑based verifiable fuzzy query encryption", and "Lucene‑based encrypted search" are referenced. The Lucene‑style method resembles the conventional token‑encrypt approach but differs in storage backend (relational DB vs. Elasticsearch).

Conclusion

Silly methods are discouraged due to security and scalability issues. The conventional token‑encrypt method (method 2) offers a good balance of implementation cost, storage overhead, and query performance, and is recommended for most practical scenarios. Advanced algorithmic approaches may be considered when specialized expertise is available.

PerformanceDatabasefuzzy searchInformation SecurityEncrypted Data
Architecture Digest
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.