How to Perform Fuzzy Search on Encrypted Data

This article examines various techniques for enabling fuzzy queries on encrypted fields, comparing naive memory‑based methods, conventional token‑based approaches that leverage database indexes, and advanced cryptographic schemes, and recommends practical solutions for real‑world applications.

Top Architect

In the previous article we discussed data security and the difficulty of fuzzy searching encrypted data; this article explores implementation ideas for fuzzy queries on encrypted fields.

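Sensitive data such as passwords, phone numbers, addresses, and credit‑card information is stored in protected form, either reversibly (encryption) or irreversibly (one‑way hashing, as is usual for passwords). Exact‑match queries are straightforward as long as the transformation is deterministic, because the search term can be transformed the same way and compared with the stored value; fuzzy search requires special handling.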

The author groups the approaches into three categories:

“Silly” methods: loading all data into memory to decrypt and match it, which is feasible only for very small datasets and consumes excessive memory, or maintaining a plaintext tag table, which leaves the sensitive values readable and so undermines the encryption.

Conventional methods: calling the database's decryption functions in the WHERE clause, or splitting the plaintext into fixed‑length tokens, encrypting each token, storing the encrypted tokens in auxiliary columns, and then running LIKE queries against those columns with the encrypted search term.

Advanced (“god‑level”) methods: designing new algorithms such as order‑preserving encryption, Bloom‑filter based schemes, or other research‑grade techniques that allow fuzzy matching without revealing plaintext.
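
As a rough illustration of the Bloom‑filter idea mentioned above, the sketch below builds one small filter per value from keyed hashes of its 4‑character substrings. It is not the article's scheme or any published one: the key, the filter size M_BITS, the hash count K_HASHES, and every function name are assumptions made for this example.

```python
import hashlib
import hmac

KEY = b"demo-key-not-for-production"  # hypothetical secret key
M_BITS = 256    # assumed filter size in bits
K_HASHES = 3    # assumed number of bit positions per n-gram
NGRAM = 4       # substring length used for indexing


def ngrams(text, n=NGRAM):
    """Return all overlapping substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def positions(gram):
    """Derive K_HASHES bit positions for one n-gram from a keyed HMAC."""
    out = []
    for i in range(K_HASHES):
        mac = hmac.new(KEY, bytes([i]) + gram.encode("utf-8"), hashlib.sha256)
        out.append(int.from_bytes(mac.digest()[:4], "big") % M_BITS)
    return out


def build_filter(text):
    """Build the filter for one value; this is what would be stored."""
    bits = 0
    for gram in ngrams(text):
        for pos in positions(gram):
            bits |= 1 << pos
    return bits


def might_contain(filter_bits, term):
    """True if every n-gram of the search term may be present in the filter."""
    if len(term) < NGRAM:
        raise ValueError(f"search term must have at least {NGRAM} characters")
    return all(
        (filter_bits >> pos) & 1
        for gram in ngrams(term)
        for pos in positions(gram)
    )


stored = build_filter("13800138000")
print(might_contain(stored, "0013"))  # True: a Bloom filter has no false negatives
```

A filter like this can report false positives, and it does not check that the substrings are contiguous, so in such a design the candidate rows would still have to be decrypted and re‑checked by the application.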

Examples are given using DES encryption, where the 11‑character plaintext 13800138000 becomes the 24‑character ciphertext HE9T75xNx6c5yLmS5l4r6Q==, illustrating the storage overhead (24 bytes vs 11 bytes).
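
The size difference can be checked with a little arithmetic. The sketch below assumes PKCS#5 padding and Base64 output; the article names only DES, so both are assumptions, and the helper names are hypothetical.

```python
import base64

DES_BLOCK_BYTES = 8  # DES works on 64-bit blocks


def padded_len(n, block=DES_BLOCK_BYTES):
    """Length after PKCS#5 padding, which always adds 1..block bytes."""
    return (n // block + 1) * block


def base64_len(n):
    """Number of characters in the padded Base64 encoding of n bytes."""
    return 4 * ((n + 2) // 3)


plaintext = "13800138000"
ciphertext_b64 = "HE9T75xNx6c5yLmS5l4r6Q=="  # value quoted in the article

raw_len = len(base64.b64decode(ciphertext_b64))  # ciphertext size in raw bytes

print(len(plaintext))                  # 11 plaintext characters
print(padded_len(len(plaintext)))      # 16 bytes after padding to two DES blocks
print(raw_len)                         # 16 bytes in the decoded ciphertext
print(base64_len(raw_len))             # 24 characters once Base64-encoded
```

Under these assumptions the 11 bytes grow to 16 through block padding and then to 24 through text encoding, which matches the length of the quoted ciphertext.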

The conventional token‑based method is recommended as a balanced solution: it needs additional storage for the encrypted tokens but can leverage database indexes for efficient fuzzy search. It works best when the search term is at least four English characters or two Chinese characters long, so that the term covers at least one full token.
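
A minimal sketch of the token‑based approach, using SQLite from the Python standard library. Because the standard library has no DES or AES, a truncated keyed HMAC stands in for the article's per‑token encryption; that substitution, the key, the table layout, the "|" separator, and all function names are assumptions for illustration only.

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key-not-for-production"  # hypothetical secret key
WINDOW = 4  # token length: four English characters


def protect(token):
    """Stand-in for encrypting one token: deterministic keyed HMAC, hex-encoded."""
    mac = hmac.new(KEY, token.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def tokens(text, window=WINDOW):
    """Return all overlapping substrings of length `window`."""
    return [text[i:i + window] for i in range(len(text) - window + 1)]


def index_value(text):
    """Protected tokens joined by '|', with a leading and trailing separator."""
    return "|" + "|".join(protect(t) for t in tokens(text)) + "|"


def add_user(conn, user_id, phone):
    # A real table would also hold the reversibly encrypted phone number.
    conn.execute(
        "INSERT INTO users (id, phone_tokens) VALUES (?, ?)",
        (user_id, index_value(phone)),
    )


def search(conn, term):
    """Return ids whose phone number contains `term` (at least WINDOW characters)."""
    if len(term) < WINDOW:
        raise ValueError(f"search term must have at least {WINDOW} characters")
    pattern = "%" + index_value(term) + "%"
    rows = conn.execute(
        "SELECT id FROM users WHERE phone_tokens LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_tokens TEXT)")
add_user(conn, 1, "13800138000")
add_user(conn, 2, "13912345678")

print(search(conn, "0013"))     # ids of rows whose number contains 0013
print(search(conn, "8001380"))  # longer terms are matched as consecutive tokens
```

One design point to keep in mind: a LIKE pattern with a leading wildcard generally cannot use an ordinary B‑tree index, so a variant that stores one protected token per row in a side table and matches it by equality may be easier to index.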

Finally, the article concludes that “silly” approaches should be avoided, conventional tokenization is the most practical for most scenarios, and advanced cryptographic schemes are worth exploring when high security and performance are both critical.
