
How to Perform Fuzzy Queries on Encrypted Data: Approaches and Trade‑offs

This article examines why encrypted data is unfriendly to fuzzy search, categorises three implementation strategies—naïve, conventional, and advanced—analyses their advantages and disadvantages, and provides practical guidance and reference links for securely enabling fuzzy queries on encrypted fields.

Architecture Digest

Aug 6, 2024


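Encrypted data is not naturally compatible with fuzzy search: a sound cipher does not map a fragment of the plaintext to a fragment of the ciphertext, so a condition such as LIKE '%partial%' cannot be evaluated against the stored value directly. This article explores the problem and presents three categories of solutions.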

Naïve ("Silly") Approaches

1. Load all encrypted records into memory, decrypt them, and perform fuzzy matching in application code.

2. Create a plaintext mapping table (tag table) for the ciphertext and query the tags.
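The first naïve approach can be sketched as follows. This is only an illustration: `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins for whatever cipher the application really uses (they XOR the data with a hash-derived keystream and are not secure), and the key and sample rows are made up.

```python
import base64
import hashlib

KEY = b"demo-key"  # hypothetical key, for illustration only


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: str) -> str:
    """Stand-in for the real cipher: XOR with a keystream, then Base64. NOT secure."""
    data = plaintext.encode("utf-8")
    stream = _keystream(key, len(data))
    return base64.b64encode(bytes(a ^ b for a, b in zip(data, stream))).decode("ascii")


def toy_decrypt(key: bytes, ciphertext: str) -> str:
    """Inverse of toy_encrypt."""
    data = base64.b64decode(ciphertext)
    stream = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).decode("utf-8")


def naive_fuzzy_search(key, encrypted_rows, fragment):
    """Decrypt every row in memory and keep the ids whose plaintext contains `fragment`."""
    return [row_id for row_id, ct in encrypted_rows if fragment in toy_decrypt(key, ct)]


# Every record has to be loaded and decrypted for every single query.
rows = [(1, toy_encrypt(KEY, "13800138000")), (2, toy_encrypt(KEY, "13912345678"))]
matches = naive_fuzzy_search(KEY, rows, "0013")
```

The cost of each query grows with the size of the whole table, because nothing can be filtered before decryption.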

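These methods work only for very small datasets. For example, encrypting the phone number 13800138000 with DES yields HE9T75xNx6c5yLmS5l4r6Q==, a 24-character Base64 string that is more than twice as long as the 11-digit plaintext. Loading millions of such records, together with their decrypted copies, can consume hundreds of megabytes to several gigabytes of RAM and lead to out-of-memory failures. The mapping table in the second method also keeps plaintext alongside the ciphertext, which undermines the point of encrypting the field.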
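The 24-byte figure can be checked with a little arithmetic. The sketch below assumes PKCS#5-style padding (which the article does not state) and uses a hypothetical row count of 10 million; it counts only the raw ciphertext strings, not the per-object overhead a language runtime adds on top.

```python
import base64
import math

DES_BLOCK = 8  # DES operates on 8-byte blocks


def base64_ciphertext_len(plaintext_len: int, block: int = DES_BLOCK) -> int:
    """Length of the Base64 text for a block-cipher ciphertext (assumes PKCS#5 padding)."""
    # Padding always adds between 1 and `block` bytes, rounding up to a full block.
    padded = (plaintext_len // block + 1) * block
    # Base64 emits 4 characters for every 3 input bytes, rounded up.
    return 4 * math.ceil(padded / 3)


sample = "HE9T75xNx6c5yLmS5l4r6Q=="          # ciphertext quoted in the article
raw_len = len(base64.b64decode(sample))      # ciphertext bytes before Base64
per_row = base64_ciphertext_len(len("13800138000"))

rows = 10_000_000                            # hypothetical table size
total_mb = rows * per_row / 1_000_000        # raw ciphertext only, in megabytes
```

Under these assumptions the 11-byte phone number is padded to two DES blocks (16 bytes), which Base64 expands to 24 characters, so 10 million rows need roughly 240 MB for the ciphertext alone.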

Conventional Approaches

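1. Implement encryption/decryption functions in the database and modify fuzzy-search conditions to decrypt before matching, e.g., decode(key) LIKE '%partial%'.

2. Tokenise the plaintext (for example into fixed-length fragments), encrypt each token, and store the resulting ciphertexts in an auxiliary column. At query time, encrypt the search fragment the same way and match it against that column, e.g., key LIKE '%encrypt(partial)%'.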
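The first option, decrypting inside the database before matching, might look like the sketch below. It uses SQLite only because it ships with Python; the `users` table, the `phone_enc` column, and the toy XOR cipher behind `encode`/`decode` are all assumptions made for illustration, not a real database encryption feature.

```python
import base64
import hashlib
import sqlite3

KEY = b"demo-key"  # hypothetical key
_STREAM = hashlib.sha256(KEY).digest()


def _xor(data: bytes) -> bytes:
    # Toy cipher: XOR with a repeating keystream (its own inverse). NOT secure.
    return bytes(b ^ _STREAM[i % len(_STREAM)] for i, b in enumerate(data))


def encode(plaintext: str) -> str:
    return base64.b64encode(_xor(plaintext.encode("utf-8"))).decode("ascii")


def decode(ciphertext: str) -> str:
    return _xor(base64.b64decode(ciphertext)).decode("utf-8")


conn = sqlite3.connect(":memory:")
# Register the decryption routine so SQL statements can call decode(...).
conn.create_function("decode", 1, decode)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT)")
conn.executemany(
    "INSERT INTO users (id, phone_enc) VALUES (?, ?)",
    [(1, encode("13800138000")), (2, encode("13912345678"))],
)


def search(fragment: str) -> list:
    """Fuzzy search by decrypting each row inside the database engine."""
    rows = conn.execute(
        "SELECT id FROM users WHERE decode(phone_enc) LIKE ? ORDER BY id",
        (f"%{fragment}%",),
    )
    return [row[0] for row in rows]


result = search("0013")
```

The database still calls `decode` once per row for every query, so the condition amounts to a full table scan.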

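The first method is easy to adopt, but it cannot use an index on the encrypted column, because every row must be decrypted for each query. It also requires the application and the database to implement exactly the same algorithm; otherwise values written by one side will not decrypt correctly on the other. The second method adds storage overhead, since the ciphertext of each token is larger than the plaintext it covers, and it can only answer searches whose fragment is at least as long as one token. In exchange, the database matches on ciphertext directly without decrypting anything and, depending on the database and index type, can make use of an index. It is generally recommended for most scenarios.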
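A minimal sketch of the second method follows. Several details are assumptions rather than anything the article specifies: the 4-character sliding window, the `users` table and `phone_tokens` column, and above all the use of a truncated HMAC-SHA256 as the per-token transform. A keyed hash is swapped in here for "encrypting each token" because it is deterministic and available in the standard library; it is not reversible, so the real ciphertext of the field would still live in its own column.

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key"  # hypothetical key
WINDOW = 4         # assumed token length


def token(fragment: str) -> str:
    """Deterministic keyed token: HMAC-SHA256 truncated to 16 hex characters."""
    return hmac.new(KEY, fragment.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def windows(text: str) -> list:
    """All overlapping WINDOW-length fragments of `text`."""
    return [text[i:i + WINDOW] for i in range(len(text) - WINDOW + 1)]


def token_column(plaintext: str) -> str:
    # Comma-separated, so a 16-character match can never straddle two tokens.
    return ",".join(token(w) for w in windows(plaintext))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_tokens TEXT)")
conn.executemany(
    "INSERT INTO users (id, phone_tokens) VALUES (?, ?)",
    [(1, token_column("13800138000")), (2, token_column("13912345678"))],
)


def search(fragment: str) -> list:
    """Match rows whose token column contains every window of `fragment`."""
    grams = windows(fragment)
    if not grams:
        raise ValueError(f"search fragment must have at least {WINDOW} characters")
    where = " AND ".join("phone_tokens LIKE ?" for _ in grams)
    params = [f"%{token(g)}%" for g in grams]
    rows = conn.execute(f"SELECT id FROM users WHERE {where} ORDER BY id", params)
    return [row[0] for row in rows]


result = search("0013")
```

An 11-digit number produces eight tokens of 16 characters each, which illustrates the storage overhead. Fragments longer than the window are checked window by window without regard to order, so a real system would likely re-verify candidates after decryption. Deterministic tokens also reveal which rows share a fragment, which is part of the security trade-off.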

Advanced ("Super‑God") Approaches

These solutions involve algorithmic research, such as designing new reversible encryption schemes that preserve order or using specialized structures like Bloom filters. References include Hill‑cipher based fuzzy encryption (FMES), Bloom‑filter‑enhanced searchable encryption, and Lucene‑based encrypted search.

These schemes can offer the best balance of security and performance, but they require deep cryptographic expertise and a custom implementation.
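To give a feel for the Bloom-filter idea, here is a minimal sketch of a keyed Bloom filter built over the fragments of one field. It is not the scheme from any of the referenced works: the class name, the 256-bit filter size, the three hash positions, and the 4-character window are all assumptions chosen for illustration.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical key
WINDOW = 4         # assumed fragment length


class KeyedBloomFilter:
    """Bloom filter whose bit positions depend on a secret key."""

    def __init__(self, key: bytes, size_bits: int = 256, num_hashes: int = 3):
        self.key = key
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes positions from HMAC-SHA256 with a per-hash prefix byte.
        for i in range(self.num_hashes):
            digest = hmac.new(
                self.key, bytes([i]) + item.encode("utf-8"), hashlib.sha256
            ).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_index(plaintext: str) -> KeyedBloomFilter:
    """Insert every WINDOW-length fragment of the plaintext into a fresh filter."""
    bloom = KeyedBloomFilter(KEY)
    for i in range(len(plaintext) - WINDOW + 1):
        bloom.add(plaintext[i:i + WINDOW])
    return bloom


index = build_index("13800138000")
hit = index.might_contain("0013")
```

Only the filter bits would be stored next to the ciphertext. A lookup for a fragment that is present always succeeds, while a fragment that is absent is occasionally reported as a possible match, so candidates still need to be confirmed after decryption.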

Practical Recommendations

For most projects, the second conventional method (tokenisation + encrypted auxiliary column) provides a good trade‑off between security, storage cost, and query performance. If the organization has dedicated cryptography talent, exploring advanced schemes may be worthwhile.

Reference Links

Taobao encrypted field search: https://open.taobao.com/docV3.htm?docId=106213&docType=1

Alibaba encrypted field search: https://jaq-doc.alibaba.com/docs/doc.htm?treeId=1&articleId=106213&docType=1

Pinduoduo encrypted field search: https://open.pinduoduo.com/application/document/browse?idStr=3407B605226E77F2

JD encrypted field search: https://jos.jd.com/commondoc?listId=345

Database fuzzy‑search encryption methods: https://www.jiamisoft.com/blog/6542-zifushujumohupipeijiamifangfa.html

Bloom‑filter based searchable encryption: http://kzyjc.cnjournals.com/html/2019/1/20190112.htm

Lucene‑based encrypted fuzzy search: https://www.cnblogs.com/arthurqin/p/6307153.html

In summary, avoid naïve approaches, prefer the token‑based conventional method for most use‑cases, and consider advanced algorithmic solutions only when you have the necessary expertise.
