Information Security 10 min read

How to Perform Fuzzy Queries on Encrypted Data

This article examines the challenges of fuzzy searching encrypted data and compares three categories of solutions—naïve, conventional, and advanced—detailing their implementation ideas, performance trade‑offs, storage costs, and security implications for real‑world applications.

Java Captain

Jul 29, 2024

How to Perform Fuzzy Queries on Encrypted Data

We often encrypt sensitive fields such as passwords, phone numbers, addresses, and bank card numbers for security, but this makes fuzzy searching difficult; this article explores practical ways to enable fuzzy queries on reversible encrypted data.

Naïve Approaches

Load all encrypted records into memory, decrypt them, and perform fuzzy matching in the application.

Create a plaintext "tag" table that maps encrypted values to their clear‑text equivalents and query the tag table.

Naïve Approach 1

Decrypting everything in memory works only for very small datasets; for larger volumes it quickly leads to out‑of‑memory errors. For example, encrypting a 11‑digit phone number with DES yields a 24‑byte ciphertext, so 1 million records would require roughly 23 GB of memory.

Naïve Approach 2

Maintaining a separate plaintext mapping table defeats the purpose of encryption and is therefore insecure and impractical.

Conventional Approaches

Implement encryption/decryption functions in the database and modify fuzzy‑search conditions to decrypt on‑the‑fly, e.g., decode(key) like '%partial%'.

Tokenise the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query using key like '%partial%'.

Conventional Approach 1

Decrypting in the query allows a low‑cost implementation but prevents the use of indexes and may suffer from algorithm mismatches between the application and the database. It works best with standard algorithms such as AES or DES when performance requirements are modest.

Conventional Approach 2

Split the field into fixed‑length substrings (e.g., 4 English characters or 2 Chinese characters), encrypt each substring, and store them in an extra column. Queries then use key like "%partial%" on the encrypted tokens. This method increases storage (ciphertext length grows, e.g., DES expands an 11‑byte phone number to 24 bytes, a 2.18× increase) but enables index usage.

Note that the fuzzy token length must be at least 4 English characters or 2 Chinese characters; shorter tokens cause excessive token explosion and higher storage costs.

Advanced ("Super‑God") Approaches

These solutions involve designing new algorithms that preserve order and allow direct fuzzy matching on ciphertext without excessive length growth. Examples include Hill cipher‑based schemes, FMES, Bloom‑filter‑enhanced encrypted search, and Lucene‑based encrypted indexing. Such methods typically require deep cryptographic expertise and custom implementation.

Summary

Naïve methods are simple but insecure or unscalable; conventional methods—especially the token‑based approach—offer a practical balance of security, performance, and storage cost; advanced cryptographic techniques provide the best theoretical guarantees but demand specialized knowledge.

Overall, for most projects the token‑based conventional approach (Approach 2) is the most recommended solution.

Last updated: 2024-04-23 09:48:15 Original link: https://ningyu1.github.io/20201230/encrypted-data-fuzzy-query.html

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

algorithm query optimization fuzzy-search encryption Information Security

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.