How to Perform Fuzzy Queries on Encrypted Data
This article examines the challenges of fuzzy searching encrypted data and compares three categories of solutions—naïve, conventional, and advanced—detailing their implementation ideas, performance trade‑offs, storage costs, and security implications for real‑world applications.
How to Perform Fuzzy Queries on Encrypted Data
We often encrypt sensitive fields such as passwords, phone numbers, addresses, and bank card numbers for security, but this makes fuzzy searching difficult; this article explores practical ways to enable fuzzy queries on reversible encrypted data.
Naïve Approaches
Load all encrypted records into memory, decrypt them, and perform fuzzy matching in the application.
Create a plaintext "tag" table that maps encrypted values to their clear‑text equivalents and query the tag table.
Naïve Approach 1
Decrypting everything in memory works only for very small datasets; for larger volumes it quickly leads to out‑of‑memory errors. For example, encrypting a 11‑digit phone number with DES yields a 24‑byte ciphertext, so 1 million records would require roughly 23 GB of memory.
Naïve Approach 2
Maintaining a separate plaintext mapping table defeats the purpose of encryption and is therefore insecure and impractical.
Conventional Approaches
Implement encryption/decryption functions in the database and modify fuzzy‑search conditions to decrypt on‑the‑fly, e.g., decode(key) like '%partial%' .
Tokenise the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query using key like '%partial%' .
Conventional Approach 1
Decrypting in the query allows a low‑cost implementation but prevents the use of indexes and may suffer from algorithm mismatches between the application and the database. It works best with standard algorithms such as AES or DES when performance requirements are modest.
Conventional Approach 2
Split the field into fixed‑length substrings (e.g., 4 English characters or 2 Chinese characters), encrypt each substring, and store them in an extra column. Queries then use key like "%partial%" on the encrypted tokens. This method increases storage (ciphertext length grows, e.g., DES expands an 11‑byte phone number to 24 bytes, a 2.18× increase) but enables index usage.
Note that the fuzzy token length must be at least 4 English characters or 2 Chinese characters; shorter tokens cause excessive token explosion and higher storage costs.
Advanced ("Super‑God") Approaches
These solutions involve designing new algorithms that preserve order and allow direct fuzzy matching on ciphertext without excessive length growth. Examples include Hill cipher‑based schemes, FMES, Bloom‑filter‑enhanced encrypted search, and Lucene‑based encrypted indexing. Such methods typically require deep cryptographic expertise and custom implementation.
Summary
Naïve methods are simple but insecure or unscalable; conventional methods—especially the token‑based approach—offer a practical balance of security, performance, and storage cost; advanced cryptographic techniques provide the best theoretical guarantees but demand specialized knowledge.
Overall, for most projects the token‑based conventional approach (Approach 2) is the most recommended solution.
Last updated: 2024-04-23 09:48:15 Original link: https://ningyu1.github.io/20201230/encrypted-data-fuzzy-query.html
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.