Information Security 19 min read

HG‑LDP: A Memory‑Efficient Framework for High‑Frequency Item Statistics under Local Differential Privacy

This article introduces the HG‑LDP framework, which combines local differential privacy with the HeavyGuardian data structure to enable accurate, privacy‑preserving high‑frequency item statistics on streaming data while using only limited memory, and evaluates four algorithmic variants (BGR, DSR, BDR, CNR) through extensive experiments on synthetic and real‑world datasets.

DataFunSummit
DataFunSummit
DataFunSummit
HG‑LDP: A Memory‑Efficient Framework for High‑Frequency Item Statistics under Local Differential Privacy

The paper addresses the challenge of performing high‑frequency item statistics on streaming data under strict memory constraints while preserving user privacy using local differential privacy (LDP). Existing LDP solutions either require storing the full data domain or suffer from high variance when the domain is large.

To meet the requirements of high processing speed, limited memory, privacy, and accuracy, the authors propose the HG‑LDP framework, which consists of three modules: a randomization module implementing LDP mechanisms, a storage module based on the HeavyGuardian data structure that maintains a fixed‑size list of top‑k items, and a response module that answers queries in real time.

Four algorithmic instantiations are presented:

BGR : directly applies the GRR mechanism to all items before feeding them to HeavyGuardian.

DSR : reduces the randomization domain to the current top‑k items plus a placeholder, decreasing variance for large domains.

BDR : splits the privacy budget between a high‑frequency check and a fallback randomization for cold items, adapting to list changes.

CNR : utilizes the light part of HeavyGuardian to store randomized cold items, allowing more efficient use of the privacy budget.

Extensive experiments on synthetic data and two real‑world datasets (e.g., Amazon product clicks, video streaming logs) compare the proposed methods against non‑private HeavyGuardian and baseline LDP mechanisms (GRR, HR). Results show that the new algorithms achieve higher accuracy than GRR, and in some cases surpass the non‑private baseline, while reducing memory consumption by orders of magnitude.

Further analyses examine the impact of algorithm parameters (privacy‑budget split, warm‑up accuracy) on accuracy and memory usage, demonstrating that proper tuning can balance bias introduced by HeavyGuardian and variance from LDP randomization.

The authors conclude with deployment recommendations: use BGR for small domains, DSR for large but stable domains, and BDR or CNR for large, highly dynamic domains, noting that CNR offers higher accuracy at the cost of slightly more memory.

streaming dataMemory EfficiencyHeavyGuardianHigh-Frequency ItemLocal Differential PrivacyPrivacy-Preserving Algorithms
DataFunSummit
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.