Exploring Model Dynamics for Accumulative Poisoning Detection
The paper, a joint effort by Alimama and the HKBU TMLR Group, shows that monitoring model dynamics—specifically a newly defined memorization‑discrepancy metric—can reveal hidden accumulative poisoning attacks in online data streams, and introduces a discrepancy‑aware correction algorithm that consistently outperforms existing defenses across benchmark datasets.
We present a collaborative work between the Alimama advertising algorithm team and the HKBU TMLR Group that investigates how to train models under noisy signals in complex advertising scenarios. The study was published at ICML 2023.
Abstract
Recent research shows that poisoning attacks pose a serious threat to machine learning. Unlike traditional one‑shot (offline) poisoning, accumulative poisoning attacks target real‑time data streams: they hide malicious samples during a first (accumulative) phase, then trigger their effect in a second phase, drastically degrading model performance.
Existing defenses based on static data or adversarial training struggle with this setting. We explore whether model dynamics can reveal useful signals for detecting such hidden attacks.
Our main contributions:
- We are the first to study poisoning detection from the perspective of model dynamics.
- We propose a new information metric, memorization discrepancy, to distinguish otherwise invisible poisoned samples.
- We develop a defense learning algorithm, Discrepancy‑aware Sample Correction (DSC), based on this metric.
We share the ICML 2023 paper “Exploring Model Dynamics for Accumulative Poisoning Discovery”.
1. Rewinding Historical Models
We pre‑train a model to a burn‑in phase, then simulate online data streams, monitoring performance changes. Because accumulative poisoned samples are optimized to be indistinguishable, data‑level differences are minimal, making detection difficult without prior knowledge.
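The "rewinding" idea above only requires that snapshots of past model states remain available during online training. A minimal sketch of such a rolling snapshot buffer (all names here are hypothetical, not from the paper):

```python
from collections import deque

# Minimal sketch: keep a rolling buffer of historical model snapshots
# during simulated online training, so the current model can later be
# compared against the model from k steps earlier.

class SnapshotBuffer:
    """Stores copies of model parameters from the last `capacity` steps."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def push(self, params):
        # Store a copy so later in-place updates don't mutate the snapshot.
        self._buffer.append(dict(params))

    def rewind(self, k):
        """Return the snapshot from k steps ago (k=1 is the most recent)."""
        if k < 1 or k > len(self._buffer):
            raise IndexError("no snapshot that far back")
        return self._buffer[-k]

# Usage: after the burn-in phase, push a snapshot at each online step.
buf = SnapshotBuffer(capacity=5)
for step in range(8):
    params = {"w": float(step)}   # stand-in for real model parameters
    buf.push(params)

print(buf.rewind(1)["w"])  # most recent snapshot -> 7.0
print(buf.rewind(5)["w"])  # oldest retained snapshot -> 3.0
```

Bounding the buffer size keeps memory constant on an unbounded stream while still allowing comparisons against any of the last few historical models.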
1.1 From Data‑Level to Model‑Dynamic Viewpoint
Historical model outputs provide multi‑dimensional information that can help differentiate clean and poisoned samples.
2. New Metric: Memorization Discrepancy
We define memorization discrepancy as the KL‑divergence between the current model’s output and that of a historical model on the same batch, capturing dynamic changes caused by poisoning.
2.2 Memorization Discrepancy
The metric quantifies the information gap between current and past model predictions, highlighting samples whose behavior deviates unusually due to poisoning.
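The definition above can be sketched in a few lines of numpy: a per‑sample KL divergence between the current model's predictive distribution and a historical model's distribution on the same input. The function names and the exact direction of the KL term are illustrative assumptions; the paper's formulation may differ in detail.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def memorization_discrepancy(logits_now, logits_past, eps=1e-12):
    """KL( p_now || p_past ), one value per sample in the batch."""
    p = softmax(np.asarray(logits_now, dtype=float))
    q = softmax(np.asarray(logits_past, dtype=float))
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

# A sample the two snapshots agree on has near-zero discrepancy; a sample
# whose prediction shifted sharply between snapshots scores high.
agree = memorization_discrepancy([[2.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]])
shift = memorization_discrepancy([[4.0, 0.0, 0.0]], [[0.0, 4.0, 0.0]])
print(float(agree[0]))  # ~0.0
print(float(shift[0]))  # clearly larger
```

Because poisoned samples are crafted to steer the model away from its past behavior, they tend to produce unusually large values of this quantity, which is what makes it usable as a detection signal.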
3. Discrepancy‑aware Sample Correction (DSC)
Using the memorization discrepancy, we correct suspicious samples during online training, limiting them to a safe range while preserving clean data performance.
3.2 Overall Algorithm Flow
For each incoming batch, we compute its memorization discrepancy against a historical model. Samples below a threshold are treated as safe; suspicious samples receive an adversarial‑style perturbation that pulls them back into a safe region before they are used for training.
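The loop above can be illustrated on a toy 1‑D model. Everything here is a simplified stand‑in, not the paper's procedure: the "models" are single logistic units, the perturbation is a signed finite‑difference step that shrinks the discrepancy, and the threshold is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_kl(p, q, eps=1e-12):
    # KL divergence between two Bernoulli distributions.
    return (p * np.log((p + eps) / (q + eps))
            + (1 - p) * np.log((1 - p + eps) / (1 - q + eps)))

def discrepancy(x, w_now, w_past):
    # Toy 1-D "models": the predictive probability is sigmoid(w * x).
    return bernoulli_kl(sigmoid(w_now * x), sigmoid(w_past * x))

def correct_sample(x, w_now, w_past, tau=0.05, step=0.05, max_iters=100, h=1e-4):
    """Perturb x until its discrepancy falls below the threshold tau."""
    for _ in range(max_iters):
        if discrepancy(x, w_now, w_past) <= tau:
            break
        # Signed finite-difference step that decreases the discrepancy.
        grad = (discrepancy(x + h, w_now, w_past)
                - discrepancy(x - h, w_now, w_past)) / (2 * h)
        x = x - step * np.sign(grad)
    return x

w_now, w_past = 3.0, -3.0        # two snapshots that disagree strongly
x_suspicious = 2.0
x_safe = correct_sample(x_suspicious, w_now, w_past)
print(discrepancy(x_suspicious, w_now, w_past) > 0.05)  # True: flagged
print(discrepancy(x_safe, w_now, w_past) <= 0.05)       # True: corrected
```

The key design choice is that flagged samples are corrected rather than discarded, so clean samples that are wrongly flagged still contribute (slightly perturbed) training signal instead of being lost.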
4. Experiments and Discussion
We evaluate on three benchmark classification datasets, comparing standard training, gradient clipping, adversarial training, and our DSC method. Results show consistent improvements.
Ablation studies further analyze the impact of different components.
5. Conclusion and Outlook
We demonstrate that model dynamics provide valuable cues for detecting accumulative poisoning in online streams. Future work includes handling out‑of‑distribution samples, noise, and distribution drift.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.