Exploring Model Dynamics for Accumulative Poisoning Detection
The paper, a joint effort by Alimama and the HKBU TMLR Group, shows that monitoring model dynamics—specifically a newly defined memorization‑discrepancy metric—can reveal hidden accumulative poisoning attacks in online data streams, and introduces a discrepancy‑aware correction algorithm that consistently outperforms existing defenses across benchmark datasets.
We present a collaborative work between the Alimama advertising algorithm team and the HKBU TMLR Group that investigates how to train models under noisy signals in complex advertising scenarios. The study was published at ICML 2023.
Abstract
Recent research shows that poisoning attacks pose a serious threat to machine learning. Unlike traditional one‑shot (offline) poisoning, accumulative poisoning attacks target real‑time data streams: they hide malicious samples during a first (accumulative) phase, then trigger their effect in a second phase, drastically degrading model performance.
Existing defenses based on static data or adversarial training struggle with this setting. We explore whether model dynamics can reveal useful signals for detecting such hidden attacks.
Our main contributions:
- We are the first to study poisoning detection from the perspective of model dynamics.
- We propose a new information metric, memorization discrepancy, to distinguish otherwise invisible poisoned samples.
- We develop a defense learning algorithm, Discrepancy‑aware Sample Correction (DSC), based on this metric.
We share the ICML 2023 paper “Exploring Model Dynamics for Accumulative Poisoning Discovery”.
1. Rewinding Historical Models
We pre‑train a model to a burn‑in phase, then simulate online data streams, monitoring performance changes. Because accumulative poisoned samples are optimized to be indistinguishable, data‑level differences are minimal, making detection difficult without prior knowledge.
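The "rewinding" idea above only requires that snapshots of past model states remain available during online training. A minimal sketch of such a rolling snapshot buffer (all names here are hypothetical, not from the paper):

```python
from collections import deque

# Minimal sketch: keep a rolling buffer of historical model snapshots
# during simulated online training, so the current model can later be
# compared against the model from k steps earlier.

class SnapshotBuffer:
    """Stores copies of model parameters from the last `capacity` steps."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def push(self, params):
        # Store a copy so later in-place updates don't mutate the snapshot.
        self._buffer.append(dict(params))

    def rewind(self, k):
        """Return the snapshot from k steps ago (k=1 is the most recent)."""
        if k < 1 or k > len(self._buffer):
            raise IndexError("no snapshot that far back")
        return self._buffer[-k]

# Usage: after the burn-in phase, push a snapshot at each online step.
buf = SnapshotBuffer(capacity=5)
for step in range(8):
    params = {"w": float(step)}   # stand-in for real model parameters
    buf.push(params)

print(buf.rewind(1)["w"])  # most recent snapshot -> 7.0
print(buf.rewind(5)["w"])  # oldest retained snapshot -> 3.0
```

Bounding the buffer size keeps memory constant on an unbounded stream while still allowing comparisons against any of the last few historical models.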
1.1 From Data‑Level to Model‑Dynamic Viewpoint
Historical model outputs provide multi‑dimensional information that can help differentiate clean and poisoned samples.
2. New Metric: Memorization Discrepancy
We define memorization discrepancy as the KL‑divergence between the current model’s output and that of a historical model on the same batch, capturing dynamic changes caused by poisoning.
2.2 Memorization Discrepancy
The metric quantifies the information gap between current and past model predictions, highlighting samples whose behavior deviates unusually due to poisoning.
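The definition above can be sketched in a few lines of numpy: a per‑sample KL divergence between the current model's predictive distribution and a historical model's distribution on the same input. The function names and the exact direction of the KL term are illustrative assumptions; the paper's formulation may differ in detail.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def memorization_discrepancy(logits_now, logits_past, eps=1e-12):
    """KL( p_now || p_past ), one value per sample in the batch."""
    p = softmax(np.asarray(logits_now, dtype=float))
    q = softmax(np.asarray(logits_past, dtype=float))
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

# A sample the two snapshots agree on has near-zero discrepancy; a sample
# whose prediction shifted sharply between snapshots scores high.
agree = memorization_discrepancy([[2.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]])
shift = memorization_discrepancy([[4.0, 0.0, 0.0]], [[0.0, 4.0, 0.0]])
print(float(agree[0]))  # ~0.0
print(float(shift[0]))  # clearly larger
```

Because poisoned samples are crafted to steer the model away from its past behavior, they tend to produce unusually large values of this quantity, which is what makes it usable as a detection signal.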
3. Discrepancy‑aware Sample Correction (DSC)
Using the memorization discrepancy, we correct suspicious samples during online training, limiting them to a safe range while preserving clean data performance.
3.2 Overall Algorithm Flow
For each incoming batch, we compute its memorization discrepancy against a historical model. Samples below a threshold are treated as safe; suspicious samples receive an adversarial‑style perturbation that pulls them back into a safe region before they are used for training.
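The loop above can be illustrated on a toy 1‑D model. Everything here is a simplified stand‑in, not the paper's procedure: the "models" are single logistic units, the perturbation is a signed finite‑difference step that shrinks the discrepancy, and the threshold is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_kl(p, q, eps=1e-12):
    # KL divergence between two Bernoulli distributions.
    return (p * np.log((p + eps) / (q + eps))
            + (1 - p) * np.log((1 - p + eps) / (1 - q + eps)))

def discrepancy(x, w_now, w_past):
    # Toy 1-D "models": the predictive probability is sigmoid(w * x).
    return bernoulli_kl(sigmoid(w_now * x), sigmoid(w_past * x))

def correct_sample(x, w_now, w_past, tau=0.05, step=0.05, max_iters=100, h=1e-4):
    """Perturb x until its discrepancy falls below the threshold tau."""
    for _ in range(max_iters):
        if discrepancy(x, w_now, w_past) <= tau:
            break
        # Signed finite-difference step that decreases the discrepancy.
        grad = (discrepancy(x + h, w_now, w_past)
                - discrepancy(x - h, w_now, w_past)) / (2 * h)
        x = x - step * np.sign(grad)
    return x

w_now, w_past = 3.0, -3.0        # two snapshots that disagree strongly
x_suspicious = 2.0
x_safe = correct_sample(x_suspicious, w_now, w_past)
print(discrepancy(x_suspicious, w_now, w_past) > 0.05)  # True: flagged
print(discrepancy(x_safe, w_now, w_past) <= 0.05)       # True: corrected
```

The key design choice is that flagged samples are corrected rather than discarded, so clean samples that are wrongly flagged still contribute (slightly perturbed) training signal instead of being lost.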
4. Experiments and Discussion
We evaluate on three benchmark classification datasets, comparing standard training, gradient clipping, adversarial training, and our DSC method. Results show consistent improvements.
Ablation studies further analyze the impact of different components.
5. Conclusion and Outlook
We demonstrate that model dynamics provide valuable cues for detecting accumulative poisoning in online streams. Future work includes handling out‑of‑distribution samples, noise, and distribution drift.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.