How ALM‑MTA Improves Multi‑Touch Attribution with Front‑Door Identification and Adversarial Modeling

ALM‑MTA combines front‑door causal adjustment with an adversarially trained proxy for the unobserved mediator to address hidden confounding in multi‑touch attribution. According to an ICLR 2026 paper, it delivers more reliable uplift estimates, raising Kuaishou's DAU by 0.6% and AUC by 11% relative to SOTA baselines.

Kuaishou Tech

May 18, 2026

Background

In creator‑centric platforms, the “Consumption Drives Production” (CDP) pattern means that consuming inspiring content increases the likelihood that a user becomes a creator and uploads new videos. Identifying which consumption touchpoints causally trigger uploads is essential for resource allocation and incentive design. Traditional post‑hoc attribution cannot fully remove complex unobserved confounders.

Problem Formalization

Multi‑Touch Attribution (MTA) is defined as assigning a contribution score to each touchpoint in a user’s consumption sequence. Rule‑based or semantic‑path methods either achieve high coverage with low causal validity or guarantee precision with negligible coverage, and cannot answer the counterfactual question “how much would the upload probability drop if a specific touchpoint were removed?”.
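The counterfactual question above can be made concrete with a minimal sketch: score each touchpoint by how much a model's predicted upload probability drops when that touchpoint is deleted from the sequence. The names `delete_point_uplift`, `toy_model`, and `TOY_WEIGHTS` are hypothetical and the additive toy model is only a stand‑in for a trained outcome model. The difference is a causal effect only if the underlying model is itself a valid causal estimator.

```python
from typing import Callable, List, Sequence


def delete_point_uplift(
    touchpoints: Sequence[str],
    predict_upload_prob: Callable[[Sequence[str]], float],
) -> List[float]:
    """Score each touchpoint by the drop in predicted upload
    probability when that single touchpoint is removed."""
    full = predict_upload_prob(touchpoints)
    scores = []
    for i in range(len(touchpoints)):
        reduced = list(touchpoints[:i]) + list(touchpoints[i + 1:])
        scores.append(full - predict_upload_prob(reduced))
    return scores


# Hypothetical stand-in for a trained outcome model: every touchpoint
# type adds a fixed amount to the upload probability, capped at 1.0.
TOY_WEIGHTS = {"tutorial_video": 0.20, "challenge_video": 0.10, "ad": 0.0}


def toy_model(seq: Sequence[str]) -> float:
    return min(1.0, 0.05 + sum(TOY_WEIGHTS.get(t, 0.0) for t in seq))


scores = delete_point_uplift(
    ["ad", "tutorial_video", "challenge_video"], toy_model
)
```

In this toy setting the tutorial video receives the largest score and the ad receives none, because removing the ad leaves the predicted probability unchanged.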

Modeling Approach

Front‑Door Criterion

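The front‑door criterion identifies the effect of the treatment sequence T on the outcome Y even when unobserved confounders W affect both, provided that a mediator carries the entire effect of T on Y and is influenced by W only through T. ALM‑MTA applies this criterion with a proxy mediator M′, so that, under these assumptions, each touchpoint’s causal effect can be estimated without the bias introduced by W.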
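To show why front‑door adjustment helps, here is a small worked example on a toy model with binary variables. The structural probabilities are invented for illustration and are not from the paper; the sketch also uses the true mediator M, whereas ALM‑MTA works with a learned proxy. It applies the standard front‑door formula P(Y=1 | do(T=t)) = Σ_m P(m | t) Σ_t′ P(Y=1 | m, t′) P(t′) to the observational distribution only, and compares the result with the naive difference in conditional means.

```python
from itertools import product

# Hypothetical toy model (assumed, not from the paper):
# W -> T and W -> Y with W unobserved, plus the chain T -> M -> Y.
P_W1 = 0.5


def p_t1_given_w(w):
    return 0.8 if w else 0.2


def p_m1_given_t(t):
    return 0.9 if t else 0.1


def p_y1_given_mw(m, w):
    return 0.1 + 0.5 * m + 0.3 * w


def bern(p1, value):
    """Probability of `value` for a Bernoulli variable with P(1) = p1."""
    return p1 if value else 1.0 - p1


# Observational joint P(t, m, y); W is summed out because it is unobserved.
joint = {}
for w, t, m, y in product((0, 1), repeat=4):
    p = (bern(P_W1, w) * bern(p_t1_given_w(w), t)
         * bern(p_m1_given_t(t), m) * bern(p_y1_given_mw(m, w), y))
    joint[(t, m, y)] = joint.get((t, m, y), 0.0) + p


def prob(**fixed):
    """Marginal probability of the fixed assignments, e.g. prob(t=1, m=0)."""
    total = 0.0
    for (t, m, y), p in joint.items():
        values = {"t": t, "m": m, "y": y}
        if all(values[k] == v for k, v in fixed.items()):
            total += p
    return total


def front_door(t):
    """P(Y=1 | do(T=t)) computed from the observational joint only."""
    result = 0.0
    for m in (0, 1):
        p_m_given_t = prob(t=t, m=m) / prob(t=t)
        inner = sum(
            prob(t=tp, m=m, y=1) / prob(t=tp, m=m) * prob(t=tp)
            for tp in (0, 1)
        )
        result += p_m_given_t * inner
    return result


naive_effect = prob(t=1, y=1) / prob(t=1) - prob(t=0, y=1) / prob(t=0)
front_door_effect = front_door(1) - front_door(0)

# Ground truth from the structural equations, which uses the hidden W.
true_effect = sum(
    (bern(p_m1_given_t(1), m) - bern(p_m1_given_t(0), m))
    * bern(P_W1, w) * p_y1_given_mw(m, w)
    for m, w in product((0, 1), repeat=2)
)
```

In this toy model the naive contrast overstates the effect because W raises both T and Y, while the front‑door estimate matches the ground truth without ever observing W.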

Adversarial Proxy for the Mediator

Because the true mediator M is not directly observable, an adversarially trained proxy Y′ is introduced. A discriminator tries to predict Y from Y′, while the mediator branch is penalized to suppress this predictability, preventing outcome leakage and stabilizing training.
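The two competing objectives can be sketched as follows. This is an illustrative formulation under stated assumptions, not the paper's exact loss: the discriminator minimizes binary cross‑entropy when predicting Y from the proxy, and the mediator branch minimizes its own task loss minus a weighted copy of that discriminator loss. The one‑parameter logistic discriminator, the weight `lam`, and all function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def discriminator_probs(proxy: Sequence[float], weight: float, bias: float):
    """Hypothetical logistic discriminator mapping the proxy to P(Y=1)."""
    return [sigmoid(weight * v + bias) for v in proxy]


def mediator_objective(task_loss: float, disc_loss: float, lam: float = 1.0) -> float:
    """Assumed mediator loss: minimise the task loss while maximising
    the discriminator's loss (sign-flipped adversarial term)."""
    return task_loss - lam * disc_loss


y = [1, 0, 1, 0]
leaky_proxy = [2.0, -2.0, 2.0, -2.0]  # encodes the outcome almost directly
clean_proxy = [0.0, 0.0, 0.0, 0.0]    # carries no outcome information

disc_leaky = bce(y, discriminator_probs(leaky_proxy, 1.0, 0.0))
disc_clean = bce(y, discriminator_probs(clean_proxy, 1.0, 0.0))

# With the same task loss, the leaky proxy is the more expensive one
# for the mediator branch, which pushes training away from leakage.
loss_leaky = mediator_objective(0.5, disc_leaky)
loss_clean = mediator_objective(0.5, disc_clean)
```

The sign‑flipped term is one common way to express such a penalty (gradient‑reversal layers implement the same idea); other penalties that suppress predictability would fit the description equally well.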

Contrastive Learning for Overlap

In the high‑cardinality treatment space of Kuaishou, many touchpoints are sparse, which violates the overlap (positivity) assumption. Contrastive learning aligns the representations of similar treatment contexts so that sparse touchpoints share statistical strength with their neighbors, helping to keep conditional treatment probabilities away from zero across the dataset.
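As an illustration of how a contrastive objective pulls similar treatment contexts together, the sketch below computes an InfoNCE‑style loss for one anchor embedding. InfoNCE is a widely used contrastive loss and is swapped in here as an assumption; the summary does not state which loss the paper uses or how positives are chosen. The embeddings and the `temperature` value are made up.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: negative log-softmax of the positive's
    similarity among the positive and all negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]

# Similar treatment context embedded close to the anchor: low loss.
aligned_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])

# Similar context embedded far from the anchor: high loss.
misaligned_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Minimizing this loss moves the embedding of a sparse touchpoint toward those of comparable contexts, which is the alignment effect described above.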

Key Challenges

Absence of ground‑truth attribution labels in observational data.

Multiple partially unobservable confounders (user‑level and system‑level).

Extremely large candidate touchpoint space, threatening scalability and stability.

Experimental Evaluation

Offline Results

ALM‑MTA improves daily active users (DAU) by 0.6%, AUC by 11% relative to the previous SOTA, and AUUC by 2.27%.

Rule‑based methods achieve 26% coverage with ~40% accuracy; strict point‑wise rules achieve 100% accuracy but only 2.7% coverage.

Baseline models (LR, deepMTA, DML/DESCN, causalMTA) show trade‑offs: LR performs worst under observed confounding; deepMTA improves discrimination but lags in grouped AUUC; DML/DESCN suffer from imbalance between AUC and grouped AUUC.

ALM‑MTA attains the best scores on AUC, log‑loss, grouped AUUC and average AUUC, thanks to adversarial mediation and contrastive adaptation.
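For readers unfamiliar with AUUC (area under the uplift curve), the sketch below computes one common variant: users are ranked by predicted uplift, and at each cutoff k the difference between treated and control outcome rates among the top k is scaled by k. Several AUUC definitions exist and the summary does not say which one the paper uses, so treat this as an assumed definition. The eight‑user dataset is invented.

```python
from typing import List, Sequence


def uplift_curve(
    scores: Sequence[float], treated: Sequence[int], outcomes: Sequence[int]
) -> List[float]:
    """Cumulative uplift at every cutoff k after ranking by score (descending)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_t = n_c = 0
    y_t = y_c = 0.0
    curve = []
    for k, i in enumerate(order, start=1):
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            n_c += 1
            y_c += outcomes[i]
        # A group with no members yet contributes a rate of zero.
        rate_t = y_t / n_t if n_t else 0.0
        rate_c = y_c / n_c if n_c else 0.0
        curve.append((rate_t - rate_c) * k)
    return curve


def auuc(scores, treated, outcomes) -> float:
    """Area under the uplift curve, taken as the mean curve height."""
    curve = uplift_curve(scores, treated, outcomes)
    return sum(curve) / len(curve)


# Invented data: the first four users respond to treatment, the last four do not.
treated = [1, 0, 1, 0, 1, 0, 1, 0]
outcomes = [1, 0, 1, 0, 0, 0, 0, 1]
good_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
bad_scores = list(reversed(good_scores))

good_auuc = auuc(good_scores, treated, outcomes)
bad_auuc = auuc(bad_scores, treated, outcomes)
```

A model that ranks responsive users first gets a higher AUUC than one that ranks them last, even though both curves end at the same overall uplift.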

Ablation Study

Using back‑door adjustment alone leaves residual system‑level confounding.

Using the raw proxy Y′ causes outcome leakage and unstable training.

Adversarial learning eliminates leakage and stabilizes convergence.

Contrastive learning mitigates sparsity and improves AUC in the high‑cardinality treatment regime.

Stability Analysis

Across random seeds (10, 100, 1000), ALM‑MTA’s uplift distribution remains highly consistent, whereas DML and DESCN exhibit large variance and causalMTA shows moderate variance reduction.

Online Deployment

In production on Kuaishou, ALM‑MTA was evaluated on supply‑side revenue, scalability, coverage, and attribution accuracy. All metrics showed significant improvements, confirming the offline findings.

Conclusion

ALM‑MTA formulates MTA as a delete‑point counterfactual uplift problem, leverages the front‑door criterion to bypass unobserved confounding, employs adversarial learning to obtain a non‑leaky mediator, and uses contrastive learning to preserve overlap in a massive treatment space. The method delivers state‑of‑the‑art performance both offline and online.

Paper: https://openreview.net/forum?id=3r68a6GOpg

Code: https://github.com/logwhistle/ALM-MTA
