How ALM‑MTA Improves Multi‑Touch Attribution with Front‑Door Identification and Adversarial Modeling
ALM‑MTA combines front‑door causal adjustment with an adversarially trained proxy for the unobserved mediator to reduce hidden confounding in multi‑touch attribution. According to the ICLR 2026 paper, the resulting uplift estimates raised Kuaishou's DAU by 0.6% and improved AUC by 11% over SOTA baselines.
Background
In creator‑centric platforms, the “Consumption Drives Production” (CDP) pattern means that user consumption of inspiring content increases the likelihood of becoming a creator and uploading new videos. Accurate causal attribution of which consumption touchpoints trigger uploads is essential for resource allocation and incentive design. Traditional post‑hoc attribution cannot fully remove complex unobserved confounders.
Problem Formalization
Multi‑Touch Attribution (MTA) is defined as assigning a contribution score to each touchpoint in a user’s consumption sequence. Rule‑based or semantic‑path methods either achieve high coverage with low causal validity or guarantee precision with negligible coverage, and cannot answer the counterfactual question “how much would the upload probability drop if a specific touchpoint were removed?”.
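The counterfactual question above can be made concrete with a small sketch. It assumes a fitted model exposes an upload-probability function over a touchpoint sequence; `toy_upload_prob`, its noisy-OR form, and the weights are hypothetical stand-ins for illustration, not the paper's model.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical per-touchpoint weights; a real system would use a learned model.
TOY_WEIGHTS: Dict[str, float] = {"a": 0.5, "b": 0.2, "c": 0.0}


def toy_upload_prob(seq: Sequence[str]) -> float:
    """Noisy-OR stand-in: probability that at least one touchpoint triggers an upload."""
    p_none = 1.0
    for touchpoint in seq:
        p_none *= 1.0 - TOY_WEIGHTS.get(touchpoint, 0.0)
    return 1.0 - p_none


def delete_point_uplift(
    seq: Sequence[str], prob_fn: Callable[[Sequence[str]], float]
) -> List[float]:
    """For each position i, return P(upload | seq) - P(upload | seq without i)."""
    full = prob_fn(seq)
    uplifts = []
    for i in range(len(seq)):
        without_i = list(seq[:i]) + list(seq[i + 1:])
        uplifts.append(full - prob_fn(without_i))
    return uplifts


scores = delete_point_uplift(["a", "b", "c"], toy_upload_prob)
```

In this toy setting the touchpoint with zero weight receives zero contribution, while removing the strongest touchpoint causes the largest drop in predicted upload probability.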
Modeling Approach
Front‑Door Criterion
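Given the treatment sequence T and an observed proxy mediator M′ through which T affects the outcome Y, the front‑door criterion identifies each touchpoint’s causal effect even when unobserved confounders W influence both T and Y. The effect is estimated in two stages, T → M′ and M′ → Y, neither of which requires observing W. The resulting estimate is unbiased provided the mediator carries the full effect of T on Y and is not itself directly affected by W.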
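A minimal numeric sketch of the standard front-door formula, P(y | do(t)) = Σ_m P(m | t) Σ_t′ P(y | m, t′) P(t′), may help here. It assumes binary T, M, and Y with a hidden binary confounder W; every probability below is an invented toy value, and the mediator is treated as fully observed, which is a simplification of the proxy setting the article describes.

```python
from itertools import product

# Toy structural model (all numbers are illustrative assumptions).
P_W1 = 0.5                          # P(W=1), hidden confounder
P_T1_GIVEN_W = {0: 0.2, 1: 0.8}     # W -> T
P_M1_GIVEN_T = {0: 0.1, 1: 0.9}     # T -> M (W does not touch M)


def p_y1(m: int, w: int) -> float:
    """P(Y=1 | M=m, W=w): the outcome depends on the mediator and on W."""
    return 0.1 + 0.5 * m + 0.3 * w


def bern(p1: float, v: int) -> float:
    return p1 if v == 1 else 1.0 - p1


def observed_joint() -> dict:
    """Joint P(t, m, y) with W marginalized out, as an analyst would see it."""
    joint = {}
    for w, t, m, y in product((0, 1), repeat=4):
        p = (bern(P_W1, w) * bern(P_T1_GIVEN_W[w], t)
             * bern(P_M1_GIVEN_T[t], m) * bern(p_y1(m, w), y))
        joint[(t, m, y)] = joint.get((t, m, y), 0.0) + p
    return joint


def front_door(joint: dict, t: int) -> float:
    """P(Y=1 | do(T=t)) computed from observed quantities only."""
    p_t = {a: sum(p for (tt, _, _), p in joint.items() if tt == a) for a in (0, 1)}
    total = 0.0
    for m in (0, 1):
        p_m_given_t = sum(joint[(t, m, y)] for y in (0, 1)) / p_t[t]
        inner = 0.0
        for t2 in (0, 1):
            p_t2_m = sum(joint[(t2, m, y)] for y in (0, 1))
            inner += (joint[(t2, m, 1)] / p_t2_m) * p_t[t2]
        total += p_m_given_t * inner
    return total


def naive(joint: dict, t: int) -> float:
    """P(Y=1 | T=t), which is biased by W."""
    p_t = sum(p for (tt, _, _), p in joint.items() if tt == t)
    return sum(joint[(t, m, 1)] for m in (0, 1)) / p_t


J = observed_joint()
fd_effect = front_door(J, 1) - front_door(J, 0)
naive_effect = naive(J, 1) - naive(J, 0)
```

In this toy model the true interventional effect is 0.40. The front-door estimate recovers it without ever reading W, while the naive conditional difference comes out at about 0.58 because W inflates it.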
Adversarial Proxy for the Mediator
Because the true mediator M is not directly observable, an adversarially trained proxy Y′ is introduced. A discriminator tries to predict Y from Y′, while the mediator branch is penalized to suppress this predictability, preventing outcome leakage and stabilizing training.
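The article does not give the exact losses, so the following is only a generic sketch of an adversarial penalty of this kind. It assumes a scalar proxy, a logistic discriminator with hypothetical parameters `w` and `b`, and a trade-off weight `lam`; none of these names come from the paper.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one example, clipped for numerical safety."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def discriminator_loss(
    proxy: Sequence[float], outcomes: Sequence[int], w: float, b: float
) -> float:
    """Mean BCE of a logistic discriminator predicting Y from the proxy.
    The discriminator is trained to minimize this quantity."""
    losses = [bce(sigmoid(w * m + b), y) for m, y in zip(proxy, outcomes)]
    return sum(losses) / len(losses)


def mediator_objective(task_loss: float, disc_loss: float, lam: float = 1.0) -> float:
    """Objective for the mediator branch (assumed form): fit the task while
    making the discriminator's job harder, i.e. rewarding a high disc_loss."""
    return task_loss - lam * disc_loss


outcomes = [1, 0, 1, 0]
leaky_proxy = [1.0, -1.0, 1.0, -1.0]   # encodes Y directly
clean_proxy = [0.0, 0.0, 0.0, 0.0]     # carries no information about Y

leaky_d = discriminator_loss(leaky_proxy, outcomes, w=4.0, b=0.0)
clean_d = discriminator_loss(clean_proxy, outcomes, w=4.0, b=0.0)
```

With the same task loss, the leaky proxy yields a small discriminator loss and therefore a worse mediator objective, which is the pressure that discourages outcome leakage. In practice the two objectives would be optimized in alternation or through a gradient-reversal layer.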
Contrastive Learning for Overlap
In the high‑cardinality treatment space of Kuaishou, many touchpoints are sparse, which violates the overlap (positivity) assumption. Contrastive learning aligns the representations of similar treatment contexts so that sparse touchpoints can share statistical strength with their neighbors, which helps keep conditional treatment probabilities away from zero across the dataset.
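One common way to align representations is an InfoNCE-style loss. The article does not say which contrastive objective ALM-MTA uses, so the sketch below is an assumption; the cosine similarity, the `temperature` value, and the toy vectors are likewise illustrative.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """-log softmax score of the positive among the positive plus negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    mx = max(logits)
    log_denominator = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]
similar_context = [1.0, 0.1]
other_contexts = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce(anchor, similar_context, other_contexts)
misaligned_loss = info_nce(anchor, [0.0, 1.0], [similar_context, [-1.0, 0.0]])
```

The loss is small when the anchor sits close to the context designated as similar and large otherwise, so minimizing it pulls embeddings of similar treatment contexts together.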
Key Challenges
Absence of ground‑truth attribution labels in observational data.
Multiple partially unobservable confounders (user‑level and system‑level).
Extremely large candidate touchpoint space, threatening scalability and stability.
Experimental Evaluation
Offline Results
ALM‑MTA improves AUC by 11% relative to the previous SOTA and AUUC by 2.27%; the paper also reports a 0.6% gain in daily active users (DAU).
Rule‑based methods achieve 26% coverage with roughly 40% accuracy; strict point‑wise rules achieve 100% accuracy but only 2.7% coverage.
Baseline models (LR, deepMTA, DML/DESCN, causalMTA) show trade‑offs: LR performs worst under observed confounding; deepMTA improves discrimination but lags in grouped AUUC; DML/DESCN suffer from imbalance between AUC and grouped AUUC.
ALM‑MTA attains the best scores on AUC, log‑loss, grouped AUUC and average AUUC, thanks to adversarial mediation and contrastive adaptation.
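For readers unfamiliar with AUUC, the sketch below shows one simple variant of the area under the uplift curve for a binary treatment. AUUC definitions differ between libraries, and the article does not specify how AUUC or grouped AUUC is computed in the paper, so the normalization and the toy data here are assumptions.

```python
from typing import List, Tuple

Row = Tuple[float, int, int]  # (predicted_uplift, treated, outcome)


def auuc(rows: List[Row]) -> float:
    """Rank by predicted uplift, accumulate the treated-minus-control outcome
    rate scaled by the number of units seen, and normalize by n squared."""
    n = len(rows)
    if n == 0:
        return 0.0
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    n_t = n_c = 0
    y_t = y_c = 0.0
    area = 0.0
    for k, (_, treated, outcome) in enumerate(ranked, start=1):
        if treated:
            n_t += 1
            y_t += outcome
        else:
            n_c += 1
            y_c += outcome
        # The curve point is defined only once both groups are represented.
        if n_t and n_c:
            area += (y_t / n_t - y_c / n_c) * k
    return area / (n * n)


# Toy data: true responders are the first four units.
good_ranking: List[Row] = [
    (0.9, 1, 1), (0.8, 0, 0), (0.7, 1, 1), (0.6, 0, 0),
    (0.4, 1, 0), (0.3, 0, 0), (0.2, 1, 0), (0.1, 0, 0),
]
# Same units with the scores inverted, so responders are ranked last.
bad_ranking: List[Row] = [(1.0 - s, t, y) for s, t, y in good_ranking]

good_score = auuc(good_ranking)
bad_score = auuc(bad_ranking)
```

A model that ranks true responders first accumulates uplift early and earns a larger area than one that ranks them last, which is what a higher AUUC is meant to capture.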
Ablation Study
Relying on back‑door adjustment alone leaves residual system‑level confounding.
Using the raw proxy Y′ causes outcome leakage and unstable training.
Adversarial learning eliminates leakage and stabilizes convergence.
Contrastive learning mitigates sparsity and improves AUC in the high‑cardinality treatment regime.
Stability Analysis
Across random seeds (10, 100, 1000), ALM‑MTA’s uplift distribution remains highly consistent, whereas DML and DESCN exhibit large variance and causalMTA shows moderate variance reduction.
Online Deployment
In production on Kuaishou, ALM‑MTA was evaluated on supply‑side revenue, scalability, coverage, and attribution accuracy. All metrics showed significant improvements, confirming the offline findings.
Conclusion
ALM‑MTA formulates MTA as a delete‑point counterfactual uplift problem, leverages the front‑door criterion to bypass unobserved confounding, employs adversarial learning to obtain a non‑leaky mediator, and uses contrastive learning to preserve overlap in a massive treatment space. The method delivers state‑of‑the‑art performance both offline and online.
Paper: https://openreview.net/forum?id=3r68a6GOpg
Code: https://github.com/logwhistle/ALM-MTA
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.