
Paired Data Based A/B Experiments: Causal Inference in Network Experiments

The DataFun Data Science Summit on May 25 will feature Tencent data scientist Li Yilin presenting a comprehensive overview of paired‑data A/B experiments, covering causal inference challenges, unbiased estimators under various randomization designs, theoretical analysis, and practical insights for network‑based online experiments.

DataFunTalk

On May 25, DataFun will host the Data Science Summit, bringing together eight experts and producers to share the latest practices in data science. The event will be broadcast live, and interested participants can register by QR code.

One of the featured speakers is Li Yilin, a data scientist at Tencent. Li is a Ph.D. candidate in Statistics at Peking University, focusing on causal inference, especially in the presence of interference, and observational data analysis. He works on the WeChat experiment platform, and his research has been published in venues such as Biometrics, ACM/IMS Journal of Data Science, and ICML.

Talk Title: Paired Data Based A/B Experiments

Talk Outline: Paired data is a distinct data type that records interactions between two entities. It supports deeper analysis of complex relationships in fields ranging from international trade to social network communication. With the rise of big data, interest in causal inference for paired data has grown, yet methodological research remains scarce.

Traditional causal inference relies on the Stable Unit Treatment Value Assumption (SUTVA), which often fails in networked settings because of interference, leading to biased estimates of the global average treatment effect. The talk considers randomized experiments in which subjects are assigned to treatment or control and outcomes are observed on pairs of subjects, a setting common in online A/B testing (e.g., message forwarding, link sharing). It introduces a novel paired interference assumption and shows that, under heterogeneity, unbiased estimators of the global average treatment effect based on unit-level outcomes generally do not exist.

Leveraging the structure of paired data, the talk then presents estimators of the global causal effect and proves that they are unbiased under several randomization schemes (Bernoulli, complete, and cluster randomization). The theoretical analysis covers convergence rates, their connection to network structure, and asymptotic normality established via Stein's method. Confidence interval construction under Bernoulli randomization and the associated statistical inference methods are also provided. Extensive numerical experiments validate the estimators' accuracy and demonstrate their application to large-scale online randomized controlled trials.
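To make the idea of a pair-level estimator concrete, here is a minimal Python sketch. It is not the estimator from the talk, whose exact form the outline does not give. It is a generic Horvitz-Thompson style construction under two assumptions made purely for illustration: each pair outcome Y_ij depends only on the assignments of units i and j, and the global average treatment effect is the sum over pairs of Y_ij(1,1) - Y_ij(0,0), divided by the number of units. The function names, the toy graph, and the random potential outcomes are all hypothetical.

```python
import random
from itertools import product


def ht_pair_estimate(edges, observed, z, p):
    """Horvitz-Thompson style estimate from pair-level outcomes.

    edges: list of (i, j) pairs; observed: observed outcome per pair;
    z: 0/1 assignment per unit; p: Bernoulli treatment probability.
    Assumption (illustrative): normalise by the number of units.
    """
    n = len(z)
    total = 0.0
    for (i, j), y in zip(edges, observed):
        # Inverse-probability weights for "both treated" and "both control".
        both_treated = z[i] * z[j] / (p * p)
        both_control = (1 - z[i]) * (1 - z[j]) / ((1 - p) * (1 - p))
        total += y * (both_treated - both_control)
    return total / n


def true_gate(potential, n):
    """Global effect: everyone treated versus everyone in control."""
    return sum(y[(1, 1)] - y[(0, 0)] for y in potential.values()) / n


def expected_estimate(potential, n, p):
    """Exact expectation of the estimator over all 2^n Bernoulli assignments."""
    edges = list(potential)
    expectation = 0.0
    for z in product((0, 1), repeat=n):
        prob = p ** sum(z) * (1 - p) ** (n - sum(z))
        observed = [potential[(i, j)][(z[i], z[j])] for i, j in edges]
        expectation += prob * ht_pair_estimate(edges, observed, z, p)
    return expectation


# Hypothetical toy network with heterogeneous pair-level potential outcomes.
rng = random.Random(0)
n, p = 4, 0.3
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
potential = {
    e: {(a, b): rng.uniform(0, 10) for a in (0, 1) for b in (0, 1)}
    for e in edges
}

print("true effect:       ", true_gate(potential, n))
print("expected estimate: ", expected_estimate(potential, n, p))
```

Because the network is tiny, the expectation is computed by enumerating every assignment rather than by simulation, so the two printed numbers should agree up to floating-point error. That agreement is what unbiasedness under Bernoulli randomization means in this toy setting. It says nothing about variance, which grows as p moves toward 0 or 1.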

Audience Benefits:

Understanding the methods available for estimating global causal effects in network experiments.

Learning what paired data analysis entails.

Grasping how to conduct A/B experiments and causal inference with paired data, including the underlying theory and existing challenges.
