
Building Popper: Tubi’s Scalable Experimentation Platform

Tubi’s Popper platform combines a Scala‑based experiment engine, reproducible JSON‑stored configurations, a React UI, and Spark‑ and Akka‑based data pipelines to enable fast cross‑team A/B testing, automated analysis, health checks, and data‑driven decision making across mobile and OTT services.

DataFunTalk

At Tubi, every team—from feature development to ML—relies heavily on experiments to guide decisions, and experiment velocity has grown 18‑fold over three years, with a third of ML experiments positively impacting company KPIs.

The experiment system, built collaboratively across teams, consists of three main components: the Popper experiment engine, a UI that lowers the barrier to experimentation, and automated analysis and QA methods.

Popper Experiment Engine

The engine is a backend service written in Scala on the Akka framework, and is the second iteration of a system that incorporated lessons from an earlier implementation based on Facebook’s PlanOut. It is named after the philosopher Karl Popper to emphasize falsifiability.

“In so far as a scientific statement speaks about reality, it must be falsifiable.” (Karl Popper)

Core Concepts

The top‑level concept is a Namespace, representing a set of mutually exclusive experiments, allowing parallel experiments without coordination across teams.

Each Namespace hashes experiment targets (device, user, IP, etc.) into configurable Segments, which are assigned to at most one experiment, then to Treatment Groups (control and variant). Experiments can have multiple Phases with varying allocation percentages, and conditional rules can further refine segment usage.
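The deterministic bucketing described above can be sketched as follows. This is an illustrative implementation, not Popper’s actual code: the function names, the MD5 choice, and the 1000‑segment default are all assumptions.

```scala
import java.security.MessageDigest

// Hash a namespace-qualified target id (device, user, IP, ...) into one of
// `numSegments` buckets. The same input always lands in the same segment.
def segmentFor(namespace: String, targetId: String, numSegments: Int = 1000): Int = {
  val bytes = MessageDigest.getInstance("MD5")
    .digest(s"$namespace:$targetId".getBytes("UTF-8"))
  // Interpret the first 8 bytes as a long, then bucket it into [0, numSegments).
  val h = bytes.take(8).foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xffL))
  (((h % numSegments) + numSegments) % numSegments).toInt
}

// A phase then claims a contiguous range of segments for an experiment,
// e.g. segments [0, 50) for a 5% allocation out of 1000 segments.
def inExperiment(segment: Int, lo: Int, hi: Int): Boolean =
  segment >= lo && segment < hi
```

Because the assignment is a pure function of the namespace and target id, no stored state is needed to reproduce it, which is what makes the stateless design in the next section possible.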

Built‑in Reproducibility

Experiments and namespaces are stored as JSON in a Git repository, with an append‑only sequence tracking all CRUD operations, ensuring deterministic user hashing across service restarts without database‑stored segment assignments.

This persistence simplifies architecture, eliminates node coordination, and makes deployments low‑risk; Tubi has not experienced a failed deployment.

Making Experiments Accessible

Popper abstracts away low‑level details such as JSON schema and coordination steps, allowing non‑experts to create and run experiments easily.

Start and end dates are specified in the configuration, enabling independent deployment of new configs and experiment activation.
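A stored configuration of this shape might look roughly like the fragment below. The field names are illustrative, not Popper’s actual schema:

```json
{
  "namespace": "homepage",
  "numSegments": 1000,
  "experiments": [
    {
      "name": "row_reorder_v2",
      "startDate": "2020-06-01",
      "endDate": "2020-06-30",
      "phases": [
        { "allocation": 0.05, "treatments": ["control", "variant"] }
      ]
    }
  ]
}
```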

While the core segmentation logic is stateless, a database stores coverage for development and testing devices, aiding QA.

The React UI guides users through the entire workflow, provides a filtered calendar view, and records publication decisions.

Decision Support

Clients fetch configurations from Popper to decide which code branch to run; each experiment exposure generates an event processed by Spark Streaming (on Databricks) and Akka Streams, both written in Scala.
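The client-side half of this flow can be sketched as below. The types and the treatment names are hypothetical, not Popper’s real API; the point is that the client branches on the assignment and emits an exposure event for the streaming pipelines.

```scala
// Hypothetical client-side flow: resolve the assigned treatment from a fetched
// config, choose a code path, and record an exposure event for analysis.
case class Assignment(experiment: String, treatment: String)
case class ExposureEvent(experiment: String, treatment: String,
                         deviceId: String, timestampMs: Long)

def choosePath(assignment: Option[Assignment]): String = assignment match {
  case Some(Assignment(_, "variant")) => "new_code_path"
  case _                              => "control_code_path" // default safely to control
}

def exposure(a: Assignment, deviceId: String): ExposureEvent =
  ExposureEvent(a.experiment, a.treatment, deviceId, System.currentTimeMillis())
```

Defaulting to the control path when no assignment is present keeps unassigned or misconfigured clients on known-good behavior.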

Combining exposure data with engagement metrics yields key indicators such as watch time, conversion, and retention, segmented by platform or content type.

Statistical significance is assessed using CUPED for variance reduction and the Benjamini‑Hochberg procedure to control false discovery rate.
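The two techniques can be sketched as follows. These are textbook implementations for illustration, not Tubi’s production code:

```scala
// CUPED: remove the part of metric y explained by a pre-experiment covariate x
// (e.g. pre-period watch time). The adjusted metric keeps the same mean but
// has lower variance, so smaller effects reach significance sooner.
def cupedAdjust(y: Seq[Double], x: Seq[Double]): Seq[Double] = {
  val my = y.sum / y.length
  val mx = x.sum / x.length
  val cov  = y.zip(x).map { case (a, b) => (a - my) * (b - mx) }.sum / y.length
  val varX = x.map(b => (b - mx) * (b - mx)).sum / x.length
  val theta = cov / varX
  y.zip(x).map { case (a, b) => a - theta * (b - mx) }
}

// Benjamini-Hochberg: with m p-values, reject the k smallest, where k is the
// largest rank with p(k) <= (k / m) * alpha; this controls the false discovery
// rate when many metrics are tested at once.
def benjaminiHochberg(pValues: Seq[Double], alpha: Double = 0.05): Seq[Boolean] = {
  val m = pValues.length
  val sorted = pValues.zipWithIndex.sortBy(_._1)
  val cutoff = sorted.zipWithIndex.lastIndexWhere {
    case ((p, _), rank) => p <= (rank + 1).toDouble / m * alpha
  }
  val rejected = Array.fill(m)(false)
  sorted.take(cutoff + 1).foreach { case (_, origIdx) => rejected(origIdx) = true }
  rejected.toSeq
}
```

For example, with p-values (0.01, 0.04, 0.03, 0.2) at alpha = 0.05, only the smallest survives the correction, whereas naive per-metric testing would also accept 0.03 and 0.04.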

All metrics are automatically calculated and displayed on a BI dashboard, with a subset designated as “North Star” metrics that drive release decisions, and a “no‑harm” rule that blocks releases harming any North Star metric.

Experiment Health Checks

Popper includes a validation system that surfaces problematic experiments before they affect decisions.

Common failure modes include uneven group sizes (sample‑ratio mismatch), cross‑experiment interference, and biased exposure caused by client bugs; t‑statistic checks run before an experiment launches flag dangerous signals early.
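The uneven-group-size check can be sketched as a chi-square goodness-of-fit test comparing observed counts against the configured allocation. This is a standard sample-ratio-mismatch test, not necessarily the exact check Popper runs:

```scala
// Flag an experiment whose control/variant split deviates implausibly from the
// configured allocation. 3.841 is the chi-square critical value with one
// degree of freedom at alpha = 0.05.
def sampleRatioMismatch(controlCount: Long, variantCount: Long,
                        expectedControlShare: Double = 0.5): Boolean = {
  val total = (controlCount + variantCount).toDouble
  val expControl = total * expectedControlShare
  val expVariant = total - expControl
  val chiSq = math.pow(controlCount - expControl, 2) / expControl +
              math.pow(variantCount - expVariant, 2) / expVariant
  chiSq > 3.841
}
```

A 5000/5005 split passes, while a 5000/6000 split on a 50/50 allocation fails the check; the latter usually indicates a client bug dropping exposure events rather than a real effect.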

Lessons Learned

Self‑service is key to speed

By making experiment configuration the only required change for most model or feature updates, Tubi increased ML experiment throughput five‑fold.

Ask important questions

The platform focuses on identifying statistically significant signals that impact core KPIs, encouraging teams to prioritize high‑value ideas.

Embrace cross‑platform consistency

Experiments running across 20 platforms, with clients written in Scala, Elixir, Kotlin, Swift, and TypeScript, are unified under Popper, ensuring consistent terminology and analysis.

Conclusion

The investment in Popper—from the engine to data pipelines and analysis dashboards—has dramatically boosted productivity and decision quality across Tubi, enabling non‑experts to run experiments, health checks to maintain trust, and North Star metrics to align experimentation with business goals.

If you are passionate about building better experimentation tools, consider joining Tubi.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
