
KPI Forecasting and Anomaly Detection at Tubi Using Prophet

This article describes how Tubi’s data science team built a robust KPI forecasting system with Facebook’s Prophet, covering visualization dashboards, anomaly detection, feature engineering, PySpark deployment, and evaluation using Brier scores to improve business decision‑making.

Bitu Technology

KPI (Key Performance Indicator) forecasting is critical for Tubi’s business, and the data science team developed a robust system that supports budgeting, target setting, and anomaly detection. The article outlines the visualization strategy and the use of Prophet (a Facebook‑open‑source time‑series forecasting framework) as the core modeling tool.

The forecasting dashboard visualizes monthly total watch time with orange bars for predicted values, black points for a 14‑day moving average, gray confidence intervals, and horizontal lines for target benchmarks. Daily predictions are updated with the latest data, and cumulative monthly forecasts are compared against actuals to monitor confidence interval convergence and trigger model re‑evaluation when necessary.
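The dashboard's 14-day moving average is a simple trailing-window smoother. As an illustration only (not Tubi's actual pipeline code), a minimal version in plain Python:

```python
def moving_average(values, window=14):
    """Trailing moving average, like the dashboard's 14-day smoother.

    Emits one smoothed point per day once a full window is available.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical daily watch-time figures, smoothed with a 3-day window.
daily_watch_time = [10, 12, 11, 13, 15, 14]
print(moving_average(daily_watch_time, window=3))  # [11.0, 12.0, 13.0, 14.0]
```

A trailing (rather than centered) window is used here because the dashboard updates daily with the latest data, so only past values are available at each point.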

Prophet uses Stan as its statistical backend and decomposes the time series with a generalized additive model (GAM) into trend, seasonal, and holiday components. The team added cross-validation, diagnostic tools, and custom error metrics such as monthly percentage error, alongside RMSE and MAPE.
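The article names monthly percentage error as a custom metric but does not define it. One plausible reading, sketched below purely as an assumption, compares monthly totals rather than day-by-day values, so that offsetting daily errors cancel out:

```python
def monthly_percentage_error(actuals, forecasts):
    """Assumed definition: percentage error on the aggregated monthly total.

    Summing to a monthly figure before comparing matches how the
    business reads the KPI, unlike per-day MAPE.
    """
    actual_total = sum(actuals)
    forecast_total = sum(forecasts)
    if actual_total == 0:
        raise ValueError("actual total is zero; percentage error undefined")
    return abs(forecast_total - actual_total) / actual_total

# Daily errors partly cancel: the monthly view is only 2% off.
actual = [100, 110, 90]
forecast = [105, 100, 101]
print(monthly_percentage_error(actual, forecast))  # 0.02
```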

Hyper‑parameter optimization was performed using scikit‑learn’s ParameterGrid and parallelized with joblib, though the authors caution against over‑reliance on exhaustive grid search due to over‑fitting risks.
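ParameterGrid simply expands a dict of candidate values into every combination; a stdlib equivalent using itertools.product makes the mechanics concrete. The parameter names and values below are illustrative, not the grid the team actually searched:

```python
from itertools import product

# Candidate Prophet hyper-parameters (illustrative values only).
# In the article's setup each combination would be fitted in parallel
# with joblib and scored with the team's custom error metrics.
param_grid = {
    "changepoint_prior_scale": [0.01, 0.1],
    "seasonality_mode": ["additive", "multiplicative"],
}

# Expand the grid into a list of concrete parameter dicts,
# mirroring what sklearn.model_selection.ParameterGrid yields.
keys = sorted(param_grid)
combinations = [
    dict(zip(keys, values))
    for values in product(*(param_grid[k] for k in keys))
]

print(len(combinations))  # 4
print(combinations[0])    # {'changepoint_prior_scale': 0.01, 'seasonality_mode': 'additive'}
```

Note how the grid size is the product of the value counts, which is exactly why the authors warn against exhaustive search: adding parameters multiplies the number of fits and the risk of over-fitting to the validation window.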

Feature engineering includes external regressors (e.g., sales team headcount) and holiday effects to capture non‑periodic events that significantly impact the series. Proper domain knowledge is emphasized for selecting appropriate seasonalities and regressors.
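Prophet takes holidays through its own dataframe argument, but the underlying feature is just an indicator marking days near a non-periodic event. A hypothetical pure-Python sketch of such an indicator (names and dates are illustrative, not from the article):

```python
from datetime import date, timedelta

def holiday_indicator(dates, holidays, window=1):
    """One-hot feature flagging days within `window` days of a holiday.

    Flagging spillover days lets a model learn a separate effect for
    the event and its neighbors, the same idea Prophet applies via its
    lower_window/upper_window holiday settings.
    """
    flagged = set()
    for h in holidays:
        for offset in range(-window, window + 1):
            flagged.add(h + timedelta(days=offset))
    return [1 if d in flagged else 0 for d in dates]

# Five days around Christmas; the day before and after are flagged too.
days = [date(2023, 12, 23) + timedelta(days=i) for i in range(5)]
christmas = [date(2023, 12, 25)]
print(holiday_indicator(days, christmas, window=1))  # [0, 1, 1, 1, 0]
```

An external regressor like sales headcount works the same way structurally: it is one more column the model can attach a coefficient to, which is why domain knowledge about which columns to include matters so much.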

Model deployment is handled in PySpark, scheduled via Airflow, with outputs stored as Parquet files in AWS S3 and queried through Redshift Spectrum. DBT is used to transform and load data into the data warehouse for dashboard consumption.

Model evaluation involves back‑testing and scoring using the Brier score, which measures how well predicted probabilities match realized 0/1 outcomes, so it captures both calibration and the quality of the confidence intervals. Transparent reporting of scores builds stakeholder trust and enables timely alerts when performance degrades.
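The Brier score itself is just the mean squared difference between a predicted probability and the binary outcome. A minimal implementation, with illustrative numbers rather than Tubi's, might frame each outcome as whether the actual fell inside the forecast's confidence interval:

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means perfectly calibrated and fully confident.
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must be the same length")
    n = len(probabilities)
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / n

# Hypothetical example: predicted chance that the actual lands inside
# the forecast interval, scored against whether it actually did.
preds = [0.9, 0.8, 0.3, 0.95]
hits = [1, 1, 0, 1]
print(round(brier_score(preds, hits), 4))  # 0.0356
```

Because the score is a single bounded number, it is easy to track over back-tests and report to stakeholders, and a sustained rise is a natural trigger for the degradation alerts the article describes.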

The article concludes with lessons learned: set realistic expectations, repeatedly communicate model value, and educate stakeholders to foster collaboration and continuous improvement.

Tags: Anomaly Detection, time series forecasting, KPI, Prophet, Brier score, PySpark
Written by

Bitu Technology

Bitu Technology is the registered company of Tubi's China team. We are engineers passionate about leveraging advanced technology to improve lives, and we hope to use this channel to connect and advance together.
