
Pinterest Performance Plan: Real‑User Monitoring, Regression Detection, and Alerting

Pinterest’s performance program defines custom Pinner Wait Time (PWT) metrics, uses real‑user monitoring and fine‑grained alerts to detect regressions quickly, and relies on structured root‑cause analysis and clear ownership to prevent performance degradation across web surfaces.

FunTester

For many years Pinterest has treated detection, prevention, and remediation of performance degradation as a core practice, recognizing that regressions can quickly erase months of optimization work.

Pinner Wait Time (PWT) is the primary custom metric; each key page tracks the time to load critical elements such as large hero images and the button to save a Pin.
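
As a minimal sketch, assume a surface's PWT is the moment the last of its critical elements finishes loading. Under that assumption, the metric can be computed from per-element timings. The surface name, element names and the CRITICAL_ELEMENTS table below are hypothetical and do not come from Pinterest's configuration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class PageLoad:
    # Milliseconds since navigation start at which each critical element was ready;
    # None (or a missing key) means the element never loaded in this session.
    element_ready_ms: Mapping[str, Optional[float]]

# Hypothetical per-surface configuration: which elements count as "critical".
CRITICAL_ELEMENTS = {
    "pin_closeup": ("hero_image", "save_button"),
}

def pinner_wait_time(surface: str, load: PageLoad) -> Optional[float]:
    """PWT under the assumption above: when the *last* critical element became ready.

    Returns None if any critical element is missing, so incomplete sessions can be
    excluded instead of being reported as artificially fast."""
    ready_times = []
    for element in CRITICAL_ELEMENTS[surface]:
        t = load.element_ready_ms.get(element)
        if t is None:
            return None
        ready_times.append(t)
    return max(ready_times)
```

For example, `pinner_wait_time("pin_closeup", PageLoad({"hero_image": 1840.0, "save_button": 1210.0}))` yields 1840.0, because the hero image is the slower of the two critical elements. Whether to drop or cap incomplete sessions is a design choice, not something the text specifies.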

Core Web Vitals are incorporated into PWT and monitored via dashboards, A/B experiment frameworks, and diff‑based performance integration tests.

Ownership : The performance team provides logging, data pipelines, dashboards, and tools, while surface owners are responsible for metric health, trade‑off decisions, and regression response.

Performance Budget : Surface owners must keep metric baselines at or below year‑start levels, with alerts and Jira tickets generated when regressions exceed thresholds.
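
A budget check of this kind reduces to comparing a surface's current baseline with its year-start level. The sketch below is one hypothetical way to express that. The threshold_pct tolerance is an assumption, since the text only says that tickets are filed "when regressions exceed thresholds". Filing the alert or ticket is left to the caller.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BudgetCheck:
    surface: str
    overage_pct: float   # how far the current baseline sits above the year-start level
    over_budget: bool

def check_budget(surface: str, current_baseline_ms: float, year_start_ms: float,
                 threshold_pct: float = 0.0) -> BudgetCheck:
    """Compare a surface's current metric baseline with its year-start baseline.

    threshold_pct is a hypothetical tolerance; a real pipeline would open an
    alert/ticket for every result where over_budget is True."""
    if year_start_ms <= 0:
        raise ValueError("year-start baseline must be positive")
    overage_pct = (current_baseline_ms - year_start_ms) / year_start_ms * 100.0
    return BudgetCheck(surface, overage_pct, overage_pct > threshold_pct)
```

With a year-start baseline of 2200 ms and a current baseline of 2300 ms, the surface is about 4.5% over budget. It would therefore be flagged under a 2% tolerance.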

Real‑User Monitoring : Fine‑grained, real‑time RUM charts speed up regression detection and pinpoint peak spikes. Current values are compared against a two‑week baseline, and a warning or critical alert fires depending on how long the regression lasts.
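
The baseline comparison could be implemented in several ways. This sketch makes two assumptions. First, the baseline is the median of the same time-of-day bucket on each of the previous 14 days, so the normal daily traffic cycle does not register as a regression. Second, a bucket counts as regressed above a 5% threshold. Both the bucket layout and the threshold are illustrative.

```python
import statistics
from typing import Sequence

BASELINE_DAYS = 14      # two-week baseline, as described in the text
THRESHOLD_PCT = 5.0     # hypothetical regression threshold

def regression_pct(current_ms: float, same_bucket_history_ms: Sequence[float]) -> float:
    """Percent change of the current bucket versus the two-week baseline.

    same_bucket_history_ms holds the metric for this time-of-day bucket on each
    previous day (an assumed layout); only the most recent 14 days are used."""
    history = list(same_bucket_history_ms)[-BASELINE_DAYS:]
    if not history:
        raise ValueError("no baseline history")
    baseline = statistics.median(history)  # median resists one-off spikes in the baseline
    return (current_ms - baseline) / baseline * 100.0

def is_regressed(current_ms: float, same_bucket_history_ms: Sequence[float],
                 threshold_pct: float = THRESHOLD_PCT) -> bool:
    return regression_pct(current_ms, same_bucket_history_ms) > threshold_pct
```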

Alerting : Moving from 7‑day averages computed once a day to real‑time charts gives immediate, precise detection and clearer peak identification. It also enables staged alerts: a warning after 30 minutes and a critical alert after several hours.
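
Staged alerting can be driven by how long the current regression has lasted. This sketch assumes the input is a per-bucket "regressed" flag series, oldest to newest, such as the output of a baseline comparison. The 30-minute warning stage comes from the text. The 3-hour critical stage is a hypothetical stand-in for "several hours".

```python
from datetime import timedelta
from enum import Enum
from typing import Sequence

class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"

WARNING_AFTER = timedelta(minutes=30)
CRITICAL_AFTER = timedelta(hours=3)   # hypothetical; the text only says "several hours"

def alert_severity(regressed_buckets: Sequence[bool], bucket: timedelta) -> Severity:
    """Classify by the length of the *ongoing* run of regressed buckets.

    regressed_buckets is ordered oldest -> newest, one flag per fixed-width bucket."""
    run = 0
    for flag in reversed(regressed_buckets):
        if not flag:
            break
        run += 1
    duration = run * bucket
    if duration >= CRITICAL_AFTER:
        return Severity.CRITICAL
    if duration >= WARNING_AFTER:
        return Severity.WARNING
    return Severity.OK
```

Counting only the ongoing run means that a regression which has already recovered stops escalating. This matches the idea of alerting on regression duration rather than on isolated spikes.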

Root‑Cause Analysis : Precise regression start times allow investigators to correlate deployments, experiments, and internal changes, narrowing the suspect change set.
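
Once the start time is known, narrowing the suspect set is essentially a time-window filter over change events. The sketch below makes several assumptions: deployments, experiments and other internal changes are available as timestamped events, and the 30-minute lookback and 5-minute lookahead windows are hypothetical defaults. The ChangeEvent type is likewise illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence

@dataclass(frozen=True)
class ChangeEvent:
    kind: str      # e.g. "deploy", "experiment", "config" (hypothetical categories)
    name: str
    at: datetime   # when the change reached production

def suspects(events: Sequence[ChangeEvent], regression_start: datetime,
             lookback: timedelta = timedelta(minutes=30),
             lookahead: timedelta = timedelta(minutes=5)) -> List[ChangeEvent]:
    """Changes landing near the regression start, closest first.

    The small lookahead tolerates clock skew and metric-bucket rounding;
    both window sizes are assumptions."""
    lo, hi = regression_start - lookback, regression_start + lookahead
    nearby = [e for e in events if lo <= e.at <= hi]
    return sorted(nearby, key=lambda e: abs(e.at - regression_start))
```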

Initial Investigation Steps :
1. Check for concurrent regressions on other surfaces.
2. Determine the exact regression start time.
3. Review deployments and experiments aligned with that time.
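
The first two initial steps can be sketched together. The regression start is taken as the first bucket of the most recent sustained run of regressed buckets. Other surfaces whose regressions began within a short window of that start are then treated as concurrent, which hints at a shared cause. The series layout, the three-bucket minimum run and the 15-minute window are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Sequence, Tuple

# (bucket start, regressed?) pairs ordered oldest -> newest; the layout is an assumption.
Series = Sequence[Tuple[datetime, bool]]

def regression_start(series: Series, min_run: int = 3) -> Optional[datetime]:
    """Start of the most recent run of at least min_run consecutive regressed buckets.

    Requiring several buckets in a row filters out one-off spikes."""
    run_start: Optional[datetime] = None
    run = 0
    found: Optional[datetime] = None
    for bucket_start, regressed in series:
        if not regressed:
            run = 0
            continue
        if run == 0:
            run_start = bucket_start
        run += 1
        if run >= min_run:
            found = run_start
    return found

def concurrent_surfaces(starts: Dict[str, Optional[datetime]], target: str,
                        window: timedelta = timedelta(minutes=15)) -> List[str]:
    """Other surfaces whose regression began within `window` of the target's."""
    t0 = starts.get(target)
    if t0 is None:
        return []
    return sorted(name for name, t in starts.items()
                  if name != target and t is not None and abs(t - t0) <= window)
```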

Advanced Investigation Steps :
1. Examine log volume and content distribution changes.
2. Identify the regression’s position in the critical path.
3. Inspect network request variations.
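
For the network-request step, one simple approach is to compare the average per-session request mix before and after the regression start, then list the request patterns whose counts moved. This sketch assumes requests have already been normalized into patterns, such as bundle names. The 0.5-requests-per-session cutoff is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

def request_mix(sessions: Iterable[Sequence[str]]) -> Dict[str, float]:
    """Average requests per session, keyed by a normalized request pattern."""
    totals: Counter = Counter()
    count = 0
    for requests in sessions:
        totals.update(requests)
        count += 1
    return {key: n / count for key, n in totals.items()} if count else {}

def request_diff(before: Iterable[Sequence[str]], after: Iterable[Sequence[str]],
                 min_delta: float = 0.5) -> List[Tuple[str, float]]:
    """Request patterns whose per-session count moved by at least min_delta,
    largest change first (ties broken by name for stable output)."""
    b, a = request_mix(before), request_mix(after)
    deltas = [(key, a.get(key, 0.0) - b.get(key, 0.0)) for key in set(a) | set(b)]
    changed = [d for d in deltas if abs(d[1]) >= min_delta]
    return sorted(changed, key=lambda d: (-abs(d[1]), d[0]))
```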

Metrics such as HTML flow timing and network congestion timing were added in 2022 to surface hidden regressions.

HTML Flow Timing : Measures time to stream key HTML blocks (e.g., hero image tags, preload links) from server to client, helping diagnose server‑render changes that affect LCP.
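
As a rough sketch of HTML flow timing, assume the client records each chunk of the streamed HTML response together with its arrival time. The timing for a key block is then the arrival time of the chunk that completes it. The marker strings used in practice (for example, a preload link or a hero image tag) are hypothetical here.

```python
from typing import Dict, Iterable, Tuple

def html_flow_timing(chunks: Iterable[Tuple[float, str]],
                     markers: Dict[str, str]) -> Dict[str, float]:
    """For each marker, the arrival time (ms) of the chunk that completed it.

    chunks: (arrival_ms, decoded_text) pairs in stream order, assumed to be
    recorded on the client while reading the streamed HTML response.
    markers: name -> exact HTML substring to look for (hypothetical selectors).
    Searching the accumulated text also catches markers split across chunks."""
    received = ""
    first_seen: Dict[str, float] = {}
    for arrival_ms, text in chunks:
        received += text
        for name, needle in markers.items():
            if name not in first_seen and needle in received:
                first_seen[name] = arrival_ms
    return first_seen
```

This sketch rescans the whole buffer on every chunk, which is quadratic in document size. A production version would search only a window around each chunk boundary.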

Figure 1: Tracking hero image and save‑button timings as part of PWT.

Network Congestion Timing : Records start/end of request batches during PWT, capturing script request progress (e.g., 25%/50% start and completion) to correlate with LCP delays.

Figure 8: Time for 25% of script requests to complete versus the preload image request.
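
The 25% and 50% start and completion points can be derived from per-request start and end times. This sketch assumes the script requests issued during the PWT window are available as (start_ms, end_ms) pairs. It defines "p% started" as the earliest time by which at least p% of the requests had started, and likewise for completion.

```python
import math
from typing import Dict, Sequence, Tuple

def progress_time(times: Sequence[float], fraction: float) -> float:
    """Earliest time by which at least `fraction` of the events had occurred."""
    if not times:
        raise ValueError("no script requests recorded")
    k = max(1, math.ceil(fraction * len(times)))
    return sorted(times)[k - 1]

def congestion_timing(script_requests: Sequence[Tuple[float, float]],
                      fractions: Sequence[float] = (0.25, 0.5)) -> Dict[str, float]:
    """script_requests: (start_ms, end_ms) pairs for script requests issued during PWT."""
    starts = [s for s, _ in script_requests]
    ends = [e for _, e in script_requests]
    timings: Dict[str, float] = {}
    for f in fractions:
        label = round(f * 100)
        timings[f"start_{label}"] = progress_time(starts, f)
        timings[f"end_{label}"] = progress_time(ends, f)
    return timings
```

One way to read the result, in the spirit of Figure 8, is to compare `end_25` with the end time of the preload image request. If script completion moves closer to, or past, the image request, scripts may be competing with the LCP image for bandwidth.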

A/B Experiment Checks : Performance metrics are streamed into the experiment framework; sustained degradation over five days triggers Slack alerts and Jira tickets, with severity‑based mitigation steps.

Figure 9: Auto‑generated Jira ticket for an experiment performance regression.
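
The sustained-degradation rule can be sketched as a check over daily experiment results. It assumes the experiment framework supplies, for each day, the treatment-versus-control change in the metric and whether that change is statistically significant. The five-day window comes from the text. The 5% split between "high" and "low" severity is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class DailyResult:
    delta_pct: float    # treatment vs control change in the metric (positive = slower)
    significant: bool   # outcome of the framework's significance test (assumed input)

SUSTAINED_DAYS = 5

def experiment_severity(days: Sequence[DailyResult],
                        high_pct: float = 5.0) -> Optional[str]:
    """Return "high"/"low" if the last five days all show a significant
    regression, else None. A caller would then post the Slack alert and
    open the ticket with the matching mitigation steps."""
    recent = list(days)[-SUSTAINED_DAYS:]
    if len(recent) < SUSTAINED_DAYS:
        return None
    if not all(d.significant and d.delta_pct > 0 for d in recent):
        return None
    sustained = min(d.delta_pct for d in recent)  # smallest regression over the window
    return "high" if sustained >= high_pct else "low"
```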

Per‑Diff JS Bundle Size Checks : CI pipelines compare bundle sizes against the base commit. A significant size change posts a PR comment, sends a Slack alert, and adds surface owners as reviewers.

Figure 12: PR comment showing a critical bundle size regression.
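
At its core, the bundle comparison is a per-bundle diff between the base commit and the PR head. This sketch assumes both builds emit a mapping from bundle name to size in bytes. The 10 KB warning and 50 KB critical thresholds are hypothetical, and posting the comment or alert is left out.

```python
from typing import Dict, List, Tuple

def bundle_size_report(base: Dict[str, int], head: Dict[str, int],
                       warn_bytes: int = 10_000,
                       critical_bytes: int = 50_000) -> List[Tuple[str, int, str]]:
    """(bundle, byte delta, level) for every bundle whose size changed significantly.

    Added or removed bundles are treated as growing from / shrinking to zero;
    both thresholds are assumptions."""
    findings: List[Tuple[str, int, str]] = []
    for name in sorted(set(base) | set(head)):
        delta = head.get(name, 0) - base.get(name, 0)
        if abs(delta) >= critical_bytes:
            findings.append((name, delta, "critical"))
        elif abs(delta) >= warn_bytes:
            findings.append((name, delta, "warning"))
    return findings
```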

Per‑Diff Performance Regression Tests : Diff‑level integration tests run performance suites before merge, aiming to catch regressions early and, eventually, to raise alerts at the PR level.
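
One simple way a per-diff performance test could decide pass or fail is to run the same suite several times on the base commit and on the diff, then compare medians. Medians damp run-to-run noise. The 5% threshold and the five-run minimum below are assumptions, not parameters from the text.

```python
import statistics
from typing import Sequence

def diff_regressed(base_runs_ms: Sequence[float], diff_runs_ms: Sequence[float],
                   threshold_pct: float = 5.0, min_runs: int = 5) -> bool:
    """True if the diff's median timing is more than threshold_pct slower than base.

    Repeated runs plus medians reduce noise; both defaults are hypothetical."""
    if len(base_runs_ms) < min_runs or len(diff_runs_ms) < min_runs:
        raise ValueError("not enough runs for a stable comparison")
    base = statistics.median(base_runs_ms)
    diff = statistics.median(diff_runs_ms)
    return (diff - base) / base * 100.0 > threshold_pct
```

A stricter variant would replace the fixed threshold with a statistical test on the two samples. The median comparison is used here only because it is easy to reason about.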

Summary : Real‑time, fine‑grained monitoring combined with automated, proactive checks (per‑diff and A/B) enables early detection, clear root‑cause isolation, self‑service performance, and scalable protection against regressions as Pinterest’s deployment velocity grows.
