How ByteDance Cut Billions in Event‑Tracking Costs with Smart Data Governance
This article details ByteDance's end‑to‑end event‑tracking cost governance, covering background, strategies, large‑scale data pipelines, resource challenges, control mechanisms, automated and supervised governance modes, and the substantial savings achieved through point filtering, tiered prioritization, and sampling.
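ByteDance shares its event‑tracking (埋点, literally "tracking point") cost governance practice in five parts: governance background, strategy, experience review, planning, and outlook. Throughout the article, a "point" means a single tracked event definition.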
Event‑tracking data are user behavior logs generated during product usage, crucial for understanding users and optimizing business.
In ByteDance's data processing chain, SDKs on various endpoints report data to a log collection service, which aggregates the data into a real‑time topic. Real‑time ETL then cleans, standardizes, and distributes the data to downstream applications such as real‑time consumers, offline warehouses, user behavior analysis (UBA), recommendation systems, and A/B testing.
The system handles massive scale, with peak traffic exceeding 100 million events per second, incremental data reaching 10 trillion events per day, and HDFS storage growth of over 10 PB per day.
Such volume creates challenges in machine resources, cost, operations, and SLAs. For example, an HDFS delivery delay was avoided by promptly deleting useless and low‑priority points through the governance mechanism.
The core idea of point governance is to allocate limited resources to valuable data reporting rather than waste them on ineffective reports. Benefits include:
Adoption across most internal business lines.
Cost savings from decommissioning useless points.
Saving over 100 PB of HDFS storage via point tiering.
Saving tens of millions of yuan through sampling.
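The article does not describe how sampling is implemented. One common approach is deterministic hash‑based sampling, sketched below under stated assumptions: the function name, the key format, and the 10% rate are illustrative and not taken from ByteDance's system.

```python
import hashlib


def keep_event(user_id: str, event_name: str, rate: float) -> bool:
    """Decide deterministically whether to keep an event for this user.

    Hashing (event_name, user_id) keeps the decision stable per user, so a
    sampled user contributes all of their events for that point or none.
    """
    digest = hashlib.sha256(f"{event_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)            # integer compare avoids float rounding


# Hypothetical usage: keep roughly 10% of users for a high-volume point.
kept_fraction = sum(keep_event(str(i), "scroll", 0.1) for i in range(10_000)) / 10_000
```

Sampling by user rather than by individual event keeps per‑user sequences intact, which matters if downstream analysis looks at funnels or sessions.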
The governance approach follows "control new points first, then treat existing ones." A reporting‑control list is introduced: only points on the allowed list can be reported. Controls are enforced at two levels: in the SDK, which is closest to the source, and in real‑time ETL, which covers all downstream consumers and allows changes to roll out quickly.
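As a minimal sketch of the two enforcement levels, the example below applies the same allowed list in an SDK‑side check and in an ETL‑side filter. The class, function names, and event shape are hypothetical; the article does not show ByteDance's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class ReportingControl:
    # Hypothetical allowed list: names of points that may be reported.
    allowed: set = field(default_factory=set)

    def is_allowed(self, event_name: str) -> bool:
        return event_name in self.allowed


def sdk_should_send(control: ReportingControl, event: dict) -> bool:
    """SDK-level check: drop the event before it leaves the client."""
    return control.is_allowed(event["name"])


def etl_filter(control: ReportingControl, events: Iterable[dict]) -> Iterator[dict]:
    """ETL-level check: applied once in the stream, so every downstream
    consumer sees the same filtered data."""
    for event in events:
        if control.is_allowed(event["name"]):
            yield event


control = ReportingControl(allowed={"page_view", "purchase"})
stream = [{"name": "page_view"}, {"name": "debug_ping"}, {"name": "purchase"}]
kept = [e["name"] for e in etl_filter(control, stream)]
```

Here `kept` contains only `page_view` and `purchase`; the unregistered `debug_ping` is dropped at either level.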
To manage the allowed list, ByteDance provides the ByteIO platform for point registration, status management, and direct traffic control.
Unused points are identified by comparing cost (reporting volume) with value, evaluated across three dimensions: offline query usage, real‑time split rules, and UBA queries. Points with high cost and low value are prioritized for governance.
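The cost‑versus‑value comparison can be illustrated with a small ranking sketch. The field names, the rule "value is zero across all three dimensions", and the volume threshold are assumptions made for illustration; the article does not give the actual scoring formula.

```python
from dataclasses import dataclass


@dataclass
class PointUsage:
    name: str
    daily_events: int      # cost proxy: reporting volume
    offline_queries: int   # offline query usage
    realtime_rules: int    # real-time split rules that reference the point
    uba_queries: int       # UBA queries

    @property
    def value(self) -> int:
        # Assumed value signal: total usage across the three dimensions.
        return self.offline_queries + self.realtime_rules + self.uba_queries


def governance_candidates(points, min_daily_events):
    """Return high-volume points with no recorded usage, most expensive first."""
    unused = [p for p in points if p.value == 0 and p.daily_events >= min_daily_events]
    return sorted(unused, key=lambda p: p.daily_events, reverse=True)


points = [
    PointUsage("video_play", 5_000_000, 12, 3, 40),
    PointUsage("legacy_banner_show", 8_000_000, 0, 0, 0),
    PointUsage("debug_heartbeat", 20_000_000, 0, 0, 0),
    PointUsage("rare_unused", 1_000, 0, 0, 0),
]
candidates = [p.name for p in governance_candidates(points, min_daily_events=1_000_000)]
```

With this sample data, `debug_heartbeat` and `legacy_banner_show` are selected, in that order; `rare_unused` is also unused but too cheap to prioritize.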
Key metrics for assessing governance needs include total reporting volume, associated cost, proportion of useless points, and point density (reporting volume versus active user time).
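Two of these metrics reduce to simple ratios. The sketch below assumes that the useless‑point proportion is measured by event volume and that point density is events per second of active user time; the article does not specify the exact units.

```python
def useless_ratio(useless_events: int, total_events: int) -> float:
    """Share of reported volume that comes from points with no known usage."""
    return useless_events / total_events if total_events else 0.0


def point_density(total_events: int, active_user_seconds: float) -> float:
    """Reporting volume relative to active user time (events per active second)."""
    return total_events / active_user_seconds if active_user_seconds else 0.0


# Hypothetical daily figures for one business line.
ratio = useless_ratio(useless_events=300_000, total_events=1_200_000)
density = point_density(total_events=1_200_000, active_user_seconds=60_000)
```

For these sample figures, `ratio` is 0.25 and `density` is 20.0. A density that rises while active time stays flat suggests that new points are being added faster than usage grows.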
Automatic governance monitors these metrics, selects candidate points, and pushes them to business owners for confirmation. Two modes exist:
Supervised automatic governance for large businesses, allowing custom monitoring rules and oversight.
Unsupervised automatic governance for smaller businesses, fully managed by the system.
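The difference between the two modes can be sketched as a choice of monitoring rule: a supervised business supplies its own threshold, while an unsupervised one uses a system default. All names and the default value below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SYSTEM_DEFAULT_MIN_DAILY_EVENTS = 1_000_000  # hypothetical system-wide default


@dataclass
class BusinessConfig:
    name: str
    supervised: bool
    custom_min_daily_events: Optional[int] = None  # only honored when supervised


def effective_threshold(cfg: BusinessConfig) -> int:
    """Supervised businesses may override the rule; others use the default."""
    if cfg.supervised and cfg.custom_min_daily_events is not None:
        return cfg.custom_min_daily_events
    return SYSTEM_DEFAULT_MIN_DAILY_EVENTS


def candidates_to_confirm(cfg: BusinessConfig, unused_volumes: dict) -> list:
    """Select unused points above the threshold to push to the business owner."""
    threshold = effective_threshold(cfg)
    return sorted(name for name, vol in unused_volumes.items() if vol >= threshold)


unused = {"a_click": 500_000, "b_show": 3_000_000}
large = BusinessConfig("large_app", supervised=True, custom_min_daily_events=200_000)
small = BusinessConfig("small_app", supervised=False)
large_candidates = candidates_to_confirm(large, unused)
small_candidates = candidates_to_confirm(small, unused)
```

In this example the large business, with its stricter custom rule, is asked to confirm both points, while the small business only sees `b_show`.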
Future directions aim to link governance scores with resource allocation, provide personalized high‑yield solutions, and expand governance to abnormal and low‑quality data.
Additionally, ByteDance seeks to improve usage analysis by tracing ETL logic across Hive tables and real‑time topics, giving full visibility into how each point propagates and where it is consumed.
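One way to picture this kind of tracing is a breadth‑first walk over a lineage graph whose nodes are topics and Hive tables. The graph below and its node names are invented for illustration; the article does not describe how lineage is stored or queried.

```python
from collections import deque


def downstream_of(lineage: dict, source: str) -> list:
    """Return every dataset reachable from `source`, in breadth-first order."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each key feeds the datasets listed as its value.
lineage = {
    "topic.events_raw": ["topic.events_clean"],
    "topic.events_clean": ["hive.dwd_events", "topic.rec_features"],
    "hive.dwd_events": ["hive.dws_daily_active"],
}
reached = downstream_of(lineage, "topic.events_raw")
```

A point whose data reaches no queried table or consumed topic along any such path is a stronger candidate for decommissioning than one judged only by direct queries.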
Overall, the described point‑cost governance is a critical component of ByteDance's internal data governance framework.
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.