Design and Implementation of Bilibili's Event Tracking (埋点) Analysis Platform
Bilibili's unified event-tracking platform now manages over 120,000 event definitions and ingests more than a hundred billion events daily, guiding product, operation, and marketing decisions. It covers the full event lifecycle (design, collection, testing, storage, and analysis) and is built on a Spmid naming scheme, Protocol Buffers event models, ClickHouse for fast queries, and visual dashboards, with future plans for dynamic TTL, automated DWD tables, and deeper AB-testing integration.
Bilibili's product iteration heavily relies on data-driven decisions, and event tracking (埋点) serves as a critical source for algorithmic recommendation, channel placement, and business analysis. This article shares Bilibili's experience in designing a standardized event tracking system and building an end‑to‑end analysis platform.
The internal "North Star" tracking platform now manages over 120,000 event definitions and ingests more than a hundred billion events daily, providing a unified foundation for product, operation, and marketing teams to monitor key metrics such as acquisition, activation, retention, and conversion.
Platform framework: The platform covers the full lifecycle of an event, from design and management through collection, testing, storage, query, and analysis to deprecation. The architecture diagram illustrates this end-to-end flow.
Two development stages:
Stage 1 – Event definitions were stored in Excel or Info documents, logs were sent to separate Hive tables, and downstream analysts used generic BI tools or ad‑hoc SQL for analysis.
Stage 2 – A unified reporting model and SDK were introduced, a dedicated visualization platform was built, and ClickHouse replaced Hive as the query engine, dramatically improving query speed.
Design and implementation: The platform abstracts the data flow into several core modules (see architecture image). Key components include:
Event design specification & management: Adoption of the Spmid (Super Position Model) naming convention (business.page.module.position.type) and structured management of event names, common attributes, type-specific attributes, and private attributes.
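To make the Spmid convention concrete, here is a minimal Python sketch of a validator for the business.page.module.position.type shape. The exact per-segment character rules and the event-type vocabulary (click, show, pv, player, other) are assumptions for illustration, not the platform's actual specification.

```python
import re

# Hypothetical validator for a Spmid-style event name:
# business.page.module.position.type (segment rules are assumed).
SPMID_RE = re.compile(
    r"^[a-z][a-z0-9-]*(\.[a-z0-9-]+){3}\.(click|show|pv|player|other)$"
)

def parse_spmid(event_id: str) -> dict:
    """Split an event_id into its five Spmid segments, or raise."""
    parts = event_id.split(".")
    if len(parts) != 5 or not SPMID_RE.match(event_id):
        raise ValueError(f"not a valid Spmid event name: {event_id!r}")
    business, page, module, position, etype = parts
    return {"business": business, "page": page, "module": module,
            "position": position, "type": etype}

# e.g. parse_spmid("main.homepage.banner.0.click")["module"] -> "banner"
```

A validator like this can run at design time, so malformed names are rejected before any SDK code is generated.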
Event model: An Event-User-Session model serialized via Protocol Buffers, capturing who, when, where, how, and what.
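The Event-User-Session split can be pictured with plain dataclasses. The real platform serializes these structures via Protocol Buffers; the field names below are assumptions chosen to map onto who, when, where, how, and what.

```python
from dataclasses import dataclass, field, asdict
import time

@dataclass
class User:                  # who
    buvid: str               # device identifier
    mid: str = ""            # logged-in user id, if any

@dataclass
class Session:               # groups events from one app launch
    session_id: str
    app_id: int

@dataclass
class Event:
    event_id: str            # what (Spmid-named event)
    ctime: int               # when (millisecond timestamp)
    page: str                # where
    user: User
    session: Session
    extended_fields: dict = field(default_factory=dict)  # how / private attrs

# Illustrative event instance (all values made up):
evt = Event("main.homepage.banner.0.click", int(time.time() * 1000),
            "main.homepage", User("buvid-abc"), Session("s-1", 1))
```

In the real pipeline the equivalent protobuf messages are what the SDK serializes and the Lancer gateway transports.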
Naming rules:
- use underscore (snake_case) or camelCase naming consistently;
- names are nouns only;
- reuse common fields (e.g., topic_id, order_id);
- avoid special characters.
Reporting protocol: Common parameters, type-specific parameters, and private parameters are wrapped in a JSON field for downstream parsing.
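A minimal sketch of that wrapping, assuming a record layout where common and type-specific parameters are top-level columns and the private parameters travel as one JSON string field (field names here are hypothetical):

```python
import json

def build_report(common: dict, type_specific: dict, private: dict) -> dict:
    """Assemble one reported record; private attrs go into a JSON field."""
    return {**common, **type_specific,
            "extended_fields": json.dumps(private, ensure_ascii=False)}

record = build_report(
    common={"event_id": "main.homepage.banner.0.click", "buvid": "buvid-abc"},
    type_specific={"load_type": 1},
    private={"source": "push", "order_id": "o-42"},
)

# Downstream parsers recover the private attributes from the JSON field:
private = json.loads(record["extended_fields"])
```

Keeping private attributes in one serialized field is what lets each business add its own keys without schema changes, at the cost of the indexOf/extended_fields lookups visible in the ClickHouse queries later in this article.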
Metadata management: Supports sampling ratios, core-event flags, custom routing, and attribute-group templates, enabling dynamic sampling control and high-priority queues for critical events.
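One way the sampling-ratio metadata can drive client behavior is deterministic hash-based sampling: a stable hash of (event_id, device id) keeps the same subset of devices reporting whenever the ratio is below 1.0. This is a sketch under assumed metadata layout, not the platform's actual algorithm.

```python
import hashlib

# Hypothetical per-event metadata: event_id -> sampling ratio.
SAMPLING_META = {"main.homepage.banner.0.show": 0.1}  # 10% sample

def should_report(event_id: str, buvid: str) -> bool:
    """Deterministically decide whether this device reports this event."""
    ratio = SAMPLING_META.get(event_id, 1.0)  # core events default to 1.0
    if ratio >= 1.0:
        return True
    digest = hashlib.md5(f"{event_id}:{buvid}".encode()).hexdigest()
    # Map the first 32 hash bits onto [0, 1) and compare with the ratio.
    return int(digest[:8], 16) / 0xFFFFFFFF < ratio
```

Because the decision is a pure function of the ids, the sampled population stays stable across sessions, which keeps sampled metrics comparable day over day.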
Testing module: Mobile devices (iOS, Android, iPad) scan a QR code to connect; data is forwarded via Nginx to a Lancer gateway, buffered in Kafka, and visualized in real time. The testing UI links reported events with their metadata for quick validation.
Analysis modules:
Event analysis – includes event, funnel, retention, path, single‑user drill‑down, user segmentation, and custom SQL.
Data dashboard – stores analysis results for reuse, with caching and nightly refresh strategies to reduce load.
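The dashboard caching idea can be sketched in a few lines: a chart's query result is cached under (chart_id, parameters) and reused until the day rolls over, at which point the nightly refresh repopulates it. The cache layout and interface below are assumptions for illustration.

```python
class ChartCache:
    """Per-day result cache for dashboard charts (illustrative sketch)."""

    def __init__(self):
        self._store = {}  # (chart_id, params) -> (cached_on, result)

    def get(self, chart_id, params, today, run_query):
        key = (chart_id, tuple(sorted(params.items())))
        hit = self._store.get(key)
        if hit and hit[0] == today:      # still fresh for this day
            return hit[1]
        result = run_query(params)       # cache miss: hit the query engine
        self._store[key] = (today, result)
        return result
```

Combined with the access-log-driven refresh strategy mentioned later in the article, only charts people actually view need to be recomputed eagerly.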
Query engine migration: Early queries on Hive often exceeded 10 minutes. After migrating to ClickHouse, typical analyses return in seconds, and more than 85% of queries finish within 30 seconds.
Example of a ClickHouse query used for event analysis:
-- sensitive information filtered
SELECT
    AA.logDate,
    AA.flag,
    CAST(AA.indicator AS String) AS indicator,
    AA.private_source
FROM (
    SELECT
        log_date AS logDate,
        'A' AS flag,
        CAST(SUM(pv) AS Float64) AS indicator,
        extended_field_values[indexOf(extended_field_keys, 'source')] AS private_source
    FROM event_table_name
    WHERE log_date BETWEEN '20211209' AND '20211215'
        AND event_id = 'event_id'
        AND app_id = xx
    GROUP BY log_date, extended_field_values[indexOf(extended_field_keys, 'source')]
    ORDER BY SUM(pv) DESC
    LIMIT 5000
    UNION ALL
    -- B query (second branch, analogous to the A branch above)
) AA SETTINGS max_execution_time = 150;

Funnel analysis also uses ClickHouse's windowFunnel function. Example:
-- sensitive information filtered
SELECT level,
       uniq(buvid) AS cnt
FROM (
    SELECT buvid,
           windowFunnel(86400)(ctimes,
               event_id = 'event_id123',
               event_id = 'event_id456',
               event_id = 'event_id789') AS level
    FROM (
        SELECT buvid, event_id, ctimes
        FROM event_table_name1
        WHERE log_date BETWEEN '20220915' AND '20220915'
            AND event_id = 'event_id123'
            AND arrayExists(x -> splitByChar('`', x)[indexOf(extended_field_key, 'parent_area_id')] IN ('1'), extended_fields_value_1)
        UNION ALL
        SELECT buvid, event_id, ctimes
        FROM event_table_name1
        WHERE log_date BETWEEN '20220915' AND '20220915'
            AND event_id = 'event_id456'
        UNION ALL
        SELECT buvid, event_id, ctimes
        FROM event_table_name1
        WHERE log_date BETWEEN '20220915' AND '20220915'
            AND event_id = 'event_id789'
    )
    GROUP BY buvid
    SETTINGS distributed_group_by_no_merge = 1
) GROUP BY level;

Beyond event and funnel analysis, the platform provides drag-and-drop visual query building, data dashboards with caching and incremental refresh, and refresh-strategy optimization based on access logs to reduce server load.
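The core of windowFunnel(window)(t, cond1, cond2, ...) is: per user, find the longest ordered prefix of conditions matched within the time window opened by the first condition. The sketch below mimics that behavior in Python over made-up events; it is a simplified model (it ignores ClickHouse's strictness modes), not the platform's code.

```python
def window_funnel(window, events, steps):
    """Simplified windowFunnel: events is a list of (timestamp, event_id),
    steps is the ordered list of funnel event_ids. Returns the deepest
    level reached inside `window` seconds of a step-1 event."""
    best = 0
    for start_ts, start_ev in events:
        if start_ev != steps[0]:
            continue                      # a chain must open with step 1
        level, cursor = 1, start_ts
        for step in steps[1:]:
            # candidate timestamps for this step, in order and in window
            nxt = [t for t, e in events
                   if e == step and cursor <= t <= start_ts + window]
            if not nxt:
                break
            cursor = min(nxt)
            level += 1
        best = max(best, level)
    return best

# e.g. three steps completed within a day-long window:
# window_funnel(86400, [(0, 'A'), (10, 'B'), (20, 'C')], ['A', 'B', 'C']) -> 3
```

The outer GROUP BY level in the SQL above then counts distinct buvid per level, which is exactly the per-step population a funnel chart plots.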
Summary and future outlook: The platform now supports hundreds of Bilibili products, adding hundreds of new events weekly. Future work includes dynamic TTL lifecycle management for each event_id, automated generation of intermediate DWD tables, and deeper integration with AB testing and user-tag systems to further unlock data value.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.