
Design and Evolution of Zhihu's Event Tracking (埋点) System

This article presents a comprehensive overview of Zhihu's event‑tracking system, covering its motivation, toolset, demand‑management platform, verification workflow, data‑collection pipeline, query service architecture, cloud‑native data service design, and practical Q&A on best practices and optimization strategies.

DataFunSummit

With the continuous development of big data, data technology (DT), and AI, event tracking (埋点) has become a crucial data source for analysis and decision-making, especially in the AI era, where massive volumes of data are required to train models.

The talk is organized into eight parts: an introduction, an overview of tracking tools, demand‑management, verification, data collection, data query, data service, and a Q&A session.

Event‑tracking tools include SDKs and web‑request sniffers that help developers design, implement, and validate tracking points, improving efficiency and data quality.
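The shape of such a client-side SDK can be illustrated with a minimal sketch. The class and field names below (`Tracker`, `app_version`, `ts_ms`) are hypothetical, not Zhihu's actual SDK API: the essential idea is that the SDK enriches every event with common fields and buffers records until they are flushed to the collection endpoint.

```python
from __future__ import annotations

import json
import time
from typing import Any

class Tracker:
    """Minimal client-side tracking sketch: enrich events with
    common fields and buffer them until flush."""

    def __init__(self, app_version: str, user_id: str):
        self.common = {"app_version": app_version, "user_id": user_id}
        self.buffer: list[dict[str, Any]] = []

    def track(self, event: str, properties: dict[str, Any] | None = None) -> None:
        # Merge per-event properties over the shared common fields.
        record = {
            "event": event,
            "ts_ms": int(time.time() * 1000),
            **self.common,
            **(properties or {}),
        }
        self.buffer.append(record)

    def flush(self) -> list[str]:
        # A real SDK would POST these lines to the collection endpoint;
        # here we just serialize and clear the buffer.
        payload = [json.dumps(r, ensure_ascii=False) for r in self.buffer]
        self.buffer.clear()
        return payload

tracker = Tracker(app_version="1.2.3", user_id="u42")
tracker.track("page_view", {"page": "/question/123"})
lines = tracker.flush()
```

A web-request sniffer then only needs to intercept these serialized payloads in transit to let developers confirm that each tracking point fires with the expected fields.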

The tracking‑demand management platform at Zhihu evolved from version 1.0 to 2.0, focusing on cost reduction and efficiency. The new version consolidates multiple configuration steps into a single streamlined workflow, lowering the learning curve and speeding up design.

Verification moved from manual single‑point packet capture to a cloud‑native, high‑availability platform that uses message‑queue middleware, enabling stateless multi‑node deployment and faster, more reliable testing.
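The core of such a verification worker is a pure, stateless check of each event pulled off the queue against its declared schema, which is what makes multi-node deployment trivial. A minimal sketch, with an illustrative type-based schema rather than Zhihu's actual rule format:

```python
def verify_event(event: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the event passes.
    The schema maps required field names to expected Python types."""
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(
                f"bad type for {field}: {type(event[field]).__name__}"
            )
    return problems

schema = {"event": str, "user_id": str, "ts_ms": int}

# A well-formed event passes; a malformed one yields actionable errors.
ok = verify_event({"event": "click", "user_id": "u1", "ts_ms": 1}, schema)
bad = verify_event({"event": "click", "ts_ms": "not-an-int"}, schema)
```

Because the function holds no state between events, any number of workers can consume from the message queue in parallel, and a failed node can be replaced without losing verification progress.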

Data collection in version 1.0 relied on a Python‑based pipeline with local buffers and Kafka, suffering high latency and maintenance risk. Version 2.0 redesigns the pipeline with multi‑path message backup, reducing end‑to‑end latency to ~30 ms (about 1/15 of the previous time) and allowing horizontal scaling.
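The multi-path backup idea can be sketched as a producer that writes to a primary path and diverts to a secondary path on failure, so no event is dropped. The `FailoverProducer` class below is an illustrative abstraction, not the talk's actual implementation; the paths are injected as callables so the sketch stays self-contained.

```python
class FailoverProducer:
    """Sketch of multi-path message backup: try the primary write
    path first, and fall back to a secondary path on failure."""

    def __init__(self, primary, secondary):
        self.primary = primary      # e.g. the main Kafka cluster
        self.secondary = secondary  # e.g. a backup cluster or local spool

    def send(self, message: bytes) -> str:
        try:
            self.primary(message)
            return "primary"
        except Exception:
            # Primary path failed; divert to the backup path
            # instead of dropping the event.
            self.secondary(message)
            return "secondary"

primary_log, backup_log = [], []

def flaky_primary(msg: bytes) -> None:
    raise ConnectionError("primary unavailable")

healthy = FailoverProducer(primary_log.append, backup_log.append)
route_ok = healthy.send(b"event-1")

degraded = FailoverProducer(flaky_primary, backup_log.append)
route_backup = degraded.send(b"event-2")
```

Since each producer instance holds no shared state, adding nodes scales throughput horizontally, which is the property the 2.0 redesign relies on.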

Data query is provided via a web API that abstracts the underlying storage. It uses Doris for high-throughput dimensional queries and Presto on Hive for batch and real-time analytics, delivering fast and accurate results to product, regional, and other business stakeholders.
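Behind such an API, a routing layer typically picks the engine per query based on its latency and scan requirements. The function below is a hedged sketch of that idea; the threshold values and parameter names are illustrative assumptions, not Zhihu's actual routing rules.

```python
def route_query(freshness_seconds: int, scan_days: int) -> str:
    """Pick a query engine behind the unified web API.

    freshness_seconds: how fresh the data must be.
    scan_days: how much history the query scans.
    Thresholds are illustrative, not the platform's real values.
    """
    if freshness_seconds <= 60 and scan_days <= 7:
        # Low-latency dimensional queries on recent data -> Doris.
        return "doris"
    # Larger historical scans over Hive tables -> Presto.
    return "presto"

engine_realtime = route_query(freshness_seconds=10, scan_days=1)
engine_batch = route_query(freshness_seconds=3600, scan_days=90)
```

Keeping this decision server-side is what lets the storage layer evolve (new engines, repartitioned tables) without breaking API consumers.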

The data service layer integrates three core designs: data integration to lower heterogeneous source costs, logical models to avoid duplicated physical schemas and enable API‑driven access, and cloud‑native architecture to ensure high availability and seamless field‑change handling.
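The logical-model idea, one API-facing schema mapped onto heterogeneous physical tables, can be sketched as a simple field mapping. All table and column names below (`dwd_events`, `dim_user`, etc.) are invented for illustration; the point is that callers request logical fields and never see physical schemas, so a physical field change only requires updating the mapping.

```python
class LogicalModel:
    """Sketch of a logical model: map API-facing field names onto
    physical table columns across heterogeneous sources."""

    def __init__(self, field_map: dict):
        # logical field name -> "physical_table.column"
        self.field_map = field_map

    def to_physical(self, logical_fields: list) -> dict:
        """Group requested logical fields by the physical table
        that actually stores them, forming a simple access plan."""
        plan = {}
        for field in logical_fields:
            table, column = self.field_map[field].split(".")
            plan.setdefault(table, []).append(column)
        return plan

model = LogicalModel({
    "event_name": "dwd_events.event",
    "user": "dwd_events.user_id",
    "region": "dim_user.region_code",
})
plan = model.to_physical(["event_name", "region"])
```

If the physical column `dim_user.region_code` were renamed or moved, only the mapping changes; every API consumer keeps requesting `region`, which is the "seamless field-change handling" the cloud-native design aims for.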

The Q&A covers topics such as who is responsible for parameter design, client-side versus server-side session reporting, the characteristics of a good tracking system, aligning tracking versions with product releases, and cost optimization through lifecycle management of tracking points and warehouse tables.

Overall, the presentation demonstrates how a modern, cloud‑native event‑tracking platform can support large‑scale data collection, high‑quality verification, and efficient querying, thereby empowering data‑driven product and operation decisions.

Tags: cloud-native, Big Data, data collection, software engineering, event tracking, data services
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
