Building and Managing Huolala's User Event Tracking System: Architecture, Governance, and Monitoring
This article details Huolala's user event tracking (埋点) system, covering its background, challenges, the construction of a four‑module management platform, backend SDK design, monitoring and quality assurance mechanisms, and future plans for service integration, data lineage, and governance optimization.
Background and Challenges
Huolala's user event tracking data is a core asset for growth, product optimization, and decision‑making. Rapid business expansion exposed three main pain points: lack of demand control on the business side, missing backend reporting, and high QA regression costs.
Capability Building
To address these issues, a custom event‑tracking management platform was built with four modules: demand management, version management, event management, and attribute management. The platform standardizes event design, tracks versioned metadata, and links events to their originating requirements, enabling traceability and controlled incremental tracking.
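To make the traceability idea concrete, here is a minimal sketch of the kind of metadata the four modules might manage. All class and field names are illustrative assumptions, not Huolala's actual schema; the point is that each event record carries its originating requirement and version, so lineage is one lookup away.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    # Attribute management: name, declared type, and whether it is required
    name: str
    type: str            # e.g. "string", "int"
    required: bool = False

@dataclass
class Event:
    # Event management: the event plus links into demand and version management
    event_id: str
    requirement_id: str  # ties the event back to the originating demand
    version: str         # release version under which the event was introduced
    attributes: list = field(default_factory=list)

registry: dict[str, Event] = {}

def register_event(ev: Event) -> None:
    registry[ev.event_id] = ev

def trace(event_id: str) -> tuple[str, str]:
    """Return (requirement_id, version) for a registered event."""
    ev = registry[event_id]
    return ev.requirement_id, ev.version
```

With a registry like this, incremental tracking stays controlled: an event cannot be registered without naming the requirement and version that justify it.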
Backend SDK and Data Pipeline
A backend SDK reports events via internal HTTP to a collection service, which retrieves metadata from Redis for validation, tagging, and distribution. Validated events are dispatched to downstream Flink jobs and stored in Doris and Hive for analysis. An ACK and retry mechanism ensures data consistency.
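The ACK-and-retry mechanism can be sketched as follows. This is a minimal, hedged illustration, not Huolala's SDK: the `send` callable stands in for the internal HTTP POST to the collection service, and the ACK is modeled as a boolean return value. Retry counts and backoff are assumed parameters.

```python
import time

def report_with_retry(event: dict, send, max_retries: int = 3,
                      backoff: float = 0.1) -> bool:
    """Send an event and retry until the collection service ACKs it.

    `send` is a placeholder for the internal HTTP call; it returns a
    truthy ACK once the service has accepted the event.
    """
    for attempt in range(max_retries):
        try:
            if send(event):                       # ACK received
                return True
        except Exception:
            pass                                  # transient transport error
        time.sleep(backoff * (2 ** attempt))      # exponential backoff
    return False                                  # give up; caller can log/queue
```

The key design point is that the sender, not the collector, owns delivery: an event is only considered reported once an ACK arrives, which is what gives the pipeline its consistency guarantee.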
Monitoring and Assurance
Incremental and stock event monitoring provides visibility into PV/UV trends and data quality. A dedicated quality monitoring system (the "Dayu" system) visualizes core event health, while real‑time dashboards track error rates and version changes. Events are classified into four governance levels (A‑invalid, B‑general, C‑important, D‑core) to guide filtering and storage decisions.
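The four governance levels can drive a simple routing rule at ingestion time. The sketch below is an assumption about how such filtering might look; the level-to-destination mapping (dropping A-level events, sending D-level core events to hot storage such as Doris and the rest to Hive) is illustrative, only the A/B/C/D classification comes from the talk.

```python
GOVERNANCE_LEVELS = {"A": "invalid", "B": "general", "C": "important", "D": "core"}

def route(event: dict) -> str:
    """Decide an event's fate from its governance level (hypothetical policy)."""
    level = event.get("level", "B")      # default to general if unclassified
    if level == "A":
        return "drop"                    # invalid events are filtered out
    if level == "D":
        return "hot_store"               # e.g. Doris, for real-time dashboards
    return "warm_store"                  # e.g. Hive, for batch analysis
```

Classifying once and routing by level keeps storage costs proportional to event value rather than raw volume.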
Future Outlook
The roadmap focuses on three areas: integrating self‑built services with the purchased analytics platform, establishing data lineage for downstream traceability, and advancing from passive to proactive governance to standardize event definition across teams.
Q&A Highlights
The session covered evaluation of event accuracy, handling of invalid schema data via a separate Kafka topic, linking anonymous and logged‑in user actions through ID replacement, granularity of event and attribute versioning, and tools for quickly locating event implementations for analysts.
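The ID-replacement idea from the Q&A can be sketched in a few lines. This is a minimal assumption of how the linkage might work, with hypothetical names: once a device's anonymous ID is bound to a user ID at login, subsequent (and, in a backfill pass, earlier) anonymous events can be attributed to that user.

```python
id_map: dict[str, str] = {}  # anonymous_id -> user_id, populated at login

def on_login(anonymous_id: str, user_id: str) -> None:
    """Record the binding when an anonymous device logs in."""
    id_map[anonymous_id] = user_id

def resolve(event: dict) -> dict:
    """Replace the anonymous ID with the real user ID when a binding exists."""
    anon = event.get("anonymous_id")
    if anon in id_map:
        return {**event, "user_id": id_map[anon]}
    return event  # still anonymous; leave untouched
```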
Published by DataFunSummit, the official account of the DataFun community, which shares big data and AI industry summit news and speaker talks.