How ByteDance Guarantees Real‑Time Data Point Quality with Scalable Validation
This article explains ByteDance's end‑to‑end data‑point (event tracking) validation system. It covers the system's five technical challenges (usability, accuracy, real‑time visibility, stability, and extensibility) along with SDK integration, the QR‑code workflow, JSON Schema verification, the push‑service architecture, SLA metrics, and future automation plans.
Data‑point collection is the foundation of recommendation, search, and product optimization, making its quality critical. To ensure high‑quality data, ByteDance has built a comprehensive data‑point validation system that addresses five key technical challenges: usability, accuracy, real‑time visibility, stability, and extensibility.
Technical Challenges
Usability – quickly integrate and start validation.
Accuracy – ensure validation results are trustworthy.
Real‑time – make data visible instantly.
Stability – guarantee no data loss.
Extensibility – easily add new data formats.
SDK Integration
The SDK provides a "validation switch" that can be enabled per environment. When the switch is on, data is duplicated and sent to the validation platform without affecting business logic. SDKs are available for Android, iOS, Go, Java, Python, and JavaScript, and a browser plugin offers one‑click activation.
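The switch behaviour described above can be sketched in a few lines. This is a minimal illustration, not the actual SDK: the `Tracker` class, its `track` method, and the two sender callbacks are hypothetical names chosen for the example.

```python
import copy
from typing import Callable, Dict, List

Event = Dict[str, object]
Sender = Callable[[Event], None]


class Tracker:
    """Hypothetical tracker: always reports, optionally mirrors to validation."""

    def __init__(self, report: Sender, validate: Sender, validation_on: bool = False):
        self._report = report
        self._validate = validate
        self.validation_on = validation_on

    def track(self, event: Event) -> None:
        # The production report path runs regardless of the switch.
        self._report(event)
        if not self.validation_on:
            return
        try:
            # Send a deep copy so the validation side cannot mutate the original.
            self._validate(copy.deepcopy(event))
        except Exception:
            # A validation-side failure must never reach business logic.
            pass


production: List[Event] = []
validation: List[Event] = []

tracker = Tracker(production.append, validation.append)
tracker.track({"event": "click"})   # switch off: production only
tracker.validation_on = True
tracker.track({"event": "click"})   # switch on: mirrored as well
```

Swallowing every exception on the validation path is a deliberate simplification here; a real SDK would more likely log the failure somewhere that does not block the caller.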
QR‑Code Connection Flow
Establish a WebSocket long‑connection between the server and validation platform.
The platform generates a QR code using ws_id.
The client scans the QR code.
The client retrieves device info and enables the validation switch.
The client reports device_id to the server.
The server pushes device_id back to the platform.
The platform starts the validation phase.
The client begins reporting data‑points.
The server forwards data‑points to the platform.
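The flow above boils down to binding a ws_id to a device_id and then routing that device's data points to the right platform session. The sketch below simulates this in memory, with no real WebSocket involved; `Server`, `PlatformSession`, and their method names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PlatformSession:
    """One validation page, identified by the ws_id encoded in its QR code."""
    ws_id: str
    device_id: Optional[str] = None
    received: List[dict] = field(default_factory=list)


class Server:
    """Hypothetical relay between reporting clients and platform sessions."""

    def __init__(self) -> None:
        self._by_ws: Dict[str, PlatformSession] = {}
        self._by_device: Dict[str, PlatformSession] = {}

    def connect(self, ws_id: str) -> PlatformSession:
        # The platform opens its long connection and shows a QR code for ws_id.
        session = PlatformSession(ws_id)
        self._by_ws[ws_id] = session
        return session

    def bind(self, ws_id: str, device_id: str) -> None:
        # After scanning, the client reports its device_id; the server
        # pushes it back to the session that owns the scanned ws_id.
        session = self._by_ws[ws_id]
        session.device_id = device_id
        self._by_device[device_id] = session

    def report(self, device_id: str, event: dict) -> bool:
        # Forward data points only for devices that are under validation.
        session = self._by_device.get(device_id)
        if session is None:
            return False
        session.received.append(event)
        return True


server = Server()
page = server.connect("ws-123")
server.bind("ws-123", "device-abc")
forwarded = server.report("device-abc", {"event": "click"})
ignored = server.report("device-other", {"event": "click"})  # never scanned
```

Keeping a device-to-session map means the server can tell which incoming data points belong to a validation session without inspecting their payload.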
Validation Engine – Accuracy
The engine uses JSON Schema to validate data‑point formats, ensuring reliable results. Example schema:
<code>{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "type": "object",
  "properties": {
    "params": {
      "type": "object",
      "properties": {
        "duration": {"type": "integer"},
        "enter_from": {"type": "string", "enum": ["login"]},
        "type": {"type": "integer", "enum": [1, 2, 3]}
      },
      "required": ["duration", "enter_from", "type"]
    }
  },
  "required": ["params"]
}</code>
Sample data‑point:
<code>{
  "app_id": 100,
  "event": "click",
  "params": {
    "enter_from": "login",
    "duration": 1,
    "type": 3
  }
}</code>
Real‑time Delivery
A custom Push service built on WebSocket provides a stable, full‑duplex channel, supporting heartbeats, shared connections across business modules, and monitoring interfaces.
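Heartbeats are what let a long-connection service notice that a peer has gone silent. The sketch below shows one plausible bookkeeping scheme: record the last heartbeat per connection and evict connections that miss the window. `HeartbeatRegistry` and the 30-second timeout are assumptions for the example, not details of ByteDance's Push service.

```python
from typing import Dict, List


class HeartbeatRegistry:
    """Tracks the last heartbeat per connection and evicts silent ones."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, conn_id: str, now: float) -> None:
        # Called whenever a heartbeat frame arrives on a connection.
        self._last_seen[conn_id] = now

    def sweep(self, now: float) -> List[str]:
        """Drop and return connections that missed the heartbeat window."""
        stale = [c for c, t in self._last_seen.items() if now - t > self.timeout_s]
        for conn_id in stale:
            del self._last_seen[conn_id]
        return stale

    def alive(self) -> List[str]:
        return sorted(self._last_seen)


# Timestamps are passed in explicitly so the example is deterministic.
registry = HeartbeatRegistry(timeout_s=30)   # assumed timeout
registry.beat("conn-a", now=0)
registry.beat("conn-b", now=0)
registry.beat("conn-a", now=25)              # conn-a keeps beating
expired = registry.sweep(now=40)             # conn-b has been silent for 40 s
```

A list like `alive()` is also the kind of information an HTTP admin endpoint for status checks could expose.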
Push Service Goals
Implement a universal long‑connection protocol based on WebSocket.
Enable stable, reliable full‑duplex communication between client and server.
Provide a common SDK for clients and an access layer for servers.
Facilitate easy future service integration.
Expose HTTP admin APIs for monitoring and status checks.
Push Service Advantages
Connection stability – decouples business logic from push layer.
Service isolation – isolates traffic per business.
Horizontal scalability – add instances as traffic grows.
SLA and Reliability Measures
The service targets 99.9% availability, and p99 latency above 3 seconds counts as unavailable (typical p99 is about 1 second). Supporting measures include log conversion plugins, fine‑grained QPS monitoring, threshold‑based alerts (warning at 50% of maximum QPS, critical at 70%), per‑app rate limiting, and automatic degradation of non‑critical validation when traffic spikes.
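The thresholds quoted above translate directly into two small checks. The sketch below is illustrative only: measuring availability over fixed windows and treating each threshold as inclusive are assumptions, since the article does not say how either is evaluated.

```python
from typing import Sequence

P99_LIMIT_S = 3.0      # from the text: p99 above 3 s counts as unavailable
WARNING_RATIO = 0.5    # from the text: warning at 50% of max QPS
CRITICAL_RATIO = 0.7   # from the text: critical at 70% of max QPS


def availability(p99_per_window: Sequence[float]) -> float:
    """Fraction of measurement windows whose p99 stayed within the limit."""
    if not p99_per_window:
        return 1.0
    ok = sum(1 for p99 in p99_per_window if p99 <= P99_LIMIT_S)
    return ok / len(p99_per_window)


def alert_level(qps: float, max_qps: float) -> str:
    """Classify current load against the configured maximum QPS."""
    ratio = qps / max_qps
    if ratio >= CRITICAL_RATIO:
        return "critical"
    if ratio >= WARNING_RATIO:
        return "warning"
    return "ok"


# 999 healthy windows and one slow window land exactly on the 99.9% target.
windows = [1.0] * 999 + [3.5]
measured = availability(windows)
levels = [alert_level(q, max_qps=1000) for q in (300, 600, 800)]
```

Raising the warning well before the critical threshold leaves room to rate-limit individual apps or degrade non-critical validation before the service itself is at risk.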
Future Outlook
While pre‑deployment validation works well for frequently changing data‑points, core data‑points require higher‑cost verification. ByteDance is exploring automated regression validation, post‑deployment validation with quality scoring models, and a full‑link data‑point quality assurance pipeline that combines pre‑, regression, and post‑validation stages.
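As a rough idea of what automated regression validation could look like, the sketch below replays recorded data points against the example schema from the Validation Engine section and reports which ones fail. The article does not describe the planned design, so everything here is hypothetical. The `check` function handles only the keywords that schema uses (`type`, `enum`, `required`, `properties`) and is not a full JSON Schema implementation.

```python
from typing import Any, Dict, List

# Only the types that appear in the example schema are mapped.
_TYPES = {"object": dict, "string": str, "integer": int}

SCHEMA = {
    "type": "object",
    "properties": {
        "params": {
            "type": "object",
            "properties": {
                "duration": {"type": "integer"},
                "enter_from": {"type": "string", "enum": ["login"]},
                "type": {"type": "integer", "enum": [1, 2, 3]},
            },
            "required": ["duration", "enter_from", "type"],
        }
    },
    "required": ["params"],
}


def check(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return human-readable errors for the supported keyword subset."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python but is not a JSON integer.
        is_bool_as_int = expected == "integer" and isinstance(value, bool)
        if not isinstance(value, _TYPES[expected]) or is_bool_as_int:
            return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required property")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub, f"{path}.{key}"))
    return errors


def regression_report(recorded: List[dict], schema: Dict[str, Any]) -> Dict[int, List[str]]:
    """Map the index of each failing recorded data point to its errors."""
    report: Dict[int, List[str]] = {}
    for index, event in enumerate(recorded):
        errors = check(event, schema)
        if errors:
            report[index] = errors
    return report


recorded = [
    # The sample data point from the article: passes.
    {"app_id": 100, "event": "click",
     "params": {"enter_from": "login", "duration": 1, "type": 3}},
    # A made-up regression: wrong enter_from value and no type field.
    {"app_id": 100, "event": "click",
     "params": {"enter_from": "home", "duration": 1}},
]
report = regression_report(recorded, SCHEMA)
```

In practice a maintained JSON Schema library would replace the hand-written `check`; the subset checker is used here only to keep the example free of third-party dependencies.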
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.