
Applying ClickHouse for Real‑Time Advertising Audience Estimation at ByteDance

This article details how ByteDance leverages ClickHouse to power large‑scale advertising audience estimation, profiling, and statistical analysis, describing the challenges of massive data, strict latency requirements, and the evolution from a simple tag‑uid table to a bitmap‑based architecture with extensive parallel and cache optimizations.

DataFunTalk

ByteDance's advertising platform processes billions of users and uses ClickHouse as the core engine for online analysis, covering audience estimation, profiling, and statistical analysis.

Audience estimation requires fast set operations (intersection, union, complement) on large user groups, with a response time under 5 seconds.

Challenges include massive data volume, complex queries, and strict latency.

ClickHouse was chosen over Druid, Elasticsearch and Spark for its speed on wide tables and flexible architecture.

Version 1 stores tag‑uid pairs in a two‑column table and translates set operations into SQL with sub‑queries; optimizations focus on parallel execution and fast distinct counting.

Example: the expression A & (B | C) becomes

SELECT count(DISTINCT uid)
FROM tag_uid_map
WHERE tag_id = A
  AND uid IN (
    -- users carrying tag B or tag C
    SELECT DISTINCT uid
    FROM tag_uid_map
    WHERE tag_id = B OR tag_id = C
  )
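
To make the Version 1 translation concrete, here is a minimal sketch that runs the same query shape against a small in-memory table. It uses Python's built-in sqlite3 module purely as a stand-in for ClickHouse, and the function name, integer tag ids, and sample rows are hypothetical illustrations rather than anything from ByteDance's system.

```python
import sqlite3


def estimate_audience(rows, a, b, c):
    """Count distinct uids that carry tag `a` and at least one of tags `b`, `c`.

    `rows` is an iterable of (tag_id, uid) pairs, mirroring the two-column
    tag_uid_map table. SQLite is only a stand-in for ClickHouse here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tag_uid_map (tag_id INTEGER, uid INTEGER)")
    conn.executemany("INSERT INTO tag_uid_map VALUES (?, ?)", rows)
    # A & (B | C): filter on A, then keep uids found in the B-or-C sub-query.
    query = """
        SELECT count(DISTINCT uid)
        FROM tag_uid_map
        WHERE tag_id = ?
          AND uid IN (
              SELECT DISTINCT uid
              FROM tag_uid_map
              WHERE tag_id = ? OR tag_id = ?
          )
    """
    (count,) = conn.execute(query, (a, b, c)).fetchone()
    conn.close()
    return count


# Hypothetical sample data: (tag_id, uid) pairs.
SAMPLE_ROWS = [
    (1, 100), (1, 101), (1, 102), (1, 103),  # tag 1 plays the role of A
    (2, 101), (2, 104),                      # tag 2 plays the role of B
    (3, 101), (3, 102),                      # tag 3 plays the role of C
]
```

Calling `estimate_audience(SAMPLE_ROWS, 1, 2, 3)` counts the users in A who also appear in B or C. Every additional operator in the expression adds another sub-query over the full pair table, which is the cost the bitmap design is meant to avoid.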

Version 2 replaces detailed storage with a Bitmap64 column using RoaringBitmap, reducing space and simplifying queries; further optimizations include data sharding, parallel bitmap computation, cache layers, and low‑level instruction acceleration.
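
The sharded bitmap idea can be sketched as follows. This is an illustration under stated assumptions, not the production design: plain Python integers stand in for RoaringBitmap, uids are assumed to be non-negative integers routed to shards by `uid % num_shards`, and the shard count and all function names are hypothetical. Because each uid lives in exactly one shard, the per-shard cardinalities of A & (B | C) can be computed independently and summed.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def build_sharded_bitmaps(tag_to_uids, num_shards=NUM_SHARDS):
    """Return shards[s][tag] -> int used as a bitmap.

    A uid is routed to shard uid % num_shards and occupies bit
    uid // num_shards inside that shard's bitmap for the tag.
    """
    shards = [dict() for _ in range(num_shards)]
    for tag, uids in tag_to_uids.items():
        for uid in uids:
            shard = shards[uid % num_shards]
            shard[tag] = shard.get(tag, 0) | (1 << (uid // num_shards))
    return shards


def shard_count(shard, a, b, c):
    """Cardinality of A & (B | C) within a single shard."""
    bitmap = shard.get(a, 0) & (shard.get(b, 0) | shard.get(c, 0))
    return bin(bitmap).count("1")


def estimate_audience(shards, a, b, c):
    """Evaluate each shard independently and add up the partial counts."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda s: shard_count(s, a, b, c), shards))
```

The thread pool only shows the shape of the computation; in CPython the bitwise work itself will not run in parallel. The point is that the set expression becomes a couple of bitwise operations per shard, with no sub-queries and no cross-shard merge of user ids.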

Extensive engineering changes to the read‑execute model, block size, secondary indexes, and caching dramatically cut query latency, storage size, and resource usage, achieving sub‑5‑second response for most queries.

Future work will target deeper computation and data optimizations, smarter caching, and richer expression support.

Tags: advertising, big data, ClickHouse, Database Optimization, bitmap index, Audience Estimation