
Marketing Effectiveness Analysis with ClickHouse at Tencent Games

This article summarizes Tencent Games' marketing effectiveness analysis workflow: the business background, the multi‑dimensional analytical challenges, the evolution of the data stack from MySQL to ClickHouse, practical optimizations, lessons on hotspot handling, Zookeeper usage, bottlenecks, and migration, and plans for broader ClickHouse adoption.

DataFunTalk

Marketing activities are a key lever for game operations, requiring rapid, multi‑dimensional analysis of massive logs to evaluate effectiveness. The talk outlines the typical user journey from ad exposure to game entry and enumerates common analyst requests such as real‑time exposure, item distribution, ad‑slot performance, user return, and high‑value player analysis.

The technical challenge lies in handling numerous campaigns with varying cycles and dozens of metrics, demanding fast, de‑duplicated, multi‑dimensional queries on billions of rows. The solution evolved from direct MySQL queries to HDFS + Hive/Spark for large‑scale batch processing, then to pre‑computed KV stores for near‑real‑time needs, and finally to an OLAP‑oriented columnar database, ClickHouse.
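The "de‑duplicated, multi‑dimensional" requirement above is the crux: the same user appears in many log rows, yet each campaign dimension needs an exact distinct‑user count. A minimal sketch of that computation (field names like `user_id`, `campaign`, and `ad_slot` are illustrative, not Tencent's actual schema):

```python
from collections import defaultdict

def distinct_users_by_dimension(events, dimensions):
    """Count de-duplicated users per value of each analysis dimension.

    `events` is an iterable of dicts such as
    {"user_id": ..., "campaign": ..., "ad_slot": ...}.
    """
    # One set of user ids per (dimension, value) pair gives exact
    # de-duplication -- the operation that becomes expensive at
    # billions of rows and motivates a columnar OLAP engine.
    buckets = defaultdict(set)
    for event in events:
        for dim in dimensions:
            buckets[(dim, event[dim])].add(event["user_id"])
    return {key: len(users) for key, users in buckets.items()}
```

In ClickHouse the same result comes from a `GROUP BY` with an exact‑distinct aggregate over a columnar scan, which is why the engine fits this workload.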

ClickHouse, an open‑source columnar DBMS from Yandex, delivers roughly 100–1000× the analytical performance of traditional row‑oriented databases, scanning hundreds of millions to billions of rows per second. Its OLAP characteristics—read‑heavy workloads, wide tables, high‑throughput bulk writes, and flexible schema—match the marketing analytics requirements.

Practical ClickHouse optimizations include: (1) cluster throttling with priority queues to limit concurrent heavy queries; (2) BadSQL detection that parses incoming SQL, compares with historical patterns, and splits large scans into smaller sub‑queries; (3) data preprocessing to mitigate write‑side hot‑spotting by sharding logs before ingestion; and (4) leveraging Kafka and Storm for real‑time stream processing alongside Spark for offline batch jobs.
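Optimization (2), splitting large scans into sub‑queries, can be sketched as a date‑range splitter: a BadSQL guard rewrites one huge time‑range scan into per‑window sub‑queries whose partial results are merged afterwards. The daily granularity and threshold here are assumptions, not values from the talk:

```python
from datetime import date, timedelta

def split_scan(start, end, max_days=1):
    """Split an inclusive [start, end] date range into smaller windows.

    Each window becomes one sub-query; running them sequentially (or
    with bounded parallelism) caps the resources any single query
    can consume on the cluster.
    """
    windows = []
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=max_days - 1), end)
        windows.append((cur, stop))
        cur = stop + timedelta(days=1)
    return windows
```

For example, a 30‑day scan with `max_days=1` becomes 30 single‑day sub‑queries, each cheap enough to pass the throttling layer described in optimization (1).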

Experience sharing covers hotspot data handling via pre‑sharding and isolated deployments, reducing Zookeeper dependency by limiting table counts and using snapshots, addressing single‑node bottlenecks through shard‑level aggregation and hardware scaling, and managing data migration with clickhouse‑copier or the remote() table function.
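The pre‑sharding idea is to pick the shard from a stable key before ingestion, so a hot campaign's writes spread evenly instead of landing on one node. A minimal sketch, assuming md5‑based bucketing (an illustrative choice, not the scheme from the talk):

```python
import hashlib

def shard_for(key, num_shards):
    """Map a log record's key (e.g. a user id) to a shard index.

    Hashing yields a deterministic, roughly uniform distribution, so
    all rows for one key land on the same shard while different keys
    spread across the cluster.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Because the mapping is per‑key rather than per‑campaign, even a single viral campaign's traffic fans out across all shards, which also enables the shard‑level aggregation mentioned for single‑node bottlenecks.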

Future directions aim to expand ClickHouse usage to user behavior analysis, real‑time data warehouses, and live dashboards, improve ecosystem tools for multi‑source ingestion and export, enhance operational tooling, and adopt containerized deployments for elastic scaling and better resource isolation.

The Q&A addresses cluster rebalancing during scaling and result caching strategies, followed by a brief closing thank‑you.
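One way to realize the result caching discussed in the Q&A is a small TTL cache keyed by the query text, so repeated dashboard queries skip the cluster entirely while stale results expire. The eviction policy and TTL below are assumptions, not details from the talk:

```python
import time

class ResultCache:
    """Tiny TTL cache for query results, keyed by SQL text."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, sql):
        """Return the cached result, or None on miss/expiry."""
        hit = self._store.get(sql)
        if hit is None:
            return None
        value, stamp = hit
        if time.monotonic() - stamp > self.ttl:
            del self._store[sql]  # lazily evict expired entries
            return None
        return value

    def put(self, sql, result):
        self._store[sql] = (result, time.monotonic())
```

A short TTL keeps dashboards near‑real‑time while absorbing the burst of identical queries that arrives when many analysts open the same report.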

Tags: Data Engineering, big data, ClickHouse, OLAP, marketing analytics, game operations
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
