
Optimizing Game Event Reporting with Stream Processing to Overcome ClickHouse Performance Bottlenecks

Faced with ClickHouse query times ballooning to over an hour for massive game‑event data, the team replaced the DB‑pull model with a stream‑processing pipeline that evaluates trigger rules in real time, cuts batch queries by 60 %, and brings reporting latency down to minutes.

37 Interactive Technology Team

Recently, while optimizing an in-game user behavior event reporting project, we ran into performance bottlenecks in the traditional architecture, especially in data query performance. The volume of user behavior event data is massive, with tables reaching hundreds of billions of rows.

Even though we employed ClickHouse, a high-performance analytical database, it provided only limited relief. Unable to keep adding hardware and facing immediate business impact, we reconsidered the approach (thanks to suggestions from senior engineers) and changed the data-fetching method from pulling from the DB to a direct stream-processing model. This eliminated the intermediate DB query layer, dramatically increasing processing efficiency and getting us out of the "dead-end" situation.

Original data flow: user behavior data is ETL-cleaned and stored in a database; programs then query the data according to predefined rules, aggregate the results, and report them to third-party services.
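The pull model above can be sketched in a few lines. This is an illustrative outline only, assuming a hypothetical rule list and placeholder `run_query` / `report_to_third_party` functions standing in for the ClickHouse client and the downstream reporting call; none of these names come from the team's actual code.

```python
# Sketch of the original pull model: each batch cycle runs one ClickHouse
# query per rule, then ships the aggregated result to the third party.

RULES = [
    {"id": 1, "sql": "SELECT user_id, count() FROM events "
                     "WHERE event = 'login' GROUP BY user_id"},
    {"id": 2, "sql": "SELECT user_id, sum(amount) FROM events "
                     "WHERE event = 'pay' GROUP BY user_id"},
]

def run_query(sql):
    # Placeholder for a ClickHouse client call
    # (e.g. clickhouse-driver's client.execute(sql)).
    return []

def report_to_third_party(rule_id, rows):
    # Placeholder for the downstream reporting call.
    pass

def batch_cycle():
    # One DB query per rule, every batch — the cost that ballooned
    # past an hour once the shared cluster came under merge pressure.
    for rule in RULES:
        rows = run_query(rule["sql"])
        report_to_third_party(rule["id"], rows)
```

With thousands of rules, each cycle issues thousands of such queries, which is why DB latency dominated end-to-end reporting time.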

The flow seemed normal and stable for a while, but recent incidents caused data pipeline instability, leading to delayed reporting. Investigation revealed that ClickHouse query times grew from under 15 minutes per batch to over an hour, far exceeding the business requirement of 30 minutes.

Root‑cause analysis with the DBA showed that the ClickHouse cluster is shared with other projects and receives massive update workloads, triggering heavy merge operations. Large‑scale updates are a known performance killer for ClickHouse.

After various optimizations (caching, reducing point queries, lowering concurrency, cleaning historical data) the performance gains were still insufficient. We decided to abandon DB-based queries and implement stream processing at the ETL node: rules that can be evaluated in a streaming fashion are identified, and the matching user-behavior records are sent directly to the reporting queue.

There are two rule types: aggregation rules, which still require batch queries, and trigger rules, which can be processed instantly in a stream without aggregation.
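A minimal sketch of how trigger rules can be evaluated per record as events pass through the ETL node, assuming each trigger rule reduces to a predicate over a single event. The rule IDs, predicates, and `process_stream_record` function are illustrative assumptions, not the team's implementation.

```python
# Trigger rules: each is a predicate over one event record, so it can be
# evaluated inline in the stream with no aggregation and no DB query.
TRIGGER_RULES = [
    (101, lambda e: e.get("event") == "first_login"),
    (102, lambda e: e.get("event") == "pay" and e.get("amount", 0) >= 100),
]

def process_stream_record(event, reporting_queue):
    """Evaluate all trigger rules against one record during ETL."""
    for rule_id, predicate in TRIGGER_RULES:
        if predicate(event):
            # Matching records go straight to the reporting queue,
            # bypassing ClickHouse entirely.
            reporting_queue.append({"rule_id": rule_id, "event": event})

queue = []
process_stream_record({"event": "pay", "amount": 150, "user_id": "u1"}, queue)
# queue now holds one match, for rule 102
```

Aggregation rules (counts, sums over a window) cannot be handled this way without windowed state, which is why they stayed on the batch path.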

Benefits – Non‑streaming rule part: the number of ClickHouse batch query tasks dropped from >7,000 to ~2,900 (≈60 % reduction), and total batch execution time fell from >1 hour to roughly 15 minutes.

Streaming rule part: the previous batch‑pull‑then‑report model was replaced by near‑real‑time reporting, greatly improving timeliness.

Practice summary:

1) About 60 % of the rules have been converted to streaming; the remaining 40 % still rely on DB aggregation and can be further optimized.

2) Stream processing handles each record as it arrives; any enrichment data must be prepared in advance (e.g., preload basic info into a fast cache); otherwise per-record DB lookups will remain a bottleneck.
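Point (2) can be sketched as a preloaded enrichment cache. In production this would typically be Redis or a local in-process cache; the plain dict and the field names (`channel`, `register_ts`) below are illustrative stand-ins.

```python
# Enrichment cache, bulk-loaded before the stream starts consuming,
# so the hot path never makes a synchronous DB call per record.
user_info_cache = {}

def preload_cache(rows):
    """Bulk-load basic user info (one pass, before stream start)."""
    for row in rows:
        user_info_cache[row["user_id"]] = row

def enrich(event):
    """Attach cached user info to an event; on a cache miss the event
    passes through unenriched rather than triggering a DB lookup."""
    info = user_info_cache.get(event["user_id"])
    if info:
        event["channel"] = info.get("channel")
        event["register_ts"] = info.get("register_ts")
    return event

preload_cache([{"user_id": "u1", "channel": "ios",
                "register_ts": 1700000000}])
enriched = enrich({"user_id": "u1", "event": "login"})
```

The design choice is to trade a bounded warm-up cost and some staleness for a streaming path with no DB dependency.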

3) ClickHouse is not suitable for massive update workloads; if data needs to be persisted, consider alternative storage such as Hologres.

Tags: performance optimization, Big Data, data pipeline, stream processing, ClickHouse, game analytics