NetEase Game Streaming ETL Architecture and Practices Based on Flink
This article presents NetEase Game's streaming ETL solution built on Flink, covering business background, log characteristics, specialized and generic ETL services, architectural evolution, Python UDF integration, runtime optimizations, fault‑tolerance mechanisms, and future roadmap for unified real‑time and offline data warehouses.
NetEase Game's data integration relies on a streaming ETL pipeline powered by Flink, transforming heterogeneous game logs—operational, business, and program logs—into structured data for both real‑time and offline warehouses.
The system handles challenges such as schema‑free sources (e.g., MongoDB), deeply nested fields, high log variety, and frequent schema changes, requiring flexible parsing, transformation, and robust error handling.
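Deeply nested fields are typically projected into flat column names before landing in the warehouse. A minimal sketch of that flattening step (the `flatten` helper and dot-separated naming are illustrative, not NetEase's actual code):

```python
from typing import Any, Dict

def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dot-separated column names."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

log = {"event": "login", "player": {"id": 42, "device": {"os": "android"}}}
print(flatten(log))
# {'event': 'login', 'player.id': 42, 'player.device.os': 'android'}
```

With schema-free sources such as MongoDB, running this per record (rather than against a fixed schema) is what lets the pipeline absorb new nested fields without redeployment.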
Three ETL services are offered: a dedicated operational‑log ETL with custom logic, the generic EntryX ETL for all other text logs, and ad‑hoc jobs for special cases. EntryX defines Source, StreamingTable, and Sink modules, automatically generating Flink jobs from user configurations.
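To make the Source/StreamingTable/Sink split concrete, a hypothetical EntryX job configuration might look like the following (every field name and path here is illustrative; the real configuration format is not published in this article):

```yaml
# Hypothetical EntryX job configuration (all names illustrative)
source:
  type: kafka
  topic: game-ops-log
streaming_tables:
  - name: player_login
    filter: "event == 'login'"
    fields:
      - {name: player_id, path: player.id, type: bigint}
sinks:
  - {type: kafka, target: rt-warehouse}                  # real-time warehouse
  - {type: hdfs,  target: /warehouse/ods/player_login}   # offline Hive partition
```

From such a declaration, EntryX generates the corresponding Flink job, so users describe *what* to extract rather than writing DataStream code.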
Architectural evolution progressed from Hadoop Streaming (Python scripts) to Spark Streaming (POC) and finally to Flink DataStream, preserving Python UDFs via a Jython‑based Runner layer that executes cross‑language functions within the JVM.
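The Runner layer's contract can be modeled as a registry that looks up user functions by name and invokes them per record. The sketch below simulates that contract in plain Python; in the real system a Jython interpreter embedded in the JVM plays this role, and the class and method names here are illustrative:

```python
from typing import Any, Callable, Dict

class UdfRunner:
    """Toy model of the Runner layer: user-supplied Python functions are
    registered by name and invoked on each record inside the streaming job."""

    def __init__(self) -> None:
        self._udfs: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, fn: Callable[[dict], dict]) -> None:
        self._udfs[name] = fn

    def apply(self, name: str, record: dict) -> dict:
        return self._udfs[name](record)

runner = UdfRunner()
# Example UDF: anonymize the last IP octet before the record is sunk.
runner.register("mask_ip", lambda r: {**r, "ip": r["ip"].rsplit(".", 1)[0] + ".0"})
print(runner.apply("mask_ip", {"player": 1, "ip": "10.2.3.4"}))
# {'player': 1, 'ip': '10.2.3.0'}
```

Keeping this indirection layer is what allowed the legacy Hadoop Streaming Python scripts to survive the migration to Flink unchanged.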
Runtime optimizations include hot‑reloading lightweight configuration changes without a job restart, consolidating multiple stream tables into a single Flink job to avoid redundant Kafka reads, and separating real‑time and offline sinks so HDFS back‑pressure cannot stall the real‑time path.
Further performance tuning addresses HDFS small‑file explosion by pre‑partitioning streams (keyBy) and limiting parallelism, while SLA metrics are collected via OperatorState‑based utilities supporting static, dynamic, and TTL metrics.
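The small-file mitigation hinges on routing all records of one log type to a fixed, bounded set of writer subtasks, so the number of open HDFS files per partition interval is capped by the writer parallelism rather than growing with log variety. A sketch of such a routing key (the hash choice and writer count are illustrative):

```python
import hashlib

def writer_index(log_name: str, num_writers: int = 8) -> int:
    """Deterministically route a log type to one of `num_writers` writer
    subtasks, mirroring a keyBy on the log name before the HDFS sink."""
    digest = hashlib.md5(log_name.encode()).hexdigest()
    return int(digest, 16) % num_writers

# Same log type always lands on the same writer, bounding open-file count.
assert writer_index("player_login") == writer_index("player_login")
```

Any stable hash works here; the essential property is determinism, so one log type never fans out across every sink subtask.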
Fault tolerance is achieved using SideOutput for error streams, with downstream recovery strategies involving batch reprocessing or targeted back‑fill jobs that replace corrupted Hive partitions.
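The SideOutput pattern splits each batch of input into a main stream of successfully parsed rows and a side stream of raw records that failed parsing, which is what later batch reprocessing or back-fill jobs consume. A minimal single-process analogue (function names are illustrative):

```python
import json

def process(records, parse):
    """Split input into parsed rows (main output) and unparseable raw
    records (side output), analogous to Flink's SideOutput mechanism."""
    main, side = [], []
    for raw in records:
        try:
            main.append(parse(raw))
        except Exception:
            side.append(raw)  # preserved verbatim for reprocessing / back-fill
    return main, side

good, bad = process(['{"a": 1}', 'not-json'], json.loads)
print(good, bad)  # [{'a': 1}] ['not-json']
```

Because the side output keeps the original bytes rather than a partial parse, a targeted back-fill job can later re-run a fixed parser over just the failed records and replace the affected Hive partitions.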
Future plans focus on data‑lake support for update/delete workloads, automatic small‑file merging and deduplication, and extending Python support to the full Flink stack via PyFlink.