
iQIYI Real-time Lakehouse: Stream‑Batch Unified Architecture

iQIYI replaced its costly Lambda architecture with a unified Iceberg‑based lakehouse that combines Flink streaming and batch processing, cutting data latency from hours to minutes, supporting thousands of tables via a multi‑table sink, guaranteeing completeness, and saving millions of RMB in operational costs.

iQIYI Technical Product Team

Overview

Data is the foundation for user, market, and operational insights at iQIYI. Historically the company relied on a Lambda architecture to meet its massive data-processing and timeliness requirements, but this approach incurred high development, maintenance, and resource costs and suffered from data silos.

In recent years, lakehouse technologies such as Iceberg, Hudi and Delta Lake have emerged. iQIYI introduced Iceberg in 2020 and built a stream-batch unified data production architecture based on Iceberg + Flink, replacing the traditional Lambda architecture. The new real-time lakehouse processes petabytes of data daily, with latency reduced from hours to 1-5 minutes.

Traditional Lambda Architecture

The Lambda architecture consisted of two parallel pipelines:

Offline path: Data from logs, databases, metrics, etc. was ingested via tools like Venus, MySQLIO CDC and Hubble into an HDFS‑Hive data warehouse (ODS → DWD → DWS). Batch processing used Spark, and latency was on the order of hours.

Real‑time path: A separate pipeline used Kafka and Flink to provide near‑real‑time views. Additional platforms (RCP, RAP) supported real‑time analytics with second‑level latency.

Although functional, this dual‑pipeline design caused high system complexity, difficulty guaranteeing data consistency, duplicated storage/computation costs, and still suffered from hour‑level latency for many use cases.

Real‑time Lakehouse Integration

iQIYI built a unified lakehouse using Iceberg as the table format. The storage stack consists of HDFS as the underlying file system, Alluxio as a caching layer, and Hive Metastore for catalog management. Compute engines such as Spark, Flink and Trino query the Iceberg tables.

Key Iceberg features leveraged:

Controllable data latency (snapshot generation can achieve sub‑minute delay).

Native support for both batch and streaming writes, enabling true stream‑batch integration at the storage layer.

Low resource cost: columnar storage and compression on HDFS, with richer metadata for query optimization, making it cheaper than Kafka for similar workloads.

Support for row‑level data changes (Iceberg V2), facilitating seamless integration of database change data capture.
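
As an illustration of how V2 row-level changes serve CDC ingestion, here is a minimal Python model of merge-on-read with equality deletes. It is a toy sketch, not the Iceberg API: data files are plain lists of row dicts, an equality-delete file is a list of key tuples on the identifier column(s), and the reader drops any row whose key appears in a delete file.

```python
# Toy model of Iceberg V2 merge-on-read with equality deletes.
# A "data file" is a list of row dicts; an "equality-delete file" is a
# list of key tuples on the identifier column(s).

def read_with_equality_deletes(data_files, delete_files, key_cols):
    """Yield live rows: rows whose key appears in any delete file are dropped."""
    deleted = set()
    for delete_file in delete_files:
        for key in delete_file:
            deleted.add(tuple(key))
    for data_file in data_files:
        for row in data_file:
            if tuple(row[c] for c in key_cols) not in deleted:
                yield row

data = [[{"id": 1, "v": "a"}, {"id": 2, "v": "b"}], [{"id": 3, "v": "c"}]]
deletes = [[(2,)]]  # e.g. a CDC DELETE event for id=2
live = list(read_with_equality_deletes(data, deletes, ["id"]))
```

In real Iceberg V2 tables the engine performs this reconciliation at scan time, which is what lets database change streams land without rewriting data files.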

Multi‑table Flink Iceberg Sink

The original Flink Iceberg sink could write to only a single table, which is insufficient when a single Flink job must produce thousands of tables. iQIYI therefore developed a multi-table sink with the following components:

A MultiTableRow wrapper carries the target table name together with the row data.

A Partitioner placed before the writer routes all rows of the same table to a small, fixed set of writer tasks, avoiding small-file explosion and memory pressure.

Writer tasks load the appropriate Iceberg table based on the row's table name and write the data. At checkpoint, each emits a MultiTableWriteResult containing the file lists per table.

Committer tasks (parallel, not single-threaded) aggregate the write results per table and commit new snapshots.

This design enables a single Flink job to produce 3,000+ tables while maintaining balanced load and low latency.
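
The routing step above can be sketched in Python. This is a hedged model of the partitioning idea only, not the actual Flink operator: the `writers_per_table` knob and table names are illustrative, and a stable CRC32 hash stands in for Flink's key-group assignment.

```python
import zlib

def writer_subtasks(table, parallelism, writers_per_table=4):
    """Pick a small, fixed set of writer subtasks for one table, so each
    table is written by few tasks (few open files) while thousands of
    tables still spread across the full parallelism."""
    base = zlib.crc32(table.encode()) % parallelism  # stable across restarts
    n = min(writers_per_table, parallelism)
    return [(base + i) % parallelism for i in range(n)]

def route(table, row_key, parallelism, writers_per_table=4):
    """Assign a row to one subtask within its table's fixed writer set."""
    subtasks = writer_subtasks(table, parallelism, writers_per_table)
    return subtasks[zlib.crc32(row_key.encode()) % len(subtasks)]
```

Bounding each table to a few writers is the key trade-off: it prevents every subtask from opening files for every table (memory and small files), while still balancing hot tables across more than one task.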

Data Production Progress Evaluation

To trigger downstream batch jobs only after data is complete, the sink records the maximum event timestamp per partition. During checkpoint, the smallest max‑timestamp across all writer tasks is stored as a watermark in the snapshot summary. A monitoring service reads this watermark; when it advances to a new hour, the corresponding partition is considered complete and batch jobs are launched.
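
The completeness check reduces to two small computations, modeled below in Python. The function names, second-granularity timestamps, and one-hour partitions are assumptions for the sketch, not iQIYI's actual implementation:

```python
def snapshot_watermark(max_ts_per_writer):
    """The partition watermark is the minimum over all writer tasks of each
    task's maximum event timestamp: every writer has seen data at least up
    to this point, so earlier data is complete."""
    return min(max_ts_per_writer)

def completed_hours(prev_watermark, new_watermark):
    """Hourly partitions that became complete when the watermark advanced.
    Hour h is complete once the watermark reaches (h + 1) * 3600."""
    HOUR = 3600
    return list(range(prev_watermark // HOUR, new_watermark // HOUR))
```

A monitoring service that stores the previous watermark and calls `completed_hours` on each new snapshot can then launch the downstream batch job once per finished hourly partition.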

Stream Consumption Backlog Monitoring

For Iceberg sources, iQIYI extended the Flink source implementation to emit the age of the latest snapshot (current time minus snapshot creation time) as a metric. This metric is exported to Prometheus/Grafana, providing real‑time backlog monitoring similar to Kafka offset lag.
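
A minimal model of that metric follows; the class name and millisecond timestamps are illustrative, not Flink's actual metric API:

```python
class IcebergSourceLagGauge:
    """Sketch of the backlog gauge: current time minus the creation time of
    the latest snapshot the source has consumed, analogous to Kafka
    consumer offset lag."""

    def __init__(self):
        self.latest_snapshot_millis = None

    def on_snapshot(self, snapshot_millis):
        # Called when the source finishes reading a snapshot.
        self.latest_snapshot_millis = snapshot_millis

    def lag_seconds(self, now_millis):
        if self.latest_snapshot_millis is None:
            return float("inf")  # nothing consumed yet
        return max(0.0, (now_millis - self.latest_snapshot_millis) / 1000.0)
```

Exporting this gauge to Prometheus lets Grafana alert when the consumer falls behind the table's latest snapshots.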

Data Completeness Guarantee

When a Flink streaming job fails or loses data, iQIYI uses the RCP platform to run a batch version of the same job. The batch job reprocesses the affected partitions and overwrites the erroneous results, ensuring eventual data completeness without stopping the streaming job.
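
The repair flow can be sketched as a per-partition overwrite. This toy model uses a dict keyed by partition in place of an Iceberg table, and `recompute` stands in for the RCP batch job; the names are hypothetical:

```python
def backfill(table, affected_partitions, recompute):
    """Recompute only the damaged partitions and overwrite them in place
    (INSERT OVERWRITE semantics), leaving all other partitions untouched
    so the streaming job can keep writing newer data concurrently."""
    for part in affected_partitions:
        table[part] = recompute(part)
    return table

warehouse = {"2024-01-01/10": ["bad rows"], "2024-01-01/11": ["ok rows"]}
repaired = backfill(warehouse, ["2024-01-01/10"], lambda p: ["recomputed rows"])
```

In the real system the same job definition runs in both modes, so the batch backfill is guaranteed to apply identical logic to the raw inputs.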

Deployment Effects

Venus (log platform): Migrated >10,000 tables to Iceberg, ingesting PB-scale logs with ~5 min latency and saving tens of millions of RMB annually.

Pingback (event tracking): Replaced Hive tables with >1,300 Iceberg tables across ODS/DWD/DWS layers, adding hundreds of TB daily, with minimum latency of 1 minute.

Database sync: Flink CDC now writes directly to Iceberg, synchronizing hundreds of tables from advertising, membership and growth systems.

Future Plans

Migrate more workloads to replace Hive completely and to supplant Kafka streams that only need minute-level latency.

Upgrade Flink CDC from 2.x to 3.x to support automatic schema evolution.

Introduce Paimon for use cases where Iceberg lacks column‑level updates or change‑data pipelines.

Tags: big data, real-time processing, Flink, data lake, Iceberg, stream-batch integration