
iQIYI Multi-AZ Unified Scheduling Architecture for Big Data

iQIYI’s Multi‑AZ unified scheduling architecture combines a unified storage layer (QBFS), an abstracted compute scheduler (QBCS), and a federated metadata service (Waggle Dance) to route data and jobs seamlessly across availability zones. The architecture cuts storage costs by up to 65%, reduces overall big‑data workload expenses by more than 35%, and lays the groundwork for future hybrid‑cloud expansion.

iQIYI Technical Product Team

iQIYI’s big data platform is widely used in operation decisions, user growth, advertising distribution, video recommendation, search, and membership marketing, providing a crucial data‑driven engine for business growth and user experience.

As the business expanded, massive data accumulated across multiple Availability Zones (AZs) and Hadoop clusters, creating data islands that made data discovery, migration, and cross‑cluster processing cumbersome and costly.

To address these challenges, the iQIYI big data team built a Multi‑AZ unified scheduling architecture that supports data read/write routing and compute scheduling across different AZs and clusters, enabling seamless data access and migration while reducing storage and compute costs.

Unified Storage (QBFS)

Provides a unified namespace, e.g., qbfs://online01/warehouse/db1/tableX, abstracting the underlying file system (HDFS, object storage, etc.).

Integrates Alluxio caching to accelerate cross‑AZ data access, achieving up to 25× query speedup in Trino on HDFS and 75% latency reduction in Iceberg lake queries.

Implements tiered storage (standard, low‑frequency, archive) based on data hotness, cutting storage costs by up to 65% and reducing data volume on HDFS.
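To make the QBFS ideas above concrete, the sketch below shows how a qbfs:// URI might resolve to a physical file system through a mount table, and how a hotness-based tiering policy could classify data. The mount entries, endpoints, and tier thresholds are all illustrative assumptions, not iQIYI's actual configuration.

```python
from urllib.parse import urlparse

# Hypothetical mount table: QBFS namespace -> physical cluster URI.
# Cluster names and endpoints are invented for illustration.
MOUNT_TABLE = {
    "online01": "hdfs://az1-nn.example.com:8020",
    "archive01": "s3a://cold-bucket",
}

def resolve(qbfs_uri: str) -> str:
    """Map a qbfs:// URI to the physical file-system URI it abstracts."""
    parsed = urlparse(qbfs_uri)
    assert parsed.scheme == "qbfs", "expected a qbfs:// URI"
    backend = MOUNT_TABLE[parsed.netloc]  # namespace -> backing cluster
    return backend + parsed.path

def pick_tier(days_since_access: int) -> str:
    """Assign a storage tier from data hotness (thresholds are assumed)."""
    if days_since_access <= 7:
        return "standard"
    if days_since_access <= 90:
        return "low-frequency"
    return "archive"
```

Because callers only ever see the qbfs:// path, data can be re-tiered or moved between HDFS and object storage by updating the mount table, without rewriting job code.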

Unified Compute Scheduling (QBCS)

QBCS (iQIYI Bigdata Computing Scheduler) abstracts underlying compute clusters. When a user submits a job, QBCS selects the optimal cluster based on factors such as project planning, input/output data location, queue availability, and cluster/network health, enabling automatic failover and load balancing.

Key scheduling factors include:

Project‑level cluster planning.

Physical cluster of input/output data (via QBFS).

Queue existence and load.

Cluster and network status for high availability.
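The scheduling factors above can be sketched as a simple scoring function: unhealthy clusters and clusters lacking the required queue are filtered out, then the remainder are ranked by project planning, data locality, and queue load. The weights and cluster model here are assumptions for illustration; QBCS's real policy is not published.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    az: str
    healthy: bool       # cluster and network status
    has_queue: bool     # required queue exists
    queue_load: float   # 0.0 (idle) .. 1.0 (saturated)

def select_cluster(clusters, data_az, planned=None):
    """Pick a target cluster using the factors above; weights are hypothetical."""
    candidates = [c for c in clusters if c.healthy and c.has_queue]
    if not candidates:
        raise RuntimeError("no healthy cluster with the required queue")
    def score(c: Cluster) -> float:
        s = 0.0
        if planned is not None and c.name == planned:
            s += 2.0          # project-level cluster planning
        if c.az == data_az:
            s += 1.0          # locality with input/output data (via QBFS)
        s -= c.queue_load     # prefer lightly loaded queues
        return s
    return max(candidates, key=score)
```

Filtering before scoring is what gives automatic failover: if the planned cluster goes unhealthy, it simply drops out of the candidate set and the next-best cluster wins.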

Unified Metadata Service (Waggle Dance)

To eliminate metadata silos, iQIYI adopted the open‑source Waggle Dance to federate multiple Hive Metastore instances. This provides horizontal scalability, business isolation, reduced failure impact, and cross‑cluster metadata access, allowing a single Hive Metastore definition to be accessed from many clusters.

Benefits of the unified metadata service include:

Horizontal scaling by splitting metadata across multiple MySQL instances.

Business isolation to prevent one workload from affecting others.

Lower fault risk and faster recovery.

Elimination of data duplication through cross‑cluster access.

Improved developer efficiency by decoupling SQL from physical data location.
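Waggle Dance federates metastores by routing each request to the Hive Metastore that owns the referenced database, typically matched by database-name prefix. The sketch below shows that routing idea in miniature; the prefixes and thrift endpoints are invented for illustration.

```python
# Hypothetical federation map: database-name prefix -> owning Hive Metastore.
FEDERATION = {
    "ads_": "thrift://ms-ads.example.com:9083",
    "rec_": "thrift://ms-rec.example.com:9083",
}
# Requests that match no prefix fall through to the primary metastore.
PRIMARY = "thrift://ms-primary.example.com:9083"

def route_metastore(database: str) -> str:
    """Return the metastore endpoint responsible for a database."""
    for prefix, endpoint in FEDERATION.items():
        if database.startswith(prefix):
            return endpoint
    return PRIMARY
```

A query engine pointed at the federation layer sees one logical metastore, so SQL stays decoupled from where the table's metadata (and backing MySQL instance) actually lives.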

Results and Future Plans

The Multi‑AZ unified scheduling architecture has been deployed in iQIYI’s private cloud, achieving more than 35% cost reduction for big data workloads. Looking ahead, iQIYI plans to evolve the architecture into a hybrid‑cloud solution, extending unified storage, compute, and OLAP capabilities across private and public clouds.

Tags: big data, compute scheduling, iQIYI, metadata federation, multi-AZ, unified storage