
Apache Kyuubi Practices and Service Evolution at iQIYI

This article describes how iQIYI uses Apache Kyuubi to provide its Spark Thrift Server service. It covers the evolution from the native Spark Thrift Server to Kyuubi 0.7 and then Kyuubi 1.x, the multi‑tenant architecture, tag‑based configuration, SQL auditing, lineage collection, service monitoring, small‑file and Z‑order optimizations, and a brief Q&A.

DataFunSummit

Jun 16, 2023

Overview – iQIYI shares its practical experience with Apache Kyuubi, focusing on three main topics: the evolution of the Spark Thrift Server service, Spark SQL platform adaptation, and service optimization.

1. Spark Thrift Server Evolution – iQIYI's service went through three stages. Stage 1 used the native Spark Thrift Server. Stage 2 moved to Kyuubi 0.7, which added multi‑tenant support and dynamic resource configuration. Stage 3 moved to Kyuubi 1.x, which decouples the server from the Spark engine: an engine can be dedicated to a single session or shared according to one of several sharing strategies, which gives finer‑grained resource isolation.
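To make the sharing strategies concrete, the following is a simplified model of how a share level decides which sessions reuse the same engine. Kyuubi exposes this choice through the kyuubi.engine.share.level setting (CONNECTION, USER, GROUP, SERVER); the engine_key function and its key layout are hypothetical and only illustrate the idea, not Kyuubi's internal implementation.

```python
import uuid
from enum import Enum
from typing import Optional, Tuple


class ShareLevel(Enum):
    CONNECTION = "CONNECTION"  # one engine per client connection (independent engine)
    USER = "USER"              # one engine shared by all connections of a user
    GROUP = "GROUP"            # one engine shared by all users of a group
    SERVER = "SERVER"          # one engine shared by everyone on the server


def engine_key(level: ShareLevel, user: str, group: str,
               subdomain: str = "default",
               connection_id: Optional[str] = None) -> Tuple[str, str, str]:
    """Hypothetical key: sessions with equal keys would reuse the same engine."""
    if level is ShareLevel.CONNECTION:
        # Each connection is isolated; use its id, or a fresh random id if none is given.
        return (level.value, user, connection_id or str(uuid.uuid4()))
    if level is ShareLevel.USER:
        return (level.value, user, subdomain)
    if level is ShareLevel.GROUP:
        return (level.value, group, subdomain)
    # SERVER level ignores who connects.
    return (level.value, "__server__", subdomain)


# Two analysts in the same group: separate engines at USER level, one engine at GROUP level.
print(engine_key(ShareLevel.USER, "alice", "bi") == engine_key(ShareLevel.USER, "bob", "bi"))    # False
print(engine_key(ShareLevel.GROUP, "alice", "bi") == engine_key(ShareLevel.GROUP, "bob", "bi"))  # True
```

The subdomain field models the idea that one user or group may still want several isolated engines, for example one for ETL and one for ad‑hoc work.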

2. Spark SQL Platform Adaptation – The platform adds tag‑based configuration so that different workloads (ETL, ad‑hoc queries) receive different settings; SQL event auditing through Kyuubi's event system, with events sent to Elasticsearch and Spark History; lineage collection with the Spark Atlas Connector; and service monitoring through custom metric reporters that feed iQIYI's internal monitoring platforms.
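Tag‑based configuration can be pictured as named presets merged into a session's Spark configuration. In this sketch the tag names, the TAG_CONFIGS table and resolve_session_conf are hypothetical; the keys are standard Spark settings, but the values are placeholders rather than iQIYI's production values.

```python
from typing import Dict, Iterable

# Hypothetical presets: real Spark configuration keys, illustrative values.
TAG_CONFIGS: Dict[str, Dict[str, str]] = {
    "etl": {
        "spark.executor.memory": "8g",
        "spark.dynamicAllocation.maxExecutors": "200",
    },
    "adhoc": {
        "spark.executor.memory": "4g",
        "spark.dynamicAllocation.maxExecutors": "50",
    },
}


def resolve_session_conf(tags: Iterable[str], user_conf: Dict[str, str]) -> Dict[str, str]:
    """Apply tag presets in order, then let explicit user settings override them."""
    conf: Dict[str, str] = {}
    for tag in tags:
        if tag not in TAG_CONFIGS:
            raise ValueError(f"unknown tag: {tag}")
        conf.update(TAG_CONFIGS[tag])
    conf.update(user_conf)  # explicit settings win over tag presets
    return conf


print(resolve_session_conf(["etl"], {"spark.executor.memory": "16g"}))
```

Letting user settings override the preset is only one possible precedence rule; a platform could instead lock certain keys so that a tag acts as a hard resource limit.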

3. Spark SQL Service Optimization – Small‑file problems are addressed in three ways: repartitioning to a large number of partitions and letting Spark 3 AQE automatically merge the small ones; partitioning on a random key to avoid data skew; and using the rebalance feature (Spark 3.2+), which balances partition sizes automatically. The section also introduces Z‑order optimization, explaining how the Z‑value is calculated and how it improves query pruning and compression, and shows how to enable it through Kyuubi properties.
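Why a random key helps with skew can be seen in a small simulation. This is plain Python, not Spark code: it counts how many rows land in each partition when rows are hash‑partitioned on a skewed key versus assigned a random partition. The data set (one hot key holding 90% of the rows) is invented for illustration.

```python
import random
import zlib
from collections import Counter
from typing import List, Tuple


def partition_sizes_by_key(rows: List[Tuple[str, int]], num_partitions: int) -> List[int]:
    """Hash-partition on the row key; every row with the same key lands in one partition."""
    counts = Counter(zlib.crc32(key.encode()) % num_partitions for key, _ in rows)
    return [counts.get(p, 0) for p in range(num_partitions)]


def partition_sizes_by_random_key(rows: List[Tuple[str, int]], num_partitions: int,
                                  seed: int = 0) -> List[int]:
    """Assign each row a random partition, ignoring its key."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_partitions) for _ in rows)
    return [counts.get(p, 0) for p in range(num_partitions)]


# 9,000 rows share one hot key; 1,000 rows have distinct keys.
rows = [("hot", i) for i in range(9000)] + [(f"k{i}", i) for i in range(1000)]
print(max(partition_sizes_by_key(rows, 10)))         # at least 9000: one huge output file
print(max(partition_sizes_by_random_key(rows, 10)))  # close to 1000: evenly sized files
```

One caveat worth keeping in mind: a rand()-based key is nondeterministic, so if a stage is recomputed the rows can be assigned differently the second time. That is part of what makes the rebalance approach in Spark 3.2+ attractive.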
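The Z‑value calculation can be sketched as bit interleaving: take the bits of each column value from most to least significant and weave them together into one integer. This sketch assumes small non‑negative integer columns of a fixed bit width; production Z‑order implementations first map values of arbitrary types to fixed‑width byte representations, which is omitted here. The z_value and z_sorted names are illustrative.

```python
from typing import List, Sequence, Tuple


def z_value(coords: Sequence[int], bits: int = 16) -> int:
    """Interleave the bits of each coordinate into one Z-value, most significant bit first."""
    for c in coords:
        if not 0 <= c < (1 << bits):
            raise ValueError(f"coordinate {c} does not fit in {bits} bits")
    z = 0
    for bit in range(bits - 1, -1, -1):  # from the highest bit down to bit 0
        for c in coords:                 # take one bit from each column in turn
            z = (z << 1) | ((c >> bit) & 1)
    return z


def z_sorted(points: List[Tuple[int, int]], bits: int) -> List[Tuple[int, int]]:
    """Order 2-D points by Z-value, as a Z-order rewrite orders rows before writing files."""
    return sorted(points, key=lambda p: z_value(p, bits))


grid = [(x, y) for x in range(4) for y in range(4)]
order = z_sorted(grid, bits=2)
print(z_value([1, 2], bits=2))  # x=01, y=10 -> interleaved 0110 = 6
print(order[:4])                # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The first four rows in Z‑order form a 2×2 block, so a file holding them has a narrow min/max range on both x and y. That is why filters on either column can skip files, and why neighbouring rows tend to hold similar values that compress well.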

Q&A – The answers cover how Kyuubi keeps Spark contexts alive across sessions by sharing engines, and how health checking (dial‑testing, that is, synthetic probing) is done with periodic JDBC and REST API calls.
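A dial‑testing loop of the kind described in the Q&A can be sketched as a set of named checks run on a schedule, with each result timed and reported. Everything here is hypothetical: the names are illustrative, the REST URL you pass to rest_check is an assumption rather than a documented Kyuubi endpoint, and the JDBC check (for example, running SELECT 1 through a Hive JDBC client) is left as a callable you supply.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ProbeResult:
    name: str
    ok: bool
    latency_s: float
    error: str = ""


def run_probe(name: str, check: Callable[[], None], timeout_s: float) -> ProbeResult:
    """Run one check; any exception, or a response slower than timeout_s, counts as unhealthy."""
    start = time.monotonic()
    try:
        check()
    except Exception as exc:
        return ProbeResult(name, False, time.monotonic() - start, repr(exc))
    latency = time.monotonic() - start
    if latency > timeout_s:
        return ProbeResult(name, False, latency, "slow response")
    return ProbeResult(name, True, latency)


def rest_check(url: str, timeout_s: float = 5.0) -> Callable[[], None]:
    """Build a check that fails unless GET url returns HTTP 200 (URL is an assumption)."""
    def check() -> None:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            if resp.status != 200:
                raise RuntimeError(f"HTTP {resp.status}")
    return check


def probe_loop(probes: Dict[str, Callable[[], None]], interval_s: float, timeout_s: float,
               report: Callable[[ProbeResult], None], rounds: int) -> None:
    """Run every probe once per round and hand each result to report (e.g. a metrics sink)."""
    for i in range(rounds):
        for name, check in probes.items():
            report(run_probe(name, check, timeout_s))
        if i < rounds - 1:
            time.sleep(interval_s)
```

In production the loop would run indefinitely and report would push results to an alerting system. The finite rounds parameter keeps the sketch easy to test.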

Conclusion – The session demonstrates how Kyuubi enhances Spark‑based data platforms with better multi‑tenant support, resource isolation, observability, and performance optimizations.