Kyuubi Application Practice on Xiaomi's Big Data Platform
This talk describes how Xiaomi deployed Kyuubi as a unified, easy-to-use, and highly available SQL gateway for its evolving big-data platform. It covers the integration, architecture upgrade, multi-engine support, performance results, operational improvements, and future roadmap.
Background : Multiple SQL services (MySQL/Doris, Hive/Kudu, Talos) caused a fragmented user experience, duplicated resources, and security risks from inconsistent authentication and permission models.
Goal : Create a one‑stop data development platform that provides a consistent entry point, SQL traffic governance, and secure data access.
Kyuubi Selection : Kyuubi was chosen for its full compatibility with the Hive Thrift protocol, high availability, resource isolation, clean architecture, and strong community support; it also offered a low-cost migration path from the existing SQLProxy.
Architecture Upgrade :
Unified Kyuubi Server as the SQL gateway.
Kyuubi Engine layer runs Spark SQL, with support for Trino, Hive, and Doris.
Engine Manager service handles engine lifecycle, configuration, discovery, and load balancing.
Integration with Ranger for unified permission verification.
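Because Kyuubi is compatible with Hive Thrift, clients can typically reach the gateway with an ordinary Hive JDBC URL and pick an engine through Kyuubi configuration. The sketch below only builds such a URL. The helper `build_kyuubi_jdbc_url` and the host name are hypothetical; the keys `kyuubi.engine.type` and `kyuubi.engine.share.level` are upstream Kyuubi settings, and Xiaomi's deployment may expose engine selection differently.

```python
from typing import Dict, Optional


def build_kyuubi_jdbc_url(
    host: str,
    port: int = 10009,  # upstream Kyuubi's default Thrift binary port
    database: str = "default",
    session_vars: Optional[Dict[str, str]] = None,
    kyuubi_confs: Optional[Dict[str, str]] = None,
) -> str:
    """Build a Hive-style JDBC URL: jdbc:hive2://host:port/db;sessionVars?confs."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if session_vars:
        # Session variables follow the database name, separated by ';'.
        url += ";" + ";".join(f"{k}={v}" for k, v in session_vars.items())
    if kyuubi_confs:
        # Kyuubi/engine configs follow '?', also separated by ';'.
        url += "?" + ";".join(f"{k}={v}" for k, v in kyuubi_confs.items())
    return url


# Hypothetical host; routes the session to a Trino engine.
trino_url = build_kyuubi_jdbc_url(
    "kyuubi.example.com",
    kyuubi_confs={"kyuubi.engine.type": "TRINO"},
)
print(trino_url)
```

The same URL shape works for Spark SQL by setting `kyuubi.engine.type` to `SPARK_SQL`, so clients switch engines without changing drivers.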
Operational Improvements :
Containerized deployment on Kubernetes for elastic high availability.
Workspace‑level resource isolation and token‑based authentication.
Engine pool mechanism to avoid cold‑start latency.
Automatic health checks, load‑aware routing, and auto‑restart for engine failures.
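The engine pool, health checks, and load-aware routing listed above can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions, not Xiaomi's Engine Manager: the `Engine` and `EnginePool` classes, the `launch_engine` helper, and the "fewest active queries" routing rule are all hypothetical stand-ins.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List

_ids = itertools.count(1)


@dataclass
class Engine:
    engine_id: str
    workspace: str
    active_queries: int = 0
    healthy: bool = True


def launch_engine(workspace: str) -> Engine:
    """Hypothetical launcher; a real one would start a Spark or Trino engine."""
    return Engine(engine_id=f"{workspace}-engine-{next(_ids)}", workspace=workspace)


class EnginePool:
    """Pre-warmed engines for one workspace, so queries skip the cold start."""

    def __init__(self, workspace: str, size: int, launch: Callable[[str], Engine]):
        self.workspace = workspace
        self._launch = launch
        # Engines are started up front rather than on the first query.
        self.engines: List[Engine] = [launch(workspace) for _ in range(size)]

    def health_check(self, probe: Callable[[Engine], bool]) -> int:
        """Replace every engine that fails the probe; return how many restarted."""
        restarted = 0
        for i, engine in enumerate(self.engines):
            if not probe(engine):
                self.engines[i] = self._launch(self.workspace)
                restarted += 1
        return restarted

    def route(self) -> Engine:
        """Load-aware routing: send the query to the least-busy engine."""
        engine = min(self.engines, key=lambda e: e.active_queries)
        engine.active_queries += 1
        return engine


pool = EnginePool("ws_a", size=2, launch=launch_engine)
first, second = pool.route(), pool.route()
first.healthy = False  # simulate an engine failure
restarted = pool.health_check(lambda e: e.healthy)
print(restarted, [e.engine_id for e in pool.engines])
```

Keeping one pool per workspace mirrors the workspace-level isolation in the list: a failing or overloaded engine in one workspace never receives another workspace's queries.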
Performance Results : After six months, daily SQL volume grew from 50k to 500k queries, with Kyuubi handling 80% of total SQL traffic; average latency for Spark and Trino queries was about 30 s (P50 ≈ 5 s); overall service availability exceeded 99.9%.
New Features Built on Kyuubi :
Small‑file merging to reduce write‑side overhead.
Incremental result fetching and result‑size limits to prevent OOM.
Z‑Ordering for improved query performance on Parquet data.
PlanOnly mode for syntax/semantic validation without execution.
Scala mode allowing direct submission of Scala code via JDBC.
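Of the features above, Z-Ordering is the least self-explanatory. The general idea is to sort rows by a key that interleaves the bits of several columns, so rows that are close in every column end up in the same files and min/max statistics can skip more data. The sketch below shows that general technique for two non-negative integer columns; it is not the implementation used in the talk, and real systems first map strings, floats, and other types to integers. The function names are hypothetical.

```python
from typing import List, Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) key for two non-negative integers."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x occupies the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y occupies the odd bit positions
    return z


def z_order_sort(rows: List[Tuple[int, int]], bits: int = 16) -> List[Tuple[int, int]]:
    """Sort (x, y) rows by their interleaved key instead of by x and then y."""
    return sorted(rows, key=lambda r: interleave_bits(r[0], r[1], bits))


rows = [(1, 1), (0, 1), (1, 0), (0, 0)]
print(z_order_sort(rows))
```

Unlike a plain sort on the first column, this ordering keeps both columns clustered, which is why it helps queries that filter on either one.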
Future Plans : Multi-engine automatic routing based on cost prediction, a fully asynchronous HTTP API for ETL jobs, and continued expansion of Kyuubi's ecosystem as a standard SQL gateway.
DataFunTalk