Kyuubi: NetEase’s Open‑Source Multi‑Tenant SQL Engine for Large‑Scale Data Processing
This article introduces Kyuubi, the first NetEase project contributed to the Apache Software Foundation. It covers Kyuubi's core features, multi-tenant architecture, Spark-based execution engine, and cloud-native capabilities, then turns to real-world use within NetEase's data-warehouse, ad-hoc query, and internal systems, the performance gains observed there, and the community resources available.
Kyuubi is NetEase's first open‑source project accepted into the Apache Incubator, designed as a distributed, multi‑tenant, JDBC/ODBC‑compatible big‑data processing service that leverages Apache Spark as its execution engine.
Key capabilities include an open-source, community-driven development model, multi-tenant security, Hive JDBC compatibility, high-performance Spark execution, large-scale data handling, out-of-the-box usability, high availability via ZooKeeper, dynamic resource configuration, and cloud-native deployment on YARN or Kubernetes.
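As a rough illustration of how Hive JDBC compatibility, ZooKeeper-based high availability, and per-connection resource configuration can meet in a single connection string, the sketch below assembles a Hive-style JDBC URL. The helper name build_kyuubi_jdbc_url, the host names, and the Spark settings are hypothetical. The serviceDiscoveryMode and zooKeeperNamespace parameters follow the Hive JDBC URL convention, and the namespace value "kyuubi" and the "#" separator before the Spark settings are assumptions that should be checked against the documentation for the Kyuubi and driver versions in use.

```python
def build_kyuubi_jdbc_url(zk_quorum, namespace="kyuubi", database="default",
                          spark_confs=None):
    """Assemble a Hive-style JDBC URL that discovers a server via ZooKeeper.

    zk_quorum   -- list of "host:port" strings for the ZooKeeper ensemble
    namespace   -- ZooKeeper namespace the servers register under (assumed)
    spark_confs -- optional dict of Spark settings to request for this session
    """
    hosts = ",".join(zk_quorum)
    url = (
        f"jdbc:hive2://{hosts}/{database}"
        ";serviceDiscoveryMode=zooKeeper"
        f";zooKeeperNamespace={namespace}"
    )
    if spark_confs:
        # Assumption: session-level Spark settings go after '#', ';'-separated.
        conf_list = ";".join(f"{key}={value}" for key, value in spark_confs.items())
        url += f"#{conf_list}"
    return url


# Hypothetical ZooKeeper hosts and Spark settings, for illustration only.
example_url = build_kyuubi_jdbc_url(
    ["zk1:2181", "zk2:2181"],
    spark_confs={"spark.executor.memory": "4g"},
)
print(example_url)
```

Because the client names only the ZooKeeper ensemble, any live server registered under that namespace can accept the connection, which is the idea behind the high-availability claim above.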
The architecture consists of three layers: (1) client side (e.g., Hive Beeline, JDBC/ODBC), (2) Kyuubi server managing sessions and routing them to execution engines, and (3) the actual Spark engine, which can be flexibly assigned per user or workload.
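The routing role of the middle layer can be sketched with a toy model. This is not Kyuubi's implementation: the class ToyKyuubiServer, its methods, and the engine names are hypothetical, and the four share levels (CONNECTION, USER, GROUP, SERVER) are used here only as an assumed way to express "per user or per workload" engine assignment.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyKyuubiServer:
    """Toy model of the server layer: maps incoming sessions to engines."""

    share_level: str = "USER"
    engines: dict = field(default_factory=dict)   # engine key -> engine name
    _conn_ids: count = field(default_factory=count)

    def _engine_key(self, user, group):
        # Decide which sessions are allowed to share one engine.
        if self.share_level == "CONNECTION":
            return ("connection", next(self._conn_ids))  # never shared
        if self.share_level == "USER":
            return ("user", user)
        if self.share_level == "GROUP":
            return ("group", group)
        if self.share_level == "SERVER":
            return ("server",)
        raise ValueError(f"unknown share level: {self.share_level}")

    def open_session(self, user, group="default"):
        """Return the engine serving this session, launching one if needed."""
        key = self._engine_key(user, group)
        if key not in self.engines:
            # A real server would submit a Spark application here.
            self.engines[key] = f"spark-engine-{len(self.engines) + 1}"
        return self.engines[key]


per_user = ToyKyuubiServer(share_level="USER")
first = per_user.open_session("alice")
second = per_user.open_session("alice")   # reuses alice's engine
other = per_user.open_session("bob")      # gets a separate engine
print(first, second, other)
```

In this model a second session from the same user skips engine startup entirely, which is one way to picture why reusing a long-running engine lowers latency for repeated interactive queries.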
Within NetEase, Kyuubi serves three main user groups: data-warehouse teams, BI teams, and Spark teams. It addresses scenarios such as massive concurrent queries, complex ETL jobs, mixed-cluster deployments, and ad-hoc analytics, and provides features such as session hot-start, engine pooling, and standardized SQL plugins.
Business case studies demonstrate how Kyuubi enables seamless migration from Hive to Spark, supports multi-version Spark clusters, reduces resource consumption by up to 50%, and improves query performance by 70% or more, all with minimal changes to existing scripts.
The project timeline shows three phases: an initial Hive-to-Kyuubi migration, stabilization of Spark on YARN, and finally one-click cloud-native deployment on Kubernetes, accompanied by extensive integration testing (Minikube, TPC-DS) and community contributions.
Kyuubi’s source code is hosted on GitHub, and the community provides documentation, quick‑start guides for tools like Hue, and avenues for contribution through the DataFunTalk platform.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.