Blaze Engine: A Rust‑Based Native Vectorized Execution Engine for Spark SQL
The article introduces Blaze, Kuaishou's Rust‑powered native execution engine that vectorizes Spark SQL workloads, explains its architecture and operation, presents benchmark results showing up to 50% latency reduction, and details internal deployments, industry case studies, community collaborations, and the 2025 roadmap.
Introduction
Spark is one of the most widely used distributed data processing engines for data cleaning, data warehousing, reporting, and machine learning. Kuaishou runs hundreds of thousands of Spark SQL jobs daily, processing exabyte‑scale data and consuming millions of compute units (CUs) with annual costs exceeding one hundred million yuan.
To reduce cost and improve efficiency, Kuaishou developed the Blaze engine, a native execution engine built with Rust and vectorization techniques. By leveraging native code and SIMD instructions, Blaze reduces the resource consumption of online Spark SQL jobs by an average of 30% without requiring users to change their queries.
Blaze Engine Overview
Blaze is a Rust‑based native execution engine that fully utilizes SIMD vectorized computation.
It has been deployed at large scale on Kuaishou's Spark platform and plays a key role in big‑data resource cost‑optimization projects.
Integrating Blaze into Spark requires no code changes for users, yet yields significant SQL execution speedups.
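As a rough illustration of what "no code changes" can look like in practice, the sketch below builds a spark-submit command line that turns the engine on purely through configuration. The three configuration keys and class names are quoted from memory of the open-source Blaze README and should be treated as assumptions to verify against the release you deploy. The helper spark_submit_args and the job name etl_job.py are hypothetical, and the Blaze jar is assumed to already be on the Spark classpath.

```python
# Assumed Blaze settings (not confirmed by this article); verify the key and
# class names against the Blaze release you actually deploy.
BLAZE_CONF = {
    "spark.blaze.enable": "true",
    "spark.sql.extensions": "org.apache.spark.sql.blaze.BlazeSparkSessionExtension",
    "spark.shuffle.manager": "org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager",
}


def spark_submit_args(conf, app):
    """Build a spark-submit command line; the application file is passed through untouched."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]


# Hypothetical job name: the same script runs with or without the extra --conf flags.
cmd = spark_submit_args(BLAZE_CONF, "etl_job.py")
print(" ".join(cmd))
```

The point of the sketch is that the query or job script is identical in both modes, so the engine can be switched on or off per job by the platform rather than by the user.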
Native Spark SQL Architecture
The original Spark SQL architecture consists of three layers:
Frontend (Spark Catalyst): parses, optimizes, and generates physical plans using the Volcano model.
Backend (Spark Tungsten): performs code‑generation optimizations and submits RDDs to Spark Core.
Execution layer (Spark Core): schedules RDDs and performs distributed computation.
Blaze extends this architecture without altering the frontend. The backend translates Spark physical operators into equivalent native vectorized operators via a Spark plugin, and the execution layer runs these operators using DataFusion + Arrow for high‑performance native computation. Additional capabilities include expression deduplication, fallback for unsupported UDFs, unified memory management, and Remote Shuffle support (integrated with Celeborn).
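The translation step described above can be pictured as a walk over the physical plan that swaps each supported operator for a native one and leaves the rest on Spark. The following is a minimal sketch under stated assumptions, not Blaze's actual plugin code: PlanNode, NATIVE_SUPPORTED, convert, and coverage are hypothetical names, and the operator list is invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical set of operators that have a native vectorized equivalent.
NATIVE_SUPPORTED = {"FileScan", "Filter", "Project", "HashAggregate", "ShuffleExchange"}


@dataclass
class PlanNode:
    op: str
    children: list = field(default_factory=list)
    native: bool = False


def convert(node):
    """Return a copy of the plan in which every supported operator is marked native."""
    children = [convert(child) for child in node.children]
    return PlanNode(node.op, children, native=node.op in NATIVE_SUPPORTED)


def nodes(node):
    """Yield every operator in the plan, parent first."""
    yield node
    for child in node.children:
        yield from nodes(child)


def coverage(node):
    """Fraction of operators in the plan that run natively."""
    all_nodes = list(nodes(node))
    return sum(n.native for n in all_nodes) / len(all_nodes)


def render(node):
    """Render the plan as nested calls, prefixing each operator with its engine."""
    tag = "Native" if node.native else "Spark"
    inner = ", ".join(render(child) for child in node.children)
    return f"{tag}{node.op}({inner})"


# A plan with one operator (a Python UDF evaluation) that has no native equivalent.
plan = PlanNode("HashAggregate", [PlanNode("PythonUDFEval", [PlanNode("Filter", [PlanNode("FileScan")])])])
converted = convert(plan)
print(render(converted))
print(coverage(converted))
```

Only the unsupported operator stays on Spark while its neighbors run natively. A real engine also has to convert data between row-based and columnar layouts at each boundary, which this sketch omits.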
TPC‑DS Benchmark
In a 1 TB TPC‑DS benchmark, Blaze reduced execution time by 50% compared with Spark 3.5 on identical hardware.
Industry Vectorized Engine Landscape
Four major Spark vectorized engines exist: Databricks Photon, Baidu BMR, Apache Gluten (led by Intel, Kyligence, and Meta), and Kuaishou's Rust‑based Blaze.
Why Choose Blaze
Strong internal user support makes Blaze suitable for small teams.
Extensive online validation shows excellent performance for complex, variable‑configuration SQL workloads.
Benchmarks demonstrate substantial speedups and resource savings.
Collaboration with multiple open‑source communities (e.g., Celeborn) expands ecosystem compatibility.
Internal Deployment at Kuaishou
Blaze is now used in roughly half of Kuaishou's Spark ETL jobs, primarily low‑priority ETL tasks.
Deployment follows a double‑run validation (full or sampled) to ensure result consistency.
Statistics: native operator coverage reaches 93%, median (P50) compute efficiency improves by 50%, and resource usage drops by one‑third, saving millions of CUs per day. About 30% of jobs see a 100% performance gain, and 50% of jobs see a 50% gain.
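The double‑run validation mentioned above amounts to running the same job on both engines and checking that the outputs agree. A minimal sketch of one way to do that follows, assuming an order‑insensitive fingerprint (row count plus a sum of row hashes) and hash‑based sampling so that both runs sample the same rows. The function names and sample data are hypothetical and do not describe Kuaishou's internal tooling.

```python
import hashlib


def row_digest(row):
    """Stable 64-bit digest of one row (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(repr(row).encode()).digest()[:8], "big")


def fingerprint(rows, sample_percent=100):
    """Order-insensitive fingerprint: (sampled row count, sum of row digests mod 2**64).

    Sampling is decided by the row's own digest, so both runs pick the same rows
    regardless of the order in which each engine emits them.
    """
    count, total = 0, 0
    for row in rows:
        digest = row_digest(row)
        if digest % 100 < sample_percent:
            count += 1
            total = (total + digest) % (1 << 64)
    return count, total


def results_match(baseline, candidate, sample_percent=100):
    """True when both result sets have the same fingerprint."""
    return fingerprint(baseline, sample_percent) == fingerprint(candidate, sample_percent)


spark_rows = [(1, "a", 10.0), (2, "b", 20.5), (3, "c", 7.25)]
blaze_rows = [(3, "c", 7.25), (1, "a", 10.0), (2, "b", 20.5)]  # same rows, different order
wrong_rows = [(1, "a", 10.0), (2, "b", 20.5), (3, "c", 7.5)]   # one value differs

print(results_match(spark_rows, blaze_rows))
print(results_match(spark_rows, wrong_rows))
print(results_match(spark_rows, blaze_rows, sample_percent=50))
```

Exact hashing treats any floating‑point difference as a mismatch, so a production check would likely need a tolerance or rounding rule for floating‑point columns.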
Optimization Cases
A simple large‑scale data‑warehouse job saw a 40% reduction in resource consumption (≈200 CU/day saved) after enabling Blaze.
Fine‑grained UDF fallback allows only the expressions containing unsupported UDFs to revert to Spark, avoiding full‑plan fallback and reducing overhead.
Expression deduplication caches repeated expression results, cutting redundant calculations.
JSON parsing reuse dramatically lowers the cost of repeated get_json_object() calls, outperforming Spark 4.0 with the variant type.
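Expression deduplication and JSON parsing reuse rest on the same idea: within one batch, compute each distinct piece of work once and let later expressions read the cached result. The sketch below illustrates that idea under stated assumptions. BatchEvaluator and get_json_field are hypothetical names, the lookup handles only top‑level keys rather than the full path syntax of get_json_object(), and none of this is Blaze's actual implementation.

```python
import json


class BatchEvaluator:
    """Evaluates expressions over one batch and caches repeated work within it."""

    def __init__(self, column):
        self.column = column   # one batch of JSON strings
        self.cache = {}        # expression key -> result column
        self.parse_calls = 0   # number of json.loads calls actually made

    def _parsed(self):
        """Parse the JSON column once per batch and reuse it for every field lookup."""
        key = ("parse_json",)
        if key not in self.cache:
            self.parse_calls += len(self.column)
            self.cache[key] = [json.loads(text) for text in self.column]
        return self.cache[key]

    def get_json_field(self, name):
        """Top-level field lookup; an identical repeated expression hits the cache."""
        key = ("get_json_field", name)
        if key not in self.cache:
            self.cache[key] = [doc.get(name) for doc in self._parsed()]
        return self.cache[key]


batch = ['{"uid": 1, "city": "Beijing"}', '{"uid": 2, "city": "Hangzhou"}']
evaluator = BatchEvaluator(batch)

uids = evaluator.get_json_field("uid")
cities = evaluator.get_json_field("city")      # reuses the parsed documents
uids_again = evaluator.get_json_field("uid")   # identical expression, served from cache

print(uids, cities, evaluator.parse_calls)
```

Without the cache, three lookups over a two‑row batch would parse six documents; with it, each document is parsed once, which is why queries that extract many fields from the same JSON column benefit most.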
Additional Production‑Oriented Features
Full compatibility with Spark’s storage systems (HDFS, S3, etc.) via JNI.
Multi‑level memory management and spill support, efficiently using off‑heap and on‑heap memory.
Custom shuffle format reduces shuffle data volume by ~30%, saving bandwidth.
Remote Shuffle Service (RSS) integration, with Apache Celeborn supported today and Apache Uniffle support planned.
Community Collaboration
Added support for Apache Celeborn in Q4 2024 through a partnership with Alibaba Cloud.
The community contributed ORC file format support, complementing Kuaishou’s internal Parquet usage.
Implemented Paimon scan operator, enabling Blaze to read Paimon tables.
2025 Roadmap
The upcoming year will focus on eight core features, including Spark 4.0 and ARM architecture support, and enhanced data‑lake integrations for Hudi and Paimon.
Overall, Blaze demonstrates how a Rust‑based native vectorized engine can substantially improve Spark SQL performance, reduce resource costs, and foster ecosystem collaboration.