YARN Practice and Technical Evolution at Kuaishou
Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.
This article summarizes a technical talk by Jiaoxiao Fang of Kuaishou's Big Data Architecture Team on YARN system practice and technical evolution at Kuaishou.
YARN Background: YARN was introduced as a key feature in the move from Hadoop 1.0 to 2.0, replacing the original centralized JobTracker scheduler with two-level scheduling to solve cluster scalability limits. YARN consists of three modules: the ResourceManager (manages resources for the entire cluster), the NodeManager (manages resources on each machine), and the ApplicationMaster (manages resource requests for a single application). MR, Spark, and Flink each implement their own AM logic to run on YARN.
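The two-level split described above can be sketched in a few lines. This is a toy model with hypothetical class and method names chosen for illustration, not Hadoop's actual API: the ResourceManager owns the global resource view and grants containers, while a per-application ApplicationMaster asks it for containers on behalf of its tasks.

```python
class ResourceManager:
    """Level 1: owns the global resource view and hands out containers."""
    def __init__(self, nodes):
        self.free = dict(nodes)  # node name -> free memory (MB)

    def allocate(self, mem_mb):
        # Grant a container on the first node with enough free memory.
        for node, free in self.free.items():
            if free >= mem_mb:
                self.free[node] -= mem_mb
                return (node, mem_mb)
        return None  # cluster cannot satisfy the request right now


class ApplicationMaster:
    """Level 2: per-application; requests containers from the RM for its tasks."""
    def __init__(self, rm, task_mem_mb, num_tasks):
        self.rm, self.task_mem_mb, self.num_tasks = rm, task_mem_mb, num_tasks

    def run(self):
        containers = []
        for _ in range(self.num_tasks):
            c = self.rm.allocate(self.task_mem_mb)
            if c:
                containers.append(c)  # the NodeManager on c[0] would launch the task
        return containers


rm = ResourceManager({"nm-1": 8192, "nm-2": 4096})
am = ApplicationMaster(rm, task_mem_mb=2048, num_tasks=3)
print(len(am.run()))  # 3 containers granted
```

The point of the split is visible even in the toy: the RM never needs to know what a "task" is, and each framework (MR, Spark, Flink) supplies its own AM logic on top of the same allocation interface.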
YARN at Kuaishou: YARN at Kuaishou serves the typical big data application stack, including offline computing (HiveSQL, MR/Spark jobs, Presto queries), real-time processing (Flink on YARN via the Qingteng platform), and model training (the XLearning scheduler for TensorFlow, XGBoost, and MPI). The Arthur machine learning platform was built on Spark and XLearning.
Technical Improvements:
1. Cluster Stability: Optimized away redundant events between the RM and NMs; implemented a slow-start strategy for NM reporting (starting at a 20-second interval and gradually recovering to the normal rate); converted synchronous HDFS operations to asynchronous ones; moved NM-level event processing onto separate threads.
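The slow-start reporting strategy described above can be sketched as a ramp on the heartbeat interval: after an RM restart or failover, NMs begin reporting at a long interval and gradually return to the normal rate, so the recovering RM is not flooded with events. The decay factor and the normal interval here are illustrative assumptions; only the 20-second starting point comes from the talk.

```python
NORMAL_INTERVAL_S = 1.0  # steady-state heartbeat interval (assumed)
SLOW_START_S = 20.0      # initial interval right after RM recovery (from the talk)
DECAY = 0.5              # halve the interval each round until normal (assumed)

def heartbeat_intervals():
    """Yield the NM reporting interval round by round after an RM failover."""
    interval = SLOW_START_S
    while interval > NORMAL_INTERVAL_S:
        yield interval
        interval = max(NORMAL_INTERVAL_S, interval * DECAY)
    yield NORMAL_INTERVAL_S

print(list(heartbeat_intervals()))
# [20.0, 10.0, 5.0, 2.5, 1.25, 1.0]
```

The effect is that aggregate heartbeat load on the RM grows back gradually instead of arriving all at once the moment the RM comes back up.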
2. Preemption Mechanism: Improved preemption so that only core queues can trigger it and only non-core queues can be preempted; building a job priority system to support both intra-queue and cross-queue preemption.
3. Scheduling Performance: Reduced the sorting scale by filtering out queues and apps that do not need resources; optimized comparison operations during sorting by caching temporary objects; replaced merge sort with heap sort. Developed the custom KwaiScheduler with a "god's-eye view" of cluster resources, enabling parallel resource allocation across multiple threads and reaching 40,000+ container allocations per second.
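Two of the sorting optimizations above (filter first, then heap-select instead of fully sorting) can be sketched together. The fairness key used here is a simplified stand-in for the scheduler's real comparator, and all names are illustrative:

```python
import heapq

def next_apps_to_schedule(apps, k):
    """Pick the k most under-served apps that actually have pending requests."""
    # Filter: apps with nothing pending never enter the sort at all,
    # shrinking the sorting scale.
    active = [a for a in apps if a["pending"] > 0]
    # Heap selection: O(n + k log n) to get the top k, instead of a
    # full O(n log n) sort of every active app on each scheduling pass.
    return heapq.nsmallest(k, active, key=lambda a: a["used"] / a["fair_share"])


apps = [
    {"name": "etl",   "pending": 10, "used": 100, "fair_share": 200},
    {"name": "ml",    "pending": 0,  "used": 10,  "fair_share": 50},  # filtered out
    {"name": "adhoc", "pending": 3,  "used": 30,  "fair_share": 40},
]
print([a["name"] for a in next_apps_to_schedule(apps, 1)])  # ['etl']
```

The parallel-allocation side of KwaiScheduler is harder to show in a toy: the key idea from the talk is that a global ("god's-eye") snapshot of cluster resources lets multiple threads allocate concurrently, with conflicts resolved against the snapshot rather than under one global lock.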
4. Small IO Optimization: Implemented shuffle caching for MR jobs: when a shuffle request arrives, the server estimates whether it is likely to generate many small IOs and, if so, caches the shuffle data with a single large IO, eliminating the subsequent small IOs.
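The shuffle-caching idea above can be sketched as follows: when one reducer's request would trigger many small disk reads, read the covering region once into a cache and serve every segment from memory. The file layout, threshold, and function names are illustrative assumptions:

```python
SMALL_IO_THRESHOLD = 4  # cache when a request spans more than this many reads (assumed)

def serve_shuffle(read_disk, segments, cache):
    """segments: list of (offset, length) a reducer needs from one map output file."""
    if len(segments) <= SMALL_IO_THRESHOLD:
        # Few segments: issue the small reads directly.
        return [read_disk(off, length) for off, length in segments]
    # Many segments: one large IO covering all of them, then slice from memory.
    start = min(off for off, _ in segments)
    end = max(off + length for off, length in segments)
    if (start, end) not in cache:
        cache[(start, end)] = read_disk(start, end - start)
    blob = cache[(start, end)]
    return [blob[off - start: off - start + length] for off, length in segments]


disk = bytes(range(256))
reads = []  # log of actual disk IOs issued

def read_disk(off, length):
    reads.append((off, length))
    return disk[off: off + length]

segs = [(i * 10, 4) for i in range(6)]  # 6 small segments -> one large read
out = serve_shuffle(read_disk, segs, {})
print(len(reads), len(out))  # 1 6  (one disk read served six segments)
```

The trade-off is memory for the cached region versus seek traffic on the shuffle server's disks, which is why the real system first analyzes whether a request is likely to degenerate into many small IOs before deciding to cache.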
Future Planning: Building job-classification-based protection; considering multi-cluster deployment with cross-IDC solutions; exploring ways to schedule appropriate tasks onto idle resources.
Tencent Cloud Developer