YARN Practice and Technical Evolution at Kuaishou
Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.
This article summarizes a technical talk by Jiaoxiao Fang of Kuaishou's Big Data Architecture Team on YARN system practice and technical evolution at Kuaishou.
YARN Background: YARN was introduced as a key feature in the move from Hadoop 1.0 to 2.0, replacing the original centralized JobTracker scheduler with two-level scheduling to solve cluster scalability limits. YARN consists of three modules: the ResourceManager (manages resources for the entire cluster), the NodeManager (manages resources on each machine), and the ApplicationMaster (manages resource requests for a single application). MR, Spark, and Flink each implement their own AM logic to run on YARN.
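The two-level split described above can be sketched in a few lines. This is a toy model with hypothetical class and method names chosen for illustration, not Hadoop's actual API: the ResourceManager owns the global resource view and grants containers, while a per-application ApplicationMaster asks it for containers on behalf of its tasks.

```python
class ResourceManager:
    """Level 1: owns the global resource view and hands out containers."""
    def __init__(self, nodes):
        self.free = dict(nodes)  # node name -> free memory (MB)

    def allocate(self, mem_mb):
        # Grant a container on the first node with enough free memory.
        for node, free in self.free.items():
            if free >= mem_mb:
                self.free[node] -= mem_mb
                return (node, mem_mb)
        return None  # cluster cannot satisfy the request right now


class ApplicationMaster:
    """Level 2: per-application; requests containers from the RM for its tasks."""
    def __init__(self, rm, task_mem_mb, num_tasks):
        self.rm, self.task_mem_mb, self.num_tasks = rm, task_mem_mb, num_tasks

    def run(self):
        containers = []
        for _ in range(self.num_tasks):
            c = self.rm.allocate(self.task_mem_mb)
            if c:
                containers.append(c)  # the NodeManager on c[0] would launch the task
        return containers


rm = ResourceManager({"nm-1": 8192, "nm-2": 4096})
am = ApplicationMaster(rm, task_mem_mb=2048, num_tasks=3)
print(len(am.run()))  # 3 containers granted
```

The point of the split is visible even in the toy: the RM never needs to know what a "task" is, and each framework (MR, Spark, Flink) supplies its own AM logic on top of the same allocation interface.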
YARN at Kuaishou: YARN at Kuaishou serves the typical big data application stack, including offline computing (HiveSQL, MR/Spark jobs, Presto queries), real-time processing (Flink on YARN via the Qingteng platform), and model training (the XLearning scheduler for TensorFlow, XGBoost, and MPI). The Arthur machine learning platform was built on Spark and XLearning.
Technical Improvements:
1. Cluster Stability: Optimized away redundant events between the RM and NMs; implemented a slow-start strategy for NM reporting (starting at a 20-second interval and gradually recovering to the normal rate); converted synchronous HDFS operations to asynchronous ones; moved NM-level event processing onto separate threads.
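The slow-start reporting strategy described above can be sketched as a ramp on the heartbeat interval: after an RM restart or failover, NMs begin reporting at a long interval and gradually return to the normal rate, so the recovering RM is not flooded with events. The decay factor and the normal interval here are illustrative assumptions; only the 20-second starting point comes from the talk.

```python
NORMAL_INTERVAL_S = 1.0  # steady-state heartbeat interval (assumed)
SLOW_START_S = 20.0      # initial interval right after RM recovery (from the talk)
DECAY = 0.5              # halve the interval each round until normal (assumed)

def heartbeat_intervals():
    """Yield the NM reporting interval round by round after an RM failover."""
    interval = SLOW_START_S
    while interval > NORMAL_INTERVAL_S:
        yield interval
        interval = max(NORMAL_INTERVAL_S, interval * DECAY)
    yield NORMAL_INTERVAL_S

print(list(heartbeat_intervals()))
# [20.0, 10.0, 5.0, 2.5, 1.25, 1.0]
```

The effect is that aggregate heartbeat load on the RM grows back gradually instead of arriving all at once the moment the RM comes back up.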
2. Preemption Mechanism: Improved preemption so that only core queues can trigger it and only non-core queues can be preempted; building a job priority system to support both intra-queue and cross-queue preemption.
3. Scheduling Performance: Reduced the sorting scale by filtering out queues and apps that do not need resources; optimized comparison operations during sorting by caching temporary objects; replaced merge sort with heap sort. Developed the custom KwaiScheduler with a "god's-eye view" of cluster resources, enabling parallel resource allocation across multiple threads and reaching 40,000+ container allocations per second.
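Two of the sorting optimizations above (filter first, then heap-select instead of fully sorting) can be sketched together. The fairness key used here is a simplified stand-in for the scheduler's real comparator, and all names are illustrative:

```python
import heapq

def next_apps_to_schedule(apps, k):
    """Pick the k most under-served apps that actually have pending requests."""
    # Filter: apps with nothing pending never enter the sort at all,
    # shrinking the sorting scale.
    active = [a for a in apps if a["pending"] > 0]
    # Heap selection: O(n + k log n) to get the top k, instead of a
    # full O(n log n) sort of every active app on each scheduling pass.
    return heapq.nsmallest(k, active, key=lambda a: a["used"] / a["fair_share"])


apps = [
    {"name": "etl",   "pending": 10, "used": 100, "fair_share": 200},
    {"name": "ml",    "pending": 0,  "used": 10,  "fair_share": 50},  # filtered out
    {"name": "adhoc", "pending": 3,  "used": 30,  "fair_share": 40},
]
print([a["name"] for a in next_apps_to_schedule(apps, 1)])  # ['etl']
```

The parallel-allocation side of KwaiScheduler is harder to show in a toy: the key idea from the talk is that a global ("god's-eye") snapshot of cluster resources lets multiple threads allocate concurrently, with conflicts resolved against the snapshot rather than under one global lock.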
4. Small IO Optimization: Implemented shuffle caching for MR jobs: when a shuffle request arrives, the server estimates whether it is likely to generate many small IOs and, if so, caches the shuffle data with a single large IO, eliminating the subsequent small IOs.
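The shuffle-caching idea above can be sketched as follows: when one reducer's request would trigger many small disk reads, read the covering region once into a cache and serve every segment from memory. The file layout, threshold, and function names are illustrative assumptions:

```python
SMALL_IO_THRESHOLD = 4  # cache when a request spans more than this many reads (assumed)

def serve_shuffle(read_disk, segments, cache):
    """segments: list of (offset, length) a reducer needs from one map output file."""
    if len(segments) <= SMALL_IO_THRESHOLD:
        # Few segments: issue the small reads directly.
        return [read_disk(off, length) for off, length in segments]
    # Many segments: one large IO covering all of them, then slice from memory.
    start = min(off for off, _ in segments)
    end = max(off + length for off, length in segments)
    if (start, end) not in cache:
        cache[(start, end)] = read_disk(start, end - start)
    blob = cache[(start, end)]
    return [blob[off - start: off - start + length] for off, length in segments]


disk = bytes(range(256))
reads = []  # log of actual disk IOs issued

def read_disk(off, length):
    reads.append((off, length))
    return disk[off: off + length]

segs = [(i * 10, 4) for i in range(6)]  # 6 small segments -> one large read
out = serve_shuffle(read_disk, segs, {})
print(len(reads), len(out))  # 1 6  (one disk read served six segments)
```

The trade-off is memory for the cached region versus seek traffic on the shuffle server's disks, which is why the real system first analyzes whether a request is likely to degenerate into many small IOs before deciding to cache.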
Future Planning: Building job-classification-based protection; considering multi-cluster deployment with cross-IDC solutions; exploring ways to schedule appropriate tasks onto idle resources.
Tencent Cloud Developer