Star River Data Scheduling Platform: Architecture, Evolution, and Intelligent Operations at 58.com
This article details the design, evolution, and core capabilities of 58.com's self‑developed Star River data scheduling platform, covering its positioning, architectural challenges, high‑availability master design, intelligent monitoring, baseline management, and future roadmap for big‑data operations.
As big‑data workloads expand, effective operations become crucial; 58.com introduced the self‑built Star River data development platform to provide a robust scheduling system that connects low‑level big‑data components with upper‑level applications.
The scheduling system acts as the "heart" of the data middle platform, managing task dependencies, ensuring timely execution, and addressing challenges of stability, performance, and extensibility amid millions of daily tasks.
Its evolution spans three phases: from 2016 to 2019, a custom scheduler that eventually hit scalability limits; in 2020, a major architectural overhaul that introduced a Master/Worker model, Quartz‑driven task generation, Kafka‑based state propagation, and CGroup resource isolation; and from 2021 onward, the addition of intelligent operations, baseline monitoring, and resource‑aware scheduling.
Star River’s core capabilities include a visual drag‑and‑drop UI, support for diverse task types (Hive, Shell, MR, SparkSQL, Python, custom plugins), high‑availability with fault‑tolerant master failover via Zookeeper locks, decentralized master election, and flexible dependency handling (time, event, self‑dependency).
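To illustrate the failover idea, here is a minimal sketch of lock-based master election. The article only says that failover uses Zookeeper locks, so this uses an in-process stand-in for ZooKeeper's ephemeral-node lock; the names (`ZkLockRegistry`, `MasterCandidate`) are illustrative, not Star River's actual API.

```python
import threading

class ZkLockRegistry:
    """In-process stand-in for a ZooKeeper lock path: the first client to
    create the ephemeral node holds the master lock; when that client's
    session ends, the node vanishes and another candidate can acquire it."""
    def __init__(self):
        self._lock = threading.Lock()
        self._holder = None

    def try_acquire(self, client_id):
        with self._lock:
            if self._holder is None:
                self._holder = client_id
                return True
            return False

    def release(self, client_id):
        # Models the ephemeral node disappearing when the session expires.
        with self._lock:
            if self._holder == client_id:
                self._holder = None

class MasterCandidate:
    """Each master node races to create the lock; the winner becomes the
    active master, the losers stay on standby and retry later."""
    def __init__(self, node_id, registry):
        self.node_id = node_id
        self.registry = registry
        self.is_active = False

    def campaign(self):
        self.is_active = self.registry.try_acquire(self.node_id)
        return self.is_active

registry = ZkLockRegistry()
m1 = MasterCandidate("master-1", registry)
m2 = MasterCandidate("master-2", registry)
m1.campaign()                 # master-1 wins the lock and becomes active
m2.campaign()                 # master-2 stays on standby
registry.release("master-1")  # simulate master-1's session expiring
m2.campaign()                 # the standby node takes over
```

In a real deployment the ephemeral node's deletion on session loss is what makes the election decentralized: no node has to observe the old master fail directly, it only has to retry the lock.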
Architecturally, it distinguishes between static workflow definitions (as in Airflow) and dynamic, implicit ones, opting for the latter to simplify task management; the Master orchestrates task scanning, dependency checks, and rate limiting, and dispatches work to Workers, which run tasks in isolated threads and report status through Kafka and Zookeeper.
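The Master's dependency check under a dynamic, implicit workflow model can be sketched as follows: a task instance becomes dispatchable once every upstream instance for the same cycle has succeeded, and a self-dependent task additionally waits for its own previous-cycle instance. This is a sketch under those stated assumptions; the function and field names are hypothetical, not Star River's real interface.

```python
def is_dispatchable(task, cycle, status, deps, self_dep=frozenset()):
    """Return True when a task instance for `cycle` can be dispatched.

    status   -- dict mapping (task, cycle) -> state string, e.g. "SUCCESS"
    deps     -- dict mapping task -> list of upstream tasks
    self_dep -- set of tasks whose previous-cycle instance must also succeed
    """
    # Event dependency: every upstream instance for this cycle must succeed.
    for upstream in deps.get(task, ()):
        if status.get((upstream, cycle)) != "SUCCESS":
            return False
    # Self-dependency: the same task's previous cycle must have succeeded.
    # A missing record (the very first cycle) is treated as satisfied.
    if task in self_dep and status.get((task, cycle - 1), "SUCCESS") != "SUCCESS":
        return False
    return True

deps = {"daily_report": ["etl_orders", "etl_users"]}
status = {("etl_orders", 20240101): "SUCCESS",
          ("etl_users", 20240101): "RUNNING"}
is_dispatchable("daily_report", 20240101, status, deps)  # blocked: etl_users still running
status[("etl_users", 20240101)] = "SUCCESS"
is_dispatchable("daily_report", 20240101, status, deps)  # now dispatchable
```

The appeal of the implicit model is visible here: the "workflow" is never materialized as a DAG object, it emerges cycle by cycle from per-task dependency declarations, so adding or removing one task does not require editing a shared workflow definition.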
Intelligent operations address task‑volume growth through multi‑dimensional throttling, tiered protection (P0‑P2), baseline monitoring with warning and breach thresholds, expected‑time prediction models based on historical runtimes, and key‑path analysis to optimize critical paths.
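The expected-time prediction behind baseline monitoring can be sketched in a few lines. The article says predictions come from historical runtimes with separate warning and breach thresholds; this sketch assumes a simple estimator (median finish time plus one standard deviation) and a fixed warning margin before the deadline — both are my assumptions, not the platform's actual model.

```python
import statistics

def baseline_thresholds(history_finish_minutes, deadline_minute, warn_margin=30):
    """Predict a task's finish time (minute-of-day) from historical finishes
    and flag it against the baseline's warning and breach thresholds.

    history_finish_minutes -- past finish times, as minutes after midnight
    deadline_minute        -- the baseline breach threshold
    warn_margin            -- minutes before the deadline where warnings start
                              (assumed value, not from the article)
    """
    spread = statistics.pstdev(history_finish_minutes) if len(history_finish_minutes) > 1 else 0
    # Median + one stddev: a conservative "expected finish" estimate.
    expected = statistics.median(history_finish_minutes) + spread
    warning_at = deadline_minute - warn_margin
    return {"expected_finish": expected,
            "warn": expected > warning_at,      # warning threshold crossed
            "breach": expected > deadline_minute}  # baseline itself missed

# A P0 task that usually finishes between 5:00 and 5:40, baseline at 6:00:
baseline_thresholds([300, 310, 320, 330, 340], deadline_minute=360)
```

Wired into key-path analysis, the same estimate per task lets the scheduler sum expected durations along the critical path and raise the warning before the terminal task actually misses its baseline.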
Future plans focus on integrating data quality checks, enhancing intelligent resource allocation, and automating data‑warehouse task creation to further lower the barrier for big‑data users.
The session concludes with a Q&A and an invitation to follow DataFun for more big‑data and AI insights.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.