Design and Implementation of Vivo Jenkins Scheduler for High Availability and Resource Scheduling

This article analyzes common Jenkins high‑availability challenges, reviews existing industry solutions, and presents Vivo's own Jenkins Scheduler architecture (API gateway, event center, scheduling algorithms, flow control, and callback mechanisms). It closes with the system's production deployment and its planned container‑based evolution.

Architecture Digest

Feb 10, 2023

Vivo's Internet Server team explains the motivation for a highly available Jenkins deployment, describing typical problems such as single‑master bottlenecks, slave failures, unpersisted queues, and workspace cleanup issues.

The article reviews three mainstream industry approaches:

(1) Gearman + Jenkins, which distributes jobs via a Gearman server but requires identical job configuration on each master.

(2) Re‑architecting Jenkins to store configuration in a database, which enables multi‑master instances at the cost of custom development and slower reads.

(3) A simple active‑passive master‑backup mode using the SCM Sync Configuration plugin, which provides failover but wastes resources and incurs switch‑over latency.

Because these methods do not fully meet Vivo's requirements, the team designed the Vivo Jenkins Scheduler system with the following objectives: improve overall build service reliability, reduce disaster‑recovery time, efficiently allocate tasks to sub‑nodes, enable rapid failover with administrator notification, and provide visual analytics such as build‑time and build‑count reports.

The system adopts a full‑master architecture managed by a custom scheduler. Core components include:

API‑Gateway: handles external requests, performs permission checks, intelligent routing, rate‑limiting, logging, and request data transformation.

Event Center: built on Spring events, emits Jenkins registration, down, job redo, job receive, and job execute events.

Scheduler Center: selects appropriate Jenkins instances using a two‑stage algorithm. It first groups instances by tags (language, tool, JDK version, etc.) and then applies selection strategies such as recent execution history and average build duration.

Flow‑Control & Queue Management: when request volume exceeds a threshold, jobs are persisted to MySQL; otherwise they are queued in Redis for immediate execution.

Callback Center: monitors job status (start, interrupt, success, failure), notifies downstream services, and stores execution records for later visualization.
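
The flow‑control rule in the component list (Redis for immediate execution, MySQL once a threshold is exceeded) can be illustrated with a minimal sketch. This is not Vivo's implementation: the class name FlowController, the reading of "threshold" as the depth of the fast queue, and the refill‑from‑overflow step are all assumptions made for illustration, and in‑memory structures stand in for Redis and MySQL so the example runs without external services.

```python
from collections import deque


class FlowController:
    """Routes incoming jobs to a fast queue or to durable overflow storage.

    The deque stands in for the Redis queue and the list for the MySQL table;
    both are in-memory placeholders (assumptions), not real clients.
    """

    def __init__(self, threshold):
        self.threshold = threshold   # max jobs held in the fast queue (assumed meaning)
        self.fast_queue = deque()    # placeholder for Redis
        self.overflow = []           # placeholder for MySQL

    def submit(self, job):
        """Store the job and return where it went: 'redis' or 'mysql'."""
        if len(self.fast_queue) >= self.threshold:
            self.overflow.append(job)
            return "mysql"
        self.fast_queue.append(job)
        return "redis"

    def next_job(self):
        """Pop the next job to execute, then refill the fast queue from overflow."""
        if not self.fast_queue:
            return None
        job = self.fast_queue.popleft()
        if self.overflow:
            # Assumed drain policy: oldest persisted job moves up first.
            self.fast_queue.append(self.overflow.pop(0))
        return job


fc = FlowController(threshold=2)
placements = [fc.submit(f"job-{n}") for n in range(1, 5)]
print(placements)                                   # ['redis', 'redis', 'mysql', 'mysql']
print(fc.next_job(), list(fc.fast_queue), fc.overflow)
```

With a threshold of 2, the third and fourth jobs spill to the durable store, and executing one job pulls the oldest persisted job back into the fast queue. A real deployment would also need the overflow to survive restarts, which is the reason the article gives MySQL that role.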

The design eliminates the traditional master‑slave model, allowing multiple masters to process jobs directly under the scheduler’s coordination, thereby achieving true high availability.
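
How the scheduler might pick one master among several can be sketched from the two‑stage description of the Scheduler Center: filter by tags, then rank the remaining candidates. The sketch below is a hedged illustration only. The JenkinsInstance fields, the function names, and the specific ranking (prefer an instance that ran the job recently, then the lowest average build duration) are assumptions; the article names the strategies but does not say how they are combined.

```python
from dataclasses import dataclass, field


@dataclass
class JenkinsInstance:
    """Hypothetical view of one Jenkins master as a scheduler might track it."""
    name: str
    tags: set
    avg_build_seconds: float                          # average build duration here
    recent_jobs: list = field(default_factory=list)   # recently executed job names


def filter_by_tags(instances, required_tags):
    """Stage 1: keep instances whose tags cover everything the job requires."""
    return [i for i in instances if required_tags <= i.tags]


def select_instance(instances, job_name, required_tags):
    """Stage 2: rank the stage-1 candidates and return the best, or None."""
    candidates = filter_by_tags(instances, required_tags)
    if not candidates:
        return None
    # Assumed ranking: an instance that ran this job recently sorts first
    # (False < True), ties are broken by lower average build duration.
    return min(
        candidates,
        key=lambda i: (job_name not in i.recent_jobs, i.avg_build_seconds),
    )


instances = [
    JenkinsInstance("jenkins-a", {"java", "maven", "jdk8"}, 310.0),
    JenkinsInstance("jenkins-b", {"java", "maven", "jdk11"}, 240.0, ["order-service"]),
    JenkinsInstance("jenkins-c", {"java", "maven", "jdk11"}, 180.0),
    JenkinsInstance("jenkins-d", {"node", "npm"}, 90.0),
]

chosen = select_instance(instances, "order-service", {"java", "jdk11"})
print(chosen.name)   # jenkins-b
```

Here jenkins-b wins for order-service despite a slower average because it ran that job recently; a job with no history on any instance would go to jenkins-c, the fastest tag‑compatible candidate. Returning None when no instance matches leaves the caller free to queue the job or raise an alert.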

The system has been deployed in production, showing stable operation and effective job scheduling. Future plans include containerizing Jenkins instances, moving toward a pool‑based model, and further integrating with Vivo's Kubernetes ecosystem to improve resource utilization and release efficiency.


