
Design and Implementation of Vivo Jenkins Scheduler for High Availability and Resource Scheduling

This article analyzes common Jenkins high-availability challenges, reviews existing industry solutions, and presents Vivo's own Jenkins Scheduler architecture, including its API gateway, event center, scheduling algorithm, flow control, and callback mechanisms. It also covers the system's production deployment and its planned evolution toward containers.

Architecture Digest

Vivo's Internet Server team explains the motivation for a highly available Jenkins deployment and describes typical problems such as single-master bottlenecks, slave failures, build queues that are not persisted, and workspace cleanup issues.

The article reviews three mainstream industry approaches: (1) Gearman + Jenkins, which distributes jobs through a Gearman server but requires identical job configuration on every master; (2) re-architecting Jenkins to store its configuration in a database, which enables multiple master instances at the cost of custom development and slower configuration reads; (3) a simple active-passive (master/backup) setup that uses the SCM Sync Configuration plugin to synchronize configuration, which provides failover but wastes the standby's resources and incurs switch-over latency.

Because these methods do not fully meet Vivo's requirements, the team designed the Vivo Jenkins Scheduler system with the following objectives: improve overall build service reliability, reduce disaster‑recovery time, efficiently allocate tasks to sub‑nodes, enable rapid failover with administrator notification, and provide visual analytics such as build‑time and build‑count reports.

The system adopts a full‑master architecture managed by a custom scheduler. Core components include:

API Gateway: handles external requests and performs permission checks, intelligent routing, rate limiting, logging, and request data transformation.
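
The article lists rate limiting among the gateway's duties but does not say which algorithm is used. The sketch below assumes a token bucket, one common choice; `TokenBucket`, `capacity`, and `refill_rate` are hypothetical names, and the clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter; not taken from Vivo's gateway."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # maximum burst size, in requests
        self.refill_rate = refill_rate  # tokens restored per second
        self.clock = clock
        self.tokens = capacity          # the bucket starts full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True if the request may pass."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the gateway would reject or delay this request


class FakeClock:
    """Manually advanced clock so the demo is deterministic."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


if __name__ == "__main__":
    clock = FakeClock()
    bucket = TokenBucket(capacity=2, refill_rate=1.0, clock=clock)
    print([bucket.allow() for _ in range(3)])  # [True, True, False]
    clock.t += 1.0                              # one second later: one token back
    print(bucket.allow())                       # True
```

A real gateway would typically keep one bucket per caller or per route, and a multi-instance gateway would need shared counters; the article describes neither detail.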

Event Center: built on Spring's event mechanism, it emits events for Jenkins instance registration, Jenkins instance down, job redo, job received, and job executed.
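
In a Spring application this pattern usually means publishing events through `ApplicationEventPublisher` and handling them in `@EventListener` methods. The language-neutral Python sketch below shows the same publish/subscribe idea; the `EventType` members mirror the events listed above, while `EventCenter`, `Event`, and the payload keys are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, DefaultDict, Dict, List


class EventType(Enum):
    # Event kinds named in the article; the enum itself is hypothetical.
    JENKINS_REGISTERED = auto()
    JENKINS_DOWN = auto()
    JOB_REDO = auto()
    JOB_RECEIVED = auto()
    JOB_EXECUTED = auto()


@dataclass
class Event:
    type: EventType
    payload: Dict[str, str] = field(default_factory=dict)


Listener = Callable[[Event], None]


class EventCenter:
    """Minimal synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[EventType, List[Listener]] = defaultdict(list)

    def subscribe(self, event_type: EventType, listener: Listener) -> None:
        self._listeners[event_type].append(listener)

    def publish(self, event: Event) -> int:
        """Deliver the event to every listener of its type; return how many ran."""
        listeners = list(self._listeners[event.type])  # copy: listeners may publish
        for listener in listeners:
            listener(event)
        return len(listeners)


if __name__ == "__main__":
    center = EventCenter()
    redone: List[str] = []

    # When an instance goes down, request a redo of the job it was running.
    def on_down(event: Event) -> None:
        center.publish(Event(EventType.JOB_REDO, {"job": event.payload["running_job"]}))

    center.subscribe(EventType.JENKINS_DOWN, on_down)
    center.subscribe(EventType.JOB_REDO, lambda e: redone.append(e.payload["job"]))
    center.publish(Event(EventType.JENKINS_DOWN,
                         {"instance": "jenkins-01", "running_job": "build-app"}))
    print(redone)  # ['build-app']
```

Chaining the down event to a redo event is one plausible way these two events could interact; the article does not spell out the exact wiring.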

Scheduler Center: selects an appropriate Jenkins instance for each job using a two-stage algorithm. It first groups instances by tags (language, tool, JDK version, etc.) and then applies selection strategies such as recent execution history and average build duration.
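
The two-stage selection can be sketched as a filter followed by a ranking. The article names the strategies but not how they are combined, so the ordering here is an assumption: prefer an online instance that recently ran the same job, then the one with the lower average build duration. `JenkinsInstance`, `select_instance`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class JenkinsInstance:
    name: str
    tags: Set[str]                    # e.g. {"java", "maven", "jdk11"}
    online: bool = True
    avg_build_seconds: float = 0.0    # average over recent builds
    recent_jobs: List[str] = field(default_factory=list)


def select_instance(job_name: str, required_tags: Set[str],
                    instances: List[JenkinsInstance]) -> Optional[JenkinsInstance]:
    # Stage 1: keep online instances whose tags cover every required tag.
    candidates = [i for i in instances if i.online and required_tags <= i.tags]
    if not candidates:
        return None  # caller could queue the job or alert an administrator

    # Stage 2 (assumed ordering): an instance that recently ran this job wins
    # first (e.g. warm workspace and caches), then the lower average build
    # duration; the name breaks remaining ties deterministically.
    def rank(i: JenkinsInstance):
        return (job_name not in i.recent_jobs, i.avg_build_seconds, i.name)

    return min(candidates, key=rank)


if __name__ == "__main__":
    pool = [
        JenkinsInstance("jenkins-a", {"java", "jdk8"}, avg_build_seconds=120),
        JenkinsInstance("jenkins-b", {"java", "jdk11", "maven"}, avg_build_seconds=300,
                        recent_jobs=["order-service"]),
        JenkinsInstance("jenkins-c", {"java", "jdk11", "maven"}, avg_build_seconds=90),
        JenkinsInstance("jenkins-d", {"java", "jdk11", "maven"}, online=False),
    ]
    print(select_instance("order-service", {"java", "jdk11"}, pool).name)  # jenkins-b
    print(select_instance("user-service", {"java", "jdk11"}, pool).name)   # jenkins-c
    print(select_instance("web", {"node"}, pool))                          # None
```

Collapsing the strategies into a single weighted score would be an equally reasonable design; the tuple ranking simply makes the priority order explicit.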

Flow Control & Queue Management: when request volume exceeds a threshold, incoming jobs are persisted to MySQL; otherwise they are placed in a Redis queue for immediate execution.
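
The flow-control rule can be sketched with in-memory stand-ins: a `deque` plays the Redis queue and a plain list plays the MySQL backlog table. Two details are assumptions, not from the article: "request volume" is measured as the number of jobs waiting in the fast queue, and new jobs go to the backlog whenever older persisted jobs are still waiting, so submission order is preserved. `FlowController` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class FlowController:
    """Illustrative flow control. The deque stands in for a Redis list and the
    backlog list for a MySQL table; both substitutions are assumptions."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold             # max jobs waiting in the fast queue
        self.fast_queue: Deque[str] = deque()  # "Redis": jobs ready for dispatch
        self.backlog: List[str] = []           # "MySQL": durable overflow, FIFO

    def submit(self, job_id: str) -> str:
        """Route a new job to the fast queue or persist it to the backlog."""
        # Persist when the fast queue is at the threshold, or when older jobs
        # are already in the backlog, so dispatch order matches submission order.
        if self.backlog or len(self.fast_queue) >= self.threshold:
            self.backlog.append(job_id)
            return "persisted"
        self.fast_queue.append(job_id)
        return "queued"

    def next_job(self) -> Optional[str]:
        """Pop the next job for dispatch, then top up from the backlog."""
        job = self.fast_queue.popleft() if self.fast_queue else None
        self._refill()
        return job

    def _refill(self) -> None:
        # Move persisted jobs into the fast queue while it is below the threshold.
        while self.backlog and len(self.fast_queue) < self.threshold:
            self.fast_queue.append(self.backlog.pop(0))


if __name__ == "__main__":
    fc = FlowController(threshold=2)
    print([fc.submit(j) for j in ["j1", "j2", "j3", "j4"]])
    # ['queued', 'queued', 'persisted', 'persisted']
    print([fc.next_job() for _ in range(5)])  # ['j1', 'j2', 'j3', 'j4', None]
```

In a real deployment the deque operations would map to Redis list commands (for example `RPUSH` and `LPOP`), and the backlog to inserts and ordered reads on a MySQL table.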

Callback Center: monitors job status (start, interrupt, success, failure), notifies downstream services, and stores execution records for later visualization.
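
A callback handler along these lines pairs each start event with its terminal event (interrupt, success, or failure), forwards every status change to downstream listeners, and keeps records that can feed build-time and build-count reports. It is a sketch: `CallbackCenter`, `BuildRecord`, keying by build id, and the in-memory record list (standing in for persistent storage) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class JobStatus(Enum):
    STARTED = "started"
    INTERRUPTED = "interrupted"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class BuildRecord:
    job: str
    status: JobStatus
    duration_s: float


Notifier = Callable[[str, JobStatus], None]


class CallbackCenter:
    """Illustrative callback handler; all names here are hypothetical."""

    def __init__(self) -> None:
        self._started: Dict[str, float] = {}  # build id -> start timestamp
        self.records: List[BuildRecord] = []  # stand-in for persistent storage
        self._notifiers: List[Notifier] = []

    def add_notifier(self, notifier: Notifier) -> None:
        self._notifiers.append(notifier)

    def on_status(self, build_id: str, job: str, status: JobStatus, ts: float) -> None:
        if status is JobStatus.STARTED:
            self._started[build_id] = ts
        else:
            # Interrupt, success and failure all end a build. A terminal callback
            # with no recorded start (e.g. a lost start event) is still forwarded
            # below, but no duration record is stored for it.
            start = self._started.pop(build_id, None)
            if start is not None:
                self.records.append(BuildRecord(job, status, ts - start))
        for notify in self._notifiers:
            notify(job, status)  # e.g. tell the service that requested the build

    def report(self) -> Dict[str, Tuple[int, float]]:
        """Per-job build count and mean duration: raw data for report charts."""
        durations: Dict[str, List[float]] = {}
        for record in self.records:
            durations.setdefault(record.job, []).append(record.duration_s)
        return {job: (len(d), sum(d) / len(d)) for job, d in durations.items()}


if __name__ == "__main__":
    center = CallbackCenter()
    center.add_notifier(lambda job, status: print(f"{job}: {status.value}"))
    center.on_status("b1", "order-service", JobStatus.STARTED, 0.0)
    center.on_status("b1", "order-service", JobStatus.SUCCESS, 120.0)
    center.on_status("b2", "order-service", JobStatus.STARTED, 200.0)
    center.on_status("b2", "order-service", JobStatus.FAILURE, 260.0)
    print(center.report())  # {'order-service': (2, 90.0)}
```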

The design eliminates the traditional master‑slave model, allowing multiple masters to process jobs directly under the scheduler’s coordination, thereby achieving true high availability.

The system has been deployed in production, showing stable operation and effective job scheduling. Future plans include containerizing Jenkins instances, moving toward a pool‑based model, and further integrating with Vivo's Kubernetes ecosystem to improve resource utilization and release efficiency.

Written by Architecture Digest
