
Elastic-Job: Overview of a Distributed Job Scheduling Framework

This article introduces Elastic-Job, a Java‑based distributed job scheduling framework from Dangdang, covering its origins, core features such as sharding and elastic scaling, deployment with Zookeeper, best‑practice usage, open‑source development philosophy, future improvements, and detailed Q&A.

High Availability Architecture

Why jobs (scheduled tasks) are needed – Jobs are used when time‑driven processing is required, such as external data fetching, batch settlements, or non‑real‑time actions, and when system decoupling is desired.

Previous job systems at Dangdang – The company used a fragmented set of solutions: Quartz (Java standard scheduler lacking distributed execution), TBSchedule (old Alibaba open‑source scheduler with limitations), Crontab (Linux cron without distribution), and legacy Perl scripts.

Origin of Elastic‑Job – Elastic‑Job originated from Dangdang’s Java application framework ddframe (formerly dd‑job) and was extracted as an open‑source component, while other parts of ddframe remain internal.

Key functionalities of Elastic‑Job

Distributed execution across multiple nodes.

Task sharding: splitting a job into independent shards for parallel processing.

Elastic scaling: automatic re‑sharding when servers join or leave the cluster.

Stability: deterministic shard assignment based on server IP and job name.

High performance, with an optional trade-off that relaxes execution-state tracking to boost throughput.

Idempotence to avoid duplicate execution of the same shard.

Failover and misfire handling, so shards orphaned by a crashed server are picked up and missed runs are re-triggered.

Status monitoring of job runs, failures, and execution time.

Multiple job modes: simple, data‑flow (high‑throughput or sequential), with streaming support.

Additional features such as retry, single‑machine parallelism, fault tolerance, Spring namespace, and admin console.
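The sharding, elastic-scaling, and stability points above can be illustrated with a small sketch. This is not Elastic-Job's actual allocation code; it is a self-contained approximation of the idea that shard items are divided evenly over a deterministically sorted server list, so the same servers always receive the same shards, and removing or adding a server simply re-runs the same calculation over the new list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not the framework's real strategy code) of average
// allocation: sort the servers for a stable order, then hand out contiguous
// blocks of shard items, with earlier servers absorbing any remainder.
public class AverageShardingSketch {
    public static Map<String, List<Integer>> shard(List<String> servers, int shardCount) {
        List<String> sorted = new ArrayList<>(servers);
        Collections.sort(sorted); // deterministic order => stable assignment
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int item = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // Each server gets shardCount / servers items; the first
            // (shardCount % servers) servers take one extra.
            int take = shardCount / sorted.size() + (i < shardCount % sorted.size() ? 1 : 0);
            List<Integer> items = new ArrayList<>();
            for (int j = 0; j < take; j++) {
                items.add(item++);
            }
            result.put(sorted.get(i), items);
        }
        return result;
    }
}
```

With 4 shards and two servers, each server gets two items; if one server leaves, re-running the same function assigns all 4 items to the survivor, which is the "elastic" behavior described above.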

Deployment and usage – Elastic‑Job runs by connecting its JAR/WAR to a shared Zookeeper registry. Users implement business logic that matches the assigned shard ID, optionally using custom parameters to map shard numbers to business entities (e.g., region codes).
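The usage pattern above can be sketched as follows. The names here (the `ShardingContext` stand-in, the region values, the `execute` method) are illustrative assumptions, not the framework's actual classes: in real Elastic-Job the framework constructs and passes the sharding context for you, while the business code only decides what its assigned shard means.

```java
// Sketch of matching business logic to an assigned shard. The custom
// parameter maps shard numbers to business entities, e.g. 0=beijing,
// 1=shanghai, so each node settles only its own region's data.
public class RegionJobSketch {

    // Stand-in for the context a distributed scheduler would supply;
    // field names are hypothetical, chosen for this illustration.
    public static class ShardingContext {
        public final int shardingItem;          // which shard this node was assigned
        public final String shardingParameter;  // business meaning of that shard, e.g. a region code

        public ShardingContext(int shardingItem, String shardingParameter) {
            this.shardingItem = shardingItem;
            this.shardingParameter = shardingParameter;
        }
    }

    // In real usage this body would query and settle only the rows
    // belonging to this shard's region.
    public static String execute(ShardingContext ctx) {
        return "settling orders for region=" + ctx.shardingParameter
                + " (shard " + ctx.shardingItem + ")";
    }
}
```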

Open‑source development philosophy – Emphasizes clean, elegant, highly reusable Java code, thorough testing (≈95% coverage), modular design, and clear documentation to encourage community contributions.

Future outlook

Enhanced monitoring (more metrics, JMX integration, external data pipelines).

Workflow support (task dependencies, init/cleanup tasks).

Improved failover latency.

Support for additional job types (file, MQ).

More sharding strategies.

Q&A highlights

Failover detection relies on Zookeeper session expiration; orphan shards are reassigned to idle servers.

Zookeeper stores configuration, server registration, execution state, and leader election; if Zookeeper goes down, jobs pause and resume automatically when it recovers.

Elastic‑Job does not currently integrate with Spring Batch; it provides its own Spring namespace.

Data‑flow jobs separate fetchData and processData, supporting high‑throughput parallelism or ordered processing per shard.

Idempotence is achieved via execution nodes in Zookeeper to prevent concurrent shard execution.
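The fetchData/processData split from the Q&A can be sketched with a toy driver. The interface loosely mirrors the data-flow idea but is a self-contained illustration, not Elastic-Job's API: in streaming mode the loop keeps fetching batches for a shard until fetchData returns nothing.

```java
import java.util.List;

// Toy sketch of a data-flow job: fetchData pulls the next batch for this
// shard, processData handles it, and streaming repeats until drained.
public class DataflowSketch {

    // Hypothetical interface for illustration; real frameworks define their own.
    public interface Dataflow<T> {
        List<T> fetchData(int shardingItem);
        void processData(List<T> data);
    }

    // Streaming driver: loop until fetchData reports the shard is empty.
    public static <T> int runStreaming(Dataflow<T> job, int shardingItem) {
        int processed = 0;
        while (true) {
            List<T> batch = job.fetchData(shardingItem);
            if (batch == null || batch.isEmpty()) {
                return processed; // total items handled for this shard
            }
            job.processData(batch);
            processed += batch.size();
        }
    }
}
```

Ordered processing per shard falls out naturally: because each shard runs this loop on a single node, batches for one shard are processed sequentially, while different shards run in parallel.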
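The idempotence mechanism can likewise be sketched locally. In Elastic-Job the "running" marker is an execution node in Zookeeper; here a ConcurrentHashMap stands in for that registry so the claim-before-run idea can be shown without a Zookeeper cluster. Class and method names are assumptions for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the execution guard: before running a shard, a server must
// claim it; a second claim for the same shard fails until the first
// execution completes, preventing concurrent duplicate runs.
public class ExecutionGuardSketch {
    private final ConcurrentMap<Integer, String> running = new ConcurrentHashMap<>();

    // Returns true only for the server that wins the claim for this shard.
    public boolean tryStart(int shardingItem, String server) {
        return running.putIfAbsent(shardingItem, server) == null;
    }

    // Release the shard once execution finishes.
    public void complete(int shardingItem) {
        running.remove(shardingItem);
    }
}
```

An ephemeral Zookeeper node adds one property this map lacks: if the claiming server dies, its session expires and the marker vanishes on its own, which is what lets failover reassign the orphaned shard.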

Tags: Backend, Java, distributed scheduling, Zookeeper, open-source, elastic-job, job sharding
Written by High Availability Architecture, the official account of the High Availability Architecture community.