
Elastic-Job: Overview of a Distributed Job Scheduling Framework

This article introduces Elastic-Job, a Java‑based distributed job scheduling framework from Dangdang, covering its origins, core features such as sharding and elastic scaling, deployment with Zookeeper, best‑practice usage, open‑source development philosophy, future improvements, and detailed Q&A.

High Availability Architecture

Why jobs (scheduled tasks) are needed – Jobs are used when time‑driven processing is required, such as external data fetching, batch settlements, or non‑real‑time actions, and when system decoupling is desired.

Previous job systems at Dangdang – The company used a fragmented set of solutions: Quartz (Java standard scheduler lacking distributed execution), TBSchedule (old Alibaba open‑source scheduler with limitations), Crontab (Linux cron without distribution), and legacy Perl scripts.

Origin of Elastic‑Job – Elastic‑Job originated from Dangdang’s Java application framework ddframe (formerly dd‑job) and was extracted as an open‑source component, while other parts of ddframe remain internal.

Key functionalities of Elastic‑Job

Distributed execution across multiple nodes.

Task sharding: splitting a job into independent shards for parallel processing.

Elastic scaling: automatic re‑sharding when servers join or leave the cluster.

Stability: deterministic shard assignment based on server IP and job name.

High performance, with an optional trade-off that relaxes execution-state tracking to boost throughput.

Idempotence to avoid duplicate execution of the same shard.

Failover and misfire handling, so shards orphaned by a crashed server are picked up and missed runs are re-triggered.

Status monitoring of job runs, failures, and execution time.

Multiple job modes: simple, data‑flow (high‑throughput or sequential), with streaming support.

Additional features such as retry, single‑machine parallelism, fault tolerance, Spring namespace, and admin console.
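The sharding, elastic-scaling, and stability points above can be illustrated with a small sketch. This is not Elastic-Job's actual allocation code; it is a self-contained approximation of the idea that shard items are divided evenly over a deterministically sorted server list, so the same servers always receive the same shards, and removing or adding a server simply re-runs the same calculation over the new list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not the framework's real strategy code) of average
// allocation: sort the servers for a stable order, then hand out contiguous
// blocks of shard items, with earlier servers absorbing any remainder.
public class AverageShardingSketch {
    public static Map<String, List<Integer>> shard(List<String> servers, int shardCount) {
        List<String> sorted = new ArrayList<>(servers);
        Collections.sort(sorted); // deterministic order => stable assignment
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int item = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // Each server gets shardCount / servers items; the first
            // (shardCount % servers) servers take one extra.
            int take = shardCount / sorted.size() + (i < shardCount % sorted.size() ? 1 : 0);
            List<Integer> items = new ArrayList<>();
            for (int j = 0; j < take; j++) {
                items.add(item++);
            }
            result.put(sorted.get(i), items);
        }
        return result;
    }
}
```

With 4 shards and two servers, each server gets two items; if one server leaves, re-running the same function assigns all 4 items to the survivor, which is the "elastic" behavior described above.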

Deployment and usage – Elastic‑Job runs by connecting its JAR/WAR to a shared Zookeeper registry. Users implement business logic that matches the assigned shard ID, optionally using custom parameters to map shard numbers to business entities (e.g., region codes).
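The usage pattern above can be sketched as follows. The names here (the `ShardingContext` stand-in, the region values, the `execute` method) are illustrative assumptions, not the framework's actual classes: in real Elastic-Job the framework constructs and passes the sharding context for you, while the business code only decides what its assigned shard means.

```java
// Sketch of matching business logic to an assigned shard. The custom
// parameter maps shard numbers to business entities, e.g. 0=beijing,
// 1=shanghai, so each node settles only its own region's data.
public class RegionJobSketch {

    // Stand-in for the context a distributed scheduler would supply;
    // field names are hypothetical, chosen for this illustration.
    public static class ShardingContext {
        public final int shardingItem;          // which shard this node was assigned
        public final String shardingParameter;  // business meaning of that shard, e.g. a region code

        public ShardingContext(int shardingItem, String shardingParameter) {
            this.shardingItem = shardingItem;
            this.shardingParameter = shardingParameter;
        }
    }

    // In real usage this body would query and settle only the rows
    // belonging to this shard's region.
    public static String execute(ShardingContext ctx) {
        return "settling orders for region=" + ctx.shardingParameter
                + " (shard " + ctx.shardingItem + ")";
    }
}
```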

Open‑source development philosophy – Emphasizes clean, elegant, highly reusable Java code, thorough testing (≈95% coverage), modular design, and clear documentation to encourage community contributions.

Future outlook

Enhanced monitoring (more metrics, JMX integration, external data pipelines).

Workflow support (task dependencies, init/cleanup tasks).

Improved failover latency.

Support for additional job types (file, MQ).

More sharding strategies.

Q&A highlights

Failover detection relies on Zookeeper session expiration; orphan shards are reassigned to idle servers.

Zookeeper stores configuration, server registration, execution state, and leader election; if Zookeeper goes down, jobs pause and resume automatically when it recovers.

Elastic‑Job does not currently integrate with Spring Batch; it provides its own Spring namespace.

Data‑flow jobs separate fetchData and processData, supporting high‑throughput parallelism or ordered processing per shard.

Idempotence is achieved via execution nodes in Zookeeper to prevent concurrent shard execution.
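The fetchData/processData split from the Q&A can be sketched with a toy driver. The interface loosely mirrors the data-flow idea but is a self-contained illustration, not Elastic-Job's API: in streaming mode the loop keeps fetching batches for a shard until fetchData returns nothing.

```java
import java.util.List;

// Toy sketch of a data-flow job: fetchData pulls the next batch for this
// shard, processData handles it, and streaming repeats until drained.
public class DataflowSketch {

    // Hypothetical interface for illustration; real frameworks define their own.
    public interface Dataflow<T> {
        List<T> fetchData(int shardingItem);
        void processData(List<T> data);
    }

    // Streaming driver: loop until fetchData reports the shard is empty.
    public static <T> int runStreaming(Dataflow<T> job, int shardingItem) {
        int processed = 0;
        while (true) {
            List<T> batch = job.fetchData(shardingItem);
            if (batch == null || batch.isEmpty()) {
                return processed; // total items handled for this shard
            }
            job.processData(batch);
            processed += batch.size();
        }
    }
}
```

Ordered processing per shard falls out naturally: because each shard runs this loop on a single node, batches for one shard are processed sequentially, while different shards run in parallel.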
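The idempotence mechanism can likewise be sketched locally. In Elastic-Job the "running" marker is an execution node in Zookeeper; here a ConcurrentHashMap stands in for that registry so the claim-before-run idea can be shown without a Zookeeper cluster. Class and method names are assumptions for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the execution guard: before running a shard, a server must
// claim it; a second claim for the same shard fails until the first
// execution completes, preventing concurrent duplicate runs.
public class ExecutionGuardSketch {
    private final ConcurrentMap<Integer, String> running = new ConcurrentHashMap<>();

    // Returns true only for the server that wins the claim for this shard.
    public boolean tryStart(int shardingItem, String server) {
        return running.putIfAbsent(shardingItem, server) == null;
    }

    // Release the shard once execution finishes.
    public void complete(int shardingItem) {
        running.remove(shardingItem);
    }
}
```

An ephemeral Zookeeper node adds one property this map lacks: if the claiming server dies, its session expires and the marker vanishes on its own, which is what lets failover reassign the orphaned shard.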

Tags: Backend, Java, distributed scheduling, Zookeeper, open-source, elastic-job, job sharding
Written by High Availability Architecture, the official account of the High Availability Architecture community.