
Home (ZhiJia) Distributed Task Scheduling System Overview

The article presents a comprehensive overview of the Home (ZhiJia) distributed task scheduling system, detailing its background, advantages, technology stack, architecture, core concepts, module responsibilities, IDE integration, and future improvement plans for big‑data processing workflows.

HomeTech

1. Background of Home Distributed Task Scheduling

Task scheduling is core infrastructure in a big‑data platform. Traditional single‑machine crontab approaches break down as data‑processing chains grow long: dependencies between jobs become unclear, and errors are hard to trace back to their source.

2. Advantages of Home Distributed Scheduling

1) Visual task management, dependency handling, execution history, and log viewing.

2) Online task updates with immediate effect.

3) Multiple scheduling modes: manual, timed, dependency‑based, and combined timed + dependency.

4) Distributed deployment ensuring high availability, scalability, and fault tolerance through failover.

5) Persistent tasks, execution records, and logs to avoid data loss, with robust retry mechanisms and detailed tracking/alerting.

6) Support for cross‑cycle dependencies.

7) Integration of DataX as a job for data synchronization, with a web UI for dynamic configuration.
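Cross‑cycle dependencies (item 6 above) mean a job in today's cycle can wait on another job's run from an earlier cycle. A minimal sketch of how such a dependency might be modeled; the class and field names here are illustrative assumptions, not Home's actual API.

```java
import java.time.LocalDate;

// Illustrative sketch of a cross-cycle dependency reference: a job in the
// current cycle declares a dependency on an upstream job's run from an
// earlier cycle (e.g. yesterday). Names are hypothetical, not Home's API.
public class CrossCycleDependency {
    final String upstreamJobId;
    final int cycleOffsetDays; // 0 = same cycle, 1 = previous day's cycle

    CrossCycleDependency(String upstreamJobId, int cycleOffsetDays) {
        this.upstreamJobId = upstreamJobId;
        this.cycleOffsetDays = cycleOffsetDays;
    }

    // Resolve which concrete upstream run this dependency points at
    // for a given cycle date.
    String resolveRunKey(LocalDate cycleDate) {
        return upstreamJobId + "@" + cycleDate.minusDays(cycleOffsetDays);
    }

    public static void main(String[] args) {
        CrossCycleDependency dep = new CrossCycleDependency("daily_pv_agg", 1);
        // A job in the 2024-01-02 cycle waits on daily_pv_agg's 2024-01-01 run.
        System.out.println(dep.resolveRunKey(LocalDate.of(2024, 1, 2)));
        // → daily_pv_agg@2024-01-01
    }
}
```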

3. Home Scheduling System Introduction

3.1 Technology Stack Selection

Spring‑Boot – active community framework for rapid development.

MyBatis – flexible persistence framework supporting custom SQL and stored procedures.

HBase – column‑family store on Hadoop, used for logs, scripts, and subscription results.

Zookeeper – proven registry for high‑availability, distributed consistency, and event triggering.

Netty – asynchronous event‑driven network framework for high‑performance master‑worker communication.

Protobuf – serialization framework for master‑worker data exchange.

Apache Curator – convenient Zookeeper client library.

Quartz – open‑source scheduler for timed tasks.

DataX – Alibaba’s tool for offline data exchange between heterogeneous systems.
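DataX jobs are driven by a JSON description of a reader/writer pair. Below is a minimal sketch of a MySQL‑to‑HDFS sync job of the kind the web UI could generate; the parameter layout follows DataX's documented mysqlreader/hdfswriter plugins, but the connection details are placeholders and the exact fields should be treated as an assumption.

```json
{
  "job": {
    "setting": { "speed": { "channel": 2 } },
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "…", "password": "…",
          "column": ["id", "event_time", "url"],
          "connection": [{
            "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/app"],
            "table": ["pv_log"]
          }]
        }
      },
      "writer": {
        "name": "hdfswriter",
        "parameter": {
          "defaultFS": "hdfs://namenode:8020",
          "path": "/warehouse/ods/pv_log",
          "fileName": "pv_log",
          "fileType": "text",
          "fieldDelimiter": "\t",
          "writeMode": "append",
          "column": [
            { "name": "id", "type": "BIGINT" },
            { "name": "event_time", "type": "STRING" },
            { "name": "url", "type": "STRING" }
          ]
        }
      }
    }]
  }
}
```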

3.2 Design Philosophy

Web UI actions generate events that are registered as nodes under Zookeeper. The master cluster watches these event nodes, processes task triggers, and applies updates in real time (task info, schedule, and dependencies).

The cluster relies on ZK for high availability: masters register their IP/port under /group, workers establish long‑lived Netty connections to them, and all data exchange is serialized with Protobuf.

Masters maintain in‑memory groups of execution machines and assign tasks accordingly, balancing load based on worker heartbeat reports (CPU, memory, task status). If a worker disconnects, the master kills the tasks on the failed node and reassigns them to healthy workers.
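The heartbeat‑driven assignment described above can be sketched as picking the least‑loaded worker in the target group. The weighting and field names below are assumptions for illustration, not Home's actual code.

```java
import java.util.*;

// Minimal sketch of heartbeat-based load balancing: each worker reports
// CPU/memory usage and running-task count; the master picks the worker
// with the lowest load score. Weights and field names are illustrative.
public class LeastLoadedSelector {
    record Heartbeat(String workerId, double cpuUsage, double memUsage, int runningTasks) {}

    // Lower score = less loaded. The weights here are an assumption.
    static double loadScore(Heartbeat hb) {
        return 0.4 * hb.cpuUsage() + 0.4 * hb.memUsage() + 0.2 * hb.runningTasks() / 10.0;
    }

    static String pickWorker(List<Heartbeat> group) {
        return group.stream()
                .min(Comparator.comparingDouble(LeastLoadedSelector::loadScore))
                .map(Heartbeat::workerId)
                .orElseThrow(() -> new IllegalStateException("no live workers in group"));
    }

    public static void main(String[] args) {
        List<Heartbeat> group = List.of(
                new Heartbeat("worker-1", 0.80, 0.60, 12),
                new Heartbeat("worker-2", 0.30, 0.40, 3),
                new Heartbeat("worker-3", 0.55, 0.70, 8));
        System.out.println(pickWorker(group)); // worker-2 has the lowest score
    }
}
```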

3.3 Core Scheduling Concepts

1) Job – basic execution unit dispatched by master to executors.

2) Event – operation type such as CRUD, enable/disable, timed trigger, dependency trigger, success/failure.

3) Batch – ordered collection of jobs with dependencies.

4) Listener – handler for each event type.

5) Scheduler – component that schedules jobs according to their cycles.

6) Executor – worker that receives messages and runs tasks.
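A Batch is an ordered collection of jobs with dependencies, so before dispatch the master has to turn the dependency graph into an execution order. A standard way to do this is Kahn's topological sort; the sketch below uses illustrative data shapes, and the real scheduler additionally interleaves timed triggers and failure handling.

```java
import java.util.*;

// Sketch of ordering a Batch's jobs by their dependencies using Kahn's
// topological sort. A cycle in the dependency graph is reported as an error.
public class BatchOrdering {
    // deps maps each job to the list of jobs it depends on.
    static List<String> topoOrder(Map<String, List<String>> deps) {
        Map<String, Integer> indegree = new HashMap<>();
        Map<String, List<String>> downstream = new HashMap<>();
        for (var e : deps.entrySet()) {
            indegree.putIfAbsent(e.getKey(), 0);
            for (String up : e.getValue()) {
                indegree.merge(e.getKey(), 1, Integer::sum);
                downstream.computeIfAbsent(up, k -> new ArrayList<>()).add(e.getKey());
                indegree.putIfAbsent(up, 0);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        indegree.forEach((job, d) -> { if (d == 0) ready.add(job); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            for (String next : downstream.getOrDefault(job, List.of()))
                if (indegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (order.size() != indegree.size())
            throw new IllegalStateException("dependency cycle in batch");
        return order;
    }

    public static void main(String[] args) {
        // report depends on clean, clean depends on extract.
        System.out.println(topoOrder(Map.of(
                "extract", List.of(),
                "clean", List.of("extract"),
                "report", List.of("clean"))));
        // → [extract, clean, report]
    }
}
```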

3.4 Master‑Slave Architecture

Master handles events, task scheduling, batch coordination, and task distribution.

Worker receives scheduling requests and executes tasks.

Zookeeper serves as the registration center coordinating masters and workers.

HBase stores logs.

3.5 Module Division

Web UI – manages tasks, scripts, monitoring, SLA alerts, executor nodes, and batch operations.

Master – core module handling timed scheduling, dependency resolution, event listening, task dispatch, failure recovery, and horizontal scaling.

Worker – executes tasks (Shell, Python, Java, etc.), reports heartbeat, processes parameters, and runs DataX jobs.

Additional components include HBase for log/script storage and Zookeeper for event publishing, registration, and metadata storage.

4. IDE Integration

The system exposes four HTTP interfaces for IDE interaction: task execution (wraps IDE request into a temporary job), task cancellation (cancels by job ID via ZK event), log query (retrieves logs from execution machines), and data import (packages IDE‑submitted data into a DataX import job).
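The first of these interfaces wraps an ad‑hoc IDE request into a temporary, one‑shot job before it enters normal dispatch. A minimal sketch of that wrapping step; the field names and request shape are assumptions, not the system's actual interface.

```java
import java.util.Map;
import java.util.UUID;

// Sketch of the IDE "task execution" interface: an ad-hoc script submitted
// from the IDE is wrapped into a temporary job with a generated ID before
// it enters normal dispatch. Names and fields here are hypothetical.
public class IdeGateway {
    record TempJob(String jobId, String script, String scriptType, boolean temporary) {}

    // Wrap an IDE request body into a temporary job.
    static TempJob wrapIdeRequest(Map<String, String> request) {
        return new TempJob(
                "tmp-" + UUID.randomUUID(),
                request.get("script"),
                request.getOrDefault("scriptType", "shell"),
                true); // temporary jobs are cleaned up after a single run
    }

    public static void main(String[] args) {
        TempJob job = wrapIdeRequest(Map.of("script", "SELECT count(*) FROM pv_log"));
        System.out.println(job.jobId().startsWith("tmp-") + " " + job.scriptType());
    }
}
```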

5. Summary

As an internally developed unified scheduling system, it defines the logic and rules for big‑data computation, orchestrates task execution order, and provides convenient operations such as execution, re‑run, termination, log viewing, and real‑time monitoring.

6. Future Plans

1) Optimize the currently single‑node master to remove bottlenecks.

2) Replace long‑lived master‑worker connections with RPC or HTTP to improve reliability.
