
Youzan Task Scheduling Platform (TSP): Design, Implementation, and Future Roadmap

Youzan’s Task Scheduling Platform (TSP) unifies the company's legacy cron and timeout systems into a modular, extensible framework that supports both timed and delayed tasks. It offers per‑task rate limiting, sharding, isolation, retry policies, and a rich SDK. Its roadmap adds end‑to‑end monitoring, workflow orchestration, domain‑level clustering, transactional messaging, and dynamic task registration.

Youzan Coder

Youzan's early growth led to many cron tasks with high maintenance costs, poor visualization, and single‑point risks, prompting the first‑generation centralized scheduler Watchman 1.0. Later, the timeout center TOC focused on order‑timeout scenarios and was eventually maintained by the middleware team.

To unify scattered scheduling products (Watchman, TOC, Poseidon, custom frameworks) and provide extensibility, Youzan created the Task Scheduling Platform (TSP), which integrates timed (cron‑based) and delayed (user‑specified executeTime) scheduling capabilities.

TSP consists of five core components: tsp‑client (business‑side SDK), console (metadata management), tsp‑web (API and UI), tsp‑fetcher (scheduling engine), and tsp‑worker (task execution). Supporting modules include RateLimit, Schedule Policy, Monitor, and Retry Policy.

The fetcher uses two schedulers: ConfigsScheduler for timed tasks and TasksScheduler for delayed tasks, both backed by an NSQ‑based task queue. It implements per‑TaskConfig rate limiting via taskPerLoop and supports retry of failed callbacks.

The worker consumes the queue and executes callbacks through the built‑in DubboTaskHandler and RestTaskHandler. The Dubbo handler uses asynchronous Dubbo generic calls with customizable POJO parameters. The worker also handles retries and monitors execution.

Task metadata comprises Task (execution unit) and TaskConfig (configuration, callback info, policy). TaskConfig enables isolation, sharding, and rate‑limit settings.

Data storage combines a database (Task and TaskConfig records) with a message queue (NSQ). The Task table has a composite index on (configName, status, executeTime) and is scanned per configName with the condition status = 0 AND executeTime <= now(). Tasks that are due are enqueued to NSQ for worker consumption.
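The scan described above can be sketched with an in‑memory SQLite database. This is only an illustration: the table name, column types, index name, and helper functions are assumptions, and whether TSP marks a task as queued before or after publishing it to NSQ is not stated in the article.

```python
import sqlite3

# Hypothetical schema: column names follow the article (configName, status,
# executeTime); the table name and column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task ("
    " id INTEGER PRIMARY KEY,"
    " configName TEXT NOT NULL,"
    " status INTEGER NOT NULL,"
    " executeTime INTEGER NOT NULL)"  # epoch seconds
)
# Composite index matching the scan condition below.
conn.execute(
    "CREATE INDEX idx_config_status_time"
    " ON task (configName, status, executeTime)"
)


def fetch_due_tasks(conn, config_name, now, limit):
    """Return ids of idle tasks of one TaskConfig whose executeTime has passed."""
    rows = conn.execute(
        "SELECT id FROM task"
        " WHERE configName = ? AND status = 0 AND executeTime <= ?"
        " ORDER BY executeTime LIMIT ?",
        (config_name, now, limit),
    ).fetchall()
    return [row[0] for row in rows]


def mark_queued(conn, task_ids):
    """Move tasks from idle (0) to queued (10); the ordering relative to the
    NSQ publish is an assumption of this sketch."""
    conn.executemany(
        "UPDATE task SET status = 10 WHERE id = ? AND status = 0",
        [(task_id,) for task_id in task_ids],
    )


conn.executemany(
    "INSERT INTO task (configName, status, executeTime) VALUES (?, ?, ?)",
    [
        ("orderTimeout", 0, 100),   # due at now=150
        ("orderTimeout", 0, 200),   # not yet due
        ("orderTimeout", 50, 90),   # already completed
        ("couponExpire", 0, 50),    # belongs to another TaskConfig
    ],
)
due = fetch_due_tasks(conn, "orderTimeout", now=150, limit=10)
mark_queued(conn, due)
```

Because the equality columns (configName, status) lead the index and the range column (executeTime) comes last, each scan can be served by a single index range rather than a full table scan.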

Task state flow: status 0 (idle) → 10 (queued) → 30 (executing, in‑memory) → 40 (paused) → 45 (async call invoked) → 50 (completed).
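The status codes above could be modeled as an enumeration. The numeric values come from the article; the member names and the is_terminal helper are illustrative assumptions.

```python
from enum import IntEnum


class TaskStatus(IntEnum):
    """Status codes listed in the article; member names are assumptions."""
    IDLE = 0            # stored, waiting for executeTime
    QUEUED = 10         # pushed to the task queue
    EXECUTING = 30      # held in memory while the worker runs it
    PAUSED = 40
    ASYNC_INVOKED = 45  # asynchronous callback has been issued
    COMPLETED = 50


def is_terminal(status: TaskStatus) -> bool:
    """Hypothetical helper: only COMPLETED is treated as final here."""
    return status is TaskStatus.COMPLETED
```

Using an IntEnum keeps the codes comparable with the raw integers stored in the status column while giving call sites readable names.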

Scenario 1 – Custom callback logic: TSP abstracts a tsp-consumer-core module; users implement TaskHandler.execute and optionally a CallbackPostProcessor to notify callers or run extra logic after task completion.
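A minimal sketch of this extension point follows, written in Python for brevity. TaskHandler, execute, and CallbackPostProcessor are names from the article; the after_execute method, the run_task driver, the task dictionary layout, and the two example classes are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class TaskHandler(ABC):
    @abstractmethod
    def execute(self, task: dict) -> bool:
        """Run the business callback; return True on success."""


class CallbackPostProcessor(ABC):
    @abstractmethod
    def after_execute(self, task: dict, success: bool) -> None:
        """Hypothetical hook invoked once the handler has finished."""


class CloseOrderHandler(TaskHandler):
    """Example handler: succeeds only when the payload carries an order id."""

    def execute(self, task: dict) -> bool:
        return "orderId" in task.get("payload", {})


class RecordingPostProcessor(CallbackPostProcessor):
    """Example post-processor: records outcomes instead of notifying a caller."""

    def __init__(self) -> None:
        self.events: List[Tuple[int, bool]] = []

    def after_execute(self, task: dict, success: bool) -> None:
        self.events.append((task["id"], success))


def run_task(task: dict, handler: TaskHandler,
             post_processor: Optional[CallbackPostProcessor] = None) -> bool:
    """Execute one task, then run the optional post-processor."""
    try:
        success = handler.execute(task)
    except Exception:
        # A raised exception is treated as a failed execution in this sketch.
        success = False
    if post_processor is not None:
        post_processor.after_execute(task, success)
    return success


post = RecordingPostProcessor()
run_task({"id": 1, "payload": {"orderId": "E1"}}, CloseOrderHandler(), post)
run_task({"id": 2, "payload": {}}, CloseOrderHandler(), post)
```

Keeping the post-processor separate from the handler lets callers add notification logic without touching the business callback itself.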

Scenario 2 – Scheduling isolation: isolation is achieved via (1) Apollo‑based rules assigning dedicated TaskLauncher per group, (2) per‑TaskConfig queueName configuration to isolate high‑volume tasks, and (3) table sharding by a sharding key to reduce index scan pressure.
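Mechanisms (2) and (3) can be sketched as two small routing functions. The default queue name, the table naming scheme, the table count, and the use of CRC32 as the hash are all assumptions; the article only states that queueName is configurable per TaskConfig and that tables are sharded by a sharding key.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

DEFAULT_QUEUE = "tsp_task_default"  # hypothetical shared queue name
TABLE_COUNT = 8                     # hypothetical number of task tables


@dataclass
class TaskConfig:
    name: str
    queue_name: Optional[str] = None  # set only for tasks needing isolation


def resolve_queue(config: TaskConfig) -> str:
    """Use the dedicated queue when configured, else the shared default."""
    return config.queue_name or DEFAULT_QUEUE


def resolve_table(sharding_key: str, table_count: int = TABLE_COUNT) -> str:
    """Map a sharding key to one of table_count task tables.

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings.
    """
    slot = zlib.crc32(sharding_key.encode("utf-8")) % table_count
    return f"task_{slot}"
```

With this shape, a high‑volume TaskConfig gets its own queue by setting queue_name, while all other configurations continue to share the default queue.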

Scenario 3 – Sharding like ElasticJob: TSP adds shardingCnt and shardingId fields to TaskConfig; each generated task carries its own shard ID and the total shard count, allowing business logic to process one slice of the data. Load balancing (Dubbo or Nginx) distributes the slices across workers.
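One way business code might use the two fields is modulo partitioning, sketched below. The shardingCnt and shardingId names come from the article; the task dictionary layout, the helper functions, and the modulo rule itself are assumptions, since the article leaves the slicing strategy to the business side.

```python
from typing import Dict, List


def generate_shard_tasks(config_name: str, sharding_cnt: int) -> List[Dict]:
    """Create one task per shard, each carrying its id and the total count."""
    return [
        {"configName": config_name, "shardingId": i, "shardingCnt": sharding_cnt}
        for i in range(sharding_cnt)
    ]


def select_slice(record_ids: List[int], task: Dict) -> List[int]:
    """Pick the records this shard owns using id % shardingCnt == shardingId."""
    return [
        record_id
        for record_id in record_ids
        if record_id % task["shardingCnt"] == task["shardingId"]
    ]


tasks = generate_shard_tasks("dailySettlement", 3)
slices = [select_slice(list(range(10)), task) for task in tasks]
# slices == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Every record lands in exactly one slice, so the shards can run on different workers without coordinating with each other.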

Scenario 4 – Rate limiting for low‑QPS services: TSP enforces taskPerLoop per TaskConfig, delaying excess executions to the next cycle; dynamic adjustment can be based on worker RT feedback.
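The per‑loop limit can be sketched as follows. taskPerLoop is the article's parameter; the function names, the in‑memory deque, and the adjustment rule (halve on slow response time, otherwise add one) are assumptions standing in for whatever feedback policy TSP actually applies.

```python
from collections import deque
from typing import Deque, List


def take_batch(pending: Deque[int], task_per_loop: int) -> List[int]:
    """Dispatch at most task_per_loop tasks; the rest wait for the next loop."""
    batch = []
    while pending and len(batch) < task_per_loop:
        batch.append(pending.popleft())
    return batch


def adjust_task_per_loop(current: int, avg_rt_ms: float, target_rt_ms: float,
                         floor: int = 1, ceiling: int = 1000) -> int:
    """Hypothetical feedback rule: halve the limit when the worker's average
    response time exceeds the target, otherwise raise it by one."""
    if avg_rt_ms > target_rt_ms:
        return max(floor, current // 2)
    return min(ceiling, current + 1)


pending = deque(range(7))
first = take_batch(pending, task_per_loop=3)   # [0, 1, 2]
second = take_batch(pending, task_per_loop=3)  # [3, 4, 5]
third = take_batch(pending, task_per_loop=3)   # [6]
```

Tasks that exceed the limit are not dropped; they stay pending and are picked up by a later loop, which is what keeps a low‑QPS downstream service from being overwhelmed.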

The roadmap includes comprehensive monitoring (end‑to‑end tracing, look‑ahead forecasts), task orchestration inspired by FaaS (service registration, visibility control, workflow composition), clustering by business domain, a transactional message pattern (pre‑commit delayed callback, post‑commit event), and dynamic task registration (self‑register on service start, auto‑deregister on shutdown).

In summary, TSP consolidates Youzan’s scheduling needs, provides a modular, extensible platform with strong isolation, sharding, and rate‑limiting features, and outlines future enhancements for monitoring, orchestration, and scalability.
