
Interview on xxl-job Task Scheduling Framework and Handling Overlapping Tasks

The interview covers the routing and blocking strategies of the xxl-job distributed task scheduling framework, explains which task-overlap and idempotency problems it solves, and offers practical remedies such as single‑machine execution, locking, and using a stored business date to avoid date‑rollover issues.

IT Services Circle

Interviewer: Let's talk about the task scheduling frameworks you have used.

Candidate: There are many options, such as Quartz, Spring Batch, xxl-job, and the newer PowerJob. I use xxl-job the most.

Interviewer: Have you encountered task overlap problems with xxl-job?

Candidate: Overlap is common in batch scheduling; most distributed frameworks can mitigate some duplication but not all.

Interviewer: Which overlap issues does xxl-job solve, and which remain?

Candidate: The routing strategies of xxl-job are shown below:

FIRST: always select the first machine.

LAST: always select the last machine.

ROUND: round‑robin based on the registration order.

RANDOM: randomly select an online machine.

CONSISTENT_HASH: hash‑based fixed machine selection with even distribution.

LEAST_FREQUENTLY_USED: select the machine used least frequently.

LEAST_RECENTLY_USED: select the machine that has been idle the longest.

FAILOVER: choose the first machine that passes a heartbeat check.

BUSYOVER: choose the first machine that is idle.

SHARDING_BROADCAST: broadcast to all machines in the cluster with shard parameters.

In practice, the most common strategies are fixed‑machine (FIRST/LAST) and round‑robin. For a high‑frequency job that runs every two minutes, running on a single machine can cause overlap if the previous execution hasn't finished before the next trigger.
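To make the round‑robin strategy concrete, here is a minimal sketch of selecting executors in registration order with wrap‑around. This is illustrative only, not xxl-job's actual router implementation; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of ROUND routing: pick executor addresses in registration order,
// wrapping around when the list is exhausted. Hypothetical names; xxl-job's
// real routers live in its admin module.
public class RoundRobinRouter {
    private final AtomicInteger counter = new AtomicInteger(0);

    public String route(List<String> addresses) {
        int idx = Math.abs(counter.getAndIncrement()) % addresses.size();
        return addresses.get(idx);
    }

    public static void main(String[] args) {
        RoundRobinRouter router = new RoundRobinRouter();
        List<String> executors = List.of("10.0.0.1:9999", "10.0.0.2:9999", "10.0.0.3:9999");
        for (int i = 0; i < 4; i++) {
            // Cycles through the three executors, then wraps back to the first.
            System.out.println(router.route(executors));
        }
    }
}
```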

Interviewer: If we fix the job to a single machine (e.g., FIRST or LAST) and multiple jobs overlap, how does xxl-job handle it?

Candidate: For tightly scheduled jobs, xxl-job provides three blocking strategies:

Single‑machine serial (default): requests enter a FIFO queue and run sequentially.

Discard later schedules: if a job is already running, the new request is discarded and marked as failed.

Cover previous schedule: the running job is terminated, the queue cleared, and the new request runs immediately.

With the default serial strategy, later jobs wait in line until the previous one finishes.
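The serial behavior can be sketched with a single‑threaded executor, which gives the same FIFO, one‑at‑a‑time semantics: triggers that arrive while a job runs simply wait in line. This is a simplified stand‑in, not xxl-job's internal trigger queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the "single-machine serial" blocking strategy: each trigger is
// queued and run one at a time, in arrival order (FIFO).
public class SerialExecutionDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService serial = Executors.newSingleThreadExecutor();
        List<Integer> completionOrder = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            final int trigger = i;
            // Triggers 2 and 3 wait until the earlier ones finish.
            serial.submit(() -> {
                synchronized (completionOrder) {
                    completionOrder.add(trigger);
                }
            });
        }
        serial.shutdown();
        serial.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(completionOrder); // completes in submission order: [1, 2, 3]
    }
}
```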

Interviewer: Have you seen any issues with this approach?

Candidate: Generally it works, but if a job depends on the current date, a delayed execution can push the job to the next day, causing business impact.

Interviewer: Can you give a concrete example?

Candidate: In a loan system, a batch job sends repayment reminder SMS to customers whose due date is the next day. The job queries users with a due date of "tomorrow" and sends messages. If the job is queued and runs after the date changes, some customers won't receive the reminder.
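The failure mode is easy to see when the target date is derived from the wall clock at execution time rather than at trigger time. A hedged sketch (the helper name is made up for illustration):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Shows why a delayed run breaks a "due tomorrow" query: the queried date
// depends on when the job actually executes, not when it was scheduled.
public class DueDateDemo {
    // Hypothetical helper: the due date the reminder job queries for.
    static LocalDate reminderTarget(LocalDateTime executionTime) {
        return executionTime.toLocalDate().plusDays(1);
    }

    public static void main(String[] args) {
        LocalDateTime onTime  = LocalDateTime.of(2024, 5, 1, 23, 50);
        LocalDateTime delayed = LocalDateTime.of(2024, 5, 2, 0, 10); // queued past midnight

        System.out.println(reminderTarget(onTime));  // 2024-05-02
        System.out.println(reminderTarget(delayed)); // 2024-05-03: customers due on
                                                     // 2024-05-02 never get the SMS
    }
}
```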

Interviewer: Does xxl-job provide a solution for this scenario?

Candidate: Using a round‑robin routing strategy distributes the queued tasks across multiple machines, reducing the backlog on a single node.

Interviewer: If the backlog is caused by slow downstream APIs or poor SQL performance, can round‑robin help?

Candidate: No. Round‑robin only alleviates resource contention on a single machine; it cannot fix slow external calls or inefficient queries.

Interviewer: What solutions exist for those problems?

Candidate: In finance systems, a "cut‑date" concept is used: business logic reads a stored accounting date instead of the system clock. The accounting date is only advanced after all related batch jobs finish.
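A minimal sketch of the cut‑date pattern, assuming a stored business date that only an end‑of‑batch step may advance (class and method names are hypothetical; in practice the date would live in a database table, not in memory):

```java
import java.time.LocalDate;

// Sketch of the "cut-date" / accounting-date pattern: batch logic reads a
// stored business date instead of the system clock, and the date is rolled
// forward only after all batch jobs for that date have finished.
public class BusinessCalendar {
    private LocalDate businessDate;

    public BusinessCalendar(LocalDate initial) {
        this.businessDate = initial;
    }

    // Jobs call this instead of LocalDate.now().
    public LocalDate currentBusinessDate() {
        return businessDate;
    }

    // Invoked once by the batch coordinator after the day's jobs complete.
    public void rollToNextDay() {
        businessDate = businessDate.plusDays(1);
    }

    public static void main(String[] args) {
        BusinessCalendar cal = new BusinessCalendar(LocalDate.of(2024, 5, 1));
        // Even if the wall clock has crossed midnight, a delayed job still
        // sees 2024-05-01 until the batch finishes and the date is rolled.
        System.out.println(cal.currentBusinessDate()); // 2024-05-01
        cal.rollToNextDay();
        System.out.println(cal.currentBusinessDate()); // 2024-05-02
    }
}
```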

Interviewer: What other issues can arise from using round‑robin?

Candidate: Idempotency problems. For example, a job fetches 100 "unprocessed" records every two minutes, processes them, and marks them as "processed". If two machines run concurrently, the same records may be fetched and processed twice.

Interviewer: How can we solve the idempotency issue?

Candidate: Two approaches:

Run the job on a single machine.

Introduce an intermediate "processing" state: select the candidate records with a row‑level (exclusive) lock, update their state to "processing", commit the transaction, then execute the business logic, and finally set the state to "processed".

-- Lock the candidate rows, then mark them as in-flight before processing:
select * from xxx where status = 'UNPROCESSED' limit yy, 100 for update;
update xxx set status = 'PROCESSING' where id in (...);
-- commit, run the business logic, then set status = 'PROCESSED'

Interviewer: What if the data source cannot be locked, such as email or API queries?

Candidate: Store a unique key of the fetched data as a primary key in a database; subsequent jobs can exclude already processed keys, ensuring idempotency.
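This key‑based deduplication can be sketched as follows; here an in‑memory set stands in for the dedup table's primary‑key column (in a real system the "claim" would be an insert whose primary‑key constraint rejects duplicates):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of key-based idempotency for sources that cannot be locked
// (email, external APIs): persist each item's unique key and skip items
// whose key has already been recorded.
public class KeyDedup {
    private final Set<String> processedKeys = new LinkedHashSet<>();

    // Returns true only the first time a key is seen,
    // i.e. the "insert" into the dedup table succeeded.
    public boolean tryClaim(String uniqueKey) {
        return processedKeys.add(uniqueKey);
    }

    public static void main(String[] args) {
        KeyDedup dedup = new KeyDedup();
        List<String> fetched = List.of("msg-1", "msg-2", "msg-1"); // msg-1 fetched twice
        for (String key : fetched) {
            if (dedup.tryClaim(key)) {
                System.out.println("processing " + key);
            } else {
                System.out.println("skipping duplicate " + key);
            }
        }
    }
}
```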

Interviewer: Great, congratulations on moving to the next round.

Tags: backend, distributed systems, task scheduling, idempotency, xxl-job, routing strategies
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
