Interview on xxl-job Task Scheduling Framework and Handling Overlapping Tasks
The interview covers the routing and blocking strategies of the xxl-job distributed task-scheduling framework, how it handles task overlap and idempotency, and practical mitigations such as single-machine execution, locking, and reading a stored business date to avoid date-related problems.
Interviewer: Let's talk about the task scheduling frameworks you have used.
Candidate: There are many options, such as Quartz, Spring Batch, xxl-job, and the newer PowerJob. I use xxl-job the most.
Interviewer: Have you encountered task overlap problems with xxl-job?
Candidate: Overlap is common in batch scheduling; most distributed frameworks can mitigate some duplication but not all.
Interviewer: Which overlap issues does xxl-job solve, and which remain?
Candidate: The routing strategies of xxl-job are shown below:
FIRST: always select the first machine.
LAST: always select the last machine.
ROUND: round‑robin based on the registration order.
RANDOM: randomly select an online machine.
CONSISTENT_HASH: hash‑based fixed machine selection with even distribution.
LEAST_FREQUENTLY_USED: select the machine used least frequently.
LEAST_RECENTLY_USED: select the machine that has been idle the longest.
FAILOVER: choose the first machine that passes a heartbeat check.
BUSYOVER: choose the first machine that is idle.
SHARDING_BROADCAST: broadcast to all machines in the cluster with shard parameters.
In practice, the most common strategies are fixed‑machine (FIRST/LAST) and round‑robin. For a high‑frequency job that runs every two minutes, running on a single machine can cause overlap if the previous execution hasn't finished before the next trigger.
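As a toy illustration (plain Java, not xxl-job source code; the machine names and the simplified hash are assumptions), the ROUND and CONSISTENT_HASH ideas can be sketched as follows:

```java
import java.util.List;

// Toy sketch of two routing strategies over the registered executor list.
public class RouteDemo {
    // ROUND: rotate through machines using a monotonically increasing counter.
    public static String round(List<String> machines, int counter) {
        return machines.get(counter % machines.size());
    }

    // CONSISTENT_HASH (simplified): the same jobId always maps to the same
    // machine as long as the machine list is unchanged, spreading jobs evenly.
    public static String consistentHash(List<String> machines, int jobId) {
        int bucket = Math.floorMod(Integer.hashCode(jobId), machines.size());
        return machines.get(bucket);
    }

    public static void main(String[] args) {
        List<String> machines = List.of("exec-1", "exec-2", "exec-3");
        System.out.println(round(machines, 0)); // exec-1
        System.out.println(round(machines, 1)); // exec-2
        System.out.println(consistentHash(machines, 42)); // same machine every call
    }
}
```

The real implementation maintains the counter per job and hashes onto a virtual-node ring, but the selection behavior is the same in spirit.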
Interviewer: If we fix the job to a single machine (e.g., FIRST or LAST) and multiple jobs overlap, how does xxl-job handle it?
Candidate: For tightly scheduled jobs, xxl-job provides three blocking strategies:
Single‑machine serial (default): requests enter a FIFO queue and run sequentially.
Discard later schedules: if a job is already running, the new request is discarded and marked as failed.
Cover previous schedule: the running job is terminated, the queue cleared, and the new request runs immediately.
With the default serial strategy, later jobs wait in line until the previous one finishes.
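The three blocking strategies can be modeled with a minimal in-memory sketch (plain Java, not xxl-job source; class and trigger names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of what happens when a trigger arrives while the same job
// is still running on the executor.
public class BlockStrategyDemo {
    public enum Strategy { SERIAL_EXECUTION, DISCARD_LATER, COVER_EARLY }

    public boolean running;                          // is an execution in flight?
    public final Deque<String> queue = new ArrayDeque<>();

    /** Returns what happened to the incoming trigger. */
    public String onTrigger(Strategy s, String trigger) {
        if (!running) { running = true; return "RUN " + trigger; }
        switch (s) {
            case SERIAL_EXECUTION:       // default: join the FIFO queue and wait
                queue.addLast(trigger);
                return "QUEUED " + trigger;
            case DISCARD_LATER:          // drop the new trigger, mark it failed
                return "DISCARDED " + trigger;
            case COVER_EARLY:            // terminate current run, clear the queue
                queue.clear();
                return "RUN " + trigger + " (previous terminated)";
            default: throw new IllegalStateException();
        }
    }

    public static void main(String[] args) {
        BlockStrategyDemo d = new BlockStrategyDemo();
        System.out.println(d.onTrigger(Strategy.SERIAL_EXECUTION, "09:00")); // RUN 09:00
        System.out.println(d.onTrigger(Strategy.SERIAL_EXECUTION, "09:02")); // QUEUED 09:02
        System.out.println(d.onTrigger(Strategy.DISCARD_LATER, "09:04"));    // DISCARDED 09:04
    }
}
```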
Interviewer: Have you seen any issues with this approach?
Candidate: Generally it works, but if a job depends on the current date, a delayed execution can push the job to the next day, causing business impact.
Interviewer: Can you give a concrete example?
Candidate: In a loan system, a batch job sends repayment reminder SMS to customers whose due date is the next day. The job queries users with a due date of "tomorrow" and sends messages. If the job is queued and runs after the date changes, some customers won't receive the reminder.
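The pitfall is easy to reproduce in a sketch (dates here are hypothetical): when the "due tomorrow" cutoff is computed from the wall clock, a run delayed past midnight targets the wrong day.

```java
import java.time.LocalDate;

// Sketch of the date drift: the scheduled day and the actual execution day
// diverge when the job waits in the serial queue past midnight, so a
// wall-clock "due tomorrow" query skips a whole day of customers.
public class DueDateDemo {
    public static LocalDate reminderTarget(LocalDate dayJobRuns) {
        return dayJobRuns.plusDays(1);   // "customers whose due date is tomorrow"
    }

    public static void main(String[] args) {
        LocalDate scheduled = LocalDate.of(2024, 5, 1);   // hypothetical date
        LocalDate delayed   = scheduled.plusDays(1);      // ran after midnight
        // On-time run queries users due May 2...
        System.out.println(reminderTarget(scheduled));    // 2024-05-02
        // ...the delayed run queries May 3, so May 2 customers get no SMS.
        System.out.println(reminderTarget(delayed));      // 2024-05-03
    }
}
```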
Interviewer: Does xxl-job provide a solution for this scenario?
Candidate: Using a round‑robin routing strategy distributes the queued tasks across multiple machines, reducing the backlog on a single node.
Interviewer: If the backlog is caused by slow downstream APIs or poor SQL performance, can round‑robin help?
Candidate: No. Round‑robin only alleviates resource contention on a single machine; it cannot fix slow external calls or inefficient queries.
Interviewer: What solutions exist for those problems?
Candidate: Finance systems use a "cut-date" (accounting date): business logic reads a stored accounting date instead of the system clock, and that date is only advanced after all related batch jobs for the day have finished.
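A minimal sketch of the cut-date idea (the atomic reference stands in for an accounting-date table; all names are illustrative):

```java
import java.time.LocalDate;
import java.util.concurrent.atomic.AtomicReference;

// Batch jobs read the stored accounting date, never the system clock.
// The date only advances once every job for that business day is done,
// so a delayed run still processes the correct day's data.
public class AccountingDateDemo {
    public static final AtomicReference<LocalDate> ACCOUNTING_DATE =
            new AtomicReference<>(LocalDate.of(2024, 5, 1)); // hypothetical seed

    public static LocalDate businessToday() {   // what batch jobs should call
        return ACCOUNTING_DATE.get();           // NOT LocalDate.now()
    }

    public static void advanceAfterAllJobsDone() {  // run by the final batch job
        ACCOUNTING_DATE.updateAndGet(d -> d.plusDays(1));
    }
}
```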
Interviewer: What other issues can arise from using round‑robin?
Candidate: Idempotency problems. For example, a job fetches 100 "unprocessed" records every two minutes, processes them, and marks them as "processed". If two machines run concurrently, the same records may be fetched and processed twice.
Interviewer: How can we solve the idempotency issue?
Candidate: Two approaches:
Run the job on a single machine.
Introduce an intermediate "processing" state: select the records with a row-level (exclusive) lock, update their state to "processing", commit the transaction, run the business logic, and finally mark them "processed".
-- Lock a batch of unprocessed rows, then mark them "processing" before
-- committing, so a concurrently scheduled node cannot fetch the same rows.
SELECT * FROM xxx WHERE status = '未处理' /* unprocessed */ LIMIT yy, 100 FOR UPDATE;
UPDATE xxx SET status = '处理中' /* processing */ WHERE id IN (...);
Interviewer: What if the data source cannot be locked, such as email or API queries?
Candidate: Store a unique key of the fetched data as a primary key in a database; subsequent jobs can exclude already processed keys, ensuring idempotency.
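This dedup-by-unique-key approach can be sketched in memory (the Set stands in for a table whose primary key is the item's unique key; names are illustrative):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Deduplication when the source (mailbox, external API) cannot be locked:
// persist each item's unique key; a key that already exists was handled by
// an earlier run and is skipped, making reprocessing idempotent.
public class DedupDemo {
    public static final Set<String> processedKeys = ConcurrentHashMap.newKeySet();

    /** Returns true only if this run should process the item. */
    public static boolean firstSeen(String uniqueKey) {
        // Set.add mirrors an INSERT that fails on a duplicate primary key:
        // exactly one caller wins for any given key.
        return processedKeys.add(uniqueKey);
    }
}
```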
Interviewer: Great, congratulations on moving to the next round.