
Design and Implementation of an Online Batch Processing Framework for Large-Scale Promotion Systems

This article presents a centralized online batch‑processing framework for large‑scale promotion systems. Applications integrate via an SDK, and a task center schedules sub‑tasks and dispatches them through RocketMQ to Dubbo‑enabled business containers. The framework uses MapReduce‑style splitting, Guava‑based rate limiting, and heartbeat health checks, and it handled over 1.3 million task instances during Double‑11.

DaTaobao Tech

Background: In B‑side systems, batch processing is essential for large‑scale promotional activities such as Double‑11, enabling merchants to upload, manage and publish massive product data.

Overall solution: A centralized online batch‑processing framework was built. Applications integrate via an SDK. The task center handles scheduling, dispatch, and state management, while business containers implement the actual task logic.

Architecture: The framework consists of a task center and business containers. The task center stores task definitions, schedules instances, and distributes sub‑tasks through RocketMQ. Business containers expose Dubbo interfaces for callbacks.

Model design: Each batch job creates a main task instance and many sub‑task instances, each representing a single data item. Task registration includes type, rate‑limit parameters, etc.
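The model described above could be sketched roughly as follows. This is a minimal illustration, not the framework's actual schema: the class names TaskDefinition, MainInstance and SubInstance, and every field on them, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model sketch; all names and fields are illustrative assumptions.
final class TaskDefinition {
    final String taskType;         // registered task type, e.g. "myDemo"
    final double permitsPerSecond; // rate-limit parameter registered with the type

    TaskDefinition(String taskType, double permitsPerSecond) {
        this.taskType = taskType;
        this.permitsPerSecond = permitsPerSecond;
    }
}

final class SubInstance {
    final long mainInstanceId; // id of the owning main instance
    final String payload;      // the single data item this sub-task handles

    SubInstance(long mainInstanceId, String payload) {
        this.mainInstanceId = mainInstanceId;
        this.payload = payload;
    }
}

final class MainInstance {
    final long id;
    final TaskDefinition definition;
    final List<SubInstance> subInstances = new ArrayList<>();

    MainInstance(long id, TaskDefinition definition) {
        this.id = id;
        this.definition = definition;
    }

    // One sub-instance is created per data item in the batch.
    SubInstance addItem(String payload) {
        SubInstance sub = new SubInstance(id, payload);
        subInstances.add(sub);
        return sub;
    }
}
```

Keeping the rate-limit parameter on the task definition rather than on each instance mirrors the statement that it is supplied at registration time.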

Main flow: The flow is inspired by MapReduce. The main instance splits the input data into sub‑instances, which are stored in the database. The sub‑instances are then dispatched via RocketMQ and executed in the business containers, and their results are aggregated into an Excel report.
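The split, dispatch, execute, and aggregate steps can be illustrated with a small in-memory sketch. It is only an approximation under stated assumptions: an ArrayDeque stands in for RocketMQ, a plain list stands in for the database table, a summary string stands in for the Excel report, and all class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// In-memory sketch of the split -> dispatch -> execute -> aggregate flow.
// Every name here is a hypothetical illustration, not the framework's API.
final class BatchFlowSketch {

    static final class SubResult {
        final String item;
        final boolean success;

        SubResult(String item, boolean success) {
            this.item = item;
            this.success = success;
        }
    }

    // Split: one sub-instance per input item, "persisted" before dispatch.
    static List<String> split(List<String> input) {
        return new ArrayList<>(input);
    }

    // Dispatch + map: drain the queue and run the business logic on each item.
    static List<SubResult> dispatchAndMap(List<String> stored, Predicate<String> businessLogic) {
        Queue<String> queue = new ArrayDeque<>(stored); // stand-in for the message queue
        List<SubResult> results = new ArrayList<>();
        String item;
        while ((item = queue.poll()) != null) {
            results.add(new SubResult(item, businessLogic.test(item)));
        }
        return results;
    }

    // Reduce: aggregate the per-item results into a one-line summary.
    static String reduce(List<SubResult> results) {
        long ok = results.stream().filter(r -> r.success).count();
        return "total=" + results.size() + ", success=" + ok + ", failed=" + (results.size() - ok);
    }

    public static void main(String[] args) {
        List<String> stored = split(Arrays.asList("sku-1", "", "sku-3"));
        // Hypothetical business rule: an empty item fails.
        List<SubResult> results = dispatchAndMap(stored, s -> !s.isEmpty());
        System.out.println(reduce(results));
    }
}
```

Persisting the sub-instances before dispatching them is what allows the reduce step to page through per-item results later, independently of the queue.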

State machine: Diagrams illustrate the lifecycle of main and sub‑tasks.

Key technologies: Rate limiting is built on Guava and is configured per task type and per instance. To schedule a main instance, the task center first acquires a token; if none is available, the task waits and is retried later. Sub‑instances are scheduled through RocketMQ, with throttling applied on the consumption side. A heartbeat‑based health check between the task center and the containers lets the framework recover from restarts. Dubbo provides the client‑side RPC and the asynchronous callbacks.
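The token-based scheduling of main instances can be sketched as below. To keep the example dependency-free, it uses a small hand-written token bucket rather than Guava; with Guava, the equivalent calls would be RateLimiter.create(permitsPerSecond) and tryAcquire(). The class names, the capacity parameter and the injected clock are assumptions for illustration only.

```java
import java.util.function.LongSupplier;

// Dependency-free token bucket, standing in for the Guava-based limiter
// mentioned in the text. Names and parameters are illustrative assumptions.
final class TokenBucket {
    private final double permitsPerSecond;
    private final double capacity;
    private final LongSupplier nanoClock; // injected so the behavior can be tested
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double permitsPerSecond, double capacity, LongSupplier nanoClock) {
        this.permitsPerSecond = permitsPerSecond;
        this.capacity = capacity;
        this.nanoClock = nanoClock;
        this.tokens = capacity;
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Non-blocking: consumes one token and returns true if one is available.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) / 1_000_000_000.0 * permitsPerSecond;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

// Hypothetical scheduler: starts a main instance only when a token is granted.
final class MainInstanceScheduler {
    private final TokenBucket limiter;

    MainInstanceScheduler(TokenBucket limiter) {
        this.limiter = limiter;
    }

    // Returns false when no token is available; the caller retries later.
    boolean trySchedule(Runnable startMainInstance) {
        if (!limiter.tryAcquire()) {
            return false;
        }
        startMainInstance.run();
        return true;
    }
}
```

In production code the clock argument would typically be System::nanoTime. Returning false instead of blocking keeps the scheduler thread free, which matches the wait-and-retry behavior described above.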

Client implementation: Developers only need to implement the MapReduceTask interface (processInstance, map, reduce) and annotate the class with @BatchTask. Example code:

@BatchTask(key = "myDemo")
public class MyDemo implements MapReduceTask {
    @Override
    public TaskResult processInstance(ExecuteContext executeContext) {
        while (true) {
            // read input in batches; break out of the loop once the input is exhausted
            ...
            // generate sub‑instances
            List subInstances = ...;
            executeContext.commit(subInstances);
        }
        return TaskResult.success();
    }

    @Override
    public TaskResult map(SubExecuteContext subExecuteContext) {
        // do something
        ...
        return TaskResult.success();
    }

    @Override
    public TaskResult reduce(ExecuteContext executeContext) {
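        // declared outside the loop so the while condition can see it
        List subInstanceList;
        do {
            // read the finished sub‑instances one page at a time
            subInstanceList = executeContext.read(pageSize);
            if (CollectionUtils.isEmpty(subInstanceList)) {
                break;
            }
            // build result details
            ...
        } while (subInstanceList.size() == pageSize);
        Map resultInfoMap = ...;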
        return TaskResult.success(resultInfoMap);
    }
}
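How the framework discovers classes annotated with @BatchTask is not described in this article. One plausible mechanism, sketched here purely as an assumption, is a registry that reads the annotation by reflection and maps each key to its task class. The annotation declaration, BatchTaskRegistry and MyDemoTask below are all hypothetical stand-ins.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

// Hypothetical re-declaration of the annotation; the real definition
// is not shown in the source.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface BatchTask {
    String key();
}

// Hypothetical registry: maps a task key to the annotated class.
final class BatchTaskRegistry {
    private final Map<String, Class<?>> tasksByKey = new HashMap<>();

    void register(Class<?> taskClass) {
        BatchTask annotation = taskClass.getAnnotation(BatchTask.class);
        if (annotation == null) {
            throw new IllegalArgumentException(
                    taskClass.getName() + " is not annotated with @BatchTask");
        }
        // Reject a second class that claims an already registered key.
        if (tasksByKey.putIfAbsent(annotation.key(), taskClass) != null) {
            throw new IllegalStateException("Duplicate task key: " + annotation.key());
        }
    }

    // Returns the class registered under the key, or null if there is none.
    Class<?> lookup(String key) {
        return tasksByKey.get(key);
    }
}

// Stand-in task class used only to exercise the registry.
@BatchTask(key = "myDemo")
final class MyDemoTask { }
```

With a registry like this, a dispatcher could resolve the key carried by an incoming sub-task message to the class that should handle it.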

Results & outlook: The framework has processed over 1.3 million task instances during Double‑11, supporting batch registration, one‑click publishing, and export. It remains stable, with future work focusing on finer‑grained MQ rate control and improved throughput under extreme load.

Tags: Java, Big Data, distributed scheduling, batch processing, Dubbo, RocketMQ, MapReduce