
Introduction to Spring Batch and Its Core Concepts

Spring Batch is a lightweight, comprehensive Java batch processing framework that provides reusable features such as a job/step architecture, ItemReader/ItemProcessor/ItemWriter abstractions, chunk-oriented processing, transaction management, and restart capabilities. This article explains the core concepts, with configuration examples and best practices for building robust enterprise batch jobs.

Architecture Digest

Spring Batch Overview

Spring Batch is a lightweight, comprehensive batch processing framework provided by the Spring ecosystem. It is intended for enterprise applications that need to process large volumes of data in a reliable, transactional, and restartable manner. It does not include scheduling capabilities; it is designed to work together with a scheduler (such as Quartz or cron) rather than replace one.

Architecture Overview

A typical batch job reads a large number of records from a source (database, file, queue), processes them, and writes the results back. The framework models this flow as jobs composed of multiple steps, each step containing an ItemReader, an ItemProcessor, and an ItemWriter. Jobs are launched via a JobLauncher, and their metadata is persisted in a JobRepository.

Core Concepts

Job

A Job is the top‑level abstraction representing an entire batch process. It defines the sequence of steps and can be configured with listeners, restart policies, and parameter validators. Example interface:

/**
 * Batch domain object representing a job.
 */
public interface Job {
    String getName();
    boolean isRestartable();
    void execute(JobExecution execution);
    JobParametersIncrementer getJobParametersIncrementer();
    JobParametersValidator getJobParametersValidator();
}

Spring provides a default implementation, SimpleJob. A typical Java-config definition looks like:

@Bean
public Job footballJob() {
    return jobBuilderFactory.get("footballJob")
        .start(playerLoad())
        .next(gameLoad())
        .next(playerSummarization())
        .build();
}

JobInstance

A JobInstance represents a logical execution of a job with a specific set of parameters. Its interface includes methods to obtain a unique instance ID and the job name.

public interface JobInstance {
    /** Get unique id for this JobInstance. */
    public long getInstanceId();
    /** Get job name. */
    public String getJobName();
}

JobParameters

JobParameters are a key‑value map used to differentiate multiple executions of the same job definition (e.g., daily run dates). They are stored with each JobExecution and can be used for restart logic.
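The identity rule can be sketched in plain Java (a hypothetical JobKey record for illustration, not the Spring Batch API): running the same job definition with different parameters yields a new JobInstance, while identical parameters refer back to the existing instance, which is what enables restarts.

```java
import java.util.Map;

// Hypothetical sketch of JobInstance identity: job name plus identifying
// parameters. Illustrative plain Java, not the Spring Batch API.
record JobKey(String jobName, Map<String, String> parameters) {}

public class JobParametersDemo {
    public static void main(String[] args) {
        JobKey monday = new JobKey("footballJob", Map.of("run.date", "2024-01-01"));
        JobKey tuesday = new JobKey("footballJob", Map.of("run.date", "2024-01-02"));
        JobKey mondayRetry = new JobKey("footballJob", Map.of("run.date", "2024-01-01"));

        System.out.println(monday.equals(tuesday));     // false: new parameters, new JobInstance
        System.out.println(monday.equals(mondayRetry)); // true: same parameters, same JobInstance
    }
}
```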

JobExecution

JobExecution captures a single attempt to run a job, holding status, start/end times, exit code, and the associated JobParameters. Important methods include getBatchStatus() and getJobParameters().

public interface JobExecution {
    public long getExecutionId();
    public String getJobName();
    public BatchStatus getBatchStatus();
    public Date getStartTime();
    public Date getEndTime();
    public String getExitStatus();
    public Date getCreateTime();
    public Date getLastUpdatedTime();
    public Properties getJobParameters();
}

The BatchStatus enum defines possible states such as STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, and ABANDONED.

public enum BatchStatus {STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, ABANDONED}

Step and StepExecution

A Step encapsulates a distinct phase of a job. Each execution of a step creates a StepExecution, which records metrics, timestamps, and an ExecutionContext for storing state between restarts.

ExecutionContext

The ExecutionContext is a simple key‑value store attached to a StepExecution (or JobExecution) for persisting intermediate data.

ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
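For example, a component could record its progress so that a restart resumes where it left off (a sketch: the key name is arbitrary, and stepExecution is assumed to be in scope, e.g. inside a StepExecutionListener):

```java
ExecutionContext ec = stepExecution.getExecutionContext();
ec.putLong("reader.position", 1000L);                // save progress before the commit
long resumeFrom = ec.getLong("reader.position", 0L); // defaults to 0 on a fresh run
```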

JobRepository and JobLauncher

JobRepository persists jobs, steps, and their executions. JobLauncher starts a job with given parameters, returning a JobExecution. Example launcher method signature:

public interface JobLauncher {
    JobExecution run(Job job, JobParameters jobParameters) throws JobExecutionAlreadyRunningException,
        JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}

ItemReader, ItemProcessor, ItemWriter

These three abstractions form the core of a step. ItemReader reads input records, ItemProcessor applies business logic (returning null to filter out a record), and ItemWriter persists the processed items. Spring Batch supplies many ready‑made implementations such as JdbcPagingItemReader and JdbcCursorItemReader.

@Bean
public JdbcPagingItemReader<CustomerCredit> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) {
    Map<String, Object> parameterValues = new HashMap<>();
    parameterValues.put("status", "NEW");

    return new JdbcPagingItemReaderBuilder<CustomerCredit>()
        .name("creditReader")
        .dataSource(dataSource)
        .queryProvider(queryProvider)
        .parameterValues(parameterValues)
        .rowMapper(customerCreditMapper())
        .pageSize(1000)
        .build();
}

@Bean
public SqlPagingQueryProviderFactoryBean queryProvider() {
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setSelectClause("select id, name, credit");
    provider.setFromClause("from customer");
    provider.setWhereClause("where status=:status");
    provider.setSortKey("id");
    return provider;
}
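As an illustration of the filtering behavior mentioned above, a processor can drop records by returning null (a sketch against the ItemProcessor interface; the CustomerCredit type and its getCredit() accessor returning a BigDecimal are assumptions):

```java
// Sketch: returning null from process() filters the item out of the chunk,
// so it never reaches the ItemWriter. CustomerCredit#getCredit() is assumed
// here to return a BigDecimal.
public class PositiveCreditProcessor implements ItemProcessor<CustomerCredit, CustomerCredit> {
    @Override
    public CustomerCredit process(CustomerCredit item) {
        return item.getCredit().signum() > 0 ? item : null; // null -> item is dropped
    }
}
```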

Chunk Processing

Spring Batch processes items in chunks. A chunk size defines how many items are read, processed, and held in memory before a single transaction commit occurs, improving throughput for large data sets.
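The read–process–write cycle can be sketched in plain Java (illustrative only, not the Spring Batch API): items are buffered until the chunk size is reached, and each full buffer is then written in one go, which is where the single transaction commit would occur.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of chunk-oriented processing (not the Spring Batch API).
public class ChunkDemo {

    // Reads each item, "processes" it (doubling stands in for business logic),
    // and flushes the buffer every chunkSize items; each flush models one
    // ItemWriter call followed by a transaction commit.
    public static List<List<Integer>> run(List<Integer> items, int chunkSize) {
        List<List<Integer>> commits = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        for (Integer item : items) {
            buffer.add(item * 2);              // ItemProcessor step
            if (buffer.size() == chunkSize) {  // chunk boundary: write + commit
                commits.add(buffer);
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            commits.add(buffer);               // final, possibly partial chunk
        }
        return commits;
    }

    public static void main(String[] args) {
        // 5 items with chunk size 2 -> three commits: [2, 4], [6, 8], [10]
        System.out.println(run(List.of(1, 2, 3, 4, 5), 2));
    }
}
```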

Skip and Failure Handling

Steps can be configured with skipLimit(), skip(), and noSkip() (after calling faultTolerant()) to control how many and which exceptions may be ignored during processing. Exceptions not listed via skip() cause the step to fail immediately.
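A fault-tolerant step along these lines (a sketch; the flatFileItemReader and itemWriter beans are assumed to be defined elsewhere) tolerates up to ten FlatFileParseExceptions but still fails fast on FileNotFoundException:

```java
@Bean
public Step step1() {
    return stepBuilderFactory.get("step1")
        .<String, String>chunk(10)
        .reader(flatFileItemReader())
        .writer(itemWriter())
        .faultTolerant()
        .skipLimit(10)
        .skip(FlatFileParseException.class)    // skippable, up to the limit
        .noSkip(FileNotFoundException.class)   // never skipped: fails the step
        .build();
}
```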

Practical Guidelines

Key principles include keeping batch logic simple, minimizing I/O, processing data close to its storage, allocating sufficient memory up‑front, validating data integrity, and performing load testing with realistic data volumes.

Disabling Automatic Job Startup

When using Spring Boot, all defined jobs are launched on application start by default. To prevent this, set the property spring.batch.job.enabled=false in application.properties.

Memory Exhaustion Issues

If a reader loads the entire dataset into memory, the JVM may run out of heap space, resulting in a "Resource exhaustion event". Solutions are to paginate the reader or increase the JVM heap size.

Source: blog.csdn.net/topdeveloperr/article/details/84337956

Tags: Java, batch processing, Spring Framework, Job, Chunk, Spring Batch, Step
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
