Handling Large Data Queries in MySQL with MyBatis: Regular, Stream, and Cursor Approaches
To efficiently process massive MySQL result sets without exhausting JVM memory, this article explains three query strategies—regular pagination, stream-based retrieval using MyBatis cursors, and cursor-based fetching with configurable fetchSize—detailing their implementations, advantages, and practical considerations.
Large‑data operation scenarios such as data migration, data export, and batch processing often require reading millions of rows from a MySQL database.
Loading the entire result set into JVM memory can cause OOM (out‑of‑memory) errors and slow performance because the framework spends time materialising each row into entity objects.
Regular Query
By default the full result set is stored in memory. In most cases developers use pagination to limit the amount of data loaded per request.
Example of a MyBatis‑Plus mapper for a paginated query:
@Mapper
public interface BigDataSearchMapper extends BaseMapper
{
@Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment} ")
Page
pageList(@Param("page") Page
page,
@Param(Constants.WRAPPER) QueryWrapper
queryWrapper);
}Without deep‑pagination optimisation, this approach may cause the database server to become a bottleneck and the query can take minutes or even hours.
Stream Query
A stream query returns an iterator instead of a full collection, allowing the application to fetch one row at a time and keep memory usage low.
MyBatis provides the org.apache.ibatis.cursor.Cursor interface (which extends java.io.Closeable and java.lang.Iterable ) for stream processing. Important methods include:
isOpen() : checks whether the cursor is still open before fetching data.
isConsumed() : determines if all rows have been read.
getCurrentIndex() : returns the number of rows already retrieved.
When using a stream, the database connection remains open, so the application must close the cursor after processing and release any temporary containers.
Cursor Query
Cursor‑based queries also use a server‑side cursor but allow the client to control how many rows are fetched per round via the fetchSize parameter. This method is faster than a regular query and avoids memory leaks.
Typical MyBatis mapper using @Options and @ResultType :
@Mapper
public interface BigDataSearchMapper extends BaseMapper
{
// Method 1: multiple rows per fetch
@Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment} ")
@Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 1000000)
Page
pageList(@Param("page") Page
page,
@Param(Constants.WRAPPER) QueryWrapper
queryWrapper);
// Method 2: one row per fetch
@Select("SELECT bds.* FROM big_data_search bds ${ew.customSqlSegment} ")
@Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 100000)
@ResultType(BigDataSearchEntity.class)
void listData(@Param(Constants.WRAPPER) QueryWrapper
queryWrapper,
ResultHandler
handler);
}The @Options annotation can specify ResultSetType.FORWARD_ONLY (cursor moves only forward) and the fetchSize (rows per round). @ResultType defines the entity class for the result.
Key differences:
Non‑stream queries cause memory usage to grow linearly with the number of rows.
Stream queries keep memory stable; the size depends on the configured batch size.
After processing each batch, developers should clear temporary containers (e.g., gxids.clear() ) to free memory.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.