How to Split Large Transactions in TiDB: Guidelines, Limits, and Practical Scripts
The article explains why TiDB discourages large transactions, outlines the historical transaction limits, shows how to configure size thresholds, and provides concrete scripts for splitting big updates based on primary‑key characteristics to keep transaction size within safe bounds.
TiDB inherits MySQL's "no large transactions" guideline, and because of its distributed nature and Raft replication, large transactions are strongly discouraged.
Before TiDB 4.0, the system imposed strict limits such as a maximum of 5,000 SQL statements per transaction, a single KV entry size not exceeding 6 MB, a total KV entry count not exceeding 300,000, and a total KV size not exceeding 100 MB.
These limits caused common unfiltered DML statements (e.g., insert ... select ..., update ... set ... where 1, delete from ... where 1) to fail with errors such as ERROR 8004 (HY000): transaction too large, len:300001. Typical mitigations included enabling TiDB's unsafe batch session variables (tidb_batch_insert, tidb_batch_delete) or splitting the statement into smaller chunks.
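As a sketch of the batch-variable approach, the session variables can be set before the oversized statement so TiDB internally commits in smaller batches. The table name ytt.t1 and the filter below are illustrative, taken from the article's running example; verify the variables exist in your TiDB version before relying on them.

```shell
#!/bin/sh
# Sketch only: build the SQL that enables TiDB's (unsafe) batch-delete mode
# and then issues the formerly-too-large DELETE. Batch size is per internal
# commit, not per statement.
sql="SET SESSION tidb_batch_delete = 1;
SET SESSION tidb_dml_batch_size = 5000;
DELETE FROM ytt.t1 WHERE log_date < '2020-01-01';"
echo "$sql"
# In practice: mysql --login-path=TiDB_login -D ytt -e "$sql"
```

Note that batch mode trades atomicity for size: if a later batch fails, earlier batches stay committed, which is exactly why these variables are labeled unsafe.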
Since TiDB 4.0, most limits have been removed except the 6 MB per KV entry restriction, and a configurable total transaction size limit can be set in the configuration file:
performance.txn-total-size-limit: 10737418240  # range: 1 GB to 10 GB

Even with this setting, TiDB caps a single transaction at 10 GB at most, to avoid the excessive memory consumption and write amplification that very large transactions cause.
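The limit is specified in bytes. A quick sanity check that the value above is exactly 10 GB:

```shell
#!/bin/sh
# performance.txn-total-size-limit is given in bytes; 10 GB = 10 * 1024^3.
limit=$((10 * 1024 * 1024 * 1024))
echo "$limit"   # 10737418240
```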
Therefore, large transactions should still be broken into smaller ones. The article classifies splitting strategies based on table primary‑key characteristics:
Table has a continuous primary key.
Table has a primary key that is not continuous.
Table has no primary key.
Case 1: Continuous primary key
Example: a table t1 with 1,000,000 rows and six secondary indexes. An unconditional update like update ytt.t1 set log_date = current_date() - interval ceil(rand()*1000) day where 1; becomes an implicit large transaction ( BEGIN … UPDATE … COMMIT ). By dividing the key range into ten parts (each handling 100,000 rows), the transaction can be split. Sample script:
#!/bin/sh
for i in `seq 1 10`; do
  min_id=$(((i-1)*100000+1))
  max_id=$((i*100000))
  queries="update t1 set log_date = date_sub(current_date(), interval ceil(rand()*1000) day) \
where id >= $min_id and id <= $max_id;"
  mysql --login-path=TiDB_login -D ytt -e "$queries" &
done
wait
Case 2: Non-continuous primary key
TiDB discourages auto-increment primary keys because they create write hotspots; auto_random is recommended instead. For non-continuous keys, a window function can assign a dense row number, and the update can then be split by row-number ranges:
#!/bin/sh
for i in `seq 1 10`; do
  min_id=$(((i-1)*100000+1))
  max_id=$((i*100000))
  queries="update t2 a, (select *, row_number() over(order by id) rn from t2) b \
set a.log_date = date_sub(current_date(), interval ceil(rand()*1000) day) \
where a.id = b.id and (b.rn >= $min_id and b.rn <= $max_id);"
  mysql --login-path=TiDB_login -D ytt -e "$queries" &
done
wait
Both approaches cover the majority of splitting scenarios. For tables without a primary key, TiDB creates an implicit auto-increment row ID, but using an explicit primary key is still strongly advised.
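For completeness, a table with no explicit primary key can be split on TiDB's hidden _tidb_rowid column, which exposes the implicit row ID. This is a sketch modeled on the Case 1 script; the table name t3 is hypothetical, and because implicit row IDs are allocated in batches and may have gaps, the ranges below can be sparse, so check the actual min/max of _tidb_rowid before choosing boundaries.

```shell
#!/bin/sh
# Sketch: split an unconditional UPDATE on a table with no primary key by
# ranging over the hidden _tidb_rowid column (t3 is a hypothetical table).
for i in `seq 1 10`; do
  min_id=$(((i-1)*100000+1))
  max_id=$((i*100000))
  queries="update t3 set log_date = date_sub(current_date(), interval ceil(rand()*1000) day) \
where _tidb_rowid >= $min_id and _tidb_rowid <= $max_id;"
  echo "$queries"
  # In practice: mysql --login-path=TiDB_login -D ytt -e "$queries" &
done
```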
Conclusion
Although TiDB 4.0 and later provide better support for larger transactions, they should not be used indiscriminately; proper table design and pre‑emptive splitting of data and queries remain essential for maintaining database performance.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise-grade open-source tools and services for MySQL, releases a premium open-source component each year on 1024 (October 24), and continuously operates and maintains them.