
High‑Performance Bulk Loading of Over 2 Billion Rows into MySQL Using XeLabs TokuDB

This article describes how to quickly import more than two billion rows from a big‑data source into MySQL by leveraging XeLabs TokuDB’s bulk‑loader, showing configuration, code examples, performance results, and practical recommendations for handling large‑scale data ingestion.

Java Captain

A friend needed to load over 2 billion rows received from a big‑data platform into MySQL for next‑day business reporting, requiring a fast ingestion method.

Typical InnoDB bulk inserts can reach 100‑150 k rows per second only when the dataset fits in memory; most real‑world cases exceed memory, so the author evaluated XeLabs TokuDB as an alternative.

XeLabs TokuDB (https://github.com/XeLabs/tokudb) adds several optimizations over the official version, including built‑in jemalloc memory allocation, extra performance metrics, Xtrabackup support, ZSTD compression, and the binlog_group_commit feature.
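As a sketch of how the ZSTD compression might be selected per table: stock TokuDB chooses compression through the `tokudb_row_format` variable (values such as `tokudb_zlib` and `tokudb_lzma`), and the XeLabs fork is assumed to add a `tokudb_zstd` value in the same style, so verify the exact name against the fork's documentation:

```sql
-- Hypothetical: select ZSTD compression for newly created TokuDB tables.
-- tokudb_row_format is a real TokuDB variable; the tokudb_zstd value is
-- an assumption about how the XeLabs fork exposes its ZSTD support.
SET SESSION tokudb_row_format = tokudb_zstd;

CREATE TABLE t_zstd_demo (
  id bigint unsigned NOT NULL,
  payload varchar(255),
  UNIQUE KEY (id)
) ENGINE=TokuDB;
```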

TokuDB core configuration:

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0
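The same settings in my.cnf form; the comments are my reading of the standard TokuDB variables rather than explanations from the article:

```ini
[mysqld]
# loose_ prefix lets mysqld start even if the TokuDB plugin is absent
loose_tokudb_cache_size       = 4G    # in-memory cache for TokuDB's fractal trees
loose_tokudb_directio         = ON    # bypass the OS page cache (direct I/O)
loose_tokudb_fsync_log_period = 1000  # fsync the recovery log every 1000 ms
tokudb_commit_sync            = 0     # skip fsync on each commit; faster, but
                                      # trades away some crash durability
```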

Table schema used for the test (column comments translated from Chinese):

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT 'user id / phone number',
  `weight` varchar(5) DEFAULT NULL COMMENT 'weight (KG)',
  `level` varchar(20) DEFAULT NULL COMMENT 'weight class',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT 'percentage of users outweighed',
  `level_num` int(10) DEFAULT NULL COMMENT 'number of users in the same weight class',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data was loaded with MySQL’s LOAD DATA INFILE command:

root@localhost [zst]> LOAD DATA INFILE '/u01/work/134-136.txt'
    -> INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000 Deleted: 0 Skipped: 0 Warnings: 0
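For reference, a more explicit form of the same statement; the field and line terminators are assumptions, since the article does not show the input file's layout:

```sql
LOAD DATA INFILE '/u01/work/134-136.txt'
INTO TABLE user_summary
FIELDS TERMINATED BY '\t'   -- assumed delimiter
LINES TERMINATED BY '\n'    -- assumed line ending
(user_id, weight, level, beat_rate, level_num);
```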

The resulting write speed was calculated as approximately 574 k rows per second:

root@localhost [zst]>select 200000000/(5*60+48.30);
+------------------------+
| 200000000/(5*60+48.30) |
+------------------------+
| 574217.6285            |
+------------------------+
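The same arithmetic outside MySQL, as a quick sanity check of the ~574 k rows/s figure:

```python
# 200 million rows loaded in 5 min 48.30 s
rows = 200_000_000
elapsed_s = 5 * 60 + 48.30    # 348.30 seconds
rows_per_sec = rows / elapsed_s
print(f"{rows_per_sec:.4f}")  # 574217.6285, matching the SELECT above
```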

File size comparison showed the original 8.5 GB text file compressed to 3.5 GB in TokuDB (about 40 % of the original). In contrast, an equivalent InnoDB load produced files 3‑4 times larger (35 GB and 176 GB) and took significantly longer.
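One way to check such on-disk sizes from inside MySQL is information_schema, though DATA_LENGTH semantics vary by storage engine and version, so treat the result as approximate (the `zst` schema name is taken from the prompt shown above):

```sql
-- Approximate data size per table, in GB
SELECT TABLE_NAME, ENGINE,
       ROUND(DATA_LENGTH / 1024 / 1024 / 1024, 2) AS data_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'zst' AND TABLE_NAME = 'user_summary';
```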

Additional tests demonstrated that using an auto‑increment primary key disables TokuDB’s bulk‑loader, causing the load time for 200 million rows to increase to over 22 minutes. The recommendation is to avoid auto‑increment columns for bulk loads, or replace them with a unique index to retain high performance.
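As a counter-example sketch of the schema to avoid (the table name is hypothetical), this is the same table with an added AUTO_INCREMENT primary key, which the article reports disables the bulk loader:

```sql
CREATE TABLE `user_summary_autoinc` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned NOT NULL,
  `weight` varchar(5) DEFAULT NULL,
  `level` varchar(20) DEFAULT NULL,
  `beat_rate` varchar(12) DEFAULT NULL,
  `level_num` int(10) DEFAULT NULL,
  PRIMARY KEY (`id`),                -- falls back to slow row-by-row insertion
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
```

Per the article's numbers, this variant stretched the 200-million-row load from under 6 minutes to over 22 minutes.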

Test environment: CentOS 7 VM with 8 CPU cores, 8 GB RAM, and a 500 GB high‑performance cloud disk; XeLabs TokuDB was compiled from source (binary available via Baidu Cloud).

Conclusion: In the described cloud environment, TokuDB can reliably achieve around 570 k rows per second for massive data loads, offering substantial speed and storage savings compared with InnoDB.

Tags: performance, MySQL, database optimization, large data, bulk load, TokuDB
Written by Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, and multithreading; occasionally covers DevOps tools such as Jenkins, Nexus, Docker, and ELK; shares practical tech insights and is dedicated to full-stack Java development.
