High‑Performance Bulk Loading of Over 2 Billion Rows into MySQL Using TokuDB
This article describes how to ingest more than two billion records into MySQL by leveraging XeLabs TokuDB’s bulk‑loader, detailing configuration, table schema, performance metrics, and a comparison with InnoDB to demonstrate a three‑to‑fourfold speed improvement.
Requirement : A friend needed to load over 2 billion rows, received from a big-data platform, into MySQL in time for next-day business reporting.
Implementation analysis : InnoDB sustains 100-150 k rows/s only while the data fits in memory; because this dataset is far larger than that, we evaluated XeLabs TokuDB instead.
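To put the InnoDB figures in perspective, the best-case load time can be estimated with a few lines of arithmetic. This is only a rough sketch: the row count of exactly 2,000,000,000 is an assumption (the article says "over 2 billion"), and the helper name `load_hours` is illustrative.

```python
# Best-case InnoDB load time at the in-memory rates quoted above.
# Assumption: exactly 2,000,000,000 rows (the article only says "over 2 billion").
TOTAL_ROWS = 2_000_000_000


def load_hours(total_rows: int, rows_per_second: float) -> float:
    """Return the hours needed to load total_rows at a steady rate."""
    return total_rows / rows_per_second / 3600


for rate in (100_000, 150_000):
    print(f"{rate:>7} rows/s -> {load_hours(TOTAL_ROWS, rate):.1f} h")
```

Even at these in-memory rates the load would take several hours, and the rates no longer hold once the data outgrows memory.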
XeLabs TokuDB introduction : The project is hosted at https://github.com/XeLabs/tokudb and adds enhancements such as built-in jemalloc, additional performance metrics, Xtrabackup support, ZSTD compression, and binlog group commit.
Test table configuration :
loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0
Table schema :
CREATE TABLE `user_summary` (
`user_id` bigint(20) unsigned NOT NULL COMMENT 'user ID / mobile number',
`weight` varchar(5) DEFAULT NULL COMMENT '和码 weight (KG)',
`level` varchar(20) DEFAULT NULL COMMENT 'weight class',
`beat_rate` varchar(12) DEFAULT NULL COMMENT 'beat rate',
`level_num` int(10) DEFAULT NULL COMMENT 'number of users in the same weight class',
UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
Data loading using MySQL bulk loader :
LOAD DATA INFILE '/u01/work/134-136.txt'
INTO TABLE user_summary (user_id, weight, level, beat_rate, level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000 Deleted: 0 Skipped: 0 Warnings: 0
The operation processed 200 million rows in 5 minutes 48.30 seconds, or roughly 574,000 rows per second.
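The throughput figure can be reproduced from the timing the mysql client reported. The sketch below is illustrative only: `parse_elapsed` is a hypothetical helper that handles just the "N min S sec" and "S sec" forms, not the longer forms the client prints for loads that run for hours.

```python
import re


def parse_elapsed(text: str) -> float:
    """Convert a timing such as '5 min 48.30 sec' into seconds.

    Hypothetical helper: only the 'N min S sec' and 'S sec' forms are handled.
    """
    match = re.fullmatch(r"(?:(\d+) min )?(\d+(?:\.\d+)?) sec", text.strip())
    if match is None:
        raise ValueError(f"unrecognized timing: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


rows = 200_000_000                         # rows reported by LOAD DATA
seconds = parse_elapsed("5 min 48.30 sec")  # elapsed time reported by the client
rate = rows / seconds                       # rows loaded per second

print(f"{seconds:.2f} s elapsed, {rate:,.2f} rows/s")
```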
File size and compression : The original data file is 8.5 GB and occupies 3.5 GB in TokuDB (≈ 41 % of the original size). The entire load of over 2 billion rows completed in just over 58 minutes, which matches the measured rate of about 574 k rows/s; InnoDB would need 3-4 times longer for the same volume.
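The compression ratio and the total load time follow from the numbers reported in this article. The sketch below assumes exactly 2,000,000,000 rows (the article says "over 2 billion") and assumes that the rate measured on the 200-million-row file holds for the whole load; the InnoDB range simply applies the article's 3-4x factor.

```python
# Figures reported in the article.
ORIGINAL_GB = 8.5
TOKUDB_GB = 3.5
MEASURED_RATE = 200_000_000 / 348.30  # rows/s from the 5 min 48.30 sec load

# Assumption: exactly 2 billion rows, loaded at the measured rate throughout.
TOTAL_ROWS = 2_000_000_000

ratio = TOKUDB_GB / ORIGINAL_GB
total_minutes = TOTAL_ROWS / MEASURED_RATE / 60

# The article's estimate that InnoDB needs 3-4 times longer.
innodb_minutes = (3 * total_minutes, 4 * total_minutes)

print(f"stored size: {ratio:.0%} of the original")
print(f"TokuDB load: {total_minutes:.1f} min")
print(f"InnoDB estimate: {innodb_minutes[0]:.0f}-{innodb_minutes[1]:.0f} min")
```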
Additional scenario : If the table uses an auto-increment primary key, the bulk loader cannot be used, and inserts are significantly slower.
Conclusion : On a machine with 8 cores, 8 GB of RAM, and a 500 GB high-speed cloud disk, TokuDB sustained roughly 570 k rows/s in this test, which comfortably meets the ingestion requirement and is three to four times faster than InnoDB.
Test environment : CentOS 7 with a compiled build of XeLabs TokuDB.
Resources : TokuDB Bulk Loader documentation; a Baidu Cloud download link with its extraction code.