
High‑Performance Bulk Loading of Over 2 Billion Rows into MySQL Using TokuDB

This article describes how to ingest more than two billion records into MySQL by leveraging XeLabs TokuDB’s bulk‑loader, detailing configuration, table schema, performance metrics, and a comparison with InnoDB to demonstrate a three‑to‑fourfold speed improvement.


Requirement: a friend needed to load more than 2 billion rows exported from a big-data platform into MySQL in time for next-day business reporting.

Implementation analysis: InnoDB can sustain 100-150k rows/s only while the working set fits in memory, and throughput drops sharply beyond that; for a dataset of this size we evaluated XeLabs TokuDB instead.
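To put that ceiling in numbers, a quick back-of-the-envelope check (arithmetic only, runnable at any SQL prompt):

```sql
-- Best case for InnoDB (~125k rows/s, data fully in memory):
-- 2 billion rows would still take well over four hours.
SELECT 2000000000 / 125000 / 3600 AS innodb_hours;  -- ≈ 4.44
```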

XeLabs TokuDB introduction: the project lives at https://github.com/XeLabs/tokudb and adds enhancements such as built-in jemalloc, additional performance metrics, XtraBackup support, ZSTD compression, and binlog group commit.

Test server configuration (these are my.cnf server variables, not table options):

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Table schema:

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT 'user id / phone number',
  `weight` varchar(5) DEFAULT NULL COMMENT 'weight (kg)',
  `level` varchar(20) DEFAULT NULL COMMENT 'weight class',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT 'percentage of users beaten',
  `level_num` int(10) DEFAULT NULL COMMENT 'number of users in the same weight class',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
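The schema above relies on the engine's default compression. Since the XeLabs build adds ZSTD, a ZSTD-compressed variant could presumably be declared via a row format; the exact ROW_FORMAT name below is an assumption, following the TOKUDB_&lt;codec&gt; convention that Percona's TokuDB uses (e.g. TOKUDB_ZLIB, TOKUDB_QUICKLZ):

```sql
-- Hypothetical: opt into ZSTD compression via row format.
-- TOKUDB_ZSTD is assumed here; check the XeLabs docs for the actual name.
CREATE TABLE `user_summary_zstd` LIKE `user_summary`;
ALTER TABLE `user_summary_zstd` ROW_FORMAT=TOKUDB_ZSTD;
```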

Data loading using MySQL bulk loader:

LOAD DATA INFILE '/u01/work/134-136.txt'
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000  Deleted: 0  Skipped: 0  Warnings: 0

The statement loaded 200 million rows in 5 minutes 48.30 seconds, roughly 574,000 rows per second.
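That throughput figure can be reproduced directly from the reported timing:

```sql
-- 200M rows in 5 min 48.30 s = 348.30 s
SELECT 200000000 / (5 * 60 + 48.30) AS rows_per_second;  -- ≈ 574217.63
```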

File size and compression: the original data file was 8.5 GB; TokuDB stored it in 3.5 GB (≈ 41% of the original). The full 2-billion-row load completed in just over 58 minutes, whereas InnoDB would need three to four times as long for the same volume.
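Both figures check out against the raw numbers:

```sql
SELECT 3.5 / 8.5 AS compression_ratio;                     -- ≈ 0.41
SELECT 2000000000 / 574217.63 / 60 AS total_load_minutes;  -- ≈ 58.05
```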

Additional scenario: with an auto-increment primary key, TokuDB's bulk loader is bypassed and rows go through the normal insert path, so performance is significantly slower.
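For reference, a schema variant like the one below (hypothetical, mirroring the test table) would trigger that slower path:

```sql
-- With an AUTO_INCREMENT primary key the bulk loader is not used,
-- so LOAD DATA falls back to ordinary (much slower) inserts.
CREATE TABLE `user_summary_ai` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
```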

Conclusion: on an 8-core, 8 GB RAM machine with a 500 GB high-speed cloud disk, TokuDB sustained up to ~570k rows/s, comfortably meeting the massive-ingestion requirement and outperforming InnoDB by a factor of three to four.

Test environment: CentOS 7 with a compiled XeLabs TokuDB build. Resources: TokuDB Bulk Loader documentation, plus a Baidu Cloud download link and extraction code.

Tags: Performance, MySQL, Database Optimization, Large Data, Bulk Load, TokuDB
Written by Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.