High‑Performance Bulk Loading of Over 2 Billion Rows into MySQL Using TokuDB

This article describes how to ingest more than two billion records into MySQL by leveraging XeLabs TokuDB’s bulk‑loader, detailing configuration, table schema, performance metrics, and a comparison with InnoDB to demonstrate a three‑to‑fourfold speed improvement.

Architect's Tech Stack

Apr 8, 2019

Requirement: A friend needed to load more than 2 billion rows, received from a big-data platform, into MySQL so that they would be available for business reporting the next day.

Implementation analysis: InnoDB can sustain 100-150k rows/s only while the data fits in memory. Because this data set is far larger than that, we evaluated XeLabs TokuDB instead.

XeLabs TokuDB introduction: The project is hosted at https://github.com/XeLabs/tokudb. Its enhancements include built-in jemalloc, additional performance metrics, Xtrabackup support, ZSTD compression, and binlog group commit.

Test configuration:

loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Table schema:

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT '用户id/手机号',
  `weight` varchar(5) DEFAULT NULL COMMENT '和码体重(KG)',
  `level` varchar(20) DEFAULT NULL COMMENT '重量级',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT '击败率',
  `level_num` int(10) DEFAULT NULL COMMENT '同吨位人数',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
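To try the load on a small scale, a test file matching this schema can be generated first. The following is a minimal sketch, assuming tab-separated fields and newline-terminated rows (the defaults for LOAD DATA INFILE). The value ranges, the `level` labels, the starting `user_id`, and the file name `sample.txt` are all invented for illustration and say nothing about the real data set.

```python
import csv
import random

# Column order matches the column list used in the LOAD DATA statement.
COLUMNS = ("user_id", "weight", "level", "beat_rate", "level_num")

# Hypothetical labels; the real values of `level` are not given in the article.
LEVELS = ("light", "middle", "heavy")


def make_row(user_id, rng):
    """Build one row of made-up values that fit the column types."""
    weight = f"{rng.uniform(40, 99):.1f}"        # varchar(5), e.g. "72.4"
    level = rng.choice(LEVELS)                   # varchar(20)
    beat_rate = f"{rng.uniform(0, 100):.2f}%"    # varchar(12), e.g. "63.10%"
    level_num = rng.randint(1, 1_000_000)        # int
    return (user_id, weight, level, beat_rate, level_num)


def write_sample(path, n_rows, start_id=13_400_000_000, seed=0):
    """Write n_rows tab-separated rows with unique, increasing user_id values."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for i in range(n_rows):
            writer.writerow(make_row(start_id + i, rng))


if __name__ == "__main__":
    write_sample("sample.txt", 1000)
```

Keeping `user_id` unique matters here because the table declares a unique key on that column.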

Data loading using MySQL bulk loader:

LOAD DATA INFILE '/u01/work/134-136.txt'
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000  Deleted: 0  Skipped: 0  Warnings: 0

The load processed 200 million rows in 5 minutes 48.30 seconds (348.3 s), an average of roughly 574,000 rows per second.
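The throughput figure follows directly from the row count and elapsed time reported by MySQL. A small sketch of that arithmetic, using only the numbers from the output above (the function name is just for illustration):

```python
def rows_per_second(rows, minutes, seconds):
    """Average throughput for a load that took `minutes` plus `seconds`."""
    elapsed = minutes * 60 + seconds  # total elapsed time in seconds
    return rows / elapsed


# 200,000,000 rows in 5 min 48.30 sec, as reported by LOAD DATA.
rate = rows_per_second(200_000_000, 5, 48.30)
print(f"{rate:,.0f} rows/s")
```

This is an average over the whole statement; it says nothing about how the rate varied during the load.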

File size and compression: The original data file is 8.5 GB and occupies 3.5 GB once stored in TokuDB (≈ 41 % of the original size). Loading the full data set of over 2 billion rows took just over 58 minutes; InnoDB would need three to four times as long for the same volume.
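Both figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes the data set is exactly 2 billion rows (the article only says "over 2 billion") and that the rate measured for the 200-million-row file holds for the whole load; the function names are illustrative.

```python
def compression_ratio(stored_gb, original_gb):
    """Stored size as a fraction of the original file size."""
    return stored_gb / original_gb


def estimated_minutes(total_rows, rows_per_sec):
    """Time to load total_rows at a constant rate, in minutes."""
    return total_rows / rows_per_sec / 60


ratio = compression_ratio(3.5, 8.5)

# Rate from the measured run: 200,000,000 rows in 348.30 seconds.
measured_rate = 200_000_000 / 348.30
# Assumption: exactly 2 billion rows, loaded at the same rate throughout.
minutes = estimated_minutes(2_000_000_000, measured_rate)

print(f"stored/original = {ratio:.0%}")
print(f"estimated load time = {minutes:.2f} min")
```

Under these assumptions the estimate lands close to the "just over 58 minutes" reported for the full load, which suggests the rate stayed roughly constant from file to file.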

Additional scenario: If the table uses an auto-increment primary key, the bulk loader cannot be applied, and insert performance is significantly slower.

Conclusion: On a machine with 8 cores, 8 GB of RAM, and a 500 GB high-speed cloud disk, TokuDB sustained about 570k rows/s. That comfortably meets the ingestion requirement and is three to four times faster than InnoDB.

Test environment: CentOS 7 with a self-compiled XeLabs TokuDB build. The original post also pointed to the TokuDB Bulk Loader documentation and to a Baidu Cloud download link with its extraction code.
