High-Speed Bulk Loading of 2 Billion Rows into MySQL Using TokuDB
This article details a real-world test of loading over 2 billion records into MySQL with XeLabs TokuDB, covering the requirements, configuration tweaks, table schema, bulk-loader commands, performance metrics, a comparison with InnoDB, and practical conclusions for large-scale data ingestion.
A friend needed to import more than 2 billion rows from a big-data platform into MySQL for next-day reporting. The author evaluated MySQL's InnoDB limits (10-15k rows/s even when the data fits in memory) and turned to XeLabs TokuDB for better bulk-load performance.
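A quick back-of-the-envelope check (an illustration, not from the article) shows why the cited InnoDB rate rules it out for this workload:

```python
# At InnoDB's cited 10-15k rows/s, how long would 2 billion rows take?
TOTAL_ROWS = 2_000_000_000

for rows_per_sec in (10_000, 15_000):
    hours = TOTAL_ROWS / rows_per_sec / 3600
    print(f"{rows_per_sec:,} rows/s -> {hours:.1f} hours")

# Even at the optimistic 15k rows/s the load takes ~37 hours,
# which cannot meet a next-day reporting deadline.
```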
XeLabs TokuDB Overview
Project: https://github.com/XeLabs/tokudb
Key enhancements over official TokuDB include built-in jemalloc, additional performance metrics, Xtrabackup support, ZSTD compression, and binlog group commit.
Test Table Schema
CREATE TABLE `user_summary` (
  `user_id` BIGINT(20) UNSIGNED NOT NULL COMMENT 'user id / phone number',
  `weight` VARCHAR(5) DEFAULT NULL COMMENT 'weight (kg)',
  `level` VARCHAR(20) DEFAULT NULL COMMENT 'weight class',
  `beat_rate` VARCHAR(12) DEFAULT NULL COMMENT 'percentage of users beaten',
  `level_num` INT(10) DEFAULT NULL COMMENT 'number of users in the same weight class',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

TokuDB Core Configuration
loose_tokudb_cache_size=4G
loose_tokudb_directio=ON
loose_tokudb_fsync_log_period=1000
tokudb_commit_sync=0

Bulk Load Command
LOAD DATA INFILE '/u01/work/134-136.txt'
INTO TABLE user_summary(user_id, weight, level, beat_rate, level_num);
-- Result: 200,000,000 rows affected in 5 min 48.30 s

The measured write speed is approximately 574,217 rows/s. The source file is 8.5 GB, while the TokuDB data file occupies 3.5 GB, a compression ratio of roughly 2.4:1. The entire 2-billion-row load completes in roughly 58 minutes, far faster than InnoDB, which would take 3-4 times longer for the same data volume.
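As a sanity check on the reported numbers (an illustration, not part of the original test), both the throughput and the total load time follow directly from the 200-million-row sample:

```python
# Derive the reported throughput and extrapolated total load time
# from the 200M-row sample (5 min 48.30 s = 348.3 s).
rows_loaded = 200_000_000
elapsed_s = 5 * 60 + 48.30            # 348.3 seconds
rows_per_sec = rows_loaded / elapsed_s
print(f"throughput: {rows_per_sec:,.0f} rows/s")   # ~574,218 rows/s

total_rows = 2_000_000_000
total_min = total_rows / rows_per_sec / 60
print(f"full load:  {total_min:.0f} minutes")      # ~58 minutes
```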
Additional Scenarios
When using an auto‑increment primary key, bulk‑loader benefits disappear, and the load falls back to row‑by‑row inserts, dramatically reducing throughput. The author suggests removing the auto‑increment attribute and using a unique index to retain bulk‑load speed.
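To illustrate the point, the sketch below contrasts a hypothetical variant of the test schema (not from the original article) with the article's approach:

```sql
-- Hypothetical variant: an AUTO_INCREMENT primary key bypasses
-- TokuDB's bulk loader, forcing row-by-row inserts during LOAD DATA.
CREATE TABLE `user_summary_slow` (
  `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  `user_id` BIGINT(20) UNSIGNED NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

-- The article's approach: no auto-increment key, only a unique
-- index, so the bulk loader stays on the fast path.
CREATE TABLE `user_summary_fast` (
  `user_id` BIGINT(20) UNSIGNED NOT NULL,
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
```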
Test Environment
CentOS 7 with a custom‑compiled XeLabs TokuDB, 8 CPU cores, 8 GB RAM, and 500 GB high‑performance cloud storage.
Conclusion
In a cloud environment with the above specs, TokuDB can sustain over 570k rows per second, easily meeting the requirement of loading 2 billion rows within an hour. InnoDB would require significantly more time, making TokuDB the preferred engine for massive bulk imports.