
High‑Speed Loading of 2 Billion Rows into MySQL Using TokuDB

This article describes a real‑world requirement to import more than 2 billion records into MySQL for next‑day reporting. It analyzes the challenges, introduces XeLabs TokuDB and its optimizations, details the test schema and server configuration, demonstrates the bulk‑loading commands, presents measured throughput of up to 570 k rows per second, and closes with practical recommendations and the test environment.

Architect's Tech Stack

A friend needed to load more than 2 billion rows received from a big‑data platform into MySQL for next‑day business reporting.

Because InnoDB can only sustain about 100‑150 k rows per second when the dataset fits in memory, the author evaluated XeLabs TokuDB as an alternative.
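A quick back‑of‑envelope check of those rates, in the same SELECT‑arithmetic style used later in the article (the 125 k rows/s InnoDB figure is an assumed mid‑range value, not a measurement from this test):

```sql
-- Assumed average rates; 2 billion rows total:
SELECT 2000000000/125000/3600 AS innodb_hours;   -- ≈ 4.44 hours
SELECT 2000000000/570000/60   AS tokudb_minutes; -- ≈ 58.5 minutes
```

At InnoDB rates the import would eat most of the night; at TokuDB rates it fits in about an hour, which is what made the alternative worth evaluating.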

XeLabs TokuDB Overview

Project: https://github.com/XeLabs/tokudb

Built‑in jemalloc memory allocator

Additional TokuDB performance metrics

Supports XtraBackup backups

Integrates ZSTD compression

Supports TokuDB binlog group‑commit

Test Table Configuration

# TokuDB row cache (analogous to the InnoDB buffer pool); half of the 8 GB RAM
loose_tokudb_cache_size=4G
# Direct I/O: bypass the OS page cache for data files
loose_tokudb_directio=ON
# fsync the recovery log at most once every 1000 ms
loose_tokudb_fsync_log_period=1000
# Do not fsync the log on every commit (better throughput, relaxed durability)
tokudb_commit_sync=0

Table definition (TokuDB engine):

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT 'user id / phone number',
  `weight` varchar(5) DEFAULT NULL COMMENT 'body weight (kg)',
  `level` varchar(20) DEFAULT NULL COMMENT 'weight class',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT 'percentage of users beaten',
  `level_num` int(10) DEFAULT NULL COMMENT 'number of users in the same weight class',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data loading using LOAD DATA INFILE:

root@localhost [zst]> LOAD DATA INFILE '/u01/work/134-136.txt'
    -> INTO TABLE user_summary(user_id,weight,level,beat_rate,level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000  Deleted: 0  Skipped: 0  Warnings: 0

Calculated write speed:

SELECT 200000000/(5*60+48.30);  -- result: 574217.6285 rows/sec

File sizes: the source file was 8.5 GB and TokuDB stored it in 3.5 GB, compressed to roughly 41 % of the original. Loading all 2 billion rows completed in just over 58 minutes, meeting the requirement. In a comparable InnoDB test, the same data took roughly 3‑4 times longer.
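The headline figures can be re‑derived in the same style as the rows‑per‑second query above (treating on‑disk size divided by source size as a rough proxy for the compression level):

```sql
SELECT 200000000/(5*60+48.30)*3600;  -- ≈ 2.07 billion rows/hour
SELECT 3.5/8.5;                      -- ≈ 0.41: stored at ~41 % of source size
```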

Additional scenarios showed that using an auto‑increment primary key disables TokuDB’s bulk loader, causing a significant slowdown.

Recommendations: when relying on the TokuDB bulk loader, avoid auto‑increment columns; prefer a unique key on a natural identifier, as in the test table. Also note that compression may be less effective in bulk‑load mode.
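To illustrate the auto‑increment caveat, compare the benchmark schema with a hypothetical variant (the `user_summary_autoinc` and `user_summary_fast` names and the trimmed column lists are invented for this sketch, not from the original tests):

```sql
-- Hypothetical variant: the AUTO_INCREMENT primary key kept the
-- TokuDB bulk loader from engaging in the author's extra scenarios.
CREATE TABLE `user_summary_autoinc` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

-- Bulk-load friendly: only the natural unique key, as in the test table.
CREATE TABLE `user_summary_fast` (
  `user_id` bigint(20) unsigned NOT NULL,
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
```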

Test Environment

CentOS 7 with a custom‑compiled XeLabs TokuDB version, 8 CPU cores, 8 GB RAM, 500 GB high‑performance cloud disk.

Further resources: TokuDB Bulk Loader documentation – https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader

Tags: Performance, mysql, Database Optimization, large-data, Bulk Load, TokuDB
Written by Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.
