
High‑Speed Loading of 2 Billion Rows into MySQL Using TokuDB

This article describes a real‑world requirement to import more than 2 billion records into MySQL for next‑day reporting. It analyzes the challenges, introduces XeLabs TokuDB and its optimizations, details the test schema and server configuration, demonstrates the bulk‑loading commands, presents measured throughput of up to 570 k rows per second, and closes with practical recommendations and the test environment.

Architect's Tech Stack

A friend needed to load more than 2 billion rows received from a big‑data platform into MySQL for next‑day business reporting.

Because InnoDB can only sustain about 100‑150 k rows per second when the dataset fits in memory, the author evaluated XeLabs TokuDB as an alternative.
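A quick back‑of‑envelope check of those rates, in the same SELECT‑arithmetic style used later in the article (the 125 k rows/s InnoDB figure is an assumed mid‑range value, not a measurement from this test):

```sql
-- Assumed average rates; 2 billion rows total:
SELECT 2000000000/125000/3600 AS innodb_hours;   -- ≈ 4.44 hours
SELECT 2000000000/570000/60   AS tokudb_minutes; -- ≈ 58.5 minutes
```

At InnoDB rates the import would eat most of the night; at TokuDB rates it fits in about an hour, which is what made the alternative worth evaluating.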

XeLabs TokuDB Overview

Project: https://github.com/XeLabs/tokudb

Built‑in jemalloc memory allocator

Additional TokuDB performance metrics

Supports XtraBackup backups

Integrates ZSTD compression

Supports TokuDB binlog group‑commit

Test Table Configuration

# TokuDB row cache (analogous to the InnoDB buffer pool); half of the 8 GB RAM
loose_tokudb_cache_size=4G
# Direct I/O: bypass the OS page cache for data files
loose_tokudb_directio=ON
# fsync the recovery log at most once every 1000 ms
loose_tokudb_fsync_log_period=1000
# Do not fsync the log on every commit (better throughput, relaxed durability)
tokudb_commit_sync=0

Table definition (TokuDB engine):

CREATE TABLE `user_summary` (
  `user_id` bigint(20) unsigned NOT NULL COMMENT 'user id / phone number',
  `weight` varchar(5) DEFAULT NULL COMMENT 'body weight (kg)',
  `level` varchar(20) DEFAULT NULL COMMENT 'weight class',
  `beat_rate` varchar(12) DEFAULT NULL COMMENT 'percentage of users beaten',
  `level_num` int(10) DEFAULT NULL COMMENT 'number of users in the same weight class',
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

Data loading using LOAD DATA INFILE:

root@localhost [zst]> LOAD DATA INFILE '/u01/work/134-136.txt'
    -> INTO TABLE user_summary(user_id,weight,level,beat_rate,level_num);
Query OK, 200000000 rows affected (5 min 48.30 sec)
Records: 200000000  Deleted: 0  Skipped: 0  Warnings: 0

Calculated write speed:

SELECT 200000000/(5*60+48.30);  -- result: 574217.6285 rows/sec

File sizes: the source file was 8.5 GB and TokuDB stored it in 3.5 GB, compressed to roughly 41 % of the original. Loading all 2 billion rows completed in just over 58 minutes, meeting the requirement. In a comparable InnoDB test, the same data took roughly 3‑4 times longer.
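The headline figures can be re‑derived in the same style as the rows‑per‑second query above (treating on‑disk size divided by source size as a rough proxy for the compression level):

```sql
SELECT 200000000/(5*60+48.30)*3600;  -- ≈ 2.07 billion rows/hour
SELECT 3.5/8.5;                      -- ≈ 0.41: stored at ~41 % of source size
```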

Additional scenarios showed that using an auto‑increment primary key disables TokuDB’s bulk loader, causing a significant slowdown.

Recommendations: when relying on the TokuDB bulk loader, avoid auto‑increment columns; prefer a unique key on a natural identifier, as in the test table. Also note that compression may be less effective in bulk‑load mode.
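To illustrate the auto‑increment caveat, compare the benchmark schema with a hypothetical variant (the `user_summary_autoinc` and `user_summary_fast` names and the trimmed column lists are invented for this sketch, not from the original tests):

```sql
-- Hypothetical variant: the AUTO_INCREMENT primary key kept the
-- TokuDB bulk loader from engaging in the author's extra scenarios.
CREATE TABLE `user_summary_autoinc` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;

-- Bulk-load friendly: only the natural unique key, as in the test table.
CREATE TABLE `user_summary_fast` (
  `user_id` bigint(20) unsigned NOT NULL,
  UNIQUE KEY `u_user_id` (`user_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
```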

Test Environment

CentOS 7 with a custom‑compiled XeLabs TokuDB version, 8 CPU cores, 8 GB RAM, 500 GB high‑performance cloud disk.

Further resources: TokuDB Bulk Loader documentation – https://github.com/percona/PerconaFT/wiki/TokuFT-Bulk-Loader

Tags: Performance, mysql, Database Optimization, large-data, Bulk Load, TokuDB
Written by Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.
