ClickHouse Overview, MySQL Migration, Performance Benchmark, and Practical Tips
This article introduces ClickHouse as an OLAP columnar database, explains the differences between row‑ and column‑oriented storage, details a real‑world migration from MySQL using a CREATE TABLE AS SELECT approach, presents performance comparisons, discusses synchronization methods, and shares common pitfalls and solutions.
1. What is ClickHouse?
ClickHouse is a column‑oriented DBMS designed for online analytical processing (OLAP).
Key concepts:
OLTP – traditional relational databases focusing on transaction consistency (e.g., banking, e‑commerce).
OLAP – warehouse‑type databases for complex data analysis and decision support.
Row‑oriented databases (MySQL, Postgres, MS SQL Server) store data row by row, while ClickHouse stores data column by column, reducing I/O and enabling high compression.
2. Business Problem
A MySQL table with 50 million rows required over 3 minutes for a join query. After indexing and sharding, performance was still unsatisfactory, prompting a migration to ClickHouse.
After migration, query time dropped to under 1 second, a 200× speed‑up.
3. ClickHouse Practice
3.1 Installing ClickHouse on macOS
Installation via Docker is recommended; alternatively, compile from source.
3.2 Data Migration: MySQL → ClickHouse
ClickHouse supports most MySQL syntax, offering five migration strategies. The article uses the "CREATE TABLE AS SELECT" method:
CREATE TABLE [IF NOT EXISTS] db.table_name ENGINE = MergeTree AS SELECT *
FROM mysql('host:port','db','database','user','password')3.3 Performance Test Comparison
Type
Data Volume
Table Size
Query Speed
MySQL
50 M
10 GB
205 s
ClickHouse
50 M
600 MB
<1 s
3.4 Data Synchronization Solutions
Temporary Table Method
Create a temporary table in ClickHouse, bulk‑load MySQL data, then replace the original table. Suitable for moderate data volumes with frequent incremental changes.
Synch Tool
Open‑source tool synch captures MySQL binlog statements and forwards them via a message queue for eventual consistency.
3.5 Why ClickHouse Is Fast
Only reads columns required for computation, avoiding full‑row I/O.
Same‑type column storage enables up to ten‑fold compression, further reducing I/O.
Custom storage‑aware search algorithms optimize data access patterns.
4. Pitfalls Encountered
4.1 Data Type Differences Between ClickHouse and MySQL
Queries may fail due to mismatched types; casting to a common unsigned type (e.g., toUInt32 ) resolves the issue.
4.2 Asynchronous Delete/Update Operations
Even MergeTree guarantees only eventual consistency. For strict consistency, perform full‑table synchronization.
5. Conclusion
Using ClickHouse solved the MySQL query bottleneck: queries on sub‑billion‑row datasets return within 1 second, and the system scales to clusters for larger workloads.
References:
ClickHouse official documentation: https://clickhouse.tech/docs/zh/
ClickHouse case study at Ctrip Hotels
Choosing ClickHouse engines
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.