
Migrating Log Processing from Elasticsearch to ClickHouse: Architecture, Deployment, Optimization, and Benefits

This article details Ctrip's migration of large‑scale log processing from Elasticsearch to ClickHouse, explaining why ClickHouse was chosen, the high‑availability deployment architecture, data ingestion strategies, dashboard integration, performance gains, operational practices, and overall cost and reliability improvements.

Ctrip Technology

Elasticsearch, a distributed full‑text search engine, has been used by Ctrip to handle over 200 TB of logs per day across more than 500 servers, but growing costs, latency, and operational complexity prompted a search for alternatives.

Why ClickHouse? ClickHouse is a high‑performance, distributed columnar DBMS offering significantly higher write throughput (50‑200 MB/s per server, over 600,000 records/s), 5‑30× faster queries, a storage footprint of 1/3 to 1/30 of Elasticsearch's, lower memory and CPU usage, and better stability through sharding and partitioning.

The log format already fits ClickHouse tables, and most queries are aggregations that align with column‑store strengths, making ClickHouse a suitable replacement for the majority of Ctrip's log workloads.

High‑availability deployment uses multiple shards with two replicas each, coordinated via ZooKeeper, so a shard can lose a node without losing data. Two cluster sizes (6‑node and 20‑node) are deployed to accommodate different log volumes, and cross‑IDC clusters are built using distributed tables.
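A minimal sketch of such a layout, assuming hypothetical database, table, and cluster names (`logs`, `app_log`, `log_cluster`): each shard's local table uses ReplicatedMergeTree, with replication coordinated through ZooKeeper, and a Distributed table fans queries out across the shards:

```sql
-- Local, replicated table on every shard (names are illustrative).
-- {shard} and {replica} are macros filled in per node.
CREATE TABLE logs.app_log ON CLUSTER log_cluster
(
    timestamp DateTime,
    level     String,
    message   String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/app_log', '{replica}')
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (timestamp, level);

-- Distributed table for reads: fans a query out to all shards.
CREATE TABLE logs.app_log_all ON CLUSTER log_cluster AS logs.app_log
ENGINE = Distributed(log_cluster, logs, app_log, rand());
```

Queries against `app_log_all` read from every shard; combined with `skip_unavailable_shards: 1`, a down shard degrades results rather than failing the query.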

Key configuration parameters include:

max_threads: 32  # max query threads per user
max_memory_usage: 10000000000  # ~9.31 GB per query
max_execution_time: 30  # seconds
skip_unavailable_shards: 1  # continue query if a shard is down
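Settings like these typically live in a user profile, but they can also be overridden per query with a SETTINGS clause; the table name and filter below are illustrative:

```sql
-- Per-query override of profile settings (logs.app_log is hypothetical)
SELECT count()
FROM logs.app_log
WHERE timestamp >= now() - INTERVAL 1 DAY
SETTINGS max_threads = 32, max_execution_time = 30;
```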

Data ingestion is performed with gohangout, employing round‑robin writes across shards, large low‑frequency batch inserts, writing to local rather than distributed tables, partitioning by day, and careful primary‑key and index design to prevent slowdowns.
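A sketch of the ingestion pattern, with hypothetical names: each writer picks a shard round‑robin and inserts directly into its local table rather than the Distributed table, accumulating rows into large, infrequent batches so each INSERT creates only a few parts:

```sql
-- Insert into the shard's local table, not the Distributed table.
-- In practice each INSERT carries tens of thousands of rows;
-- two rows are shown for brevity.
INSERT INTO logs.app_log (timestamp, level, message) VALUES
    ('2024-01-01 00:00:00', 'INFO',  'request handled'),
    ('2024-01-01 00:00:01', 'ERROR', 'upstream timeout');
```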

For visualization, Kibana 3 was extended to support ClickHouse, reproducing common chart types (terms, histogram, percentiles, ranges, table) with comparable user experience but dramatically faster query performance.

Query optimizations include splitting large table‑panel queries into two steps (first estimate the row count, then fetch only the detail rows needed), using approximate calculations, materialized views and columns, and limiting result sets, achieving up to 60× faster responses and shrinking the data scanned to as little as 1/120 of the original.
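The two‑step pattern can be sketched as follows, using a hypothetical table and filter:

```sql
-- Step 1: cheap row-count estimate over the filter
SELECT count()
FROM logs.app_log
WHERE timestamp >= now() - INTERVAL 1 HOUR AND level = 'ERROR';

-- Step 2: fetch only a bounded page of detail rows
SELECT timestamp, level, message
FROM logs.app_log
WHERE timestamp >= now() - INTERVAL 1 HOUR AND level = 'ERROR'
ORDER BY timestamp DESC
LIMIT 100;
```

The count drives the panel's paging UI, while the LIMIT keeps the expensive row‑fetch bounded regardless of how many rows match.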

A migration tool was built to adjust Kibana dashboard configurations for ClickHouse compatibility.

Operational results show that a single ClickHouse cluster handling ~100 TB of compressed logs (≈600 TB uncompressed) uses far less memory and disk than Elasticsearch, with up to 60% disk savings and 4.4‑38× query speedups, while cutting server resource needs roughly in half.

Basic ClickHouse operations are simpler than ES, covering new log ingestion, performance tuning, partition cleanup, monitoring via ClickHouse‑exporter + VictoriaMetrics + Grafana, and data migration using distributed tables or ClickHouse‑copier.
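Partition cleanup, for example, is a single metadata operation when tables are partitioned by day; the table name and dates below are illustrative:

```sql
-- Drop one day's partition (partition id matches a toYYYYMMDD key)
ALTER TABLE logs.app_log DROP PARTITION '20240101';

-- Or let ClickHouse expire old data automatically
ALTER TABLE logs.app_log MODIFY TTL timestamp + INTERVAL 30 DAY;
```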

Common issues such as slow queries, “Too many parts” errors, and startup failures are addressed with configuration adjustments, batch sizing, avoiding distributed‑table writes, and filesystem or table repair procedures.
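"Too many parts" usually means inserts are too small or too frequent; before adjusting batch sizes, the active part count per table can be checked in the system tables:

```sql
-- Tables with the most active data parts; a steadily growing count
-- suggests inserts are outpacing background merges
SELECT table, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY table
ORDER BY active_parts DESC;
```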

In summary, migrating logs from Elasticsearch to ClickHouse reduces infrastructure costs, lowers operational overhead, and dramatically improves query latency, enhancing user experience during incident investigations, while acknowledging that Elasticsearch remains indispensable for certain use cases.

Tags: distributed systems, Performance Optimization, Elasticsearch, ClickHouse, database migration, log-processing
Written by Ctrip Technology

Official Ctrip Technology account, sharing and discussing growth.
