Performance Comparison of Elasticsearch and ClickHouse for Log Search
This article compares Elasticsearch and ClickHouse in architecture, query capabilities, and performance for log search workloads, presenting Docker‑compose setups, data ingestion pipelines, sample queries, and benchmark results that show ClickHouse generally outperforms Elasticsearch, especially in aggregation scenarios.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log analysis.
ClickHouse, developed by Yandex, is a column‑oriented relational database optimized for OLAP workloads and has become popular in the past two years.
The two systems differ in architecture and design as well as in query capabilities. Elasticsearch relies on inverted indexes and Bloom filters, while ClickHouse uses an MPP architecture with columnar storage, log‑structured merge trees, sparse indexes and SIMD optimizations, and coordinates cluster nodes via ZooKeeper.
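To make the contrast between the two index styles concrete, here is a toy sketch in Python. It is illustrative only and does not reflect how either engine is implemented internally; the function names, the sample log lines and the granule size of 2 are assumptions made for the example.

```python
import bisect
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def build_sparse_index(sorted_keys, granule_size):
    """Keep only the first key of each granule; the rows must already be sorted by key."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_for_range(marks, low, high):
    """Return the granule numbers that may hold keys in [low, high].

    The answer is conservative: a listed granule still has to be scanned,
    but every granule that is not listed can be skipped entirely.
    """
    first = max(bisect.bisect_left(marks, low) - 1, 0)
    last = bisect.bisect_right(marks, high) - 1
    return list(range(first, last + 1))


if __name__ == "__main__":
    logs = ["error disk full", "login ok", "disk error"]  # made-up log lines
    print(build_inverted_index(logs)["disk"])

    keys = [10, 20, 30, 40, 50, 60]  # made-up sorted key column
    marks = build_sparse_index(keys, granule_size=2)
    print(marks, granules_for_range(marks, 35, 45))
```

The inverted index answers "which documents contain this token" with a direct lookup, whereas the sparse index only narrows a range scan to a few blocks of sorted rows, which is cheap to keep in memory and suits large sequential reads.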
The test environment consists of four Docker‑compose stacks: an Elasticsearch stack (single‑node Elasticsearch container and a Kibana container), a ClickHouse stack (single‑node ClickHouse container and a TabixUI client), a data‑ingestion stack using Vector.dev, and a test‑control stack using Jupyter notebooks with Python SDKs.
Docker‑compose definitions for the Elasticsearch and ClickHouse stacks are provided:
# Elasticsearch stack
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

# ClickHouse stack
version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

A ClickHouse table for syslog data is created with the MergeTree engine and a TTL of one month:
CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

The Vector pipeline definition (vector.toml) wires the sources through the transforms (clone_message, parser, coercer) to the sinks (out_console, out_clickhouse, out_es).
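The parse-and-coerce step of such a pipeline can be sketched in plain Python. This is not Vector's implementation or configuration. It assumes RFC 5424-style input with at most one structured-data element, and the regular expression, the function name parse_syslog and the mapping of header fields onto the syslog table columns (app name to application, message id to mid, process id to pid, the raw PRI value to priority) are assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Simplified RFC 5424 layout (an assumption, not a full parser):
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
SYSLOG_RE = re.compile(
    r"^<(?P<priority>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<application>\S+) "
    r"(?P<pid>\S+) (?P<mid>\S+) (?P<sd>-|\[[^\]]*\]) ?(?P<message>.*)$"
)


def parse_syslog(raw):
    """Parse one syslog line and coerce fields to the column types of the table."""
    match = SYSLOG_RE.match(raw)
    if match is None:
        raise ValueError("not a recognised syslog line: %r" % raw)
    fields = match.groupdict()
    # fromisoformat() in older Python versions rejects a trailing "Z".
    stamp = datetime.fromisoformat(fields["timestamp"].replace("Z", "+00:00"))
    return {
        "application": fields["application"],
        "hostname": fields["hostname"],
        "message": fields["message"],
        "mid": fields["mid"],
        "pid": fields["pid"],                      # kept as text, like the String column
        "priority": int(fields["priority"]),       # coerced for the Int16 column
        "raw": raw,                                # untouched copy of the input line
        "timestamp": stamp.astimezone(timezone.utc),
        "version": int(fields["version"]),         # coerced for the Int16 column
    }


if __name__ == "__main__":
    line = "<13>1 2021-03-01T12:00:00Z host1 sshd 1234 ID47 - Accepted password"
    for key, value in parse_syslog(line).items():
        print(key, "=", value)
```

Keeping the unparsed line in raw mirrors the idea of cloning the message before parsing, so the original text remains searchable even when a field is extracted incorrectly.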
Several representative queries are executed on both systems, including match_all, term, multi_match, range, exists, regexp, and aggregation queries. Example ES DSL and ClickHouse SQL statements are listed for each case.
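As a rough illustration of how the query types pair up, the sketch below holds one Elasticsearch request body and one approximately equivalent ClickHouse statement per case. These are not the statements used in the benchmark: the field names follow the syslog table, while the literal values, the LIMIT of 10 and the name QUERY_PAIRS are assumptions. The pairs are only approximately equivalent, since Elasticsearch matches analyzed tokens and ClickHouse matches substrings or patterns.

```python
import json

# name -> (Elasticsearch request body, ClickHouse SQL); all values are made up.
QUERY_PAIRS = {
    "match_all": (
        {"query": {"match_all": {}}},
        "SELECT * FROM syslog LIMIT 10",
    ),
    "term": (
        # Assumes hostname is indexed as an exact (keyword-style) value.
        {"query": {"term": {"hostname": "host1"}}},
        "SELECT * FROM syslog WHERE hostname = 'host1' LIMIT 10",
    ),
    "multi_match": (
        {"query": {"multi_match": {
            "query": "error",
            "fields": ["hostname", "application", "message"],
        }}},
        "SELECT * FROM syslog WHERE hostname LIKE '%error%' "
        "OR application LIKE '%error%' OR message LIKE '%error%' LIMIT 10",
    ),
    "range": (
        {"query": {"range": {"version": {"gte": 2}}}},
        "SELECT * FROM syslog WHERE version >= 2 LIMIT 10",
    ),
    "exists": (
        {"query": {"exists": {"field": "application"}}},
        # The String columns are not Nullable, so "missing" means empty here.
        "SELECT * FROM syslog WHERE application != '' LIMIT 10",
    ),
    "regexp": (
        {"query": {"regexp": {"hostname": "host[0-9]+"}}},
        # Elasticsearch anchors the pattern to the whole term; match() does not.
        "SELECT * FROM syslog WHERE match(hostname, '^host[0-9]+$') LIMIT 10",
    ),
    "aggregation": (
        {"size": 0, "aggs": {"by_version": {"terms": {"field": "version"}}}},
        "SELECT version, count() AS hits FROM syslog "
        "GROUP BY version ORDER BY hits DESC",
    ),
}


def render(name):
    """Return both forms of one query as printable text."""
    dsl, sql = QUERY_PAIRS[name]
    return "ES: " + json.dumps(dsl, sort_keys=True) + "\nCH: " + sql


if __name__ == "__main__":
    for name in QUERY_PAIRS:
        print(name)
        print(render(name))
```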
Performance tests run each query ten times via the Python SDK, and the response‑time distributions are visualized. The results indicate that ClickHouse generally outperforms Elasticsearch, especially in aggregation scenarios, while remaining competitive in regex and term queries.
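A timing loop of this kind can be sketched as follows. It is a minimal harness under stated assumptions rather than the notebook code used for the article: benchmark, summarize and fake_query are hypothetical names, and fake_query is a local stand-in so that the example runs without a database.

```python
import statistics
import time


def benchmark(run_query, repeats=10):
    """Call run_query `repeats` times and return each latency in milliseconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few summary statistics."""
    return {
        "min_ms": min(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }


def fake_query():
    """Stand-in workload; a real run would call a database client here."""
    return sum(range(10_000))


if __name__ == "__main__":
    print(summarize(benchmark(fake_query)))
```

In a real comparison, run_query would wrap one call into the Python client of each system with the same logical query, and the full list of latencies, not only the summary, would feed the distribution plots.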
The article concludes that ClickHouse is a highly efficient database for many search‑oriented workloads, which helps explain why several companies have migrated from Elasticsearch to ClickHouse.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.