Performance Comparison of Elasticsearch and ClickHouse for Log Search
This article compares Elasticsearch and ClickHouse as log‑search solutions, detailing their architectures, Docker‑compose deployments, data‑ingestion pipelines with Vector, query syntax differences, and benchmark results that show ClickHouse generally outperforms Elasticsearch in speed and aggregation efficiency.
Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log collection and visualization. ClickHouse, developed by Yandex, is a column‑oriented relational database optimized for OLAP workloads.
Both systems serve large‑scale log search, but ClickHouse has gained traction in recent years, with companies like Ctrip and Kuaishou migrating from Elasticsearch to ClickHouse.
Architecture Comparison
Elasticsearch uses a distributed design with three primary node types: Client Node (API access, no data storage), Data Node (stores and indexes data), and Master Node (cluster coordination). ClickHouse follows an MPP architecture where each node has equal responsibility and stores data in columnar format, leveraging log‑structured merge trees, sparse indexes, and SIMD optimizations. Both support Bloom filters for fast lookups.
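To make the sparse-index idea more concrete, here is a minimal Python sketch. It is not ClickHouse's implementation: it assumes rows already sorted by the key, groups them into fixed-size granules, and stores only the first key of each granule as a "mark". The granule size of 4 and the names `build_sparse_index` and `lookup` are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # illustrative only; real granules hold thousands of rows


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Record only the first key of each granule (one mark per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def lookup(sorted_keys, marks, key, granule_size=GRANULE_SIZE):
    """Return (matching row positions, number of rows scanned) for `key`."""
    # The granule just before the first mark >= key may still hold `key`
    # near its end, so the scan starts one granule earlier.
    first = max(bisect_left(marks, key) - 1, 0)
    # Granules whose mark is greater than `key` cannot contain it.
    last = bisect_right(marks, key)
    start = first * granule_size
    stop = min(last * granule_size, len(sorted_keys))
    hits = [pos for pos in range(start, stop) if sorted_keys[pos] == key]
    return hits, stop - start


keys = [1, 3, 3, 5, 7, 7, 7, 9, 11, 13, 15, 17]
marks = build_sparse_index(keys)
hits, scanned = lookup(keys, marks, 13)
print(f"marks={marks} hits={hits} scanned {scanned} of {len(keys)} rows")
```

The index holds one entry per granule rather than one per row, so it stays small enough to keep in memory, at the cost of scanning a few candidate granules on each lookup.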
Test Setup
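The benchmark consists of four stacks: an Elasticsearch stack (Elasticsearch and Kibana), a ClickHouse stack (ClickHouse and the Tabix web UI), a data‑ingestion stack based on Vector, and a test stack that runs the queries through the Python SDK. The first two are deployed with the following Docker Compose files: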
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
volumes:
  elasticsearch-data:
    driver: local

version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

Data ingestion uses Vector (similar to Fluentd) with a generator source that creates 100 000 syslog messages. The pipeline parses, enriches, and routes data to both Elasticsearch and ClickHouse:
[sources.in]
type = "generator"
format = "syslog"
interval = 0.01
count = 100000
[transforms.clone_message]
type = "add_fields"
inputs = ["in"]
fields.raw = "{{ message }}"
[transforms.parser]
type = "regex_parser"
inputs = ["clone_message"]
field = "message"
patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']
[transforms.coercer]
type = "coercer"
inputs = ["parser"]
types.timestamp = "timestamp"
types.version = "int"
types.priority = "int"
[sinks.out_console]
type = "console"
inputs = ["coercer"]
target = "stdout"
encoding.codec = "json"
[sinks.out_clickhouse]
type = "clickhouse"
inputs = ["coercer"]
host = "http://host.docker.internal:8123"
table = "syslog"
encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
encoding.timestamp_format = "unix"
[sinks.out_es]
type = "elasticsearch"
inputs = ["coercer"]
endpoint = "http://host.docker.internal:9200"
index = "syslog-%F"
compression = "none"
healthcheck.enabled = true

ClickHouse table creation:
CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);

Benchmark queries include match‑all, single‑field match, multi‑field match, term search, range queries, existence checks, regex searches, and aggregations. Each query is executed ten times on both stacks using the Python SDK, and response time distributions are recorded.
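The measurement loop described above can be sketched roughly as follows. This is not the article's actual test code: the `QUERY_PAIRS` table, the helper names, the sample hostname, and the `.keyword` subfield (which assumes Elasticsearch's default dynamic mapping) are illustrative assumptions. The two runner callables stand in for whatever Elasticsearch and ClickHouse Python clients are used.

```python
import statistics
import time

# Hypothetical query pairs: an Elasticsearch query DSL body and a roughly
# equivalent ClickHouse SQL statement against the syslog table.
QUERY_PAIRS = {
    "match_all": (
        {"query": {"match_all": {}}},
        "SELECT * FROM syslog LIMIT 10",
    ),
    "term": (
        # "example.org" is a made-up sample value.
        {"query": {"term": {"hostname.keyword": "example.org"}}},
        "SELECT * FROM syslog WHERE hostname = 'example.org' LIMIT 10",
    ),
    "range": (
        {"query": {"range": {"timestamp": {"gte": "now-1h"}}}},
        "SELECT * FROM syslog WHERE timestamp >= now() - INTERVAL 1 HOUR LIMIT 10",
    ),
    "aggregation": (
        {"size": 0, "aggs": {"by_host": {"terms": {"field": "hostname.keyword"}}}},
        "SELECT hostname, count() AS c FROM syslog "
        "GROUP BY hostname ORDER BY c DESC LIMIT 10",
    ),
}


def time_query(run, repeats=10):
    """Call `run()` `repeats` times and return the latencies in milliseconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a small distribution summary."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


def benchmark(run_es, run_ch, pairs=QUERY_PAIRS, repeats=10):
    """Time every query pair on both engines using the supplied runners."""
    results = {}
    for name, (es_body, ch_sql) in pairs.items():
        results[name] = {
            "elasticsearch": summarize(time_query(lambda: run_es(es_body), repeats)),
            "clickhouse": summarize(time_query(lambda: run_ch(ch_sql), repeats)),
        }
    return results
```

In a real run, `run_es` would wrap a search call on the Elasticsearch client and `run_ch` would wrap an execute call on a ClickHouse driver. Passing them in as plain callables keeps the timing logic independent of any particular SDK version.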
The results show ClickHouse outperforming Elasticsearch in most of the tested query types, most noticeably in aggregations, which is attributed to its columnar storage and efficient compression. Elasticsearch offers richer query capabilities, but the basic queries tested here indicate that ClickHouse is well suited to many log‑search scenarios.
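A toy comparison can illustrate why a columnar layout favors aggregations. The sketch below is plain Python and says nothing about how either engine is implemented; the sample rows and helper names are made up. An aggregation over one field only needs that field's array in the column layout, whereas a row layout stores every field of every record together. A sorted, low‑cardinality column also shrinks considerably under something as simple as run‑length encoding.

```python
from collections import Counter
from itertools import groupby

# Made-up sample records in a row-oriented layout.
rows = [
    {"hostname": "a.example", "pid": "101", "message": "started"},
    {"hostname": "a.example", "pid": "102", "message": "connected"},
    {"hostname": "b.example", "pid": "201", "message": "started"},
    {"hostname": "a.example", "pid": "103", "message": "stopped"},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field."""
    return {field: [record[field] for record in records] for field in records[0]}


def count_by_row_layout(records, field):
    """Aggregate by walking whole records, as a row store would read them."""
    return Counter(record[field] for record in records)


def count_by_column_layout(columns, field):
    """Aggregate by reading a single column and nothing else."""
    return Counter(columns[field])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


columns = to_columns(rows)
print(count_by_column_layout(columns, "hostname"))
print(run_length_encode(sorted(columns["hostname"])))
```

Both counting functions return the same result; the difference a real engine sees is in how many bytes must be read from disk to get there.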
Conclusion
ClickHouse delivers superior performance for the tested workloads and can serve as a compelling alternative to Elasticsearch for log analytics, though Elasticsearch still provides a broader feature set for complex searches.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.