
Performance Comparison of Elasticsearch and ClickHouse for Log Analytics

This article compares Elasticsearch and ClickHouse as log analytics solutions, detailing their architectures, deployment configurations, query capabilities, and performance benchmarks across various query types, and demonstrates that ClickHouse generally outperforms Elasticsearch in speed and aggregation efficiency.

Architecture Digest

Jul 11, 2021


Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for end‑to‑end log analysis. ClickHouse, developed by Yandex, is a column‑oriented relational DBMS designed for OLAP workloads and has become very popular in the big‑data space.

In recent years many companies (e.g., Ctrip, Kuaishou) have begun migrating log workloads from Elasticsearch to ClickHouse due to performance advantages.

Architecture and Design Comparison

Elasticsearch relies on inverted indexes and Bloom filters to solve search problems, using sharding and replica mechanisms for scalability and high availability. Its nodes can be classified as:

Client Node: handles API and data access, does not store data.

Data Node: stores and indexes data.

Master Node: coordinates the cluster, does not store data.
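
The inverted index idea mentioned above can be illustrated with a toy Python sketch. It is not how Lucene is implemented: the simplified tokenizer (lowercase, split on non-word characters) and the names `tokenize`, `build_inverted_index`, and `search` are assumptions made purely for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (simplified analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents containing the exact token."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical log messages, used only to exercise the sketch.
docs = {
    1: "Connection reset by peer",
    2: "Disk quota exceeded on host",
    3: "Connection timed out",
}
index = build_inverted_index(docs)
print(search(index, "connection"))
```

A lookup costs one dictionary access regardless of how many documents exist, which is why this structure suits keyword search.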

ClickHouse follows an MPP architecture with each node responsible for a portion of the data processing. It stores data column‑wise, uses compression, sparse indexes, and SIMD instructions for fast computation, and relies on ZooKeeper for node coordination.
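
As a rough illustration of the sparse index idea, the sketch below keeps one index entry per block ("granule") of a sorted column and uses binary search to decide which granules a range query has to scan. The granule size of 4 and the function names are assumptions chosen for readability; ClickHouse's real on-disk format is considerably more involved.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per granule; tiny on purpose, chosen only for illustration


def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of every granule of the sorted column."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def granules_for_range(marks, lo, hi):
    """Return the half-open range of granule numbers that may hold keys in [lo, hi]."""
    # Start one granule before the first mark >= lo, since lo may sit inside it.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Stop at the first granule whose starting key is already greater than hi.
    last = bisect_right(marks, hi)
    return first, last


keys = list(range(100, 120))              # a sorted column with 20 rows
marks = build_sparse_index(keys)          # 5 index entries instead of 20
first, last = granules_for_range(marks, 105, 109)
scanned = keys[first * GRANULE:last * GRANULE]
print(marks, (first, last), scanned)
```

The index stays small enough to hold in memory, at the cost of scanning a few extra rows at the edges of each range.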

Test Stacks

Four Docker‑Compose stacks were built:

ES stack: single‑node Elasticsearch container and a Kibana container.

ClickHouse stack: single‑node ClickHouse container and TabixUI as a client.

Data import stack: Vector.dev (similar to Fluentd) generates syslog data and feeds both stacks.

Test control stack: Jupyter notebooks using Python SDKs for Elasticsearch and ClickHouse to run queries.

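Deployment files for the ES stack and the ClickHouse stack, in that order. Both database containers are limited to 4 CPUs and 4096M of memory: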

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch

volumes:
  elasticsearch-data:
    driver: local

version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

A ClickHouse table for syslog data was created:

CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY timestamp
TTL timestamp + toIntervalMonth(1);
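
To make the `PARTITION BY toYYYYMMDD(timestamp)` and `ORDER BY timestamp` clauses more concrete, here is a small Python sketch that groups rows into daily partitions and sorts each one by timestamp. It only mimics the layout conceptually; the helper names and sample rows are hypothetical, and TTL handling is left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_yyyymmdd(ts):
    """Mimic toYYYYMMDD: a UTC timestamp becomes an integer such as 20210711."""
    ts = ts.astimezone(timezone.utc)
    return ts.year * 10000 + ts.month * 100 + ts.day


def partition_rows(rows):
    """Group rows by daily partition key and sort each partition by timestamp."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[to_yyyymmdd(row["timestamp"])].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r["timestamp"])
    return dict(partitions)


# Hypothetical rows, used only to exercise the sketch.
rows = [
    {"timestamp": datetime(2021, 7, 11, 23, 59, tzinfo=timezone.utc), "message": "b"},
    {"timestamp": datetime(2021, 7, 11, 8, 0, tzinfo=timezone.utc), "message": "a"},
    {"timestamp": datetime(2021, 7, 12, 0, 1, tzinfo=timezone.utc), "message": "c"},
]
parts = partition_rows(rows)
print(sorted(parts))
```

With daily partitions, a query restricted to one day only needs to touch that day's data, and expired data can be dropped a partition at a time.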

Vector pipeline configuration (vector.toml) defines sources, transforms, and sinks to generate 100k syslog records and send them to both Elasticsearch and ClickHouse:

[sources.in]
  type = "generator"
  format = "syslog"
  interval = 0.01
  count = 100000

[transforms.clone_message]
  type = "add_fields"
  inputs = ["in"]
  fields.raw = "{{ message }}"

[transforms.parser]
  type = "regex_parser"
  inputs = ["clone_message"]
  field = "message"
  patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']

[transforms.coercer]
  type = "coercer"
  inputs = ["parser"]
  types.timestamp = "timestamp"
  types.version = "int"
  types.priority = "int"

[sinks.out_console]
  type = "console"
  inputs = ["coercer"]
  target = "stdout"
  encoding.codec = "json"

[sinks.out_clickhouse]
  type = "clickhouse"
  inputs = ["coercer"]
  host = "http://host.docker.internal:8123"
  table = "syslog"
  encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
  encoding.timestamp_format = "unix"

[sinks.out_es]
  type = "elasticsearch"
  inputs = ["coercer"]
  endpoint = "http://host.docker.internal:9200"
  index = "syslog-%F"
  compression = "none"
  healthcheck.enabled = true
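
To see what the clone, parse, and coerce steps of the pipeline produce, the following Python sketch applies an equivalent regular expression to a single line. It is an approximation rather than Vector's implementation: it assumes each line starts with a literal `<` before the priority (as syslog lines do), it uses Python's `(?P<name>...)` group syntax, and the sample line is a made-up example of the expected format.

```python
import re
from datetime import datetime, timezone

SYSLOG_PATTERN = re.compile(
    r"^<(?P<priority>\d*)>(?P<version>\d) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) "
    r"(?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) "
    r"(?P<mid>ID\d+) - (?P<message>.*)$"
)


def parse_syslog(line):
    """Parse one line; return a dict of typed fields, or None if it does not match."""
    match = SYSLOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["raw"] = line  # like clone_message: keep the unparsed line
    # Like coercer: convert the numeric fields and the timestamp.
    record["priority"] = int(record["priority"]) if record["priority"] else None
    record["version"] = int(record["version"])
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return record


# Hypothetical sample line in the format the pattern expects.
sample = "<65>2 2020-11-05T18:11:43.975Z initiate.com benefit 9041 ID191 - A bug was encountered"
record = parse_syslog(sample)
print(record)
```

Note that `pid` and `mid` stay strings, matching the `String` columns in the ClickHouse table.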

Query Comparison

Both systems were queried using equivalent statements (JSON DSL for ES, SQL for ClickHouse) covering match‑all, single‑field, multi‑field, term, range, exists, regex, and aggregation scenarios. Example queries:

# ES match_all
{ "query": { "match_all": {} } }
# ClickHouse
SELECT * FROM syslog;

# ES term query
{ "query": { "term": { "message": "pretty" } } }
# ClickHouse
SELECT * FROM syslog WHERE lowerUTF8(raw) LIKE '%pretty%';
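
The two "term" statements are close but not strictly identical in meaning: a term query looks up an exact token in the index of an analyzed text field, while `LIKE '%pretty%'` is a substring test. The sketch below approximates both in plain Python to show where they diverge. The tokenizer is a simplification of what Elasticsearch's default analyzer does, and the function names and sample messages are hypothetical.

```python
import re


def term_match(text, term):
    """Approximate an ES term query on an analyzed text field: exact token lookup."""
    return term in re.findall(r"\w+", text.lower())


def like_match(text, needle):
    """Approximate lowerUTF8(col) LIKE '%needle%': case-insensitive substring test."""
    return needle in text.lower()


# Hypothetical messages, used only to exercise the sketch.
messages = [
    "Pretty pretty pretty good",
    "Run the prettyprinter first",
    "Nothing to see here",
]
term_hits = [term_match(m, "pretty") for m in messages]
like_hits = [like_match(m, "pretty") for m in messages]
print(term_hits, like_hits)
```

The second message matches only the substring test, so result counts from the two systems can differ slightly on data containing such words.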

Performance tests were run ten times per query using Python SDKs, and response time distributions were plotted.
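
A minimal sketch of such a timing loop is shown below. It is not the original notebook code: `benchmark`, `summarize`, and `fake_query` are hypothetical names, and `fake_query` stands in for a real call so that the example runs without a database.

```python
import statistics
import time


def benchmark(run_query, repeats=10):
    """Call run_query() `repeats` times and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few summary statistics."""
    return {
        "min_ms": min(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }


def fake_query():
    """Stand-in workload. In the real test this would wrap an SDK call to
    Elasticsearch or ClickHouse (assumption: the exact calls are not shown
    in the article)."""
    sum(range(10_000))


latencies = benchmark(fake_query)
stats = summarize(latencies)
print(stats)
```

Keeping all ten raw latencies, rather than a single average, is what makes it possible to plot the response time distribution per query.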

The results show ClickHouse outperforming Elasticsearch in most query types. The gap is widest for aggregations, where columnar storage provides a clear advantage, and ClickHouse remains competitive even for regex and term queries.

Conclusion

In these single‑node tests on 100k generated syslog records, ClickHouse delivered better performance than Elasticsearch for log analytics workloads, which helps explain why many organizations are moving to ClickHouse for such scenarios. Elasticsearch offers richer query features, but for the basic queries tested here ClickHouse was the more efficient choice.
