Big Data 13 min read

Performance Comparison of Elasticsearch and ClickHouse for Log Search

This article compares Elasticsearch and ClickHouse as log‑search solutions, detailing their architectures, Docker‑compose deployments, data‑ingestion pipelines with Vector, query syntax differences, and benchmark results that show ClickHouse generally outperforms Elasticsearch in speed and aggregation efficiency.

Top Architect

Jul 27, 2023

Performance Comparison of Elasticsearch and ClickHouse for Log Search

Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log collection and visualization. ClickHouse, developed by Yandex, is a column‑oriented relational database optimized for OLAP workloads.

Both systems serve large‑scale log search, but ClickHouse has gained traction in recent years, with companies like Ctrip and Kuaishou migrating from Elasticsearch to ClickHouse.

Architecture Comparison

Elasticsearch uses a distributed design with three primary node types: Client Node (API access, no data storage), Data Node (stores and indexes data), and Master Node (cluster coordination). ClickHouse follows an MPP architecture where each node has equal responsibility and stores data in columnar format, leveraging log‑structured merge trees, sparse indexes, and SIMD optimizations. Both support Bloom filters for fast lookups.

Test Setup

The benchmark consists of four stacks:

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch

volumes:
  elasticsearch-data:
    driver: local

version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M

Data ingestion uses Vector (similar to Fluentd) with a generator source that creates 100 000 syslog messages. The pipeline parses, enriches, and routes data to both Elasticsearch and ClickHouse:

[sources.in]
  type = "generator"
  format = "syslog"
  interval = 0.01
  count = 100000

[transforms.clone_message]
  type = "add_fields"
  inputs = ["in"]
  fields.raw = "{{ message }}"

[transforms.parser]
  type = "regex_parser"
  inputs = ["clone_message"]
  field = "message"
  patterns = ['^<(?P<priority>\d*)>(?P<version>\d) (?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) (?P<hostname>\w+\.\w+) (?P<application>\w+) (?P<pid>\d+) (?P<mid>ID\d+) - (?P<message>.*)$']

[transforms.coercer]
  type = "coercer"
  inputs = ["parser"]
  types.timestamp = "timestamp"
  types.version = "int"
  types.priority = "int"

[sinks.out_console]
  type = "console"
  inputs = ["coercer"]
  target = "stdout"
  encoding.codec = "json"

[sinks.out_clickhouse]
  type = "clickhouse"
  inputs = ["coercer"]
  host = "http://host.docker.internal:8123"
  table = "syslog"
  encoding.only_fields = ["application","hostname","message","mid","pid","priority","raw","timestamp","version"]
  encoding.timestamp_format = "unix"

[sinks.out_es]
  type = "elasticsearch"
  inputs = ["coercer"]
  endpoint = "http://host.docker.internal:9200"
  index = "syslog-%F"
  compression = "none"
  healthcheck.enabled = true

ClickHouse table creation:

CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
    PARTITION BY toYYYYMMDD(timestamp)
    ORDER BY timestamp
    TTL timestamp + toIntervalMonth(1);

Benchmark queries include match‑all, single‑field match, multi‑field match, term search, range queries, existence checks, regex searches, and aggregations. Each query is executed ten times on both stacks using the Python SDK, and response time distributions are recorded.

Results show ClickHouse consistently outperforms Elasticsearch in most query types, especially aggregations, due to its columnar storage and efficient compression. While Elasticsearch offers richer query capabilities, the basic queries tested demonstrate ClickHouse’s suitability for many log‑search scenarios.

Conclusion

ClickHouse delivers superior performance for the tested workloads and can serve as a compelling alternative to Elasticsearch for log analytics, though Elasticsearch still provides a broader feature set for complex searches.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance docker Big Data Elasticsearch ClickHouse Vector log search

Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.