Performance Comparison of Elasticsearch and ClickHouse for Log Search

This article compares Elasticsearch and ClickHouse in architecture, query capabilities, and performance for log search workloads, presenting Docker‑compose setups, data ingestion pipelines, sample queries, and benchmark results that show ClickHouse generally outperforms Elasticsearch, especially in aggregation scenarios.

Architecture Digest

May 3, 2022

Elasticsearch is a real‑time distributed search and analytics engine built on Lucene, often used together with Logstash and Kibana (the ELK stack) for log analysis.

ClickHouse, developed by Yandex, is a column‑oriented relational database optimized for OLAP workloads and has become popular in the past two years.

Both systems are compared in terms of architecture, design and query capabilities. Elasticsearch relies on inverted indexes and Bloom filters. ClickHouse combines an MPP architecture, columnar storage, log-structured merge trees (the MergeTree engine family), sparse primary indexes and SIMD-vectorized execution. In clustered deployments, ZooKeeper coordinates replication.
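A toy model makes the difference between the two indexing approaches concrete. The Python sketch below is a simplification, not how Lucene or ClickHouse actually store data.

- An inverted index maps each token to the IDs of the documents that contain it, so a term lookup is a single dictionary access.
- A sparse index keeps only the first key of each block ("granule") of rows sorted by the primary key. A range query can therefore skip to the few granules that might match, even though there is no per-row entry.

All function names are hypothetical. The granule size of 3 is chosen only for readability; ClickHouse's default index_granularity is 8192 rows.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased whitespace token to the sorted IDs of docs containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def term_query(index, term):
    """Term lookup: one dictionary access, no scan over the documents."""
    return index.get(term.lower(), [])


def build_sparse_index(sorted_keys, granularity):
    """Keep one mark per granule: the first key of every `granularity` rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]


def granules_for_range(marks, lo, hi):
    """Indices of granules that may hold keys in [lo, hi].

    A granule spans from its mark up to the next mark, so the granule just before
    the first mark >= lo may still hold matching keys; the last candidate is the
    granule of the last mark <= hi.
    """
    first = max(bisect_left(marks, lo) - 1, 0)
    last = bisect_right(marks, hi) - 1
    return list(range(first, last + 1))  # empty when hi < marks[0]


if __name__ == "__main__":
    docs = ["disk error on sda", "user login ok", "error disk full"]
    print(term_query(build_inverted_index(docs), "ERROR"))  # [0, 2]
    marks = build_sparse_index(list(range(1, 11)), granularity=3)
    print(marks)                            # [1, 4, 7, 10]
    print(granules_for_range(marks, 5, 6))  # [1]: only rows 4, 5, 6 are read
```

The lower bound is deliberately conservative. Reading one granule too many is cheap, but missing one would drop matching rows.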

The test environment consists of four Docker‑compose stacks: an Elasticsearch stack (single‑node Elasticsearch container and a Kibana container), a ClickHouse stack (single‑node ClickHouse container and a TabixUI client), a data‑ingestion stack using Vector.dev, and a test‑control stack using Jupyter notebooks with Python SDKs.

The Docker Compose definitions for the two stacks follow: first Elasticsearch (with Kibana), then ClickHouse (with TabixUI).

version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.0
    container_name: elasticsearch
    environment:
      - xpack.security.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    cap_add:
      - IPC_LOCK
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:7.4.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch

volumes:
  elasticsearch-data:
    driver: local

ClickHouse stack:

version: "3.7"
services:
  clickhouse:
    container_name: clickhouse
    image: yandex/clickhouse-server
    volumes:
      - ./data/config:/var/lib/clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M

  tabixui:
    container_name: tabixui
    image: spoonest/clickhouse-tabix-web-client
    environment:
      - CH_NAME=dev
      - CH_HOST=127.0.0.1:8123
      - CH_LOGIN=default
    ports:
      - "18080:80"
    depends_on:
      - clickhouse
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          memory: 128M
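The test-control notebooks talk to both stacks over HTTP, so it helps to wait until each service answers before running any benchmark. The following is a minimal standard-library Python sketch. It assumes the ports published above (8123 for ClickHouse, 9200 for Elasticsearch). It relies on ClickHouse's `/ping` endpoint, the same one the compose healthcheck calls, and on Elasticsearch's `/_cluster/health` endpoint. The helper names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def http_get(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return the body as text; raises on connection errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def clickhouse_ready(body: str) -> bool:
    # ClickHouse's HTTP /ping endpoint replies "Ok." once the server is up.
    return body.strip() == "Ok."


def elasticsearch_ready(body: str) -> bool:
    # Accept yellow as well as green: a single node cannot place replica shards.
    return json.loads(body).get("status") in ("green", "yellow")


def wait_until_ready(url, is_ready, fetch=http_get, attempts=30, delay=2.0):
    """Poll `url` until is_ready(body) holds; give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            if is_ready(fetch(url)):
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not listening yet, or an unexpected/partial response
        time.sleep(delay)
    return False
```

Typical calls would be `wait_until_ready("http://localhost:8123/ping", clickhouse_ready)` and `wait_until_ready("http://localhost:9200/_cluster/health", elasticsearch_ready)`. The injectable `fetch` parameter keeps the polling logic testable without a running server.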

The ClickHouse table for syslog data uses the MergeTree engine. It is partitioned by day, sorted by timestamp, and has a TTL of one month:

CREATE TABLE default.syslog(
    application String,
    hostname String,
    message String,
    mid String,
    pid String,
    priority Int16,
    raw String,
    timestamp DateTime('UTC'),
    version Int16
) ENGINE = MergeTree()
    PARTITION BY toYYYYMMDD(timestamp)
    ORDER BY timestamp
    TTL timestamp + toIntervalMonth(1);
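The partition key and TTL clause can be mirrored in a few lines of Python. This is handy for checking which partition a given log line lands in and when it becomes eligible for deletion. It is a sketch under stated assumptions: the function names are hypothetical, and adding one month is assumed to clamp end-of-month dates (January 31 becomes February 28 or 29). In ClickHouse itself, expired rows are removed during background merges rather than at the exact expiry instant.

```python
import calendar
from datetime import datetime, timezone


def partition_id(ts: datetime) -> int:
    """Mirror PARTITION BY toYYYYMMDD(timestamp): one partition per UTC day."""
    utc = ts.astimezone(timezone.utc)  # expects a timezone-aware datetime
    return utc.year * 10000 + utc.month * 100 + utc.day


def add_one_month(ts: datetime) -> datetime:
    """timestamp + toIntervalMonth(1), assuming end-of-month dates are clamped."""
    year, month = (ts.year + 1, 1) if ts.month == 12 else (ts.year, ts.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


def is_expired(ts: datetime, now: datetime) -> bool:
    """True once a row is past its TTL (ClickHouse deletes it at a later merge)."""
    return now >= add_one_month(ts)


if __name__ == "__main__":
    ts = datetime(2022, 1, 31, 8, 30, tzinfo=timezone.utc)
    print(partition_id(ts))   # 20220131
    print(add_one_month(ts))  # 2022-02-28 08:30:00+00:00
```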

The Vector pipeline definition (vector.toml) declares sources, transforms (clone_message, parser, coercer) and sinks (out_console, out_clickhouse, out_es). The last two sinks load the same log stream into ClickHouse and Elasticsearch.
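The parser and coercer stages can be illustrated with a Python stand-in. This is neither Vector's configuration language nor its actual transform implementation. It is a sketch that assumes RFC 5424 input, assumes clone_message keeps the unparsed line in the `raw` column, and uses hypothetical function names. Parsing extracts the string fields with a regular expression. Coercion then casts them to the column types of `default.syslog`: Int16 for priority and version, and a second-precision UTC `DateTime` for timestamp.

```python
import re
from datetime import datetime, timezone

# Hypothetical stand-in for the parser transform. RFC 5424 layout:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
SYSLOG_RE = re.compile(
    r"^<(?P<priority>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<application>\S+) "
    r"(?P<pid>\S+) (?P<mid>\S+) (?:-|(?:\[.*?\])+) ?(?P<message>.*)$"
)


def clone_message(line: str) -> dict:
    # Assumed behaviour of clone_message: keep the untouched line in `raw`.
    return {"raw": line}


def parse(event: dict) -> dict:
    match = SYSLOG_RE.match(event["raw"])
    if match is None:
        raise ValueError("not an RFC 5424 line: %r" % event["raw"])
    return {**event, **match.groupdict()}


def coerce(event: dict) -> dict:
    # Cast strings to the column types of default.syslog:
    # priority/version -> Int16, timestamp -> DateTime('UTC') (whole seconds).
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    return {
        **event,
        "priority": int(event["priority"]),
        "version": int(event["version"]),
        "timestamp": ts.astimezone(timezone.utc).replace(microsecond=0),
    }


def pipeline(line: str) -> dict:
    return coerce(parse(clone_message(line)))
```

The regular expression does not handle escaped `]` characters inside structured-data values. A dedicated syslog parser is the safer choice for production traffic.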

Several representative query types are run on both systems: match_all, term, multi_match, range, exists, regexp and aggregation. For each case, an Elasticsearch DSL request is paired with an equivalent ClickHouse SQL statement.
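The sketch below shows how each query type maps between the two systems. For every case it builds an Elasticsearch DSL body and a roughly equivalent ClickHouse SQL statement. These are illustrative, not the article's exact queries. The sketch assumes Elasticsearch's default dynamic mapping, in which string fields get a `.keyword` sub-field. It uses `hostname`, `message`, `application` and `priority` from the `default.syslog` table as example fields, and the helper names are hypothetical. Some pairs are only approximations, as noted in the comments.

```python
import json

TABLE = "default.syslog"


def sql_str(value: str) -> str:
    """Quote a ClickHouse string literal, backslash-escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_queries(host, text, pattern, min_priority=0, max_priority=3):
    """Return {case: (es_body, clickhouse_sql)} for each query type."""
    h, t = sql_str(host), sql_str(text)
    return {
        "match_all": (
            {"query": {"match_all": {}}},
            f"SELECT * FROM {TABLE} LIMIT 10",  # ES returns 10 hits by default
        ),
        "term": (
            {"query": {"term": {"hostname.keyword": host}}},
            f"SELECT * FROM {TABLE} WHERE hostname = {h} LIMIT 10",
        ),
        "multi_match": (
            {"query": {"multi_match": {"query": text, "fields": ["message", "application"]}}},
            # Approximation: case-insensitive substring match, not analysed full text.
            f"SELECT * FROM {TABLE} WHERE positionCaseInsensitive(message, {t}) > 0"
            f" OR positionCaseInsensitive(application, {t}) > 0 LIMIT 10",
        ),
        "range": (
            {"query": {"range": {"priority": {"gte": min_priority, "lte": max_priority}}}},
            f"SELECT * FROM {TABLE} WHERE priority BETWEEN {int(min_priority)} AND {int(max_priority)} LIMIT 10",
        ),
        "exists": (
            {"query": {"exists": {"field": "application"}}},
            # Approximation: non-Nullable String columns hold '' for missing values.
            f"SELECT * FROM {TABLE} WHERE application != '' LIMIT 10",
        ),
        "regexp": (
            {"query": {"regexp": {"hostname.keyword": pattern}}},
            # Lucene regexps must match the whole term, so anchor the RE2 pattern.
            f"SELECT * FROM {TABLE} WHERE match(hostname, {sql_str('^(?:' + pattern + ')$')}) LIMIT 10",
        ),
        "aggregation": (
            {"size": 0, "aggs": {"by_host": {"terms": {"field": "hostname.keyword", "size": 10}}}},
            f"SELECT hostname, count() AS hits FROM {TABLE} GROUP BY hostname ORDER BY hits DESC LIMIT 10",
        ),
    }


if __name__ == "__main__":
    for case, (es_body, sql) in build_queries("web-01", "failed", "web-0[0-9]").items():
        print(case, json.dumps(es_body), sql, sep="\n  ")
```

Query values are inlined here for readability. Real code should prefer the client library's parameter binding over hand-built SQL strings. Also note that Lucene's regular-expression syntax and ClickHouse's RE2 dialect do not accept exactly the same patterns.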

The performance test runs each query ten times through the Python SDKs and plots the distribution of response times. The results indicate that ClickHouse generally outperforms Elasticsearch, most clearly in aggregation queries, while remaining competitive in regexp and term queries.
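A timing harness of this kind can be sketched with the standard library alone. The helper names are hypothetical. `run_query` stands for whatever client call issues one query, for example an Elasticsearch search or a ClickHouse `execute` through the respective Python client. Timing is measured on the client, so it includes network and serialization overhead. Elasticsearch also reports a server-side `took` value in each search response, which is not directly comparable with client-side latency.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], repeats: int = 10,
              clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Call run_query `repeats` times; return each latency in milliseconds."""
    latencies = []
    for _ in range(repeats):
        start = clock()
        run_query()  # the result is ignored; only elapsed time matters here
        latencies.append((clock() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarize one query's latency distribution for plotting or tabulating."""
    if not latencies_ms:
        raise ValueError("no measurements")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    p90_rank = (9 * n + 9) // 10  # nearest-rank 90th percentile: ceil(0.9 * n)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "p90": ordered[p90_rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real client call.
    print(summarize(time_runs(lambda: sum(range(100_000)))))
```

With only ten runs per query, a cold-cache first run can noticeably skew the mean. Reporting the median and the spread alongside it gives a fairer picture.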

The article concludes that ClickHouse is a highly efficient database for many search-oriented workloads, which helps explain why several companies have migrated from Elasticsearch to ClickHouse.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
