Elasticsearch vs ClickHouse: Performance, Cost, and Deployment Guide
This article compares Elasticsearch and ClickHouse in terms of write throughput, query speed, and server cost, then provides a step‑by‑step deployment guide for a private data pipeline using Zookeeper, Kafka, FileBeat, and ClickHouse, along with common issues and their solutions.
The author explains the need for a private data‑analysis platform to address security and cost concerns of SaaS services, and chooses ClickHouse as a more efficient alternative to Elasticsearch for log storage and analysis.
Elasticsearch vs ClickHouse
ClickHouse offers significantly higher write throughput (50‑200 MB/s per server, over 600 k records/s, >5× ES) and fewer write rejections, resulting in more reliable ingestion.
Query performance is also superior; with data in the page cache, ClickHouse can scan at 2‑30 GB/s, 5‑30× faster than Elasticsearch.
Because ClickHouse compresses data more aggressively (1/3 – 1/30 of ES disk usage) it reduces storage, I/O, and CPU consumption, cutting server costs roughly in half.
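To make the storage claim concrete, here is a back-of-envelope sketch; the daily log volume, the retention window, and the 1/5 compression ratio are illustrative assumptions, not figures from the comparison above.

```python
# Back-of-envelope: how a compression-ratio advantage translates into disk
# footprint. All inputs below are assumptions chosen for illustration.
raw_bytes_per_day = 1_000 * 1024**3   # assume 1,000 GiB of raw logs per day
es_expansion = 1.0                    # assume ES stores roughly 1x the raw size
ch_ratio = 1 / 5                      # mid-range of the quoted 1/3 - 1/30 of ES usage
retention_days = 30

es_disk = raw_bytes_per_day * es_expansion * retention_days
ch_disk = raw_bytes_per_day * ch_ratio * retention_days

print(f"ES disk over {retention_days} days: {es_disk / 1024**4:.1f} TiB")
print(f"CH disk over {retention_days} days: {ch_disk / 1024**4:.1f} TiB")
print(f"Storage savings factor: {es_disk / ch_disk:.0f}x")
```

Smaller on-disk data also means less I/O per query, which is where much of the server-cost reduction comes from.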
Cost Analysis
A cost comparison on Alibaba Cloud (Aliyun) without discounts shows ClickHouse to be markedly cheaper than Elasticsearch for the same workload.
Environment Deployment
1. Zookeeper Cluster Deployment
yum install java-1.8.0-openjdk-devel.x86_64
# configure environment variables in /etc/profile
# update the system time
yum install ntpdate
ntpdate asia.pool.ntp.org
mkdir -p /usr/zookeeper/data /usr/zookeeper/logs
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH
cd $ZOOKEEPER_HOME/conf
vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
# create myid on each node
echo "1" > /usr/zookeeper/data/myid
# on other nodes use 2 and 3 respectively
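The server.N entries in zoo.cfg and the myid files are two halves of one mapping: each node's myid must equal the N of its own server.N line. A small sketch of that relationship (hostnames taken from the config above):

```python
# Derive the myid a node should write to dataDir/myid from the
# server.N lines in zoo.cfg: myid == the N of that host's entry.
zoo_cfg_servers = [
    "server.1=zk1:2888:3888",
    "server.2=zk2:2888:3888",
    "server.3=zk3:2888:3888",
]

def myid_for(hostname: str) -> int:
    """Return the myid the given host should write to dataDir/myid."""
    for line in zoo_cfg_servers:
        key, value = line.split("=", 1)
        n = int(key.split(".")[1])   # the N in server.N
        host = value.split(":")[0]   # hostname before the quorum ports
        if host == hostname:
            return n
    raise KeyError(hostname)

print(myid_for("zk2"))  # -> 2
```

A mismatched myid is one of the most common reasons a ZooKeeper node fails to join the quorum.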
cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start

2. Kafka Cluster Deployment
mkdir -p /usr/kafka
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
# broker configuration (example)
broker.id=1
listeners=PLAINTEXT://ip:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0
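One thing worth checking in a config like the one above: transaction.state.log.min.isr=3 equals the replication factor of 3, so the transaction state log cannot tolerate even a single broker outage (min.isr of rf − 1 would). A sanity-check sketch, with the parsing kept deliberately ad hoc:

```python
# Sanity-check replication-related settings from a server.properties excerpt.
config_text = """
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
"""

props = dict(line.split("=", 1) for line in config_text.strip().splitlines())

rf = int(props["transaction.state.log.replication.factor"])
min_isr = int(props["transaction.state.log.min.isr"])

# With min.isr == rf, losing one broker blocks transactional writes.
tolerated_outages = rf - min_isr
print(f"broker outages tolerated by the transaction log: {tolerated_outages}")
```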
nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh /usr/kafka/kafka_2.12-3.2.0/config/server.properties > /usr/kafka/logs/kafka.log 2>&1 &
/usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-stop.sh
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data

3. FileBeat Deployment
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# create elastic.repo in /etc/yum.repos.d/
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
yum install filebeat
systemctl enable filebeat
chkconfig --add filebeat
# filebeat.yml (excerpt)
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json:
    keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  compression: gzip
processors:
  - drop_fields:
      fields: ["input", "agent", "ecs", "log", "metadata", "timestamp"]
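The drop_fields processor above simply removes the listed top-level keys from every event before it is shipped to Kafka. Its effect can be sketched as follows (the sample event is made up):

```python
import json

# Fields the FileBeat processor above is configured to drop.
DROP_FIELDS = ["input", "agent", "ecs", "log", "metadata", "timestamp"]

def apply_drop_fields(event: dict) -> dict:
    """Return a copy of the event without the dropped top-level fields."""
    return {k: v for k, v in event.items() if k not in DROP_FIELDS}

# Hypothetical event as FileBeat might see it after JSON decoding.
raw_event = {
    "log_uuid": "abc-123",
    "event_name": "page_view",
    "agent": {"type": "filebeat", "version": "8.x"},
    "timestamp": "2022-07-01T00:00:00Z",
}

shipped = apply_drop_fields(raw_event)
print(json.dumps(shipped, sort_keys=True))
# -> {"event_name": "page_view", "log_uuid": "abc-123"}
```

Dropping FileBeat's metadata fields keeps the Kafka messages aligned with the ClickHouse table columns defined later.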
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /usr/filebeat/filebeat.log 2>&1 &

4. ClickHouse Deployment
# Verify SSE4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
mkdir -p /data/clickhouse
# add host entries for clickhouse nodes
10.190.85.92 bigdata-clickhouse-01
10.190.85.93 bigdata-clickhouse-02
# performance tuning
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | tee /proc/sys/vm/overcommit_memory
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
# install ClickHouse repository
yum install yum-utils
rpm --import <ClickHouse GPG key URL>
yum-config-manager --add-repo <ClickHouse repo URL>
yum -y install clickhouse-server clickhouse-client
# adjust config.xml (e.g. set the logger level to information)
# logs: /var/log/clickhouse-server/clickhouse-server.log
# error log: /var/log/clickhouse-server/clickhouse-server.err.log
clickhouse-server --version
clickhouse-client --password
sudo clickhouse stop
sudo clickhouse start

ClickHouse Table Creation and Common Issues
Creating a Kafka engine table:
CREATE TABLE default.kafka_clickhouse_inner_log ON CLUSTER clickhouse_cluster (
log_uuid String,
date_partition UInt32,
event_name String,
activity_name String,
activity_type String,
activity_id UInt16
) ENGINE = Kafka SETTINGS
kafka_broker_list = 'kafka1:9092,kafka2:9092,kafka3:9092',
kafka_topic_list = 'xxx_data_clickhouse',
kafka_group_name = 'clickhouse_xxx',
kafka_format = 'JSONEachRow',
kafka_row_delimiter = '\n',
kafka_num_consumers = 1;

Issue 1 – direct SELECT on a Kafka engine table is blocked:
Direct select is not allowed. To enable use setting stream_like_engine_allow_direct_select. (QUERY_NOT_ALLOWED)

Solution:
clickhouse-client --stream_like_engine_allow_direct_select 1 --password xxxxx

Issue 2 – creating a local replicated table fails because the macro 'shard' is missing.
DB::Exception: No macro 'shard' in config while processing substitutions ... (SYNTAX_ERROR)

Solution: define distinct <shard> and <replica> macros on each node, e.g.:
<macros>
<shard>01</shard>
<replica>example01-01-1</replica>
</macros>

Issue 3 – replica already exists error when creating the replicated table.
DB::Exception: Replica ... already exists. (REPLICA_IS_ALREADY_EXIST)

Solution: delete the corresponding Zookeeper node and recreate the table.
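Stepping back to the Kafka engine table created earlier: kafka_format = 'JSONEachRow' means every Kafka message carries one JSON object per line whose keys match the table's columns. A minimal producer-side sketch (the field values are invented):

```python
import json

# One row for the Kafka engine table defined earlier, serialized as
# JSONEachRow: a single JSON object followed by the row delimiter '\n'.
row = {
    "log_uuid": "f3b2c1d0-0000-0000-0000-000000000001",
    "date_partition": 20220701,
    "event_name": "lottery_draw",
    "activity_name": "summer_promo",
    "activity_type": "lottery",
    "activity_id": 42,
}
message = json.dumps(row, separators=(",", ":")) + "\n"
print(message, end="")
```

Messages whose keys do not match the column names are skipped or rejected depending on the table's error-handling settings, so keeping FileBeat's output schema aligned with the table matters.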
Creating a distributed (cluster) table:
CREATE TABLE default.bi_inner_log_all ON CLUSTER clickhouse_cluster AS default.bi_inner_log_local
ENGINE = Distributed(clickhouse_cluster, default, bi_inner_log_local, xxHash32(log_uuid));

Issue 4 – authentication failure when querying the distributed table.
Authentication failed: password is incorrect or there is no user with such name. (AUTHENTICATION_FAILED)

Solution: add correct <user> and <password> entries for each replica in the cluster configuration:
<remote_servers>
<clickhouse_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>ip1</host>
<port>9000</port>
<user>default</user>
<password>xxxx</password>
</replica>
</shard>
...
</clickhouse_cluster>
</remote_servers>

Creating a materialized view to sync Kafka data to the distributed table:
CREATE MATERIALIZED VIEW default.view_bi_inner_log ON CLUSTER clickhouse_cluster TO default.bi_inner_log_all AS
SELECT
log_uuid,
date_partition,
event_name,
activity_name,
activity_type,
activity_id
FROM default.kafka_clickhouse_inner_log;

After resolving these issues, the end‑to‑end data flow from log generation through FileBeat, Kafka, and ClickHouse works smoothly, providing a cost‑effective, high‑performance analytics pipeline.
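As a footnote on the Distributed table created earlier: the engine routes each row to a shard by xxHash32(log_uuid) modulo the total shard weight. The routing idea can be sketched with a stand-in hash, since xxHash32 is not in Python's standard library (zlib.crc32 below is only for illustration, and the two-shard count is an assumption):

```python
import zlib

NUM_SHARDS = 2  # assume a two-shard clickhouse_cluster with equal weights

def shard_for(log_uuid: str) -> int:
    """Pick a shard the way Distributed(..., xxHash32(log_uuid)) does,
    with crc32 standing in for xxHash32."""
    return zlib.crc32(log_uuid.encode("utf-8")) % NUM_SHARDS

# The same key always lands on the same shard, so all rows for one
# log_uuid live together and never split across shards.
assert shard_for("abc-123") == shard_for("abc-123")
print(shard_for("abc-123"), shard_for("def-456"))
```

Choosing a high-cardinality key like log_uuid keeps the shards evenly loaded; a skewed key (e.g. activity_id) would concentrate writes on a few shards.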
Conclusion: Refer to official documentation or --help for troubleshooting; systematic problem solving leads to deeper expertise.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn