Elasticsearch vs ClickHouse: Performance, Cost, and Deployment Guide for Enterprise Data Analytics
This article compares Elasticsearch and ClickHouse in terms of write throughput, query speed, and server cost, then provides a detailed deployment guide for Zookeeper, Kafka, Filebeat, and ClickHouse clusters, including troubleshooting steps and cost analysis for enterprise data analytics.
Elasticsearch vs ClickHouse
ClickHouse is a high‑performance column‑oriented distributed DBMS. Tests show it has several advantages over Elasticsearch:
Write throughput: a single server can ingest 50-200 MB/s (over 600k records/s), more than five times Elasticsearch's rate, with fewer write rejections and lower latency.
Query speed: when data is in the page cache, queries scan at 2-30 GB/s; even on uncached data, ClickHouse outperforms Elasticsearch by 5-30×.
Server cost: ClickHouse compresses data to 1/3-1/30 the size of the same data in Elasticsearch, cutting disk usage, I/O, and memory/CPU consumption, potentially halving server costs.
Cost Analysis
Cost estimation based on Alibaba Cloud (no discounts) shows significant savings when using ClickHouse instead of Elasticsearch.
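To make the savings concrete, here is a back-of-envelope sketch of the storage-driven cost gap. All numbers below are illustrative assumptions, not the article's actual Alibaba Cloud quote; the compression factor of 10 falls inside the 1/3-1/30 range cited above.

```python
# Back-of-envelope storage-cost comparison (illustrative numbers only).
raw_tb = 100                 # assumed raw log volume, in TB
price_per_tb_month = 50      # assumed disk price, currency units per TB-month
ch_compression = 10          # assume ClickHouse stores ~1/10 of raw (article: 1/3-1/30)

es_storage_cost = raw_tb * price_per_tb_month        # assume ES stores ~1x raw
ch_storage_cost = es_storage_cost // ch_compression  # same data in ClickHouse
print(es_storage_cost, ch_storage_cost)              # 5000 500
```

Storage is only one axis; the I/O and CPU savings from reading less data compound on top of this.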
Environment Deployment
1. Zookeeper cluster deployment
yum install java-1.8.0-openjdk-devel.x86_64
# configure JAVA_HOME and PATH in /etc/profile, then run: source /etc/profile
yum install ntpdate
ntpdate asia.pool.ntp.org
mkdir -p /usr/zookeeper
mkdir -p /usr/zookeeper/data
mkdir -p /usr/zookeeper/logs
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -zvxf apache-zookeeper-3.7.1-bin.tar.gz -C /usr/zookeeper
export ZOOKEEPER_HOME=/usr/zookeeper/apache-zookeeper-3.7.1-bin
export PATH=$ZOOKEEPER_HOME/bin:$PATH
cd $ZOOKEEPER_HOME/conf
vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/zookeeper/data
dataLogDir=/usr/zookeeper/logs
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
echo "1" > /usr/zookeeper/data/myid   # on zk1
echo "2" > /usr/zookeeper/data/myid   # on zk2
echo "3" > /usr/zookeeper/data/myid   # on zk3
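Each node's myid must match its server.N entry in zoo.cfg. As a sketch, assuming the hosts are literally named zk1, zk2, zk3 as in zoo.cfg above, the id can be derived from the hostname instead of hard-coding it (on a real node you would use $(hostname -s) rather than the stand-in variable here):

```shell
# Derive a node's myid from its hostname (sketch; hosts assumed to be zk1..zk3).
host="zk2"            # stand-in for $(hostname -s) on a real node
myid="${host#zk}"     # strip the "zk" prefix, leaving the numeric id
echo "$myid"
```

A mismatched myid is one of the most common reasons a ZooKeeper ensemble fails to form a quorum.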
cd $ZOOKEEPER_HOME/bin
sh zkServer.sh start
2. Kafka cluster deployment
mkdir -p /usr/kafka
chmod 777 -R /usr/kafka
wget --no-check-certificate https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/3.2.0/kafka_2.12-3.2.0.tgz
tar -zvxf kafka_2.12-3.2.0.tgz -C /usr/kafka
# broker configuration (example for broker.id=1)
broker.id=1
listeners=PLAINTEXT://ip:9092
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dir=/usr/kafka/logs
num.partitions=5
num.recovery.threads.per.data.dir=3
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
zookeeper.connection.timeout.ms=30000
group.initial.rebalance.delay.ms=0
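One caveat about the configuration above: transaction.state.log.min.isr=3 combined with a replication factor of 3 leaves no failure headroom for the transaction state log, as this small sketch shows. Many deployments instead set min.isr to one less than the replication factor (here, 2) so a single broker can be lost without blocking writes.

```python
# How many broker failures the transaction log tolerates under the settings above.
replication_factor = 3   # transaction.state.log.replication.factor
min_isr = 3              # transaction.state.log.min.isr
tolerated_broker_failures = replication_factor - min_isr
print(tolerated_broker_failures)  # 0: any single broker outage blocks txn-log writes
```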
nohup /usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-start.sh /usr/kafka/kafka_2.12-3.2.0/config/server.properties > /usr/kafka/logs/kafka.log 2>&1 &
/usr/kafka/kafka_2.12-3.2.0/bin/kafka-server-stop.sh
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server ip:9092
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server ip:9092 --topic test --from-beginning
$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server ip:9092 --replication-factor 2 --partitions 3 --topic xxx_data
3. Filebeat deployment
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# create elastic.repo in /etc/yum.repos.d/
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
yum install filebeat
systemctl enable filebeat
chkconfig --add filebeat
# filebeat.yml (key points)
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /root/logs/xxx/inner/*.log
  json.keys_under_root: true
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  topic: 'xxx_data_clickhouse'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
processors:
- drop_fields:
    fields: ["input", "agent", "ecs", "log", "metadata", "timestamp"]
    ignore_missing: false
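The drop_fields processor strips Filebeat's own metadata so only the business fields reach Kafka. The following sketch illustrates its effect on a single event; this is not Filebeat's actual implementation, and the sample event is made up:

```python
# Illustrative sketch of the drop_fields processor's effect on one event.
DROPPED = {"input", "agent", "ecs", "log", "metadata", "timestamp"}

def drop_fields(event: dict) -> dict:
    """Remove the top-level keys listed in DROPPED, keeping business fields."""
    return {k: v for k, v in event.items() if k not in DROPPED}

event = {"event_name": "login", "log_uuid": "abc-123", "agent": {"type": "filebeat"}}
print(drop_fields(event))  # {'event_name': 'login', 'log_uuid': 'abc-123'}
```

Dropping these fields keeps the Kafka payload small and the ClickHouse table schema free of columns that carry no analytical value.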
nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > /usr/filebeat/filebeat.log 2>&1 &
4. ClickHouse deployment
# Check SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
mkdir -p /data/clickhouse
# add host entries for clickhouse nodes
10.190.85.92 bigdata-clickhouse-01
10.190.85.93 bigdata-clickhouse-02
# Performance tuning
echo 'performance' | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | tee /proc/sys/vm/overcommit_memory
echo 'never' | tee /sys/kernel/mm/transparent_hugepage/enabled
# Install ClickHouse repository
yum install yum-utils
rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
yum -y install clickhouse-server clickhouse-client
# Adjust config.xml: lower the logger verbosity from the default trace, e.g.
<logger>
    <level>information</level>
</logger>
# Start/stop commands
sudo clickhouse stop
sudo clickhouse start
sudo clickhouse restart
Common ClickHouse Issues and Solutions
1) Kafka engine table cannot be queried directly
Direct select is not allowed. To enable use setting stream_like_engine_allow_direct_select. (QUERY_NOT_ALLOWED)
Solution: launch the client with the flag --stream_like_engine_allow_direct_select 1, e.g. clickhouse-client --stream_like_engine_allow_direct_select 1 --password xxxx
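Alternatively, the setting can be enabled for a single session instead of via a client flag; a sketch, assuming a ClickHouse version recent enough to ship this setting:

```sql
-- Enable direct reads from Kafka engine tables for this session only.
SET stream_like_engine_allow_direct_select = 1;
SELECT count() FROM default.kafka_clickhouse_inner_log;
```

Note that a direct select consumes messages from the topic (it advances the consumer group's offsets), so it is best reserved for debugging.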
2) Local replicated table macro error
DB::Exception: No macro 'shard' in config while processing substitutions ...
Solution: define distinct shard values for each node in the <macros> section of the ClickHouse config.
<macros>
<shard>01</shard>
<replica>example01-01-1</replica>
</macros>
3) Replica already exists error
DB::Exception: Replica ... already exists. (REPLICA_IS_ALREADY_EXIST)
Solution: delete the stale ZooKeeper node for that replica and recreate the table.
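On recent ClickHouse releases the stale replica metadata can also be removed from SQL rather than by editing ZooKeeper by hand. A sketch; the ZooKeeper path below is hypothetical and must match the path your ReplicatedMergeTree table was created with, and the replica name follows the <macros> example above:

```sql
-- Drop the stale replica's ZooKeeper metadata (path and name are illustrative).
SYSTEM DROP REPLICA 'example01-01-1' FROM ZKPATH '/clickhouse/tables/01/bi_inner_log';
```

After the metadata is gone, recreating the table with the same replica name succeeds.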
4) Distributed table authentication failure
Authentication failed: password is incorrect or there is no user with such name. (AUTHENTICATION_FAILED)
Solution: correct the user and password entries in the <remote_servers> configuration.
<remote_servers>
    <clickhouse_cluster>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>ip1</host>
                <port>9000</port>
                <user>default</user>
                <password>xxxx</password>
            </replica>
        </shard>
        ...
    </clickhouse_cluster>
</remote_servers>
Materialized View for Kafka → ClickHouse
CREATE MATERIALIZED VIEW default.view_bi_inner_log ON CLUSTER clickhouse_cluster TO default.bi_inner_log_all AS
SELECT
log_uuid,
date_partition,
event_name,
activity_name,
credits_bring,
activity_type,
activity_id
FROM default.kafka_clickhouse_inner_log;
After resolving the above issues, the data pipeline from Filebeat through Kafka into ClickHouse works end-to-end.
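The materialized view reads from default.kafka_clickhouse_inner_log, a Kafka engine table the article does not show. A sketch of what it might look like: the column list matches the view above, but the column types, consumer group name, and message format here are assumptions, while the broker list and topic follow the Filebeat configuration:

```sql
-- Sketch of the Kafka engine source table (types and settings are assumed).
CREATE TABLE default.kafka_clickhouse_inner_log ON CLUSTER clickhouse_cluster
(
    log_uuid       String,
    date_partition UInt32,
    event_name     String,
    activity_name  String,
    credits_bring  Int32,
    activity_type  String,
    activity_id    String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:9092,kafka2:9092,kafka3:9092',
         kafka_topic_list  = 'xxx_data_clickhouse',
         kafka_group_name  = 'clickhouse_consumer',
         kafka_format      = 'JSONEachRow';
```

The Kafka table acts purely as a consumer; the materialized view continuously moves each consumed batch into the distributed table default.bi_inner_log_all.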
In summary, consult the official documentation or each tool's --help output when troubleshooting; working through issues systematically is what turns these components into a robust enterprise data platform.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn