ClickHouse vs Elasticsearch: Choosing a Database for Log Storage and Analysis
The article compares ClickHouse, Elasticsearch, and MySQL for log storage and analysis, highlighting ClickHouse's active development, SQL support, JSON handling, performance, storage efficiency, integration options, and practical query examples to help developers decide the best backend for observability pipelines.
In 2018 the author wrote a popular article about ClickHouse, and the project has continued to evolve rapidly—over 800 PRs were merged in a single month, while Elasticsearch saw a comparable 1,076 PRs, underscoring the intense development pace of both systems.
ClickHouse is used in the ApiRoad.net API marketplace to store and analyze HTTP request/response logs, providing the observability needed for API services; the same team also employs the ELK stack (Elasticsearch, Logstash, Filebeat, Kibana) and MySQL for other logging tasks.
The core argument for preferring ClickHouse over Elasticsearch or MySQL is its native SQL support, first‑class JSON and array handling, and a flexible schema that can be strict when required; this combination offers fast analytical queries, better storage efficiency (5‑6× higher than Elasticsearch), and a familiar SQL syntax.
Practical examples include using arrayJoin , arrayMap , and range functions for date‑gap filling, as well as a full query that generates a virtual table with lambda and joins it to daily‑grouped logs:
SELECT a.timePeriod as t, b.count as c
FROM (
WITH (SELECT toUInt32(dateDiff('day', [START_DATE], [END_DATE]))) AS diffInTimeUnits
SELECT arrayJoin(arrayMap(x -> (toDate(addDays([START_DATE], x))), range(0, diffInTimeUnits+1))) AS timePeriod
) a
LEFT JOIN (
SELECT count(*) AS count, toDate(toStartOfDay(started_at)) AS timePeriod
FROM logs
WHERE [CONDITIONS]
GROUP BY toStartOfDay(started_at)
) b ON a.timePeriod = b.timePeriod;ClickHouse also provides powerful statistical functions such as quantileTiming for median and percentile calculations, enabling concise queries like:
SELECT count(*) AS cnt,
quantileTiming(0.5)(duration) AS duration_median,
quantileTiming(0.9)(duration) AS duration_90th,
quantileTiming(0.99)(duration) AS duration_99th
FROM logs
WHERE status = 404;Integration with MySQL is straightforward via external dictionaries, binlog replication, MySQL table functions, and the MySQL protocol, allowing minimal data duplication while leveraging ClickHouse's columnar performance for large‑scale log analysis.
Newer ClickHouse features simplify operations: loading CSV files from S3 as tables, and using ALTER TABLE ... DELETE WHERE to remove outdated rows, though deletions remain costly in columnar stores.
In summary, Elasticsearch excels at full‑text search and large‑node clusters, but ClickHouse offers a simpler, SQL‑friendly, high‑performance solution for time‑series and log analytics, with lower memory requirements and strong integration capabilities, making it a compelling choice for many logging scenarios.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.