ClickHouse 2024 Core New Features and Product Development Directions

This article introduces ClickHouse, an open‑source columnar OLAP database, outlines its architecture, advantages, self‑hosted and cloud deployment models, highlights recent product features such as async inserts, JSON support, Parquet acceleration, query caching, and summarizes a Q&A covering semi‑structured data, MPP, virtual columns, and future roadmap.

DataFunTalk

Jan 3, 2024

ClickHouse is an open‑source, column‑oriented OLAP database management system. It was released as open source in 2016 and is now used by thousands of developers worldwide.

Its core, written in C++ and assembly, delivers high performance for aggregation, sorting, indexing and background merges, earning it a reputation as “the fastest database”.

The system supports a distributed multi‑master architecture with both vertical and horizontal scalability and high availability, and it can run on a single node or across multiple data centers.

Key advantages include lightweight, fast queries that are reported to outperform competitors such as Pinot, Redshift, Elasticsearch and Druid; advanced data compression that improves storage efficiency by tens to hundreds of times; and easy‑to‑use table functions together with standard SQL support for diverse data sources.
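Much of the compression advantage of a columnar store comes from keeping each column's values together, so that similar values sit next to each other and column‑specific encodings (such as delta encoding for sorted timestamps) can be applied. The following is a toy sketch of that idea only: it uses Python's `zlib` rather than ClickHouse's own codecs, the data is synthetic, and all function names are hypothetical. It does not attempt to reproduce the ratios quoted above.

```python
import random
import zlib


def make_rows(n, seed=0):
    """Generate n synthetic (timestamp, country, status) event rows."""
    rng = random.Random(seed)
    countries = ["US", "DE", "CN", "BR"]
    statuses = [200, 404, 500]
    return [
        (1_700_000_000 + i, rng.choice(countries), rng.choice(statuses))
        for i in range(n)
    ]


def row_layout_size(rows):
    """Compressed size when whole rows are stored one after another."""
    payload = "\n".join(f"{ts},{country},{status}" for ts, country, status in rows)
    return len(zlib.compress(payload.encode()))


def column_layout_size(rows):
    """Compressed size when each column is stored and compressed separately."""
    timestamps, countries, statuses = zip(*rows)
    # Delta-encode the sorted timestamps: keep the first value, then differences.
    deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    columns = [deltas, countries, statuses]
    return sum(
        len(zlib.compress("\n".join(map(str, column)).encode()))
        for column in columns
    )


rows = make_rows(10_000)
row_bytes = row_layout_size(rows)
column_bytes = column_layout_size(rows)
print(f"row layout: {row_bytes} bytes, column layout: {column_bytes} bytes")
```

On this synthetic data the per‑column layout compresses to fewer bytes than the row layout, mainly because the timestamp deltas become a long run of identical values. Real ratios depend heavily on the data and on the codecs chosen.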

Two deployment modes are available: ClickHouse self‑hosted (on‑premises) and ClickHouse Cloud, the latter using object storage and compute‑storage separation to enable automatic scaling and serverless operation.

Recent open‑source features (2023‑2024 releases) include asynchronous inserts and insert deduplication, JSON support with schema inference, integration with data lake formats (Hudi, Delta Lake, Iceberg), faster Parquet reads, MySQL wire protocol compatibility, SSH key authentication, lightweight deletes, query caching, and improved join performance. Transaction support is in development, along with experimental inverted indices, vector search and other analytics extensions.
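Two of these features, asynchronous inserts and the query cache, are switched on per request through settings. As a rough sketch, the Python snippet below prepares requests for ClickHouse's HTTP interface with the settings `async_insert`, `wait_for_async_insert`, `use_query_cache` and `query_cache_ttl` passed as URL parameters. The server address (the default HTTP port 8123 on localhost), the `events` table and the helper names are assumptions made for illustration; nothing is sent over the network.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: a ClickHouse server exposing its HTTP interface on the default port.
BASE_URL = "http://localhost:8123/"


def build_url(query, settings=None):
    """Return an HTTP-interface URL carrying the query and per-request settings."""
    params = {"query": query, **(settings or {})}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def async_insert_request(table, rows):
    """Prepare (but do not send) a JSONEachRow insert with async inserts enabled."""
    url = build_url(
        f"INSERT INTO {table} FORMAT JSONEachRow",
        # The server buffers the rows and flushes them in batches;
        # wait_for_async_insert=1 delays the response until the flush.
        {"async_insert": 1, "wait_for_async_insert": 1},
    )
    body = "\n".join(json.dumps(row) for row in rows).encode()
    return urllib.request.Request(url, data=body, method="POST")


def cached_select_url(query, ttl_seconds=60):
    """Build a SELECT URL that asks the server to use its query cache."""
    return build_url(query, {"use_query_cache": 1, "query_cache_ttl": ttl_seconds})


# Hypothetical table and rows, for illustration only.
request = async_insert_request("events", [{"id": 1}, {"id": 2}])
print(request.get_method(), request.full_url)
print(cached_select_url("SELECT count() FROM events"))
```

Against a running server, the prepared insert could be sent with `urllib.request.urlopen(request)`. Passing settings per request keeps the choice local to the workloads that benefit from it, such as many small inserts or repeated dashboard queries.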

The Q&A session covered semi‑structured data handling, MPP improvements, virtual and materialized columns, external joins, replacing Spark with ClickHouse, inverted index capabilities, cluster scale, and future performance goals.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
