
Enhancing ClickHouse with UniqueMergeTree: Real‑time Upsert and Delete Capabilities in ByteHouse

This article explains how ByteHouse extends ClickHouse by introducing the UniqueMergeTree engine, which adds full upsert and delete support, improves query performance, and enables real‑time audience selection scenarios while preserving scalability and reliability.

DataFunTalk

ClickHouse offers strong analytical performance but lacks complete upsert and delete operations, has weak multi‑table join capabilities, reduced availability at large cluster scales, and no resource isolation, prompting ByteHouse to enhance its functionality.

In e‑commerce real‑time audience selection, the traditional offline T+1 tagging approach hampers user experience, motivating the need for a real‑time tagging pipeline.

The native ClickHouse ReplacingMergeTree engine uses a merge‑on‑read design similar to an LSM‑tree. Duplicates are removed only when data parts are merged, so reads over not‑yet‑merged data suffer severe performance degradation. The engine also does not support row deletion, and working around these limits increases system complexity.

ByteHouse addresses these issues with a new table engine called UniqueMergeTree, which provides true real‑time update and delete capabilities.

Two technical approaches are compared: (1) the native ReplacingMergeTree’s merge‑on‑read strategy, which writes efficiently but reads slowly, and (2) the read‑optimized Mark‑Delete + Insert method used by UniqueMergeTree, where updates are performed by marking rows for deletion and inserting new versions.
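The trade‑off between the two approaches can be sketched with a toy in‑memory model. This is only an illustration, not ByteHouse or ClickHouse code: the class names `MergeOnReadTable` and `MarkDeleteInsertTable`, the list‑based storage, and the in‑memory key index are all assumptions made for the example.

```python
class MergeOnReadTable:
    """Toy merge-on-read table: writes append, reads deduplicate."""

    def __init__(self):
        self.rows = []  # every version ever written, in arrival order

    def write(self, key, value):
        # Cheap write path: no lookup, just append the new version.
        self.rows.append((key, value))

    def read(self):
        # The cost is paid here: scan all versions and keep the last per key.
        latest = {}
        for key, value in self.rows:
            latest[key] = value
        return latest


class MarkDeleteInsertTable:
    """Toy mark-delete + insert table: writes resolve duplicates, reads filter."""

    def __init__(self):
        self.rows = []     # stored rows, in arrival order
        self.deleted = []  # deleted[i] is True once rows[i] has been superseded
        self.index = {}    # unique key -> position of the live row (hypothetical)

    def write(self, key, value):
        # Extra work on the write path: look up and mark the previous version.
        old = self.index.get(key)
        if old is not None:
            self.deleted[old] = True
        self.index[key] = len(self.rows)
        self.rows.append((key, value))
        self.deleted.append(False)

    def read(self):
        # No deduplication at read time: skip rows marked as deleted.
        return {k: v for (k, v), dead in zip(self.rows, self.deleted) if not dead}
```

Both toy tables return the same result for the same writes. The difference is where the work happens: the first pays at read time, the second pays a key lookup on every write.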

Creating a UniqueMergeTree table uses the same parameters as ReplacingMergeTree but adds a UNIQUE KEY clause to define one or more unique columns; upsert semantics insert new keys and update existing ones, while a virtual delete flag marks rows for removal.
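The upsert and delete‑flag semantics described above can be illustrated with a small Python sketch. It models a table as a dictionary keyed by the unique key. The function name `apply_batch`, the column names, and the field name `delete_flag` are hypothetical and chosen only for this example; they are not the engine's actual identifiers.

```python
def apply_batch(table, batch, unique_key=("user_id",)):
    """Apply rows to `table` (a dict keyed by unique-key tuple) with upsert semantics.

    A row whose hypothetical `delete_flag` field is truthy removes the key;
    any other row inserts a new key or replaces the existing row.
    """
    for row in batch:
        key = tuple(row[col] for col in unique_key)
        if row.get("delete_flag"):
            table.pop(key, None)  # delete by unique key; a missing key is a no-op
        else:
            # Store the row without the flag, which is not a real data column.
            table[key] = {c: v for c, v in row.items() if c != "delete_flag"}
    return table


table = {}
apply_batch(table, [
    {"user_id": 1, "tag": "new_customer"},               # insert
    {"user_id": 2, "tag": "vip"},                        # insert
    {"user_id": 1, "tag": "returning"},                  # update of key 1
    {"user_id": 2, "tag": "vip", "delete_flag": 1},      # delete of key 2
])
```

After this batch, `table` holds a single row for `user_id` 1 with the tag `returning`.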

Key highlights of UniqueMergeTree include upsert‑based writes, optional partial‑column updates, real‑time row deletion via the unique key, customizable version fields to resolve stale‑data conflicts, and multi‑replica deployment for high availability.
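A minimal sketch of how a version field and partial‑column updates might interact is shown below. It is a toy model under stated assumptions: the function name `upsert_with_version`, the column names, and the rule that a write with an equal version wins over the stored row are all assumptions for illustration, not documented engine behavior.

```python
def upsert_with_version(table, row, key_col="id", version_col="event_time", partial=False):
    """Upsert `row` into `table` (dict keyed by unique key), honoring a version column.

    Returns True if the row was applied, False if it was rejected as stale.
    Assumption: a write is stale only when its version is strictly lower
    than the stored one, so equal versions let the newer write win.
    """
    key = row[key_col]
    current = table.get(key)
    if current is not None and row[version_col] < current[version_col]:
        return False  # late-arriving old data must not overwrite newer data
    if partial and current is not None:
        merged = dict(current)
        merged.update(row)  # only the columns present in `row` change
        table[key] = merged
    else:
        table[key] = dict(row)  # full-row replacement
    return True


table = {}
upsert_with_version(table, {"id": 1, "event_time": 10, "tag": "a", "city": "x"})
stale = upsert_with_version(table, {"id": 1, "event_time": 5, "tag": "b", "city": "y"})
upsert_with_version(table, {"id": 1, "event_time": 12, "tag": "c"}, partial=True)
```

In this run the second write is rejected because its version is older, and the third write changes only `tag` and `event_time` while `city` keeps its earlier value.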

Performance tests show that although write throughput drops slightly, query latency improves by orders of magnitude compared to ReplacingMergeTree and is comparable to the standard MergeTree engine.

With UniqueMergeTree, ByteHouse meets the real‑time audience selection requirements without changing the existing architecture, achieving over 10k rows/s write throughput per shard and query performance nearly identical to native ClickHouse.

Additional features include multi‑field and expression‑based unique keys, partition‑level and table‑level uniqueness modes, custom version fields, and asynchronous master‑slave replication for data reliability.
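The difference between partition‑level and table‑level uniqueness, and the idea of a multi‑field or expression‑based key, can be sketched as follows. This is a toy model only: the helper names `dedup_scope` and `load`, the sample columns, and the use of Python callables to stand in for key and partition expressions are assumptions made for the example.

```python
def dedup_scope(row, unique_key, partition_by, table_level):
    """Return the value that must be unique for this row.

    Table-level mode: the key alone. Partition-level mode: the key is only
    unique within its partition, so the partition value is part of the scope.
    """
    key = unique_key(row)
    return key if table_level else (partition_by(row), key)


def load(rows, unique_key, partition_by, table_level):
    """Keep the last written row per uniqueness scope."""
    live = {}
    for row in rows:
        live[dedup_scope(row, unique_key, partition_by, table_level)] = row
    return list(live.values())


rows = [
    {"user_id": "U1", "date": "2023-01-01", "tag": "vip"},
    {"user_id": "u1", "date": "2023-01-02", "tag": "vip"},
]
# Multi-field key with an expression: case-insensitive user id plus tag.
key = lambda r: (r["user_id"].lower(), r["tag"])
part = lambda r: r["date"]

per_partition = load(rows, key, part, table_level=False)
per_table = load(rows, key, part, table_level=True)
```

With partition‑level uniqueness both rows survive because they fall in different date partitions; with table‑level uniqueness only the later row remains.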

ByteHouse is now publicly available on Volcano Engine, offering free trial versions and a technical community for users to share experiences and seek expert assistance.

Tags: real-time analytics, ClickHouse, database engine, Upsert, ByteHouse, UniqueMergeTree