Design and Optimization of CynosDB for PostgreSQL: One‑Primary Multi‑Read Architecture

The article details CynosDB’s cloud‑native PostgreSQL design, which uses compute‑storage separation, log sinking with asynchronous replay, and multi‑version reads to enable a one‑primary multi‑read architecture that delivers elastic scaling, reduced I/O, stateless instances, efficient resource utilization, and rapid failover through parallel log‑based recovery.

Tencent Cloud Developer
This article, originally presented at the Tencent Cloud community’s “CynosDB for PostgreSQL One‑Primary Multi‑Read Architecture” session, provides a comprehensive technical overview of CynosDB, a cloud‑native database compatible with PostgreSQL.

Author: Sun Xu, Tencent database development engineer with 9 years of experience in database kernel development, familiar with PostgreSQL, Greenplum, PGXC, Teradata, and related storage mechanisms.

The main goal is to share the design and optimization of CynosDB for PostgreSQL’s one‑primary multi‑read architecture.

Why CynosDB? Traditional cloud databases suffer from low resource utilization, limited scalability, difficulty in resource planning, and cumbersome backup procedures.

CynosDB addresses these issues through:

Compute‑storage separation for elastic compute scheduling.

Log sinking and asynchronous replay, eliminating dirty‑page flush logic to reduce network overhead.

Shared distributed storage that can scale elastically.

Continuous background log backup, abstracting backup strategy and storage planning from users.

Core Architecture – CynosDB is a cloud‑native database whose key designs are log sinking, asynchronous replay, and multi‑version page reads. A PostgreSQL instance (the “frontend”) communicates with the distributed block storage CynosStore via the CynosStore Client. Logs are continuously sent to object storage to enable point‑in‑time recovery.

Log Sinking & Asynchronous Replay – All DB‑generated logs are sent to a dedicated log space in CynosStore. A dedicated thread periodically merges these logs into data pages. The log recycle mechanism efficiently reuses the fixed‑size log space, ensuring continuous writes.
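The recycle mechanism described above can be sketched as a ring over a fixed‑size log space: new records consume space at the tail, and once a background merge has applied records into data pages, their slots are freed for reuse. This is a minimal Python sketch under assumed semantics; the class and field names are illustrative, not the real CynosStore structures.

```python
class LogSpace:
    """Fixed-size log space reused like a ring buffer (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0      # recycle point: oldest LSN still needed
        self.tail = 0      # next LSN to write
        self.records = {}  # lsn -> record, bounded by capacity

    def append(self, record):
        if self.tail - self.head >= self.capacity:
            raise RuntimeError("log space full; merge must advance recycle point")
        lsn = self.tail
        self.records[lsn] = record
        self.tail += 1
        return lsn

    def recycle(self, merged_upto_lsn):
        # Once logs below this LSN are merged into data pages,
        # their space can be reused for new writes.
        while self.head < merged_upto_lsn:
            self.records.pop(self.head, None)
            self.head += 1

space = LogSpace(capacity=4)
for rec in ["w1", "w2", "w3", "w4"]:
    space.append(rec)          # log space is now full
space.recycle(merged_upto_lsn=2)  # background merge applied LSNs 0 and 1
space.append("w5")             # reuses the freed space, writes continue
```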

The design introduces three key concepts:

MTR (Minimal Transaction Record): an atomic modification to storage that must be applied wholly or not at all.

CPL (Consistency Point LSN): the LSN of the last log record of an MTR.

VDL (Volume Durable LSN): the maximum CPL persisted in storage.
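Putting the three definitions together: once storage acknowledges persistence up to some LSN, the VDL is the largest CPL at or below that point, and any MTR whose CPL lies beyond the VDL is not yet visible. A minimal sketch of that rule (the function name and inputs are hypothetical):

```python
def compute_vdl(cpls, durable_lsn):
    """Return the VDL: the largest CPL whose LSN has been persisted.

    cpls        -- LSNs of the last log record of each MTR
    durable_lsn -- highest LSN acknowledged as persisted by storage
    MTRs whose CPL exceeds the VDL are treated as incomplete.
    """
    persisted = [lsn for lsn in cpls if lsn <= durable_lsn]
    return max(persisted) if persisted else None

# Three MTRs end at LSNs 10, 25, and 40; storage has persisted through 30,
# so the MTR ending at 40 is still incomplete and the VDL is 25.
vdl = compute_vdl([10, 25, 40], durable_lsn=30)
```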

Logs are written to an in‑memory buffer; a background thread asynchronously flushes them to storage, allowing parallel insertion rather than serial processing.
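The buffer‑plus‑background‑flush pattern above can be sketched with a queue and a consumer thread: producers append without waiting for storage I/O, and a single flusher persists records in order. This is an assumed simplification; the real implementation is in C inside CynosStore, and `flushed` merely stands in for durable storage.

```python
import queue
import threading

class LogWriter:
    """Writers append to an in-memory buffer; a background thread
    flushes records to storage asynchronously (illustrative sketch)."""

    def __init__(self):
        self.buffer = queue.Queue()
        self.flushed = []   # stands in for persistent storage
        self._stop = object()
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def append(self, record):
        # Producers enqueue in parallel without blocking on storage I/O.
        self.buffer.put(record)

    def _flush_loop(self):
        while True:
            record = self.buffer.get()
            if record is self._stop:
                break
            self.flushed.append(record)  # persist the record here

    def shutdown(self):
        self.buffer.put(self._stop)
        self._thread.join()

writer = LogWriter()
for i in range(3):
    writer.append(f"log-{i}")
writer.shutdown()
```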

Multi‑Version Read – Reads are performed synchronously. For example, to read version 30 of page A when the buffer holds only version 20, the CynosStore Client merges the pending log records (e.g., at LSN 25 and 30) onto the base page and returns the result. A Read Point LSN (RPL) defines the target version; the smallest RPL among active readers (MRPL) becomes the log recycle point.
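The merge step in the example above amounts to: take the newest cached version of the page, then apply exactly the log records whose LSN falls after the cached version and at or before the requested RPL. A minimal sketch, where a "page" is modeled as a dict and log records as key/value changes (all names are illustrative):

```python
def read_page_version(base_page, base_version, logs, target_version):
    """Return the page at target_version by merging log records with
    LSN in (base_version, target_version] onto the base page."""
    page = dict(base_page)  # never mutate the cached base version
    for lsn, (key, value) in sorted(logs.items()):
        if base_version < lsn <= target_version:
            page[key] = value
    return page

# Buffer holds version 20 of page A; log records exist at LSN 25 and 30.
base_page = {"row1": "a"}
logs = {25: ("row1", "b"), 30: ("row2", "c")}
page_v30 = read_page_version(base_page, 20, logs, target_version=30)
```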

Motivation for Multi‑Read – Traditional PostgreSQL primary‑standby replication incurs extra log I/O, requires additional storage for logs, and suffers from read‑only conflicts that can block log recovery. These issues lead to slow failover and high storage costs.

One‑Primary Multi‑Read Architecture – Replicas store no data or logs; the primary streams logs directly to replica memory, where they are recovered in parallel. Multi‑version buffers reduce lock contention, and configuration files reside in CynosStore and COS, making the instance stateless and instantly startable anywhere.

Log Recovery Details – Logs are sent to the replica (RO), placed in a hash table, and merged in parallel by background threads. Pages not present in memory are skipped, accelerating recovery.
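The recovery flow just described, grouping incoming logs by page in a hash table, applying groups in parallel, and skipping pages not resident in the buffer, can be sketched as follows. This is a hypothetical Python model under stated assumptions, not the CynosStore thread implementation; skipped pages would get their logs merged on demand at the next read.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def recover(log_stream, buffer_pool):
    """Hash incoming logs by page id, then apply each group in parallel.

    Pages absent from buffer_pool are skipped, which accelerates
    recovery: their logs are merged lazily on the next read instead.
    """
    by_page = defaultdict(list)
    for page_id, lsn, change in log_stream:
        by_page[page_id].append((lsn, change))

    def apply(page_id, changes):
        page = buffer_pool.get(page_id)
        if page is None:
            return  # not in memory: skip
        for _lsn, (key, value) in sorted(changes):
            page[key] = value

    with ThreadPoolExecutor() as pool:
        for page_id, changes in by_page.items():
            pool.submit(apply, page_id, changes)

buffer_pool = {"A": {"row1": "a"}}          # page B is not in memory
log_stream = [("A", 25, ("row1", "b")),
              ("B", 26, ("row9", "z"))]     # B's log is skipped
recover(log_stream, buffer_pool)
```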

The DB layer provides callbacks for the CynosStore client: fetch buffer page, allocate buffer slot, and update runtime status. The client supplies the current RPL and multi‑version read interfaces, enabling seamless buffer recovery.
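The callback contract between the two layers might look like the sketch below: the DB registers three hooks with the client, and the client drives them while applying logs and tracking the RPL it exposes back to the DB. All class, method, and parameter names here are hypothetical, chosen only to mirror the three callbacks named in the text.

```python
from typing import Callable, Optional

class DBCallbacks:
    """Hooks the DB layer registers with the CynosStore client
    (names are illustrative, not the real API)."""

    def __init__(self,
                 fetch_buffer_page: Callable[[str], Optional[dict]],
                 alloc_buffer_slot: Callable[[str], dict],
                 update_status: Callable[[str], None]):
        self.fetch_buffer_page = fetch_buffer_page
        self.alloc_buffer_slot = alloc_buffer_slot
        self.update_status = update_status

class CynosStoreClient:
    """Applies logs via DB callbacks and exposes the current RPL."""

    def __init__(self, callbacks: DBCallbacks):
        self.cb = callbacks
        self.current_rpl = 0  # supplied to the DB for versioned reads

    def apply_log(self, page_id, lsn, change):
        page = self.cb.fetch_buffer_page(page_id)
        if page is None:
            page = self.cb.alloc_buffer_slot(page_id)
        key, value = change
        page[key] = value
        self.current_rpl = max(self.current_rpl, lsn)
        self.cb.update_status(f"applied LSN {lsn} to page {page_id}")

# Wiring: the buffer pool lives in the DB layer; the client only calls back.
pages = {}
callbacks = DBCallbacks(
    fetch_buffer_page=pages.get,
    alloc_buffer_slot=lambda pid: pages.setdefault(pid, {}),
    update_status=lambda msg: None)
client = CynosStoreClient(callbacks)
client.apply_log("A", 25, ("row1", "b"))
```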

Compared with traditional PostgreSQL standby recovery (walreceiver → log file → startup process), CynosDB performs recovery entirely within the CynosStore process, using a log‑receive thread, a log‑append thread, and parallel apply threads that invoke DB callbacks.

During a read, the backend can fetch a page (e.g., version 15) while the recovery thread continues merging logs for other pages, avoiding blocking.

The CynosStore client returns the nearest version page (e.g., version 20) and merges only the required logs to reach the target version, dramatically reducing latency. If the base page is absent, it is fetched directly from storage.

Replica accesses pages at MTR granularity to preserve index structure integrity, using the highest LSN (e.g., J4) as the access point.

Failover in traditional PostgreSQL requires extensive log recovery before the standby can become primary. CynosDB’s asynchronous replay and log‑based recovery allow much faster promotion, as only the VDL needs to be read, not all logs merged.

Replica startup does not need to recover to a MinRecoveryPoint; providing a valid RPL is sufficient for consistent reads, because the system relies on log‑based retrieval rather than full page reconstruction.

Q&A

Q: Is continuous backup performed at fixed times or only when data changes? A: Backup is log‑based and runs continuously.

Q: If logs are recycled, can they still be retrieved? A: Logs are recycled only after they have been backed up.

For more technical deep‑dives, see the recommended readings linked at the end of the original article.

Tags: cloud-native, Database Architecture, Replication, PostgreSQL, CynosDB, log-replay
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
