Red Hat Ceph Storage Architecture Overview and Key Components
This article provides a comprehensive English translation of the Red Hat Ceph Storage Architecture Guide, covering Ceph's distributed object storage concepts, cluster architecture, storage pools, CRUSH algorithm, replication and erasure‑coding I/O, internal operations, high‑availability mechanisms, client interfaces, and encryption considerations for cloud environments.
Chapter 1 Overview
Red Hat Ceph is a distributed object storage system designed for high performance, reliability, and scalability, supporting modern and legacy object interfaces such as native language bindings (C/C++, Java, Python), RESTful S3/Swift APIs, block device, and file system interfaces.
Ceph can scale to thousands of clients and petabyte‑to‑exabyte data volumes, making it suitable for cloud platforms such as Red Hat Enterprise Linux OpenStack Platform (RHEL OSP).
The core of any Ceph deployment is the Ceph storage cluster, which consists of two main daemon types:
Ceph OSD daemon: stores data and performs replication, rebalancing, recovery, health monitoring, and status reporting.
Ceph Monitor daemon: maintains the master copy of the cluster map.
Clients interact with the cluster using a configuration file (or cluster name and monitor addresses), a pool name, and user credentials.
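As a sketch of what a client needs, a minimal client-side ceph.conf names the monitors to contact and the keyring holding the user's credentials (the addresses and path below are placeholders, not values from this guide):

```ini
[global]
# Monitor addresses the client contacts to fetch the cluster map (placeholders)
mon_host = 192.168.1.10,192.168.1.11,192.168.1.12

[client.admin]
# CephX keyring holding this user's secret key
keyring = /etc/ceph/ceph.client.admin.keyring
```

With this file, pool name, and user name in hand, a client can reach any monitor and bootstrap its view of the cluster.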
Chapter 2 Storage Cluster Architecture
The cluster provides data storage, replication, health monitoring, dynamic rebalancing, integrity checking, and failure recovery while remaining transparent to client interfaces.
2.1 Storage Pools
Pools logically partition data. Each pool defines its type (replicated or erasure‑coded), its number of placement groups (PGs), its CRUSH rule set, and its durability settings.
2.2 Authentication (CephX)
CephX uses shared secret keys for mutual authentication between clients and monitors, providing protection against man‑in‑the‑middle attacks.
2.3 Placement Groups (PGs)
Objects are hashed into PGs, which are then mapped to an acting set of OSDs via the CRUSH algorithm, enabling dynamic data placement and rebalancing.
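The object‑to‑PG step is a stable hash modulo the pool's PG count. A minimal sketch (Ceph itself uses its own rjenkins-based hash; md5 here is just a deterministic stand‑in):

```python
import hashlib

def object_to_pg(object_id: str, pg_num: int) -> int:
    """Map an object name to a placement group index.

    Conceptual sketch: hash the name, then take it modulo the pool's
    PG count. The hash function is a stand-in for Ceph's own hashing.
    """
    h = int(hashlib.md5(object_id.encode()).hexdigest(), 16)
    return h % pg_num
```

Because the mapping depends only on the object name and the PG count, every client computes the same PG for the same object without consulting a central lookup table.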
2.4 CRUSH Algorithm
CRUSH maps objects to PGs and PGs to OSDs based on a hierarchical bucket topology, supporting fault‑domain and performance‑domain isolation.
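The key property of CRUSH is that it is a deterministic pseudo‑random function: any client can compute a PG's OSDs from the cluster map alone. The sketch below uses rendezvous (highest‑random‑weight) hashing as a simplified stand‑in; real CRUSH instead walks a hierarchical bucket topology (straw2 buckets) so that replicas land in separate failure domains:

```python
import hashlib

def crush_like_map(pg_id: int, osds: list, size: int) -> list:
    """Deterministically pick `size` distinct OSDs for a PG.

    Simplified stand-in for CRUSH using rendezvous hashing: each
    (pg, osd) pair gets a pseudo-random weight, and the top `size`
    OSDs win. Real CRUSH additionally honors the bucket hierarchy
    and placement rules.
    """
    def weight(osd):
        return hashlib.sha256(f"{pg_id}:{osd}".encode()).hexdigest()
    return sorted(osds, key=weight)[:size]
```

The same inputs always yield the same acting set, which is what lets Ceph avoid a central metadata server for data placement.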
2.5 I/O Operations
Clients obtain the latest cluster map from a monitor, then use the object ID, pool name, and CRUSH to compute the target PG and primary OSD. The primary OSD coordinates writes to replica OSDs.
2.5.1 Replicated I/O
The primary OSD writes the object to replica OSDs; once acknowledgments are received, the client is notified of success.
2.5.2 Erasure‑Coding I/O
Data is split into K data blocks and M coding blocks; the primary OSD distributes these blocks across OSDs, enabling reconstruction when up to M OSDs fail.
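The smallest instructive case is K=2, M=1, where the single coding chunk is just the XOR of the two data chunks. Real Ceph erasure‑code profiles typically use jerasure Reed‑Solomon codes, which support M > 1; this XOR sketch only shows the recover‑from‑parity idea:

```python
def ec_encode(data: bytes):
    """Split data into K=2 equal chunks plus one XOR parity chunk (M=1).
    Toy illustration; Ceph's EC plugins generalize this to arbitrary K+M."""
    assert len(data) % 2 == 0
    half = len(data) // 2
    chunks = [data[:half], data[half:]]
    parity = bytes(a ^ b for a, b in zip(chunks[0], chunks[1]))
    return chunks, parity

def ec_reconstruct(surviving_chunk: bytes, parity: bytes) -> bytes:
    """Recover the one lost data chunk: lost = survivor XOR parity."""
    return bytes(a ^ b for a, b in zip(surviving_chunk, parity))
```

With K=2, M=1 the object survives the loss of any one of the three chunks, at a storage overhead of 1.5x instead of the 3x of triple replication.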
2.6 Internal Self‑Management Operations
Heartbeat – OSDs check each other's liveness and report up/down status to the monitors.
Peering – OSDs that share a PG automatically agree on the state of the objects in it.
Rebalancing – New OSDs cause a small fraction of data to migrate based on CRUSH.
Scrubbing – Periodic verification and cleaning of object metadata and data.
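The rebalancing behavior above can be simulated: because placement is a deterministic function of the OSD set, adding one OSD to a cluster of nine moves only roughly one tenth of the objects, and every moved object lands on the newcomer. The sketch below reuses rendezvous hashing as a stand‑in for CRUSH:

```python
import hashlib

def place(oid: str, osds: list) -> str:
    """Rendezvous-hash an object to a single OSD (stand-in for CRUSH)."""
    return max(osds, key=lambda o: hashlib.sha256(f"{oid}:{o}".encode()).hexdigest())

osds = [f"osd.{i}" for i in range(9)]
objects = [f"obj-{i}" for i in range(2000)]

before = {o: place(o, osds) for o in objects}
after = {o: place(o, osds + ["osd.9"]) for o in objects}

# Roughly 1/10 of the objects migrate, and only onto the new OSD;
# nothing shuffles between the pre-existing OSDs.
moved = sum(before[o] != after[o] for o in objects)
```

This minimal‑movement property is what keeps expansion traffic proportional to the added capacity rather than to the cluster size.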
2.7 High Availability
Data Replication – Default three‑copy replication; writes require at least two clean copies.
Mon Cluster – Multiple monitors provide quorum and avoid single‑point failure.
CephX – Provides secure, key‑based authentication without a single monitor bottleneck.
Chapter 3 Client Architecture
Ceph offers block devices (RBD), object gateway (RGW), and CephFS, all built on the RADOS protocol.
3.1 Native Protocol and librados
Provides direct, parallel object access with operations such as pool management, snapshots, read/write, XATTR and key/value handling, and compound operations.
3.2 Object Watch/Notify
Clients can register persistent watches on objects and receive notifications from the primary OSD.
3.3 Exclusive Locks
Allows a single client to obtain an exclusive lock on an RBD image, preventing concurrent writes.
3.4 Object Map Index
Tracks existence of RADOS objects in client memory to avoid unnecessary OSD queries for non‑existent objects, improving operations such as resize, export, copy, flatten, delete, and read.
3.5 Data Striping
Striping splits data across multiple objects to improve throughput; parameters include object size, stripe width, and stripe count. Ceph’s CRUSH algorithm then maps striped objects to PGs and OSDs.
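Given a stripe unit, a stripe count, and an object size, the target object and intra‑object offset for any logical byte offset are pure arithmetic. A sketch of that mapping (assuming, as Ceph requires, that the object size is a multiple of the stripe unit):

```python
def stripe_map(offset: int, stripe_unit: int, stripe_count: int, object_size: int):
    """Map a logical byte offset to (object index, offset within object)
    under RADOS-style striping."""
    stripes_per_object = object_size // stripe_unit
    block = offset // stripe_unit            # which stripe unit overall
    stripe = block // stripe_count           # which stripe (row)
    shard = block % stripe_count             # which object column in the set
    object_set = stripe // stripes_per_object
    obj = object_set * stripe_count + shard
    off_in_obj = (stripe % stripes_per_object) * stripe_unit + offset % stripe_unit
    return obj, off_in_obj
```

For example, with a 4‑byte stripe unit, stripe count 2, and 8‑byte objects, consecutive stripe units alternate between two objects until both are full, then a new object set begins; writes to consecutive offsets thus hit different OSDs in parallel.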
Chapter 4 Encryption
LUKS can encrypt the OSD data and journal partitions. ceph-ansible invokes ceph-disk to create the encrypted partitions plus a lockbox partition, and the LUKS keys are stored in the monitors' key/value store. At service start‑up, OSDs retrieve their keys and decrypt their partitions automatically.
For detailed steps, refer to the Red Hat Ceph Storage Installation Guide.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.