
Comprehensive Overview of Ceph Architecture and Data Storage Mechanisms

This article provides a systematic summary of Ceph's architecture, including its core services, RADOS layer, libraries, high‑level storage interfaces, client interactions, data placement algorithms, and deployment considerations, while also comparing its ecosystem and enterprise features.

Architects' Tech Alliance

Previously the author shared articles about the open‑source storage system Ceph, but the architectural description was fragmented; this piece consolidates a systematic overview of Ceph’s architecture and compares its ecosystem, features, and enterprise‑grade storage options.

Ceph is written in C++ and released under the LGPL. Sage Weil, author of the original Ceph paper, founded Inktank in 2012 to lead development and community maintenance. Red Hat acquired Inktank in 2014 and released Inktank Ceph Enterprise (ICE), focused on cloud, backup, and archival workloads and supporting object and block storage. Since then, the open‑source community version and the Red Hat enterprise version have co‑existed.

Ceph Basic Service Architecture

The basic service architecture consists of Object Storage Devices (OSDs), Monitors, and Metadata Servers (MDS). Built on this foundation, Ceph offers the native librados object library, the librbd block‑storage library, the librgw S3/Swift‑compatible object library, and the libcephfs filesystem library.

OSDs store data and handle replication, recovery, rebalancing, and heartbeat monitoring of other OSDs, reporting status to Ceph Monitors.

Monitors track cluster health, including their own status, OSD status, Placement Group (PG) status, and CRUSH map status. They also record every historical version change to determine which version the cluster should follow.

MDS manages filesystem metadata and provides a standard POSIX interface; it is required only for CephFS, not for object or block services.

Deploying a Ceph cluster requires at least one Monitor and two OSD roles; a Metadata Server is needed only when CephFS is used. These logical roles can run on the same physical machine, but because the default CRUSH rule places each replica on a separate host, a minimal cluster needs at least as many OSD servers as the pool's replication factor — two under the early default of two replicas (current releases default to three).

Ceph Software Architecture

1. Underlying Storage System RADOS

RADOS (Reliable Autonomic Distributed Object Store) is a complete object‑storage system built from the OSD and Monitor services described above. All user data ultimately resides in RADOS, and Ceph's reliability, scalability, performance, and automation all stem from this layer.

Physically, RADOS consists of many storage nodes, each with its own CPU, memory, disks, and network, running an OS and filesystem.

2. librados Library

This layer abstracts and encapsulates RADOS, exposing an API to applications that access the cluster natively. Although the block and file services are ultimately built on top of it, the API that librados itself provides covers object‑storage functionality only.

RADOS offers C and C++ librados APIs. Applications link against the local librados library, which communicates with the RADOS cluster via sockets.

3. High‑Level Storage Interfaces

Built on librados, this layer includes the RADOS Gateway (RGW), RBD (RADOS Block Device), and CephFS.

RGW provides S3‑ and Swift‑compatible RESTful APIs for object‑storage applications. It is higher‑level but less feature‑rich than librados, so developers should choose based on requirements.

RBD offers a standard block‑device interface, commonly used to create volumes for virtual machines; Red Hat has integrated the RBD driver into KVM/QEMU for better VM performance.

CephFS is a POSIX‑compatible distributed filesystem; at the time of writing it was still under development and not recommended for production by the Ceph project.

4. Server‑Client Layer

This layer describes how various client applications interact with Ceph interfaces—direct librados‑based object apps, RGW‑based object apps, RBD‑based cloud disks, etc.

Ceph clients, built on FUSE (user space) or VFS (kernel space), use POSIX interfaces. The Ceph metadata daemon (MDS) manages filesystem metadata, while the OSDs store the actual data — and the metadata itself, since MDS state also lives in RADOS. All reads and writes go through the CRUSH algorithm, which determines where data is placed and retrieved.

Ceph Internal Data Storage View

Ceph clusters are logically divided into pools, each containing multiple Placement Groups (PGs). The number of replicas per pool is configurable.

Because Ceph stores objects, a file is split into several objects, each mapped to a PG, which in turn maps to a set of OSDs. The first OSD in the set is the primary; the rest are replicas. PGs simplify OSD data management and enable dynamic object‑to‑OSD mapping, so adding or failing OSDs does not affect object placement.
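The two‑stage mapping above (object → PG → OSD set) can be sketched in a few lines of Python. This is a toy model, not Ceph's real implementation — Ceph uses the rjenkins1 hash and a "stable modulo" against pg_num, and the PG→OSD step is the full CRUSH algorithm — and the function names (`pg_for_object`, `osds_for_pg`) are invented for illustration:

```python
import hashlib

def pg_for_object(obj_name: str, pg_num: int) -> int:
    """Map an object to a placement group by hashing its name.
    (Real Ceph uses rjenkins1 and a stable modulo, not md5 + %.)"""
    h = int.from_bytes(hashlib.md5(obj_name.encode()).digest()[:4], "little")
    return h % pg_num

def osds_for_pg(pg_id: int, osds: list[int], replicas: int) -> list[int]:
    """Toy stand-in for CRUSH: derive a deterministic, pseudo-random
    ordered set of OSDs for the PG. The first entry is the primary."""
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.md5(f"{pg_id}:{osd}".encode()).digest(),
    )
    return ranked[:replicas]

# An object always hashes to the same PG, and the PG always maps to
# the same acting set — no central lookup table is needed.
acting_set = osds_for_pg(pg_for_object("rbd_data.1", pg_num=128),
                         osds=[0, 1, 2, 3, 4], replicas=3)
primary = acting_set[0]
```

The key property this illustrates is that placement is computed, not looked up: any client holding the cluster map can independently derive the same acting set.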

Ceph Data Storage Process

Data storage involves three mapping steps: (1) the user file is split into RADOS‑level objects (similar to RAID striping); (2) each object is mapped to a PG; (3) each PG is mapped to actual OSDs via the CRUSH algorithm. PG‑OSD relationships are many‑to‑many, and OSDs can be grouped into failure domains that span racks or servers, allowing replicas to be placed across different domains.
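Step (1), the striping of a file into fixed‑size objects, is simple arithmetic. The sketch below assumes Ceph's common 4 MB default object size and mimics its `<inode>.<object-index>` naming scheme; the function name is invented for illustration:

```python
from math import ceil

def split_into_objects(file_id: str, file_size: int,
                       object_size: int = 4 * 1024 * 1024) -> list[str]:
    """Step 1 of the mapping: split a user file into fixed-size RADOS
    objects named <file_id>.<index>, echoing Ceph's ino.ono scheme."""
    count = ceil(file_size / object_size)
    return [f"{file_id}.{i:08x}" for i in range(count)]

# A 10 MB file with 4 MB objects yields three objects; the last one
# is only partially filled.
objs = split_into_objects("ino1234", file_size=10 * 1024 * 1024)
```

Each of these object names then feeds into the object→PG hash from step (2), so the objects of one file scatter across many PGs — and hence many OSDs — much like RAID striping spreads a file across disks.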

Ceph Data Distribution Algorithm

Ceph provides client access via Linux user‑space (FUSE) and kernel‑space (VFS). Clients perform data slicing and use the CRUSH algorithm to locate objects for reads and writes. In practice, Ceph can be exported as an NFS service (ExportFS) for standard NFS clients.

Mapping PGs to OSDs requires a CRUSH map, CRUSH rules, and the CRUSH algorithm itself.

The Cluster Map records global system state and includes the CRUSH map (the hardware hierarchy) and the OSD map (pool and OSD status). CRUSH rules define replication policies and placement strategies, using weighted pseudo‑random distribution. The algorithm supports both replication and erasure coding, and offers bucket types such as Uniform, List, Tree, and Straw (the most common).
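The idea behind the Straw bucket can be shown with a short sketch. Each candidate item "draws a straw" whose length is a hash‑based random value scaled by the item's weight, and the longest straw wins; heavier items win proportionally more often, and removing one item never reshuffles the choices among the rest. The code below follows the straw2 formulation (log of a uniform draw divided by weight) with md5 standing in for CRUSH's real hash; the function name is invented:

```python
import hashlib
import math

def straw2_select(pg_id: int, items: dict[str, float]) -> str:
    """Straw2-style selection: each item draws a hash-derived straw
    scaled by its weight; the longest straw (max draw) wins."""
    best, best_draw = None, -math.inf
    for name, weight in items.items():
        h = hashlib.md5(f"{pg_id}:{name}".encode()).digest()
        u = (int.from_bytes(h[:8], "big") + 1) / 2**64  # uniform in (0, 1]
        draw = math.log(u) / weight                      # straw length (<= 0)
        if draw > best_draw:
            best, best_draw = name, draw
    return best

# For a given PG the choice is deterministic; across many PGs, an item
# with weight 2.0 is chosen roughly twice as often as one with 1.0.
chosen = straw2_select(42, {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0})
```

Because each item's draw depends only on the hash input and its own weight, adding or removing an item changes only the placements that item itself wins or loses — the property that lets Ceph rebalance with minimal data movement.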

End of the technical overview.

Further comparative analysis of Ceph’s ecosystem, features, and enterprise storage options is available through a donation‑based request.


Tags: Cloud Computing, Distributed Storage, Ceph, Object Storage, RADOS
Written by Architects' Tech Alliance — sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.