
Ceph Deployment and Usage at Tongcheng: Architecture, Applications, Pain Points, and Future Work

This article details Tongcheng's adoption of Ceph for handling massive unstructured data, describing the motivations, architecture, specific uses of RGW, block storage, and CephFS, the challenges encountered, optimization measures, monitoring practices, and planned future improvements.

Tongcheng Travel Technology Center

1. Ceph usage background

Tongcheng generates large volumes of unstructured data (documents, XML, HTML, reports, audio/video, etc.), which was previously stored on local disks using a single-node file system. This approach had several drawbacks:

Single‑point failure

Scaling difficulties

Inability to achieve high data reliability and high availability

Metadata management overhead causing performance bottlenecks

Stateful applications on local storage cannot be migrated

OS‑bound storage; bugs in the file system can bring down the whole system

Object storage is the mainstream solution for unstructured data, offering RESTful APIs, flat data organization, massive scalability, multi‑tenant security, and elastic resource pools.

For Tongcheng's private cloud, a highly reliable, elastically scalable block storage service is required.

After evaluating many open‑source distributed storage systems, Ceph was selected because of its advanced decentralized architecture, active community, support for object, block, and file storage, and deep integration with OpenStack.

2. Ceph usage in Tongcheng

2.1 RGW for unstructured data storage

RGW provides S3/Swift‑compatible interfaces, allowing direct use of AWS S3 SDKs. Multiple RGW instances are load‑balanced by Nginx with Keepalived for high availability. Nginx also compresses static CSS/JS assets via Lua and implements automatic cache cleaning.
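Because RGW speaks the S3 protocol, any client that can produce an AWS-style Authorization header works against it. As a minimal sketch of the classic AWS signature (SigV2) that RGW accepts, using only the Python standard library; the access key, secret, and bucket path below are hypothetical placeholders, not real credentials:

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone

def s3_v2_auth_header(access_key: str, secret_key: str,
                      method: str, resource: str,
                      date: str, content_type: str = "") -> str:
    """Build the AWS SigV2 Authorization header accepted by S3-compatible RGW."""
    # StringToSign: method, content-md5 (empty here), content-type, date, resource
    string_to_sign = "\n".join([method, "", content_type, date, resource])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return f"AWS {access_key}:{signature}"

# Hypothetical credentials and bucket, for illustration only.
date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
header = s3_v2_auth_header("DEMOKEY", "demosecret", "PUT", "/reports/daily.html", date)
print(header)
```

In practice an S3 SDK (e.g. the AWS SDKs mentioned above) handles this signing transparently once pointed at the RGW endpoint.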

2.2 Block storage for private cloud platform

Ceph serves as the storage backend for OpenStack services Nova, Glance, and Cinder, eliminating the need for separate storage solutions. RBD snapshots enable near‑instant VM creation and reduce backup time and space consumption.

Because Ceph offers no native QoS mechanism, I/O throttling is applied at the QEMU layer to cap each virtual disk's IOPS and bandwidth.
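QEMU-side throttling is typically configured through libvirt's `<iotune>` element on each virtual disk. A sketch that renders such a fragment from a QoS policy; the limit values are illustrative, not Tongcheng's actual settings:

```python
def iotune_xml(total_iops: int, total_bytes_per_sec: int) -> str:
    """Render a libvirt <iotune> fragment that caps a disk's IOPS and bandwidth."""
    return (
        "<iotune>\n"
        f"  <total_iops_sec>{total_iops}</total_iops_sec>\n"
        f"  <total_bytes_sec>{total_bytes_per_sec}</total_bytes_sec>\n"
        "</iotune>"
    )

# Example policy: 1000 IOPS and 100 MB/s per RBD-backed disk.
fragment = iotune_xml(1000, 100 * 1024 * 1024)
print(fragment)
```

The fragment goes inside the `<disk>` element of the guest's domain XML, so limits are enforced per disk rather than per guest.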

2.3 CephFS for shared data between servers

CephFS replaces the legacy NAS, providing POSIX-compatible access via the kernel client and ceph-fuse. The current deployment runs active-standby MDS and disables directory fragmentation for stability, with plans to integrate OpenStack Manila.
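Of the two access paths, the kernel client mounts CephFS with `mount -t ceph`. A sketch that assembles that invocation; the monitor addresses, mount point, and secret file path are hypothetical placeholders:

```python
def cephfs_mount_cmd(monitors, mountpoint, user="admin",
                     secretfile="/etc/ceph/admin.secret"):
    """Assemble a kernel-client CephFS mount command as an argument list."""
    # Monitors listen on 6789 by default; several can be given for redundancy.
    mon_list = ",".join(f"{m}:6789" for m in monitors)
    options = f"name={user},secretfile={secretfile}"
    return ["mount", "-t", "ceph", f"{mon_list}:/", mountpoint, "-o", options]

# Hypothetical three-monitor cluster mounted at /mnt/cephfs.
cmd = cephfs_mount_cmd(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "/mnt/cephfs")
print(" ".join(cmd))
```

The FUSE path (`ceph-fuse /mnt/cephfs`) trades some performance for easier upgrades, since it runs in user space.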

3. Pain points in Ceph usage

Key issues include OSD journal-disk write patterns, the need for SSD+HDD hybrid OSD layouts, the bandwidth limits of SATA drives, and oversized bucket index shards causing scrub and recovery performance problems.

Large bucket index shards increase scrub time and I/O latency.

Recovery of oversized shards after OSD failures can cause client I/O timeouts.

Optimizations applied:

Configure multiple bucket shards to limit shard size.

Use SSD for the buckets.index pool to speed up recovery.

Enable bucket quota to control shard growth.
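The shard-count choice in the first measure follows from the expected object count: keeping on the order of 100k index entries per shard is a common RGW rule of thumb, and an odd shard count tends to spread keys more evenly. A back-of-the-envelope sizing helper under those assumptions:

```python
import math

OBJECTS_PER_SHARD = 100_000  # common RGW guidance: ~100k index entries per shard

def recommended_shards(expected_objects: int) -> int:
    """Pick a bucket index shard count so no single shard grows too large."""
    shards = max(1, math.ceil(expected_objects / OBJECTS_PER_SHARD))
    # Prefer an odd shard count for a more even key distribution.
    return shards if shards % 2 == 1 else shards + 1

print(recommended_shards(5_000_000))  # 51
```

Sizing shards up front matters because pre-Luminous RGW could not reshard a bucket online, which is exactly why oversized shards became a recovery hazard.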

Ceph releases before Luminous (v12, "L") lacked a built-in dashboard; a custom management platform was built on the RGW admin APIs and the Cinder and Nova APIs.

4. Monitoring focus

OSD health – down/up events trigger peering and object recovery, potentially hanging client I/O.

Monitor quorum – losing a majority of monitors (e.g., two out of three) renders the cluster unavailable.

I/O latency – long‑running requests may indicate a slow OSD.

Cluster capacity and load – guide scaling decisions.
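The quorum and capacity rules above reduce to simple arithmetic that a monitoring job can evaluate. A sketch with illustrative thresholds; the 0.85/0.95 ratios mirror Ceph's default nearfull/full settings, and the alerting policy itself is an assumption, not Tongcheng's documented one:

```python
def has_quorum(total_mons: int, alive_mons: int) -> bool:
    """A Ceph cluster stays available only while a majority of monitors is up."""
    return alive_mons > total_mons // 2

def capacity_state(used_bytes: int, total_bytes: int,
                   nearfull: float = 0.85, full: float = 0.95) -> str:
    """Classify cluster fill level; 0.85/0.95 mirror Ceph's default ratios."""
    ratio = used_bytes / total_bytes
    if ratio >= full:
        return "full"
    if ratio >= nearfull:
        return "nearfull"
    return "ok"

# Three monitors tolerate one failure, but not two.
print(has_quorum(3, 2), has_quorum(3, 1))  # True False
```

Crossing the nearfull threshold is the natural trigger for the scaling decisions mentioned above, well before writes are blocked at the full ratio.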

5. Future work

Ceph is likened to a gold mine from which only a small fraction has been extracted so far. Future efforts will explore more hardware configurations, expand the storage interfaces (e.g., adding iSCSI support), deepen usage across more services, and improve stability and reliability.
