
Design and Implementation of a Cloud‑Native Operator Platform for Component Management at Tongcheng Travel

This article details Tongcheng Travel's cloud‑native migration journey: the challenges of resource quota management, component versioning, and operational automation; how a Kubernetes Operator platform built with Go, kubebuilder, and a watch‑broadcast mechanism addresses them; and the team's plans for future expansion.

Tongcheng Travel Technology Center

Since 2019, Tongcheng Travel has migrated more than 20 components to the cloud, encountering pain points such as diverse resource quota management, coarse‑grained component services, manual configuration handling, circular dependencies, and lack of intelligent operations.

Selection and Challenges – Although Golang dominates cloud‑native development, the team’s Java background required a careful transition.

1) CustomResourceDefinition version – The production cluster runs Kubernetes v1.14, which serves CRDs only through the v1beta1 API; since CRDs reached GA (apiextensions.k8s.io/v1) in Kubernetes v1.16 and the v1beta1 API was later removed in v1.22, the team upgraded the cluster so its operators could target stable v1 CRDs.
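The article does not show the team's CRDs, but the main migration cost is visible in a minimal example (the group, kind, and field names below are hypothetical): a v1 CRD must declare a structural OpenAPI schema for every served version, which v1beta1 allowed you to omit.

```yaml
# apiextensions.k8s.io/v1 (GA in Kubernetes v1.16) requires a structural
# schema per version; v1beta1 (served by v1.14 clusters) did not.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: redisclusters.components.example.com   # hypothetical component CRD
spec:
  group: components.example.com
  scope: Namespaced
  names:
    kind: RedisCluster
    plural: redisclusters
    singular: rediscluster
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
```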

2) Operator development framework – After comparing available frameworks, kubebuilder was chosen for its flexibility, testing support, extensibility, and large community.

Application Practice – The architecture emphasizes configuration versioning, container debugging, and a watch‑broadcast mechanism that instantly notifies users of cluster changes.

3.1 Configuration Versioning – Platform‑level file version management uploads configuration files to S3, enabling real‑time change tracking and safe roll‑outs.
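The article describes the idea rather than the implementation. As a rough, stdlib‑only sketch of content‑addressed version tracking (the `ConfigStore` type is hypothetical, and the actual S3 upload is omitted), each change to a file can be keyed by its content hash, so unchanged uploads create no new version and any change can be rolled back:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ConfigVersion records one immutable revision of a configuration file.
// In the real platform the payload would live in S3; it is kept
// in memory here purely for illustration.
type ConfigVersion struct {
	Hash    string
	Content []byte
}

// ConfigStore is a hypothetical version index: file path -> revisions.
type ConfigStore struct {
	versions map[string][]ConfigVersion
}

func NewConfigStore() *ConfigStore {
	return &ConfigStore{versions: make(map[string][]ConfigVersion)}
}

// Put appends a new revision only if the content actually changed,
// returning the hash that identifies the current version.
func (s *ConfigStore) Put(path string, content []byte) string {
	sum := sha256.Sum256(content)
	hash := hex.EncodeToString(sum[:])
	revs := s.versions[path]
	if n := len(revs); n > 0 && revs[n-1].Hash == hash {
		return hash // identical content: no new version recorded
	}
	s.versions[path] = append(revs, ConfigVersion{Hash: hash, Content: content})
	return hash
}

// Rollback drops the latest revision and returns the previous one.
func (s *ConfigStore) Rollback(path string) ([]byte, bool) {
	revs := s.versions[path]
	if len(revs) < 2 {
		return nil, false
	}
	s.versions[path] = revs[:len(revs)-1]
	return revs[len(revs)-2].Content, true
}

func main() {
	store := NewConfigStore()
	store.Put("app.ini", []byte("mode=prod"))
	store.Put("app.ini", []byte("mode=debug"))
	prev, ok := store.Rollback("app.ini")
	fmt.Println(ok, string(prev))
}
```

Keying revisions by content hash rather than by timestamp makes uploads idempotent, which matters when the same file is pushed from multiple pipelines.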

3.2 Container Startup Scheme – Containers initialize configurations via a custom configini tool, then follow mode‑based logic; Dockerfile conventions enforce script placement, entrypoint naming, and health‑check scripts with debug support.
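The article names the conventions without showing them. A hypothetical Dockerfile following such conventions might look like the sketch below; the paths, script names, and the `configini` invocation are illustrative assumptions, not the team's actual layout.

```dockerfile
FROM openjdk:8-jre

# Convention: all lifecycle scripts live under one fixed directory.
COPY scripts/ /opt/app/scripts/
COPY configini /usr/local/bin/configini

# Convention: a single, fixed-name entrypoint script. It first runs
# configini to render configuration, then branches on a startup mode
# (e.g. a normal start vs. a debug mode that keeps the container alive
# for inspection).
ENTRYPOINT ["/opt/app/scripts/entrypoint.sh"]

# Convention: a fixed-name health-check script, usable both by
# Docker's HEALTHCHECK and as a Kubernetes probe command.
HEALTHCHECK --interval=30s --timeout=5s \
  CMD ["/opt/app/scripts/healthcheck.sh"]
```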

3.3 Tuning Logic Design – Pause logic is added at both controller and reconcile levels to improve controllability, especially for complex Pod tuning.
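The two pause levels can be sketched in a few lines of dependency‑free Go. This is a simplification, not the team's code: the annotation key is hypothetical, and the `Reconcile` shape is only modeled on kubebuilder's controller‑runtime interface.

```go
package main

import "fmt"

// Hypothetical pause annotation key; the article does not name one.
const pauseAnnotation = "ops.example.com/paused"

// Object is a minimal stand-in for a Kubernetes custom resource.
type Object struct {
	Name        string
	Annotations map[string]string
}

// Controller holds the controller-level pause switch: flipping it
// stops tuning for every resource the controller owns.
type Controller struct {
	Paused bool
}

// Reconcile checks both pause levels before doing any Pod tuning,
// and returns which branch ran so the effect is observable.
func (c *Controller) Reconcile(obj *Object) string {
	if c.Paused {
		return "controller-paused" // global stop for this component type
	}
	if obj.Annotations[pauseAnnotation] == "true" {
		return "resource-paused" // stop for this one resource only
	}
	// ... actual tuning (scaling, rolling updates, repair) would go here ...
	return "tuned"
}

func main() {
	c := &Controller{}
	obj := &Object{Name: "redis-a", Annotations: map[string]string{pauseAnnotation: "true"}}
	fmt.Println(c.Reconcile(obj))
	c.Paused = true
	fmt.Println(c.Reconcile(obj))
}
```

Checking the resource-level flag inside reconcile (rather than filtering events upstream) keeps a paused resource visible to the controller, so un-pausing takes effect on the very next event.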

3.4 Resource Listening – After evaluating direct API access, in‑memory caching, and watch‑broadcast, the team adopted the watch‑broadcast mechanism, using a custom Watchdog to persist changes, broadcast updates, and reduce load on the Kubernetes API server.
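The core of watch‑broadcast is fan‑out: one watch on the API server feeds many subscribers, so N clients cost the API server a single connection. A minimal channel‑based sketch of that idea follows (the `Watchdog` shape is assumed from the article's description; its persistence step is omitted):

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified cluster change notification.
type Event struct {
	Resource string
	Action   string // e.g. "ADDED", "MODIFIED", "DELETED"
}

// Watchdog is a hypothetical broadcaster: a single watch loop on the
// API server calls Publish, which fans each event out to every
// subscriber instead of each client opening its own watch.
type Watchdog struct {
	mu   sync.Mutex
	subs []chan Event
}

// Subscribe registers a buffered channel that receives all later events.
func (w *Watchdog) Subscribe() <-chan Event {
	w.mu.Lock()
	defer w.mu.Unlock()
	ch := make(chan Event, 16)
	w.subs = append(w.subs, ch)
	return ch
}

// Publish delivers one event to every subscriber. A subscriber whose
// buffer is full is skipped rather than allowed to stall the watch loop.
func (w *Watchdog) Publish(e Event) {
	w.mu.Lock()
	defer w.mu.Unlock()
	for _, ch := range w.subs {
		select {
		case ch <- e:
		default: // slow consumer: drop instead of blocking
		}
	}
}

func main() {
	w := &Watchdog{}
	a, b := w.Subscribe(), w.Subscribe()
	w.Publish(Event{Resource: "pod/redis-0", Action: "MODIFIED"})
	fmt.Println((<-a).Resource, (<-b).Action)
}
```

The non‑blocking send is the key design choice: a slow dashboard client loses events (which persistence can backfill) instead of back‑pressuring the single watch that everyone shares.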

3.5 Platform Scope – Currently, the platform automates operators for over 100 services across seven categories, providing fine‑grained component management for big‑data developers.

3.6 Advantages – The new architecture decouples API handling from Kubernetes tuning, enables stateless scaling, persists configuration as versioned artifacts, isolates each component in its own operator, replaces the ZooKeeper dependency with Kubernetes leader election, and synchronizes state faster for a better user experience.

Future Plans – Expand operator coverage to all internal big‑data components, expose service lineage, implement dynamic resource scaling, and integrate hybrid‑cloud capabilities with external providers.

Tags: cloud-native, Kubernetes, Operator, Go, Configuration Management, watchdog
Written by Tongcheng Travel Technology Center

Pursue excellence, start again with Tongcheng! More technical insights to help you along your journey and make development enjoyable.