Ctrip's Service Mesh Evolution: Architecture, Implementation, and Lessons Learned
This article details Ctrip's transition from traditional SOA to a cloud‑native Service Mesh using Istio, covering background challenges, technical solutions for control and data planes, SDK compatibility, configuration management, performance optimizations, and future directions such as WebAssembly and sidecar considerations.
Ctrip's SOA system, originally built on ESB and micro‑services, faced limitations such as multi‑language SDK support, cumbersome version upgrades, and performance constraints, prompting the team to explore Service Mesh as a solution.
The technical plan adopted Istio as the core of the cloud‑native architecture, leveraging its support for HTTP, HTTPS, and gRPC, and extending it with custom operators to achieve seamless migration without modifying existing business code.
Control Plane: Two migration strategies were evaluated: keeping the existing configuration as the authoritative source, or gradually shifting to Istio configuration managed via GitOps. The team chose the former, using an Operator to translate legacy configurations into Istio resources while preserving compatibility.
Key features such as warm-up, circuit-breaking, and rate-limiting were re-implemented using Istio constructs. For warm-up, a DestinationRule with the least-connections load-balancing algorithm was applied:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: warmup-rule
spec:
  host: my-service
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
```
Circuit-breaking and outlier detection were configured via the connectionPool and outlierDetection settings of the DestinationRule traffic policy, mirroring the existing Hystrix-based mechanisms.
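As a rough illustration of the circuit-breaking setup described here, a DestinationRule along the following lines combines connection-pool limits with outlier detection. The resource name, host, and every threshold are hypothetical placeholders chosen for the sketch, not values from Ctrip's configuration.

```yaml
# Sketch only: name, host, and all numeric thresholds are assumed values.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: circuit-breaker-rule      # hypothetical name
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100       # cap on concurrent TCP connections to the host
      http:
        http1MaxPendingRequests: 50   # requests allowed to queue for a connection
        maxRequestsPerConnection: 10  # recycle connections after this many requests
    outlierDetection:
      consecutive5xxErrors: 5     # eject an endpoint after 5 consecutive 5xx responses
      interval: 10s               # how often endpoints are evaluated
      baseEjectionTime: 30s       # minimum time an ejected endpoint stays out
      maxEjectionPercent: 50      # never eject more than half of the endpoints
```

The connection-pool limits play the role of a concurrency cap (requests beyond them fail fast), while outlier detection removes unhealthy endpoints from the load-balancing pool for a period, which is the closest mesh-level analogue to a per-instance circuit breaker.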
Rate-limiting was implemented with an Istio EnvoyFilter that injects Envoy's local rate limit HTTP filter (LocalRateLimit). Descriptor-based token buckets were defined in YAML, and Lua scripts extracted custom keys from request headers to select the matching bucket.
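A minimal sketch of such an EnvoyFilter is shown below. It follows the general shape of the local rate limit example in the Istio documentation and configures only a single global token bucket; the descriptor-based buckets and the Lua key extraction mentioned in the article are omitted. The resource name, workload label, and bucket sizes are assumptions for illustration.

```yaml
# Sketch only: name, label selector, and token-bucket values are assumed.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: local-ratelimit-my-service   # hypothetical name
spec:
  workloadSelector:
    labels:
      app: my-service                # hypothetical workload label
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 100        # burst capacity
                tokens_per_fill: 100   # tokens added on each refill
                fill_interval: 1s      # refill period
              filter_enabled:          # fraction of requests the filter evaluates
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:         # fraction of evaluated requests actually limited
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
```

Because the bucket lives inside each sidecar, the limit applies per pod rather than across the whole service, so the effective service-wide ceiling scales with the number of replicas.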
SDK Compatibility: A lightweight mode disables service registration, routing, and governance features for applications running inside the mesh, allowing seamless coexistence of legacy and mesh-enabled services.
Data Plane: HTTP traffic is natively supported by Istio, while Dubbo services are migrated to gRPC using a content-type negotiation strategy (e.g., application/grpc+json) and dynamic code generation to avoid protobuf contracts.
On-demand configuration is achieved with Istio Sidecar resources that limit each workload's egress hosts to the services it actually needs, reducing push frequency and data volume. When a service accesses an unknown host, a fallback route forwards the request to a gateway, which records the dependency and updates the relevant Sidecar configuration via an EnvoyFilter patch.
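For illustration, a Sidecar resource that scopes a workload's egress to its known dependencies might look like the sketch below. The namespace, labels, and host names are hypothetical, and the fallback route to the gateway is not shown.

```yaml
# Sketch only: namespace, labels, and hosts are assumed for illustration.
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: order-service-sidecar   # hypothetical name
  namespace: orders             # hypothetical namespace
spec:
  workloadSelector:
    labels:
      app: order-service        # applies only to this workload's pods
  egress:
    - hosts:
        # Format is <namespace>/<dnsName>; "./" means the Sidecar's own namespace.
        - "./payment-service.orders.svc.cluster.local"
        - "inventory/inventory-service.inventory.svc.cluster.local"
        - "istio-system/*"      # keep mesh infrastructure services reachable
```

With this in place the control plane pushes only the listed hosts' clusters and routes to the workload's proxy, so changes to unrelated services no longer trigger configuration updates for it.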
Future Outlook: The team plans to adopt WebAssembly for extensible filters, evaluate sidecar versus proxy deployment models, and continue improving control-plane performance and reliability.
In summary, after two years of development and over a year in production, Ctrip's Service Mesh now serves thousands of applications and pods, providing hot‑update capabilities, SDK freeze, and cross‑language support, while still facing challenges such as WebAssembly integration and further performance tuning.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.