
Implementation and Optimization of the QUIC Protocol in the Trip.com App

This article details the deployment of QUIC in Trip.com’s mobile app, covering multi‑process architecture, containerized upgrades, service discovery, health monitoring, push‑pull resilience, full‑link tracing, congestion‑control algorithm redesign, and the resulting performance and reliability improvements achieved across global users.

Ctrip Technology

The article introduces QUIC (Quick UDP Internet Connections), the UDP‑based transport protocol underlying HTTP/3, and explains its advantages over TCP: stream multiplexing without head‑of‑line blocking, fast connection establishment with 0‑RTT, connection migration via connection IDs (CIDs), and congestion control implemented in user space at the application layer.

Trip.com completed a multi‑process QUIC deployment in 2022, achieving a 20% reduction in link latency for overseas users. The architecture consists of two key components: a QUIC SLB (load balancer) operating at the transport layer to forward UDP packets, and a QUIC Server handling protocol processing and request forwarding, both supporting 0‑RTT.
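A transport‑layer load balancer for QUIC cannot route on the client's source address alone, since that address changes when a connection migrates; routing on the connection ID keeps packets flowing to the same backend. A minimal sketch of that idea (the class, backend addresses, and hashing scheme are illustrative, not Trip.com's implementation):

```python
import hashlib

class QuicSlb:
    """Sketch of transport-layer UDP forwarding keyed by QUIC connection ID.

    Pinning routes to the CID (rather than the client's IP/port 4-tuple)
    is what lets a connection survive a client-side network switch.
    """

    def __init__(self, backends):
        self.backends = backends   # QUIC Server addresses (illustrative)
        self.cid_table = {}        # CID -> backend (sticky routing)

    def route(self, cid):
        # First packet of a connection: pick a backend by hashing the CID,
        # then pin the mapping so later packets (even from a new client
        # address after migration) reach the same server.
        if cid not in self.cid_table:
            idx = int.from_bytes(hashlib.sha256(cid).digest()[:4], "big")
            self.cid_table[cid] = self.backends[idx % len(self.backends)]
        return self.cid_table[cid]

slb = QuicSlb(["10.0.0.1:443", "10.0.0.2:443"])
backend = slb.route(b"\x01\x02\x03\x04")
# Same CID always maps to the same backend, regardless of client 4-tuple.
assert slb.route(b"\x01\x02\x03\x04") == backend
```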

Key high‑availability and performance enhancements include containerizing the QUIC Server with HPA capability, adding active health checks, implementing a push‑pull strategy for client‑side network disaster recovery, and building a robust monitoring and alerting system.

Containerization replaced the earlier VM‑based deployment, reducing rollout time from minutes to seconds and enabling automatic scaling. The QUIC SLB was kept as a VM image to support both UDP and TCP acceleration via Akamai, while the QUIC Server runs as a container image integrated with the internal Captain release system.

Service discovery uses Consul: each QUIC Server registers its IP on start and deregisters on stop. The SLB watches Consul for IP changes to update Nginx configuration. However, UDP health checks alone proved insufficient, leading to the adoption of an internal L4LB that provides TCP‑based active health monitoring while still forwarding UDP traffic.
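The watch‑and‑reload loop reduces to a set difference between the backend IPs currently in the Nginx upstream and the IPs registered in Consul: reload only when the diff is non‑empty. A hedged sketch (the function names and upstream block are hypothetical, not the actual SLB configuration):

```python
def diff_upstreams(current, desired):
    """Changes a Consul watch would apply to the Nginx upstream list.

    `current` is the set of IPs in the running config; `desired` is the
    set currently registered in Consul. Returns (to_add, to_remove);
    an empty diff means no reload is needed.
    """
    return desired - current, current - desired

def render_upstream(ips):
    # Hypothetical Nginx upstream block for the QUIC Servers.
    lines = ["upstream quic_servers {"]
    for ip in sorted(ips):
        lines.append(f"    server {ip}:443;")
    lines.append("}")
    return "\n".join(lines)

# A server registered 10.0.0.2 and deregistered 10.0.0.3 since last reload:
added, removed = diff_upstreams({"10.0.0.1", "10.0.0.3"},
                                {"10.0.0.1", "10.0.0.2"})
```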

The push‑pull mechanism enables the client app to receive configuration updates in real time, allowing seamless channel or IP switching within seconds when a failure is detected, thus preserving user experience.
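One common way to make push and pull cooperate is to version the configuration and have the client accept whichever copy is newer, so a late‑arriving stale update can never regress the active config. A sketch under that assumption (class and field names are illustrative, not Trip.com's client code):

```python
class ChannelConfig:
    """Client-side view of the transport config, fed by both push and pull."""

    def __init__(self):
        self.version = 0
        self.endpoints = []

    def apply(self, version, endpoints):
        # Pushed and pulled updates go through the same gate: only a newer
        # version replaces the current config, so a stale pull cannot
        # overwrite a fresher push (or vice versa).
        if version <= self.version:
            return False
        self.version, self.endpoints = version, endpoints
        return True

cfg = ChannelConfig()
cfg.apply(3, ["quic://a.example.com"])   # periodic pull result
cfg.apply(2, ["quic://b.example.com"])   # stale push arrives late: ignored
```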

Monitoring combines full‑link tracing (writing metrics to access and error logs, forwarding them via LogAgent to Kafka, then processing with Hangout and storing in ClickHouse) with Prometheus‑based key‑metric export from Nginx. Custom HPA metrics such as idle‑connection ratio and idle‑port ratio were added to drive more accurate scaling decisions.
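Custom metrics plug into the standard HPA scaling rule, desired = ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that rule to a busy ratio derived from an idle‑connection count; the numbers and target value are illustrative, not Trip.com's thresholds:

```python
import math

def desired_replicas(current, metric, target, min_r=1, max_r=100):
    """Standard HPA rule: desired = ceil(current * metric / target),
    clamped to the configured replica bounds."""
    desired = math.ceil(current * metric / target)
    return max(min_r, min(max_r, desired))

# Export a "busy ratio" (1 - idle-connection ratio) as the custom metric,
# so a pool that is ~91% busy against a 60% target scales out.
busy = 1 - 18 / 200                       # 18 idle connections out of 200
replicas = desired_replicas(current=10, metric=busy, target=0.6)  # -> 16
```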

Full‑link instrumentation captures connection lifecycle timestamps, data transmission details, RTT and congestion‑control metrics, and client IP/geolocation, enabling deep analysis of performance bottlenecks.
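A per‑connection trace record of this kind might look like the following; the field set is a plausible subset of what the article describes, not the actual log schema:

```python
from dataclasses import dataclass

@dataclass
class QuicTrace:
    """Hypothetical full-link trace record for one QUIC connection."""
    conn_start: float        # lifecycle timestamps, ms since epoch
    handshake_done: float
    first_byte_sent: float
    conn_end: float
    bytes_sent: int          # data transmission details
    min_rtt_ms: float        # RTT / congestion-control metrics
    client_ip: str           # for geolocation analysis

    def handshake_ms(self):
        return self.handshake_done - self.conn_start

    def lifetime_ms(self):
        return self.conn_end - self.conn_start
```

Derived metrics such as `handshake_ms()` are what make bottlenecks visible once the records land in ClickHouse.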

Analysis uncovered a bug in the ngx_quic_run() timer logic that caused premature closure of 0‑RTT connections, leading to duplicate requests. The bug was fixed in February 2024, improving connection reuse by 0.5% and reducing unnecessary 0‑RTT connections by 7%.
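The article does not show the nginx source, but the failure mode can be modeled abstractly: an idle timer that fires without checking for in‑flight 0‑RTT data closes the connection early, and the client's retry becomes a duplicate request. A toy illustration of that mechanism (not nginx code):

```python
class ZeroRttConnection:
    """Toy model of the premature-close failure mode described above."""

    def __init__(self, pending_0rtt):
        self.pending = pending_0rtt   # requests sent in the 0-RTT flight
        self.closed = False
        self.duplicates = 0

    def on_idle_timer(self, check_pending):
        # Correct behavior re-arms the timer while 0-RTT data is still in
        # flight; the buggy path closes unconditionally.
        if check_pending and self.pending > 0:
            return                    # re-arm instead of closing
        self.closed = True
        self.duplicates += self.pending  # client resends what was lost
```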

Further, the Cronet library in the client was upgraded from a heavily trimmed 2020 version to 120.0.6099.301, resulting in an 18% reduction in 95th‑percentile end‑to‑end latency for users.

The original QUIC congestion‑control implementation in nginx‑quic used a simplified Reno algorithm with a large initial window, causing fairness and congestion issues. The team abstracted congestion‑control logic, implemented Reno, Cubic, and BBR, and tuned parameters, achieving a 15‑point reduction in connection‑congestion ratio and a 4% decrease in end‑to‑end latency in the SHA environment.
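A pluggable congestion‑control interface typically exposes ack and loss hooks that update the congestion window. The sketch below is a minimal Reno controller of the kind such an abstraction would host, using RFC 9002's default initial window (10 × max datagram size) and halving on loss; it illustrates the abstraction, not the team's nginx patch:

```python
class RenoCongestionControl:
    """Minimal Reno sketch behind a pluggable on_ack/on_loss interface.

    Constants follow RFC 9002 defaults; real implementations also track
    recovery periods, persistent congestion, and pacing.
    """

    MAX_DATAGRAM = 1200  # bytes

    def __init__(self):
        self.cwnd = 10 * self.MAX_DATAGRAM   # initial window
        self.ssthresh = float("inf")

    def on_ack(self, acked_bytes):
        if self.cwnd < self.ssthresh:
            self.cwnd += acked_bytes          # slow start: exponential growth
        else:
            # Congestion avoidance: roughly one datagram per RTT.
            self.cwnd += self.MAX_DATAGRAM * acked_bytes // self.cwnd

    def on_loss(self):
        # Multiplicative decrease: halve the window on a loss event.
        self.ssthresh = max(self.cwnd // 2, 2 * self.MAX_DATAGRAM)
        self.cwnd = self.ssthresh
```

Cubic and BBR then become alternative implementations of the same two hooks, which is what makes per‑IDC algorithm selection and A/B experiments practical.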

Overall outcomes include automated scaling (30× faster), full‑link tracing‑driven optimizations (0‑RTT bug fix, congestion‑control improvement), 20% latency reduction for European users, expanded QUIC support to additional Trip.com apps, and an 18% drop in 95th‑percentile latency after client‑side Cronet upgrades.

The team plans to continue monitoring community developments, explore new QUIC use cases, and further tailor congestion‑control algorithms per IDC based on long‑term A/B experiments.

Tags: cloud native, backend development, performance monitoring, network optimization, QUIC, congestion control
Written by Ctrip Technology

Official Ctrip Technology account, sharing and discussing growth.
