Design and Implementation of Ctrip's Soft Load Balancer: Evolving from Nginx Reverse Proxy to an OpenResty‑Based API Gateway
This article details how Ctrip's Soft Load Balancer (SLB) transitioned from a simple Nginx reverse‑proxy to a multi‑datacenter, dynamic routing gateway using OpenResty, Lua scripts, and a three‑layer architecture that eliminates reloads and supports high‑frequency configuration updates.
Background: Ctrip's Soft Load Balancer (SLB) is built on Nginx and handles billions of HTTP requests every day. It initially provided basic HTTP routing and reverse‑proxy functions, replacing traditional hardware load balancers.
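Traffic Chain: Early SLB configurations routed a fixed domain to a static upstream cluster, for example:

```nginx
server {
    listen 80;
    server_name foo.bar.com;

    location ~* ^/hello {
        proxy_pass http://backend_499;
    }
}

upstream backend_499 {
    server 10.10.10.1:80 weight=5 max_fails=0 fail_timeout=30;
    server 10.10.10.2:80 weight=5 max_fails=0 fail_timeout=30;
    server 10.10.10.3:80 weight=5 max_fails=0 fail_timeout=30;
}
```

Here every request for foo.bar.com whose path starts with /hello (matched case‑insensitively by ~*) is proxied to backend_499. That upstream has three equally weighted servers. max_fails=0 disables failure accounting, so Nginx never marks a server unavailable because of failed attempts.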
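With weight=5 on all three servers, Nginx's default round‑robin balancer spreads requests evenly. The Python sketch below is a simplified model of the smooth weighted round‑robin selection that Nginx's upstream module uses by default. It ignores max_fails and fail_timeout, which max_fails=0 disables here anyway. The Peer and smooth_wrr_pick names are illustrative and are not Nginx internals.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    addr: str
    weight: int       # configured weight, e.g. weight=5
    current: int = 0  # running score, adjusted on every pick


def smooth_wrr_pick(peers):
    """Pick one peer using smooth weighted round-robin (simplified)."""
    total = 0
    best = None
    for peer in peers:
        # Every peer gains its weight on each pick...
        peer.current += peer.weight
        total += peer.weight
        if best is None or peer.current > best.current:
            best = peer
    # ...and the winner pays back the sum of all weights.
    best.current -= total
    return best


backend_499 = [
    Peer("10.10.10.1:80", 5),
    Peer("10.10.10.2:80", 5),
    Peer("10.10.10.3:80", 5),
]
for _ in range(6):
    print(smooth_wrr_pick(backend_499).addr)  # .1, .2, .3, .1, .2, .3
```

With unequal weights such as 5, 1, 1, the sketch yields the interleaved order a, a, b, a, c, a, a instead of five a's in a row. That interleaving is what makes the algorithm "smooth".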
As business grew, multi‑datacenter disaster recovery, hybrid‑cloud deployment, and diversified routing requirements emerged, demanding dynamic, per‑request routing decisions that static Nginx configs could not satisfy.
Challenges: The team faced four main problems:
- Native Nginx offers only limited routing capabilities.
- Every configuration update requires a reload, which causes resource spikes and failed requests.
- Hundreds of clusters must be managed across many environments.
- The business needs fine‑grained request tagging, data collection, and custom logic.
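Introducing OpenResty & Lua: By embedding LuaJIT into Nginx, OpenResty provides APIs for reading and writing headers, custom routing, shared memory, and network programming. The Lua‑based routing example below handles requests in two ways. A request whose flag variable names a group goes to that group. Every other request is split at random, with about 20% going to group_1 and 80% going to group_2:

```nginx
server {
    location /foo/bar {
        content_by_lua_block {
            -- $flag is assumed to be set elsewhere in the configuration (not shown);
            -- an undefined variable reads as nil, so such requests fall through.
            -- math.random() is uniform in [0, 1); consider seeding it per worker.
            local r = math.random()
            if ngx.var.flag == "group_1" or ngx.var.flag == "group_2" then
                ngx.exec("@" .. ngx.var.flag)  -- an explicit tag wins
            elseif r < 0.20 then
                ngx.exec("@group_1")           -- about 20% of untagged traffic
            else
                ngx.exec("@group_2")           -- the remaining 80%
            end
        }
    }

    location @group_1 {
        content_by_lua_block { ngx.say("I am group_1") }
    }

    location @group_2 {
        content_by_lua_block { ngx.say("I am group_2") }
    }
}
```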
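Although Lua removed many of native Nginx's routing limits, routing data such as the 20/80 split above was still hard‑coded in the configuration. Changing that data therefore still required a reload.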
Solution Architecture: The solution splits responsibilities into three layers:
- API module (Java): manages the lifecycle of Lua scripts and data models, and supports gray (canary) releases and cluster registration.
- Agent module: bridges the control plane and the data plane. It persists Lua files and data models locally and pushes updates to the Nginx workers.
- Nginx and Lua module: handles requests with Lua scripts that read in‑memory data models to make dynamic routing decisions.
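The article does not describe how the Agent writes models to disk. A common way to ensure that readers never see a half‑written file is to write to a temporary file in the same directory and then atomically rename it into place. The Python sketch below assumes that approach. persist_model, load_model, the <name>.json file layout, and the version field are all hypothetical.

```python
import contextlib
import json
import os
import tempfile


def persist_model(directory, name, model, version):
    """Atomically write {"version": ..., "model": ...} to <directory>/<name>.json."""
    payload = {"version": version, "model": model}
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem, which an atomic replace requires.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before publishing
        os.replace(tmp_path, os.path.join(directory, name + ".json"))
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


def load_model(directory, name):
    """Return (version, model) from the last successfully persisted file."""
    with open(os.path.join(directory, name + ".json"), encoding="utf-8") as f:
        payload = json.load(f)
    return payload["version"], payload["model"]
```

On POSIX systems, os.replace within a single filesystem is atomic. A concurrent reader therefore sees either the previous complete file or the new complete file, never a mix of the two.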
Data Model Lifecycle: Data models are stored on disk, loaded into shared memory, and periodically polled by the Nginx workers. When a worker detects a new version, it loads the updated model in place without an Nginx reload. This eliminates reload‑induced disruption.
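The version check that lets a worker pick up a new model without a reload can be sketched in Python as follows. Several parts are assumptions. SharedStore is a hypothetical stand‑in for an Nginx shared‑memory zone (in OpenResty, typically a lua_shared_dict). poll() stands in for a periodic timer, such as one registered with ngx.timer.every. Storing the version and the body as a single entry is a design choice of this sketch.

```python
import json


class SharedStore:
    """Hypothetical stand-in for an Nginx shared-memory zone."""

    def __init__(self):
        # (version, serialized model) is replaced as a single unit, so a reader
        # can never see a new version paired with an old body.
        self._entry = None

    def publish(self, version, model):
        self._entry = (version, json.dumps(model))

    def read(self):
        return self._entry


class Worker:
    """Caches a parsed model and re-parses only when the version changes."""

    def __init__(self, store):
        self.store = store
        self.version = None
        self.model = None
        self.reloads = 0

    def poll(self):
        """Run periodically; returns True if a new model was loaded."""
        entry = self.store.read()
        if entry is None or entry[0] == self.version:
            return False  # nothing new: keep routing with the cached model
        self.version, raw = entry
        self.model = json.loads(raw)
        self.reloads += 1
        return True


store = SharedStore()
worker = Worker(store)
store.publish(1, {"groups": [{"group_1": {"weight": 20}}, {"group_2": {"weight": 80}}]})
worker.poll()  # loads version 1
store.publish(2, {"groups": [{"group_1": {"weight": 50}}, {"group_2": {"weight": 50}}]})
worker.poll()  # loads version 2; Nginx itself is never reloaded
```

The worker checks for inequality rather than "newer than". As a result, republishing an older version, for example during a rollback, is also picked up.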
Example JSON data model for weighted traffic distribution, expressing the same 20/80 split as the Lua example above as data:

```json
{
  "groups": [
    {"group_1": {"weight": 20}},
    {"group_2": {"weight": 80}}
  ]
}
```
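A worker could turn this model into a routing decision by laying the weights end to end and drawing a uniform random point. This reproduces the hard‑coded 20/80 split from the Lua example, but with thresholds read from data. The Python sketch below makes two assumptions: weights are relative shares that need not sum to 100, and groups with a non‑positive weight are skipped. build_weight_table and pick_group are hypothetical names.

```python
import bisect
import random


def build_weight_table(model):
    """Flatten {"groups": [{name: {"weight": w}}, ...]} into (names, cumulative weights)."""
    names, cumulative, total = [], [], 0
    for entry in model["groups"]:
        for name, attrs in entry.items():
            weight = attrs.get("weight", 0)
            if weight <= 0:
                continue  # assumption: non-positive weight means "no traffic"
            total += weight
            names.append(name)
            cumulative.append(total)
    if not names:
        raise ValueError("model has no group with a positive weight")
    return names, cumulative


def pick_group(table, rng=random.random):
    """Choose a group with probability weight / total weight."""
    names, cumulative = table
    point = rng() * cumulative[-1]  # uniform in [0, total)
    index = bisect.bisect_right(cumulative, point)
    # Guard against floating-point rounding pushing point up to exactly total.
    return names[min(index, len(names) - 1)]


model = {"groups": [{"group_1": {"weight": 20}}, {"group_2": {"weight": 80}}]}
table = build_weight_table(model)
sample = random.Random(7)
hits = sum(pick_group(table, sample.random) == "group_1" for _ in range(10_000))
print(hits / 10_000)  # close to 0.20
```

Because the table is rebuilt only when a new model version is loaded, each request costs one random draw and a binary search.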
Example JSON data model for header‑based routing:

```json
{
  "groups": [
    {"group_1": {"header": "foo"}},
    {"group_2": {"header": "bar"}}
  ]
}
```
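The article does not say whether "foo" and "bar" are header names or header values. The Python sketch below assumes they are values of a single designated request header. That header's name (X-Route-Tag here) and the fallback default are hypothetical. The sketch builds a lookup table once per model version and then matches each request against it.

```python
def build_header_table(model):
    """Map each configured header value to its group, e.g. {"foo": "group_1"}."""
    table = {}
    for entry in model["groups"]:
        for name, attrs in entry.items():
            value = attrs.get("header")
            if value is not None:
                table[value] = name
    return table


def route_by_header(table, headers, header_name="X-Route-Tag", default=None):
    """Return the group whose configured value matches the routing header.

    header_name and default are hypothetical; the article does not name them.
    """
    wanted = header_name.lower()
    for key, value in headers.items():
        # HTTP header field names are case-insensitive.
        if key.lower() == wanted:
            return table.get(value, default)
    return default


model = {"groups": [{"group_1": {"header": "foo"}}, {"group_2": {"header": "bar"}}]}
table = build_header_table(model)
print(route_by_header(table, {"x-route-tag": "bar"}))  # group_2
print(route_by_header(table, {"Accept": "*/*"}, default="group_1"))  # group_1
```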
Practice and Outcomes: The solution enables traffic distribution per application and per IDC (datacenter), both global and granular failover, and auxiliary functions that run alongside request handling, such as request tagging and response data collection. It has been validated in multiple datacenter failure drills, with traffic switchover completing in under a second.
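The article does not say how failover shifts traffic between IDCs. One plausible model, assumed here purely for illustration, gives each IDC a weight. When an IDC is marked down, its share is split among the remaining IDCs in proportion to their weights. In this Python sketch, the IDC names, the weights, and effective_shares are all hypothetical. Under this model, a global failover would apply the same down set to every application, while a granular one would apply it only to selected applications.

```python
def effective_shares(idc_weights, down):
    """Return each healthy IDC's fraction of traffic after removing IDCs in `down`.

    The removed IDCs' share is redistributed to the survivors in proportion
    to their configured weights.
    """
    alive = {idc: w for idc, w in idc_weights.items() if idc not in down and w > 0}
    total = sum(alive.values())
    if total == 0:
        raise RuntimeError("no healthy IDC left to receive traffic")
    return {idc: w / total for idc, w in alive.items()}


weights = {"IDC_A": 50, "IDC_B": 30, "IDC_C": 20}  # hypothetical IDCs and weights
print(effective_shares(weights, down=set()))
print(effective_shares(weights, down={"IDC_A"}))  # IDC_B 0.6, IDC_C 0.4
```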
Conclusion: By separating routing logic from routing data and building on OpenResty, Ctrip's SLB evolved into a flexible, API‑gateway‑style platform. The platform supports dynamic, high‑frequency configuration updates, reduces operational overhead, and scales to complex multi‑cloud, multi‑datacenter environments.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.