
Design and Implementation of a High‑Performance, High‑Reliability Four‑Layer Load Balancer (TVS) with FullNat Model

The article describes the motivation, architecture, FullNat forwarding process, high‑availability mechanisms, and performance‑optimizing techniques of TVS, a custom four‑layer load‑balancing platform built to overcome LVS limitations and meet the company’s demanding traffic and reliability requirements.

Tongcheng Travel Technology Center

Background: As the company’s business grew rapidly, a high‑performance, high‑reliability L4 load‑balancing device was required. Commercial solutions were too expensive, and the open‑source LVS project did not meet the desired throughput and stability, prompting the development of a proprietary solution called TVS with extensive optimizations.

Load‑Balancing Introduction: Early Internet services used a single server, but increasing traffic and reliability needs made DNS‑based load distribution insufficient due to caching delays and simple algorithms. A dedicated L4 load balancer distributes traffic to multiple backend servers using weighted round‑robin and health checks, automatically removing failed servers.

FullNat Forwarding Flow:
1. The client sends a request to the VIP.
2. The load balancer selects a Real Server and a LocalIP via round-robin.
3. A session is created, recording the ClientIP, VIP, RSIP, and LocalIP.
4. The packet's source address (ClientIP) is rewritten to the LocalIP, and its destination (VIP) to the RSIP.
5. The Real Server processes the packet and replies to the LocalIP.
6. The load balancer receives the reply and looks up the session.
7. The packet's source address (RSIP) is rewritten back to the VIP, and its destination (LocalIP) to the ClientIP.
8. The reply packet is sent to the client.
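The steps above can be sketched as a small simulation. All class and variable names here are illustrative (TVS's internals are not public); the point is the symmetric rewrite driven by a session table that is keyed from both directions:

```python
# Minimal FullNAT forwarding sketch. A session records the four addresses;
# requests and replies are rewritten symmetrically via session lookups.
from dataclasses import dataclass
from itertools import cycle

@dataclass(frozen=True)
class Session:
    client_ip: str
    vip: str
    rs_ip: str
    local_ip: str

class FullNatLB:
    def __init__(self, vip, real_servers, local_ips):
        self.vip = vip
        self.rs_pool = cycle(real_servers)    # round-robin Real Server choice
        self.local_pool = cycle(local_ips)    # round-robin LocalIP choice
        self.sessions = {}                    # (client_ip, vip)   -> Session
        self.reverse = {}                     # (rs_ip, local_ip)  -> Session

    def handle_request(self, src, dst):
        """Client -> VIP: rewrite source to a LocalIP, destination to the RSIP."""
        sess = self.sessions.get((src, dst))
        if sess is None:  # first packet of the flow: create the session
            sess = Session(src, dst, next(self.rs_pool), next(self.local_pool))
            self.sessions[(src, dst)] = sess
            self.reverse[(sess.rs_ip, sess.local_ip)] = sess
        return (sess.local_ip, sess.rs_ip)

    def handle_reply(self, src, dst):
        """Real Server -> LocalIP: rewrite back to VIP -> ClientIP."""
        sess = self.reverse[(src, dst)]
        return (sess.vip, sess.client_ip)
```

For example, a request from `1.2.3.4` to the VIP is forwarded as LocalIP-to-RSIP, and the Real Server's reply is rewritten back to VIP-to-client, so the client only ever sees the VIP.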

Overall Architecture: TVS adopts a dual‑cluster active‑standby design within the same data center. Each cluster (A and B) contains four machines that load‑balance and back up each other, while the two clusters also back up each other, providing both intra‑ and inter‑cluster redundancy.

Key Features of TVS: 1. Per‑core session tables. 2. Synchronization of both newly established and long‑lived sessions. 3. Dynamic online/offline of TVS servers within a cluster. 4. Hot‑switching between TVS clusters, transparent to upper‑level services.

Inter‑Cluster High Availability: Both clusters announce the same VIP1 via BGP; Cluster B uses a lower‑priority route‑policy so that all traffic is handled by Cluster A under normal conditions, and automatically switches to Cluster B if Cluster A fails. No session synchronization is performed between clusters.
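The failover logic reduces to "highest-priority healthy cluster wins". A hedged sketch of that selection, with illustrative cluster names and priorities standing in for the actual BGP route-policy values:

```python
# Priority-based cluster selection, mimicking BGP route preference:
# both clusters announce the VIP, but Cluster B carries a lower priority,
# so traffic follows A unless A's announcement disappears.
CLUSTERS = [("A", 100), ("B", 50)]  # (cluster, local-preference-style priority)

def active_cluster(healthy: set) -> str:
    """Return the healthy cluster with the highest announced priority."""
    candidates = [(prio, name) for name, prio in CLUSTERS if name in healthy]
    if not candidates:
        raise RuntimeError("no healthy cluster is announcing the VIP")
    return max(candidates)[1]
```

Because no sessions are synchronized between clusters, a cross-cluster failover drops in-flight connections and clients must reconnect; the trade-off keeps the inter-cluster design simple.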

Intra‑Cluster High Availability: ECMP, long/short‑mask route failover, and intra‑cluster session synchronization ensure that single or even multiple server failures do not affect service. ECMP spreads traffic evenly across cluster members; when a member fails, the ECMP routes converge and the remaining servers continue handling traffic, with synchronized sessions allowing existing connections to survive the re‑hash.
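A toy version of ECMP's per-flow hashing, using CRC32 as a stand-in for the router's vendor-specific hardware hash:

```python
# ECMP-style 5-tuple hashing across cluster members (illustrative hash only;
# real routers compute this in hardware with their own hash functions).
import zlib

def ecmp_pick(flow: tuple, members: list) -> str:
    """Deterministically map a flow 5-tuple to one cluster member."""
    key = "|".join(map(str, flow)).encode()
    return members[zlib.crc32(key) % len(members)]
```

Note that with plain modulo hashing, removing a failed member remaps a large fraction of flows to different servers; this is exactly why TVS synchronizes sessions inside the cluster, so a flow's new owner already holds the session and existing connections keep forwarding.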

High‑Performance Implementation:
1. Reduce context switches by isolating TVS cores from the Linux scheduler and pinning processing threads to specific CPUs.
2. Accelerate memory access with NUMA‑aware data placement, hugepages, and cache‑line‑aligned structures to minimize cache misses and cross‑socket (QPI) traffic.
3. Bypass the kernel using Linux UIO and DMA to map device memory into user space, with DPDK providing fast packet I/O.
4. Eliminate lock contention with per‑core session tables instead of a single globally locked table.
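The CPU-pinning part of item 1 can be illustrated with the Linux affinity API. This is a hedged sketch: TVS would do this via DPDK's EAL lcore pinning on cores reserved with the `isolcpus=` boot parameter, and the core id below is arbitrary:

```python
# Bind the calling process to a single CPU (Linux-only), mimicking how a
# packet-processing worker is pinned to an isolated core so the scheduler
# never migrates it and its cache state stays hot.
import os

def pin_to_core(core_id: int) -> None:
    """Restrict this process's scheduling to exactly one CPU."""
    os.sched_setaffinity(0, {core_id})  # pid 0 = the calling process
```

Combined with per-core session tables (item 4), a pinned worker touches only memory it owns, so it needs neither locks nor cross-core cache traffic on the fast path.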

Lock Competition Solution: Because RSS may split request and response flows across different cores, TVS uses the Intel 82599 NIC’s FDIR feature. Each core receives a dedicated LocalIP pool, and FDIR rules direct packets of the same session to the same core, ensuring consistent processing.
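The FDIR scheme amounts to an exact-match table from destination LocalIP to the RX queue of the owning core. A sketch with an illustrative pool layout (the real rules are programmed into the 82599's Flow Director hardware, not a Python dict):

```python
# Each core owns a disjoint LocalIP pool, so a Real Server's reply
# (addressed to a LocalIP) identifies the core that created the session.
CORE_LOCAL_IPS = {
    0: ["172.16.0.1", "172.16.0.2"],
    1: ["172.16.1.1", "172.16.1.2"],
}

# FDIR-style exact-match rules: destination LocalIP -> RX queue / core.
FDIR_RULES = {ip: core for core, ips in CORE_LOCAL_IPS.items() for ip in ips}

def steer_reply(dst_local_ip: str) -> int:
    """Return the core whose private session table holds this flow."""
    return FDIR_RULES[dst_local_ip]
```

Since requests are already distributed by RSS and replies are steered by FDIR to the core that chose the LocalIP, both directions of a session land on the same core and the per-core session tables never need a lock.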

Conclusion: TVS’s architecture combines network design and business requirements to provide a reliable, high‑performance load‑balancing platform. Future work includes adding more ALG protocol support, improving scheduling algorithms, reducing forwarding latency, and enhancing DDoS defense.

Reference Materials: 1. DPDK official website; 2. Alibaba LVS repository; 3. Intel 82599 10GbE controller datasheet.

Tags: High Availability, Load Balancing, Network Optimization, DPDK, Cluster Architecture, Kernel Bypass, FullNat
Written by

Tongcheng Travel Technology Center

Pursue excellence, start again with Tongcheng! More technical insights to help you along your journey and make development enjoyable.
