Backend Development · 30 min read

Performance Optimization of SSL/TLS in JD.com JDDLB Load Balancer Using Freescale Acceleration Cards

This article describes the architecture of JD.com’s JDDLB public‑traffic load balancer and details how offloading CPU‑intensive SSL/TLS cryptographic operations to Freescale C291 acceleration cards—via custom NGINX modules, OpenSSL Engine integration, and synchronous/asynchronous driver interfaces—significantly improves connection‑establishment rates and overall throughput.

JD Tech Talk

JDDLB is JD.com’s primary public‑traffic entry point, replacing commercial F5 devices and handling massive traffic during major sales events. Its core consists of a DPDK‑based four‑layer SLB, a customized NGINX for extended features, and a unified management platform.

The system offers high performance (millions of concurrent connections), high availability (ECMP, session synchronization, health checks), and extensibility (separate four‑layer and seven‑layer load‑balancing clusters, horizontal scaling, gray releases). It supports a variety of load‑balancing algorithms, SSL/TLS offload, traffic control (SYN‑flood protection, WAF, QoS, ACLs), and comprehensive monitoring.

To accelerate SSL/TLS, the article proposes offloading the CPU‑heavy asymmetric cryptographic operations of the handshake to Freescale acceleration cards. In tests, a single card raised the HTTPS new‑connection (handshake) rate by 130% while cutting CPU usage by 40%; two cards raised the rate by 320%.

Implementation details include:

- Custom PCI acceleration cards, each carrying 3–4 Freescale C291 processors.

- A kernel driver (fsl_pkc_crypto_offload_drv.ko) paired with a character‑device module (cryptodev.ko) that exposes /dev/crypto and supports open, close, and ioctl, plus event notification via select/poll/epoll.

- Synchronous mode: the user‑space process issues ioctl(fd, SYNC_CMD, ...) on /dev/crypto and blocks until the driver returns the result.

- Asynchronous mode: the process issues ioctl(fd, ASYNC_CMD, ...), receives an eventfd, registers it with NGINX's epoll loop, and continues processing; the driver signals completion through the eventfd.

- OpenSSL Engine integration, which lets NGINX transparently route RSA/ECDHE operations to the hardware accelerator.

- NGINX's event‑driven architecture (single‑threaded worker processes) pairs naturally with asynchronous offload: workers no longer stall on handshake crypto, so the new‑connection rate rises.
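On the NGINX side, an OpenSSL Engine is typically selected with the stock ssl_engine directive in the main context. A minimal sketch, assuming the card's engine is registered with OpenSSL under the hypothetical id "pkc" (the id and all paths below are placeholders, not values from the article):

```nginx
# "pkc" is a placeholder engine id; the real id comes from the
# vendor's OpenSSL Engine shared object.
ssl_engine pkc;

events {}

http {
    server {
        listen 443 ssl;
        server_name example.com;

        # Private-key operations for this certificate are routed through
        # the engine instead of being computed on the CPU.
        ssl_certificate     /etc/nginx/certs/example.pem;  # placeholder path
        ssl_certificate_key /etc/nginx/certs/example.key;  # placeholder path
    }
}
```

With only this much, offload is synchronous from the worker's point of view; the asynchronous completion path described above additionally requires async support in the engine and in the NGINX/OpenSSL integration.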

The article also explains SSL/TLS fundamentals—confidentiality, integrity, and identity verification—illustrating how asymmetric key operations dominate handshake latency and why hardware offload is effective.

Overall, the asynchronous offload model yields higher throughput and lower CPU utilization compared to the synchronous model, making it suitable for large‑scale public‑facing services.

Tags: backend, Load Balancing, nginx, OpenSSL, hardware acceleration, SSL/TLS
Written by JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.