
Asynchronous RSA Proxy Computation and Performance Optimization for SSL Handshake

The article presents a comprehensive engineering solution that separates RSA operations, employs parallel and asynchronous processing, modifies OpenSSL and Nginx event handling, and adds symmetric‑encryption optimizations to boost SSL/TLS handshake throughput by over threefold while reducing CPU load.

Tencent Architect

RSA Asynchronous Proxy Computation

The most CPU‑intensive part of HTTPS is the RSA calculation during key exchange, so the optimization focuses on separating this step, using parallel hardware acceleration, and making the computation asynchronous.

Algorithm Separation

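Only cipher suites that depend on RSA (ECDHE_RSA, DHE_RSA, RSA) are targeted. In ECDHE_RSA and DHE_RSA the server signs the ServerKeyExchange message with its RSA private key; in plain RSA key exchange it uses the same key to decrypt the premaster secret. With a 2048‑bit key, this private‑key operation dominates CPU usage.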
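The cost gap between the private‑key and public‑key operations can be illustrated with a small sketch. It is not the article's code: it uses textbook RSA without padding or CRT, and a toy key built from two Mersenne primes (an assumption made only to stay dependency‑free, and not secure). The names modexp_cost, sign and verify are hypothetical.

```python
# Toy RSA key from two known Mersenne primes. NOT secure; it only keeps
# the sketch dependency-free. The modulus is 1886 bits, in the same range
# as the 2048-bit keys discussed in the text.
P = 2**1279 - 1
Q = 2**607 - 1
N = P * Q
E = 65537                            # common public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def modexp_cost(exponent: int) -> int:
    """Count modular multiplications used by plain square-and-multiply."""
    squarings = exponent.bit_length() - 1
    multiplies = bin(exponent).count("1") - 1
    return squarings + multiplies


def sign(digest: int) -> int:
    """Private-key operation: the expensive step (huge exponent D)."""
    return pow(digest, D, N)


def verify(digest: int, signature: int) -> bool:
    """Public-key operation: cheap, because E has only 17 bits."""
    return pow(signature, E, N) == digest


digest = int.from_bytes(b"ServerKeyExchange params", "big")
signature = sign(digest)
print("signature verifies:", verify(digest, signature))
print("multiplications, public vs private:", modexp_cost(E), modexp_cost(D))
```

Production libraries narrow the gap somewhat with the Chinese Remainder Theorem, but the private‑key side still costs far more than the public‑key side, which is why the server is the bottleneck during a full handshake.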

Parallel Computing

Reduce the time of a single request by using higher‑frequency CPUs or dedicated accelerator cards.

Increase concurrent request handling by employing multiple CPUs or accelerator cards, achieving up to a 4× throughput gain.

Asynchronous Request Handling

Nginx normally blocks until OpenSSL finishes ServerKeyExchange or the premaster secret decryption. By offloading RSA_sign to another CPU or hardware and returning immediately, Nginx can continue processing other requests.

Nginx receives a request and calls RSA_sign.

RSA_sign invokes RSA_private_encrypt and returns without waiting for the signature result.

Nginx proceeds with other work.

RSA_private_encrypt performs the costly modular exponentiation using the private key.

The heavy computation runs on a separate CPU/accelerator, so the local CPU is not blocked.
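The flow above can be sketched in a few lines. This is only a model of the idea, not the article's C implementation: a thread pool stands in for the separate CPU or accelerator, the key is a toy (insecure) one built from Mersenne primes, and the names rsa_private_op and handshake are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Toy, insecure key; an assumption made only to keep the sketch self-contained.
P, Q = 2**1279 - 1, 2**607 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def rsa_private_op(digest: int) -> int:
    """Stand-in for the modular exponentiation done on the remote worker."""
    return pow(digest, D, N)


async def handshake(conn_id: int, pool: ThreadPoolExecutor, log: list) -> int:
    loop = asyncio.get_running_loop()
    log.append(f"conn {conn_id}: sign requested")
    # run_in_executor hands the job to the pool and returns a future at once,
    # so the event loop is free to start other handshakes.
    future = loop.run_in_executor(pool, rsa_private_op, conn_id + 2)
    signature = await future          # resumed only when the result is ready
    log.append(f"conn {conn_id}: signature ready")
    return signature


async def main(n_conns: int = 4):
    log: list = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        sigs = await asyncio.gather(
            *(handshake(i, pool, log) for i in range(n_conns))
        )
    return sigs, log


sigs, log = asyncio.run(main())
print("\n".join(log))
```

All four "sign requested" entries are logged before any signature comes back, which is the non‑blocking behaviour the steps describe. In CPython the big‑integer pow still holds the interpreter lock, so the thread pool models the control flow rather than a real speedup; the gain in the article comes from moving the work to other hardware.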

Engineering Implementation Challenges

Working with OpenSSL and Nginx core code requires deep knowledge of SSL/TLS (RFC 5246, 5280, 4492), PKI, ECC, and the massive, legacy OpenSSL codebase (over 500 k lines, inconsistent style, extensive macros).

OpenSSL Stack Refactoring

Because OpenSSL only supports synchronous RSA, the stack must be modified to allow asynchronous delegation. Two forks were evaluated first: BoringSSL was rejected because of its limited compatibility, and LibreSSL was rejected because its ECDHE performance is only about one quarter of OpenSSL's. The project ultimately stayed with OpenSSL.

Nginx Event Mechanism Refactoring

Nginx has 11 processing phases, but custom modules can hook into only 7 of them, and all of those run after the HTTP headers are parsed, so a module cannot intervene in the TLS handshake. Therefore, the core event code (ngx_ssl_handshake in src/event/ngx_event_openssl.c) was extended to invoke the asynchronous RSA proxy without altering existing logic.
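One way to picture a handshake handler that cooperates with an event loop is a re‑entrant function that returns "again" while the RSA result is pending and is called once more when the result arrives. The sketch below is an assumption about the general pattern, not the actual Nginx patch; AGAIN and OK are modelled on Nginx's NGX_AGAIN and NGX_OK return codes, and every other name is hypothetical.

```python
from collections import deque

AGAIN, OK = "AGAIN", "OK"   # modelled on Nginx's NGX_AGAIN / NGX_OK


class Connection:
    def __init__(self, conn_id: int):
        self.conn_id = conn_id
        self.rsa_pending = False   # job already sent to the proxy?
        self.rsa_result = None     # filled in when the proxy answers


def ssl_handshake(conn: Connection, submit_rsa) -> str:
    """Hypothetical re-entrant handshake handler."""
    if conn.rsa_result is not None:
        return OK                  # result arrived: handshake can complete
    if not conn.rsa_pending:
        submit_rsa(conn)           # hand the RSA step to the proxy
        conn.rsa_pending = True
    return AGAIN                   # give control back to the event loop


def run_event_loop(conns):
    proxy_queue = deque()          # stands in for the remote RSA proxy
    ready = deque(conns)           # connections with a pending event
    trace = []
    while ready or proxy_queue:
        while ready:
            conn = ready.popleft()
            trace.append((conn.conn_id, ssl_handshake(conn, proxy_queue.append)))
        if proxy_queue:
            # The "proxy" finishes one job per tick; the loop then posts an
            # event so the handler is invoked again for that connection.
            done = proxy_queue.popleft()
            done.rsa_result = f"sig-{done.conn_id}"
            ready.append(done)
    return trace


trace = run_event_loop([Connection(1), Connection(2)])
print(trace)
```

The trace shows both connections returning AGAIN before either completes, so one slow RSA operation never stalls the other connection.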

Performance Results

RSA asynchronous proxy raised Nginx ECDHE_RSA full‑handshake throughput from ~18 000 qps to ~65 000 qps (≈3.5× improvement).

Symmetric Encryption Optimizations

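For large payloads, symmetric encryption dominates CPU cost, and asynchronous offloading is unsuitable. Hardware acceleration is used instead: Intel AES‑NI instructions provide a ~20% speedup (4.3 W → 5.1 W, where W most likely abbreviates ten thousand, i.e. roughly 43,000 → 51,000). To measure the difference, AES‑NI can be switched off for a benchmark run through the OPENSSL_ia32cap environment variable, whose leading ~ clears the listed capability bits: OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp, followed by a cipher name (omitted in the source). Additionally, ChaCha20‑Poly1305 offers a >30% improvement on platforms without AES‑NI.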
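The mask and the speedup figure quoted above can be checked with a little arithmetic. In OpenSSL's capability vector the upper 32 bits mirror CPUID.1:ECX, so AES‑NI (ECX bit 25) sits at bit 57 and PCLMULQDQ (ECX bit 1) at bit 33; a leading ~ in OPENSSL_ia32cap clears the given bits. The helper name speedup_percent is hypothetical.

```python
# Bit positions in OpenSSL's ia32 capability vector (ECX bits are offset by 32).
AESNI_BIT = 57       # CPUID.1:ECX bit 25
PCLMULQDQ_BIT = 33   # CPUID.1:ECX bit 1

mask = (1 << AESNI_BIT) | (1 << PCLMULQDQ_BIT)
print(hex(mask))     # 0x200000200000000, the value used with "~" in the text


def speedup_percent(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


# The article's figures: 4.3 -> 5.1 (in units of ten thousand).
print(round(speedup_percent(4.3, 5.1), 1))   # 18.6, i.e. the "~20%" quoted
```

Clearing both bits matters for AES‑GCM in particular, since PCLMULQDQ accelerates the GHASH part of GCM while AES‑NI accelerates the block cipher itself.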

Session Resume Enhancements

Reducing full handshakes further improves performance. Two mechanisms are discussed:

Session cache: the server stores the session state under a session ID (32‑48 bytes) and reuses it when the client presents that ID on a later connection, saving one RTT.

Session ticket (RFC 5077): server issues an encrypted ticket, eliminating server‑side state; requires a shared key across distributed Nginx instances.

Both mechanisms are compared in a table of mechanisms, advantages, and drawbacks.
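The shared‑key requirement for session tickets can be made concrete with a small sketch. It is illustrative only: the ticket layout (nonce, ciphertext, MAC) is simplified, and a SHA‑256 counter keystream stands in for a real cipher such as AES so that the example needs only the standard library. The names issue_ticket and redeem_ticket are hypothetical.

```python
import hashlib
import hmac
import json
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter keystream; a stand-in for a real cipher (e.g. AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def issue_ticket(enc_key: bytes, mac_key: bytes, session: dict) -> bytes:
    """Encrypt and authenticate session state; the server keeps no copy."""
    plaintext = json.dumps(session).encode()
    nonce = os.urandom(16)
    ciphertext = _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def redeem_ticket(enc_key: bytes, mac_key: bytes, ticket: bytes):
    """Return the session state, or None if a full handshake is needed."""
    nonce, ciphertext, tag = ticket[:16], ticket[16:-32], ticket[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None   # tampered with, or issued under a different key
    plaintext = _xor(ciphertext, _keystream(enc_key, nonce, len(ciphertext)))
    return json.loads(plaintext)


# Servers A and B share the ticket keys; server C does not.
shared_enc, shared_mac = os.urandom(32), os.urandom(32)
other_enc, other_mac = os.urandom(32), os.urandom(32)

session = {"cipher": "ECDHE-RSA-AES128-GCM-SHA256", "master_secret": "ab" * 48}
ticket = issue_ticket(shared_enc, shared_mac, session)          # issued by A

resumed_on_b = redeem_ticket(shared_enc, shared_mac, ticket)    # accepted by B
resumed_on_c = redeem_ticket(other_enc, other_mac, ticket)      # rejected by C
print(resumed_on_b == session, resumed_on_c)
```

A server without the shared keys cannot validate the ticket and has to fall back to a full handshake, which is why the keys must be distributed to every Nginx instance behind the same service.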

Conclusions

Increase session‑resume ratio via distributed session cache and tickets to cut full handshakes.

Asynchronous RSA proxy boosts SSL handshake throughput ~3.5×, reducing hardware costs.

Adopt high‑performance symmetric ciphers (AES‑GCM, ChaCha20‑Poly1305) and enable AES‑NI.
