
Optimizing VUA HTTPS Forwarding Performance with Intel QuickAssist Technology (QAT)

This article shows how integrating Intel QuickAssist hardware and AVX‑512 software acceleration into the VUA component of vivo's load‑balancing platform, combined with asynchronous OpenSSL offloading, boosts HTTPS forwarding throughput to roughly 44,000 QPS with QAT cards and 51,000 QPS in software, while preserving scalability and security.

vivo Internet Technology

Author: vivo Internet Server Team - Qiu Xiangcun

This article explores how to improve the HTTPS forwarding performance of vivo Unified Access (VUA) by leveraging Intel QuickAssist Technology (QAT) for hardware acceleration. It introduces the QAT acceleration mechanism, evaluates its performance in different scenarios, and discusses practical optimization strategies to achieve the best forwarding throughput.

VLB (vivo load balance) serves as the IDC traffic entry for vivo's internet services, handling massive public‑facing traffic. The focus of this article is the seven‑layer load‑balancing component VUA and its HTTPS performance tuning.

1. Overall VLB Architecture

The core consists of a DPDK‑based four‑layer load balancer (VGW), a seven‑layer load balancer (VUA) built on Apache APISIX and NGINX extensions, and a unified management platform. Key features include high performance (millions of concurrent connections), high availability (ECMP, health checks), scalability, four‑ and seven‑layer load‑balancing capabilities, SSL/TLS offloading, traffic control, and comprehensive monitoring.

2. VUA Seven‑Layer Load Balancer

VUA (vivo Unified Access) is a second‑generation development based on APISIX‑2.4. It provides dynamic upstream, routing, certificate management, traffic gray‑release, black‑white lists, scheduling, and log tracing. The architecture incorporates Apache APISIX, a Go‑based manager‑api, an APISIX‑Ingress‑Controller for Kubernetes, and Etcd for configuration storage.

3. QAT Acceleration Technology

Intel QuickAssist (QAT) provides both a hardware engine and a vectorized‑instruction software engine for OpenSSL. It originated with Intel® Xeon® Scalable processors. QAT integrates with the OpenSSL engine (qatengine.so) to offload cryptographic operations.
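As an illustrative sketch of how an OpenSSL engine such as qatengine.so is wired in (section names and the algorithm list here are assumptions, not taken from the article), the OpenSSL configuration file might look roughly like:

```ini
# openssl.cnf — illustrative engine setup; "qatengine" is the engine id
# used by Intel's QAT_Engine project.
openssl_conf = openssl_init

[openssl_init]
engines = engine_section

[engine_section]
qat = qat_section

[qat_section]
engine_id = qatengine
# dynamic_path = /usr/lib64/engines-1.1/qatengine.so  (if not on the default path)
default_algorithms = ALL
init = 1
```

With this in place, applications that honor OpenSSL's configuration (including NGINX) route supported cryptographic operations through the engine without code changes.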

3.1 Asynchronous Architecture

VUA extends NGINX’s native asynchronous framework to handle QAT events. The workflow includes:

1. NGINX calls SSL_do_handshake, starting an async job.

2. RSA/ECDH encryption/decryption is delegated to the QAT engine.

3. The QAT driver sends the ciphertext to the accelerator and registers an async file descriptor.

4. qat_pause_job saves the job's stack and returns WANT_ASYNC to NGINX.

5. NGINX adds the async fd to epoll and continues processing other I/O.

6. When the accelerator finishes, qat_wake_job marks the fd readable.

7. NGINX resumes the job, restoring the saved context.

3.2 QAT Component Overview

The QAT stack consists of:

Application: patches for the async framework and the QAT OpenSSL engine.

SAL (Service Access Layer): abstracts crypto and compression services, providing instance creation, queue initialization, and request handling.

ADF (Acceleration Driver Framework): kernel drivers such as intel_qat.ko, 8950pci, and USDM memory management.

3.3 QAT_HW vs QAT_SW

QAT_HW uses a physical QAT accelerator card via qatengine.so. QAT_SW uses software acceleration based on Intel AVX‑512 (IFMA) through the crypto_mb and ipsec_mb libraries, supporting RSA (2048/3072/4096) and AES‑GCM. When both are available, hardware handles the asymmetric operations and software handles symmetric GCM; otherwise, QAT_SW handles all supported algorithms.
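As a rough sketch of how the engine is selected on the NGINX side (the ssl_asynch directive comes from Intel's async-mode NGINX patch set, and the paths are placeholders, not values from the article), the configuration might look like:

```nginx
# Illustrative: load the QAT engine globally, then enable async SSL per server.
ssl_engine qatengine;            # core NGINX directive selecting an OpenSSL engine

http {
    server {
        listen 443 ssl;
        ssl_certificate     /path/to/cert.pem;
        ssl_certificate_key /path/to/key.pem;
        ssl_asynch on;           # from Intel's async-mode NGINX patches
    }
}
```

Whether the engine then dispatches to the accelerator card (QAT_HW) or the AVX‑512 path (QAT_SW) is decided by the engine itself, following the precedence described above.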

4. Performance Evaluation

4.1 QAT_HW

Test platform: Intel 8970 accelerator card, RSA certificates for HTTPS.

Method: Deploy VUA with QAT engine, generate traffic until CPU reaches 100%, then compare QPS before and after optimization.

Results:

RSA QPS increased by 1.27× per worker.

With 56 workers, peak throughput reached ~44,000 QPS, limited by the accelerator card.

Key contributors to the gain:

User‑space driver enabling zero‑copy between kernel and user memory.

Asynchronous OpenSSL calls in VUA.

Support for multiple accelerator cards.

4.2 QAT_SW

Test platform: Intel Ice Lake 6330 (AVX‑512), RSA certificates.

Method: Same as above, but using the instruction‑set‑optimized software path.

Results:

Per‑worker RSA QPS roughly doubled (an increase of ~1×).

With 56 workers, peak throughput reached ~51,000 QPS, showing linear scaling.

Performance gain stems from AVX‑512‑based cryptographic kernels.

5. Conclusion

vivo VLB now supports both hardware (Intel QAT) and software (AVX‑512) acceleration, achieving substantial HTTPS forwarding throughput improvements while maintaining a secure, controllable gateway architecture. Future work will continue to expand the access‑layer capabilities, including QUIC support and MQTT integration for IoT scenarios.
