Developing a Business Gateway: Protocol Conversion, Performance, and Observability
The article describes the end‑to‑end development of a Node‑based business gateway for QQ Channel services: converting legacy RPC protocols to a lightweight tRPC‑based solution, achieving a ten‑fold latency reduction and higher QPS, improving security and availability, and adding observability through logging, metrics, and a WebSocket push layer.
1. Background and Motivation
Rapid traffic growth and functional expansion required a more efficient protocol conversion layer. Existing client‑side JSAPI and backend HTTPSSO solutions were abandoned in favor of a Node‑based front‑end approach that could serve both Mini‑Programs and H5.
2. Protocol Conversion Design
The tRPC request consists of a frame header and a protobuf‑defined packet header. The key fields include version, call_type, request_id, timeout, caller, callee, func, message_type, trans_info, content_type, and content_encoding.
message RequestProtocol {
  uint32 version = 1;
  uint32 call_type = 2;
  uint32 request_id = 3;
  uint32 timeout = 4;
  bytes caller = 5;
  bytes callee = 6;
  bytes func = 7;
  uint32 message_type = 8;
  map<string, bytes> trans_info = 9;
  uint32 content_type = 10;
  uint32 content_encoding = 11;
}
By mapping HTTP paths to callee and func, and serializing the body with JSON.stringify, the gateway can forward requests without requiring proto files.
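As a concrete illustration, here is a minimal sketch of that mapping, assuming a `/trpc/<callee>/<func>` path convention; the article does not spell out the gateway's actual routing rules, and the caller name, timeout, and enum values below are placeholders:

```javascript
// Derive tRPC routing fields from an HTTP request without loading .proto files.
// Assumes paths of the form /trpc/<callee>/<func>; all constants are illustrative.
function httpToTrpcMeta(req) {
  // e.g. req.url === "/trpc/demo.Greeter/SayHello"
  const [, , callee, func] = req.url.split("/");
  return {
    version: 1,
    call_type: 0,                                // unary call
    request_id: Date.now() % 0xffffffff,         // fits uint32
    timeout: 5000,                               // ms, placeholder
    caller: Buffer.from("trpc.gateway.node"),    // placeholder caller name
    callee: Buffer.from(callee),
    func: Buffer.from(`/${callee}/${func}`),
    content_type: 2,                             // JSON payload instead of protobuf
    body: Buffer.from(JSON.stringify(req.body)), // no proto files needed
  };
}
```

Because the body stays JSON, adding a new backend interface only requires a path mapping, not a schema rollout.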
3. Performance Improvements
Benchmarking the APISIX‑based gateway against the legacy NGW/TSW stack showed a ten‑fold latency reduction: single‑core QPS rose from 2K to 20K, and the success rate improved from 99.985% to 99.997%.
import { sleep, check } from "pts";
import HTTP from "pts/HTTP";

export const options = {};

export default function main() {
  const response = HTTP.get("https://xx.com/apisix/");
  check("status equals 200", () => response.statusCode.toString() === "200");
  sleep(1);
}
Similar scripts were used for the NGW/TSW comparisons.
4. Security and Availability
During a traffic surge, issues such as link‑code corruption, high failure rates, and pod overload were observed. Root‑cause analysis revealed missing connection‑pool limits and cache mis‑management in the Polaris gRPC client. The fix involved adding connection‑pool caching, proper eviction, and separating clusters for different services.
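The fix can be illustrated with a small sketch: a bounded pool of clients keyed by target cluster, with least‑recently‑used eviction so the pool cannot grow without limit. The class, its parameters, and the `close()` hook are illustrative assumptions, not the actual Polaris SDK API:

```javascript
// Bounded client pool keyed by cluster, illustrating the fix described above:
// pool limits, eviction, and per-service cluster separation. Names are assumptions.
class ClientPool {
  constructor(maxSize = 64) {
    this.maxSize = maxSize;
    this.clients = new Map(); // cluster key -> client; insertion order doubles as LRU order
  }
  get(cluster, createClient) {
    if (this.clients.has(cluster)) {
      const client = this.clients.get(cluster);
      this.clients.delete(cluster);   // re-insert to refresh recency
      this.clients.set(cluster, client);
      return client;
    }
    if (this.clients.size >= this.maxSize) {
      // Evict the least-recently-used client instead of growing without bound.
      const [oldestKey, oldest] = this.clients.entries().next().value;
      this.clients.delete(oldestKey);
      if (typeof oldest.close === "function") oldest.close();
    }
    const client = createClient(cluster);
    this.clients.set(cluster, client);
    return client;
  }
}
```

Keying the pool by cluster also enforces the separation between services: one overloaded backend can no longer exhaust connections shared with unrelated callees.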
5. Observability Enhancements
To detect hard‑to‑reproduce bugs, comprehensive logging, real‑time debugging (Node --inspect), and metric collection (CPU, memory, latency) were introduced. A WebSocket layer using socket.io was added to provide reliable downstream push capabilities with message caching, ACK, and retransmission mechanisms.
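A sketch of the cache‑ACK‑retransmit bookkeeping described above, with the transport (for example a socket.io emit) injected as a plain function; the class name, sequence scheme, and retry interval are assumptions rather than the gateway's actual implementation:

```javascript
// Reliable downstream push: cache each message until the client ACKs it,
// retransmitting on a timer. `send` is the injected transport (e.g. socket.emit).
class ReliableChannel {
  constructor(send, retryMs = 3000) {
    this.send = send;
    this.retryMs = retryMs;
    this.pending = new Map(); // seq -> { msg, timer }
    this.nextSeq = 1;
  }
  push(msg) {
    const seq = this.nextSeq++;
    const timer = setInterval(() => this.send({ seq, msg }), this.retryMs);
    this.pending.set(seq, { msg, timer });
    this.send({ seq, msg }); // first attempt
    return seq;
  }
  ack(seq) {
    const entry = this.pending.get(seq);
    if (!entry) return false; // unknown or already acknowledged
    clearInterval(entry.timer);
    this.pending.delete(seq);
    return true;
  }
}
```

With socket.io specifically, the client‑side acknowledgement callback on `emit` can drive `ack(seq)` directly.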
6. Tooling and Automation
CI pipelines, Git‑managed configuration, and enterprise WeChat bots were employed to automate whitelist updates, OIDB integration, and traffic shading. This reduced manual configuration time from minutes to seconds.
7. Conclusion
The gateway evolution demonstrates that incremental, pragmatic engineering—prioritising stability, performance, and developer experience—can transform a complex protocol conversion into a robust, observable, and scalable backend service.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.