
Remote Aware Load Balance (RALB) Algorithm for Search Recommendation System: Design, Implementation, and Performance Evaluation

This article presents the design and evaluation of the Remote Aware Load Balance (RALB) algorithm applied to JD’s search‑recommendation architecture, describing its CPU‑centric load‑balancing principles, implementation details, functional verification, throughput and boundary testing, and the observed improvements in CPU utilization and overall system performance.

JD Retail Technology

Background: JD's search recommendation services throttle adaptively on CPU, but clients distribute calls by round‑robin (RR) without considering each server's performance, which leaves CPU usage unbalanced across servers.

RALB Overview: RALB (Remote Aware Load Balance) targets CPU balance by adjusting traffic weights based on real‑time server CPU usage reported via RPC.

Algorithm Goals: Equalize server‑side CPU usage, exploiting the linear relation between QPS and CPU to control each server's load.
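One way to read the linearity goal is as a proportional rescaling of traffic. The derivation below is a sketch under assumptions: the symbols c_i (CPU usage of server i), q_i (its QPS), k_i (its per-request CPU cost) and \bar{c} (the cluster-average CPU, used here as the target) are introduced for illustration and do not appear in the original article.

```latex
c_i \approx k_i \, q_i
\quad\Longrightarrow\quad
q_i^{\mathrm{target}} = \frac{\bar{c}}{k_i} = q_i \cdot \frac{\bar{c}}{c_i},
\qquad
\bar{c} = \frac{1}{n} \sum_{j=1}^{n} c_j
```

Under this reading, a server running at 60% CPU in a cluster averaging 50% should carry about 50/60 ≈ 0.83 of its current QPS, while a server at 40% can take about 50/40 = 1.25 of its current QPS.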

Algorithm Steps: 1) Distribute traffic by weighted random (wr). 2) Collect per‑server CPU every second via RPC. 3) Every 3 s recompute weights to balance CPU.
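Step 1 can be sketched with the standard library alone. This is a minimal illustration of weighted random selection, not the article's client code: the function name `pick_server`, the addresses, and the weights are hypothetical.

```python
import random
from collections import Counter


def pick_server(weights, rng=random):
    """Pick one server with probability proportional to its weight."""
    # Servers whose weight is 0 never receive traffic.
    servers = [s for s, w in weights.items() if w > 0]
    if not servers:
        raise ValueError("no server with positive weight")
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


# Hypothetical addresses and weights, for illustration only.
weights = {"10.0.0.1": 10000, "10.0.0.2": 5000, "10.0.0.3": 0}
rng = random.Random(42)  # seeded so the run is repeatable
counts = Counter(pick_server(weights, rng) for _ in range(30000))
```

Over many calls the first server should receive roughly twice the traffic of the second, and the third none.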

Metric Dependencies: The algorithm takes as inputs the server IP list, real‑time health, historical health, a dynamic CPU target, and the current weight (tabulated in the original article).

Weight Adjustment: Each server's weight starts at 10000 and is updated periodically against the cluster's average CPU. A scaling factor (default 0.5) is applied to each adjustment, and the change per update is capped to avoid extreme shifts.
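The update described above might look like the following sketch. Only the initial weight (10000) and the default scaling factor (0.5) come from the article. The exact update formula, the 20% cap `MAX_STEP`, the floor `MIN_WEIGHT`, and the function name `recompute_weights` are assumptions chosen for illustration.

```python
INITIAL_WEIGHT = 10000  # starting weight stated in the article
SCALING_FACTOR = 0.5    # default scaling factor stated in the article
MAX_STEP = 0.2          # assumption: cap each update at +/-20% of the current weight
MIN_WEIGHT = 1          # assumption: keep weights positive during normal balancing


def recompute_weights(weights, cpu, factor=SCALING_FACTOR, max_step=MAX_STEP):
    """Return new weights that nudge each server toward the cluster-average CPU."""
    samples = [c for c in cpu.values() if c > 0]
    if not samples:
        return dict(weights)  # nothing to balance against
    average = sum(samples) / len(samples)

    updated = {}
    for server, weight in weights.items():
        usage = cpu.get(server, 0.0)
        if usage <= 0:
            updated[server] = weight  # no usable sample: leave the weight alone
            continue
        # Relative distance from the average, scaled down and then capped.
        step = factor * (average - usage) / usage
        step = max(-max_step, min(max_step, step))
        updated[server] = max(MIN_WEIGHT, round(weight * (1 + step)))
    return updated


# Hypothetical servers: "a" is hotter than average, "b" is cooler, "c" is on target.
start = {name: INITIAL_WEIGHT for name in ("a", "b", "c")}
after = recompute_weights(start, {"a": 0.60, "b": 0.40, "c": 0.50})
```

In this sketch the hot server loses weight, the cool server gains weight, and the on-target server is unchanged. The scaling factor moves each server only part of the way per round, which trades convergence speed for stability.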

Boundary Handling: The algorithm handles two edge cases: a server that receives no traffic (its CPU is reported as 0), and network failures (the server's weight is set to 0 until it recovers).
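The two boundary cases could be separated from the balancing step roughly as follows. The article states only that a failed server's weight is set to 0 until recovery. Two choices here are assumptions: a recovered server restarts from the initial weight, and a server reporting 0 CPU keeps its weight and is left out of that round's balancing. The names `apply_boundaries` and `reachable` are hypothetical.

```python
INITIAL_WEIGHT = 10000  # starting weight stated in the article


def apply_boundaries(weights, cpu, reachable):
    """Return (adjusted_weights, balanceable_servers) for one update round."""
    adjusted = {}
    balanceable = []
    for server, weight in weights.items():
        if not reachable.get(server, False):
            # Network failure: send no traffic until the server recovers.
            adjusted[server] = 0
        elif weight == 0:
            # Assumption: a recovered server restarts from the initial weight.
            adjusted[server] = INITIAL_WEIGHT
        else:
            adjusted[server] = weight
            # Assumption: a CPU reading of 0 means "no traffic yet", not "idle",
            # so the server keeps its weight and is skipped when rebalancing.
            if cpu.get(server, 0.0) > 0.0:
                balanceable.append(server)
    return adjusted, balanceable


# Hypothetical round: "b" has seen no traffic, "c" just recovered, "d" is unreachable.
weights = {"a": 10000, "b": 8000, "c": 0, "d": 9000}
cpu = {"a": 0.5, "b": 0.0, "c": 0.0, "d": 0.4}
reachable = {"a": True, "b": True, "c": True, "d": False}
adjusted, balanceable = apply_boundaries(weights, cpu, reachable)
```

Skipping the zero-CPU server avoids treating a missing sample as a very lightly loaded machine, which would otherwise push its weight up sharply.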

Functional Verification: RALB was deployed in the search‑recommendation cluster. After rollout, the QPS distribution became layered and CPU usage converged across servers.

Throughput Tests: RALB was compared with RR under three conditions: no throttling, partial throttling, and full throttling. The results show that RALB maintains CPU balance and achieves up to 7% higher throughput at the critical transition point.

Test Data: Includes tables of QPS, CPU, TP99 for both algorithms and a Python script used to plot throughput curves:

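import matplotlib.pyplot as plt

# First series: x-coordinates and the matching throughput values.
x = [0,1,2,3,4,5,6,7,8,9,9.73,10.958,11.52,17.15,22.7]
y = [0,1,2,3,4,5,6,7,8,9,9.73,10.61,10.49,10.10,9.82]

# Second series, sampled at different x-coordinates.
w = [0,1,2,3,4,5,6,7,8,9.674,10.823,11.496,11.723,12.639,13.141,17.15,22.7]
z = [0,1,2,3,4,5,6,7,8,9.27,9.91,10.24,10.36,10.48,10.47,10.10,9.82]

plt.plot(x, y, 'r-o')  # first series: red line with circle markers
plt.plot(w, z, 'g-o')  # second series: green line with circle markers
plt.show()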
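Because the two series are sampled at different x-coordinates, comparing them at the same load needs interpolation. The sketch below does this with a small linear interpolation helper. The helper `interpolate` and the comparison point `load = 10.958` (a measured point of the first series, near where the curves bend) are choices made for illustration; the script itself does not say which series belongs to which algorithm, so none is assumed here.

```python
from bisect import bisect_right


def interpolate(xs, ys, x):
    """Linearly interpolate y at x; xs must be sorted in ascending order."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the measured range")
    i = bisect_right(xs, x)
    if i == len(xs):
        return ys[-1]  # x is exactly the last measured point
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Same data as the plotting script.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9.73, 10.958, 11.52, 17.15, 22.7]
y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9.73, 10.61, 10.49, 10.10, 9.82]
w = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9.674, 10.823, 11.496, 11.723,
     12.639, 13.141, 17.15, 22.7]
z = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9.27, 9.91, 10.24, 10.36,
     10.48, 10.47, 10.10, 9.82]

load = 10.958                       # assumption: comparison point near the bend
first = interpolate(x, y, load)     # value of the red series at this load
second = interpolate(w, z, load)    # value of the green series at this load
relative_gap = (first - second) / second
print(f"first={first:.2f} second={second:.2f} gap={relative_gap:.1%}")
```

At this load the first series sits roughly 6% above the second, which is the kind of gap the throughput comparison refers to. The two series coincide again at the two highest loads.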

Conclusions: RALB effectively eliminates the CPU short‑board (weakest‑link) effect, keeps latency stable, and improves overall cluster throughput, especially around the transition from unthrottled to fully throttled operation.

Deployment: After full rollout, server‑side QPS and CPU distributions became more uniform, confirming the algorithm’s production readiness.

Tags: distributed systems, algorithm, load balancing, performance testing, CPU utilization