Dynamic Load Balancing Algorithms in the TARS Microservice Framework

The Vivo Internet Server Team extended TARS with a dynamic load‑balancing algorithm that recalculates each node’s weight every minute using metrics such as 5‑minute average response time, timeout and exception rates, CPU, memory and network load, automatically adapting traffic distribution beyond the built‑in round‑robin, weighted round‑robin and consistent‑hash methods.

vivo Internet Technology

Background

Vivo's internet services are built on the TARS microservice framework. TARS already provides three built‑in load‑balancing algorithms (round‑robin, weighted round‑robin, and consistent hash); this article focuses on a custom dynamic load‑balancing algorithm developed by the Vivo Internet Server Team.

What is Load Balancing?

Load balancing distributes traffic across multiple computers, network links, CPUs, disks, or other resources to optimize resource usage, maximize throughput, minimize response time, and avoid overload. It is a core technique for handling high concurrency and high availability in Internet architectures.

TARS Supported Load‑Balancing Algorithms

TARS provides three built‑in algorithms:

Round‑robin

Weighted round‑robin

Consistent hash

The entry point is selectAdapterProxy in EndpointManager.cpp.

3.1 Round‑Robin

The algorithm cycles through the list of available IPs, assigning each incoming request to the next server in order. It works well when all nodes have similar capacity.
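The cycling behaviour can be sketched in a few lines of C++. This is a minimal illustration, not the code in EndpointManager.cpp; the class name RoundRobinSelector and the sample IPs in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical selector for illustration; not the TARS EndpointManager API.
class RoundRobinSelector {
public:
    explicit RoundRobinSelector(std::vector<std::string> nodes)
        : nodes_(std::move(nodes)) {
        assert(!nodes_.empty());  // assumption: at least one available node
    }

    // Return the next node in list order, wrapping around at the end.
    const std::string &next() {
        const std::string &node = nodes_[index_];
        index_ = (index_ + 1) % nodes_.size();
        return node;
    }

private:
    std::vector<std::string> nodes_;
    std::size_t index_ = 0;
};
```

With the nodes 10.0.0.1, 10.0.0.2 and 10.0.0.3, four consecutive calls to next() return the three addresses in order and then 10.0.0.1 again.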

3.2 Weighted Round‑Robin

Each node is assigned a weight that reflects the proportion of traffic it should receive. For example, with five nodes weighted 4, 1, 1, 1, 3, a total of 100 requests would be distributed as 40, 10, 10, 10, 30. The implementation must be smooth, i.e., the distribution should not send the first four requests all to the highest‑weight node.
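One common way to obtain a smooth distribution is the scheme popularized by nginx: every node keeps a running "current" value that grows by its weight on each pick, the node with the largest current value wins, and the winner is then reduced by the total weight. The sketch below assumes that scheme; the source does not say whether TARS uses this exact variant, and the class name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical smooth weighted round-robin selector (nginx-style scheme).
class SmoothWeightedRoundRobin {
public:
    explicit SmoothWeightedRoundRobin(const std::vector<int> &weights)
        : weights_(weights), current_(weights.size(), 0) {
        for (int w : weights_) total_ += w;
        assert(total_ > 0);  // assumption: at least one positive weight
    }

    // Return the index of the chosen node.
    std::size_t next() {
        std::size_t best = 0;
        for (std::size_t i = 0; i < weights_.size(); ++i) {
            current_[i] += weights_[i];                  // every node gains its weight
            if (current_[i] > current_[best]) best = i;  // ties go to the lower index
        }
        current_[best] -= total_;  // the winner pays back the total weight
        return best;
    }

private:
    std::vector<int> weights_;
    std::vector<int> current_;
    int total_ = 0;
};
```

With the weights 4, 1, 1, 1, 3 from the example, the first four picks are nodes 0, 4, 1, 0 rather than node 0 four times, and every run of 10 picks still contains each node exactly as many times as its weight.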

3.3 Consistent Hash

Consistent hashing routes requests that carry the same key to the same node as long as the set of nodes is unchanged, and remaps only a small share of keys when a node is added or removed. This is useful for cache‑centric services. TARS implements both an MD5‑based hash with XOR offset and a Ketama hash.
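The general idea can be sketched as a hash ring with virtual nodes. This is not the TARS implementation: it uses std::hash in place of MD5 or Ketama (so placements differ between standard libraries), and the class name, method names and the "#i" virtual-node naming are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical hash ring; std::hash stands in for the MD5/Ketama hashes used by TARS.
class HashRing {
public:
    explicit HashRing(int virtualNodes = 100) : virtualNodes_(virtualNodes) {
        assert(virtualNodes_ > 0);
    }

    // Place several virtual points per node on the ring to even out the key ranges.
    void addNode(const std::string &node) {
        for (int i = 0; i < virtualNodes_; ++i)
            ring_.emplace(hash_(node + "#" + std::to_string(i)), node);
    }

    // Remove every ring point owned by the node.
    void removeNode(const std::string &node) {
        for (auto it = ring_.begin(); it != ring_.end();) {
            if (it->second == node) it = ring_.erase(it);
            else ++it;
        }
    }

    // Return the owner of the first ring point at or after the key's hash,
    // wrapping to the start of the ring; returns "" when the ring is empty.
    std::string locate(const std::string &key) const {
        if (ring_.empty()) return "";
        auto it = ring_.lower_bound(hash_(key));
        if (it == ring_.end()) it = ring_.begin();
        return it->second;
    }

private:
    int virtualNodes_;
    std::hash<std::string> hash_;
    std::map<std::size_t, std::string> ring_;
};
```

Removing a node only reassigns the keys that node owned; keys that mapped to the other nodes keep their placement, which is what keeps cache hit rates stable when membership changes.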

Why Dynamic Load Balancing?

In mixed‑deployment environments (multiple services on the same VM), a faulty service can consume excessive CPU or memory, affecting co‑located services. Static algorithms cannot react quickly to such anomalies, and manual weight adjustments are slow and error‑prone. A dynamic algorithm automatically adjusts node weights based on real‑time metrics, reducing operational overhead.

Dynamic Load‑Balancing Strategy

The strategy calculates a weight for each node from several load factors: 5‑minute average response time, 5‑minute timeout rate, 5‑minute exception rate, CPU load, memory usage, and network load. The set of factors is extensible; the current implementation uses only the first two.

The overall weight is a weighted sum of the individual factor scores. For example, if response‑time weight = 10 (40% importance) and timeout‑rate weight = 20 (60% importance), the total is 10 × 0.4 + 20 × 0.6 = 16.
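The arithmetic above can be written as a small helper. This is a sketch under stated assumptions: the struct and function names are hypothetical, proportions are expressed as integer percentages that sum to 100, and each term is divided separately with integer arithmetic, as the reference implementation does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration; not part of TARS.
struct FactorScore {
    int weight;             // score computed for one load factor
    int proportionPercent;  // importance of that factor, in percent
};

// Weighted sum of the factor scores; each term is truncated to an integer.
int combineWeights(const std::vector<FactorScore> &factors) {
    int total = 0;
    int percentSum = 0;
    for (const auto &f : factors) {
        total += f.weight * f.proportionPercent / 100;
        percentSum += f.proportionPercent;
    }
    assert(percentSum == 100);  // assumption: proportions cover the whole weight
    return total;
}
```

Calling combineWeights with the scores {10, 40} and {20, 60} reproduces the result of 16 from the example. Because each term is truncated separately, small scores can lose a fractional part that a single final division would have kept.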

Weight Calculation Details

1. Time‑consuming (response‑time) weight, which decreases as a node's average response time grows; totalTime is the sum of the nodes' average response times:

weight = initialWeight * (totalTime - nodeAvgTime) / totalTime

2. Timeout‑rate weight (penalize high timeout rates):

weight = initialWeight - timeoutRate * initialWeight * 0.9
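The two formulas can be tried out with the helpers below. They follow the formulas as written in the text, using integer arithmetic; the function names, the kMinWeight floor of 1 and the handling of empty samples are assumptions for illustration and are not taken from TARS.

```cpp
#include <cassert>

// Hypothetical floor so that a slow node still receives some traffic.
constexpr int kMinWeight = 1;

// weight = initialWeight * (totalTime - nodeAvgTime) / totalTime
int responseTimeWeight(int initialWeight, int totalTime, int nodeAvgTime) {
    assert(initialWeight > 0);
    if (totalTime <= 0) return kMinWeight;  // no timing data collected
    int w = initialWeight * (totalTime - nodeAvgTime) / totalTime;
    return w > 0 ? w : kMinWeight;
}

// weight = initialWeight - timeoutRate * initialWeight * 0.9,
// with timeoutRate = timeoutCount / (timeoutCount + succCount)
int timeoutRateWeight(int initialWeight, int timeoutCount, int succCount) {
    assert(initialWeight > 0);
    int total = timeoutCount + succCount;
    if (total == 0) return initialWeight;  // no calls observed, keep the default
    return initialWeight - timeoutCount * initialWeight * 9 / (10 * total);
}
```

For an initial weight of 100 and three nodes whose average times are 10, 20 and 70 ms (total 100 ms), the response‑time weights come out as 90, 80 and 30. A node with 10 timeouts out of 100 calls gets a timeout‑rate weight of 91.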

Reference Implementation (C++)

void LoadBalanceThread::calculateWeight(LoadCache &loadCache)
{
    for (auto &loadPair : loadCache)
    {
        std::ostringstream log;
        const auto ITEM_SIZE(static_cast<int>(loadPair.second.vtBalanceItem.size()));
        int aveTime(loadPair.second.aveTimeSum / ITEM_SIZE);
        log << "aveTime: " << aveTime << "|"
            << "vtBalanceItem size: " << ITEM_SIZE << "|";
        for (auto &loadInfo : loadPair.second.vtBalanceItem)
        {
            // Time‑consuming weight (inverse proportion)
            TLOGDEBUG("loadPair.second.aveTimeSum: " << loadPair.second.aveTimeSum << std::endl);
            int aveTimeWeight(loadPair.second.aveTimeSum ? (DEFAULT_WEIGHT * ITEM_SIZE * (loadPair.second.aveTimeSum - loadInfo.aveTime) / loadPair.second.aveTimeSum) : 0);
            aveTimeWeight = aveTimeWeight <= 0 ? MIN_WEIGHT : aveTimeWeight;
            // Timeout‑rate weight
            int timeoutRateWeight(loadInfo.succCount ? (DEFAULT_WEIGHT - static_cast<int>(loadInfo.timeoutCount * TIMEOUT_WEIGHT_FACTOR / (loadInfo.succCount + loadInfo.timeoutCount))) : (loadInfo.timeoutCount ? MIN_WEIGHT : DEFAULT_WEIGHT));
            // Combine weights
            loadInfo.weight = aveTimeWeight * getProportion(TIME_CONSUMING_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT
                              + timeoutRateWeight * getProportion(TIMEOUT_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT ;
            log << "aveTimeWeight: " << aveTimeWeight << ", "
                << "timeoutRateWeight: " << timeoutRateWeight << ", "
                << "loadInfo.weight: " << loadInfo.weight << "; ";
        }
        TLOGDEBUG(log.str() << "|" << std::endl);
    }
}

The core class is LoadBalanceThread, which RegistryServer invokes to update the weights every 60 seconds.

Usage

Enable dynamic load balancing by adding -w -v parameters to the Servant configuration. All nodes must enable the feature; otherwise the framework falls back to round‑robin.

Applicable Scenarios

Dynamic balancing is useful for mixed‑deployment services on VMs where one service may affect others. It is less needed for containerized services that already have orchestration‑level scaling.

Future Plans

Currently only average response time and timeout rate are used. Future work includes adding CPU usage, memory usage, and response‑code‑based weight adjustments.
