
Implementation and Analysis of MongoDB Nearest Mode for Multi-Data Center Deployment

This article explains how MongoDB's nearest mode achieves proximity‑aware reads across multiple data centers by analyzing the internal mongos and driver code, detailing latency collection, smoothing algorithms, node selection logic, and providing configuration recommendations for latency‑sensitive workloads.

Tencent Database Technology

1. Background Introduction

To ensure service availability and data reliability, critical services deploy storage systems across multiple regions and data centers, for example Beijing, Shanghai, and Shenzhen, each storing a data replica so that a failure in one region does not affect business.

When deploying across data centers, network latency must be considered; for example, the ping between Shanghai and Shenzhen is about 30 ms, while intra‑data‑center latency is around 0.1 ms.

Tencent Cloud MongoDB combines L5 proximity access with an internal "nearest" mode so that requests are served by a nearby node and avoid the cross-data-center latency penalty. The architecture uses mongos as a proxy and mongod as storage nodes; the mongod nodes form a primary-secondary replica set distributed across data centers.

2. What is the nearest access mode

2.1 Replica Set Concept

In MongoDB, a replica set is a group of nodes that store identical data. Clients can access the replica set directly through the driver or through mongos.

Replica sets elect a primary with a Raft-based election protocol and replicate data through the oplog.

2.2 Read‑Write Splitting and readPreference

By default, MongoDB sends both reads and writes to the primary; readPreference lets clients route reads to other nodes. Five readPreference modes are available: primary, primaryPreferred, secondary, secondaryPreferred, and nearest.

2.3 Read‑Write Consistency Guarantee

To ensure that reads from secondaries see the latest writes, set the write concern so that a write is acknowledged only after it has been replicated to all nodes.

Such a write concern makes every write wait for cross-data-center replication (about 30 ms in the Shanghai-Shenzhen example above), so write-heavy, read-light workloads should weigh this cost carefully.

3. Nearest Mode Implementation Details

3.1 mongos Code Analysis

Latency Information Collection

mongos runs a probe thread that, every 5 seconds, issues the isMaster command to each replica set member and records the round-trip latency.

try {
    ScopedDbConnection conn(ConnectionString(ns.host), socketTimeoutSecs);
    bool ignoredOutParam = false;
    Timer timer; // start timing
    if (conn->isStillConnected()) {
        conn->isMaster(ignoredOutParam, &reply); // execute isMaster
    } else {
        log() << "Connection to " << ns.host.toString() << " is closed";
        reply = BSONObj();
    }
    pingMicros = timer.micros(); // record this round latency
    conn.done();
} catch (const DBException &ex) {
    ...
}

It then smooths the latency with an exponential moving average: each new sample moves the stored value by one quarter of the difference.

if (reply.latencyMicros >= 0) {
    if (latencyMicros == unknownLatency) {
        latencyMicros = reply.latencyMicros; // first update
    } else {
        latencyMicros += (reply.latencyMicros - latencyMicros) / 4; // smooth update
    }
}
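To make the effect of the quarter-step update concrete, here is a minimal Go sketch of the same rule applied to a short series of samples. The function name smooth and the sample values are illustrative assumptions, not part of the mongos source; only the update rule is taken from the excerpt above.

```go
package main

import "fmt"

// unknownLatency marks a node that has not been probed yet
// (the name mirrors the excerpt; the value -1 is an assumption).
const unknownLatency int64 = -1

// smooth applies the update rule from the excerpt: the first sample is
// stored as-is, later samples move the stored value by 1/4 of the delta.
func smooth(current, sample int64) int64 {
	if sample < 0 {
		return current // failed probe: keep the previous value
	}
	if current == unknownLatency {
		return sample
	}
	return current + (sample-current)/4
}

func main() {
	latency := unknownLatency
	// Microsecond samples: a steady 1 ms link with a single 41 ms outlier.
	for _, s := range []int64{1000, 1000, 41000, 1000, 1000} {
		latency = smooth(latency, s)
		fmt.Println(latency)
	}
	// Prints 1000, 1000, 11000, 8500, 6625 (one value per line).
}
```

In this example a single 41 ms outlier lifts a 1 ms node to 11 ms, and the value is still 6.6 ms two probes later, so one slow probe can temporarily change how a node ranks against its neighbors.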

Nearest Node Selection

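The algorithm sorts the matching nodes by latency, discards every node whose latency exceeds that of the nearest node by 15 ms or more (latencyThresholdMicros), and returns one of the remaining nodes at random, or in round-robin order when deterministic host selection is enabled.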

case ReadPreference::SecondaryOnly:
case ReadPreference::Nearest: {
    BSONForEach(tagElem, criteria.tags.getTagBSON()) {
        uassert(16358, "Tags should be a BSON object", tagElem.isABSONObj());
        BSONObj tag = tagElem.Obj();
        std::vector<const Node*> matchingNodes;
        for (size_t i = 0; i < nodes.size(); i++) {
            if (nodes[i].matches(criteria.pref) && nodes[i].matches(tag)) {
                matchingNodes.push_back(&nodes[i]);
            }
        }
        if (matchingNodes.empty()) continue;
        if (matchingNodes.size() == 1) return matchingNodes.front()->host;
        std::sort(matchingNodes.begin(), matchingNodes.end(), compareLatencies);
        for (size_t i = 1; i < matchingNodes.size(); i++) {
            int64_t distance = matchingNodes[i]->latencyMicros - matchingNodes[0]->latencyMicros;
            if (distance >= latencyThresholdMicros) {
                matchingNodes.erase(matchingNodes.begin() + i, matchingNodes.end());
                break;
            }
        }
        if (ReplicaSetMonitor::useDeterministicHostSelection) {
            return matchingNodes[roundRobin++ % matchingNodes.size()]->host;
        } else {
            return matchingNodes[rand.nextInt32(matchingNodes.size())]->host;
        }
    }
    return HostAndPort();
}
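The sort-and-cut step can be sketched in Go as follows. This is a simplified illustration, not the mongos implementation: the node type, the host names, and the function names are hypothetical, and tag matching is left out.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// node is a hypothetical stand-in for the monitor's per-host record.
type node struct {
	host          string
	latencyMicros int64
}

// withinThreshold sorts a copy of nodes by latency and keeps those whose
// latency is less than thresholdMicros above the fastest node, mirroring
// the ">=" cut-off in the excerpt.
func withinThreshold(nodes []node, thresholdMicros int64) []node {
	if len(nodes) == 0 {
		return nil
	}
	sorted := append([]node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].latencyMicros < sorted[j].latencyMicros
	})
	for i := 1; i < len(sorted); i++ {
		if sorted[i].latencyMicros-sorted[0].latencyMicros >= thresholdMicros {
			return sorted[:i]
		}
	}
	return sorted
}

// pickNearest returns a random host from the qualifying set.
func pickNearest(nodes []node, thresholdMicros int64) (string, bool) {
	candidates := withinThreshold(nodes, thresholdMicros)
	if len(candidates) == 0 {
		return "", false
	}
	return candidates[rand.Intn(len(candidates))].host, true
}

func main() {
	// Hypothetical hosts: two in the local data center, one remote.
	nodes := []node{
		{"shenzhen-1:27017", 30100},
		{"shanghai-1:27017", 100},
		{"shanghai-2:27017", 300},
	}
	for _, n := range withinThreshold(nodes, 15000) {
		fmt.Println(n.host) // shanghai-1:27017, then shanghai-2:27017
	}
	host, _ := pickNearest(nodes, 15000)
	fmt.Println("selected:", host)
}
```

With a 15 ms threshold, the remote node at roughly 30 ms never enters the candidate set, so reads stay inside the local data center as long as at least one local node matches.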

3.2 mgo Driver Code Analysis

Latency Information Collection

The mgo driver probes each server every 15 seconds with ping and uses the maximum of the last six measurements as its latency reference.

for {
    if loop {
        time.Sleep(delay) // collect every 15 seconds
    }
    socket, _, err := server.AcquireSocket(0, delay)
    if err == nil {
        start := time.Now()
        _, _ = socket.SimpleQuery(&op) // execute ping
        delay := time.Now().Sub(start) // measure duration
        server.pingWindow[server.pingIndex] = delay
        server.pingIndex = (server.pingIndex + 1) % len(server.pingWindow)
        server.pingCount++
        var max time.Duration
        for i := 0; i < len(server.pingWindow) && uint32(i) < server.pingCount; i++ {
            if server.pingWindow[i] > max {
                max = server.pingWindow[i]
            }
        }
        server.pingValue = max // use max as latency metric
        logf("Ping for %s is %d ms", server.Addr, max/time.Millisecond)
    } else if err == errServerClosed {
        return
    }
    if !loop { return }
}

Nearest Node Selection

The logic is similar to that of mongos, but among servers whose ping values are within 15 ms of each other, mgo prefers the one with fewer connections in use, which spreads the load.

func (servers *mongoServers) BestFit(mode Mode, serverTags []bson.D) *mongoServer {
    var best *mongoServer
    for _, next := range servers.slice {
        if best == nil {
            best = next
            best.RLock()
        }
        if serverTags != nil && !next.info.Mongos && !best.hasTags(serverTags) {
            best.RUnlock()
            best = nil
        }
        next.RLock()
        swap := false
        switch {
        case serverTags != nil && !next.info.Mongos && !next.hasTags(serverTags):
            // must have requested tags
        case next.info.Master != best.info.Master && mode != Nearest:
            // prefer slaves unless mode is PrimaryPreferred
            swap = (mode == PrimaryPreferred) != best.info.Master
        case absDuration(next.pingValue-best.pingValue) > 15*time.Millisecond:
            // prefer nearest server
            swap = next.pingValue < best.pingValue
        case len(next.liveSockets)-len(next.unusedSockets) < len(best.liveSockets)-len(best.unusedSockets):
            // prefer servers with fewer connections
            swap = true
        }
        if swap {
            best.RUnlock()
            best = next
        } else {
            next.RUnlock()
        }
    }
    if best != nil {
        best.RUnlock()
    }
    return best
}

3.3 Official Go Driver Code Analysis

Latency Information Collection

The Go driver runs isMaster every 10 seconds, measures round‑trip time, and updates an exponential moving average with α = 0.2.

func (s *Server) updateAverageRTT(delay time.Duration) time.Duration {
    if !s.averageRTTSet {
        s.averageRTT = delay // first measurement
    } else {
        alpha := 0.2
        s.averageRTT = time.Duration(alpha*float64(delay) + (1-alpha)*float64(s.averageRTT))
    }
    return s.averageRTT
}
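Written as a recurrence, the update in updateAverageRTT looks as follows. This is a worked restatement of the code above, with $r_t$ denoting the RTT measured at heartbeat $t$ and $\bar{r}_t$ the stored average (symbols introduced here for illustration); the sample values are made up.

```latex
\bar{r}_0 = r_0, \qquad
\bar{r}_t = \alpha\, r_t + (1-\alpha)\,\bar{r}_{t-1}, \qquad \alpha = 0.2

% Unrolled: older samples are weighted down geometrically.
\bar{r}_t = \alpha \sum_{k=0}^{t-1} (1-\alpha)^{k}\, r_{t-k} + (1-\alpha)^{t}\, r_0

% Example: steady 1 ms link, one 31 ms sample, then 1 ms again.
\bar{r} = 0.2 \cdot 31 + 0.8 \cdot 1 = 7.0\ \text{ms}, \qquad
\bar{r} = 0.2 \cdot 1 + 0.8 \cdot 7.0 = 5.8\ \text{ms}, \qquad
\bar{r} = 0.2 \cdot 1 + 0.8 \cdot 5.8 = 4.84\ \text{ms}
```

The excess over the 1 ms baseline shrinks by a factor of 0.8 per heartbeat (6, 4.8, 3.84 ms). The mongos rule shown earlier, latencyMicros += (sample - latencyMicros) / 4, is the same recurrence with $\alpha = 0.25$ (ignoring integer truncation), so the Go driver reacts slightly more slowly to each individual sample.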

Nearest Node Selection

A composite selector combines ReadPrefSelector and LatencySelector. LatencySelector computes the minimum RTT among the candidates, adds a configurable threshold (default 15 ms), and returns all nodes within that window.

func (ls *latencySelector) SelectServer(t Topology, candidates []Server) ([]Server, error) {
    if ls.latency < 0 {
        return candidates, nil
    }
    if len(candidates) == 0 || len(candidates) == 1 {
        return candidates, nil
    }
    min := time.Duration(math.MaxInt64)
    for _, candidate := range candidates {
        if candidate.AverageRTTSet && candidate.AverageRTT < min {
            min = candidate.AverageRTT
        }
    }
    if min == time.Duration(math.MaxInt64) {
        return candidates, nil
    }
    max := min + ls.latency
    var result []Server
    for _, candidate := range candidates {
        if candidate.AverageRTTSet && candidate.AverageRTT <= max {
            result = append(result, candidate)
        }
    }
    return result, nil
}

After obtaining the qualified list, a random node is chosen as the target.

selected := suitable[rand.Intn(len(suitable))]
selectedS, err := t.FindServer(selected)
if err != nil {
    return nil, err
}
return selectedS, nil

Usage Recommendations

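The 15 ms threshold is only a default. For latency-sensitive workloads it can be lowered in the mongos configuration (replication.localPingThresholdMs) or, in the Go driver, through ClientOptions (SetLocalThreshold), so that only nodes very close to the fastest one remain eligible.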

4. Summary

MongoDB’s nearest mode enables proximity‑aware reads in multi‑data‑center deployments for both driver‑to‑mongod and mongos‑to‑mongod paths. This article dissected the implementation in Tencent Cloud MongoDB and common Go drivers, and offered configuration tips.

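The Tencent Database Technology team supports Tencent's in-house services, such as Qzone, WeChat Red Packets, Tencent Advertising, Tencent Music, and Tencent News, and supports TencentDB products on Tencent Cloud, such as CynosDB, CDB, CTSDB, and CMongo. The team focuses on continuously improving the database kernel and architecture and on raising database performance and stability, providing a "worry-free, reliable" database service for Tencent's own businesses and Tencent Cloud customers. This public account promotes and shares professional database knowledge with fellow database enthusiasts, in the hope that it will be useful.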