Unlock Hidden Savings: Optimizing Multi‑Data Center Bandwidth Costs
This article examines the characteristics and billing models of multi‑data‑center networks, analyzes external traffic patterns, identifies challenges in optimizing Internet‑facing bandwidth, and proposes practical scheduling strategies to better utilize idle bandwidth and reduce carrier costs.
1. Characteristics and Billing Models of Multi‑Data Centers
Background
As internet services grow rapidly, the number of data centers and the complexity of their networks increase dramatically.
Multi‑Data Center Networks
Similar to Google’s network, large internet companies split their network into:
Data Center Internal Networks (DCNs)
Wide Area Networks (WANs)
According to traffic direction, WAN can be divided into two backbone networks:
Inter‑DC WAN (like Google B4), connecting geographically distributed data centers.
Internet‑Facing WAN serving end‑users (search, video, download).
Bandwidth Billing Models
Internet‑Facing WAN incurs high fees from carriers. With rapid service growth, capacity has expanded from 10 Gbps to 1 Tbps or more.
Common external bandwidth billing models include:
Peak billing: sample bandwidth over a period (e.g., 5 min) and charge based on the maximum sample in a month.
95th percentile billing: discard the top 5 % of samples and charge on the highest remaining sample.
Daily peak average billing: sample daily peaks and charge on the average of those daily peaks over a month.
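The three billing models above can be sketched in a few lines of Python. This is a minimal illustration, not any carrier's exact formula: the function names, the choice to discard exactly one sample in every twenty for the 95th percentile, and the fixed `samples_per_day` parameter are all assumptions made for the example.

```python
def peak_billing(samples):
    """Peak billing: charge on the largest sample in the month."""
    return max(samples)

def percentile95_billing(samples):
    """95th percentile billing: drop the top 5% of samples,
    then charge on the highest remaining sample."""
    ordered = sorted(samples)
    discard = len(ordered) // 20  # 5% of the samples, rounded down (assumption)
    return ordered[len(ordered) - discard - 1]

def daily_peak_average_billing(samples, samples_per_day):
    """Daily peak average billing: average of each day's peak sample."""
    days = [samples[i:i + samples_per_day]
            for i in range(0, len(samples), samples_per_day)]
    peaks = [max(day) for day in days]
    return sum(peaks) / len(peaks)
```

With 5-minute sampling, `samples_per_day` would be 288. The three functions generally return different billable values for the same traffic, which is why the idle bandwidth discussed later depends on the billing model.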
2. Characteristics of Data Center External Traffic
A Simple Peak‑Billing Example
Peak billing charges based on the monthly outbound bandwidth peak.
The figure shows a traffic graph where the peak occurs at 22:00 on day 1; billing is based on that peak.
At any time other than the peak, the data center can carry additional traffic, up to the peak level, at no extra cost.
The green area represents this free idle bandwidth.
Thus, each data center has considerable idle bandwidth that is not fully utilized.
Special days (e.g., JD 618, Double 11, popular series releases) cause very high peaks.
During the rest of the month, bandwidth values are far below the peak, leaving abundant idle bandwidth.
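Under peak billing, the idle bandwidth at each sample is simply the gap between the billed peak and the traffic at that moment. A minimal sketch, assuming traffic is given as a list of bandwidth samples in Gbps (the function name and the numbers are hypothetical):

```python
def idle_bandwidth(samples):
    """Idle (already paid-for) bandwidth at each sample under peak billing."""
    billed_peak = max(samples)
    return [billed_peak - s for s in samples]

# Hypothetical samples in Gbps; the third sample is the monthly peak.
traffic = [40, 30, 100, 60]
idle = idle_bandwidth(traffic)  # [60, 70, 0, 40]
```

The idle value is zero only at the peak itself. Everywhere else the gap is capacity the carrier bill already covers.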
A Daily‑Peak Billing Example
Some data centers use daily‑peak billing.
Similar to peak billing, daily‑peak data centers also have large amounts of idle bandwidth.
The figure shows that non‑peak moments have traffic far below the daily peak, leaving substantial idle bandwidth.
3. Challenges of Optimizing External Traffic
Because of peak billing, each data center often has significant idle bandwidth that could be used to optimize external traffic.
Challenges include:
1. Real‑time nature of Internet‑Facing Traffic
Traffic patterns show low usage around 6 am and peaks in the evening.
Shifting peak traffic to off‑peak times could reduce billed traffic, but user‑facing traffic is real‑time and cannot be delayed.
High‑real‑time services such as search, social networking, e‑commerce, and gaming cannot be shifted.
2. Uncertainty of Traffic Peaks
In practice, it is hard to know in advance when the monthly peak will occur or its magnitude, making it difficult to calculate available free bandwidth.
External traffic is typically rescheduled by changing DNS records, and those changes take time to reach clients, so scheduling cannot react instantly to an unexpected peak.
However, user behavior shows regular patterns; machine‑learning models can predict traffic, allowing estimation of short‑term idle bandwidth.
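As an illustration of how regular patterns make short-term idle bandwidth estimable, the sketch below uses a seasonal average, which is a simple baseline swapped in here rather than a machine-learning model. It predicts each upcoming sample as the mean of the same time slot in earlier periods, then subtracts the prediction from the highest peak seen so far. The function names and the `period` and `horizon` parameters are assumptions, and `history` must cover at least one full period.

```python
def seasonal_forecast(history, period, horizon):
    """Predict the next `horizon` samples as the mean of the same
    slot in every earlier period (a seasonal-average baseline)."""
    forecasts = []
    n = len(history)
    for step in range(horizon):
        slot = (n + step) % period
        past = history[slot::period]  # all earlier samples in this slot
        forecasts.append(sum(past) / len(past))
    return forecasts

def estimated_idle(history, period, horizon):
    """Estimated idle bandwidth: peak seen so far minus predicted traffic."""
    peak_so_far = max(history)
    return [max(0.0, peak_so_far - f)
            for f in seasonal_forecast(history, period, horizon)]
```

Because the true monthly peak may still lie ahead, `peak_so_far` is a conservative stand-in for the billed peak, so the estimate tends to understate the idle bandwidth rather than overstate it.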
4. How to Better Utilize Paid Bandwidth
We consider ways to use idle bandwidth across data centers to more fully utilize paid bandwidth.
1. Reduce Paid Bandwidth of Other Peak‑Billing DCs
Suppose two DCs peak at 16:00 and 23:00 respectively. Each has idle bandwidth while the other is at its peak, so part of each DC's peak traffic can be shifted to the other.
Simple traffic scheduling can lower both DCs’ peaks and reduce carrier fees.
With many DCs and diverse user behavior, appropriate scheduling algorithms can significantly cut external paid bandwidth.
Limitations arise when peaks occur close together or simultaneously, leaving no idle bandwidth to absorb others.
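The two-DC case can be sketched as follows. This is a simplified illustration, not a production scheduling algorithm: it assumes any amount of traffic can be moved between the two DCs within the same time slot, ignores user proximity and DNS delay, and takes the target caps `cap_a` and `cap_b` as given (all names are hypothetical).

```python
def shift_excess(a, b, cap_a, cap_b):
    """For each time slot, move traffic above one DC's cap to the
    other DC, limited by the headroom the other DC has under its own cap."""
    new_a, new_b = [], []
    for x, y in zip(a, b):
        move_ab = min(max(0, x - cap_a), max(0, cap_b - y))  # A -> B
        move_ba = min(max(0, y - cap_b), max(0, cap_a - x))  # B -> A
        new_a.append(x - move_ab + move_ba)
        new_b.append(y + move_ab - move_ba)
    return new_a, new_b

# Staggered peaks: both DCs drop from a peak of 100 to 80.
staggered = shift_excess([40, 100, 40], [40, 40, 100], 80, 80)

# Simultaneous peaks: neither DC has headroom, so nothing moves.
simultaneous = shift_excess([100], [100], 80, 80)
```

The second call shows the limitation described above: when both DCs exceed their caps in the same slot, there is no idle bandwidth on either side to absorb the excess.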
2. Reduce Paid Bandwidth of Daily‑Peak Billing DCs
Idle bandwidth at peak‑billing DCs can absorb traffic from daily‑peak DCs; the figure shows the resulting optimized traffic graph.
Daily‑peak DCs see reduced billed bandwidth each day, lowering monthly carrier costs.
Even if all DCs hit the monthly peak on one day, other days still have capacity to accept daily‑peak traffic, reducing costs.
This approach relaxes the limitation of peak‑billing scheduling.
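A rough way to quantify this is to ask how low one day's peak at a daily‑peak DC can go if its traffic is offloaded into the headroom of a peak‑billing DC. The sketch below assumes the peak‑billing DC's billed monthly peak is already known, that traffic can be moved freely within a time slot, and that both series are sampled at the same times. The function name and the numbers are hypothetical.

```python
def absorbed_daily_peak(daily_dc_day, peak_dc_day, peak_dc_billed):
    """Lowest daily peak reachable for one day when the excess is
    offloaded into the peak-billing DC without raising its billed peak."""
    lowest = 0
    for d, p in zip(daily_dc_day, peak_dc_day):
        headroom = max(0, peak_dc_billed - p)  # free capacity in this slot
        lowest = max(lowest, d - headroom)     # traffic that cannot be offloaded
    return lowest

# Ordinary day: the daily peak drops from 80 to 50.
ordinary = absorbed_daily_peak([30, 80, 50], [60, 70, 95], 100)

# Day on which the peak-billing DC hits its monthly peak in the same slot:
# no headroom there, so the daily peak stays at 80.
peak_day = absorbed_daily_peak([30, 80, 50], [60, 100, 95], 100)
```

Only the slots in which the peak‑billing DC is at its own billed peak offer no relief. On every other day of the month the daily‑peak DC's billed value can still be reduced.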
3. Use External Idle Bandwidth for Internal Traffic
Idle bandwidth is mostly available during off‑peak hours (midnight to 10 am), when user‑facing traffic is low.
Background flows from distributed storage, cloud services, search, and similar systems can use this idle external bandwidth for delay‑tolerant internal jobs, alleviating congestion on internal links.
However, external networks have higher latency and loss; unified scheduling must consider external performance to avoid harming internal jobs.
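One way to picture this is a greedy plan that fills the idle external bandwidth slot by slot with a delay‑tolerant transfer, while leaving a safety margin unused so that user‑facing traffic and link quality are not affected. The function name, the hourly slot length, and the 20% default margin are assumptions chosen for illustration.

```python
def schedule_background(idle_gbps, volume_gbit, slot_seconds=3600, margin=0.2):
    """Greedily place a delay-tolerant transfer into idle slots,
    earliest first, keeping `margin` of each slot's idle capacity unused.
    Returns the per-slot average rate in Gbps and the unscheduled volume."""
    plan = []
    remaining = volume_gbit
    for idle in idle_gbps:
        usable = idle * (1 - margin) * slot_seconds  # Gbit that fit in this slot
        sent = min(remaining, usable)
        plan.append(sent / slot_seconds)
        remaining -= sent
    return plan, remaining

# Hypothetical: three hourly slots with 10, 0 and 5 Gbps idle; 36,000 Gbit to move.
plan, leftover = schedule_background([10, 0, 5], 36000)
```

A nonzero `leftover` means the transfer does not fit in the available idle bandwidth and the rest must go over internal links or wait for a later window. A real scheduler would also need to weigh the higher latency and loss of the external path.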
5. Future Outlook
Multi‑data‑center external traffic scheduling can reduce billed bandwidth, but it has limitations: it cannot fully exploit the abundant bandwidth left idle during off‑peak periods.
Using external idle bandwidth for internal traffic requires coordinated intra‑ and inter‑network scheduling and attention to external transmission quality.
Overall, the high cost of external bandwidth warrants addressing these challenges.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.