
Design and Implementation of a Multi‑CDN Disaster Recovery Mechanism for Frontend Resource Loading

This article presents a comprehensive multi‑CDN disaster‑recovery solution for frontend static resources, detailing the background, current issues, goals, SDK‑based architecture, monitoring and retry strategies, data‑reporting mechanisms, evaluation results, and future dynamic scheduling improvements.

Yang Money Pot Technology Team

A CDN (Content Delivery Network) is core Internet infrastructure: it serves static assets such as JavaScript, CSS, images, and videos from edge nodes to improve user experience. When a page depends on a single provider, however, a failure at that provider can cause white screens and service disruption.

Currently, each country is served by a single CDN provider (e.g., Alibaba Cloud in China, Qiniu in Indonesia). Monitoring is fragmented, recovery from an outage takes a long time, and availability varies by provider, with some falling below the 99.9% baseline.

To address these problems, the team set two goals: establish a multi‑CDN disaster‑recovery mechanism and achieve overall CDN availability of four nines (99.99%).

The solution has two layers. A client‑side SDK layer manages the full lifecycle of resource loading (monitoring load status, retrying, and reporting), and a monitoring layer provides a CDN observability platform with real‑time dashboards and alerts.

Detailed design – Monitoring resource load status

Three loading scenarios are covered: direct loading via script, link, and img tags; dynamic loading, where import is transformed into document.createElement calls; and CSS‑based images referenced through background-image, border-image, and similar properties. For directly loaded resources, the SDK registers global error and load listeners:

function errorHandler(event) {
    const target = event.target;
    // Only script, link, and img elements are monitored.
    if (!target || !["SCRIPT", "LINK", "IMG"].includes(target.tagName)) {
        return;
    }
    // collect failure info
}

function loadHandler(event) {
    // collect success info
}

// Resource load/error events do not bubble, so listen in the capture phase.
window.document.addEventListener('error', errorHandler, true);
window.document.addEventListener('load', loadHandler, true);
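
The "collect failure info" step is left open in the article. As a minimal sketch, the handler could reduce the event target to a small plain object before queuing it. The function name describeResource and the returned shape are assumptions for illustration, not the SDK's actual API.

```javascript
// Tags the article's handlers care about.
const RESOURCE_TAGS = ['SCRIPT', 'LINK', 'IMG'];

// Hypothetical helper: turn an event target into { tag, url }, or null when
// the target is not a monitored resource or carries no URL.
function describeResource(target) {
  const tag = String((target && target.tagName) || '').toUpperCase();
  if (!RESOURCE_TAGS.includes(tag)) {
    return null;
  }
  // <link> exposes its address as href; <script> and <img> use src.
  const url = tag === 'LINK' ? target.href : target.src;
  return url ? { tag, url } : null;
}
```

Inside errorHandler this would be called as describeResource(event.target), and a null result means the event can be ignored.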

For dynamically created resources, the SDK overrides document.createElement to attach load and error listeners:

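const originalCreateElement = document.createElement;
document.createElement = function (tagName, options) {
  // Forward the optional second argument so existing callers keep working.
  const element = originalCreateElement.call(document, tagName, options);
  if (isResourceTag(tagName)) {
    attachLoadMonitor(element);
  }
  return element;
};

function isResourceTag(tagName) {
  // Callers usually pass lowercase names ('script'), so normalize before comparing.
  return ['SCRIPT', 'LINK', 'IMG'].includes(String(tagName).toUpperCase());
}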

function attachLoadMonitor(element) {
  element.addEventListener('load', () => { recordLoadSuccess(element); });
  element.addEventListener('error', () => { recordLoadError(element); });
}

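For CSS images, the SDK parses style sheets and appends fallback CDN URLs to background-image (or similar) declarations. This relies on the browser's built‑in handling of multiple URLs: background-image accepts a comma‑separated list of images drawn as stacked layers, so a fallback layer shows through when the first URL fails to load.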

const urlProperties = ['backgroundImage', 'borderImage', 'listStyleImage'];
function createStyleSheetMonitor() {
  const styleSheets = Array.from(document.styleSheets);
  styleSheets.forEach(processStyleSheetImages);
}

function processStyleSheetImages(sheet) {
  // replace single URL with multiple fallback URLs
}
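
The replacement step in processStyleSheetImages is not shown in the article. The sketch below is one possible way to build the multi‑URL value for a background-image declaration; the host names, the regular expression, and the function names are all assumptions. It only rewrites absolute http(s) URLs and leaves relative and data: URLs alone.

```javascript
// Hypothetical fallback hosts; the article does not list the real domain pool.
const FALLBACK_HOSTS = ['cdn-b.example.com', 'cdn-c.example.com'];

// Matches each url(...) token, quoted or unquoted.
const URL_TOKEN = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;

// Return rawUrl with its host replaced, or null if it is not an absolute http(s) URL.
function swapHost(rawUrl, host) {
  try {
    const parsed = new URL(rawUrl);
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      return null;
    }
    parsed.host = host;
    return parsed.toString();
  } catch {
    return null; // relative URL: nothing to swap
  }
}

// Append one extra url(...) layer per fallback host after each original layer.
function withFallbackLayers(value, hosts = FALLBACK_HOSTS) {
  return value.replace(URL_TOKEN, (token, _quote, rawUrl) => {
    const extras = hosts
      .map((host) => swapHost(rawUrl, host))
      .filter(Boolean)
      .map((url) => `url("${url}")`);
    return [token, ...extras].join(', ');
  });
}
```

In a browser, the result would be assigned back to rule.style.backgroundImage for each CSS rule. Two caveats apply: only background-image takes a comma‑separated list (border-image and list-style-image accept a single image), and reading cssRules on a cross‑origin style sheet throws unless it is served with CORS headers.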

Retry mechanism

When a resource fails, the SDK selects the next CDN domain from a pool and retries; if all alternatives fail, it falls back to the primary domain. Resources are pre‑uploaded to all providers and warmed up on edge nodes during CI.
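
A minimal sketch of the domain selection described above follows. The pool contents and the function name are hypothetical, and the sketch reads "falls back to the primary domain" as one final attempt on the primary after every alternative has failed, which is an assumption.

```javascript
// Hypothetical pool; the first entry is treated as the primary CDN domain.
const CDN_POOL = ['cdn-a.example.com', 'cdn-b.example.com', 'cdn-c.example.com'];

// retryTimes is the number of retries already made for this resource.
// Returns the next URL to try, or null once the last attempt has been used.
function nextRetryUrl(failedUrl, retryTimes, pool = CDN_POOL) {
  const url = new URL(failedUrl);
  const alternatives = pool.slice(1);
  if (retryTimes < alternatives.length) {
    url.host = alternatives[retryTimes]; // walk the alternatives in order
  } else if (retryTimes === alternatives.length) {
    url.host = pool[0]; // all alternatives failed: final attempt on the primary
  } else {
    return null; // give up
  }
  return url.toString();
}
```

Because only the host changes, this approach depends on every provider serving the same path, which matches the article's note that resources are pre‑uploaded to all providers.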

Reporting strategy

The SDK records fields such as appid, metricsType, assetsUrl, provider, retryTimes, and the page URL. Records are batched and flushed when either a size or a time threshold is reached (e.g., 20 items or a 1 s interval), then sent to a backend service in a way that keeps the data consistent and avoids duplicate uploads.
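
The batching rule can be sketched as a small queue that flushes on whichever threshold is reached first. The class name, option names, and the injected send callback are assumptions; the article does not describe the SDK's internal structure.

```javascript
// Hypothetical batching queue: flush at maxItems records or after maxWaitMs.
class ReportQueue {
  constructor(send, { maxItems = 20, maxWaitMs = 1000 } = {}) {
    this.send = send; // callback that receives an array of records
    this.maxItems = maxItems;
    this.maxWaitMs = maxWaitMs;
    this.items = [];
    this.timer = null;
  }

  push(record) {
    this.items.push(record);
    if (this.items.length >= this.maxItems) {
      this.flush(); // size threshold reached
    } else if (this.timer === null) {
      // Start the time threshold when the first record of a batch arrives.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.items.length === 0) {
      return;
    }
    const batch = this.items;
    this.items = []; // swap before sending so no record is sent twice
    this.send(batch);
  }
}
```

In a browser, send could serialize the batch and hand it to navigator.sendBeacon or fetch against the team's reporting endpoint; calling flush() when the page is hidden would reduce the chance of losing a partial batch.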

Effect evaluation

The mechanism was evaluated along four dimensions: fault‑simulation verification, overall stability improvement (a success rate near 100%), validation during real faults, and historical data analysis. The success rate rose from 99.7441% to 99.9988%, which meets the four‑nines (99.99%) target.
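
To put the two percentages in more concrete terms, the short calculation below converts a success rate into failed loads per million requests and checks it against the four‑nines target. The function names are illustrative; the input figures are the ones reported in this article.

```javascript
// Failed loads per one million requests for a success rate given in percent.
function failuresPerMillion(successPercent) {
  // 1% of a million is 10,000; round to absorb floating-point noise.
  return Math.round((100 - successPercent) * 10000);
}

// True when the success rate reaches the target (four nines by default).
function meetsTarget(successPercent, targetPercent = 99.99) {
  return successPercent >= targetPercent;
}

const before = failuresPerMillion(99.7441); // 2559
const after = failuresPerMillion(99.9988);  // 12
console.log(before, after, meetsTarget(99.9988));
```

By this measure, failures drop from about 2,559 to about 12 per million resource loads.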

Conclusion and outlook

The multi‑CDN disaster‑recovery mechanism substantially improves static‑resource availability. Its fixed retry order, however, may not suit every user, so future work will build a dynamic CDN scheduling system driven by the experience each user actually perceives.
