
Cross‑IDC Kafka Hot‑Standby with MirrorMaker 2: Architecture, Design, and Productization

This article explains how 360 Commercialization implements cross‑IDC hot‑standby for Kafka using MirrorMaker 2, covering MM2 fundamentals, architecture, internal topics, deployment on Kubernetes, design goals, solution details, challenges such as dynamic configuration and offset reverse‑mapping, and productized risk mitigation.

360 Tech Engineering

The article introduces 360 Commercialization's practice of cross‑IDC Kafka hot‑standby, beginning with an overview of MirrorMaker 2 (MM2) and its advantages over its predecessor, MirrorMaker 1 (MM1).

MM2, built on the Kafka Connect framework, provides automatic topic discovery, configuration sync, active‑active cluster support, cross‑IDC replication, metadata synchronization, no‑rebalance operation, extensive metrics, and fault‑tolerant horizontal scalability.

Key internal topics include checkpoints (translated consumer-group offsets), offset-syncs (the mapping between upstream and downstream offsets of replicated records), and heartbeats (used to monitor replication flows).
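For orientation, the helper below shows how MM2's DefaultReplicationPolicy names these topics with the default "." separator, which can be changed through replication.policy.separator. This reflects our understanding of the default policy. A replicated topic is prefixed with its source cluster alias. A checkpoints topic carries the source alias. An offset-syncs topic carries the alias of the flow's target cluster. The class itself is our own illustration, not MM2 code.

```java
public class Mm2TopicNames {
    // A topic replicated from sourceAlias appears on the target cluster as "<source>.<topic>".
    static String remoteTopic(String sourceAlias, String topic) {
        return sourceAlias + "." + topic;
    }

    // Translated consumer-group offsets for groups on sourceAlias (lives in the target cluster).
    static String checkpointsTopic(String sourceAlias) {
        return sourceAlias + ".checkpoints.internal";
    }

    // Upstream/downstream offset pairs for a replication flow, named after its target alias.
    static String offsetSyncsTopic(String targetAlias) {
        return "mm2-offset-syncs." + targetAlias + ".internal";
    }

    // Heartbeats are written to a topic literally named "heartbeats".
    static String heartbeatsTopic() {
        return "heartbeats";
    }

    public static void main(String[] args) {
        // Names involved in a flow A -> B
        System.out.println(remoteTopic("A", "topicT"));
        System.out.println(checkpointsTopic("A"));
        System.out.println(offsetSyncsTopic("B"));
        System.out.println(heartbeatsTopic());
    }
}
```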

Deployment is per Kafka cluster: each target cluster gets its own MM2 instance running on Kubernetes. The instance exposes metrics through jmx_prometheus_javaagent for Prometheus to scrape, and the metrics are visualized in Grafana.
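A minimal sketch of what one MM2 pod per target cluster could run, assuming MM2 is started with Kafka's bin/connect-mirror-maker.sh script. The script's --clusters option restricts a node to the flows that write into the named target. The Prometheus JMX exporter is attached as a Java agent through KAFKA_OPTS. The file paths, port 9404 and the Spec record are hypothetical. The code only builds the container command and environment; it does not call Kubernetes.

```java
import java.util.List;
import java.util.Map;

public class Mm2LaunchSpec {
    // Hypothetical paths inside the MM2 container image.
    static final String MM2_PROPERTIES = "/etc/mm2/mm2.properties";
    static final String JMX_AGENT_JAR = "/opt/jmx/jmx_prometheus_javaagent.jar";
    static final String JMX_AGENT_CONFIG = "/opt/jmx/mm2.yml";

    // Command line and environment for one MM2 pod.
    record Spec(String targetAlias, List<String> command, Map<String, String> env) {}

    // One MM2 instance per target cluster: --clusters limits the node to flows into targetAlias.
    static Spec forTarget(String targetAlias, int metricsPort) {
        List<String> command = List.of(
                "bin/connect-mirror-maker.sh", MM2_PROPERTIES, "--clusters", targetAlias);
        // kafka-run-class.sh passes KAFKA_OPTS to the JVM; the agent argument is <jar>=<port>:<config>.
        Map<String, String> env = Map.of(
                "KAFKA_OPTS", "-javaagent:" + JMX_AGENT_JAR + "=" + metricsPort + ":" + JMX_AGENT_CONFIG);
        return new Spec(targetAlias, command, env);
    }

    public static void main(String[] args) {
        for (String target : List.of("A", "B")) {
            System.out.println(forTarget(target, 9404));
        }
    }
}
```

Each pod has its own network namespace, so every instance can expose metrics on the same port and Prometheus can scrape them by pod.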

Design goals focus on IDC‑level disaster recovery, decoupled upstream/downstream pipelines, minimal switch cost (ideally without restarting services), and a simple productized workflow that hides complexity from business users.

The solution adopts bidirectional synchronization so that topics exist in both clusters and offsets are kept consistent, enabling seamless consumer failover and producer redirection.
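To make the bidirectional setup concrete, the sketch below builds mm2.properties entries that enable both replication flows between clusters A and B. The keys are standard MM2 settings chosen for illustration, not 360's actual configuration. The bootstrap addresses and catch-all topic patterns are placeholders. sync.group.offsets.enabled requires Kafka 2.7 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BidirectionalMm2Config {
    // Builds mm2.properties entries for two-way replication between clusters A and B.
    static Map<String, String> build(String aServers, String bServers) {
        Map<String, String> c = new LinkedHashMap<>(); // insertion order keeps the output readable
        c.put("clusters", "A, B");
        c.put("A.bootstrap.servers", aServers);
        c.put("B.bootstrap.servers", bServers);
        // Enable both flows so mirrored topics exist on both clusters.
        c.put("A->B.enabled", "true");
        c.put("A->B.topics", ".*");
        c.put("B->A.enabled", "true");
        c.put("B->A.topics", ".*");
        // Emit checkpoints and write translated consumer-group offsets into the target cluster.
        c.put("emit.checkpoints.enabled", "true");
        c.put("sync.group.offsets.enabled", "true");
        // Keep topic-level configuration aligned across clusters.
        c.put("sync.topic.configs.enabled", "true");
        return c;
    }

    // Renders the entries in the "key = value" line format of mm2.properties.
    static String render(Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        config.forEach((k, v) -> sb.append(k).append(" = ").append(v).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(build("kafka-a:9092", "kafka-b:9092")));
    }
}
```

With the default replication policy, MM2 does not send a remote topic back to the cluster it came from, so enabling both flows does not create a replication loop. In the setup described here, the `.*` patterns would be replaced by the MySQL-backed topic and group filters.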

Challenges addressed include:

Dynamic synchronization configuration: solved by implementing the TopicFilter and GroupFilter interfaces so that they query MySQL for metadata and reload the configuration periodically.
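A sketch of such a filter, assuming it is plugged in through MM2's topic.filter.class and group.filter.class settings. It keeps allowlists in memory and refreshes them on a fixed schedule. The two interfaces are local stand-ins that only mirror the shouldReplicateTopic/shouldReplicateGroup method names of MM2's real TopicFilter and GroupFilter, which also receive settings via configure(). The loaders are hypothetical placeholders for the MySQL queries. Exact-match allowlist semantics are our assumption.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class DynamicFilters {
    // Stand-ins named after MM2's filter methods, so this compiles without MM2 jars.
    interface TopicFilter extends AutoCloseable { boolean shouldReplicateTopic(String topic); }
    interface GroupFilter extends AutoCloseable { boolean shouldReplicateGroup(String group); }

    // Allowlists loaded from an external metadata store and refreshed periodically.
    static final class ReloadingAllowlist implements TopicFilter, GroupFilter {
        private final Supplier<Set<String>> topicLoader; // hypothetical: query on a MySQL meta table
        private final Supplier<Set<String>> groupLoader; // hypothetical: same, for consumer groups
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "mm2-filter-reload");
            t.setDaemon(true); // reloads alone should not keep the JVM alive
            return t;
        });
        private volatile Set<String> topics;
        private volatile Set<String> groups;

        ReloadingAllowlist(Supplier<Set<String>> topicLoader, Supplier<Set<String>> groupLoader, long periodSeconds) {
            this.topicLoader = topicLoader;
            this.groupLoader = groupLoader;
            this.topics = Set.copyOf(topicLoader.get()); // initial load fails fast if the store is down
            this.groups = Set.copyOf(groupLoader.get());
            scheduler.scheduleAtFixedRate(this::reload, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        }

        void reload() {
            try {
                Set<String> newTopics = Set.copyOf(topicLoader.get());
                Set<String> newGroups = Set.copyOf(groupLoader.get());
                topics = newTopics;
                groups = newGroups;
            } catch (RuntimeException e) {
                // Keep serving the previous snapshot if the store is temporarily unavailable.
            }
        }

        @Override public boolean shouldReplicateTopic(String topic) { return topics.contains(topic); }
        @Override public boolean shouldReplicateGroup(String group) { return groups.contains(group); }
        @Override public void close() { scheduler.shutdownNow(); }
    }

    public static void main(String[] args) {
        try (ReloadingAllowlist f = new ReloadingAllowlist(() -> Set.of("orders"), () -> Set.of("billing"), 60)) {
            System.out.println(f.shouldReplicateTopic("orders"));
            System.out.println(f.shouldReplicateGroup("other-group"));
        }
    }
}
```

Swapping in whole immutable snapshots through volatile fields lets MM2 call the filter from its own threads without locking. A failed reload leaves the last good allowlist in place.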

Missing reverse consumer-group offset mapping: MM2 translates consumer-group offsets only in the replication direction, so the reverse offsets are calculated from the mm2-offset-syncs.A.internal topic:

Suppose the offset on B.topicT is offset, and upstreamOffset and downstreamOffset are a broker offset mapping recorded in the mm2-offset-syncs.A.internal topic. Then:
delta = upstreamOffset - downstreamOffset 
reverseOffset = offset + delta
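The formula above can be sketched in Java as follows. The OffsetSync record is our own assumption, and so is the rule for choosing which sync record to apply: the most recent one whose downstreamOffset does not exceed the offset being mapped back. The article only gives the delta arithmetic. The result is exact only if offsets between the sync point and the translated offset advance in step on both clusters, which is what a single delta presumes.

```java
import java.util.List;

public class ReverseOffset {
    // One offset-sync record: the same message sits at upstreamOffset upstream and downstreamOffset downstream.
    record OffsetSync(long upstreamOffset, long downstreamOffset) {}

    // The article's formula: delta = upstreamOffset - downstreamOffset; reverseOffset = offset + delta.
    static long reverseOffset(long offset, OffsetSync sync) {
        long delta = sync.upstreamOffset() - sync.downstreamOffset();
        return offset + delta;
    }

    // Assumption: apply the latest sync whose downstreamOffset does not exceed the offset being mapped back.
    static long reverseOffset(long offset, List<OffsetSync> syncs) {
        OffsetSync best = null;
        for (OffsetSync s : syncs) {
            if (s.downstreamOffset() <= offset
                    && (best == null || s.downstreamOffset() > best.downstreamOffset())) {
                best = s;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no offset sync at or below offset " + offset);
        }
        return reverseOffset(offset, best);
    }

    public static void main(String[] args) {
        List<OffsetSync> syncs = List.of(new OffsetSync(1000, 200), new OffsetSync(1500, 650));
        // Uses the (1500, 650) sync: 700 + (1500 - 650) = 1550
        System.out.println(reverseOffset(700, syncs));
    }
}
```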

Business‑triggered cluster switching is achieved by wrapping the Kafka client (C++/Java) in a framework that reads configuration from a central store (Apollo or MySQL) and performs the switch without restarting the application.
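The article does not show the wrapper's code. A minimal sketch of the idea follows. It assumes a Supplier that returns the active cluster's bootstrap address from the central config store (Apollo or MySQL) and a factory that builds a client for a given address. ClusterSwitchingClient, ClientHandle and refresh() are hypothetical names. A real wrapper would hold a KafkaProducer or KafkaConsumer and would also have to deal with in-flight sends and consumer offsets during the swap.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Supplier;

public class SwitchableClient {
    // Stand-in for a Kafka client handle (producer or consumer) bound to one cluster.
    interface ClientHandle extends AutoCloseable {
        String bootstrapServers();
        @Override void close(); // narrowed: no checked exception
    }

    // Rebuilds the underlying client when the active cluster in the config store changes.
    static final class ClusterSwitchingClient implements AutoCloseable {
        private final Supplier<String> activeCluster;          // hypothetical: reads Apollo/MySQL
        private final Function<String, ClientHandle> factory;  // builds a client for a bootstrap address
        private ClientHandle current;

        ClusterSwitchingClient(Supplier<String> activeCluster, Function<String, ClientHandle> factory) {
            this.activeCluster = activeCluster;
            this.factory = factory;
            this.current = factory.apply(activeCluster.get());
        }

        // Call from a config-change listener or a timer; returns true if the cluster was switched.
        synchronized boolean refresh() {
            String wanted = activeCluster.get();
            if (Objects.equals(wanted, current.bootstrapServers())) {
                return false;
            }
            ClientHandle next = factory.apply(wanted); // build first: a failure leaves the old client in place
            ClientHandle old = current;
            current = next;
            old.close();
            return true;
        }

        // Callers fetch the client per operation instead of caching it, so they see switches.
        synchronized ClientHandle client() {
            return current;
        }

        @Override
        public synchronized void close() {
            current.close();
        }
    }

    public static void main(String[] args) {
        String[] active = {"kafka-a:9092"};
        ClusterSwitchingClient c = new ClusterSwitchingClient(() -> active[0], addr -> new ClientHandle() {
            public String bootstrapServers() { return addr; }
            public void close() { System.out.println("closed client for " + addr); }
        });
        active[0] = "kafka-b:9092"; // simulate flipping the active cluster in the config store
        System.out.println(c.refresh() + " -> " + c.client().bootstrapServers());
        c.close();
    }
}
```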

Productization introduces a "hot‑standby" attribute, defining hot‑standby topics, producers, and consumers, and automates configuration, consumer‑group state sync, and permission granting through the Ultron platform.

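Risk considerations focus on data integrity. Producers must use acks=all and must retry indefinitely: retries is set to Integer.MAX_VALUE, the largest value the int-typed retries setting accepts. The design also accepts that messages not yet replicated to the standby cluster may be lost if the primary cluster fails irrecoverably.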
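A sketch of these producer settings, using the Java client's string configuration keys in a plain Properties object. Passing the object to a KafkaProducer is left out so the snippet needs no Kafka dependency. The bootstrap address is a placeholder.

```java
import java.util.Properties;

public class HotStandbyProducerConfig {
    // Producer settings for hot-standby topics.
    static Properties producerProps(String bootstrapServers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        // Wait until all in-sync replicas have the record before acknowledging.
        p.setProperty("acks", "all");
        // Retry transient send failures; retries is an int setting, so this is its maximum.
        p.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        return p;
    }

    public static void main(String[] args) {
        producerProps("kafka-a:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In Java clients 2.1 and later, the total time spent retrying a record is also capped by delivery.timeout.ms (default two minutes), so that setting has to be considered alongside retries.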

References: Apache Kafka Geo‑Replication documentation and KIP‑382 (MirrorMaker 2.0).
