
Design and Evolution of a High‑Availability SMS Platform at AutoHome

This article details the architectural evolution, high‑availability strategies, fault‑monitoring mechanisms, and performance optimizations of AutoHome's enterprise SMS platform, covering its migration from .Net to Java, service decomposition with Kafka, multi‑datacenter deployment, and operational safeguards for large‑scale events.

HomeTech

1. Introduction SMS remains a fundamental communication channel, and keeping it highly available has become a critical challenge as business volume and quality expectations grow.

2. SMS Platform Overview The AutoHome SMS platform provides enterprise‑grade messaging services, supporting SMS, MMS, and secure verification codes with a focus on high availability, stability, and security.

3. Architecture Evolution The platform began as a simple .NET implementation (version 1.0) and moved to Java in version 2.0 to address maintenance costs and management needs. Version 3.0 decoupled API validation, third‑party integration, and database persistence into three stateless services that communicate via Kafka, which enables horizontal scaling. Version 4.0 introduced Anycast‑based multi‑datacenter failover and TiDB master‑slave replication to ensure data continuity.
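The version 3.0 split can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: in-memory queues stand in for Kafka topics, and every name here (`api_service`, `gateway_service`, `persistence_service`, `fake_provider`, the 11-digit phone check) is a hypothetical chosen for the example.

```python
import queue

# In-memory stand-ins for Kafka topics (assumption: the real platform uses Kafka).
send_topic = queue.Queue()     # validated requests awaiting delivery
persist_topic = queue.Queue()  # delivery results awaiting persistence


def api_service(request: dict) -> bool:
    """Validate a request and publish it. Keeps no state between calls."""
    phone, text = request.get("phone", ""), request.get("text", "")
    # Assumed validation rule: an 11-digit number and a non-empty body.
    if not (phone.isdigit() and len(phone) == 11 and text):
        return False
    send_topic.put({"phone": phone, "text": text})
    return True


def gateway_service(provider) -> None:
    """Consume one validated message, call the third party, publish the result."""
    msg = send_topic.get()
    status = provider(msg["phone"], msg["text"])
    persist_topic.put({**msg, "status": status})


def persistence_service(store: list) -> None:
    """Consume one delivery result and write it to the store (a list here)."""
    store.append(persist_topic.get())


def fake_provider(phone: str, text: str) -> str:
    """Hypothetical third-party SMS provider that always succeeds."""
    return "DELIVERED"


def run_demo():
    store = []
    accepted = api_service({"phone": "13800000000", "text": "code 1234"})
    rejected = api_service({"phone": "abc", "text": "hi"})
    gateway_service(fake_provider)
    persistence_service(store)
    return accepted, rejected, store
```

Because each stage only reads from one topic and writes to the next, more instances of any stage can be added without coordinating with the others, which is the property the article attributes to the Kafka-based design.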

4. Support for Large‑Scale Events (e.g., 818 Global Car Shopping Festival) To handle peak loads of up to one million QPS, the team deployed multiple clusters across private and public clouds, optimized supplier channels, routed traffic dynamically, provisioned additional network bandwidth, and selectively degraded status‑report processing.
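Two of these measures, dynamic traffic routing and selective status‑report degradation, might look like the sketch below. The article does not describe the actual routing policy, so the weighted‑random selection, the `Channel` fields, the message types, and the QPS threshold are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Channel:
    """A hypothetical supplier channel."""
    name: str
    weight: int           # relative share of traffic
    healthy: bool = True  # flipped by monitoring when the supplier misbehaves


def pick_channel(channels, rng=random):
    """Pick a healthy channel at random, in proportion to its weight."""
    live = [c for c in channels if c.healthy and c.weight > 0]
    if not live:
        raise RuntimeError("no healthy supplier channel available")
    return rng.choices(live, weights=[c.weight for c in live], k=1)[0]


def should_process_status_report(msg_type, current_qps, degrade_threshold_qps):
    """Assumed policy: under peak load, keep status reports only for
    verification codes and skip them for everything else."""
    if current_qps < degrade_threshold_qps:
        return True
    return msg_type == "verification"
```

Marking a channel unhealthy removes it from selection immediately, and lowering its weight shifts traffic away gradually, so the same function covers both failover and load rebalancing.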

5. Fault Monitoring A comprehensive monitoring system combines internal alerts, third‑party logs, real‑time carrier feedback, and multi‑channel notifications (SMS, DingTalk, WeChat, phone) to detect and respond to failures across services, middleware, network, and external providers.
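One way to turn send results into multi‑channel alerts is a sliding‑window failure‑rate check that fans out to every configured notifier. The sketch below is an assumption about how such a check could work, not the platform's monitoring code; the class name, window size, threshold, and notifier callables are hypothetical.

```python
from collections import deque


class FailureRateMonitor:
    """Alert every notifier once when the failure rate over the last
    `window_size` sends reaches `threshold`; re-arm after recovery."""

    def __init__(self, window_size, threshold, notifiers):
        self.results = deque(maxlen=window_size)  # True = success, False = failure
        self.threshold = threshold
        self.notifiers = notifiers                # callables taking a message string
        self.alerting = False

    def record(self, success: bool) -> None:
        self.results.append(success)
        if len(self.results) < self.results.maxlen:
            return  # not enough samples yet
        rate = self.results.count(False) / len(self.results)
        if rate >= self.threshold and not self.alerting:
            self.alerting = True
            message = f"SMS failure rate {rate:.0%} over last {len(self.results)} sends"
            for notify in self.notifiers:
                notify(message)
        elif rate < self.threshold:
            self.alerting = False
```

In use, each notifier would wrap one delivery channel (SMS, DingTalk, WeChat, or phone); the `alerting` flag keeps a sustained outage from flooding all of them with repeated messages.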

6. Conclusion High availability for the SMS platform is achieved through multi‑active architecture, load balancing, rate limiting, automatic recovery, data redundancy, and continuous monitoring, ensuring reliable message delivery for critical business operations.
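Of the techniques listed, rate limiting is the simplest to show in isolation. The article does not say which algorithm the platform uses; the token bucket below is one common choice, given here as a sketch with hypothetical names and an injectable clock so that its behavior can be checked deterministically.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would typically keep one bucket per business account or per supplier channel and reject or queue a send whenever `allow()` returns `False`.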

Tags: backend, operations, distributed architecture, High Availability, Kafka, TiDB, SMS