Designing a High‑Performance Message Notification System
This article explains how to design and implement a high‑performance, scalable message notification system, covering service partitioning, system architecture, first‑time and retry message handling, idempotency, dynamic routing, thread‑pool management, stability measures such as traffic surge handling, resource isolation, monitoring, and elastic scaling.
1 Service Partitioning
The discussion proceeds in four parts:
Service partitioning
System design
Stability assurance
Summary
2 System Design
2.1 First Message Sending
When a message‑sending request arrives, it can be processed via RPC or MQ. RPC avoids message loss, while MQ provides asynchronous decoupling and traffic shaping.
2.1.1 Idempotency Handling
To prevent duplicate processing, a Redis‑based idempotency check is used: a unique Redis key is set with a short TTL; if the same key already exists and the payload matches, the request is considered duplicate.
private boolean isDuplicate(MessageDto messageDto) {
    String redisKey = getRedisKey(messageDto);
    boolean isDuplicate = false;
    try {
        // SETNX with a 30-minute TTL: fails if the key already exists
        if (!RedisUtils.setNx(redisKey, messageDto, 30 * 60L)) {
            isDuplicate = true;
        }
        if (isDuplicate) {
            // Key exists: only treat the request as a duplicate if the payload matches
            MessageDto oldDTO = RedisUtils.getObject(redisKey);
            if (Objects.equals(messageDto, oldDTO)) {
                log.info("duplicate message detected, key={}", redisKey);
            } else {
                isDuplicate = false;
            }
        }
    } catch (Exception e) {
        // Fail open: a Redis error should not block message delivery
        log.error("idempotency check failed, key={}", redisKey, e);
        isDuplicate = false;
    }
    return isDuplicate;
}

2.1.2 Problem Service Dynamic Detector
The routing component distinguishes normal and problematic services. When a service shows high latency or failures within a configured time window, Sentinel marks it as problematic and routes subsequent requests to an exception executor, isolating it from normal traffic.
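The routing decision itself can be sketched as a small component. ChannelRouter, the detector callbacks, and the pool sizes below are illustrative assumptions, not the article's actual classes:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ChannelRouter {
    // Channels currently flagged as problematic by the detector
    private final Set<String> problemChannels = ConcurrentHashMap.newKeySet();
    // Normal traffic gets the large pool; problem traffic is isolated in a small one
    private final ExecutorService normalExecutor = Executors.newFixedThreadPool(8);
    private final ExecutorService exceptionExecutor = Executors.newFixedThreadPool(2);

    /** Called by the detector when a channel crosses its failure/latency threshold. */
    void markProblem(String channel) {
        problemChannels.add(channel);
    }

    /** Called when the channel's metrics recover within the time window. */
    void markRecovered(String channel) {
        problemChannels.remove(channel);
    }

    /** Problem channels are routed to the isolated pool; everything else stays normal. */
    ExecutorService route(String channel) {
        return problemChannels.contains(channel) ? exceptionExecutor : normalExecutor;
    }
}
```

Keeping the flag set separate from the executors means the detector can flip a channel between the two pools without touching in-flight tasks.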
2.1.3 Sentinel Sliding Window Implementation (Ring Buffer)
Sentinel uses a ring-buffer-based sliding window to count requests and failures. The window index is calculated as (currentTime / bucketSize) % windowCount, and the start time of each bucket is updated when the window rolls over.
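A minimal sketch of such a ring buffer, following the index formula above. SlidingWindow and WindowBucket are illustrative names, not Sentinel's internal classes:

```java
import java.util.concurrent.atomic.AtomicLong;

class WindowBucket {
    volatile long windowStart;                    // start time of this bucket (ms)
    final AtomicLong total = new AtomicLong();    // requests seen in this bucket
    final AtomicLong failed = new AtomicLong();   // failures seen in this bucket
}

class SlidingWindow {
    private final int windowCount;     // number of buckets in the ring
    private final long bucketSizeMs;   // time span covered by one bucket
    private final WindowBucket[] ring;

    SlidingWindow(int windowCount, long bucketSizeMs) {
        this.windowCount = windowCount;
        this.bucketSizeMs = bucketSizeMs;
        this.ring = new WindowBucket[windowCount];
        for (int i = 0; i < windowCount; i++) {
            ring[i] = new WindowBucket();
        }
    }

    /** Locate the bucket for the given time, resetting it when the ring rolls over. */
    WindowBucket currentBucket(long timeMs) {
        int idx = (int) ((timeMs / bucketSizeMs) % windowCount);
        long windowStart = timeMs - timeMs % bucketSizeMs;
        WindowBucket bucket = ring[idx];
        if (bucket.windowStart != windowStart) {
            synchronized (bucket) {               // only one thread resets the slot
                if (bucket.windowStart != windowStart) {
                    bucket.windowStart = windowStart;
                    bucket.total.set(0);
                    bucket.failed.set(0);
                }
            }
        }
        return bucket;
    }

    void record(long timeMs, boolean success) {
        WindowBucket bucket = currentBucket(timeMs);
        bucket.total.incrementAndGet();
        if (!success) {
            bucket.failed.incrementAndGet();
        }
    }
}
```

Because the buckets are reused in place, the structure holds a fixed amount of memory no matter how long the process runs.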
2.1.4 Dynamic Thread‑Pool Adjustment
Two separate thread pools are created: one for normal services and another for problematic services. The pool size is initially set based on CPU cores, and runtime metrics are used to adjust the size dynamically without changing the queue length.
ThreadPoolExecutor pool = new ThreadPoolExecutor(
        poolSize, poolSize,
        0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<>());

The executor checks its queue size; if it exceeds a threshold, new tasks are rejected and placed into MQ for later processing, preventing overload.
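The off-loading step can be sketched like this. OverflowGuard and the MqProducer interface are stand-ins for the system's actual MQ client, and the threshold is an assumption:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class OverflowGuard {
    /** Stand-in for the real MQ client used for asynchronous persistence. */
    interface MqProducer {
        void send(Runnable task);
    }

    private final ThreadPoolExecutor pool;
    private final int queueThreshold;
    private final MqProducer mqProducer;

    OverflowGuard(ThreadPoolExecutor pool, int queueThreshold, MqProducer mqProducer) {
        this.pool = pool;
        this.queueThreshold = queueThreshold;
        this.mqProducer = mqProducer;
    }

    /** Returns true if the task ran in-process, false if it was off-loaded to MQ. */
    boolean trySubmit(Runnable task) {
        if (pool.getQueue().size() >= queueThreshold) {
            mqProducer.send(task);   // persist to MQ for delayed consumption
            return false;
        }
        pool.execute(task);
        return true;
    }
}
```

Checking the queue size before submission, rather than relying on a RejectedExecutionHandler, lets the guard make the MQ decision while the task is still cheap to redirect.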
2.2 Retry Message Sending
Messages that fail or exceed capacity are retried using a distributed scheduled‑task framework with shard‑broadcast. The framework ensures that each retry respects the same idempotency and routing rules.
public void init() {
    // One scheduler thread per registered task
    ScheduledExecutorService scheduledService =
            new ScheduledThreadPoolExecutor(taskRepository.size());
    for (Map.Entry<String, TaskHandler> entry : taskRepository.entrySet()) {
        final String taskName = entry.getKey();
        final TaskHandler handler = entry.getValue();
        scheduledService.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    if (handler.isBusy()) {
                        return; // skip this round if the handler is still working
                    }
                    handleTask(taskName, handler);
                } catch (Throwable e) {
                    logger.error(taskName + " task handler fail!", e);
                }
            }
        }, 30, 5, TimeUnit.SECONDS); // initial delay 30s, then every 5s
    }
}

Task execution is guarded by a distributed lock to avoid duplicate processing across nodes.
public void handleTask(String taskName, TaskHandler handler) {
    Lock lock = LockFactory.getLock(taskName);
    List<Task> taskList = null;
    if (lock.tryLock()) {
        try {
            // Only the node holding the lock fetches this task's batch
            taskList = getTaskList(taskName, handler);
        } finally {
            lock.unlock(); // unlock only when the lock was actually acquired
        }
    }
    if (taskList == null) {
        return;
    }
    handler.handleData(taskList);
}

2.2.1 ES and MySQL Data Synchronization
Message send records are stored in MySQL, while searchable logs are indexed in Elasticsearch (ES). Synchronization uses the update_time field as a version stamp; only records with a newer update_time than the one stored in ES are applied, ensuring consistency.
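The version check can be sketched as follows. SyncRecord and its field names are illustrative, assuming update_time is carried as a millisecond timestamp:

```java
class SyncRecord {
    final String id;
    final long updateTime;   // millisecond timestamp acting as the version stamp

    SyncRecord(String id, long updateTime) {
        this.id = id;
        this.updateTime = updateTime;
    }
}

class EsSynchronizer {
    /**
     * Apply the MySQL row to ES only if it is newer than what ES already holds;
     * a null esUpdateTime means the document does not exist in ES yet.
     */
    static boolean shouldApply(SyncRecord incoming, Long esUpdateTime) {
        return esUpdateTime == null || incoming.updateTime > esUpdateTime;
    }
}
```

Rejecting equal-or-older stamps makes the sync idempotent: replaying the same change stream any number of times leaves ES in the same state.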
3 Stability Assurance
3.1 Traffic Surge
Two‑level degradation is applied. For gradual spikes, the thread pool marks itself busy, and messages are off‑loaded to MQ for asynchronous persistence. For sudden spikes, Sentinel immediately routes excess traffic to MQ, where delayed consumption processes it when resources become available.
3.2 Resource Isolation for Problem Services
Problematic services are executed in a separate thread pool, preventing long‑running or failing calls from starving normal services.
3.3 Protection of Third‑Party Services
Third‑party integrations are wrapped with rate‑limiting and circuit‑breaker logic to avoid cascading failures.
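A minimal circuit-breaker sketch for such wrappers, assuming a consecutive-failure threshold and a fixed open interval; the class and both thresholds are illustrative, not a specific library's API:

```java
class CircuitBreaker {
    private final int failureThreshold;   // consecutive failures before opening
    private final long openMillis;        // how long to stay open before probing
    private int consecutiveFailures;
    private long openedAt = -1;           // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Fail fast while open; allow a probe once openMillis has elapsed. */
    synchronized boolean allowRequest(long nowMs) {
        if (openedAt < 0) {
            return true;                  // closed: pass the call through
        }
        if (nowMs - openedAt >= openMillis) {
            openedAt = -1;                // half-open: let traffic probe the service
            consecutiveFailures = 0;
            return true;
        }
        return false;                     // open: reject without calling out
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    synchronized void recordFailure(long nowMs) {
        if (++consecutiveFailures >= failureThreshold) {
            openedAt = nowMs;             // trip the breaker
        }
    }
}
```

Failing fast while the breaker is open is what stops a slow third party from tying up threads and cascading the failure upstream.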
3.4 Middleware Fault Tolerance
The design anticipates middleware outages (e.g., MQ downtime) by providing fallback mechanisms and graceful degradation.
3.5 Comprehensive Monitoring System
A full‑stack monitoring solution tracks request latency, error rates, thread‑pool usage, and resource saturation, enabling early detection and rapid response.
3.6 Dual‑Active Deployment and Elastic Scaling
Services are deployed across multiple data centers with active‑active configuration and can scale elastically based on real‑time metrics, balancing cost and performance.
4 Summary
Effective message notification systems require careful service partitioning, robust system design, and comprehensive stability measures such as traffic shaping, resource isolation, monitoring, and elastic scaling. No single solution fits all scenarios; architects must tailor designs to specific business requirements.