
Design and Implementation of a Business‑Facing Message Center Management Platform

The platform centralizes message‑center management for e‑commerce by adding end‑to‑end tracing, real‑time metrics, and unified logging. Business users can query message links, view dashboards, and automate retries and approvals, which sharply reduces manual monitoring, lifts completion rates above 90%, and paves the way for cost‑optimized, data‑driven operations.

NetEase Yanxuan Technology Product Team

The Message Center is a core component for e‑commerce scenarios. Since its launch in Yanxuan, it has been integrated with more than 200 services and over 1,500 message types, covering all business domains such as infrastructure, supply chain, distribution, main site, order processing, and data algorithms.

Pain points identified during development and operation include low development efficiency caused by the lack of end‑to‑end message tracing, producers' inability to detect consumer exceptions, and high operational costs from manual monitoring and retry handling.

To address these issues, a management platform was built for business users, providing complete message‑link query capabilities, real‑time statistical analysis, and automated operational functions.

The overall design improves observability by integrating three pillars: Tracing, Metrics, and Logging. Tracing records the full lifecycle of a message (keyed by traceId and messageId) using the existing APM system. Metrics aggregates performance indicators (send/consume latency, throughput, failure rates) via a Flink job that writes to the NetEase Time‑Series Database (NTSDB). Logging captures detailed payload and metadata through the Yanxuan log platform.

The data‑flow architecture consists of four layers:

Data source layer – application business logs, access logs, and gateway logs.

Data collection layer – log platform agents ingest logs.

Data analysis layer – Flink jobs compute real‑time metrics; Hive stores offline aggregates.

Data storage layer – Elasticsearch for trace queries, Hive for batch data, NTSDB for time‑series metrics, and a relational DB for platform metadata.
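The real‑time metrics step in the analysis layer can be pictured as a tumbling‑window aggregation. The sketch below is plain Python, not Flink code, and the ConsumeEvent schema, the field names, and the one‑minute window are assumptions made for illustration; the article does not describe the actual job.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumeEvent:
    """One consumption record parsed from a log line (hypothetical schema)."""
    topic: str
    timestamp_ms: int   # event time in epoch milliseconds
    latency_ms: float   # time from send to consume
    success: bool


def aggregate_by_window(events, window_ms=60_000):
    """Group events into tumbling windows per topic and compute metrics."""
    buckets = defaultdict(list)
    for e in events:
        # Align each event to the start of the window that contains it.
        window_start = e.timestamp_ms - (e.timestamp_ms % window_ms)
        buckets[(e.topic, window_start)].append(e)

    points = []
    for (topic, window_start), group in sorted(buckets.items()):
        failures = sum(1 for e in group if not e.success)
        points.append({
            "topic": topic,
            "window_start_ms": window_start,
            "count": len(group),  # throughput within the window
            "failure_rate": failures / len(group),
            "avg_latency_ms": sum(e.latency_ms for e in group) / len(group),
        })
    return points
```

Each returned dictionary corresponds to one data point that a job of this kind might write to a time‑series store, tagged by topic and window start.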

Message‑link nodes (e.g., message_received_success, mq_received_failed, polled, consumed) are defined to standardize the tracing information.
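As a rough illustration of how standardized nodes make a message link queryable, the sketch below models the four node names mentioned above as an enum and orders one message's events by time. The TraceEvent fields, the service attribute, and the "_failed" suffix convention are assumptions; the article does not list the full node set or the event schema.

```python
from dataclasses import dataclass
from enum import Enum


class LinkNode(Enum):
    # Only the node names given in the article; the complete set is not listed there.
    MESSAGE_RECEIVED_SUCCESS = "message_received_success"
    MQ_RECEIVED_FAILED = "mq_received_failed"
    POLLED = "polled"
    CONSUMED = "consumed"


@dataclass(frozen=True)
class TraceEvent:
    trace_id: str
    message_id: str
    node: LinkNode
    timestamp_ms: int
    service: str  # hypothetical: the service that emitted the event


def build_link(events, message_id):
    """Return the events of one message in time order."""
    own = [e for e in events if e.message_id == message_id]
    return sorted(own, key=lambda e: e.timestamp_ms)


def first_failure(link):
    """Return the first event whose node name ends in '_failed', or None."""
    for e in link:
        if e.node.value.endswith("_failed"):
            return e
    return None
```

With a fixed vocabulary of nodes, locating where a message stalled reduces to scanning its ordered link for the first failure node.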

Platform features include:

Message‑link query by messageId or traceId with topology and log views.

Statistical dashboards for production/consumption volume, latency, failure rates, and message size, supporting both business‑side and system‑admin perspectives.

Metadata management for topics, producers, subscribers, and retry policies.

Self‑service operations allowing authorized users to re‑push messages to all or specific subscribers, with full audit trails.

Automated workflow integration with Yanxuan DevOps (Tianshu) for message publishing, subscription, and de‑subscription lifecycle, including automated approval tickets.
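The self‑service re‑push feature combines three concerns: authorization, subscriber selection, and auditing. A minimal sketch of how they might fit together follows; RepushService, AuditEntry, and every field name are hypothetical, and real delivery to subscribers is deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    operator: str
    topic: str
    message_id: str
    subscribers: tuple
    at: str  # ISO 8601 timestamp in UTC


@dataclass
class RepushService:
    subscribers_by_topic: dict  # topic -> list of subscriber names
    authorized: set             # operators allowed to re-push
    audit_log: list = field(default_factory=list)

    def repush(self, operator, topic, message_id, targets=None):
        """Re-push to all subscribers (targets=None) or to a chosen subset."""
        if operator not in self.authorized:
            raise PermissionError(f"{operator} may not re-push messages")
        known = self.subscribers_by_topic.get(topic, [])
        if targets is None:
            chosen = list(known)
        else:
            unknown = set(targets) - set(known)
            if unknown:
                raise ValueError(f"unknown subscribers: {sorted(unknown)}")
            chosen = [s for s in known if s in targets]
        # Actual delivery would happen here; this sketch only records the action.
        self.audit_log.append(AuditEntry(
            operator, topic, message_id, tuple(chosen),
            datetime.now(timezone.utc).isoformat(),
        ))
        return chosen
```

Rejecting unauthorized operators before anything is recorded keeps the audit log limited to re‑pushes that actually went ahead; whether denied attempts should also be logged is a design choice the article does not address.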

The platform has received positive feedback: over 200 services now use it, technical consultation requests dropped from more than 50 per week to fewer than 10, and message‑order completion rates exceed 90%.

Looking ahead, the platform aims to evolve toward a FinOps‑style cost‑visibility model (cost display → cost analysis → cost optimization), extend data‑driven alerting beyond failure and latency, and provide automated optimization recommendations that are pushed to business owners and tracked to closure.
