Design and Implementation of Meituan's Logan Real-Time Log System
This article describes how Meituan built Logan, a high‑performance, end‑to‑end real‑time logging platform for mobile, web, mini‑programs and IoT, covering its background, architecture, data collection, processing, consumption, monitoring, deployment strategies, achieved results and future roadmap.
1 Background
Logan is Meituan's unified log service for terminals, supporting mobile apps, web, mini‑programs and IoT. It provides log collection, storage, upload, query and analysis, helping developers locate issues quickly and preventing log loss and storage overflow.
1.1 Logan Overview
Logan offers high‑performance, secure, loss‑free logging and is one of the earliest open‑source large‑scale front‑end logging systems.
1.2 Workflow
Logs are actively reported via HTTPS, encrypted, stored in object storage, then downloaded, decrypted, parsed and delivered to the log platform for querying and analysis.
1.3 Why Real‑Time Logging?
Traditional "store‑locally‑then‑report" approaches cause delayed troubleshooting, lack real‑time alerts, and make full‑link tracing difficult, prompting the need for a unified real‑time solution.
1.4 What is Logan Real‑Time Logging?
It delivers high‑scalability, high‑performance, and high‑reliability real‑time log services, covering collection, upload, processing, consumption, delivery, query and analysis.
2 Design and Implementation
2.1 Overall Architecture
The system consists of five layers: Collection End, Access Layer, Processing Layer, Consumption Layer, and Log Platform.
2.2 Collection End
A cross‑platform SDK collects logs, encrypts them with ECDH + AES, compresses, aggregates and reports them. It includes configuration management, encryption, local disk caching for failed uploads, and queue management to avoid memory bloat.
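The hybrid scheme described above can be sketched with the standard JDK crypto APIs: the two sides run an ECDH key agreement, derive a symmetric key from the shared secret, and use it for AES encryption of the log payload. This is a minimal illustration, not Logan's actual implementation; the class and field names are hypothetical, a fixed IV is used only for the demo, and a production system would derive the AES key through a proper KDF rather than truncation.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.*;
import java.util.Arrays;

public class LoganCrypto {
    // Derive a shared AES key via ECDH key agreement.
    public static byte[] deriveAesKey(PrivateKey ownPriv, PublicKey peerPub) throws Exception {
        KeyAgreement ka = KeyAgreement.getInstance("ECDH");
        ka.init(ownPriv);
        ka.doPhase(peerPub, true);
        // Truncated to 16 bytes for AES-128; a real system would use a KDF here.
        return Arrays.copyOf(ka.generateSecret(), 16);
    }

    public static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
        return c.doFinal(plain);
    }

    public static byte[] decrypt(byte[] key, byte[] iv, byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator g = KeyPairGenerator.getInstance("EC");
        g.initialize(256);
        KeyPair client = g.generateKeyPair(); // SDK side
        KeyPair server = g.generateKeyPair(); // access-layer side
        // Both sides derive the same symmetric key from their own private key
        // and the peer's public key.
        byte[] clientKey = deriveAesKey(client.getPrivate(), server.getPublic());
        byte[] serverKey = deriveAesKey(server.getPrivate(), client.getPublic());
        byte[] iv = new byte[12]; // demo only; must be random per message
        byte[] log = "{\"tag\":\"net\",\"msg\":\"timeout\"}".getBytes(StandardCharsets.UTF_8);
        byte[] dec = decrypt(serverKey, iv, encrypt(clientKey, iv, log));
        System.out.println(Arrays.equals(dec, log)); // prints "true"
    }
}
```

The asymmetric step means the client never ships a long‑lived symmetric key, while AES keeps per‑message encryption cheap enough for a mobile SDK.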
2.3 Access Layer
The access layer provides a public HTTPS endpoint, handles high concurrency, low latency, and forwards logs to a Kafka topic for downstream processing.
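The receive‑and‑forward shape of the access layer can be sketched with the JDK's built‑in HTTP server. This is an assumption‑laden stand‑in, not Meituan's service: the `/logan/report` path and topic name are invented, plain HTTP replaces HTTPS, and the `forward` callback stands in for a real `KafkaProducer.send` call.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

public class LoganAccess {
    // forward(topic, record) stands in for KafkaProducer.send in the real layer.
    public static HttpServer start(int port, BiConsumer<String, String> forward) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/logan/report", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            // Forward the raw (still encrypted) payload to the processing topic.
            forward.accept("logan-raw-logs", new String(body, StandardCharsets.UTF_8));
            byte[] ok = "{\"code\":0}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, ok.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(ok); }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        List<String> topicBuffer = new CopyOnWriteArrayList<>();
        HttpServer s = start(0, (topic, record) -> topicBuffer.add(record));
        int port = s.getAddress().getPort();
        // An SDK client POSTs its encrypted log batch to the report endpoint.
        java.net.http.HttpClient client = java.net.http.HttpClient.newHttpClient();
        var req = java.net.http.HttpRequest.newBuilder()
                .uri(java.net.URI.create("http://localhost:" + port + "/logan/report"))
                .POST(java.net.http.HttpRequest.BodyPublishers.ofString("encrypted-batch"))
                .build();
        client.send(req, java.net.http.HttpResponse.BodyHandlers.ofString());
        s.stop(0);
        System.out.println(topicBuffer); // prints "[encrypted-batch]"
    }
}
```

Decoupling the endpoint from downstream processing through a Kafka topic is what lets the access layer stay thin and absorb concurrency spikes.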
2.4 Processing Layer
Three candidates (Java, Storm, Flink) were evaluated; Flink was chosen for its low latency and high throughput. The Flink job parses, decrypts, splits logs by service dimension and applies custom transformations.
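The split‑by‑service step can be illustrated as a pure function over decrypted records; in the actual Flink job this logic would sit behind a `keyBy` on the service field feeding per‑service outputs. The `service|level|message` wire format and all names below are hypothetical, chosen only to make the grouping concrete.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LogSplitter {
    public record LogRecord(String service, String level, String message) {}

    // Parse a hypothetical "service|level|message" line.
    public static LogRecord parse(String line) {
        String[] parts = line.split("\\|", 3);
        return new LogRecord(parts[0], parts[1], parts[2]);
    }

    // Group a decrypted batch by service dimension; the Flink equivalent is
    // keyBy(LogRecord::service) routing each key to its downstream topic.
    public static Map<String, List<LogRecord>> splitByService(List<String> lines) {
        return lines.stream()
                .map(LogSplitter::parse)
                .collect(Collectors.groupingBy(LogRecord::service));
    }
}
```

Keeping the parse/decrypt/split stages as side‑effect‑free transformations is also what makes them easy to unit‑test outside the streaming runtime.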
2.5 Consumption Layer
Processed logs are delivered to Kafka streams, enabling use cases such as full‑link tracing, metric aggregation with alerts, and offline analysis via Hive.
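The metric‑aggregation‑with‑alerts use case reduces, per aggregation window, to a check like the one below. This is a deliberately tiny sketch: the real pipeline computes such ratios continuously over the Kafka stream, and the threshold and level names here are illustrative assumptions.

```java
import java.util.List;

public class ErrorRateAlert {
    // Evaluate one aggregation window: alert when the fraction of ERROR-level
    // records in the window exceeds the threshold. An empty window never alerts.
    public static boolean shouldAlert(List<String> levels, double threshold) {
        if (levels.isEmpty()) return false;
        long errors = levels.stream().filter("ERROR"::equals).count();
        return (double) errors / levels.size() > threshold;
    }
}
```

A ratio rather than an absolute count keeps the alert meaningful as traffic volume fluctuates across the day.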
2.6 Log Platform
The platform, built on Elasticsearch, offers multi‑dimensional search (user ID, tags, keywords) and supports plug‑in adapters for other storage engines.
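The three search dimensions combine as AND‑filters, which the platform would express as an Elasticsearch bool query (term filters on user ID and tags, a match query on the message text). The in‑memory version below only illustrates the query semantics; the `Entry` shape and field names are assumptions, not Logan's schema.

```java
import java.util.List;
import java.util.stream.Collectors;

public class LogSearch {
    public record Entry(String userId, List<String> tags, String text) {}

    // AND-combine the dimensions; a null argument means "don't filter on it".
    public static List<Entry> search(List<Entry> logs, String userId, String tag, String keyword) {
        return logs.stream()
                .filter(e -> userId == null || e.userId().equals(userId))
                .filter(e -> tag == null || e.tags().contains(tag))
                .filter(e -> keyword == null || e.text().contains(keyword))
                .collect(Collectors.toList());
    }
}
```

Hiding the storage engine behind this query interface is what allows the plug‑in adapters mentioned above to swap Elasticsearch for another backend.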
3 Stability Assurance
3.1 Core Monitoring
Key SLA metrics (availability, latency, error rate) are monitored, along with a dashboard covering report success rate, domain QPS, job throughput, and alerting for anomalies.
3.2 Blue‑Green Deployment
A blue‑green strategy runs a new job alongside the existing one, switches traffic after validation, and ensures deployment failures do not cause data loss or latency spikes.
4 Achievements
By Q3 2022, Logan was adopted by over 20 business systems (e.g., Meituan Mini‑Program, SaaS platforms). It reduced average issue‑location time from 10 minutes to under 3 minutes and saved 10‑15 minutes per incident during internal testing.
5 Future Plans
Upcoming work includes expanding terminal support; adding log cleaning, metric statistics, alerting and full‑link tracing; scaling to millions of QPS; raising the upload success rate to 99.9%; and implementing rate‑limiting, circuit‑breaker and incident‑response mechanisms.
High Availability Architecture
Official account for High Availability Architecture.