Mastering Log Aggregation: From LogID Generation to Powerful Analysis Tools
This article explores the challenges of log aggregation in micro‑service architectures. It introduces a globally unique log identifier (logid) and its required properties, compares various logid generation schemes, and presents end‑to‑end solutions for log distribution, aggregation, and analysis using custom tools such as ylog and watcher.
Preface
In the previous article we covered log printing and collection; this continuation focuses on log aggregation and analysis tools, discussing how to efficiently use logs in a micro‑service environment.
Why Log Aggregation Is Hard
Micro‑services mean a single request traverses many services, each possibly running multiple instances. To locate problems and perform offline analysis we need to correlate logs from all services involved in a request.
LogID: A Unique Request Identifier
At the gateway we generate a globally unique logid and propagate it to every downstream service. Each service includes the logid in its log entries, enabling us to group logs belonging to the same request.
Desired LogID Characteristics
Globally unique (or with an extremely low collision probability)
Distributed generation across multiple nodes
High generation throughput
Compact representation
Highly available and scalable
These requirements often conflict; for example, strict uniqueness can increase ID length and generation cost. In practice we relax absolute uniqueness and accept a negligible collision risk.
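To make "negligible collision risk" concrete, the birthday bound gives a quick estimate: among n randomly generated b‑bit IDs, the probability of at least one collision is roughly n²/2^(b+1). This is a back‑of‑envelope sketch added here for illustration, not a calculation from the original article:

```cpp
#include <cmath>

// Birthday-bound approximation: the probability of at least one collision
// among n randomly generated b-bit IDs is roughly n^2 / 2^(b+1).
// Valid only while the result is much smaller than 1.
double collision_probability(double n, int bits) {
    return (n * n) / std::pow(2.0, bits + 1);
}
```

For example, one million random 64‑bit IDs collide with probability on the order of 10⁻⁸, which is why relaxing strict uniqueness is usually acceptable for log correlation.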
LogID Generation Schemes
We evaluated several industry‑standard approaches:
Auto‑Increment (Database)
Simple to implement but suffers from poor scalability, single‑point failure, predictability, and longer IDs when sharding is required.
UUID
Generates 128‑bit IDs locally from a timestamp, a clock sequence, and the MAC address. Advantages: simplicity, no external service, good scalability. Disadvantages: long (128 bits), privacy concerns from embedding the MAC address, and larger storage.
Snowflake
Twitter’s 64‑bit ID composed of a millisecond timestamp (41 bits), a machine ID (10 bits), and a per‑millisecond counter (12 bits). It is short and high‑throughput, but requires a dedicated ID service (Twitter deployed it over Thrift) and careful handling of clock rollback.
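The bit layout described above can be sketched as a simple packing function. This is a hypothetical helper for illustration; field widths follow the 41/10/12 split named in the scheme:

```cpp
#include <cstdint>

// Pack a Snowflake-style 64-bit ID from its three components:
// 41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence.
// Each field is masked to its width before being shifted into place.
uint64_t snowflake_pack(uint64_t timestamp_ms, uint64_t machine_id, uint64_t sequence) {
    return ((timestamp_ms & ((1ULL << 41) - 1)) << 22)   // bits 22..62
         | ((machine_id   & ((1ULL << 10) - 1)) << 12)   // bits 12..21
         |  (sequence     & ((1ULL << 12) - 1));         // bits  0..11
}
```

Because the timestamp occupies the high bits, IDs generated later compare greater, which keeps them roughly time‑sortable.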
ElasticSearch Doc ID
Creates a 20‑byte Base64‑encoded ID from a millisecond timestamp, MAC address, and counter. It is fast and local but longer than Snowflake and inherits MAC‑address privacy issues.
MongoDB ObjectID
Combines a second‑precision timestamp, a 40‑bit random value, and a 24‑bit counter. It is locally generated, compact, and highly scalable, though the 96‑bit size is slightly larger than Snowflake.
Our Custom Scheme
We adopted the MongoDB approach, then applied MurmurHash to compress the 96‑bit value to 64 bits. This yields a short, locally generated ID with no external dependencies. Randomness is sourced from /dev/urandom via C++'s std::random_device. To handle clock rollback, we compare the current timestamp with the last one used and regenerate the random node identifier if the clock has moved backwards.
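A minimal sketch of this scheme follows. The generator class and its field mixing are hypothetical, and MurmurHash3's 64‑bit finalizer (fmix64) stands in for the full MurmurHash pass mentioned above:

```cpp
#include <cstdint>
#include <random>

// MurmurHash3 64-bit finalizer (fmix64) -- used here as a stand-in
// for the full MurmurHash compression step described in the article.
uint64_t fmix64(uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical logid generator: second-precision timestamp + random node
// value + counter (the MongoDB ObjectID ingredients), hashed to 64 bits.
// On clock rollback, a fresh node value avoids repeating earlier inputs.
struct LogIdGenerator {
    uint64_t last_ts = 0;
    uint64_t node;       // random node identifier, seeded from /dev/urandom
    uint32_t counter = 0;

    LogIdGenerator() : node(std::random_device{}()) {}

    uint64_t next(uint64_t now_sec) {
        if (now_sec < last_ts) {
            node = std::random_device{}();  // clock rolled back: re-randomize
        }
        last_ts = now_sec;
        return fmix64((now_sec << 32) ^ node ^ ++counter);
    }
};
```

The counter ensures that two calls within the same second still mix different inputs, so consecutive IDs differ even when the timestamp does not.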
Log Distribution
All services send JSON‑formatted logs to Kafka via Filebeat. Our Dispatcher service subscribes to Kafka, extracts the logid, computes logid % partition_count, and forwards the log to the corresponding partition, ensuring that logs of the same request end up together.
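The routing rule itself is a one‑liner; the point is that it is deterministic, so every log line carrying the same logid always lands on the same partition. A hypothetical helper mirroring the rule above:

```cpp
#include <cstdint>

// Route a log entry to a Kafka partition by logid. Deterministic:
// the same logid always maps to the same partition, so all logs of
// one request are consumed together by a single Joiner instance.
uint32_t choose_partition(uint64_t logid, uint32_t partition_count) {
    return static_cast<uint32_t>(logid % partition_count);
}
```

One design consequence worth noting: changing partition_count remaps logids, so a resize mid‑stream can split a request's logs across two partitions until the old data drains.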
Log Aggregation
The Joiner service consumes the partitioned logs, buffers them in memory, and merges logs sharing the same logid into a single JSON object. It respects a configurable timeout and buffer size; if logs arrive late, partial aggregates are emitted and an alarm is raised, signaling that the service may need to be scaled.
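The buffering-and-timeout logic can be sketched as below. This is a simplified, hypothetical model (real JSON merging and buffer-size limits are omitted); it shows only the core idea of grouping by logid and flushing expired aggregates, including partial ones:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical Joiner sketch: group incoming log lines by logid, then
// flush an aggregate once its first line is older than timeout_sec --
// even if more lines for that request might still arrive (a partial
// aggregate, which in the real service would also raise an alarm).
struct Joiner {
    struct Entry { uint64_t first_seen; std::vector<std::string> lines; };
    std::unordered_map<uint64_t, Entry> buffer;
    uint64_t timeout_sec;

    explicit Joiner(uint64_t t) : timeout_sec(t) {}

    void add(uint64_t logid, const std::string& line, uint64_t now) {
        auto& e = buffer[logid];
        if (e.lines.empty()) e.first_seen = now;
        e.lines.push_back(line);
    }

    // Collect and remove every aggregate whose window has expired.
    std::vector<std::vector<std::string>> flush(uint64_t now) {
        std::vector<std::vector<std::string>> out;
        for (auto it = buffer.begin(); it != buffer.end();) {
            if (now - it->second.first_seen >= timeout_sec) {
                out.push_back(std::move(it->second.lines));
                it = buffer.erase(it);
            } else {
                ++it;
            }
        }
        return out;
    }
};
```

The timeout trades completeness for latency: a longer window catches more straggling logs per request but holds more memory and delays downstream analysis.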
Analysis Tools
After aggregation, logs are ready for analysis. We built ylog , a command‑line tool that reads aggregated JSON logs from Kafka and provides common operations such as time‑range selection, regex filtering, field extraction, aggregation (count, avg, percentile, etc.), pretty printing, limiting, and sampling.
Typical commands include:
ylog cat log – tail‑like streaming
ylog cat --offset -3m log – last 3 minutes
ylog cat --match search --match total>100 log – filter by keyword and numeric field
ylog cat --sample 0.1 --offset 5m --match query=~^abc --fields total --aggr avg log – sampled average
Watcher: Automated Metric Monitoring
Watcher consumes logs from Kafka over a sliding window, extracts business metrics, computes aggregates (sum, min, max, avg) and can run custom Lua scripts for complex calculations. When a metric exceeds configured thresholds (absolute, relative, or trend‑based), it triggers alerts and writes the metrics to ElasticSearch for Kibana visualization.
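The sliding‑window aggregation at the heart of this can be sketched as follows. The class and its API are hypothetical; it shows only a window average checked against an absolute threshold, one of the alert rules mentioned above:

```cpp
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical sliding-window metric: keep (timestamp, value) samples
// no older than window_sec, and flag when the window average crosses
// an absolute threshold -- one of the watcher alert rules above.
struct SlidingWindowMetric {
    std::deque<std::pair<uint64_t, double>> samples;
    uint64_t window_sec;

    explicit SlidingWindowMetric(uint64_t w) : window_sec(w) {}

    void record(uint64_t now, double value) {
        samples.emplace_back(now, value);
        // Evict samples that have slid out of the window.
        while (!samples.empty() && now - samples.front().first >= window_sec)
            samples.pop_front();
    }

    double average() const {
        if (samples.empty()) return 0.0;
        double sum = 0.0;
        for (const auto& s : samples) sum += s.second;
        return sum / samples.size();
    }

    bool exceeds(double threshold) const { return average() > threshold; }
};
```

Relative and trend‑based rules would compare the current window against a previous one instead of a fixed constant, but the eviction mechanics stay the same.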
Conclusion
Effective log usage requires standardized, machine‑readable log formats, asynchronous collection (Filebeat → Kafka), a robust logid for request correlation, and end‑to‑end pipelines for distribution, aggregation, and analysis. Our tools—Dispatcher, Joiner, ylog, and watcher—provide a complete solution that reduces operational overhead, improves debugging speed, and enables real‑time metric monitoring.
Yuewen Technology
The Yuewen Group tech team supports and powers services like QQ Reading, Qidian Books, and Hongxiu Reading. This account targets internet developers, sharing high‑quality original technical content. Follow us for the latest Yuewen tech updates.