Mastering Log Aggregation: From LogID Generation to Powerful Analysis Tools
This article explores the challenges of log aggregation in micro‑service architectures. It introduces a globally unique log identifier (logid) and its required properties, compares various logid generation schemes, and presents end‑to‑end solutions for log distribution, aggregation, and analysis using custom tools such as ylog and watcher.
Preface
In the previous article we covered log printing and collection; this continuation focuses on log aggregation and analysis tools, discussing how to efficiently use logs in a micro‑service environment.
Why Log Aggregation Is Hard
Micro‑services mean a single request traverses many services, each possibly running multiple instances. To locate problems and perform offline analysis we need to correlate logs from all services involved in a request.
LogID: A Unique Request Identifier
At the gateway we generate a globally unique logid and propagate it to every downstream service. Each service includes the logid in its log entries, enabling us to group logs belonging to the same request.
Desired LogID Characteristics
Globally unique (or with an extremely low collision probability)
Distributed generation across multiple nodes
High generation throughput
Compact representation
Highly available and scalable
These requirements often conflict; for example, strict uniqueness can increase ID length and generation cost. In practice we relax absolute uniqueness and accept a negligible collision risk.
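To make "negligible collision risk" concrete, the birthday bound gives a quick estimate: among n randomly generated b‑bit IDs, the probability of at least one collision is roughly n²/2^(b+1). This is a back‑of‑envelope sketch added here for illustration, not a calculation from the original article:

```cpp
#include <cmath>

// Birthday-bound approximation: the probability of at least one collision
// among n randomly generated b-bit IDs is roughly n^2 / 2^(b+1).
// Valid only while the result is much smaller than 1.
double collision_probability(double n, int bits) {
    return (n * n) / std::pow(2.0, bits + 1);
}
```

For example, one million random 64‑bit IDs collide with probability on the order of 10⁻⁸, which is why relaxing strict uniqueness is usually acceptable for log correlation.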
LogID Generation Schemes
We evaluated several industry‑standard approaches:
Auto‑Increment (Database)
Simple to implement but suffers from poor scalability, single‑point failure, predictability, and longer IDs when sharding is required.
UUID
Generates 128‑bit IDs locally from a timestamp, a clock sequence, and the MAC address. Advantages: simplicity, no external service, good scalability. Disadvantages: long (128 bits), privacy concerns from embedding the MAC address, and larger storage.
Snowflake
Twitter’s 64‑bit ID composed of a millisecond timestamp (41 bits), a machine ID (10 bits), and a per‑millisecond counter (12 bits). It is short and high‑throughput, but requires a dedicated ID service (Twitter deployed it over Thrift) and careful handling of clock rollback.
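The bit layout described above can be sketched as a simple packing function. This is a hypothetical helper for illustration; field widths follow the 41/10/12 split named in the scheme:

```cpp
#include <cstdint>

// Pack a Snowflake-style 64-bit ID from its three components:
// 41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence.
// Each field is masked to its width before being shifted into place.
uint64_t snowflake_pack(uint64_t timestamp_ms, uint64_t machine_id, uint64_t sequence) {
    return ((timestamp_ms & ((1ULL << 41) - 1)) << 22)   // bits 22..62
         | ((machine_id   & ((1ULL << 10) - 1)) << 12)   // bits 12..21
         |  (sequence     & ((1ULL << 12) - 1));         // bits  0..11
}
```

Because the timestamp occupies the high bits, IDs generated later compare greater, which keeps them roughly time‑sortable.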
ElasticSearch Doc ID
Creates a 20‑byte Base64‑encoded ID from a millisecond timestamp, MAC address, and counter. It is fast and local but longer than Snowflake and inherits MAC‑address privacy issues.
MongoDB ObjectID
Combines a second‑precision timestamp, a 40‑bit random value, and a 24‑bit counter. It is locally generated, compact, and highly scalable, though the 96‑bit size is slightly larger than Snowflake.
Our Custom Scheme
We adopted the MongoDB approach, then applied MurmurHash to compress the 96‑bit value to 64 bits. This yields a short, locally generated ID with no external dependencies. Randomness is sourced from /dev/urandom via C++'s std::random_device. To handle clock rollback, we compare the current timestamp with the last one used and regenerate the random node identifier if the clock has moved backwards.
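A minimal sketch of this scheme follows. The generator class and its field mixing are hypothetical, and MurmurHash3's 64‑bit finalizer (fmix64) stands in for the full MurmurHash pass mentioned above:

```cpp
#include <cstdint>
#include <random>

// MurmurHash3 64-bit finalizer (fmix64) -- used here as a stand-in
// for the full MurmurHash compression step described in the article.
uint64_t fmix64(uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical logid generator: second-precision timestamp + random node
// value + counter (the MongoDB ObjectID ingredients), hashed to 64 bits.
// On clock rollback, a fresh node value avoids repeating earlier inputs.
struct LogIdGenerator {
    uint64_t last_ts = 0;
    uint64_t node;       // random node identifier, seeded from /dev/urandom
    uint32_t counter = 0;

    LogIdGenerator() : node(std::random_device{}()) {}

    uint64_t next(uint64_t now_sec) {
        if (now_sec < last_ts) {
            node = std::random_device{}();  // clock rolled back: re-randomize
        }
        last_ts = now_sec;
        return fmix64((now_sec << 32) ^ node ^ ++counter);
    }
};
```

The counter ensures that two calls within the same second still mix different inputs, so consecutive IDs differ even when the timestamp does not.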
Log Distribution
All services send JSON‑formatted logs to Kafka via Filebeat. Our Dispatcher service subscribes to Kafka, extracts the logid, computes logid % partition_count, and forwards the log to the corresponding partition, ensuring that logs of the same request end up together.
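The routing rule itself is a one‑liner; the point is that it is deterministic, so every log line carrying the same logid always lands on the same partition. A hypothetical helper mirroring the rule above:

```cpp
#include <cstdint>

// Route a log entry to a Kafka partition by logid. Deterministic:
// the same logid always maps to the same partition, so all logs of
// one request are consumed together by a single Joiner instance.
uint32_t choose_partition(uint64_t logid, uint32_t partition_count) {
    return static_cast<uint32_t>(logid % partition_count);
}
```

One design consequence worth noting: changing partition_count remaps logids, so a resize mid‑stream can split a request's logs across two partitions until the old data drains.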
Log Aggregation
The Joiner service consumes the partitioned logs, buffers them in memory, and merges logs sharing the same logid into a single JSON object. It respects a configurable timeout and buffer size; if logs arrive late, partial aggregates are emitted and an alarm is raised, signaling that the service may need to be scaled.
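The buffering-and-timeout logic can be sketched as below. This is a simplified, hypothetical model (real JSON merging and buffer-size limits are omitted); it shows only the core idea of grouping by logid and flushing expired aggregates, including partial ones:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical Joiner sketch: group incoming log lines by logid, then
// flush an aggregate once its first line is older than timeout_sec --
// even if more lines for that request might still arrive (a partial
// aggregate, which in the real service would also raise an alarm).
struct Joiner {
    struct Entry { uint64_t first_seen; std::vector<std::string> lines; };
    std::unordered_map<uint64_t, Entry> buffer;
    uint64_t timeout_sec;

    explicit Joiner(uint64_t t) : timeout_sec(t) {}

    void add(uint64_t logid, const std::string& line, uint64_t now) {
        auto& e = buffer[logid];
        if (e.lines.empty()) e.first_seen = now;
        e.lines.push_back(line);
    }

    // Collect and remove every aggregate whose window has expired.
    std::vector<std::vector<std::string>> flush(uint64_t now) {
        std::vector<std::vector<std::string>> out;
        for (auto it = buffer.begin(); it != buffer.end();) {
            if (now - it->second.first_seen >= timeout_sec) {
                out.push_back(std::move(it->second.lines));
                it = buffer.erase(it);
            } else {
                ++it;
            }
        }
        return out;
    }
};
```

The timeout trades completeness for latency: a longer window catches more straggling logs per request but holds more memory and delays downstream analysis.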
Analysis Tools
After aggregation, logs are ready for analysis. We built ylog , a command‑line tool that reads aggregated JSON logs from Kafka and provides common operations such as time‑range selection, regex filtering, field extraction, aggregation (count, avg, percentile, etc.), pretty printing, limiting, and sampling.
Typical commands include:
ylog cat log – tail‑like streaming
ylog cat --offset -3m log – last 3 minutes
ylog cat --match search --match total>100 log – filter by keyword and numeric field
ylog cat --sample 0.1 --offset 5m --match query=~^abc --fields total --aggr avg log – sampled average
Watcher: Automated Metric Monitoring
Watcher consumes logs from Kafka over a sliding window, extracts business metrics, computes aggregates (sum, min, max, avg) and can run custom Lua scripts for complex calculations. When a metric exceeds configured thresholds (absolute, relative, or trend‑based), it triggers alerts and writes the metrics to ElasticSearch for Kibana visualization.
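The sliding‑window aggregation at the heart of this can be sketched as follows. The class and its API are hypothetical; it shows only a window average checked against an absolute threshold, one of the alert rules mentioned above:

```cpp
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical sliding-window metric: keep (timestamp, value) samples
// no older than window_sec, and flag when the window average crosses
// an absolute threshold -- one of the watcher alert rules above.
struct SlidingWindowMetric {
    std::deque<std::pair<uint64_t, double>> samples;
    uint64_t window_sec;

    explicit SlidingWindowMetric(uint64_t w) : window_sec(w) {}

    void record(uint64_t now, double value) {
        samples.emplace_back(now, value);
        // Evict samples that have slid out of the window.
        while (!samples.empty() && now - samples.front().first >= window_sec)
            samples.pop_front();
    }

    double average() const {
        if (samples.empty()) return 0.0;
        double sum = 0.0;
        for (const auto& s : samples) sum += s.second;
        return sum / samples.size();
    }

    bool exceeds(double threshold) const { return average() > threshold; }
};
```

Relative and trend‑based rules would compare the current window against a previous one instead of a fixed constant, but the eviction mechanics stay the same.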
Conclusion
Effective log usage requires standardized, machine‑readable log formats, asynchronous collection (Filebeat → Kafka), a robust logid for request correlation, and end‑to‑end pipelines for distribution, aggregation, and analysis. Our tools—Dispatcher, Joiner, ylog, and watcher—provide a complete solution that reduces operational overhead, improves debugging speed, and enables real‑time metric monitoring.
Yuewen Technology
The Yuewen Group tech team supports and powers services like QQ Reading, Qidian Books, and Hongxiu Reading. This account targets internet developers, sharing high‑quality original technical content. Follow us for the latest Yuewen tech updates.