Mastering Efficient Log Utilization: Best Practices for Logging and Collection
This article outlines how to design, print, collect, and manage online service logs efficiently—covering log levels, key information, formatting, rolling, local vs. remote storage, real‑time collection, and tool selection—to turn logs into a valuable debugging and analytics asset.
Background
Online system logs record detailed information such as request parameters, returned results, and error messages, which help debug issues, trigger alerts, replay logs for testing, and analyze user behavior for reporting and optimization. Efficient log utilization is therefore essential.
Effective Log Utilization
Log Printing: Services follow a logging standard to write logs to local disk.
Log Collection: Logs from all services are aggregated centrally.
Log Aggregation: Related logs are grouped together.
Log Analysis: Logs are analyzed to locate issues, trigger alerts, mine user behavior, and replay logs in test environments.
The Yuewen Group search team has implemented the above steps; the original article illustrates them with an architecture diagram (not reproduced here).
Log Printing
Parsing logs is difficult when logging is inconsistent. To address this, the team created a logging specification covering log levels, key information, a parsable format, and time- and size-based rolling.
Log Levels
Four levels are defined (from lowest to highest):
DEBUG: Records key execution steps for debugging.
INFO: Records request context and service responses.
WARN: Records non-fatal errors that do not affect subsequent requests.
ERROR: Records fatal errors that terminate processing of the current request.
Configuration can set the minimum level to print; typically INFO is used in production.
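The article does not name the team's logging stack, but the minimum-level behavior can be sketched with Python's standard `logging` module: once the logger's level is set to INFO, DEBUG records are dropped before they reach any handler.

```python
import logging

# Configure the root logger; records below INFO are dropped.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

log = logging.getLogger("search-service")  # illustrative logger name
log.debug("cache lookup details")   # suppressed: below the INFO threshold
log.info("request handled")         # printed
log.warning("upstream retry")       # printed
```

Switching `level=logging.DEBUG` back on for a single troubleshooting session is a one-line change, which matches the "enable DEBUG only when needed" policy below.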
Key Information
Logs should record only essential data, such as service name, request timestamp, log level, thread ID, filename/function/line number, request IP/port, processing time, log ID, request/response payloads, and detailed error messages for WARN/ERROR logs.
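As a hypothetical illustration of one such entry (the field names and sample values below are illustrative, not the team's actual schema), the listed fields map naturally onto a flat dictionary:

```python
import json
import os
import threading
import time

def build_log_entry(level, request, response, cost_ms, log_id):
    """Assemble one structured log entry carrying the key fields listed above."""
    return {
        "service": "search-api",            # service name (illustrative)
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "level": level,
        "thread_id": threading.get_ident(),
        "pid": os.getpid(),
        "caller": "handler.py:search:42",   # filename/function/line (illustrative)
        "client": "10.0.0.8:52114",         # request IP/port (illustrative)
        "cost_ms": cost_ms,
        "log_id": log_id,
        "request": request,
        "response": response,
    }

entry = build_log_entry("INFO", {"q": "novel"}, {"hits": 12}, 8.4, "a1b2c3")
print(json.dumps(entry, ensure_ascii=False))
```

For WARN/ERROR entries, an extra `error` field with the detailed message would be attached to the same structure.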
Log Frequency
To simplify statistics, the policy is:
One INFO log per request, containing request and response details.
WARN/ERROR logs are printed as needed.
DEBUG logs are disabled by default and enabled only for troubleshooting.
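The one-INFO-per-request rule can be enforced at a single choke point. A minimal sketch (assuming a Python handler function; the decorator and names are illustrative) wraps each handler so the request, response, and processing time land in exactly one INFO line:

```python
import functools
import json
import logging
import time

log = logging.getLogger("search-service")  # illustrative logger name

def one_info_per_request(handler):
    """Emit exactly one INFO line per request, carrying request and response.
    WARN/ERROR lines are still printed inside the handler as needed."""
    @functools.wraps(handler)
    def wrapper(request):
        start = time.time()
        response = handler(request)
        log.info(json.dumps({
            "request": request,
            "response": response,
            "cost_ms": round((time.time() - start) * 1000, 2),
        }, ensure_ascii=False))
        return response
    return wrapper

@one_info_per_request
def search(request):
    return {"hits": []}
```

Because every request produces exactly one INFO line, counting requests or computing error rates downstream becomes a simple line count.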
Easy Parsing
Logs should be easy for both code and humans to parse. JSON is chosen as the format, printed as a single line per log entry to balance readability and machine parsing.
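A minimal sketch of single-line JSON output, again assuming Python's standard `logging` module: a custom `Formatter` serializes each record to one JSON object per line.

```python
import json
import logging

class JsonLineFormatter(logging.Formatter):
    """Render each log record as one JSON object on a single line."""
    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # ensure_ascii=False keeps non-ASCII text human-readable.
        return json.dumps(entry, ensure_ascii=False)

handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter())
logging.getLogger("search-service").addHandler(handler)
```

One entry per line means collectors and `grep` alike can treat the file as newline-delimited JSON, while the key names keep each line readable to a human.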
Log Rolling
Logs are split by size or time. The team uses hourly rolling and retains the last two days, deleting older logs via scheduled scripts.
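The team implements retention with scheduled scripts; as an alternative sketch using only Python's standard library, `TimedRotatingFileHandler` can roll hourly and delete old files itself (`backupCount=48` approximates the two-day window):

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Roll the log file every hour; keep 48 rotated files (~2 days),
# letting the handler delete older ones instead of a cron script.
handler = TimedRotatingFileHandler(
    "service.log", when="H", interval=1, backupCount=48, encoding="utf-8"
)
logger = logging.getLogger("search-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("service started")
```

Hourly files also keep each collection unit small, which simplifies both shipping and ad-hoc inspection on the node.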
Local vs. Remote Logs
Writing logs to local disk and delegating collection to a dedicated tool avoids coupling services to remote log servers and limits the resource impact on the service itself.
Log Collection
Collecting logs from online nodes must not affect service performance. Real‑time collection requires high throughput, fault tolerance, ordering guarantees, and handling of downstream failures. The team evaluated several open‑source tools and selected Filebeat for its lightweight nature, strong community, and seamless integration with the Elastic Stack.
Collected logs are sent to Kafka, where stream processing performs unified ETL before downstream modules (e.g., HDFS for batch analysis, Elasticsearch for search) consume them, ensuring a single source of truth.
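The article does not name the stream-processing framework, so here is a framework-agnostic sketch of the "unified ETL" step: a pure function that normalizes one collected JSON log line into the single schema all downstream consumers see, returning `None` for malformed lines so they can be routed to a dead-letter topic. (All field names are illustrative.)

```python
import json

def etl_normalize(raw_line):
    """Parse one collected JSON log line and normalize it for downstream
    consumers (e.g. HDFS batch jobs, Elasticsearch indexing).
    Returns None for malformed lines."""
    try:
        entry = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    # Normalize field names and types so every consumer sees one schema.
    return {
        "service": entry.get("service", "unknown"),
        "level": entry.get("level", "INFO").upper(),
        "cost_ms": float(entry.get("cost_ms", 0)),
        "message": entry.get("msg") or entry.get("message", ""),
    }

# A Kafka consumer loop would apply etl_normalize to each record's value
# before producing the result to the per-consumer output topics.
```

Doing this transformation once, in the stream layer, is what keeps HDFS and Elasticsearch consistent: both consume the same normalized records rather than re-parsing raw lines independently.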
Summary
Online logs are a critical asset for debugging, monitoring, user behavior analysis, and system optimization. This article presented the Yuewen Group's practices for log printing and collection; future articles will explore log aggregation and analysis tools.
Yuewen Technology
The Yuewen Group tech team supports and powers services like QQ Reading, Qidian Books, and Hongxiu Reading. This account targets internet developers, sharing high‑quality original technical content. Follow us for the latest Yuewen tech updates.