Ten Rules for Writing High‑Quality Logs in Production Systems
This article presents ten practical rules for producing high‑quality, searchable logs—including unified formatting, stack‑trace inclusion, proper log levels, complete parameters, data masking, asynchronous writing, trace‑ID linking, dynamic level control, structured storage, and intelligent monitoring—to help developers quickly diagnose issues in high‑traffic applications.
Preface
During a major sales event I saw a flood of red error metrics and, when I opened the server logs, only saw vague messages such as "User login failed", "Order creation error null", and "ERROR illegal argument". I realized that poorly written logs are like incomplete medical records.
This article shares ten military‑grade rules for printing high‑quality logs.
Rule 1: Unified Format
Bad example (management will deduct money):
```java
log.info("start process");
log.error("error happen");
```

Both messages are missing a timestamp and any business context.
Correct configuration (logback.xml):
```xml
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} |%X{traceId:-NO_ID} |%thread |%-5level |%logger{36} |%msg%n</pattern>
```

This centralizes the time format, traceId, thread, level, and message in one place, making it easier to locate issues. (Use a four-digit year; two-digit years are ambiguous in archived logs.)
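If it helps to see what each field of that pattern contributes, here is a small pure-Java sketch (a hypothetical `UnifiedFormat` helper, not part of logback) that assembles a line with the same layout, including the `NO_ID` fallback that `%X{traceId:-NO_ID}` provides:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class UnifiedFormat {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    public static String format(LocalDateTime time, String traceId,
                                String thread, String level,
                                String logger, String msg) {
        // Mirror %X{traceId:-NO_ID}: fall back when no traceId is in scope
        String tid = (traceId == null || traceId.isEmpty()) ? "NO_ID" : traceId;
        // %-5s mirrors %-5level: left-justified, padded to 5 characters
        return String.format("%s |%s |%s |%-5s |%s |%s",
                TS.format(time), tid, thread, level, logger, msg);
    }
}
```

Every line then has the same shape, so a grep or a log-platform query can split on `|` reliably.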
Rule 2: Include Stack Trace on Exceptions
Bad example (colleagues want to punch you):
```java
try {
    processOrder();
} catch (Exception e) {
    log.error("Processing failed");
}
```

No stack trace is printed, making debugging hard.
Correct usage:
```java
log.error("Order processing exception orderId={}", orderId, e); // e must be present!
```

The log now contains the orderId and the full exception stack trace. Note that SLF4J treats a trailing Throwable specially: pass it after the placeholder arguments, without a `{}` of its own.
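To see why passing the Throwable matters, this plain-JDK snippet (a hypothetical `StackTraceDemo` helper, no logging framework involved) renders the full stack trace from the exception object itself; the message string alone carries none of it:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceDemo {
    // Render a Throwable's complete stack trace as a String,
    // which is essentially what log.error(msg, e) appends to the log line
    public static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}
```

The resulting string contains the exception class, message, and every `at ...` frame, which is exactly the context the bad example above throws away.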
Rule 3: Reasonable Log Levels
Bad practice:
```java
log.debug("User balance insufficient userId={}", userId); // a business exception: should be WARN
log.error("Interface response slightly slow");            // a normal timeout: should be INFO
```

Misusing levels makes monitoring noisy and buries real problems.
Level definition table:
| Level | Correct Use Case |
|-------|------------------|
| FATAL | System about to crash (OOM, disk full) |
| ERROR | Core business failure (payment failure, order creation error) |
| WARN  | Recoverable exception (retry succeeded, degradation triggered) |
| INFO  | Key process node (order status change) |
| DEBUG | Debug information (parameter flow, intermediate results) |
Rule 4: Complete Parameters
Bad example (operations will curse):
```java
log.info("User login failed");
```

Only the message is printed; who failed, from where, and why are all missing.
Detective‑style log:
```java
log.warn("User login failed username={}, clientIP={}, failReason={}",
        username, clientIP, "Password attempts exceeded");
```

This records the username, client IP, timestamp (via the logback pattern), and failure reason, enabling fast issue location.
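Under the hood, SLF4J fills `{}` placeholders by simple left-to-right substitution, which is why parameterized logging is both cheap and readable. A minimal sketch of the idea (a hypothetical `PlaceholderFormat`, not the real SLF4J implementation):

```java
public class PlaceholderFormat {
    // Replace each "{}" with the next argument, left to right;
    // unmatched placeholders are left as-is
    public static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(template.substring(from)).toString();
    }
}
```

Because substitution happens inside the logger, the final string is only built when the level is actually enabled, unlike eager string concatenation.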
Rule 5: Data Masking
A hard-won lesson: a colleague once leaked users' phone numbers by printing them in plaintext logs.
Use a masking utility:
```java
// Masking utility
public class LogMasker {
    public static String maskMobile(String mobile) {
        return mobile.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }
}
```
```java
log.info("User registration mobile={}", LogMasker.maskMobile("13812345678")); // mobile=138****5678
```

Rule 6: Asynchronous for Performance
Problem reproduction: During a flash‑sale, synchronous log writes blocked many threads.
```java
log.info("Flash sale request userId={}, itemId={}", userId, itemId);
```

Analysis showed thread context switches, a disk I/O bottleneck, and logging consuming roughly 25% of request latency.
Correct three‑step solution:
Step 1: Async appender in logback.xml
```xml
<!-- Reconstructed AsyncAppender config; "FILE" stands for your existing file appender -->
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <!-- 0 = never discard events, even TRACE/DEBUG, when the queue fills up -->
    <discardingThreshold>0</discardingThreshold>
    <!-- in-memory queue capacity; size it from peak TPS (see Step 3) -->
    <queueSize>4096</queueSize>
    <appender-ref ref="FILE"/>
</appender>
```

Step 2: Optimized logging code
```java
// No isDebugEnabled() pre-check needed for cheap arguments:
// the framework skips message formatting when DEBUG is disabled
log.debug("Received MQ message: {}", msg.toSimpleString());

// But argument expressions are still evaluated before the call:
// Wrong: log.debug("Detail: {}", computeExpensiveLog()); // runs even when DEBUG is off
```

Step 3: Performance formula
Max memory ≈ queue length × average log size
Recommended queue size = peak TPS × tolerated latency (seconds)
Example: 10,000 TPS × 0.5 s ⇒ queue size of 5,000

Risk mitigation:
Monitor queue usage, alert at 80%.
Prevent OOM by restricting large toString() calls.
Provide JMX switch to revert to synchronous mode quickly.
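The sizing formulas above are simple enough to encode directly; a small sketch (a hypothetical `QueueSizing` helper, reproducing the worked example):

```java
public class QueueSizing {
    // Recommended queue size = peak TPS × tolerated latency (seconds)
    public static int recommendedQueueSize(int peakTps, double toleratedLatencySeconds) {
        return (int) Math.ceil(peakTps * toleratedLatencySeconds);
    }

    // Max memory ≈ queue length × average log size
    public static long estimatedMaxMemoryBytes(int queueSize, int avgLogSizeBytes) {
        return (long) queueSize * avgLogSizeBytes;
    }
}
```

Running the article's example through it: 10,000 TPS with 0.5 s tolerated latency yields a queue of 5,000 entries; at, say, 512 bytes per entry that queue can hold about 2.5 MB of pending log data.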
Rule 7: Traceability Across Services
Cross‑service calls lose log correlation; inject a traceId.
```java
// Interceptor injects a traceId at the request entry point
// (MDC is SLF4J's Mapped Diagnostic Context)
MDC.put("traceId", UUID.randomUUID().toString().substring(0, 8));
```

The log pattern then includes the traceId:

```
%d{HH:mm:ss} |%X{traceId}| %msg%n
```

Now all logs for one request can be linked end-to-end via the traceId. For true cross-service traceability, also forward the traceId in an outbound request header and restore it into MDC in the downstream service's interceptor.
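MDC is essentially a per-thread map that the log pattern reads from. A stripped-down pure-JDK sketch of the concept (a hypothetical `MiniTraceContext`, not SLF4J's real MDC):

```java
import java.util.UUID;

public class MiniTraceContext {
    // One slot per thread, just like MDC's backing ThreadLocal map
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public static String newTraceId() {
        // Short 8-character id, matching the article's substring(0, 8)
        return UUID.randomUUID().toString().substring(0, 8);
    }

    public static void put(String traceId) { TRACE_ID.set(traceId); }

    // Mirror the %X{traceId:-NO_ID} fallback when nothing was set
    public static String get() {
        String id = TRACE_ID.get();
        return id == null ? "NO_ID" : id;
    }

    // Always clear in a finally block, or thread-pool reuse leaks stale ids
    public static void clear() { TRACE_ID.remove(); }
}
```

The clear-on-exit step is the part most often forgotten: in a servlet container, threads are pooled, so a stale traceId from the previous request will silently attach to the next one.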
Rule 8: Dynamic Log Level Adjustment
During incidents we may need to enable DEBUG without restarting.
```java
// Uses logback's ch.qos.logback.classic.Logger and Level, cast from the SLF4J factory
@GetMapping("/logLevel")
public String changeLogLevel(@RequestParam String loggerName, @RequestParam String level) {
    Logger logger = (Logger) LoggerFactory.getLogger(loggerName);
    logger.setLevel(Level.valueOf(level)); // takes effect immediately
    return "OK";
}
```

Dynamic control avoids service restarts; in production, protect such an endpoint with authentication.
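The same immediate-effect principle can be demonstrated without logback, using the JDK's built-in `java.util.logging` (a hypothetical `DynamicLevelDemo`; `FINE` is roughly the JDK's DEBUG):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DynamicLevelDemo {
    // Flip a named logger to FINE at runtime and report whether
    // FINE-level messages would now be emitted
    public static boolean debugEnabledAfterSwitch(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE); // takes effect immediately, no restart
        return logger.isLoggable(Level.FINE);
    }
}
```

Whatever the framework, the mechanism is the same: logger objects are shared singletons, so mutating their level changes behavior for every subsequent log call in the process.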
Rule 9: Structured Storage
Plain text logs are hard to query. Store logs as JSON.
```json
{
  "event": "ORDER_CREATE",
  "orderId": 1001,
  "amount": 8999,
  "products": [{"name": "iPhone", "sku": "A123"}]
}
```

This makes the data machine-friendly and searchable: fields can be indexed, filtered, and aggregated instead of parsed with regular expressions.
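For illustration, a tiny hand-rolled builder (a hypothetical `JsonLogLine`; real setups typically use a library such as logstash-logback-encoder rather than building JSON by hand) that emits one structured line from ordered fields:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonLogLine {
    // Build a single-line JSON object from ordered fields;
    // numbers/booleans unquoted, everything else quoted (only quotes escaped,
    // which is enough for this sketch but not a full JSON encoder)
    public static String of(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Number || v instanceof Boolean) sb.append(v);
            else sb.append('"').append(String.valueOf(v).replace("\"", "\\\"")).append('"');
        }
        return sb.append('}').toString();
    }
}
```

Usage: populate a `LinkedHashMap` with `event`, `orderId`, and so on, and log the resulting string as the message; each line is then directly ingestible by Elasticsearch.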
Rule 10: Intelligent Monitoring
Example ELK alert rule:
More than 100 ERROR entries within a 5-minute window → phone alert
WARN entries persisting for a full hour → email notification

Integrating ELK with structured logs enables proactive alerts instead of after-the-fact grepping.
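The first rule can be sketched as a sliding-window counter (a hypothetical `ErrorRateAlert`; in practice the ELK stack evaluates such rules server-side, e.g. via its alerting features):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ErrorRateAlert {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowMillis;
    private final int threshold;

    public ErrorRateAlert(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Record one ERROR log at the given time; returns true when the alert should fire. */
    public boolean record(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Evict entries that have slid out of the window
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() > windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() > threshold;
    }
}
```

With a 5-minute window and a threshold of 100, the 101st ERROR inside the window trips the alert, matching the rule above.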
Conclusion
Three developer levels:
Bronze: System.out.println("error!")
Diamond: Standardized logs + ELK monitoring
King: Log‑driven code optimization, anomaly prediction, root‑cause AI models
Final soul‑searching question: When the next production incident occurs, can your logs help a newcomer locate the problem within five minutes?