Ten Rules for Writing High‑Quality Logs in Production Systems
This article presents ten practical rules for producing high‑quality, searchable logs—including unified formatting, stack‑trace inclusion, proper log levels, complete parameters, data masking, asynchronous writing, trace‑ID linking, dynamic level control, structured storage, and intelligent monitoring—to help developers quickly diagnose issues in high‑traffic applications.
Preface
During a major sales event I saw a flood of red error metrics and, when I opened the server logs, only saw vague messages such as "User login failed", "Order creation error null", and "ERROR illegal argument". I realized that poorly written logs are like incomplete medical records.
This article shares ten military‑grade rules for printing high‑quality logs.
Rule 1: Unified Format
Bad example (management will deduct money):
```java
log.info("start process");
log.error("error happen");
```

Both messages are missing a timestamp and any business context.
Correct configuration (logback.xml):
```xml
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} |%X{traceId:-NO_ID} |%thread |%-5level |%logger{36} |%msg%n</pattern>
```

This centralizes the time format, traceId, thread, level, and message in one place, making it easier to locate issues. (Use a four-digit year; two-digit years are ambiguous in archived logs.)
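If it helps to see what each field of that pattern contributes, here is a small pure-Java sketch (a hypothetical `UnifiedFormat` helper, not part of logback) that assembles a line with the same layout, including the `NO_ID` fallback that `%X{traceId:-NO_ID}` provides:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class UnifiedFormat {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    public static String format(LocalDateTime time, String traceId,
                                String thread, String level,
                                String logger, String msg) {
        // Mirror %X{traceId:-NO_ID}: fall back when no traceId is in scope
        String tid = (traceId == null || traceId.isEmpty()) ? "NO_ID" : traceId;
        // %-5s mirrors %-5level: left-justified, padded to 5 characters
        return String.format("%s |%s |%s |%-5s |%s |%s",
                TS.format(time), tid, thread, level, logger, msg);
    }
}
```

Every line then has the same shape, so a grep or a log-platform query can split on `|` reliably.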
Rule 2: Include Stack Trace on Exceptions
Bad example (colleagues want to punch you):
```java
try {
    processOrder();
} catch (Exception e) {
    log.error("Processing failed");
}
```

No stack trace is printed, making debugging hard.
Correct usage:
```java
log.error("Order processing exception orderId={}", orderId, e); // e must be present!
```

The log now contains the orderId and the full exception stack trace. Note that SLF4J treats a trailing Throwable specially: pass it after the placeholder arguments, without a `{}` of its own.
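To see why passing the Throwable matters, this plain-JDK snippet (a hypothetical `StackTraceDemo` helper, no logging framework involved) renders the full stack trace from the exception object itself; the message string alone carries none of it:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceDemo {
    // Render a Throwable's complete stack trace as a String,
    // which is essentially what log.error(msg, e) appends to the log line
    public static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}
```

The resulting string contains the exception class, message, and every `at ...` frame, which is exactly the context the bad example above throws away.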
Rule 3: Reasonable Log Levels
Bad practice:
```java
log.debug("User balance insufficient userId={}", userId); // a business exception: should be WARN
log.error("Interface response slightly slow");            // a normal timeout: should be INFO
```

Misusing levels makes monitoring noisy and buries real problems.
Level definition table:
| Level | Correct Use Case |
|-------|------------------|
| FATAL | System about to crash (OOM, disk full) |
| ERROR | Core business failure (payment failure, order creation error) |
| WARN  | Recoverable exception (retry succeeded, degradation triggered) |
| INFO  | Key process node (order status change) |
| DEBUG | Debug information (parameter flow, intermediate results) |
Rule 4: Complete Parameters
Bad example (operations will curse):
```java
log.info("User login failed");
```

Only the message is printed; who failed, from where, and why are all missing.
Detective‑style log:
```java
log.warn("User login failed username={}, clientIP={}, failReason={}",
        username, clientIP, "Password attempts exceeded");
```

This records the username, client IP, timestamp (via the logback pattern), and failure reason, enabling fast issue location.
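Under the hood, SLF4J fills `{}` placeholders by simple left-to-right substitution, which is why parameterized logging is both cheap and readable. A minimal sketch of the idea (a hypothetical `PlaceholderFormat`, not the real SLF4J implementation):

```java
public class PlaceholderFormat {
    // Replace each "{}" with the next argument, left to right;
    // unmatched placeholders are left as-is
    public static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(template.substring(from)).toString();
    }
}
```

Because substitution happens inside the logger, the final string is only built when the level is actually enabled, unlike eager string concatenation.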
Rule 5: Data Masking
A hard-won lesson: a colleague once leaked users' phone numbers by printing them in plaintext logs.
Use a masking utility:
```java
// Masking utility
public class LogMasker {
    public static String maskMobile(String mobile) {
        return mobile.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }
}
```
```java
log.info("User registration mobile={}", LogMasker.maskMobile("13812345678")); // mobile=138****5678
```

Rule 6: Asynchronous for Performance
Problem reproduction: During a flash‑sale, synchronous log writes blocked many threads.
```java
log.info("Flash sale request userId={}, itemId={}", userId, itemId);
```

Analysis showed thread context switches, a disk I/O bottleneck, and logging consuming roughly 25% of request latency.
Correct three‑step solution:
Step 1: Async appender in logback.xml
```xml
<!-- Reconstructed AsyncAppender config; "FILE" stands for your existing file appender -->
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <!-- 0 = never discard events, even TRACE/DEBUG, when the queue fills up -->
    <discardingThreshold>0</discardingThreshold>
    <!-- in-memory queue capacity; size it from peak TPS (see Step 3) -->
    <queueSize>4096</queueSize>
    <appender-ref ref="FILE"/>
</appender>
```

Step 2: Optimized logging code
```java
// No isDebugEnabled() pre-check needed for cheap arguments:
// the framework skips message formatting when DEBUG is disabled
log.debug("Received MQ message: {}", msg.toSimpleString());

// But argument expressions are still evaluated before the call:
// Wrong: log.debug("Detail: {}", computeExpensiveLog()); // runs even when DEBUG is off
```

Step 3: Performance formula
Max memory ≈ queue length × average log size
Recommended queue size = peak TPS × tolerated latency (seconds)
Example: 10,000 TPS × 0.5 s ⇒ queue size of 5,000

Risk mitigation:
Monitor queue usage, alert at 80%.
Prevent OOM by restricting large toString() calls.
Provide JMX switch to revert to synchronous mode quickly.
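The sizing formulas above are simple enough to encode directly; a small sketch (a hypothetical `QueueSizing` helper, reproducing the worked example):

```java
public class QueueSizing {
    // Recommended queue size = peak TPS × tolerated latency (seconds)
    public static int recommendedQueueSize(int peakTps, double toleratedLatencySeconds) {
        return (int) Math.ceil(peakTps * toleratedLatencySeconds);
    }

    // Max memory ≈ queue length × average log size
    public static long estimatedMaxMemoryBytes(int queueSize, int avgLogSizeBytes) {
        return (long) queueSize * avgLogSizeBytes;
    }
}
```

Running the article's example through it: 10,000 TPS with 0.5 s tolerated latency yields a queue of 5,000 entries; at, say, 512 bytes per entry that queue can hold about 2.5 MB of pending log data.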
Rule 7: Traceability Across Services
Cross‑service calls lose log correlation; inject a traceId.
```java
// Interceptor injects a traceId at the request entry point
// (MDC is SLF4J's Mapped Diagnostic Context)
MDC.put("traceId", UUID.randomUUID().toString().substring(0, 8));
```

The log pattern then includes the traceId:

```
%d{HH:mm:ss} |%X{traceId}| %msg%n
```

Now all logs for one request can be linked end-to-end via the traceId. For true cross-service traceability, also forward the traceId in an outbound request header and restore it into MDC in the downstream service's interceptor.
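MDC is essentially a per-thread map that the log pattern reads from. A stripped-down pure-JDK sketch of the concept (a hypothetical `MiniTraceContext`, not SLF4J's real MDC):

```java
import java.util.UUID;

public class MiniTraceContext {
    // One slot per thread, just like MDC's backing ThreadLocal map
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    public static String newTraceId() {
        // Short 8-character id, matching the article's substring(0, 8)
        return UUID.randomUUID().toString().substring(0, 8);
    }

    public static void put(String traceId) { TRACE_ID.set(traceId); }

    // Mirror the %X{traceId:-NO_ID} fallback when nothing was set
    public static String get() {
        String id = TRACE_ID.get();
        return id == null ? "NO_ID" : id;
    }

    // Always clear in a finally block, or thread-pool reuse leaks stale ids
    public static void clear() { TRACE_ID.remove(); }
}
```

The clear-on-exit step is the part most often forgotten: in a servlet container, threads are pooled, so a stale traceId from the previous request will silently attach to the next one.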
Rule 8: Dynamic Log Level Adjustment
During incidents we may need to enable DEBUG without restarting.
```java
// Uses logback's ch.qos.logback.classic.Logger and Level, cast from the SLF4J factory
@GetMapping("/logLevel")
public String changeLogLevel(@RequestParam String loggerName, @RequestParam String level) {
    Logger logger = (Logger) LoggerFactory.getLogger(loggerName);
    logger.setLevel(Level.valueOf(level)); // takes effect immediately
    return "OK";
}
```

Dynamic control avoids service restarts; in production, protect such an endpoint with authentication.
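The same immediate-effect principle can be demonstrated without logback, using the JDK's built-in `java.util.logging` (a hypothetical `DynamicLevelDemo`; `FINE` is roughly the JDK's DEBUG):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DynamicLevelDemo {
    // Flip a named logger to FINE at runtime and report whether
    // FINE-level messages would now be emitted
    public static boolean debugEnabledAfterSwitch(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE); // takes effect immediately, no restart
        return logger.isLoggable(Level.FINE);
    }
}
```

Whatever the framework, the mechanism is the same: logger objects are shared singletons, so mutating their level changes behavior for every subsequent log call in the process.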
Rule 9: Structured Storage
Plain text logs are hard to query. Store logs as JSON.
```json
{
  "event": "ORDER_CREATE",
  "orderId": 1001,
  "amount": 8999,
  "products": [{"name": "iPhone", "sku": "A123"}]
}
```

This makes the data machine-friendly and searchable: fields can be indexed, filtered, and aggregated instead of parsed with regular expressions.
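For illustration, a tiny hand-rolled builder (a hypothetical `JsonLogLine`; real setups typically use a library such as logstash-logback-encoder rather than building JSON by hand) that emits one structured line from ordered fields:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonLogLine {
    // Build a single-line JSON object from ordered fields;
    // numbers/booleans unquoted, everything else quoted (only quotes escaped,
    // which is enough for this sketch but not a full JSON encoder)
    public static String of(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Number || v instanceof Boolean) sb.append(v);
            else sb.append('"').append(String.valueOf(v).replace("\"", "\\\"")).append('"');
        }
        return sb.append('}').toString();
    }
}
```

Usage: populate a `LinkedHashMap` with `event`, `orderId`, and so on, and log the resulting string as the message; each line is then directly ingestible by Elasticsearch.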
Rule 10: Intelligent Monitoring
Example ELK alert rule:
More than 100 ERROR entries within a 5-minute window → phone alert
WARN entries persisting for a full hour → email notification

Integrating ELK with structured logs enables proactive alerts instead of after-the-fact grepping.
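The first rule can be sketched as a sliding-window counter (a hypothetical `ErrorRateAlert`; in practice the ELK stack evaluates such rules server-side, e.g. via its alerting features):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ErrorRateAlert {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowMillis;
    private final int threshold;

    public ErrorRateAlert(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Record one ERROR log at the given time; returns true when the alert should fire. */
    public boolean record(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Evict entries that have slid out of the window
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() > windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() > threshold;
    }
}
```

With a 5-minute window and a threshold of 100, the 101st ERROR inside the window trips the alert, matching the rule above.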
Conclusion
Three developer levels:
Bronze: System.out.println("error!")
Diamond: Standardized logs + ELK monitoring
King: Log‑driven code optimization, anomaly prediction, root‑cause AI models
Final soul‑searching question: When the next production incident occurs, can your logs help a newcomer locate the problem within five minutes?