Common Intermittent Issues in Production and How to Diagnose Them
This article examines intermittent problems that surface in production environments, such as concurrency bugs, cache inconsistencies, dirty data, boundary‑value failures, hardware limits, and improper shutdown. It provides categorized scenarios, concrete code examples, and practical lessons for preventing and troubleshooting these elusive issues.
In daily development, many developers encounter intermittent problems that only appear under specific conditions in production, such as concurrency issues, cache inconsistency, dirty data, boundary limits, hardware failures, and improper shutdown.
The article first lists seven categories of such scenarios, each illustrated with diagrams, and then provides concrete code examples including non‑thread‑safe collections in parallel streams, misuse of ThreadLocal, mutable template variables, asynchronous dependencies, unsafe concurrency, cache staleness, and lack of graceful shutdown.
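As an illustration of the ThreadLocal misuse mentioned above, here is a minimal sketch rather than the article's own code. The class `ThreadLocalLeakDemo`, the field `CURRENT_USER`, and the method `secondRequestSees` are hypothetical names chosen for this example. It assumes a pool with a single worker thread, so both "requests" are guaranteed to run on the same thread and the stale value is reproducible.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ThreadLocalLeakDemo {
    // Hypothetical per-request context, as is often used for user or tenant ids.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Runs two "requests" on the same pooled thread and returns the value
    // the second request observes in the ThreadLocal.
    static String secondRequestSees(boolean cleanUp) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Runnable firstRequest = () -> {
                CURRENT_USER.set("alice");
                try {
                    // ... handle the request on behalf of alice ...
                } finally {
                    if (cleanUp) {
                        CURRENT_USER.remove(); // the fix: always clear in finally
                    }
                }
            };
            Callable<String> secondRequest = CURRENT_USER::get;

            pool.submit(firstRequest).get();
            return pool.submit(secondRequest).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + secondRequestSees(false));
        System.out.println("with remove():    " + secondRequestSees(true));
    }
}
```

Without the `remove()` call, the second request inherits the value left behind by the first because the pooled thread is reused. On a developer machine with little traffic and fresh threads this rarely shows up, which is one reason such bugs tend to look intermittent.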
Each example includes the problematic code wrapped in ... tags and explains why the bug surfaces when data volume grows or when the environment differs from local testing.
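The data-volume effect can be sketched with the non-thread-safe collection case. This is an illustrative example under stated assumptions, not the article's original listing: `ParallelCollectDemo`, `collectUnsafely`, and `collectSafely` are hypothetical names. The unsafe variant adds to a shared `ArrayList` from a parallel stream, and its outcome is nondeterministic by nature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ParallelCollectDemo {

    // Unsafe: ArrayList is not thread-safe, so concurrent add() calls from
    // parallel worker threads may lose elements, leave null slots, or throw
    // ArrayIndexOutOfBoundsException. With small inputs it often appears to work.
    static List<Integer> collectUnsafely(int n) {
        List<Integer> result = new ArrayList<>();
        try {
            IntStream.range(0, n).parallel().forEach(result::add);
        } catch (RuntimeException e) {
            // A failed concurrent resize is one of the possible symptoms.
            System.out.println("unsafe run failed: " + e);
        }
        return result;
    }

    // Safe: let the stream framework merge per-thread partial results.
    static List<Integer> collectSafely(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("expected size: " + n);
        System.out.println("unsafe size:   " + collectUnsafely(n).size());
        System.out.println("safe size:     " + collectSafely(n).size());
    }
}
```

With a handful of elements the stream may effectively run on one thread and the unsafe version passes every local test. Larger inputs split the work across more threads, which makes lost updates far more likely.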
The final section summarizes key lessons: write robust code, consider edge cases, maintain proper logging, avoid blind exception conversion, perform load testing, never swallow exceptions, monitor the whole chain, and ensure graceful shutdown and resource cleanup.
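For the graceful-shutdown lesson, one common approach is the two-phase pattern for `ExecutorService`: stop accepting work, wait for in-flight tasks, then interrupt what remains. The sketch below assumes a plain thread pool. `GracefulShutdownDemo`, `shutdownGracefully`, and `runAndDrain` are hypothetical names, and the 5-second timeout is an arbitrary choice for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class GracefulShutdownDemo {

    // Returns true if the pool terminated within the given timeouts.
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // phase 1: reject new tasks, let queued ones finish
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // phase 2: interrupt tasks that are still running
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Submits simple tasks, drains the pool, and reports how many completed.
    static int runAndDrain(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(completed::incrementAndGet);
        }
        shutdownGracefully(pool, 5, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Drain the pool when the JVM receives a normal termination signal.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> shutdownGracefully(pool, 5, TimeUnit.SECONDS)));
        pool.execute(() -> System.out.println("working..."));
        System.out.println("completed tasks: " + runAndDrain(100));
    }
}
```

A shutdown hook only runs on an orderly JVM exit, so a forced kill still skips it. That is worth keeping in mind when deployments stop processes abruptly.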
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.