
Investigation and Fix of OpenTelemetry ThreadPool Trace Propagation Bug in Non‑Capturing Lambda Scenarios

This article analyzes a sporadic loss of trace information when using OpenTelemetry’s non‑capturing lambda tasks in a Java ThreadPoolExecutor, explains the underlying cause related to Runnable reuse and lambda caching, and presents the community‑driven patches that correctly propagate context across threads.

ZhongAn Tech Team

ZhongAn has run a non-intrusive distributed tracing solution based on OpenTelemetry for some time, but a bug was reported: trace IDs were intermittently missing from logs produced inside thread-pool tasks.

To reproduce the issue, a simple Spring Boot controller was created that submits 20 lambda tasks to a ThreadPoolExecutor and logs the thread name together with the trace context configured via logback.xml. The logs showed that only a subset of the tasks printed the trace information.

import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.commons.lang3.concurrent.BasicThreadFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import lombok.extern.slf4j.Slf4j;

@RestController
@Slf4j
public class ThreadController {
    private final ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(
        10, 10, 5, TimeUnit.MINUTES,
        new LinkedBlockingDeque<>(1024),
        new BasicThreadFactory.Builder().namingPattern("queue-worker-%d").build(),
        new ThreadPoolExecutor.CallerRunsPolicy());

    @GetMapping("/testLog")
    public String testLog() {
        for (int j = 0; j < 20; j++) {
            // non-capturing lambda: the body references no local state
            threadPoolExecutor.execute(() -> log.info("===>" + Thread.currentThread().getName()));
        }
        return "OK";
    }
}

Adding a downstream Feign call to the lambda reproduced the same inconsistent trace propagation. Replacing the lambda with a concrete Worker class or a new instance eliminated the problem, indicating that the issue is tied to the reuse of non‑capturing lambda instances.

class Worker implements Runnable {
    @Override
    public void run() {
        log.info("--->" + Thread.currentThread().getName());
    }
}
// replace lambda
threadPoolExecutor.execute(new Worker());

The OpenTelemetry Java agent propagates the MDC trace context via bytecode weaving: it injects a PropagatedContext field into each submitted Runnable. When the same Runnable instance is reused (as happens with cached non-capturing lambdas), a later submission overwrites the context stored by an earlier one, so some executions log no trace ID.
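The overwrite can be sketched without the agent at all. The classes below are hypothetical stand-ins: one shared task object carries a mutable context field (playing the role of the injected PropagatedContext), two "requests" write to it before the first task runs, and the first request's context is lost:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for a cached non-capturing lambda with an injected context field.
final class SharedTask implements Runnable {
    volatile String context;                 // stand-in for the injected PropagatedContext
    final List<String> observed = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void run() {
        observed.add(context);               // each execution reads whatever was written last
    }
}

public class OverwriteDemo {
    public static void main(String[] args) {
        SharedTask task = new SharedTask();  // one shared instance, like a cached lambda
        task.context = "trace-A";            // request A submits the task
        task.context = "trace-B";            // request B submits it again before A's run
        task.run();                          // A's execution now sees trace-B
        task.run();                          // B's execution
        System.out.println(task.observed);   // [trace-B, trace-B] — trace-A is lost
    }
}
```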

Investigation of the agent's thread-pool plugin revealed why: for lambdas that capture no variables (non-capturing lambdas), the JVM's LambdaMetafactory returns a ConstantCallSite bound to a single cached instance, so every submission hands the pool the same object, leading to the observed context leakage.
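The caching behavior is easy to observe directly. A minimal sketch (the method names are illustrative; the identity result for non-capturing lambdas holds on HotSpot in practice, though it is not guaranteed by the language specification):

```java
public class LambdaCachingDemo {
    static Runnable nonCapturing() {
        return () -> System.out.println("task");  // captures nothing
    }

    static Runnable capturing(String tag) {
        return () -> System.out.println(tag);     // captures `tag`
    }

    public static void main(String[] args) {
        // Non-capturing lambdas: the invokedynamic site is linked once via a
        // ConstantCallSite, so every call returns the same cached instance.
        System.out.println(nonCapturing() == nonCapturing());   // typically true on HotSpot
        // Capturing lambdas carry per-call state, so each call allocates anew.
        System.out.println(capturing("a") == capturing("b"));   // false
    }
}
```

This is exactly the property the agent's field injection did not account for: a field injected into "the" lambda is a field shared by every submission.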

The community confirmed this as a bug and provided a quick fix: before executing a lambda, the agent now wraps it in a custom Runnable that sets the correct context, ensuring each execution gets an isolated copy.

// Simplified sketch of the fix inside the agent's LambdaMetafactory hook:
// `factoryType` is the invokedynamic factory signature and `original` the
// cached lambda instance produced for it.
if (isLambda && factoryType.parameterCount() == 0) {
    Runnable wrapped = () -> {
        // attach the caller's propagated context before delegating
        original.run();
    };
    return new ConstantCallSite(MethodHandles.constant(Runnable.class, wrapped));
}

The final upstream fix expands this wrapping to all runnable submissions, guarded by a boolean flag that decides whether wrapping is needed, and excludes special queue implementations (e.g., PriorityBlockingQueue, which requires its elements to be comparable and therefore cannot accept the wrapper).
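The wrapping idea can be sketched independently of the agent. In this hypothetical version, a ThreadLocal String stands in for the real trace context: the wrapper snapshots the submitter's context at construction time and restores it around run(), so even a reused delegate cannot observe a stale value:

```java
// Hypothetical sketch of the wrapping approach, not the agent's actual class.
final class ContextWrappingRunnable implements Runnable {
    // Stand-in for the real trace context storage.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    private final Runnable delegate;
    private final String capturedContext;

    ContextWrappingRunnable(Runnable delegate) {
        this.delegate = delegate;
        this.capturedContext = CONTEXT.get(); // snapshot at submission time
    }

    @Override
    public void run() {
        String previous = CONTEXT.get();
        CONTEXT.set(capturedContext);         // restore the submitter's context
        try {
            delegate.run();
        } finally {
            CONTEXT.set(previous);            // leave the worker thread clean
        }
    }
}
```

Because the snapshot lives in the wrapper rather than in a field injected into the (possibly shared) lambda, each submission gets its own isolated copy of the context.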

In summary, the root cause was the reuse of non‑capturing lambda instances causing shared PropagatedContext across threads; the resolution involved explicit wrapping of runnables to isolate context, with additional safeguards for custom thread‑pool queues.

Tags: java, lambda, ThreadPool, OpenTelemetry, tracing, BugFix
Written by ZhongAn Tech Team

China's first online insurer. Through tech innovation we make insurance simpler, warmer, and more valuable. Powered by technology, we support 50 billion RMB of policies and serve 600 million users with smart, personalized solutions. ZhongAn's hardcore tech and article shares are here.
