
Investigation and Fix of OpenTelemetry ThreadPool Trace Propagation Bug in Non‑Capturing Lambda Scenarios

This article analyzes a sporadic loss of trace information when using OpenTelemetry’s non‑capturing lambda tasks in a Java ThreadPoolExecutor, explains the underlying cause related to Runnable reuse and lambda caching, and presents the community‑driven patches that correctly propagate context across threads.

ZhongAn Tech Team

ZhongAn has run a non-intrusive distributed tracing solution based on OpenTelemetry for some time, but a bug was reported: trace IDs were intermittently missing from logs produced inside thread-pool tasks.

To reproduce the issue, a simple Spring Boot controller was created that submits 20 lambda tasks to a ThreadPoolExecutor and logs the thread name together with the trace context configured via logback.xml. The logs showed that only a subset of the tasks printed the trace information.

import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.commons.lang3.concurrent.BasicThreadFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import lombok.extern.slf4j.Slf4j;

@RestController
@Slf4j
public class ThreadController {
    private final ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(
        10, 10, 5, TimeUnit.MINUTES,
        new LinkedBlockingDeque<>(1024),
        new BasicThreadFactory.Builder().namingPattern("queue-worker-%d").build(),
        new ThreadPoolExecutor.CallerRunsPolicy());

    @GetMapping("/testLog")
    public String testLog() {
        for (int j = 0; j < 20; j++) {
            // non-capturing lambda: the body references no local state
            threadPoolExecutor.execute(() -> log.info("===>" + Thread.currentThread().getName()));
        }
        return "OK";
    }
}

Adding a downstream Feign call to the lambda reproduced the same inconsistent trace propagation. Replacing the lambda with a concrete Worker class or a new instance eliminated the problem, indicating that the issue is tied to the reuse of non‑capturing lambda instances.

class Worker implements Runnable {
    @Override
    public void run() {
        log.info("--->" + Thread.currentThread().getName());
    }
}
// replace lambda
threadPoolExecutor.execute(new Worker());

The OpenTelemetry Java agent propagates the MDC trace context via bytecode weaving: it injects a PropagatedContext field into each submitted Runnable. When the same Runnable instance is reused (as happens with cached non-capturing lambdas), a later submission overwrites the context stored by an earlier one, so some executions log no trace ID.
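The overwrite can be sketched without the agent at all. The classes below are hypothetical stand-ins: one shared task object carries a mutable context field (playing the role of the injected PropagatedContext), two "requests" write to it before the first task runs, and the first request's context is lost:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for a cached non-capturing lambda with an injected context field.
final class SharedTask implements Runnable {
    volatile String context;                 // stand-in for the injected PropagatedContext
    final List<String> observed = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void run() {
        observed.add(context);               // each execution reads whatever was written last
    }
}

public class OverwriteDemo {
    public static void main(String[] args) {
        SharedTask task = new SharedTask();  // one shared instance, like a cached lambda
        task.context = "trace-A";            // request A submits the task
        task.context = "trace-B";            // request B submits it again before A's run
        task.run();                          // A's execution now sees trace-B
        task.run();                          // B's execution
        System.out.println(task.observed);   // [trace-B, trace-B] — trace-A is lost
    }
}
```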

Investigation of the agent's thread-pool plugin revealed why: for lambdas that capture no variables (non-capturing lambdas), the JVM's LambdaMetafactory returns a ConstantCallSite bound to a single cached instance, so every submission hands the pool the same object, leading to the observed context leakage.
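The caching behavior is easy to observe directly. A minimal sketch (the method names are illustrative; the identity result for non-capturing lambdas holds on HotSpot in practice, though it is not guaranteed by the language specification):

```java
public class LambdaCachingDemo {
    static Runnable nonCapturing() {
        return () -> System.out.println("task");  // captures nothing
    }

    static Runnable capturing(String tag) {
        return () -> System.out.println(tag);     // captures `tag`
    }

    public static void main(String[] args) {
        // Non-capturing lambdas: the invokedynamic site is linked once via a
        // ConstantCallSite, so every call returns the same cached instance.
        System.out.println(nonCapturing() == nonCapturing());   // typically true on HotSpot
        // Capturing lambdas carry per-call state, so each call allocates anew.
        System.out.println(capturing("a") == capturing("b"));   // false
    }
}
```

This is exactly the property the agent's field injection did not account for: a field injected into "the" lambda is a field shared by every submission.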

The community confirmed this as a bug and provided a quick fix: before executing a lambda, the agent now wraps it in a custom Runnable that sets the correct context, ensuring each execution gets an isolated copy.

// Simplified sketch of the fix inside the agent's LambdaMetafactory hook:
// `factoryType` is the invokedynamic factory signature and `original` the
// cached lambda instance produced for it.
if (isLambda && factoryType.parameterCount() == 0) {
    Runnable wrapped = () -> {
        // attach the caller's propagated context before delegating
        original.run();
    };
    return new ConstantCallSite(MethodHandles.constant(Runnable.class, wrapped));
}

The final upstream fix expands this wrapping to all runnable submissions, guarded by a boolean flag that decides whether wrapping is needed, and excludes special queue implementations (e.g., PriorityBlockingQueue, which requires its elements to be comparable and therefore cannot accept the wrapper).
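The wrapping idea can be sketched independently of the agent. In this hypothetical version, a ThreadLocal String stands in for the real trace context: the wrapper snapshots the submitter's context at construction time and restores it around run(), so even a reused delegate cannot observe a stale value:

```java
// Hypothetical sketch of the wrapping approach, not the agent's actual class.
final class ContextWrappingRunnable implements Runnable {
    // Stand-in for the real trace context storage.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    private final Runnable delegate;
    private final String capturedContext;

    ContextWrappingRunnable(Runnable delegate) {
        this.delegate = delegate;
        this.capturedContext = CONTEXT.get(); // snapshot at submission time
    }

    @Override
    public void run() {
        String previous = CONTEXT.get();
        CONTEXT.set(capturedContext);         // restore the submitter's context
        try {
            delegate.run();
        } finally {
            CONTEXT.set(previous);            // leave the worker thread clean
        }
    }
}
```

Because the snapshot lives in the wrapper rather than in a field injected into the (possibly shared) lambda, each submission gets its own isolated copy of the context.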

In summary, the root cause was the reuse of non‑capturing lambda instances causing shared PropagatedContext across threads; the resolution involved explicit wrapping of runnables to isolate context, with additional safeguards for custom thread‑pool queues.

Tags: java, lambda, ThreadPool, OpenTelemetry, tracing, BugFix
Written by ZhongAn Tech Team

China's first online insurer. Through tech innovation we make insurance simpler, warmer, and more valuable. Powered by technology, we support 50 billion RMB of policies and serve 600 million users with smart, personalized solutions. ZhongAn's hardcore tech and article shares are here.
