
Thread Profiling: Design and Implementation of Client‑Server Performance Analysis

Thread profiling attaches threshold‑triggered tasks to business threads. When a threshold is exceeded, a dedicated profiler thread captures a stack snapshot and sends it via high‑performance gRPC to a server. The server queues the data in Kafka, enriches it, stores it in ClickHouse, and correlates it with OpenTelemetry traces, producing metrics that help developers quickly pinpoint latency bottlenecks and improve system stability.

DeWu Technology

Thread profiling is a powerful technique for identifying high‑latency issues by collecting and analyzing runtime thread stacks.

The core idea is to create a threshold‑triggered detection task on business threads; when the threshold is exceeded, a dedicated profiling thread captures the stack and asynchronously sends it to a profiling server for analysis.
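The detection flow can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the names `ThresholdProfiler`, `begin`, `end`, and `StackSnapshot` are hypothetical, a single-thread `ScheduledExecutorService` plays the role of the dedicated profiler thread, and the diagnostic queue is a bounded `ArrayBlockingQueue` that drops snapshots when it is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** A captured stack of a slow business thread (hypothetical type). */
record StackSnapshot(String threadName, Thread.State state, long triggerMillis,
                     StackTraceElement[] frames) {}

/** Hypothetical threshold-triggered detector; a sketch, not the article's code. */
final class ThresholdProfiler implements AutoCloseable {
    // Dedicated profiler thread: business threads never capture stacks themselves.
    private final ScheduledExecutorService profilerThread =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "profiler");
                t.setDaemon(true);
                return t;
            });
    private final BlockingQueue<StackSnapshot> diagnosticQueue; // bounded, configurable length
    private final long thresholdMillis;

    ThresholdProfiler(long thresholdMillis, int queueCapacity) {
        this.thresholdMillis = thresholdMillis;
        this.diagnosticQueue = new ArrayBlockingQueue<>(queueCapacity);
    }

    /** Called on the business thread when a unit of work starts. */
    ScheduledFuture<?> begin() {
        Thread business = Thread.currentThread();
        // Fires on the profiler thread only if end() has not cancelled it in time.
        return profilerThread.schedule(() -> {
            StackSnapshot s = new StackSnapshot(business.getName(), business.getState(),
                    thresholdMillis, business.getStackTrace());
            diagnosticQueue.offer(s); // non-blocking: the snapshot is dropped if the queue is full
        }, thresholdMillis, TimeUnit.MILLISECONDS);
    }

    /** Called on the business thread when the work ends; cancels a pending check. */
    void end(ScheduledFuture<?> check) {
        check.cancel(false);
    }

    BlockingQueue<StackSnapshot> queue() {
        return diagnosticQueue;
    }

    @Override
    public void close() {
        profilerThread.shutdownNow();
    }
}
```

A business thread would call `begin()` when a request starts and `end(...)` in a `finally` block; only work that outlives the threshold produces a snapshot. If the check is already running when `end` is called, the snapshot is still recorded, which is usually acceptable for a sampling profiler.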

Implementation Overview

Client Design – each task goes through four stages: creation, scheduling on a time wheel (default tick: 100 ms), execution, and export. Due tasks are queued and executed by a thread pool, which pushes the resulting stack snapshots to a diagnostic queue. Because a single snapshot can exceed 200 KB, the queue length is configurable so that the client's memory usage stays bounded.
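A time wheel keeps scheduling cheap by hashing each deadline into a fixed ring of slots, so adding a task costs O(1). The sketch below is a minimal single-level hashed timing wheel, assumed for illustration since the article does not show its own; `TimingWheel`, `schedule`, and `tick` are hypothetical names, and a driver thread is assumed to call `tick()` once per tick interval, such as the 100 ms default mentioned above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical single-level hashed timing wheel; a sketch, not the article's code. */
final class TimingWheel {
    private static final class Entry {
        final Runnable task;
        long remainingRounds; // full revolutions to wait before the task is due

        Entry(Runnable task, long remainingRounds) {
            this.task = task;
            this.remainingRounds = remainingRounds;
        }
    }

    private final long tickMillis;
    private final List<List<Entry>> slots = new ArrayList<>();
    private int cursor = 0; // slot examined by the next tick()

    TimingWheel(long tickMillis, int slotCount) {
        this.tickMillis = tickMillis;
        for (int i = 0; i < slotCount; i++) slots.add(new ArrayList<>());
    }

    /** Schedules a task after delayMillis, rounded up to whole ticks (at least one). */
    synchronized void schedule(Runnable task, long delayMillis) {
        long ticks = Math.max(1, (delayMillis + tickMillis - 1) / tickMillis);
        int n = slots.size();
        int slot = (int) ((cursor + ticks - 1) % n);
        slots.get(slot).add(new Entry(task, (ticks - 1) / n));
    }

    /** Advances one tick; due tasks run after the lock is released. */
    void tick() {
        List<Runnable> due = new ArrayList<>();
        synchronized (this) {
            Iterator<Entry> it = slots.get(cursor).iterator();
            while (it.hasNext()) {
                Entry e = it.next();
                if (e.remainingRounds == 0) {
                    it.remove();
                    due.add(e.task);
                } else {
                    e.remainingRounds--;
                }
            }
            cursor = (cursor + 1) % slots.size();
        }
        // In the client design, due tasks would be handed to the execution thread pool.
        due.forEach(Runnable::run);
    }
}
```

With a 100 ms tick and 8 slots, a 250 ms delay rounds up to 3 ticks, and a 1,000 ms delay (10 ticks) wraps around the wheel once before firing. Running due tasks outside the lock keeps a slow task from blocking `schedule` calls made by business threads.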

Server Design – the server receives data via high‑performance gRPC, enqueues it into Kafka, parses and enriches it, and then persists it to storage (e.g., ClickHouse). It also supports correlation with OpenTelemetry traces.
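The server-side decoupling can be sketched as below, under heavy assumptions: an in-memory `LinkedBlockingQueue` stands in for the Kafka topic so the example runs without dependencies, a `Consumer` sink stands in for a batched ClickHouse writer, and the record types (`RawSnapshot`, `EnrichedRow`) and the instance-to-service lookup used as "enrichment" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Raw snapshot as received from a client (hypothetical fields). */
record RawSnapshot(String instanceId, String threadName, String data) {}

/** Row after enrichment, ready for a columnar store (hypothetical fields). */
record EnrichedRow(String service, String instanceId, String threadName, String data) {}

final class IngestPipeline {
    // Stand-in for the Kafka topic that decouples receiving from processing.
    private final BlockingQueue<RawSnapshot> topic = new LinkedBlockingQueue<>();
    private final Map<String, String> serviceByInstance; // assumed enrichment metadata
    private final Consumer<List<EnrichedRow>> sink;      // stand-in for a batched writer
    private final int batchSize;

    IngestPipeline(Map<String, String> serviceByInstance,
                   Consumer<List<EnrichedRow>> sink, int batchSize) {
        this.serviceByInstance = serviceByInstance;
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** What a gRPC handler would do: enqueue and return without doing heavy work. */
    void onReceive(RawSnapshot snapshot) {
        topic.add(snapshot);
    }

    /** One consumer pass: drain up to batchSize records, enrich them, write one batch. */
    int consumeOnce() {
        List<RawSnapshot> raw = new ArrayList<>();
        topic.drainTo(raw, batchSize);
        if (raw.isEmpty()) return 0;
        List<EnrichedRow> rows = new ArrayList<>(raw.size());
        for (RawSnapshot s : raw) {
            String service = serviceByInstance.getOrDefault(s.instanceId(), "unknown");
            rows.add(new EnrichedRow(service, s.instanceId(), s.threadName(), s.data()));
        }
        sink.accept(rows);
        return rows.size();
    }
}
```

Writing in batches suits the storage side: ClickHouse generally favors fewer, larger inserts over many single-row writes.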

Data Processing

Snapshots are pre‑aggregated, parent‑child relationships between stack frames are inferred, and self‑time is calculated according to predefined rules.
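The article does not list its aggregation rules, so the following is only a sketch of one common approach: merge sampled stacks into a call tree keyed by frame, so that consecutive frames become parent-child edges, and count a sample toward a frame's self time only when that frame is the innermost one. `CallNode` and the fixed sampling interval used by `selfMillis` are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node of an aggregated call tree built from sampled stacks. */
final class CallNode {
    final String frame;
    final Map<String, CallNode> children = new LinkedHashMap<>();
    int totalSamples; // samples whose stack passes through this frame
    int selfSamples;  // samples in which this frame is the innermost one

    CallNode(String frame) {
        this.frame = frame;
    }

    /**
     * Merges one sample. Frames must be ordered outermost first; Thread.getStackTrace()
     * returns innermost first, so such stacks need reversing before calling this.
     */
    void add(List<String> stackRootFirst) {
        CallNode node = this;
        node.totalSamples++;
        for (String f : stackRootFirst) {
            // Consecutive frames become parent -> child edges; equal prefixes merge.
            node = node.children.computeIfAbsent(f, CallNode::new);
            node.totalSamples++;
        }
        node.selfSamples++;
    }

    CallNode child(String f) {
        return children.get(f);
    }

    /** Self time, assuming every sample represents a fixed intervalMillis of wall time. */
    long selfMillis(long intervalMillis) {
        return selfSamples * intervalMillis;
    }
}
```

For example, merging the stacks `main → handle → query` (twice), `main → handle`, and `main → log` gives `handle` a total of 3 samples and a self count of 1, so with a 10 ms interval its self time is 10 ms.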

An example of the aggregated output:

[
    {
        "data": "YXQgc3VuLm5pby5jaC5Vd...",
        "thread_name": "XNIO-1 I/O-1",
        "thread_state": "RUNNABLE",
        "trigger_millisecond": 500,
        "self_millisecond": 38,
        "source_snapshot_count": 153
    },
    {
        "data": "YXQgaW8udW5kZXJ0b3cuc2Vy...",
        "thread_name": "XNIO-1 task-1",
        "thread_state": "RUNNABLE",
        "trigger_millisecond": 500,
        "self_millisecond": 0,
        "source_snapshot_count": 140
    }
]
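The `data` field appears to hold Base64-encoded stack text: the visible prefix `YXQgc3VuLm5pby5jaC5V` decodes to `at sun.nio.ch.U`. Assuming that, and assuming UTF-8 text with one frame per line (neither is stated in the article), a record mirroring the fields above could decode it as follows; `ThreadProfile` and `frames()` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

/** Mirrors one element of the JSON array (field names from the example, camel-cased). */
record ThreadProfile(String data, String threadName, String threadState,
                     long triggerMillisecond, long selfMillisecond, int sourceSnapshotCount) {

    /** Decodes "data", assuming Base64 over UTF-8 text with one stack frame per line. */
    List<String> frames() {
        byte[] raw = Base64.getDecoder().decode(data);
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.lines()
                .map(String::strip)            // drop the leading tab of jstack-style lines
                .filter(line -> !line.isEmpty())
                .toList();
    }
}
```

The example values above are truncated (`...`), so only their prefixes can be decoded.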

Monitoring Metrics – task queue size, task release latency, number of active profiling tasks, stack export latency, data queue size, ingestion rate, aggregation latency, exported bytes, and export rate.
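A few of these metrics can be sketched with plain JDK primitives: gauges read live queue sizes on demand, while `LongAdder` counters accumulate exported bytes, export count, and export time with low contention across threads. `ProfilerMetrics` and the metric names are hypothetical; a real client would more likely publish through its existing metrics library.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/** Hypothetical metrics holder for the profiling client. */
final class ProfilerMetrics {
    private final Map<String, LongSupplier> gauges = new ConcurrentHashMap<>();
    private final LongAdder exportedBytes = new LongAdder();
    private final LongAdder exportCount = new LongAdder();
    private final LongAdder exportNanos = new LongAdder();

    /** Gauge: reads the queue's current size each time a snapshot is taken. */
    void registerQueueSize(String name, BlockingQueue<?> queue) {
        gauges.put(name, queue::size);
    }

    /** Called after each export with the payload size and the time it took. */
    void recordExport(long bytes, long elapsedNanos) {
        exportedBytes.add(bytes);
        exportCount.increment();
        exportNanos.add(elapsedNanos);
    }

    /** Point-in-time values, e.g. for a periodic reporter. */
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        gauges.forEach((name, gauge) -> out.put(name, gauge.getAsLong()));
        long count = exportCount.sum();
        out.put("export.bytes", exportedBytes.sum());
        out.put("export.count", count);
        out.put("export.avg_latency_micros", count == 0 ? 0L : exportNanos.sum() / count / 1_000);
        return out;
    }
}
```

Rates such as the ingestion rate or export rate would be derived by a periodic reporter from the difference between two successive snapshots divided by the reporting interval.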

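By integrating with OpenTelemetry, thread profiling can associate each snapshot with its trace ID, span ID, and interface name, so a slow stack can be traced back to the request that produced it and profiling data joins the rest of the observability stack.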
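One detail matters when attaching trace IDs: the stack is captured on the profiler thread, which cannot read the business thread's `ThreadLocal` state directly. A minimal sketch, assuming a shared registry keyed by thread ID (`TraceContextRegistry` and `TraceContext` are hypothetical names), is shown below. In a real deployment the IDs would come from the OpenTelemetry Java API, for example `Span.current().getSpanContext().getTraceId()` and `getSpanId()`; plain strings are used here so the sketch runs without dependencies.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Trace context to attach to a snapshot (hypothetical type). */
record TraceContext(String traceId, String spanId, String endpoint) {}

/** Lets the profiler thread look up the trace context of a business thread. */
final class TraceContextRegistry {
    private final Map<Long, TraceContext> byThreadId = new ConcurrentHashMap<>();

    /** Business thread: publish its context when a profiled request starts. */
    void bind(TraceContext ctx) {
        byThreadId.put(Thread.currentThread().getId(), ctx);
    }

    /** Business thread: clear on completion so stale IDs are never attached. */
    void unbind() {
        byThreadId.remove(Thread.currentThread().getId());
    }

    /** Profiler thread: look up the context of the thread whose stack it captured. */
    Optional<TraceContext> lookup(Thread businessThread) {
        return Optional.ofNullable(byThreadId.get(businessThread.getId()));
    }
}
```

The business thread calls `bind` when a request starts and `unbind` in a `finally` block; the profiler calls `lookup` with the thread whose stack it just captured.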

Overall, the approach helps developers quickly locate performance bottlenecks, improve application quality, and maintain system stability.
