Thread Profiling: Design and Implementation of Client‑Server Performance Analysis
Thread profiling attaches threshold-triggered detection tasks to business threads. When a threshold is exceeded, a dedicated profiler thread captures stack snapshots and sends them over gRPC to a server, which queues them in Kafka, enriches them, stores them in ClickHouse, and correlates them with OpenTelemetry traces. The resulting data and metrics let developers pinpoint latency bottlenecks quickly and improve system stability.
Thread profiling is a powerful technique for identifying high‑latency issues by collecting and analyzing runtime thread stacks.
The core idea is to create a threshold‑triggered detection task on business threads; when the threshold is exceeded, a dedicated profiling thread captures the stack and asynchronously sends it to a profiling server for analysis.
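The core idea can be illustrated with a minimal Java sketch. It is not the article's implementation: the class name ThresholdProfiler, the method profile, and the use of an AtomicReference as the snapshot sink are all hypothetical, and a single-threaded ScheduledExecutorService stands in for the dedicated profiling thread. Only standard JDK calls (ScheduledExecutorService.schedule, Thread.getStackTrace) are used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: snapshots the business thread's stack if the work outlives a threshold. */
final class ThresholdProfiler implements AutoCloseable {
    // A single daemon thread stands in for the dedicated profiling thread (assumption).
    private final ScheduledExecutorService profilerThread =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "profiler");
                t.setDaemon(true);
                return t;
            });

    /** Runs work on the calling thread; stores a stack snapshot in sink if thresholdMillis is exceeded. */
    <T> T profile(Callable<T> work, long thresholdMillis,
                  AtomicReference<StackTraceElement[]> sink) throws Exception {
        Thread businessThread = Thread.currentThread();
        // Detection task: it only fires if the work is still running when the threshold elapses.
        ScheduledFuture<?> detection = profilerThread.schedule(
                () -> sink.set(businessThread.getStackTrace()),
                thresholdMillis, TimeUnit.MILLISECONDS);
        try {
            return work.call();
        } finally {
            // Fast path: the work finished first, so the pending detection task is cancelled.
            detection.cancel(false);
        }
    }

    @Override
    public void close() {
        profilerThread.shutdownNow();
    }
}
```

In this sketch a request that completes within the threshold only pays for scheduling and cancelling one task, which is why the capture is pushed onto a separate thread. A real client would export the snapshot asynchronously instead of holding it in a reference, and would have to handle the case where the work finishes at the same moment the detection task fires.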
Implementation Overview
Client Design – each profiling task passes through four stages: creation, scheduling, execution, and export. Tasks are scheduled on a time wheel (default tick: 100 ms) and queued until a thread pool executes them; the captured stack snapshots are then pushed to a diagnostic queue for export. Because a single snapshot can exceed 200 KB, the queue length is configurable.
Server Design – the server receives snapshots over gRPC and enqueues them in Kafka. It then parses and enriches the data and persists it to a store such as ClickHouse. It also supports correlation with OpenTelemetry traces.
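The time-wheel scheduling mentioned above can be sketched as follows. This is a generic hashed time wheel, not the article's code: the class TimeWheel, its slot count, and the manually driven tick() method are assumptions made for illustration. A real client would call tick() from a timer thread every 100 ms and hand due tasks to the thread pool instead of running them inline.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

/** Hypothetical single-threaded time wheel; tick() is driven by the caller. */
final class TimeWheel {
    private static final class Entry {
        final Runnable task;
        long remainingRounds; // full wheel rotations left before the task is due
        Entry(Runnable task, long remainingRounds) {
            this.task = task;
            this.remainingRounds = remainingRounds;
        }
    }

    private final long tickMillis;
    private final List<Queue<Entry>> slots = new ArrayList<>();
    private long currentTick = 0;

    TimeWheel(long tickMillis, int slotCount) {
        this.tickMillis = tickMillis;
        for (int i = 0; i < slotCount; i++) {
            slots.add(new ArrayDeque<>());
        }
    }

    /** Schedules task after delayMillis, rounded up to whole ticks (at least one tick). */
    void schedule(Runnable task, long delayMillis) {
        long ticks = Math.max(1, (delayMillis + tickMillis - 1) / tickMillis);
        int slot = (int) ((currentTick + ticks) % slots.size());
        long rounds = (ticks - 1) / slots.size();
        slots.get(slot).add(new Entry(task, rounds));
    }

    /** Advances one tick, runs every task due in the new slot, and returns how many ran. */
    int tick() {
        currentTick++;
        Queue<Entry> slot = slots.get((int) (currentTick % slots.size()));
        List<Runnable> due = new ArrayList<>();
        for (Iterator<Entry> it = slot.iterator(); it.hasNext(); ) {
            Entry e = it.next();
            if (e.remainingRounds > 0) {
                e.remainingRounds--; // not due yet: wait for another rotation
            } else {
                it.remove();
                due.add(e.task);
            }
        }
        // Run after iterating so a task may safely schedule follow-up work on the wheel.
        for (Runnable r : due) {
            r.run();
        }
        return due.size();
    }
}
```

With a 100 ms tick, a task scheduled with a 500 ms threshold becomes due on the fifth call to tick(). Scheduling and cancellation stay cheap regardless of how many detection tasks are pending, which is the usual reason for choosing a time wheel over a sorted delay queue.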
Data Processing
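Snapshots are pre-aggregated, parent-child relationships between stack frames are inferred, and the self-time of each frame is calculated using defined rules. In the sample output below, the "data" field holds the stack text; the visible prefixes are Base64 (the first decodes to "at sun.nio.ch.U", the second to "at io.undertow.ser").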
[
  {
    "data": "YXQgc3VuLm5pby5jaC5Vd...",
    "thread_name": "XNIO-1 I/O-1",
    "thread_state": "RUNNABLE",
    "trigger_millisecond": 500,
    "self_millisecond": 38,
    "source_snapshot_count": 153
  },
  {
    "data": "YXQgaW8udW5kZXJ0b3cuc2Vy...",
    "thread_name": "XNIO-1 task-1",
    "thread_state": "RUNNABLE",
    "trigger_millisecond": 500,
    "self_millisecond": 0,
    "source_snapshot_count": 140
  }
]
Monitoring Metrics – task queue size, task release latency, number of active profiling tasks, stack export latency, data queue size, ingestion rate, aggregation latency, export byte size and rate.
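The pre-aggregation and self-time step described under Data Processing can be sketched with a call tree. The article does not spell out its rules, so this example uses the common sampling-profiler convention as an assumption: snapshots are merged frame by frame from the outermost caller inward, and a frame's self samples are its total samples minus those attributed to its children. The names FrameNode and SnapshotAggregator and the sampleIntervalMillis parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical call-tree node: one stack frame reached through one specific caller path. */
final class FrameNode {
    final String frame;
    int totalSamples; // snapshots in which this frame appeared on this path
    final Map<String, FrameNode> children = new LinkedHashMap<>();

    FrameNode(String frame) {
        this.frame = frame;
    }

    /** Snapshots in which this frame was the innermost frame (assumed rule: total minus children). */
    int selfSamples() {
        int childTotal = 0;
        for (FrameNode child : children.values()) {
            childTotal += child.totalSamples;
        }
        return totalSamples - childTotal;
    }

    /** Converts self samples to time, assuming a fixed interval between snapshots. */
    long selfMillis(long sampleIntervalMillis) {
        return (long) selfSamples() * sampleIntervalMillis;
    }
}

/** Hypothetical aggregator that merges snapshots into a single call tree. */
final class SnapshotAggregator {
    final FrameNode root = new FrameNode("<root>");

    /**
     * Adds one snapshot whose frames are ordered outermost caller first.
     * Note that Thread.getStackTrace() returns the innermost frame first,
     * so its output would need to be reversed before being passed in.
     */
    void add(List<String> framesRootFirst) {
        FrameNode node = root;
        node.totalSamples++;
        for (String frame : framesRootFirst) {
            node = node.children.computeIfAbsent(frame, FrameNode::new);
            node.totalSamples++;
        }
    }
}
```

For example, after adding three snapshots of run > handle > query, one of run > handle, and one of run > parse, the handle node has four total samples and one self sample, while run has five total samples and no self samples. The parent-child links fall out of the shared prefixes, which is one plausible reading of how frame relationships are inferred from raw snapshots.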
By integrating with OpenTelemetry, thread profiling can associate each snapshot with its trace ID, span ID, and interface name, providing comprehensive observability.
Overall, the approach helps developers quickly locate performance bottlenecks, improve application quality, and maintain system stability.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.