
QTracer: An In‑Depth Overview of Qunar’s Distributed Tracing System

This article gives a technical overview of QTracer, Qunar’s internal distributed tracing platform. It covers the key features (execution-chain queries, associated log queries, conditional search), the client core concepts, non-intrusive instrumentation through bytecode injection, log association, the data pipeline and storage architecture, and the QTracer Debug tool for online breakpoint debugging.

Ctrip Technology

QTracer is Qunar’s internally developed distributed tracing system that generates a globally unique TraceID for each request, propagates it across services, records operations in each system, and reconstructs the full execution flow for analysis.

Key Features

1. Execution Chain Query: Visualizes the complete request path, showing data center, description, type, and execution time.

2. Associated Log Query: Retrieves logs from all involved services using the TraceID.

3. Conditional Search: Searches TraceIDs by prefix, originating application, time range, or custom keywords stored in the trace data.

4. Service Upstream/Downstream Relationship: Analyzes call dependencies and provides QPS and latency metrics.

5. Database Operation Statistics: Offers table-level and statement-level execution counts, QPS, latency, and slow-query detection.

6. Transparent Data Transmission: Passes custom data (e.g., ABTest flags, unit identifiers) along the trace without modifying service interfaces.

7. Additional Functions: Automatic TraceID association for exceptions, service call QPS analysis, recent slow-call statistics, and failure statistics.

Client Core Design

The trace model consists of a Trace composed of multiple Span objects. Each Span records service description, start/end times, results, and type. Core concepts include:

TraceID – a globally unique identifier that may embed origin app, machine, timestamp, and sampling flag.

SpanID – identifies individual operations and encodes hierarchical order (e.g., 1, 1.1, 1.1.1).

TimelineAnnotation – records fine‑grained timing steps within a Span (e.g., HTTP connection, request write, response read).

KVAnnotation – stores custom key‑value data such as order numbers or user IDs.

TraceContext – carries pass-through data to downstream services; this data is propagated along the call chain but is not collected as trace data.
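The hierarchical SpanID scheme and the two annotation types can be made concrete with a small model. The sketch below is only an illustration under stated assumptions: the class name SketchSpan, its fields, and the newChild method are hypothetical and are not QTracer's actual classes. It shows how a parent Span can hand out child IDs such as 1.1 and 1.1.1 by appending a per-parent counter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not QTracer's real Span class.
final class SketchSpan {
    final String traceId;
    final String spanId;                                   // hierarchical, e.g. "1.2.1"
    final String description;
    final long startMillis;
    final List<String> timeline = new ArrayList<>();       // stand-in for TimelineAnnotation
    final Map<String, String> kv = new LinkedHashMap<>();  // stand-in for KVAnnotation
    private int childCount = 0;

    SketchSpan(String traceId, String spanId, String description, long startMillis) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.description = description;
        this.startMillis = startMillis;
    }

    // The n-th child of span "1.2" receives the ID "1.2.n" and shares the TraceID.
    SketchSpan newChild(String childDescription, long childStartMillis) {
        childCount++;
        return new SketchSpan(traceId, spanId + "." + childCount, childDescription, childStartMillis);
    }

    // Records one fine-grained timing step inside this span.
    void addTimeline(String step, long atMillis) {
        timeline.add(step + "@" + atMillis);
    }

    // Attaches custom business data such as an order number.
    void addKv(String key, String value) {
        kv.put(key, value);
    }

    // Nesting depth is the number of dot-separated segments in the SpanID.
    int depth() {
        return spanId.split("\\.").length;
    }
}
```

Because the ID itself encodes the position in the call tree, a reader of stored spans can rebuild parent-child relationships without a separate parent pointer.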

The core API exposes startTrace to begin a Span and a set of add methods to attach annotations to it. Most components embed automatic instrumentation, so developers rarely need to call the API directly.

Trace Continuation

Synchronous calls propagate TraceID/SpanID via ThreadLocal. Asynchronous or cross-thread calls require explicit context transfer; QTracer provides helper wrappers for thread pools and integrates automatic propagation into Dubbo, HTTP, and MQ components.
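The cross-thread case is worth a sketch, because a ThreadLocal value does not follow a task into a pool thread on its own. The example below assumes a hypothetical SketchTraceContext class holding only a TraceID string; QTracer's real wrappers are not shown in this article and may look different. The idea is to capture the caller's context when the task is wrapped and to restore it around the task when it runs.

```java
// Hypothetical helper for illustration; not QTracer's real thread-pool wrapper.
final class SketchTraceContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String traceId) {
        CURRENT.set(traceId);
    }

    static String get() {
        return CURRENT.get();
    }

    static void clear() {
        CURRENT.remove();
    }

    // Captures the caller's TraceID now and installs it in whichever thread
    // eventually runs the task, restoring that thread's previous value afterwards.
    static Runnable wrap(Runnable task) {
        final String captured = CURRENT.get();
        return () -> {
            String previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                task.run();
            } finally {
                if (previous == null) {
                    CURRENT.remove();
                } else {
                    CURRENT.set(previous);
                }
            }
        };
    }
}
```

A caller would submit SketchTraceContext.wrap(task) to an executor instead of the bare task. Restoring the previous value in the finally block matters for pooled threads, which are reused and would otherwise leak one request's TraceID into the next.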

Non‑Intrusive Instrumentation

For internal components, QTracer adds direct instrumentation code. For third-party libraries (e.g., MySQL, PostgreSQL drivers), it uses bytecode injection at runtime. Local method instrumentation is achieved through annotations such as @QTrace, @QP, and @QF, which generate insertion points during compilation.
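To give a feel for annotation-driven instrumentation, the sketch below wraps annotated methods with timing code. It deliberately uses a different and simpler technique than the one described above: a JDK dynamic proxy applied at runtime, rather than compile-time insertion points or bytecode injection. The annotation @SketchTraced, the interface SketchOrderService, and the class SketchTracingProxy are all hypothetical names invented for this example; the signatures of @QTrace, @QP, and @QF are not given in this article.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; not one of QTracer's annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchTraced {
    String value();
}

// Hypothetical business interface used only to demonstrate the wrapper.
interface SketchOrderService {
    @SketchTraced("loadOrder")
    String load(String orderId);

    String ping();   // not annotated, so it is not recorded
}

final class SketchTracingProxy {
    // Stand-in for a span recorder: entries look like "name:elapsedNanos".
    static final List<String> RECORDED = new ArrayList<>();

    // Returns a proxy that times every method carrying @SketchTraced.
    static <T> T wrap(Class<T> iface, T target) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) -> {
                    SketchTraced traced = method.getAnnotation(SketchTraced.class);
                    if (traced == null) {
                        return call(method, target, args);
                    }
                    long start = System.nanoTime();
                    try {
                        return call(method, target, args);
                    } finally {
                        RECORDED.add(traced.value() + ":" + (System.nanoTime() - start));
                    }
                });
        return iface.cast(proxy);
    }

    // Unwraps reflection's wrapper exception so callers see the original failure.
    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}
```

The business code stays free of tracing calls in both approaches. The difference is where the wrapping happens: a proxy only intercepts calls made through the interface, while compile-time or bytecode-level insertion also covers internal calls and third-party classes.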

Log Association

QTracer stores TraceID and SpanID in MDC, allowing them to be output in log patterns. When logs are collected, the TraceID links log entries to the corresponding trace, enabling end‑to‑end debugging.
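The mechanism can be sketched without any logging dependency. In the example below, SketchMdc is a tiny JDK-only stand-in for a logging framework's MDC (for example SLF4J's org.slf4j.MDC), and SketchLogLine plays the role of a log pattern that prints the two MDC values in front of each message. The key names traceId and spanId and both class names are assumptions made for illustration; the article does not state which MDC keys QTracer uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// JDK-only stand-in for a logging framework's MDC; hypothetical, for illustration.
final class SketchMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CTX.get().put(key, value);
    }

    // Missing keys render as "-" so log lines keep a fixed shape.
    static String get(String key) {
        return CTX.get().getOrDefault(key, "-");
    }

    static void clear() {
        CTX.get().clear();
    }
}

final class SketchLogLine {
    // Similar in spirit to a pattern like "[%X{traceId} %X{spanId}] %msg".
    static String format(String message) {
        return "[" + SketchMdc.get("traceId") + " " + SketchMdc.get("spanId") + "] " + message;
    }

    // What a log collector can do once every line carries the TraceID.
    static List<String> linesForTrace(List<String> lines, String traceId) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("[" + traceId + " ")) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

Since the tracer fills the context and the pattern reads it, application code logs as usual and still produces lines that can be joined to a trace.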

System Architecture

1. Trace Data Recording & Collection: Data is first written to local logs, then asynchronously batched and sent to a Kafka cluster. Resource consumption is minimized via async writes, size-based log rotation, and optional data dropping under pressure.

2. Trace Data Processing & Analysis: Samza processes streams from Kafka, persisting raw Span data to HBase for fast lookup and aggregating Spans into complete Traces. It also computes recent slow operations, QPS, and latency statistics.

3. Data Storage: HBase stores raw Trace and Span data (using TraceID as rowkey and SpanID as column) and minute-level QPS metrics. Elasticsearch holds condensed Trace data for multi-dimensional search, service dependency graphs, and recent slow-call information.
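Two ideas from the three stages above can be shown in a compact, in-memory sketch: dropping data under pressure instead of blocking the request thread, and laying spans out so that one TraceID lookup returns the whole trace. Everything here is an assumption made for illustration. SketchSpanRecord, SketchSpanBuffer, and SketchTraceTable are hypothetical names, the local log file, Kafka, Samza, and the HBase client are all replaced by plain JDK collections, and the real record format is not described in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical span record; only the fields this sketch needs.
final class SketchSpanRecord {
    final String traceId;
    final String spanId;
    final long durationMillis;

    SketchSpanRecord(String traceId, String spanId, long durationMillis) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.durationMillis = durationMillis;
    }
}

// Stage 1 stand-in: a bounded buffer between request threads and a background sender.
final class SketchSpanBuffer {
    private final BlockingQueue<SketchSpanRecord> queue;
    private final AtomicLong dropped = new AtomicLong();

    SketchSpanBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Never blocks the caller: when the buffer is full the span is dropped and counted.
    boolean record(SketchSpanRecord span) {
        boolean accepted = queue.offer(span);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Drains up to maxBatch spans so the sender can ship them in one request.
    List<SketchSpanRecord> nextBatch(int maxBatch) {
        List<SketchSpanRecord> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }
}

// Stage 2 and 3 stand-in: rows keyed by TraceID, one column per SpanID.
final class SketchTraceTable {
    private final NavigableMap<String, NavigableMap<String, SketchSpanRecord>> rows = new TreeMap<>();

    void putAll(List<SketchSpanRecord> batch) {
        for (SketchSpanRecord span : batch) {
            rows.computeIfAbsent(span.traceId, k -> new TreeMap<>()).put(span.spanId, span);
        }
    }

    // A single row lookup yields every span of the trace.
    // Columns sort as plain strings here, so "1.10" comes before "1.2".
    List<SketchSpanRecord> getTrace(String traceId) {
        NavigableMap<String, SketchSpanRecord> row = rows.get(traceId);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.values());
    }

    // Simple slow-operation scan over everything stored so far.
    List<SketchSpanRecord> slowerThan(long thresholdMillis) {
        List<SketchSpanRecord> slow = new ArrayList<>();
        for (NavigableMap<String, SketchSpanRecord> row : rows.values()) {
            for (SketchSpanRecord span : row.values()) {
                if (span.durationMillis > thresholdMillis) {
                    slow.add(span);
                }
            }
        }
        return slow;
    }
}
```

Preferring offer over a blocking put reflects the trade-off stated in stage 1: losing some trace data under load is acceptable, while slowing down business requests is not. Keying rows by TraceID keeps the most common query, fetching one full trace, down to a single row read.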

QTracer Debug

QTracer Debug offers online, non‑blocking breakpoint debugging similar to IDE debuggers. Users select a project and line in the front‑end, enable the breakpoint on a target machine, trigger the URL, and the system records all in‑scope variables and call stack without pausing the application. The implementation leverages GitLab APIs for code browsing, bytecode injection to insert breakpoint code, KVAnnotation for data capture, and the existing data pipeline to store results in HBase for front‑end visualization.
