Introduction to QTracer: An Internal Distributed Tracing System at Qunar
QTracer is Qunar’s internal distributed tracing system. It generates a global TraceID for each request and records the operations performed across services. On top of that data it provides execution chain visualization, log correlation, conditional search, service dependency analysis, database statistics, and transparent data propagation, with low-overhead instrumentation for debugging and performance monitoring.
As the business grows and traffic increases, services are split across more and more machines. The resulting distributed system relies on network calls such as RPC, HTTP APIs, and message queues, so no single node can see how a request flows through the whole system.
To address this, QTracer—a distributed tracing system developed by Qunar—assigns a unique TraceID to each request, propagates it downstream, records operations in each service, and reconstructs the complete execution flow across the system.
1. Overview
QTracer generates a global TraceID for every request and records each operation (Span) along the call chain, allowing the full request path to be visualized.
2. Core Features
2.1 Execution Chain Query
For each operation in a request, the chain view displays where it ran, its description, type, execution time, and other details. This helps users quickly understand the overall request flow: which services and machines were involved, how latency is distributed, whether each call succeeded or was retried, and where the bottlenecks are.
2.2 Associated Log Query
By providing a TraceID, users can retrieve logs from all services involved in the request, gaining deeper insight when the trace data alone is insufficient.
2.3 Conditional Search
Users can locate related traces by TraceID prefix, originating application, time range, or keyword (for example, an order number).
2.4 Service Upstream/Downstream Relationships
Real‑time analysis of trace links reveals service dependencies, QPS, and latency metrics.
2.5 Database Operation Statistics
Statistics are provided at the database, table, and statement levels, including QPS, latency, and the slowest operations to help identify indexing or performance issues.
2.6 Transparent Data Propagation
Trace data can carry custom key‑value pairs (e.g., feature flags, identifiers) downstream without altering business interfaces, enabling consistent context sharing across services.
Examples include AB‑test branch flags and unit identifiers for cross‑unit calls.
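One way to picture transparent propagation is as a small key-value map that is serialized into a single string and attached to each outgoing call. The sketch below is only an illustration: the class name PropagatedContextCodec and the URL-encoded key=value&key=value wire format are assumptions, not QTracer's actual encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not QTracer's API: turns propagated key-value pairs
// into one string that a transport could carry as a header or attachment.
final class PropagatedContextCodec {
    private PropagatedContextCodec() {}

    // Serializes the map as key=value pairs joined by '&', URL-encoding both sides
    // so that '=' and '&' inside keys or values cannot break the format.
    static String encode(Map<String, String> context) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : context.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // Rebuilds the map on the receiving side; malformed pairs are skipped.
    static Map<String, String> decode(String header) {
        Map<String, String> context = new LinkedHashMap<>();
        if (header == null || header.isEmpty()) {
            return context;
        }
        for (String pair : header.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;
            }
            context.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                        URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return context;
    }
}

final class PropagationDemo {
    public static void main(String[] args) {
        Map<String, String> ctx = new LinkedHashMap<>();
        ctx.put("abTest", "B");      // hypothetical AB-test branch flag
        ctx.put("unit", "unit-2");   // hypothetical unit identifier
        String header = PropagatedContextCodec.encode(ctx);
        System.out.println(header);
        System.out.println(PropagatedContextCodec.decode(header));
    }
}
```

Because the map travels alongside the trace identifiers rather than in method signatures, a new flag can be added without touching any business interface.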
2.7 Additional Features
QTracer also supports automatic TraceID association with exceptions, QPS analysis, recent slow‑call statistics, and failure rate aggregation.
3. QTracer Client Core Design
3.1 Data Model
A Trace consists of multiple Spans forming a tree; each Span records service name, start/end times, result, and type.
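The tree shape can be sketched with a small Java class. The field names (service, type, startMillis, endMillis, success) mirror the attributes listed above, but the class itself and the sample service names are illustrative assumptions rather than QTracer's real schema.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: field names are assumptions, not QTracer's schema.
final class SpanNode {
    final String service;
    final String type;          // for example "HTTP", "DUBBO", "SQL"
    final long startMillis;
    final long endMillis;
    final boolean success;
    final List<SpanNode> children = new ArrayList<>();

    SpanNode(String service, String type, long startMillis, long endMillis, boolean success) {
        this.service = service;
        this.type = type;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
        this.success = success;
    }

    SpanNode addChild(SpanNode child) {
        children.add(child);
        return this;
    }

    long durationMillis() {
        return endMillis - startMillis;
    }

    // Number of Spans in the subtree rooted at this Span.
    int size() {
        int n = 1;
        for (SpanNode c : children) {
            n += c.size();
        }
        return n;
    }

    // Renders the subtree as an indented outline, one Span per line.
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(service).append(" [").append(type).append("] ")
           .append(durationMillis()).append("ms ")
           .append(success ? "OK" : "FAILED").append('\n');
        for (SpanNode c : children) {
            c.render(out, depth + 1);
        }
    }
}

final class SpanTreeDemo {
    // A made-up three-Span trace: web entry -> RPC call -> SQL statement.
    static SpanNode sampleTrace() {
        SpanNode root = new SpanNode("order-web", "HTTP", 0, 120, true);
        SpanNode rpc = new SpanNode("price-service", "DUBBO", 10, 90, true);
        rpc.addChild(new SpanNode("price-db", "SQL", 20, 70, false));
        return root.addChild(rpc);
    }

    public static void main(String[] args) {
        System.out.print(sampleTrace().render());
    }
}
```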
3.2 Basic Concepts
TraceID is a globally unique identifier that may embed origin application, machine, timestamp, and sampling flag.
SpanID identifies each operation within a Trace, using hierarchical numbering (e.g., 1, 1.1, 1.1.1) to represent call order and relationships.
TimelineAnnotation records internal step timestamps within a Span, useful for pinpointing failures in multi‑step operations such as HTTP requests.
KVAnnotation stores custom business data (order ID, UID, etc.) that can be searched later.
TraceContext carries data that is passed downstream but not collected as trace data; it is the mechanism behind transparent data propagation.
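The hierarchical SpanID scheme (1, 1.1, 1.1.1) can be sketched as follows. SpanIdAllocator is a hypothetical name, and the per-parent counter is an assumption about how such numbering could be produced, not a description of QTracer's internals.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator: each Span hands out IDs to its children by
// appending ".<n>" to its own ID, where n counts children in call order.
final class SpanIdAllocator {
    private final String id;
    private final AtomicInteger nextChild = new AtomicInteger(0);

    private SpanIdAllocator(String id) {
        this.id = id;
    }

    // The entry Span of a Trace gets ID "1" in this sketch.
    static SpanIdAllocator root() {
        return new SpanIdAllocator("1");
    }

    String id() {
        return id;
    }

    // Allocates the next child; the first call yields "<id>.1", the second "<id>.2".
    SpanIdAllocator newChild() {
        return new SpanIdAllocator(id + "." + nextChild.incrementAndGet());
    }

    // The parent is recoverable from the ID alone; the root has no parent.
    static String parentOf(String spanId) {
        int dot = spanId.lastIndexOf('.');
        return dot < 0 ? null : spanId.substring(0, dot);
    }

    // Depth in the call tree: "1" is depth 1, "1.1" is depth 2, and so on.
    static int depthOf(String spanId) {
        int depth = 1;
        for (int i = 0; i < spanId.length(); i++) {
            if (spanId.charAt(i) == '.') {
                depth++;
            }
        }
        return depth;
    }
}

final class SpanIdDemo {
    public static void main(String[] args) {
        SpanIdAllocator root = SpanIdAllocator.root();
        SpanIdAllocator firstCall = root.newChild();
        SpanIdAllocator secondCall = root.newChild();
        SpanIdAllocator nested = firstCall.newChild();
        System.out.println(root.id() + " " + firstCall.id() + " "
                + secondCall.id() + " " + nested.id());
    }
}
```

A useful property of this numbering is that both the parent and the call order can be read off the ID itself, so the tree can be rebuilt from a flat list of Spans.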
3.3 Core API
Developers start a Trace with startTrace, then add annotations via the various add* methods. Most components are already instrumented automatically, so explicit calls are usually unnecessary.
3.4 Trace Continuation
For synchronous calls, ThreadLocal propagates context; for asynchronous or cross‑thread calls, explicit context transfer is required. QTracer integrates with Dubbo, HTTP, MQ, etc., to automatically forward context.
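The difference between the synchronous and cross-thread cases can be sketched with a plain ThreadLocal. TraceContextHolder and its wrap method are hypothetical names used for illustration; the sketch only shows the general capture-and-restore technique, not how QTracer's integrations are written.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical holder: keeps the current TraceID in a ThreadLocal.
final class TraceContextHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TraceContextHolder() {}

    static void set(String traceId) { CURRENT.set(traceId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }

    // Captures the caller's context now and installs it on whichever thread
    // later runs the task, restoring that thread's previous value afterwards.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }
}

final class TraceContinuationDemo {
    // Returns the TraceID visible inside a task that runs on a worker thread.
    static String traceIdSeenByWorker(boolean wrapTask) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TraceContextHolder.set("trace-42");
        try {
            Callable<String> task = TraceContextHolder::get;
            Future<String> result = pool.submit(wrapTask ? TraceContextHolder.wrap(task) : task);
            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            TraceContextHolder.clear();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(traceIdSeenByWorker(false)); // prints "null": the ThreadLocal did not follow the task
        System.out.println(traceIdSeenByWorker(true));  // prints "trace-42": context was transferred explicitly
    }
}
```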
3.5 Non‑Intrusive Instrumentation
Internal components have built‑in hooks; third‑party components are instrumented via bytecode agents (e.g., MySQL, PostgreSQL drivers).
3.6 Bytecode Instrumentation
QTracer uses a Java agent to modify bytecode at runtime, allowing configuration‑driven insertion of tracing logic without source changes.
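The general shape of a configuration-driven Java agent, using only the standard java.lang.instrument API, looks roughly like the sketch below. The class names, the comma-separated agent argument format, and the target class names are assumptions for illustration. The sketch only selects classes and leaves their bytes unchanged; a real agent would rewrite them at the marked point with a bytecode library.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical transformer: decides from configuration which classes to instrument.
final class ConfigDrivenTransformer implements ClassFileTransformer {
    private final Set<String> targets;
    private final List<String> matched = new CopyOnWriteArrayList<>();

    ConfigDrivenTransformer(Set<String> targets) {
        this.targets = new HashSet<>(targets);
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className uses the internal form with slashes, e.g. "com/example/OrderDao".
        if (className == null || !targets.contains(className)) {
            return null; // null tells the JVM to keep the class unchanged
        }
        matched.add(className);
        // A real agent would return rewritten bytes here, with tracing calls
        // inserted around the configured methods. This sketch changes nothing.
        return null;
    }

    List<String> matchedClasses() {
        return matched;
    }
}

// In a real agent jar this class would be named in the Premain-Class manifest attribute.
final class TracingAgentSketch {
    private TracingAgentSketch() {}

    // The JVM calls premain before main when started with -javaagent:<jar>=<agentArgs>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ConfigDrivenTransformer(parseTargets(agentArgs)));
    }

    // Assumed argument format: comma-separated internal class names.
    static Set<String> parseTargets(String agentArgs) {
        Set<String> targets = new HashSet<>();
        if (agentArgs == null) {
            return targets;
        }
        for (String name : agentArgs.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                targets.add(trimmed);
            }
        }
        return targets;
    }
}
```

Keeping the target list in configuration rather than in code is what allows new instrumentation points to be added without recompiling either the agent or the application.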
3.7 Fast Local Method Instrumentation
Annotations such as @QTrace, @QP, and @QF mark methods, parameters, and fields for automatic instrumentation during compilation.
3.8 Log Correlation
TraceID and SpanID are stored in the logging framework's MDC (Mapped Diagnostic Context), so log statements automatically include the tracing identifiers; downstream log aggregation can then link each log line to a specific trace.
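To show the idea without pulling in a logging dependency, the sketch below uses a tiny stand-in for MDC built on a ThreadLocal map. MiniMdc, CorrelatedLogger, and the key names traceId and spanId are all assumptions; with a real logging framework the identifiers would be put into its MDC and referenced from the log pattern instead.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a logging framework's MDC: a per-thread key-value map.
final class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    private MiniMdc() {}

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.get().clear(); }
}

// Stand-in for a log layout that prefixes every line with the tracing identifiers.
final class CorrelatedLogger {
    private CorrelatedLogger() {}

    static String format(String message) {
        return "[traceId=" + orDash(MiniMdc.get("traceId"))
                + " spanId=" + orDash(MiniMdc.get("spanId")) + "] " + message;
    }

    private static String orDash(String value) {
        return value == null ? "-" : value;
    }
}

final class LogCorrelationDemo {
    public static void main(String[] args) {
        // The tracing client would set these when a Span starts on this thread.
        MiniMdc.put("traceId", "t-1");
        MiniMdc.put("spanId", "1.2");
        System.out.println(CorrelatedLogger.format("order created"));
        // ...and clear them when the Span ends.
        MiniMdc.clear();
        System.out.println(CorrelatedLogger.format("idle"));
    }
}
```

Business code never passes the identifiers to the logger; they are attached by the layout, which is what makes the correlation automatic.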
4. QTracer System Architecture
The system consists of three layers: data recording & collection, data processing & analysis, and data presentation.
4.1 Data Recording & Collection
Spans are logged locally, then shipped via a Kafka cluster. Asynchronous batch logging, file rotation, and selective dropping keep resource usage low.
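A common way to keep recording off the request path is a bounded in-memory buffer: request threads enqueue without blocking, a background writer drains in batches, and records are dropped when the buffer is full. The sketch below illustrates that pattern under assumptions; SpanBuffer is a hypothetical name and drop-when-full is only one possible dropping policy, not necessarily the one QTracer applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bounded buffer between request threads and a background log writer.
final class SpanBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    SpanBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called on the request thread: never blocks, drops the record if the buffer is full.
    boolean offer(String spanLine) {
        boolean accepted = queue.offer(spanLine);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Called by the background writer: removes up to maxBatch records for one write.
    List<String> drainBatch(int maxBatch) {
        List<String> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }
}

final class SpanBufferDemo {
    public static void main(String[] args) {
        SpanBuffer buffer = new SpanBuffer(2);
        buffer.offer("span-a");
        buffer.offer("span-b");
        buffer.offer("span-c"); // buffer is full, so this record is dropped
        System.out.println(buffer.drainBatch(10) + " dropped=" + buffer.droppedCount());
    }
}
```

The trade-off is deliberate: under overload the application loses some trace data rather than slowing down business requests.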
4.2 Data Processing & Analysis
Samza jobs process the streams: Span data is stored in HBase for fast lookup and aggregated into complete Traces; the Trace data is then compacted and indexed in Elasticsearch to support search, dependency analysis, and QPS and latency metrics.
HBase stores raw Span data keyed by TraceID and SpanID; Elasticsearch holds the compacted Trace view and service dependency graphs.
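Keying Spans by TraceID and SpanID means all Spans of one trace sit next to each other in a store that sorts rows by key, so a trace can be fetched with a single prefix scan. The sketch below imitates that with a TreeMap as a stand-in for a sorted key-value store. The "#" separator and the SpanStoreSketch class are assumptions, not QTracer's actual row key layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Stand-in for a sorted key-value store: rows are ordered by row key.
final class SpanStoreSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Assumed layout: "<traceId>#<spanId>", so one trace forms a contiguous key range.
    static String rowKey(String traceId, String spanId) {
        return traceId + "#" + spanId;
    }

    void put(String traceId, String spanId, String payload) {
        rows.put(rowKey(traceId, spanId), payload);
    }

    // Prefix scan: every key that starts with "<traceId>#", in key order.
    List<String> spansOf(String traceId) {
        String prefix = traceId + "#";
        SortedMap<String, String> range = rows.subMap(prefix, prefix + Character.MAX_VALUE);
        return new ArrayList<>(range.values());
    }
}

final class SpanStoreDemo {
    public static void main(String[] args) {
        SpanStoreSketch store = new SpanStoreSketch();
        store.put("t2", "1", "other-trace-root");
        store.put("t1", "1.1", "db-call");
        store.put("t1", "1", "root");
        System.out.println(store.spansOf("t1")); // prints [root, db-call]
    }
}
```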
5. QTracer Debug
5.1 Overview
QTracer Debug provides IDE‑like breakpoint debugging without pausing the application, capturing variable states and call stacks with minimal overhead.
5.2 Implementation
Users select a project and a line of code in the front-end UI, enable the breakpoint on a target machine, and trigger the code path. The system then records a snapshot at that line via bytecode-injected instrumentation.
5.3 Breakpoint Injection
The process maps source files to classes, gathers variable scope information, modifies bytecode to insert collection code at the specified line, and records the data using QTracer’s KVAnnotation mechanism.
5.4 Data Collection
Collected debug data travels through the existing logging pipeline, is filtered by real‑time jobs, stored in HBase, and displayed in the front‑end UI.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.