
Introduction to QTracer: An Internal Distributed Tracing System at Qunar

QTracer is Qunar’s internal distributed tracing system that generates a global TraceID for each request, records operations across services, and provides features such as execution chain visualization, log correlation, conditional search, service dependency analysis, database statistics, transparent data propagation, and low‑overhead instrumentation for debugging and performance monitoring.

Qunar Tech Salon

Aug 14, 2017


As a business grows and traffic increases, its services are split across more and more machines. The resulting distributed system relies on network calls such as RPC, HTTP APIs, and message queues, and no single node can see the overall architecture.

To address this, Qunar built QTracer, a distributed tracing system that assigns a unique TraceID to each request, propagates it downstream, records the operations performed in each service, and reconstructs the complete execution flow across the system.

1. Overview

QTracer generates a global TraceID for every request and records each operation (Span) along the call chain, allowing the full request path to be visualized.

2. Core Features

2.1 Execution Chain Query

The chain view displays request location, description, type, execution time, and other details, helping users quickly understand the global request flow, involved services, machines, latency distribution, success status, retries, and bottlenecks.

2.2 Associated Log Query

By providing a TraceID, users can retrieve logs from all services involved in the request, gaining deeper insight when the trace data alone is insufficient.

2.3 Conditional Search

Users can search TraceIDs by prefix, originating application, time range, or keywords (e.g., order numbers) to locate related traces.

2.4 Service Upstream/Downstream Relationships

Real‑time analysis of trace links reveals service dependencies, QPS, and latency metrics.
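Dependency edges can be derived by joining each Span to its parent and counting cross-service pairs. The sketch below is a minimal illustration under stated assumptions. The `SpanRecord` class and its fields are hypothetical and do not reflect QTracer's schema. The code aggregates call counts and total latency per "caller -> callee" edge. Dividing the counts by the length of the analysis window would give QPS.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical flattened span record; not QTracer's actual schema.
final class SpanRecord {
    final String traceId, spanId, parentSpanId, service;
    final long durationMs;

    SpanRecord(String traceId, String spanId, String parentSpanId, String service, long durationMs) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.service = service;
        this.durationMs = durationMs;
    }
}

final class DependencyAnalyzer {
    /** Maps "caller -> callee" to {callCount, totalLatencyMs}. */
    static Map<String, long[]> edges(List<SpanRecord> spans) {
        // Index spans by (traceId, spanId) so each child can find its parent.
        Map<String, SpanRecord> byId = new HashMap<>();
        for (SpanRecord s : spans) byId.put(s.traceId + "/" + s.spanId, s);

        Map<String, long[]> edges = new TreeMap<>();
        for (SpanRecord child : spans) {
            if (child.parentSpanId == null) continue;  // root span has no caller
            SpanRecord parent = byId.get(child.traceId + "/" + child.parentSpanId);
            if (parent == null || parent.service.equals(child.service)) continue;  // skip in-service calls
            long[] stats = edges.computeIfAbsent(parent.service + " -> " + child.service, k -> new long[2]);
            stats[0]++;                    // call count
            stats[1] += child.durationMs;  // accumulated callee latency
        }
        return edges;
    }
}
```

Run continuously over the span stream, the same aggregation yields the real-time dependency graph. Average latency for an edge is `stats[1] / stats[0]`.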

2.5 Database Operation Statistics

Statistics are provided at the database, table, and statement levels, including QPS, latency, and the slowest operations to help identify indexing or performance issues.

2.6 Transparent Data Propagation

Trace data can carry custom key‑value pairs (e.g., feature flags, identifiers) downstream without altering business interfaces, enabling consistent context sharing across services.

Examples include A/B-test branch flags and unit identifiers for cross-unit calls.
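One way to carry such key-value pairs without touching business interfaces is to serialize them into a transport header that the framework integration writes and reads. This is a minimal sketch only. The header name `X-Trace-Context` and the `k1=v1&k2=v2` encoding are assumptions, because the source does not describe QTracer's wire format.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical carrier for propagated key-value pairs (not QTracer's real class). */
final class PropagatedContext {
    // Hypothetical header name; the real wire format is not given in the source.
    static final String HEADER = "X-Trace-Context";

    private final Map<String, String> entries = new LinkedHashMap<>();

    PropagatedContext put(String key, String value) { entries.put(key, value); return this; }

    String get(String key) { return entries.get(key); }

    /** Encodes entries as "k1=v1&k2=v2", URL-encoding keys and values. */
    String encode() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Rebuilds the context on the receiving side; malformed pairs are skipped. */
    static PropagatedContext decode(String header) {
        PropagatedContext ctx = new PropagatedContext();
        if (header == null || header.isEmpty()) return ctx;
        for (String pair : header.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue;
            ctx.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                    URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return ctx;
    }
}
```

The caller sets `HEADER` to `encode()` on the outgoing request. The callee calls `decode()` on the incoming value, so a value such as an A/B branch flag reaches downstream code without any method signature changing.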

2.7 Additional Features

QTracer also supports automatic TraceID association with exceptions, QPS analysis, recent slow‑call statistics, and failure rate aggregation.

3. QTracer Client Core Design

3.1 Data Model

A Trace consists of multiple Spans forming a tree; each Span records service name, start/end times, result, and type.
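The Trace/Span tree can be pictured as a small in-memory model. The sketch below is illustrative only: class and field names are assumptions. Besides the fields listed above, it also holds the timeline and key-value annotations introduced under Basic Concepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model; class and field names are assumptions, not QTracer's schema.
final class Span {
    final String spanId;   // hierarchical ID, e.g. "1.2"
    final String service;  // service or operation name
    final String type;     // e.g. "HTTP", "RPC", "SQL"
    final long startMs;
    long endMs;
    boolean success = true;
    final Map<String, Long> timeline = new LinkedHashMap<>();  // step name -> timestamp (TimelineAnnotation)
    final Map<String, String> kv = new LinkedHashMap<>();      // searchable business data (KVAnnotation)
    final List<Span> children = new ArrayList<>();             // downstream calls made from this Span

    Span(String spanId, String service, String type, long startMs) {
        this.spanId = spanId;
        this.service = service;
        this.type = type;
        this.startMs = startMs;
    }

    long durationMs() { return endMs - startMs; }
}

final class Trace {
    final String traceId;
    final Span root;

    Trace(String traceId, Span root) {
        this.traceId = traceId;
        this.root = root;
    }

    /** Counts every Span in the tree, depth-first. */
    int spanCount() { return count(root); }

    private static int count(Span span) {
        int n = 1;
        for (Span child : span.children) n += count(child);
        return n;
    }
}
```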

3.2 Basic Concepts

TraceID is a globally unique identifier that may embed origin application, machine, timestamp, and sampling flag.
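Embedding metadata in the ID lets downstream services read it without a lookup. The generator below is a minimal sketch built on a hypothetical layout, `app_host_timestamp_sequence_sampleFlag`. QTracer's actual format is not given. The sketch assumes that app and host names contain no underscores and that each host runs one process per application.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical TraceID layout: app_host_timestamp_sequence_sampleFlag (not QTracer's real format). */
final class TraceIdGenerator {
    private final String app;
    private final String host;
    private final AtomicInteger sequence = new AtomicInteger();

    TraceIdGenerator(String app, String host) {
        this.app = app;
        this.host = host;
    }

    String next(long nowMillis, boolean sampled) {
        // A per-process sequence separates IDs generated within the same millisecond.
        int seq = sequence.getAndIncrement() & 0xFFFF;
        return app + "_" + host + "_" + nowMillis + "_" + seq + "_" + (sampled ? "1" : "0");
    }

    /** Downstream services read the sampling decision straight from the ID. */
    static boolean isSampled(String traceId) { return traceId.endsWith("_1"); }

    /** The origin application is the prefix, which also makes prefix search useful. */
    static String originApp(String traceId) { return traceId.substring(0, traceId.indexOf('_')); }
}
```

Putting the application name first also makes the prefix search described in 2.3 cheap: all traces started by one application share a key prefix.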

SpanID identifies each operation within a Trace, using hierarchical numbering (e.g., 1, 1.1, 1.1.1) to represent call order and relationships.
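Hierarchical numbering needs only a per-span child counter: each outgoing call takes the next sequence number under the current SpanID. The allocator below is an illustrative sketch of that scheme. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Allocates child SpanIDs in the "1", "1.1", "1.1.1" scheme (illustrative sketch). */
final class SpanIdAllocator {
    private final String spanId;
    // Counts children created under this span; atomic in case calls fan out from several threads.
    private final AtomicInteger childCounter = new AtomicInteger();

    SpanIdAllocator(String spanId) { this.spanId = spanId; }

    static SpanIdAllocator root() { return new SpanIdAllocator("1"); }

    String id() { return spanId; }

    /** The n-th downstream call from this span gets ID "<spanId>.n". */
    SpanIdAllocator nextChild() {
        return new SpanIdAllocator(spanId + "." + childCounter.incrementAndGet());
    }

    /** The parent is recovered by dropping the last segment, e.g. "1.2.3" -> "1.2". */
    static String parentOf(String id) {
        int dot = id.lastIndexOf('.');
        return dot < 0 ? null : id.substring(0, dot);
    }

    /** Depth in the call tree: "1" -> 1, "1.2.3" -> 3. */
    static int depth(String id) { return id.split("\\.").length; }
}
```

Because parent and depth are encoded in the ID itself, the tree can be rebuilt from spans that arrive in any order, and sibling numbers preserve call order.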

TimelineAnnotation records internal step timestamps within a Span, useful for pinpointing failures in multi‑step operations such as HTTP requests.

KVAnnotation stores custom business data (order ID, UID, etc.) that can be searched later.

TraceContext carries data downstream that is propagated but not collected (stored) by the tracing backend; it is the mechanism behind transparent data propagation.

3.3 Core API

Developers start a Trace with startTrace, then add annotations via various add* methods; however, most components already embed automatic instrumentation.
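The description names `startTrace` and a family of `add*` methods but not their signatures. The facade below is a hypothetical sketch of how such an API might hang off a ThreadLocal. `addKVAnnotation`, `addTimelineAnnotation`, `endTrace`, and all signatures are assumptions, not QTracer's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical facade: only startTrace and the add* family come from the description;
 * addKVAnnotation, addTimelineAnnotation, endTrace and all signatures are assumptions.
 */
final class MiniTracer {
    static final class ActiveSpan {
        final String name;
        final long startNanos = System.nanoTime();
        final Map<String, String> kv = new LinkedHashMap<>();
        final List<String> timeline = new ArrayList<>();

        ActiveSpan(String name) { this.name = name; }
    }

    // Synchronous code on one thread finds the current span through this ThreadLocal.
    private static final ThreadLocal<ActiveSpan> CURRENT = new ThreadLocal<>();

    static void startTrace(String name) { CURRENT.set(new ActiveSpan(name)); }

    static void addKVAnnotation(String key, String value) {
        ActiveSpan span = CURRENT.get();
        if (span != null) span.kv.put(key, value);  // no-op outside a trace
    }

    static void addTimelineAnnotation(String event) {
        ActiveSpan span = CURRENT.get();
        if (span == null) return;
        long elapsedMs = (System.nanoTime() - span.startNanos) / 1_000_000;
        span.timeline.add(event + "@" + elapsedMs + "ms");
    }

    /** Detaches and returns the current span; remove() matters on pooled threads. */
    static ActiveSpan endTrace() {
        ActiveSpan span = CURRENT.get();
        CURRENT.remove();
        return span;
    }
}
```

Making the `add*` calls silent no-ops outside a trace means business code can annotate freely without checking whether tracing is active.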

3.4 Trace Continuation

For synchronous calls, ThreadLocal propagates context; for asynchronous or cross‑thread calls, explicit context transfer is required. QTracer integrates with Dubbo, HTTP, MQ, etc., to automatically forward context.
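A common way to make the explicit cross-thread transfer transparent is to wrap tasks. The wrapper captures the context on the submitting thread and reinstates it on the worker. The holder below is an illustrative sketch that stands in for QTracer's context type. `InheritableThreadLocal` does not solve the problem for thread pools, because it copies values only when a thread is created.

```java
/** Illustrative context holder; QTracer's real context type is not shown in the source. */
final class TraceContextHolder {
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's context now and reinstates it on whichever thread runs the task. */
    static Runnable wrap(Runnable task) {
        final String captured = TRACE_ID.get();  // read on the submitting thread
        return () -> {
            String previous = TRACE_ID.get();    // a pooled worker may hold stale state
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Restore so the worker does not leak this trace into unrelated tasks.
                if (previous == null) TRACE_ID.remove(); else TRACE_ID.set(previous);
            }
        };
    }
}
```

Usage: `executor.submit(TraceContextHolder.wrap(task))`. Framework integrations such as the Dubbo, HTTP, and MQ hooks perform the equivalent capture-and-restore step across process boundaries using request metadata.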

3.5 Non‑Intrusive Instrumentation

Internal components have built‑in hooks; third‑party components are instrumented via bytecode agents (e.g., MySQL, PostgreSQL drivers).

3.6 Bytecode Instrumentation

QTracer uses a Java agent to modify bytecode at runtime, allowing configuration‑driven insertion of tracing logic without source changes.
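The standard JDK hook for this is `java.lang.instrument`. When the JVM starts with `-javaagent`, it calls the agent's `premain` before `main`. Each `ClassFileTransformer` registered through `Instrumentation.addTransformer` then sees every class as it loads, and returning `null` leaves the class unchanged. The skeleton below shows only the configuration-driven selection. The target class name is hypothetical, and the actual rewrite step is a placeholder where a bytecode library such as ASM would insert tracing calls.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;

/** Skeleton of a configuration-driven agent; in a real agent JAR, declare it public in its own file. */
final class TracingAgent {
    // Internal (slash-separated) class names to instrument; hypothetical, normally read from config.
    static final Set<String> TARGETS = Set.of("com/example/db/Statement");

    /** Entry point named by the Premain-Class attribute of the agent JAR manifest. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new SelectiveTransformer(TARGETS));
    }
}

final class SelectiveTransformer implements ClassFileTransformer {
    private final Set<String> targets;
    int matched;  // number of target classes seen, for diagnostics

    SelectiveTransformer(Set<String> targets) { this.targets = targets; }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className == null || !targets.contains(className)) {
            return null;  // null tells the JVM to keep the original bytes
        }
        matched++;
        return rewrite(classfileBuffer);
    }

    /** Placeholder: a real agent would inject Span start/end calls here (e.g. with ASM). */
    private byte[] rewrite(byte[] original) {
        return null;  // unchanged in this sketch
    }
}
```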

3.7 Fast Local Method Instrumentation

Annotations such as @QTrace, @QP, and @QF mark methods, parameters, and fields for automatic instrumentation during compilation.
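The source says the markers are processed during compilation. The sketch below swaps in runtime reflection purely to show what such markers carry and what an instrumenter could derive from them. `@Traced`, `@TracedParam`, and `@TracedField` are hypothetical stand-ins for `@QTrace`, `@QP`, and `@QF`, whose real definitions are not given. `OrderService` is likewise an invented example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for @QTrace, @QP and @QF.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Traced { String value() default ""; }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
@interface TracedParam { String value(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface TracedField { String value(); }

class OrderService {
    @TracedField("unit") String unit = "bj";

    @Traced("createOrder")
    void create(@TracedParam("orderId") String orderId, int quantity) { /* business logic */ }
}

final class TraceAnnotationScanner {
    /** Lists what an instrumenter could record: spans, captured parameters, captured fields. */
    static List<String> plan(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            Traced traced = m.getAnnotation(Traced.class);
            if (traced == null) continue;
            String span = traced.value().isEmpty() ? m.getName() : traced.value();
            out.add("span " + span);
            for (Parameter p : m.getParameters()) {
                TracedParam tp = p.getAnnotation(TracedParam.class);
                if (tp != null) out.add(span + ": param " + tp.value());
            }
        }
        for (Field f : type.getDeclaredFields()) {
            TracedField tf = f.getAnnotation(TracedField.class);
            if (tf != null) out.add("field " + tf.value());
        }
        return out;
    }
}
```

A compile-time approach, as the source describes, avoids the reflection cost at runtime by generating the recording code directly into the annotated methods.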

3.8 Log Correlation

TraceID and SpanID are stored in MDC, enabling log statements to automatically include tracing identifiers; downstream log aggregation can then link logs to specific traces.
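With SLF4J this amounts to `MDC.put("traceId", ...)`, and a Logback pattern containing `%X{traceId}` then prints the value on every line. To stay self-contained, the sketch below replaces MDC with a minimal per-thread map. The key names `traceId` and `spanId` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal stand-in for SLF4J's MDC: a per-thread map of diagnostic context values. */
final class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT = ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }

    static String get(String key) { return CONTEXT.get().get(key); }

    static void clear() { CONTEXT.remove(); }
}

final class CorrelatedLogger {
    /** Prefixes each message with the current TraceID and SpanID, as a %X{...} pattern would. */
    static String format(String message) {
        String traceId = MiniMdc.get("traceId");
        String spanId = MiniMdc.get("spanId");
        return "[" + (traceId == null ? "-" : traceId) + " " + (spanId == null ? "-" : spanId) + "] " + message;
    }
}
```

Because the tracer, not the business code, fills the map when a Span starts, existing log statements gain trace identifiers without modification. Log aggregation can then index lines by TraceID.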

4. QTracer System Architecture

The system consists of three layers: data recording & collection, data processing & analysis, and data presentation.

4.1 Data Recording & Collection

Spans are logged locally, then shipped via a Kafka cluster. Asynchronous batch logging, file rotation, and selective dropping keep resource usage low.
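The low-overhead properties described here (asynchronous batching, and dropping instead of blocking) can be sketched with a bounded queue whose `offer()` never waits. This is an illustrative sketch only. Capacity and batch size are hypothetical, and the sink stands in for the local log file. Shipping to Kafka happens outside the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Non-blocking span buffer (illustrative): the request path never waits, overflow is dropped
 * and counted, and a background writer drains spans in batches. Sizes are hypothetical.
 */
final class SpanBuffer {
    private final BlockingQueue<String> queue;
    private final int batchSize;
    final AtomicLong dropped = new AtomicLong();

    SpanBuffer(int capacity, int batchSize) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
    }

    /** Request path: offer() returns false instead of blocking when the queue is full. */
    void record(String serializedSpan) {
        if (!queue.offer(serializedSpan)) dropped.incrementAndGet();
    }

    /** Writer path: hands up to batchSize spans to the sink in one call (e.g. one file append). */
    int flushOnce(Consumer<List<String>> sink) {
        List<String> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) sink.accept(batch);
        return batch.size();
    }

    /** Starts a daemon thread that flushes continuously, pausing briefly when the queue is empty. */
    Thread startWriter(Consumer<List<String>> sink) {
        Thread writer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                if (flushOnce(sink) == 0) {
                    try {
                        Thread.sleep(50);
                    } catch (InterruptedException e) {
                        return;  // stop when interrupted
                    }
                }
            }
        }, "span-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }
}
```

This design deliberately sacrifices completeness under overload. Losing some spans is preferable to slowing the request path, and the `dropped` counter makes the loss visible.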

4.2 Data Processing & Analysis

Samza processes streams: Span data is stored in HBase for fast lookup and aggregated into complete Traces; Trace data is compacted and indexed in Elasticsearch for search, dependency analysis, QPS, and latency metrics.

HBase stores raw Span data keyed by TraceID and SpanID; Elasticsearch holds the compacted Trace view and service dependency graphs.
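Keying rows by TraceID followed by SpanID places every Span of a trace under one row-key prefix. HBase keeps rows sorted by key bytes, so a single prefix scan returns the whole trace. The sketch below assumes a `traceId|spanId` layout, which is not confirmed by the source, and uses a `TreeMap` as a stand-in for the table. It also shows why Spans need numeric reordering after the scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Row-key sketch: "traceId|spanId" (assumed layout); a TreeMap stands in for the sorted HBase table. */
final class SpanStore {
    private static final char SEP = '|';
    private final TreeMap<String, String> table = new TreeMap<>();

    static String rowKey(String traceId, String spanId) { return traceId + SEP + spanId; }

    void put(String traceId, String spanId, String spanData) { table.put(rowKey(traceId, spanId), spanData); }

    /** Emulates a prefix scan, then orders the SpanIDs numerically. */
    List<String> spanIdsOf(String traceId) {
        String prefix = traceId + SEP;  // the separator keeps "t1" from matching "t10"
        SortedMap<String, String> rows = table.subMap(prefix, prefix + Character.MAX_VALUE);
        List<String> ids = new ArrayList<>();
        for (String key : rows.keySet()) ids.add(key.substring(prefix.length()));
        ids.sort(SpanStore::compareSpanIds);
        return ids;
    }

    /** Byte order puts "1.10" before "1.2"; compare segment by segment as integers instead. */
    static int compareSpanIds(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = Integer.compare(Integer.parseInt(x[i]), Integer.parseInt(y[i]));
            if (c != 0) return c;
        }
        return Integer.compare(x.length, y.length);
    }
}
```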

5. QTracer Debug

5.1 Overview

QTracer Debug provides IDE‑like breakpoint debugging without pausing the application, capturing variable states and call stacks with minimal overhead.

5.2 Implementation

Users select a project and line in the front‑end UI, enable the breakpoint on a target machine, trigger the code path, and the system records the snapshot via bytecode‑injected instrumentation.

5.3 Breakpoint Injection

The process maps source files to classes, gathers variable scope information, modifies bytecode to insert collection code at the specified line, and records the data using QTracer’s KVAnnotation mechanism.
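At runtime, the injected code only needs to turn the in-scope variables and the current call stack into key-value pairs, and the thread keeps running. In the sketch below, `BreakpointSnapshot.capture` is a hypothetical helper. The call inside `PriceCalculator` is written by hand to show what bytecode injected at the chosen line might amount to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical runtime helper for a non-suspending breakpoint. */
final class BreakpointSnapshot {
    /** Converts name/value pairs and the current call stack into key-value annotations. */
    static Map<String, String> capture(String location, Object... namesAndValues) {
        Map<String, String> kv = new LinkedHashMap<>();
        kv.put("debug.location", location);
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            kv.put("debug.var." + namesAndValues[i], String.valueOf(namesAndValues[i + 1]));
        }
        StringBuilder stack = new StringBuilder();
        StackTraceElement[] frames = Thread.currentThread().getStackTrace();
        // On HotSpot, frame 0 is getStackTrace and frame 1 is capture(); start at the caller.
        for (int i = 2; i < frames.length; i++) {
            stack.append(frames[i].getClassName()).append('.').append(frames[i].getMethodName());
            if (i + 1 < frames.length) stack.append(" <- ");
        }
        kv.put("debug.stack", stack.toString());
        return kv;  // these pairs would be recorded as KVAnnotations on the current Span
    }
}

class PriceCalculator {
    Map<String, String> lastSnapshot;

    int total(int unitPrice, int quantity) {
        int subtotal = unitPrice * quantity;
        // Hand-written equivalent of code injected at this line; execution is never paused.
        lastSnapshot = BreakpointSnapshot.capture("PriceCalculator.total", "unitPrice", unitPrice,
                "quantity", quantity, "subtotal", subtotal);
        return subtotal;
    }
}
```

The variable scope information gathered during injection is what tells the rewriter which names and local slots to pass to such a call at a given line.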

5.4 Data Collection

Collected debug data travels through the existing logging pipeline, is filtered by real‑time jobs, stored in HBase, and displayed in the front‑end UI.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
