Tars: High‑Performance RPC Framework and Service Governance Platform Overview
The article introduces Tars, a high‑performance RPC framework and integrated service governance platform that leverages micro‑service architecture, detailing its design philosophy, layered architecture, key features such as the Tars protocol, load balancing, fault tolerance, overload protection, configuration management, and open‑source availability.
Tars is a high‑performance RPC development framework based on name service and the Tars protocol, accompanied by an integrated service governance platform that enables individuals and enterprises to quickly build stable, reliable distributed applications using a micro‑service approach.
Tars originated from Tencent's internal micro‑service architecture TAF (Total Application Framework), summarizing years of practice into an open‑source project; its name comes from the friendly robot Tars in the movie "Interstellar", reflecting ease of use, high performance, and comprehensive service governance.
Currently, the framework runs on over 100 business lines and more than 100,000 servers within Tencent.
Design Philosophy
Tars approaches service governance with a micro‑service mindset, abstracting the system into decoupled layers.
The lowest protocol layer unifies business network communication using an IDL (Interface Definition Language), supporting multi‑platform, extensible, auto‑generated code, allowing developers to focus on protocol fields without handling cross‑platform compatibility.
The middle layer comprises common libraries, a communication framework, and platform components. It simplifies business development by encapsulating common code and RPC mechanisms, ensures high stability, availability, and performance, and addresses concerns such as fault tolerance, load balancing, capacity management, proximity access, and gray releases.
The topmost operation layer enables operators to manage deployment, publishing, configuration, monitoring, and scheduling.
Overall Architecture
Architecture Topology
The topology consists of two parts: service nodes and common framework nodes.
Service Nodes
A service node is an operating‑system instance (physical, virtual, or cloud) where services run; a large system may have thousands to hundreds of thousands of such nodes. Each node hosts a Node service and N (≥0) business service instances, with the Node service managing lifecycle, deployment, monitoring, and heartbeat collection.
Common Framework Nodes
All components other than service nodes belong to this category. Their number varies, scaling with the count of service nodes and with specific needs such as log volume; for fault tolerance they are typically deployed across multiple data centers.
Framework Node Details
Web Management System: provides real‑time service data and operations like publish, start/stop, and deployment.
Registry (routing + management): handles service registration, discovery, address queries, and heartbeat management.
Patch (release management): offers service publishing capabilities.
Config (configuration center): centralizes configuration file management.
Log (remote logging): enables services to write logs to remote storage.
Stat (call statistics): aggregates metrics such as traffic, latency, and timeout rates for alerting.
Property (business attributes): reports custom metrics like memory usage, queue size, cache hit rate.
Notify (exception information): reports service state changes, DB failures, and other errors for alerting.
All nodes must be network‑reachable, with each machine's Node able to communicate with the common framework nodes.
Features
Tars Protocol
The Tars protocol uses an Interface Definition Language (IDL) to define binary, extensible, auto‑generated, multi‑platform compatible interfaces, facilitating RPC communication, serialization, and deserialization between backend services.
Supported types include basic types (void, bool, byte, short, int, long, float, double, string, unsigned variants) and complex types (enum, const, struct, vector, map, and nested combinations).
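For illustration, a small Tars IDL fragment (the module, struct, and interface names here are invented for this sketch) shows how tagged required/optional fields and service interfaces are declared:

```
module Demo
{
    struct User
    {
        0 require string name;
        1 optional int age = 0;
        2 optional vector<string> tags;
    };

    interface UserService
    {
        int getUser(string name, out User user);
    };
};
```

The numeric tags identify fields on the wire, which is what allows a struct to be extended with new optional fields without breaking older peers.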
Invocation Modes
Using the IDL‑generated code, services implement business logic on the server side, while clients call services via three modes:
Synchronous call: client waits for the result before proceeding.
Asynchronous call: client continues other work; results are handled via callbacks.
One‑way call: client sends the request and does not expect a response.
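As a rough sketch (not the actual Tars‑generated API; `HelloPrx` and its methods are hypothetical stand‑ins), the three invocation modes can be contrasted like this:

```java
import java.util.concurrent.CompletableFuture;

public class InvocationModes {
    // Hypothetical stand-in for a Tars-generated client proxy.
    interface HelloPrx {
        String hello(String name);                          // synchronous: blocks for the result
        CompletableFuture<String> asyncHello(String name);  // asynchronous: result via callback/future
        void onewayHello(String name);                      // one-way: no response is expected
    }

    // Toy in-process implementation standing in for a remote servant.
    static HelloPrx localProxy() {
        return new HelloPrx() {
            public String hello(String name) { return "hello " + name; }
            public CompletableFuture<String> asyncHello(String name) {
                return CompletableFuture.supplyAsync(() -> hello(name));
            }
            public void onewayHello(String name) { /* fire and forget */ }
        };
    }

    public static void main(String[] args) {
        HelloPrx prx = localProxy();
        String r = prx.hello("tars");                 // sync: waits here for the reply
        prx.asyncHello("tars")
           .thenAccept(reply -> { /* handle reply in a callback */ })
           .join();                                   // join only so the demo waits before exiting
        prx.onewayHello("tars");                      // one-way: nothing comes back
        System.out.println(r);
    }
}
```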
Load Balancing
Through the name service, clients obtain a list of service addresses and select a load‑balancing strategy such as round‑robin, hash, or weighted distribution.
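A minimal sketch of two such client‑side strategies over an address list returned by the name service (the endpoint strings are placeholders):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadBalance {
    private final List<String> endpoints;
    private final AtomicInteger counter = new AtomicInteger();

    public LoadBalance(List<String> endpoints) { this.endpoints = endpoints; }

    // Round-robin: spread calls evenly across all endpoints.
    public String roundRobin() {
        return endpoints.get(Math.floorMod(counter.getAndIncrement(), endpoints.size()));
    }

    // Hash: the same key (e.g. a user id) always maps to the same endpoint,
    // useful when a node holds cached state for that key.
    public String byHash(String key) {
        return endpoints.get(Math.floorMod(key.hashCode(), endpoints.size()));
    }
}
```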
Fault Tolerance
Fault tolerance is achieved via name‑service exclusion and client‑side shielding.
Name‑service exclusion: services report heartbeats; the name service stops returning addresses of failed nodes, typically within about one minute.
Client shielding: clients monitor error rates and timeouts; if calls to a node time out consecutively or the error rate exceeds a threshold, the client temporarily shields that node and periodically retries it until it recovers.
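A toy sketch of that shielding logic (the threshold and cool‑down values here are invented, not the framework's defaults):

```java
public class NodeShield {
    private final int maxConsecutiveTimeouts;
    private final long cooldownMillis;
    private int consecutiveTimeouts = 0;
    private long shieldedUntil = 0;

    public NodeShield(int maxConsecutiveTimeouts, long cooldownMillis) {
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
        this.cooldownMillis = cooldownMillis;
    }

    // Shielded nodes are skipped until the retry time arrives.
    public synchronized boolean isAvailable(long now) {
        return now >= shieldedUntil;
    }

    // Any success clears the failure streak.
    public synchronized void onSuccess() {
        consecutiveTimeouts = 0;
    }

    // Enough consecutive timeouts shields the node for a cool-down period.
    public synchronized void onTimeout(long now) {
        if (++consecutiveTimeouts >= maxConsecutiveTimeouts) {
            shieldedUntil = now + cooldownMillis;
            consecutiveTimeouts = 0;
        }
    }
}
```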
Overload Protection
The framework implements request queues and non‑blocking asynchronous calls to improve throughput, monitors queue length, rejects new requests when thresholds are exceeded, and discards timed‑out requests.
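The two queue‑side behaviors, rejecting when the queue is over its limit and discarding requests whose deadline has already passed, can be sketched as follows (queue limit and deadlines are illustrative):

```java
import java.util.ArrayDeque;

public class OverloadGuard {
    record Request(String id, long deadlineMillis) {}

    private final ArrayDeque<Request> queue = new ArrayDeque<>();
    private final int maxQueueLength;

    public OverloadGuard(int maxQueueLength) { this.maxQueueLength = maxQueueLength; }

    // Returns false (reject) when the queue has reached its limit.
    public synchronized boolean offer(Request r) {
        if (queue.size() >= maxQueueLength) return false;
        queue.addLast(r);
        return true;
    }

    // Skips requests whose deadline passed while queued: the caller has
    // already given up, so serving them would waste work.
    public synchronized Request poll(long now) {
        while (!queue.isEmpty()) {
            Request r = queue.pollFirst();
            if (r.deadlineMillis > now) return r;
        }
        return null;
    }
}
```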
Message Coloring
The framework allows coloring specific requests for a service interface; colored messages are forwarded to all downstream services and logged to a dedicated server, facilitating request tracing and debugging.
IDC Grouping
To reduce latency and cross‑region traffic, the framework supports proximity access by grouping services by data center and room, so clients prefer nearby nodes.
SET Grouping
SET deployment isolates groups of services with no inter‑call relationships, providing fault isolation, standardized capacity management, and improved operational efficiency.
Data Monitoring
The framework reports various metrics to monitor service health, including call statistics, custom attribute data, and exception information, enabling users to view traffic, latency, error rates, and service state changes.
Centralized Configuration
The platform provides web‑based centralized management of business configurations, enabling easy modification, timely notifications, secure changes, version history, and rollback capabilities. Configuration retrieval is service‑oriented.
Configuration files are organized into four levels: application, SET, service, and node.
Application configuration is the highest level, shared by multiple services.
SET configuration supplements application configuration for a specific SET group.
Service configuration applies to all nodes of a particular service and can reference application configuration.
Node configuration is personalized for an individual application node and merges with service configuration to form the final node configuration.
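The merge across the four levels can be sketched as a simple key‑value overlay, where each more specific level overrides the one above it (keys and values here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigMerge {
    // Levels are passed from least to most specific:
    // application, SET, service, node. Later levels win on key conflicts.
    @SafeVarargs
    public static Map<String, String> merge(Map<String, String>... levels) {
        Map<String, String> merged = new HashMap<>();
        for (Map<String, String> level : levels) {
            merged.putAll(level);
        }
        return merged;
    }
}
```

For example, a node‑level `log.level=DEBUG` would override an application‑level `log.level=INFO`, while untouched keys such as a shared database host pass through unchanged.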
Project Address
Open‑source repository: https://gitee.com/TarsCloud/Tars