Tars: High‑Performance RPC Framework and Service Governance Platform Overview
The article introduces Tars, a high‑performance RPC framework and integrated service governance platform that leverages micro‑service architecture, detailing its design philosophy, layered architecture, key features such as the Tars protocol, load balancing, fault tolerance, overload protection, configuration management, and open‑source availability.
Tars is a high‑performance RPC development framework based on name service and the Tars protocol, accompanied by an integrated service governance platform that enables individuals and enterprises to quickly build stable, reliable distributed applications using a micro‑service approach.
Tars originated from Tencent's internal micro‑service architecture TAF (Total Application Framework), summarizing years of practice into an open‑source project; its name comes from the friendly robot Tars in the movie "Interstellar", reflecting ease of use, high performance, and comprehensive service governance.
Currently, the framework runs on over 100 business lines and more than 100,000 servers within Tencent.
Design Philosophy
Tars approaches service governance with a micro‑service mindset, abstracting the system into decoupled layers.
The lowest protocol layer unifies business network communication using an IDL (Interface Definition Language), supporting multi‑platform, extensible, auto‑generated code, allowing developers to focus on protocol fields without handling cross‑platform compatibility.
The middle layer comprises common libraries, a communication framework, and platform components. It simplifies business development by encapsulating common code and RPC mechanisms, ensures high stability, availability, and performance, and addresses concerns such as fault tolerance, load balancing, capacity management, proximity access, and gray releases.
The topmost operation layer enables operators to manage deployment, publishing, configuration, monitoring, and scheduling.
Overall Architecture
Architecture Topology
The topology consists of two parts: service nodes and common framework nodes.
Service Nodes
A service node is an operating‑system instance (physical, virtual, or cloud) where services run; a large system may have thousands to hundreds of thousands of such nodes. Each node hosts a Node service and N (≥0) business service instances, with the Node service managing lifecycle, deployment, monitoring, and heartbeat collection.
Common Framework Nodes
All components other than service nodes belong to this category. Their number varies, scaling with the count of service nodes and with specific needs such as log volume; for fault tolerance they are typically deployed across multiple data centers.
Framework Node Details
Web Management System: provides real‑time service data and operations like publish, start/stop, and deployment.
Registry (routing + management): handles service registration, discovery, address queries, and heartbeat management.
Patch (release management): offers service publishing capabilities.
Config (configuration center): centralizes configuration file management.
Log (remote logging): enables services to write logs to remote storage.
Stat (call statistics): aggregates metrics such as traffic, latency, and timeout rates for alerting.
Property (business attributes): reports custom metrics like memory usage, queue size, cache hit rate.
Notify (exception information): reports service state changes, DB failures, and other errors for alerting.
All nodes must be network‑reachable, with each machine's Node able to communicate with the common framework nodes.
Features
Tars Protocol
The Tars protocol uses an Interface Definition Language (IDL) to define binary, extensible, auto‑generated, multi‑platform compatible interfaces, facilitating RPC communication, serialization, and deserialization between backend services.
Supported types include basic types (void, bool, byte, short, int, long, float, double, string, unsigned variants) and complex types (enum, const, struct, vector, map, and nested combinations).
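For illustration, a small Tars IDL fragment (the module, struct, and interface names here are invented for this sketch) shows how tagged required/optional fields and service interfaces are declared:

```
module Demo
{
    struct User
    {
        0 require string name;
        1 optional int age = 0;
        2 optional vector<string> tags;
    };

    interface UserService
    {
        int getUser(string name, out User user);
    };
};
```

The numeric tags identify fields on the wire, which is what allows a struct to be extended with new optional fields without breaking older peers.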
Invocation Modes
Using the IDL‑generated code, services implement business logic on the server side, while clients call services via three modes:
Synchronous call: client waits for the result before proceeding.
Asynchronous call: client continues other work; results are handled via callbacks.
One‑way call: client sends the request and does not expect a response.
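As a rough sketch (not the actual Tars‑generated API; `HelloPrx` and its methods are hypothetical stand‑ins), the three invocation modes can be contrasted like this:

```java
import java.util.concurrent.CompletableFuture;

public class InvocationModes {
    // Hypothetical stand-in for a Tars-generated client proxy.
    interface HelloPrx {
        String hello(String name);                          // synchronous: blocks for the result
        CompletableFuture<String> asyncHello(String name);  // asynchronous: result via callback/future
        void onewayHello(String name);                      // one-way: no response is expected
    }

    // Toy in-process implementation standing in for a remote servant.
    static HelloPrx localProxy() {
        return new HelloPrx() {
            public String hello(String name) { return "hello " + name; }
            public CompletableFuture<String> asyncHello(String name) {
                return CompletableFuture.supplyAsync(() -> hello(name));
            }
            public void onewayHello(String name) { /* fire and forget */ }
        };
    }

    public static void main(String[] args) {
        HelloPrx prx = localProxy();
        String r = prx.hello("tars");                 // sync: waits here for the reply
        prx.asyncHello("tars")
           .thenAccept(reply -> { /* handle reply in a callback */ })
           .join();                                   // join only so the demo waits before exiting
        prx.onewayHello("tars");                      // one-way: nothing comes back
        System.out.println(r);
    }
}
```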
Load Balancing
Through the name service, clients obtain a list of service addresses and select a load‑balancing strategy such as round‑robin, hash, or weighted distribution.
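A minimal sketch of two such client‑side strategies over an address list returned by the name service (the endpoint strings are placeholders):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadBalance {
    private final List<String> endpoints;
    private final AtomicInteger counter = new AtomicInteger();

    public LoadBalance(List<String> endpoints) { this.endpoints = endpoints; }

    // Round-robin: spread calls evenly across all endpoints.
    public String roundRobin() {
        return endpoints.get(Math.floorMod(counter.getAndIncrement(), endpoints.size()));
    }

    // Hash: the same key (e.g. a user id) always maps to the same endpoint,
    // useful when a node holds cached state for that key.
    public String byHash(String key) {
        return endpoints.get(Math.floorMod(key.hashCode(), endpoints.size()));
    }
}
```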
Fault Tolerance
Fault tolerance is achieved via name‑service exclusion and client‑side shielding.
Name‑service exclusion: services report heartbeats; the name service stops returning addresses of failed nodes, typically within about one minute.
Client shielding: clients monitor error rates and timeouts; if calls to a node time out consecutively or the error rate exceeds a threshold, the client temporarily shields that node and periodically retries it until it recovers.
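A toy sketch of that shielding logic (the threshold and cool‑down values here are invented, not the framework's defaults):

```java
public class NodeShield {
    private final int maxConsecutiveTimeouts;
    private final long cooldownMillis;
    private int consecutiveTimeouts = 0;
    private long shieldedUntil = 0;

    public NodeShield(int maxConsecutiveTimeouts, long cooldownMillis) {
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
        this.cooldownMillis = cooldownMillis;
    }

    // Shielded nodes are skipped until the retry time arrives.
    public synchronized boolean isAvailable(long now) {
        return now >= shieldedUntil;
    }

    // Any success clears the failure streak.
    public synchronized void onSuccess() {
        consecutiveTimeouts = 0;
    }

    // Enough consecutive timeouts shields the node for a cool-down period.
    public synchronized void onTimeout(long now) {
        if (++consecutiveTimeouts >= maxConsecutiveTimeouts) {
            shieldedUntil = now + cooldownMillis;
            consecutiveTimeouts = 0;
        }
    }
}
```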
Overload Protection
The framework implements request queues and non‑blocking asynchronous calls to improve throughput, monitors queue length, rejects new requests when thresholds are exceeded, and discards timed‑out requests.
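The two queue‑side behaviors, rejecting when the queue is over its limit and discarding requests whose deadline has already passed, can be sketched as follows (queue limit and deadlines are illustrative):

```java
import java.util.ArrayDeque;

public class OverloadGuard {
    record Request(String id, long deadlineMillis) {}

    private final ArrayDeque<Request> queue = new ArrayDeque<>();
    private final int maxQueueLength;

    public OverloadGuard(int maxQueueLength) { this.maxQueueLength = maxQueueLength; }

    // Returns false (reject) when the queue has reached its limit.
    public synchronized boolean offer(Request r) {
        if (queue.size() >= maxQueueLength) return false;
        queue.addLast(r);
        return true;
    }

    // Skips requests whose deadline passed while queued: the caller has
    // already given up, so serving them would waste work.
    public synchronized Request poll(long now) {
        while (!queue.isEmpty()) {
            Request r = queue.pollFirst();
            if (r.deadlineMillis > now) return r;
        }
        return null;
    }
}
```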
Message Coloring
The framework allows coloring specific requests for a service interface; colored messages are forwarded to all downstream services and logged to a dedicated server, facilitating request tracing and debugging.
IDC Grouping
To reduce latency and cross‑region traffic, the framework supports proximity access by grouping services by data center and room, so clients prefer nearby nodes.
SET Grouping
SET deployment isolates groups of services with no inter‑call relationships, providing fault isolation, standardized capacity management, and improved operational efficiency.
Data Monitoring
The framework reports various metrics to monitor service health, including call statistics, custom attribute data, and exception information, enabling users to view traffic, latency, error rates, and service state changes.
Centralized Configuration
The platform provides web‑based centralized management of business configurations, enabling easy modification, timely notifications, secure changes, version history, and rollback capabilities. Configuration retrieval is service‑oriented.
Configuration files are organized into four levels: application, SET, service, and node.
Application configuration is the highest level, shared by multiple services.
SET configuration supplements application configuration for a specific SET group.
Service configuration applies to all nodes of a particular service and can reference application configuration.
Node configuration is personalized for an individual application node and merges with service configuration to form the final node configuration.
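The merge across the four levels can be sketched as a simple key‑value overlay, where each more specific level overrides the one above it (keys and values here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigMerge {
    // Levels are passed from least to most specific:
    // application, SET, service, node. Later levels win on key conflicts.
    @SafeVarargs
    public static Map<String, String> merge(Map<String, String>... levels) {
        Map<String, String> merged = new HashMap<>();
        for (Map<String, String> level : levels) {
            merged.putAll(level);
        }
        return merged;
    }
}
```

For example, a node‑level `log.level=DEBUG` would override an application‑level `log.level=INFO`, while untouched keys such as a shared database host pass through unchanged.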
Project Address
Open‑source repository: https://gitee.com/TarsCloud/Tars