Tag

Observability

Bilibili Tech
May 31, 2024 · Backend Development

Design and High‑Availability Practices of Bilibili's Video Submission System

Bilibili's video submission platform uses a layered microservice architecture with a DAG-based scheduler and extensive observability. Its high-availability tactics include sharding, 64-bit ID migration, full-link stress tests, chaos engineering, and multi-active data-center deployment; tooling such as trace correlation and automated alerts supports stability and informs a future hybrid-cloud migration.

Bilibili · DAG · High Availability
35 min read

Tencent Cloud Developer
Apr 2, 2024 · Backend Development

tRPC Scaffolding Tooling and Observability Best Practices for Tencent Docs Backend

By introducing the unified tRPC scaffolding tool trpcx and embedding OpenTelemetry‑generated observability configurations, the Tencent Docs backend team streamlined service creation, standardized directory structures, migrated metrics and logs to ClickHouse for cost‑effective performance, and established best‑practice workflows that dramatically improve development speed and fault‑diagnosis efficiency.

ClickHouse · Metrics · Microservices
18 min read

Tencent Cloud Developer
Mar 19, 2024 · Operations

Chaos Engineering in WeChat Pay: Design, Implementation, and Results

WeChat Pay's team adopted Netflix-style chaos engineering and built an automated, YAML-driven fault-injection platform that isolates experiments in multi-zone partitions. The platform enabled more than 500 safe experiments in 2021-2022 and uncovered critical bugs across core services, while maintaining five-nines availability with zero production incidents.

Automation · Chaos Engineering · High Availability
18 min read

Didi Tech
Sep 12, 2023 · Operations

Observability: Concepts, Challenges, and Didi’s Implementation

The article defines observability as the ability to infer any system state from external data and contrasts it with traditional monitoring. It outlines the challenges of high-dimensional, high-cardinality data and storage costs, then describes Didi's hybrid MTL architecture, which separates low- and high-cardinality logs and metrics and links them via TraceIDs to provide detailed, cost-effective insight and streamlined debugging.

Didi · Metrics · Microservices
9 min read

Tencent Cloud Developer
Jun 28, 2021 · Cloud Native

Effective Service Governance for Serverless: Challenges and Solutions

Effective serverless governance requires comprehensive observability, traffic management, and service registration built on Kubernetes, using either a mesh sidecar with Istio or an embedded SDK, to simplify complex operational tasks such as discovery, fault tolerance, gray releases, and metric correlation for large‑scale function deployments.

Observability · Service Mesh · Cloud Native
17 min read