Tagged articles
Architect's Guide
Jul 5, 2022 · Backend Development

Architect’s Guide: Backend Architecture, Microservices, Service Mesh, and Message Queues

This comprehensive article reviews backend architectural concepts such as microservices design, service mesh, observability pillars, gateway patterns, service registration, configuration centers, and a detailed comparison of message‑queue technologies, providing practical guidance for architects and engineers.

Backend Architecture · Observability · Service Mesh
0 likes · 27 min read
dbaplus Community
Jul 4, 2022 · Operations

Why Most Monitoring Systems Fail: Lessons from a Veteran Ops Engineer

A seasoned operations professional shares personal experiences and hard‑earned insights on why traditional monitoring often becomes ineffective, how over‑automation and noisy dashboards hurt teams, and what a capability‑focused, user‑centric approach to observability should look like.

Observability · Operations · SRE
0 likes · 12 min read
AntTech
Jun 28, 2022 · Operations

AntMonitor: Evolution, Features, and Core Technologies of Ant Group’s Observability Platform

The article details Ant Group’s AntMonitor observability platform, covering its development timeline, holographic monitoring capabilities, integrated performance analysis, efficient data integration, built‑in AI‑driven analytics, Monitoring‑as‑a‑Service, and the underlying high‑performance time‑series database and cloud‑native architecture that support massive real‑time data processing.

CloudNative · Observability · TimeSeriesDatabase
0 likes · 17 min read
High Availability Architecture
Jun 24, 2022 · Backend Development

Improving Cache Invalidation and Consistency at Scale

Meta engineers describe the challenges of cache invalidation and consistency in large‑scale distributed systems, explain why stale caches are problematic, present their Polaris observability service and consistency‑tracking techniques, and detail how they raised TAO’s cache consistency from six‑nines to ten‑nines.

Cache Invalidation · Caching · Consistency
0 likes · 17 min read
HaoDF Tech Team
Jun 21, 2022 · Operations

Evolution and High‑Availability Construction of the Haodafu Offline Message Push System

This article describes how the Haodafu offline push service grew from a simple PHP notification tool into a robust, highly‑available micro‑service platform by redesigning architecture, adopting vendor push channels, adding message‑queue reliability, implementing comprehensive monitoring, observability, and a fault‑diagnosis platform to ensure delivery rates and operational stability.

Mobile Backend · Observability · SRE
0 likes · 21 min read
Programmer DD
Jun 21, 2022 · Operations

Discover Grafana 9.0: Visual Query Builders, Heatmap Panel & More

Grafana 9.0 introduces a suite of usability enhancements—including visual Prometheus and Loki query builders, an Explore‑to‑dashboard workflow, a high‑performance heatmap panel, command‑palette navigation, and improved alerting—making data exploration, visualization, and monitoring more intuitive for developers and operators.

Grafana · Loki · Observability
0 likes · 8 min read
Architecture Digest
Jun 20, 2022 · Backend Development

Architectural Guide: Microservices, Service Mesh, Messaging, and Observability

This article presents a comprehensive architectural roadmap covering microservice fundamentals, design principles, service discovery, API protocols, gateway patterns, observability pillars, service mesh options, and a detailed comparison of modern message‑queue technologies, offering practical guidance for backend system design and selection.

Backend Architecture · Cloud Native · Observability
0 likes · 28 min read
ITPUB
Jun 18, 2022 · Operations

How MDD and SRE Cut Mini‑Program Image‑Upload Failures from Days to Minutes

This article recounts a three‑day image‑upload outage in a mini‑program, analyzes the multi‑layer causes, and shows how combining Metrics‑Driven Development with SRE and a custom observability platform dramatically reduces diagnosis time and improves reliability.

Metrics-Driven Development · Mini Program · Observability
0 likes · 20 min read
Xingsheng Youxuan Technology Community
Jun 17, 2022 · Frontend Development

How Prism Transformed Front‑End Monitoring at Scale: Architecture, Challenges & Insights

This article details the design, challenges, and solutions behind Prism, a self‑built front‑end monitoring platform that collects multi‑device SDK data, processes it through Kafka, Flink and ClickHouse, visualizes metrics, integrates with A/B testing, and outlines future enhancements for broader enterprise adoption.

AB testing · Frontend · Observability
0 likes · 14 min read
Architecture Digest
Jun 17, 2022 · Cloud Native

Vivo Container Cluster Monitoring Architecture and Cloud‑Native Practices

This article describes Vivo's practical experience building a cloud‑native monitoring system for large‑scale container clusters, covering the shortcomings of traditional monitoring, the Prometheus‑centric ecosystem, high‑availability architecture, challenges faced, and future directions such as automation and AI‑driven operations.

Observability · Prometheus · VictoriaMetrics
0 likes · 13 min read
Meituan Technology Team
Jun 16, 2022 · Artificial Intelligence

Building a Quality Model for Meituan's Recommendation System

This article presents a request‑granularity quality model for Meituan's integrated recommendation system, linking data tables, algorithm models, services, and user requests, and details its metrics, defect taxonomy, calculation formulas, data‑lineage expansion, implementation, alert routing, and operational outcomes.

Data Lineage · Meituan · Observability
0 likes · 22 min read
vivo Internet Technology
Jun 15, 2022 · Cloud Native

Vivo Container Cluster Monitoring Architecture and Cloud‑Native Observability Practices

Vivo’s cloud‑native monitoring solution combines high‑availability Prometheus clusters, VictoriaMetrics storage, Grafana visualization, and a custom leader‑election adapter that deduplicates data while forwarding metrics to Kafka and OLAP systems. It addresses large‑scale performance, scalability, and integration challenges and paves the way for AIOps.

Cloud Native Monitoring · Kubernetes · Observability
0 likes · 18 min read
dbaplus Community
Jun 13, 2022 · Operations

How We Built a Mini‑Program Observability Platform to Slash Incident Resolution Time

After a three‑day, ten‑person investigation into a mini‑program image‑upload failure, we designed and implemented an end‑to‑end observability platform using MDD and SRE principles, defining SLI/SLO, instrumenting client, network, gateway and backend layers, and visualizing metrics with Grafana, ClickHouse and Prometheus.

Grafana · MDD · Metrics
0 likes · 18 min read
Top Architect
Jun 12, 2022 · Backend Development

Comprehensive Guide to Backend Architecture: Microservices, Observability, Service Mesh, and Messaging

This article provides an in‑depth overview of modern backend architecture, covering microservice fundamentals, design principles, gateway patterns, service registration, configuration management, observability pillars, service mesh options, and a detailed comparison of popular message‑queue technologies.

Messaging · Observability · architecture
0 likes · 29 min read
Shopee Tech Team
Jun 10, 2022 · Mobile Development

MDAP Stack Symbolization Service: Architecture, Implementation, and Optimization

The MDAP Stack Symbolization Service unifies high‑throughput address‑and symbol‑based stack resolution for iOS, Android native, Android Java, Web and React Native by parsing dSYM/ELF files and source‑map or ProGuard mappings, caching results in Redis (with RocksDB fallback), and exposing a gRPC API for fast, scalable de‑obfuscation.

DWARF · Debugging · Observability
0 likes · 49 min read
Top Architect
Jun 9, 2022 · Backend Development

Microservice Architecture and Design Patterns Overview

This article provides a comprehensive overview of microservice architecture, covering its core goals, design principles, various decomposition and integration patterns, database strategies, observability, resilience, deployment, and operational concerns, offering practical guidance for building scalable, maintainable services.

Deployment · Observability · architecture
0 likes · 18 min read
IT Architects Alliance
Jun 8, 2022 · Backend Development

Mastering Microservice Patterns: From Decomposition to Resilience

This article provides a comprehensive overview of common microservice patterns and design principles, covering goals such as cost reduction, faster releases, resilience, visibility, and detailing decomposition, integration, database, CQRS, observability, health‑check, and deployment strategies for building robust backend systems.

Blue‑Green deployment · CQRS · Design Patterns
0 likes · 20 min read
Alibaba Cloud Developer
Jun 8, 2022 · Fundamentals

eBPF Explained: Core Concepts, Use Cases, and Best Practices

eBPF is a kernel‑level sandbox technology that enables safe, high‑performance, programmable instrumentation for networking, security, and observability, and this article answers seven key questions covering its definition, applications, origins, usage steps, implementation details, best practices, and current ecosystem.

Kernel Instrumentation · Linux · Observability
0 likes · 21 min read
Architect
Jun 6, 2022 · Backend Development

Microservice Architecture and Design Patterns

This article provides a comprehensive overview of microservice architecture, detailing its core objectives, design principles, various decomposition and integration patterns, database strategies, consistency mechanisms, observability techniques, and deployment practices for building resilient, scalable backend systems.

Observability · architecture · microservices
0 likes · 18 min read
Efficient Ops
May 24, 2022 · Cloud Native

How AutoTagging and MultistageCodec Transform Cloud‑Native Observability

This article explores the challenges of building a unified observability data platform for hybrid‑cloud microservices, examines six common data‑island scenarios, and presents DeepFlow's AutoTagging and MultistageCodec techniques that dramatically reduce tagging overhead and storage costs while enabling seamless cross‑data correlation.

ClickHouse · Observability · auto-tagging
0 likes · 11 min read
Snowball Engineer Team
May 24, 2022 · Cloud Native

How Snowball Used Apache APISIX to Build a Dual‑Active Architecture and Streamline Authentication

This article details Snowball's transition from a single‑datacenter setup to a dual‑active, cloud‑native architecture using Apache APISIX, covering background challenges, problem analysis, gateway selection, architectural adjustments, authentication unification, observability enhancements, ZooKeeper integration, and future plans.

Apache APISIX · Authentication · Cloud Native
0 likes · 11 min read
Programmer DD
May 16, 2022 · Cloud Native

Master Loki: Scalable Log Aggregation for Kubernetes and Prometheus

This guide introduces Loki, the open‑source, horizontally scalable log aggregation system optimized for Prometheus and Kubernetes, covering its core concepts, architecture, components, deployment steps, Grafana integration, label‑based indexing, and best practices for handling dynamic and high‑cardinality tags.

Grafana · Kubernetes · Loki
0 likes · 19 min read
Alibaba Cloud Native
May 11, 2022 · Cloud Native

How Zuoyebang Cut 22% Costs with Kubernetes Serverless Virtual Nodes

Zuoyebang’s shift to cloud‑native architecture leveraged Alibaba Cloud’s Kubernetes Serverless virtual nodes, achieving a 22.5% cost reduction during peak traffic by dynamically scaling workloads, while addressing scheduling, observability, and performance challenges through custom schedulers, enhanced monitoring, and careful testing.

Cloud Native · Kubernetes · Observability
0 likes · 11 min read
Tencent Cloud Developer
May 7, 2022 · Cloud Native

Fourth Techo TVP Developer Conference: Cloud Native Trends and Best Practices

The Fourth Techo TVP Developer Conference highlighted current cloud‑native adoption, FinOps cost‑optimization, distributed‑cloud strategies, and maturity models on Day 1, then showcased practical best‑practice case studies—from automotive edge computing to service‑mesh migration, hybrid‑cloud PaaS evolution, observability standards, and high‑performance API‑gateway deployments—on Day 2.

APISIX · Cloud Native · DevOps
0 likes · 33 min read
Efficient Ops
Apr 27, 2022 · Operations

Why Choose Loki Over ELK? A Practical Guide to Scalable Log Aggregation

This article explains the motivations for selecting Grafana Loki instead of traditional ELK/EFK stacks, introduces Loki's core concepts and architecture, details component roles, provides step‑by‑step deployment of Promtail and Loki, and demonstrates how to configure and query logs in Grafana while addressing label indexing, dynamic tags, high‑cardinality challenges, and query performance.

Grafana · Kubernetes · Loki
0 likes · 18 min read
JD Retail Technology
Apr 27, 2022 · Industry Insights

How JD Achieves Seamless Stability During Massive Sales Events

The article reviews the Global Information System Stability Summit and JD's technical architect Li Junliang's detailed case study on the engineering practices, observability, chaos engineering, and resource‑scheduling innovations that enable JD’s e‑commerce platform to handle sales‑peak traffic that spikes hundreds of times over normal load.

Industry Insights · Observability · chaos engineering
0 likes · 7 min read
Top Architect
Apr 27, 2022 · Backend Development

Comprehensive Guide to Backend Architecture: Microservices, Service Mesh, Observability, and Messaging

This article provides an in‑depth overview of modern backend architecture, covering microservice fundamentals, service mesh concepts, observability pillars, messaging queue choices, and practical design considerations such as service registration, configuration centers, and security mechanisms.

Messaging · Observability · backend-architecture
0 likes · 28 min read
Volcano Engine Developer Services
Apr 26, 2022 · Operations

How Volcano Engine’s TLS Transforms Log Management for Kubernetes at Scale

This article explains the challenges of traditional open‑source log collection in cloud‑native environments, describes Volcano Engine’s unified TLS architecture, its centralized configuration, CRD‑based deployment, and showcases real‑world case studies that demonstrate improved availability, efficiency, and scalability.

Cloud Native · Distributed Systems · Kubernetes
0 likes · 15 min read
dbaplus Community
Apr 25, 2022 · Operations

From Monitoring to Observability: Expert Insights on Evolving Cloud‑Native Operations

In this interview series, three industry experts explain how monitoring differs from observability, the shifts required for ops, developers, and architects, the core methodologies and technologies behind metrics, traces, and logs, and practical guidance for selecting and integrating observability tools in cloud‑native environments.

Metrics · Observability · Operations
0 likes · 16 min read
MaGe Linux Operations
Apr 22, 2022 · Backend Development

Essential Microservice Patterns: Decomposition, Integration & Observability

This article outlines the key microservice design patterns—including decomposition, integration, event‑driven, saga, and observability techniques—while explaining their goals, principles, and practical considerations such as database per service, CQRS, and cross‑cutting concerns like health checks and circuit breakers.

Backend Architecture · Design Patterns · Observability
0 likes · 19 min read
Ops Development Stories
Apr 21, 2022 · Cloud Native

Essential Kubernetes Production Checklist for Web Services

A comprehensive, step‑by‑step checklist guides teams through documentation, application design, security, CI/CD, Kubernetes configuration, monitoring, testing, and 24/7 support to reliably run web services with HTTP APIs in production on Kubernetes.

DevOps · Kubernetes · Observability
0 likes · 9 min read
Zhengcaiyun Technology (政采云技术)
Apr 19, 2022 · Cloud Native

A Practical Guide to Dapr Core Features: Pub/Sub, Resource Bindings, Actors, Observability, Secrets, and Configuration

This comprehensive technical tutorial demonstrates how to implement and configure core Dapr features, including publish/subscribe messaging, resource bindings, virtual actors, distributed tracing, secrets management, and dynamic configuration, using Java applications deployed on Kubernetes with practical code examples and command-line instructions.

Cloud Native · Dapr · Java
0 likes · 21 min read
YunZhu Net Technology Team
Apr 15, 2022 · Operations

Design and Architecture of a Cloud‑Native Monitoring Platform for Business Systems

The document outlines the background, vision, current status, technical research, value, product and technical architecture, and functional design of a cloud‑native monitoring platform that integrates SkyWalking and Prometheus to provide comprehensive APM, resource utilization, alerting, and rapid fault localization for business and technical middle‑platform services.

APM · Metrics · Observability
0 likes · 10 min read
Alibaba Cloud Native
Apr 13, 2022 · Cloud Native

From Dapper to OpenTelemetry: A Practical Guide to Distributed Tracing and Observability

This article explains the challenges of long request chains in micro‑service architectures, reviews Google’s Dapper tracing requirements, introduces OpenTracing and OpenCensus standards, compares their strengths, and details how OpenTelemetry unifies tracing, metrics and logs with practical integration steps and best‑practice guidance.

Cloud Native · Metrics · Observability
0 likes · 24 min read
DevOps
Apr 12, 2022 · Operations

Understanding Observability: Core Concepts, SRE Methodology, AIOps, and Business Architecture

The article explains the rising importance of observability in modern operations, defines its control‑theory roots, breaks it down into metrics, traces and logs, and argues that successful implementation requires three pillars—SRE practices, AIOps algorithms, and deep business‑architecture knowledge—together with well‑designed SLOs and critical‑path mapping.

Logging · Observability · SRE
0 likes · 10 min read
Alibaba Cloud Native
Apr 3, 2022 · Cloud Native

How to Achieve Full Observability for Performance Testing with Prometheus

This guide explains the essential observability concepts—metrics, logs, and traces—for performance testing, compares Zabbix and Prometheus, shows how to extend JMeter with a Prometheus exporter, and details step‑by‑step integration of Alibaba Cloud PTS and Grafana dashboards for comprehensive monitoring.

Cloud Native · Observability · Prometheus
0 likes · 9 min read
SQB Blog
Apr 2, 2022 · Operations

Designing a Next‑Gen Observability Platform: From Zipkin to Hera

This article chronicles the evolution of a company's monitoring system from a Zipkin‑based tracing solution to a cloud‑native observability platform called Hera, detailing design goals, technology choices, challenges with MySQL storage, and the adoption of Prometheus‑compatible metrics, Jaeger tracing, and Kubernetes operators.

Observability · Prometheus · distributed tracing
0 likes · 22 min read
Aikesheng Open Source Community
Mar 29, 2022 · Databases

Performance Tuning and Observation Techniques for dble Using BenchmarkSQL

This article shares practical configuration recommendations, system‑resource monitoring methods, and thread‑adjustment strategies for optimizing dble performance during BenchmarkSQL TPC‑C style load testing, highlighting how observable metrics guide effective tuning of the middleware and underlying MySQL nodes.

BenchmarkSQL · Observability · thread optimization
0 likes · 10 min read
StarRocks
Mar 28, 2022 · Backend Development

Scaling Microservice Tracing with Zipkin and StarRocks: A Practical Guide

This article explains how Sohu Smart Media built a high‑performance tracing system for microservices by integrating Zipkin for data collection with StarRocks for storage and analytics, covering architecture, data models, SQL queries, Flink processing, and real‑world results that boost observability and engineering efficiency.

Flink · Observability · SQL
0 likes · 31 min read
Open Source Linux
Mar 18, 2022 · Operations

Evolution of Open‑Source Monitoring Tools: From Nagios to Prometheus

This article traces the development of open‑source monitoring solutions from early tools like Nagios and Cacti through modern platforms such as Prometheus and Nightingale, comparing their strengths, weaknesses, and typical use cases while also looking ahead to emerging observability trends in cloud‑native environments.

Nagios · Observability · Operations
0 likes · 14 min read
Open Source Linux
Mar 8, 2022 · Operations

Master Kubernetes Troubleshooting: The Three Pillars Every Engineer Needs

This article breaks down Kubernetes troubleshooting into three essential steps—understanding the failure, managing the response, and preventing recurrence—while mapping key monitoring, observability, and incident‑response tools to each phase for reliable cloud‑native operations.

Incident Management · Kubernetes · Observability
0 likes · 8 min read
Zhengcaiyun Technology (政采云技术)
Mar 1, 2022 · Cloud Native

Introduction to Dapr: Features, Architecture, and Installation Guide

This article introduces Dapr, a cloud‑native sidecar runtime for building resilient microservices, explains its core features such as service invocation, state management, pub/sub, bindings, actors, observability, and secrets, and provides step‑by‑step installation instructions for CLI, binaries, Kubernetes, and Helm.

Cloud Native · Dapr · Installation
0 likes · 10 min read
Alibaba Cloud Native
Feb 28, 2022 · Cloud Native

How to Observe and Diagnose DNS Failures in Kubernetes Clusters

This article explains how DNS operates inside Kubernetes, enumerates common failure causes, describes CoreDNS's built‑in observability plugins, introduces BPF‑based client‑side diagnostics, and provides a step‑by‑step troubleshooting workflow to identify and resolve DNS issues in cloud‑native environments.

BPF · CoreDNS · DNS
0 likes · 18 min read
21CTO
Feb 24, 2022 · Backend Development

42 Hard‑Earned Lessons for Building Reliable Production Databases

This article translates Mahesh Balakrishnan’s 42‑point guide on building production databases, covering customer focus, project management, design principles, code review practices, strategy, observability, and research, offering concrete advice for engineers and teams creating robust backend systems.

Code Review · Observability · Production Systems
0 likes · 12 min read
Laravel Tech Community
Feb 20, 2022 · Backend Development

Highlights of .NET 7 Preview 1: Nullable Annotations, Observability, Code Generation, and New APIs

The article outlines the major features of .NET 7 Preview 1, including nullable annotations for Microsoft.Extensions libraries, enhancements to tracing APIs, code‑generation improvements, dynamic PGO and Arm64 support, p/invoke source generation, new System.Text.Json APIs, and expanded hot‑reload capabilities.

Nullable Annotations · Observability · code generation
0 likes · 5 min read
Ctrip Technology
Feb 17, 2022 · Operations

Evolution and Architecture of the Hickwall Enterprise Monitoring Platform

The article details the background, challenges, multi‑year evolution, current architecture, and future roadmap of Hickwall, Ctrip's enterprise‑grade monitoring and observability platform, covering metrics, logs, traces, high‑cardinality handling, cloud‑native integration, alert governance, and storage engine migrations.

Observability · Operations · TSDB
0 likes · 15 min read
Baidu Tech Salon
Jan 27, 2022 · Cloud Native

How China Unicom’s Service Mesh Evolved: From SDKs to Sidecars and Beyond

This article details China Unicom Software Research Institute's multi‑year journey of adopting Kubernetes‑based service mesh, outlining the evolution from SDK‑driven microservices to sidecar‑based architectures, migration strategies with Baidu, performance optimizations, observability enhancements, and future product roadmaps.

Cloud Native · Istio · Kubernetes
0 likes · 13 min read
ITFLY8 Architecture Home
Jan 26, 2022 · Operations

Mastering Microservice Monitoring, Fault Tolerance, and Security: A Complete Guide

This article explains how to monitor microservice architectures through logs, traces, and metrics, and compares open‑source tracing tools. It then outlines fault‑tolerance strategies (timeouts, rate limiting, degradation, asynchronous buffering, and circuit breaking), details access‑security mechanisms including gateway authentication, service‑side authentication, and OAuth 2.0 token flows, and introduces container technology and its role in microservice deployment.

Containers · Observability · fault-tolerance
0 likes · 43 min read
MaGe Linux Operations
Jan 22, 2022 · Cloud Native

Boost Kubernetes Monitoring: Migrate from Prometheus to Thanos for Scalable Low‑Cost Metrics

This article examines the limitations of a standard Prometheus‑based monitoring stack on Kubernetes, explains how adopting Thanos improves metric retention and reduces infrastructure costs, and provides a detailed multi‑cluster deployment guide with Terraform, TLS configuration, and Grafana visualization.

Kubernetes · Observability · Prometheus
0 likes · 16 min read
Efficient Ops
Jan 20, 2022 · Operations

Mastering Prometheus Metrics: Best Practices for Effective Monitoring

This article outlines practical guidelines for designing Prometheus metrics, covering how to define monitoring targets, choose appropriate vectors and labels, name metrics and labels correctly, select histogram buckets, and leverage Grafana features to visualize and troubleshoot data effectively.

Grafana · Metrics · Observability
0 likes · 11 min read
Baidu Geek Talk
Jan 12, 2022 · Backend Development

Serverless Architecture Evolution: Baidu Search Content Platform's FaaS and Intelligent Transformation

Baidu’s search content platform transitioned to a serverless, FaaS‑based architecture with intelligent scheduling and automated control, cutting resource waste by 87%, boosting automatic recovery to 96.7%, and delivering roughly tenfold productivity gains across development, deployment, and maintenance while simplifying scalability and high‑availability concerns.

FaaS · Intelligent Scheduling · Observability
0 likes · 27 min read
Java High-Performance Architecture
Jan 12, 2022 · Cloud Native

Mastering Service Mesh with Istio: A Hands‑On Guide to Traffic, Security, and Observability

This tutorial explains the fundamentals of service mesh, explores Istio’s architecture and core components, and provides step‑by‑step instructions for installing Istio on Kubernetes, deploying a sample microservice application, and leveraging traffic management, mutual TLS, observability, and advanced use cases such as routing, circuit breaking, and JWT‑based access control.

Istio · Kubernetes · Observability
0 likes · 22 min read
IT Architects Alliance
Jan 7, 2022 · Cloud Native

Introduction to Service Mesh and Istio: Concepts, Architecture, and Practical Deployment

This tutorial explains the fundamentals of service mesh, outlines Istio’s architecture and core components, demonstrates how to install and configure Istio on Kubernetes, showcases common use cases such as traffic management, security, and observability, and surveys alternatives, providing a comprehensive guide for modern micro‑service deployments.

Istio · Observability · Service Mesh
0 likes · 18 min read
Architect
Jan 5, 2022 · Cloud Native

Introduction to Service Mesh and Istio: Concepts, Architecture, and Hands‑On Guide

This tutorial explains the fundamentals of service mesh, outlines Istio’s architecture and core components, demonstrates how to install Istio on Kubernetes, and walks through practical examples such as traffic routing, security policies, observability, and common use‑cases, while also comparing alternative solutions.

Istio · Kubernetes · Observability
0 likes · 20 min read
Tencent Cloud Developer
Dec 23, 2021 · Cloud Native

An Overview of OpenTelemetry: Origins, Architecture, and Instrumentation

OpenTelemetry unifies tracing, metrics, and logs by merging OpenTracing and OpenCensus into a cross‑language specification, collector, language SDKs, and instrumentation libraries, offering vendor‑agnostic, low‑maintenance telemetry collection that separates data gathering from business logic while requiring external back‑ends for storage and analysis.

Cloud Native · Collector · Instrumentation
0 likes · 10 min read
Qingyun Technology Community
Dec 22, 2021 · Cloud Native

What’s New in KubeSphere 3.2.1? Key Features, Fixes, and Upgrade Guide

Version 3.2.1 of the open‑source KubeSphere platform introduces a series of enhancements—including container group status filtering, improved image builder dialogs, expanded quota visibility, numerous UI bug fixes, and updated DevOps pipelines—alongside detailed installation and upgrade instructions for Linux and Kubernetes environments.

Cloud Native · KubeSphere · Kubernetes
0 likes · 8 min read
21CTO
Dec 20, 2021 · Cloud Native

Why Cloud‑Native Architecture Is the Future of SaaS and How to Implement It

This article explains what cloud‑native architecture is, why it is essential for modern SaaS businesses, and provides a step‑by‑step guide—including serverless migration, elasticity, observability, resilience, and automation—on how to adopt it using Alibaba Cloud SAE and related services.

Observability · SaaS · cloud-native
0 likes · 22 min read
Java Architecture Diary
Dec 13, 2021 · Backend Development

Essential Java & Cloud Native Resources: From JDK 17 to GraalVM, Spring & More

This curated collection gathers essential articles and tutorials covering Java 8‑17 updates, GraalVM performance tricks, Spring Native adoption, Spring Cloud and RSocket alternatives, GraphQL frameworks, observability stacks like Grafana, Prometheus and Loki, IDE enhancements, database fundamentals, and low‑code platform building, providing a comprehensive knowledge base for modern backend developers.

Java · Low‑code · Observability
0 likes · 4 min read
Tencent Cloud Middleware
Dec 9, 2021 · Cloud Native

Why Observability Is the Missing Piece for Day‑2 Success in Cloud‑Native and Serverless Systems

The article explains how observability—through logs, metrics, and traces—transforms the opaque, complex day‑2 operations of micro‑service, Kubernetes, and serverless environments into a deterministic, diagnosable system, highlighting OpenTelemetry, practical collection methods, and real‑world implementation challenges and benefits.

Observability · OpenTelemetry · Serverless
0 likes · 17 min read
Alibaba Cloud Native
Dec 7, 2021 · Cloud Native

Unlocking the Third Way of Distributed Tracing: Post‑Aggregation Link Analysis Explained

This article introduces the third, post‑aggregation approach to link tracing—link analysis—showing how real‑time aggregation of stored trace data can quickly pinpoint uneven traffic, single‑machine failures, slow interfaces, business‑level traffic shifts, and gray‑release anomalies while outlining its practical constraints.

APM · Cloud Native · Link Analysis
0 likes · 11 min read
Laravel Tech Community
Dec 2, 2021 · Cloud Native

New Features in Apache APISIX 2.11.0: LDAP Authentication, Observability Plugins, Azure Functions, and WASM Support

Apache APISIX 2.11.0 adds an LDAP‑based authentication plugin, expands observability with Datadog and SkyWalking plugins, introduces Azure Functions integration, provides early WASM support, and enhances existing plugins, all illustrated with detailed configuration examples and code snippets.

Azure Functions · LDAP · Observability
0 likes · 8 min read
GrowingIO Tech Team
Dec 2, 2021 · Cloud Native

Mastering Chaos Mesh: A Hands‑On Guide to Cloud‑Native Chaos Engineering

Chaos Mesh is an open‑source cloud‑native chaos engineering platform that lets you experiment with fault injection across Kubernetes environments, offering visual dashboards, extensive fault types, and step‑by‑step installation and experiment creation guides to help teams uncover system weaknesses and improve resilience.

Chaos Mesh · Fault Injection · Kubernetes
0 likes · 12 min read
Baidu Geek Talk
Nov 24, 2021 · Operations

How Baidu’s Fengjing Uses Holographic Logs to Debug Massive Microservices

Baidu’s Fengjing monitoring platform tackles the daunting challenge of pinpointing failures in its massive Java‑based microservice ecosystem by employing a non‑intrusive probe that captures log metadata, stores it in a database, and reconstructs full request‑level logs with minimal storage overhead.

Java · Observability · distributed tracing
0 likes · 9 min read
Efficient Ops
Nov 16, 2021 · Operations

How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes

This article explains why monitoring is essential for production stability, compares white‑box and black‑box approaches, and provides a step‑by‑step guide to deploying Prometheus, configuring scrape targets, using Pushgateway and Alertmanager, and scaling the solution with Thanos in a Kubernetes environment.

Alertmanager · Observability · Prometheus
0 likes · 21 min read
Code Ape Tech Column
Nov 15, 2021 · Operations

A Comprehensive Guide to Using Apache SkyWalking for Distributed Tracing, Logging, and Performance Analysis

This article introduces Apache SkyWalking as a powerful open‑source APM solution, compares it with Spring Cloud Sleuth + Zipkin, explains its architecture, walks through server and client setup, data persistence, log collection, performance profiling, alert configuration, and provides practical code snippets and configuration examples.

Java · Observability · distributed tracing
0 likes · 14 min read
Open Source Linux
Oct 31, 2021 · Operations

Designing Effective Metrics: From Requirements to Labels and Buckets

This guide explains how to define, name, and organize monitoring metrics—covering Google’s four golden signals, system‑specific measurement objects, vector selection, label conventions, bucket design, and practical Grafana tips—for reliable observability of diverse services.

Metrics · Observability · labeling
0 likes · 10 min read
Top Architect
Oct 17, 2021 · Cloud Native

How Redis Simplifies Microservice Design Patterns, Distributed Transactions, and Observability

This article explains how Redis can be used to implement and simplify a wide range of microservice design patterns—including bounded contexts, asynchronous messaging, orchestrated sagas, transaction inboxes, telemetry, event sourcing, CQRS, and shared data—while improving performance, scalability, and observability in cloud‑native architectures.

CQRS · Cloud Native · Observability
0 likes · 16 min read
Alibaba Cloud Native
Oct 10, 2021 · Cloud Native

How to Detect Service and Workload Anomalies in Kubernetes with Advanced Monitoring

This article explains the common pain points of locating anomalies in Kubernetes environments and presents a multi‑layer monitoring framework—trace, metrics, events, and alerts—along with best‑practice scenarios such as network performance, DNS issues, full‑link stress testing, external MySQL access, and multi‑tenant architectures.

DNS · Kubernetes · Metrics
0 likes · 20 min read
21CTO
Sep 27, 2021 · Cloud Native

Why Loki Beats ELK for Kubernetes Logging: Architecture, Deployment, and Query Guide

This article explains the motivation behind choosing Loki over heavyweight ELK/EFK stacks for container‑cloud logging, outlines Loki's lightweight architecture and components, provides step‑by‑step deployment instructions on OpenShift/Kubernetes, and demonstrates how to query logs using the LogQL language and HTTP API.

Cloud Native · Kubernetes · LogQL
0 likes · 17 min read
21CTO
Sep 26, 2021 · Backend Development

How Baidu’s Hulk Framework Accelerates Go Service Development

The Hulk framework, built on GDP2, provides a business‑oriented Go web development platform with out‑of‑the‑box components, standardized architecture, rich observability, and tooling that together improve code quality, development speed, and SRE efficiency for large‑scale short‑video services.

Framework · Go · Observability
0 likes · 18 min read
Top Architect
Sep 24, 2021 · Cloud Native

Loki Log System Overview, Architecture, and Deployment Guide

This article introduces Loki, a lightweight log aggregation system for Kubernetes, explains its background and motivations, details its simple architecture and core components (Distributor, Ingester, Querier), discusses scalability and storage options, and provides step‑by‑step deployment instructions with example YAML and shell commands.

Cloud Native · Deployment · Kubernetes
0 likes · 16 min read
IT Architects Alliance
Sep 20, 2021 · Operations

Why Loki Beats ELK for Kubernetes Logging: Architecture and Deployment Guide

This article explains the motivations behind choosing Loki over ELK for container‑cloud logging, details Loki's lightweight architecture—including Distributor, Ingester, and Querier components—covers deployment steps on OpenShift/Kubernetes with YAML manifests, and demonstrates LogQL query syntax for efficient log retrieval.

Kubernetes · LogQL · Logging
0 likes · 18 min read
Alibaba Cloud Native
Sep 16, 2021 · Cloud Native

How to Use Kubernetes Monitoring for End-to-End Application Architecture Exploration

This session explains why Kubernetes monitoring is essential for end-to-end observability, describes the five data sources and layers it covers, and walks through discovering and locating architecture, performance, resource, scheduling, and network issues using topology, anomaly detection, and correlation techniques.

Cloud Native · Kubernetes · Observability
0 likes · 13 min read
IT Architects Alliance
Sep 15, 2021 · Backend Development

Comprehensive Guide to Backend Architecture: Microservices, Service Mesh, Messaging, and Observability

This article provides a detailed overview of modern backend architecture, covering microservice fundamentals, design principles such as Conway's Law and DDD, gateway patterns, communication protocols, service registration, configuration management, observability pillars, service mesh options, and a comparative analysis of popular message‑queue technologies.

Observability · backend-architecture · cloud-native
0 likes · 27 min read
HomeTech
Sep 15, 2021 · Backend Development

How ASF Simplifies gRPC‑to‑Go Migration and Boosts Service Governance

This article explains the AutoHome Service Framework (ASF), its architecture, how it enables seamless migration from gRPC to Go services, the added Dubbo‑go support, configuration optimizations, advanced load‑balancing strategies, observability enhancements, and future plans for adaptive balancing and zero‑downtime deployments.

Go · Observability · configuration
0 likes · 18 min read
Dada Group Technology
Sep 10, 2021 · Operations

Design and Implementation of JD Daojia Log System Based on Loki

This document details the motivation, architecture, components, query language, and deployment of a Loki‑based log collection and analysis platform for JD Daojia, comparing it with ELK, describing ingestion, real‑time and historical log handling, technical challenges, configuration examples, and future scaling plans.

Grafana · Log Management · Loki
0 likes · 15 min read
Baidu Intelligent Testing
Sep 9, 2021 · Cloud Native

Observability Practices in Baidu Search Platform: Real‑time Metrics, Tracing, Logging, and Topology at Hundred‑Billion Scale

This article explains how Baidu's search middle‑platform adopts cloud‑native observability—covering metrics, distributed tracing, log querying, and topology analysis—to ensure high availability, performance, and controllability for a system handling hundreds of billions of requests across millions of micro‑service instances.

Logging · Observability · Tracing
0 likes · 12 min read
Efficient Ops
Sep 5, 2021 · Operations

Why Prometheus’s TSDB Makes Monitoring Scalable: A Deep Dive

This article explains how Prometheus’s time‑series database handles massive monitoring data, illustrates practical query examples, and shows why its storage engine and pre‑computation features enable efficient, high‑performance observability for large‑scale services.

Observability · Prometheus · TSDB
0 likes · 8 min read
DevOps
Aug 31, 2021 · Backend Development

Designing an Uber‑Like Microservice System with DDD, OpenTelemetry Observability, and Reinforced Chaos Engineering

This article describes how to model a complex Uber‑style ride‑hailing system using Domain‑Driven Design, implement it with Java Spring Boot microservices, instrument it with OpenTelemetry for full observability, and validate the observability pipeline through a gamified chaos‑engineering approach that reduces MTTR.

DDD · Java · Observability
0 likes · 13 min read