Introduction to SkyWalking: Architecture, Components, and Performance Tuning for Cloud‑Native Microservices
This article explains the background of cloud‑native microservices, introduces the open‑source SkyWalking observability platform and its core components (Agent, OAP server, storage, UI), and demonstrates how to extend SkyWalking with custom plugins and tune its performance to minimize monitoring overhead.
Background: With the rapid growth of open‑source communities and cloud computing, cloud‑native microservices have become a core architecture for modern applications. Gartner defines microservices as narrowly scoped, tightly encapsulated, loosely coupled components that can be independently deployed and scaled.
Martin Fowler notes that there is no single standard definition for microservices, but generally the style divides a monolithic application into a set of small services, each running in its own process and communicating via lightweight mechanisms such as HTTP RESTful APIs.
Each service is built around a specific business capability and can be deployed independently, improving response speed, flexibility, and deployment elasticity, which enables rapid iteration but also introduces new challenges for application monitoring.
To address these challenges, this article introduces SkyWalking, an open‑source observability platform. It collects end‑to‑end trace data in a non‑intrusive way and visualizes the system's topology, call chains, and performance bottlenecks, covering gaps left by traditional testing tools.
SkyWalking Overview: SkyWalking is an open‑source APM system designed for microservices, cloud‑native, and container‑based architectures (Docker, Kubernetes, Mesos). It provides distributed tracing, service‑mesh telemetry analysis, metric aggregation, and integrated visualization.
Agent: The probe runs inside each service instance, gathers trace and metric data, reformats it according to SkyWalking’s specifications, and reports it to the OAP server via gRPC.
OAP Server (Observability Analysis Platform): It processes three main data types: Record data (traces, logs), Metrics data (aggregated indicators defined in OAL, the Observability Analysis Language), and TopN data (periodic samples such as slow SQL statements). Aggregation happens in two steps: a Receiver role handles initial parsing and an optional first‑level aggregation, and an Aggregator role performs the second‑level aggregation before persisting results to external storage.
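The two‑step aggregation can be pictured with a small sketch. This is not SkyWalking's code: the class names `TwoStageAggregation`, `Partial`, and `Sample`, the service names, and the latency values are all hypothetical, and the sketch assumes the metric being aggregated is an average latency per service. Each receiver reduces its own samples to a partial result, and the aggregator merges those partials.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy model of receiver-side and aggregator-side merging.
public class TwoStageAggregation {

    /** Partial result for one service: sum and count, so partials merge exactly. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }

        double average() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** Hypothetical raw sample: one observed latency for one service. */
    static final class Sample {
        final String service;
        final long latencyMs;

        Sample(String service, long latencyMs) {
            this.service = service;
            this.latencyMs = latencyMs;
        }
    }

    /** Step 1 (Receiver role): combine the samples seen by a single node. */
    static Map<String, Partial> receiverAggregate(List<Sample> samples) {
        Map<String, Partial> partials = new HashMap<>();
        for (Sample s : samples) {
            partials.merge(s.service, new Partial(s.latencyMs, 1), Partial::merge);
        }
        return partials;
    }

    /** Step 2 (Aggregator role): merge the partial results from every receiver. */
    static Map<String, Partial> aggregatorMerge(List<Map<String, Partial>> fromReceivers) {
        Map<String, Partial> merged = new HashMap<>();
        for (Map<String, Partial> partials : fromReceivers) {
            partials.forEach((service, p) -> merged.merge(service, p, Partial::merge));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Partial> nodeA = receiverAggregate(Arrays.asList(
                new Sample("order-service", 100), new Sample("order-service", 300)));
        Map<String, Partial> nodeB = receiverAggregate(Arrays.asList(
                new Sample("order-service", 200), new Sample("user-service", 50)));

        Map<String, Partial> total = aggregatorMerge(Arrays.asList(nodeA, nodeB));
        System.out.println("order-service avg = " + total.get("order-service").average());
    }
}
```

The partials carry a sum and a count rather than a finished average, because averaging the per‑node averages would give the wrong result when nodes see different numbers of requests.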
Storage: SkyWalking supports multiple external storage back‑ends, including Elasticsearch, MySQL, TiDB, InfluxDB, and H2. H2 is the default in‑memory store (data lost on restart), while production deployments typically use an Elasticsearch cluster.
UI: The front‑end is a separate application that sends GraphQL queries to the OAP back‑end, retrieves persisted data, and visualizes service dependencies, topology, and various performance metrics.
Application Extensions and Performance Tuning: A custom plugin example shows how to deploy a plugin to the SkyWalking Agent's plugins directory. Sample screenshots illustrate trace sampling for a query method and the associated span details. Performance tuning of the SkyWalking Agent, mainly adjusting how often traces are sampled and how many are sampled in each interval, reduces monitoring overhead; comparative test results show a lower impact after optimization.
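To make the sampling idea concrete, here is a minimal sketch of a capped sampler: it traces at most a fixed number of requests per time window and skips the rest. The class `WindowSampler` and its methods are hypothetical and are not the agent's implementation. The Java agent's `agent.sample_n_per_3_secs` setting expresses a similar "N traces per 3 seconds" cap, but the treatment of negative values and of zero below is an assumption of this sketch, as is leaving the window reset to an external timer.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: caps how many requests are traced in each time window.
public class WindowSampler {
    private final int maxPerWindow; // assumption: a negative value means "trace everything"
    private final AtomicInteger seenInWindow = new AtomicInteger();

    public WindowSampler(int maxPerWindow) {
        this.maxPerWindow = maxPerWindow;
    }

    /** Returns true if the current request should be traced. */
    public boolean trySample() {
        if (maxPerWindow < 0) {
            return true;
        }
        // incrementAndGet is atomic, so concurrent callers cannot exceed the cap.
        return seenInWindow.incrementAndGet() <= maxPerWindow;
    }

    /** Meant to be called by a timer at the start of every window, e.g. every 3 seconds. */
    public void resetWindow() {
        seenInWindow.set(0);
    }

    public static void main(String[] args) {
        WindowSampler sampler = new WindowSampler(2);

        int traced = 0;
        for (int i = 0; i < 5; i++) {
            if (sampler.trySample()) {
                traced++;
            }
        }
        System.out.println("traced in first window: " + traced);

        sampler.resetWindow();
        System.out.println("traced after reset: " + sampler.trySample());
    }
}
```

In a running service, `resetWindow` would typically be driven by a `ScheduledExecutorService`. Lowering the cap reduces the number of spans that are built and reported, which is where the overhead saving comes from, at the cost of seeing fewer traces.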
Author: Chen Mingkun, Agricultural Bank of China R&D Center.
FunTester