Results

Matches for “observability”

661 results

Big Data | Jun 11, 2025 | vivo Internet Technology

How Vivo Built a Scalable Pulsar Monitoring System for Trillion‑Message Workloads

This article details Vivo's end‑to‑end Pulsar observability solution, covering the challenges of Prometheus‑based monitoring, the architecture of the alerting pipeline, adaptor development, metric optimizations for subscription backlog and bundle load, and fixes for KoP (Kafka‑on‑Pulsar) lag reporting issues.

Monitoring, Big Data, Observability, Metrics, Prometheus, Pulsar

Cloud Native | Jun 10, 2025 | Big Data Technology Tribe

Mastering eBPF Maps: Design, Implementation, and Real‑World Use Cases

This article provides an in‑depth analysis of BPF maps—explaining their design principles, core features, various map types with code examples, and the macro expansion process that turns high‑level BCC helpers into native kernel map definitions for cloud‑native observability.

Cloud Native, Observability, eBPF, Linux kernel, BPF maps, BCC

Big Data | Jun 1, 2025 | DataFunSummit

Scaling WeChat’s Big Data and AI Workloads on Kubernetes: Challenges and Optimizations

This article details WeChat's migration of large‑scale big data and AI workloads to a cloud‑native Kubernetes platform, discussing performance bottlenecks, API server and etcd overload protection, scheduler enhancements, observability solutions, resource utilization gains, and future serverless directions.

Cloud Native, Performance Optimization, Big Data, AI, Observability, Kubernetes, Scheduling

Cloud Native | May 28, 2025 | FunTester

Extending Automated Thread Dumps: Log Collection, Resource Monitoring, Chaos Engineering, Performance Analysis, and Environment Cleanup

The article explores how automated thread dumps can be expanded into multiple testing scenarios—including log collection, resource monitoring, fault injection, performance result analysis, and environment cleanup—by leveraging Kubernetes APIs, Prometheus, Chaos Mesh, and scripting tools to improve efficiency, observability, and system resilience.

Automation, Kubernetes, Performance Testing, Chaos Engineering, Log Collection, Resource Monitoring

Artificial Intelligence | May 26, 2025 | Java Architecture Diary

How to Build Enterprise‑Ready AI Monitoring with Spring AI and Micrometer

This article explains why observability is essential for Spring AI applications, outlines common cost‑control and performance challenges, and provides a step‑by‑step guide—including Maven setup, client configuration, service implementation, metric exposure, Zipkin tracing, and architecture insights—to create a fully observable, enterprise‑grade AI translation service.

Java, Monitoring, Observability, Spring AI, Tracing, Micrometer

Operations | Apr 29, 2025 | Efficient Ops

Master Linux Performance: Essential Monitoring Tools & Commands

This guide compiles the most important Linux performance analysis utilities—such as vmstat, iostat, dstat, iotop, pidstat, top, htop, mpstat, netstat, ps, strace, uptime, lsof, and perf—explaining their usage, output fields, and how they fit into a comprehensive system observability workflow.

Observability, Performance Monitoring, Linux, System Administration, Command Line Tools

Databases | Apr 28, 2025 | DeWu Technology

GreptimeDB Distributed Architecture, Transparent Caching, and Flow‑Based Real‑Time Analytics

GreptimeDB solves front‑end observability challenges with a distributed architecture (frontend, datanode, flownode, metasrv), transparent two‑level caching, elastic scaling, and an SQL‑based flow engine for real‑time multi‑granularity aggregation and approximate counting, delivering millisecond query latency and cost‑effective storage.

SQL, Real-time Analytics, Distributed Architecture, Flow Engine, GreptimeDB, HyperLogLog, Transparent Caching

Backend Development | Apr 25, 2025 | JavaScript

How I Quickly Added Robust Logging to an Express App with AI‑Powered Trae IDE

Facing unexplained slowdowns in an Express site, I used the AI‑driven Trae IDE to automatically install Morgan, persist logs to rotating files, separate access and error logs, add buffering for performance, and even automate code commits, dramatically improving observability and debugging speed.

Node.js, Logging, Express, AI IDE, Trae, Morgan, Rotating-file-stream

Artificial Intelligence | Mar 24, 2025 | Airbnb Technology Team

Chronon: Open‑Source Feature Platform for Machine Learning – Architecture, Workflow, and Code Examples

Chronon is an open‑source ML feature platform that lets engineers declaratively define, compute, and serve both batch and real‑time features with built‑in observability, data‑quality checks, and a low‑latency retrieval API, ensuring online‑offline consistency while simplifying pipeline management and enabling future automation.

Machine Learning, Observability, Streaming, Open Source, Feature Engineering, Chronon, Data Pipelines

Operations | Mar 23, 2025 | Efficient Ops

Essential Linux Log Files Every SRE Should Monitor

This article outlines the most important Linux log files under /var/log, explains what each records—from system and kernel messages to authentication, web server, database, and firewall events—and shows practical commands for inspecting them, helping SREs improve fault detection and system observability.

monitoring, Operations, SRE, linux, troubleshooting, system logs