Discover | BestHub


Results

Matches for “observability”

678 results

Frontend Development | Feb 20, 2024 | Tencent Cloud Developer

From Frontend to Full‑Stack: Architecture, Challenges, and Practices of the QQ Frontend Unified Access Layer

A veteran front‑end engineer chronicles a decade of building QQ’s large‑scale products and details how the new Frontend Unified Access Layer replaced fragmented SDKs with a high‑performance, scalable, secure gateway built on an internal http2rpc framework. The article also covers legacy protocol coexistence, observability, alert fatigue, and targeted performance optimizations.

frontend, performance, architecture, observability, tRPC, full-stack

Operations | Feb 19, 2024 | Efficient Ops

Mastering Prometheus: Practical Tips for Effective Application Monitoring

This article explains how to design and implement Prometheus metrics for application monitoring, covering the selection of monitoring targets, golden metrics, label conventions, naming rules, histogram bucket choices, and Grafana visualization tricks to help engineers build reliable observability pipelines.

Monitoring, Operations, Observability, Metrics, Prometheus, Grafana

Operations | Feb 17, 2024 | DevOps Cloud Academy

Implementing Reusable GitHub Actions Workflows for Scalable CI at McDonald's

McDonald's engineering team built a fast, reliable, and flexible continuous integration system on reusable GitHub Actions workflows. The team centralized CI code, defined a golden‑path pipeline that still leaves room for developer autonomy, and added observability across microservices written in multiple languages, improving productivity and maintainability.

CI/CD, Microservices, Observability, GitHub Actions, Reusable Workflows

Databases | Feb 7, 2024 | DataFunTalk

Case Study: Replacing Legacy OLAP Database with RisingWave for Real-Time Monitoring at QianXiang Investment

QianXiang Investment replaced its legacy OLAP database with the streaming database RisingWave, achieving a threefold improvement in real‑time performance, a reduction of over 95% in compute resources, and better scalability, consistency, and observability for its high‑frequency trading alert system.

Real-time Monitoring, Database Migration, Alert System, RisingWave, Streaming Database

Big Data | Jan 29, 2024 | DataFunTalk

Case Study: Deploying RisingWave for Real-Time Stream Processing in a Large-Scale Quantitative Hedge Fund

A hedge fund with over $10 billion in AUM replaced ksqlDB and Flink with RisingWave, using its PostgreSQL‑compatible streaming SQL to achieve sub‑10 ms latency and lower learning and operational costs, while gaining rich connectors, advanced operators, and comprehensive observability for real‑time trade data processing.

SQL, Streaming, Data Integration, Low Latency, RisingWave, Quantitative Trading

Operations | Jan 9, 2024 | Tencent Cloud Developer

Tencent Cloud APM Full-Link Tracing Implementation and Best Practices

The article explains how Tencent Cloud APM implements full‑link tracing on OpenTelemetry standards. It addresses challenges such as protocol compatibility, massive trace storage, and bytecode overhead with solutions like conversion gateways, tail sampling, and thread profiling, and it presents best‑practice scenarios for topology analysis, front‑end/back‑end integration, and log‑trace correlation within the broader TCOP observability suite.

APM, observability, OpenTelemetry, thread profiling, performance analysis, cloud monitoring, full-link tracing, trace sampling

Operations | Jan 5, 2024 | Zhuanzhuan Tech

Building an Integrated Monitoring Platform: Architecture, Implementation, and Lessons from Zhuanzhuan

This article presents a detailed case study of how Zhuanzhuan designed and implemented a unified monitoring platform covering business services, middleware, and operations resources. The team selected Prometheus and M3DB, automated Grafana dashboards, and built a low‑noise alerting system, achieving large‑scale observability with significant cost and efficiency gains.

monitoring, architecture, observability, alerting, Prometheus, M3DB

Artificial Intelligence | Jan 4, 2024 | DataFunTalk

Using OpenLLM to Quickly Build and Deploy Large Language Model Applications

This presentation explains how OpenLLM, an open‑source LLM framework, together with BentoML, addresses the challenges of deploying large language models by offering model switching, memory optimizations, multi‑GPU support, observability, and easy containerized deployment for production AI applications.

Python, Large Language Models, AI optimization, LLM deployment, BentoML, OpenLLM

Big Data | Dec 28, 2023 | Zuoyebang Tech Team

How We Scaled Our Data Platform by Migrating to Apache DolphinScheduler

Facing growing task volumes and diverse workload types, the team upgraded its data development platform's scheduling engine to Apache DolphinScheduler. The article details the migration process, architectural enhancements, stability and observability improvements, multi‑tenant support, the resulting performance gains, and the future roadmap.

Migration, Big Data, Observability, Task Scheduling, Data Platform, Apache DolphinScheduler

Cloud Native | Dec 22, 2023 | Bilibili Tech

Safe Change Management in Bilibili's Cloud‑Native Container Platform Caster

The article describes Bilibili’s Caster platform, which uses standardized workflows, left‑shifted pre‑checks, tiered release checkpoints, and an emergency green channel to manage containerized application changes safely. It provides real‑time observability, automated rollback, and capacity‑aware scaling, which together reduce change‑induced incidents and improve production stability.

Cloud Native, ci/cd, change management, stability, container platform, release engineering
