Tag

system performance


OPPO Kernel Craftsman
Aug 9, 2024 · Fundamentals

Linux Kernel Memory Management Locks and Optimization Case Studies

The article examines the memory-management locks in Linux kernel 6.9 (PG_locked, lru_lock, mmap_lock, the anon_vma rwsem, the mapping's i_mmap_rwsem, and shrinker_rwsem), explains the role of each, and presents eight community-driven optimizations, such as per-memcg lru_lock, per-VMA locks, speculative page faults, and lockless shrinker techniques, that improve concurrency and performance.

LRU Lock · Linux Kernel · Memory Management
24 min read
Efficient Ops
Mar 27, 2024 · Operations

Master System Monitoring with the USE Method and Prometheus

This article explains how to design a comprehensive monitoring system using the concise USE (Utilization, Saturation, Errors) method, outlines essential system and application metrics, and demonstrates practical implementation with Prometheus, Grafana, and related open‑source tools.

Prometheus · USE method · monitoring
14 min read
IT Services Circle
Aug 3, 2023 · Information Security

Linus Torvalds Criticizes AMD fTPM for System Hangs and Calls for Its Disablement

Linus Torvalds, after previously praising AMD, now denounces the AMD fTPM implementation for causing intermittent system hangs on Windows and Linux, explaining the underlying memory‑transaction issue, AMD’s delayed fixes, and his recommendation to disable fTPM in favor of the CPU’s rdrand instruction.

AMD · Linux · fTPM
7 min read
Xiaohongshu Tech REDtech
May 15, 2023 · Artificial Intelligence

GPU-Accelerated Inference Optimization for Large-Scale Machine Learning at Xiaohongshu

Xiaohongshu transformed its recommendation, advertising, and search inference pipeline by migrating to GPU‑centric hardware, deploying a custom TensorFlow‑Core Lambda service, and applying system‑level, virtualization, and compute‑level optimizations—including NUMA binding, kernel fusion, dynamic scaling, and FP16 quantization—achieving roughly 30× compute capacity growth, over 10% user‑metric gains, and more than 50% cluster‑resource savings.

GPU optimization · Large Models · Machine Learning Inference
20 min read
Refining Core Development Skills
Jan 10, 2023 · Fundamentals

Understanding Linux Load Average: Principles and Calculations

This article explains Linux load average, covering how it is calculated, its relationship with CPU usage, and how the kernel exposes load data to applications.

CPU · I/O · Linux
15 min read
Efficient Ops
Aug 17, 2022 · Operations

Master System Monitoring with the USE Method and Prometheus

This article explains how to build a comprehensive monitoring system using the concise USE (Utilization, Saturation, Errors) method, outlines key system and application metrics, and demonstrates practical implementation with Prometheus, Grafana, full‑link tracing, and ELK for observability and performance troubleshooting.

Log Analysis · Observability · Prometheus
13 min read
Architecture Digest
May 18, 2022 · R&D Management

From System Performance Optimization to R&D Process Improvement: Measuring and Optimizing Workflow

The article explains how quantifying and measuring both technical systems and organizational processes can reveal inefficiencies, using a concrete image‑processing service example to illustrate how workflow analysis, metric collection, and architectural redesign lead to resource savings and how the same principles apply to DevOps and R&D management.

DevOps · R&D management · Value Stream
12 min read
Efficient Ops
Apr 19, 2022 · Operations

Master Linux ‘top’: Real‑Time Process Monitoring and Performance Tuning

This guide explains how to use the Linux top command to monitor real‑time process activity, interpret CPU, memory and swap statistics, customize displayed columns, apply command‑line options, and understand key metrics such as load average and steal time for effective system performance management.

CPU · Linux · Process Monitoring
10 min read
OPPO Kernel Craftsman
Nov 12, 2021 · Operations

Linux CPU Time Guardians: Understanding cputime, cpufreq_stats, cpufreq_times, and cpuidle_time

The article explains four Linux kernel CPU-time accounting modules (cputime, cpufreq_stats, cpufreq_times, and cpuidle_time), detailing how they record processor usage, frequency transitions, per-process frequency data, and idle-state durations respectively, and why system engineers rely on them for performance analysis and power optimization.

CPU Time · Linux Kernel · Operating System
7 min read
JD Retail Technology
Dec 4, 2020 · Operations

JD.com 11.11 Shopping Festival: Technical Operations, Performance, and Cost Optimization Case Study

The article reviews JD.com’s 11.11 shopping festival technical preparation, detailing how the operations team handled record traffic spikes, improved user experience, increased resource efficiency, and reduced IT costs through high‑fidelity testing, dual‑active systems, and a four‑year Taishan project.

Cost Optimization · e-commerce · operations
8 min read
Refining Core Development Skills
Nov 8, 2020 · Backend Development

Quantifying TCP Connection Latency: Analysis, Abnormal Cases, and Optimization Strategies

This article provides a detailed quantitative analysis of TCP connection establishment latency, examining normal handshake processes, abnormal scenarios like queue overflows and TIME_WAIT exhaustion, and offering practical optimization strategies for backend systems.

Connection Queues · Linux Kernel · TCP Protocol
16 min read
OPPO Kernel Craftsman
Jan 3, 2020 · Operations

Understanding Linux PSI: Pressure Stall Information for System Resource Monitoring

Pressure Stall Information (PSI) is a Linux kernel feature that measures real‑time CPU, memory, and I/O pressure by tracking task wait times, offering finer granularity than load average or vmpressure, and enabling more accurate scheduling, cgroup management, and out‑of‑memory handling.
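PSI publishes its measurements as plain-text files such as /proc/pressure/cpu, /proc/pressure/memory, and /proc/pressure/io on kernels built with PSI support (mainline 4.20 and later). Each file carries a `some` line and usually a `full` line, with avg10/avg60/avg300 percentages of stalled time and a cumulative `total` in microseconds. The minimal sketch below parses that format; the function name `parse_psi` is hypothetical and the sample values are made up, not real measurements.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text (e.g. the contents of /proc/pressure/memory).

    Returns {"some": {...}, "full": {...}} where avg10/avg60/avg300 are the
    percentage of wall time tasks were stalled over 10 s / 60 s / 300 s, and
    "total" is the cumulative stall time in microseconds.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()          # kind is "some" or "full"
        values: dict[str, float] = {}
        for field in fields:
            key, value = field.split("=", 1)
            # "total" is an integer counter; the averages are percentages.
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


# Illustrative sample in the PSI text format (numbers are made up).
SAMPLE = """\
some avg10=1.25 avg60=0.80 avg300=0.30 total=123456
full avg10=0.50 avg60=0.20 avg300=0.05 total=45678
"""

if __name__ == "__main__":
    psi = parse_psi(SAMPLE)
    print(psi["some"]["avg10"])  # 1.25
    # On a live system with PSI enabled, one could instead do:
    #   with open("/proc/pressure/memory") as f:
    #       psi = parse_psi(f.read())
```

The `some` line reports time in which at least one task was stalled on the resource, while `full` reports time in which all non-idle tasks were stalled at once, which is why `full` is the stronger signal of lost throughput.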

CPU scheduling · Linux Kernel · Memory Management
14 min read
Efficient Ops
Oct 20, 2019 · Operations

Why Low CPU Usage Coexists with High Load? Linux Load & Scheduling Explained

This article explains why a Linux system can show a high load average while CPU utilization remains low, covering the concepts of load, multi‑tasking operating systems, process states, scheduling, and common scenarios that cause I/O‑bound load spikes.
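On Linux, both runnable (R) and uninterruptible-sleep (D) tasks count toward load, which is how tasks blocked on disk I/O can drive load up while the CPUs sit idle. One rough way to see this is to tally task states from /proc/[pid]/stat, where the one-letter state follows the parenthesized command name. The sketch below uses hypothetical helpers (`stat_state`, `count_load_states`) on made-up stat lines; it splits after the last `)` because a command name may itself contain spaces or parentheses. A single snapshot only approximates what the kernel samples, and a thread-level count would also need to walk /proc/[pid]/task/*.

```python
from collections import Counter


def stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/[pid]/stat line.

    The command name sits in parentheses and may contain spaces or ')',
    so split after the *last* ')' rather than on whitespace.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_load_states(stat_lines: list[str]) -> dict[str, int]:
    """Count tasks that would contribute to load: R (runnable) and D (uninterruptible)."""
    states = Counter(stat_state(line) for line in stat_lines)
    return {"R": states["R"], "D": states["D"], "contributing": states["R"] + states["D"]}


# Made-up stat lines, truncated after a few fields for brevity.
SAMPLE = [
    "101 (mysqld) D 1 101 101 0",
    "102 (kworker/0:1) D 2 0 0 0",
    "103 (my app (v2)) R 1 103 103 0",
    "104 (bash) S 1 104 104 0",
]

if __name__ == "__main__":
    print(count_load_states(SAMPLE))  # {'R': 1, 'D': 2, 'contributing': 3}
```

Here only one task wants a CPU, yet three tasks contribute to load, so a load near 3 could coexist with a mostly idle CPU.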

IO wait · Linux · load average
13 min read
Efficient Ops
Jul 11, 2019 · Fundamentals

Why Is My CPU Idle Yet Load Is High? Uncover Linux Load & I/O Bottlenecks

High system load can occur even when CPU usage is low, typically due to many processes waiting for disk I/O; this article explains load concepts, process states, scheduling, and common scenarios such as excessive I/O requests, unindexed MySQL queries, and faulty external storage that cause such bottlenecks.

Linux · io bottleneck · load average
12 min read
Efficient Ops
Feb 28, 2019 · Operations

What Is Linux loadavg? Understanding Run Queues and Kernel Calculation

This article explains the Linux load average metric, the run queue structure, why both running (R) and uninterruptible (D) processes are counted, and how the kernel uses an exponential weighted moving average to compute the 1‑, 5‑, and 15‑minute load values.
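The exponentially weighted update can be reproduced in a few lines. The sketch below is a Python transcription that assumes the fixed-point scheme of recent kernels' `calc_load()` helper (FSHIFT = 11, EXP_1 = 1884, EXP_5 = 2014, EXP_15 = 2037, applied about every 5 seconds to the count of R plus D tasks). It is for intuition only, not the kernel code itself, and `simulate` is a hypothetical driver.

```python
FSHIFT = 11                 # bits of fractional precision
FIXED_1 = 1 << FSHIFT       # 1.0 in fixed point (2048)
EXP_1 = 1884                # 2048 / exp(5 s / 1 min)
EXP_5 = 2014                # 2048 / exp(5 s / 5 min)
EXP_15 = 2037               # 2048 / exp(5 s / 15 min)


def calc_load(load: int, exp: int, active: int) -> int:
    """One EWMA step in fixed point: load*e + active*(1 - e), rounding up while rising."""
    newload = load * exp + active * (FIXED_1 - exp)
    if active >= load:
        newload += FIXED_1 - 1
    return newload // FIXED_1


def simulate(active_tasks: int, samples: int) -> tuple[float, ...]:
    """Feed a constant number of R+D tasks for `samples` 5-second ticks, starting from 0."""
    active = active_tasks * FIXED_1
    avg = [0, 0, 0]
    for _ in range(samples):
        avg = [calc_load(a, e, active) for a, e in zip(avg, (EXP_1, EXP_5, EXP_15))]
    return tuple(a / FIXED_1 for a in avg)


if __name__ == "__main__":
    # One task stuck in R or D for one minute (12 ticks of 5 s).
    print(simulate(1, 12))
```

With one task kept busy for a minute, the 1-minute value climbs to roughly 0.63 (about 1 - 1/e), the 5-minute value to roughly 0.18, and the 15-minute value to only about 0.06, which is why the three numbers lag a sudden change by different amounts.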

Linux · kernel · load average
8 min read
Efficient Ops
Jul 3, 2016 · Operations

Memory Myths, Subnet Mask Mistakes, and Telnet Tricks: Ops Lessons

This article shares real‑world ops stories about a disputed memory upgrade, explains how Linux calculates usable memory, clarifies common subnet‑mask misunderstandings, and demonstrates why Telnet cannot test UDP ports, highlighting practical troubleshooting lessons for system administrators.

linux memory · network troubleshooting · operations
12 min read
Qunar Tech Salon
Jun 29, 2015 · Fundamentals

Understanding System Load Average and Its Interpretation

The article explains how Unix-like systems calculate load average using an exponentially damped weighted moving average, how the values reflect CPU and I/O contention on single- and multi‑CPU machines, and why different kernel implementations may count processes and threads differently, affecting performance monitoring.
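Because the load average counts tasks rather than reporting a percentage, the same value reads differently on machines with different CPU counts: a CPU-bound load of 2.0 roughly saturates a single CPU but leaves a four-CPU machine about half idle. A common rule of thumb is to divide by the CPU count. The minimal sketch below, with a hypothetical `per_cpu_load` helper and made-up sample values, parses the first three fields of /proc/loadavg (the 1-, 5-, and 15-minute averages) and normalizes them. On Linux the count also includes D-state tasks, so a ratio above 1.0 may indicate I/O contention rather than CPU saturation.

```python
import os
from typing import Optional


def per_cpu_load(loadavg_text: str, ncpus: Optional[int] = None) -> tuple[float, float, float]:
    """Normalize the 1-, 5-, and 15-minute load averages by CPU count.

    /proc/loadavg looks like "0.20 0.18 0.12 1/80 11206": three averages,
    runnable/total scheduling entities, then the most recently created PID.
    If ncpus is not given, fall back to os.cpu_count().
    """
    n = ncpus or os.cpu_count() or 1
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return (one / n, five / n, fifteen / n)


if __name__ == "__main__":
    sample = "4.00 2.00 1.00 3/512 12345"   # made-up values
    print(per_cpu_load(sample, 4))          # (1.0, 0.5, 0.25)
    # On a live Linux system:
    #   with open("/proc/loadavg") as f:
    #       print(per_cpu_load(f.read()))
```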

CPU utilization · Linux · Unix
6 min read