Tag: branch prediction


OPPO Kernel Craftsman
Dec 1, 2023 · Fundamentals

Performance Optimization: Register Access, Assembly Basics, and CPU Pipeline Techniques

The article explains how performance can be dramatically improved by keeping frequently used data in CPU registers instead of memory, understanding basic assembly syntax and instruction types, using branch‑prediction hints, and exploiting the CPU pipeline to reduce stalls and wasted cycles.

CPU registers · Performance Optimization · Pipeline
12 min read
Tencent Cloud Developer
Oct 19, 2023 · Fundamentals

Profile-Guided Optimization (PGO) Principles and Practice in Go and C++

Profile‑Guided Optimization (PGO) collects runtime profiling data and recompiles the program for higher performance, reducing branch mispredictions and improving code layout; Go gained built‑in PGO support in 1.21 with typical gains of around 5%, while C++ deployments report 15‑18% QPS improvements plus devirtualization benefits, and future work targets deeper block ordering and register allocation.

C++ · Go · PGO
16 min read
OPPO Kernel Craftsman
Aug 19, 2022 · Fundamentals

Superscalar Processor Architecture and Performance Modeling for Mobile Devices

Modern mobile CPUs are superscalar, combining deep pipelines, branch prediction, and register renaming with out‑of‑order issue, execute, write‑back, and commit stages to boost instruction‑level parallelism; performance modeling via CPI and hardware counters helps engineers work around power, memory, and compiler limitations to write efficient code.

CPU · Mobile Processor · Pipeline
13 min read
IT Services Circle
Apr 1, 2022 · Fundamentals

Using likely/unlikely Macros for Performance Optimization in the Linux Kernel

This article explains how the Linux kernel’s likely and unlikely macros, which wrap GCC’s __builtin_expect, guide branch prediction to improve cache utilization and pipeline efficiency, and demonstrates their impact with sample code and assembly analysis.

C++ · Linux kernel · Performance Optimization
7 min read
Refining Core Development Skills
Mar 30, 2022 · Fundamentals

Understanding Linux Kernel likely/unlikely Macros for Performance Optimization

This article explains how the Linux kernel's likely and unlikely macros, which wrap __builtin_expect, guide the compiler's branch prediction to improve cache usage and pipeline efficiency, and demonstrates the effect with sample C code and assembly output.

C++ · Kernel · Optimization
9 min read
IT Services Circle
Feb 22, 2022 · Fundamentals

Why Sorting an Array Speeds Up Summation: CPU Pipeline, Hazards, and Branch Prediction Explained

The article examines a well‑known StackOverflow puzzle in which sorting a random array before summation yields a six‑fold speedup, explains the behavior through the classic five‑stage CPU pipeline and its structural, data, and control hazards, and shows how branch prediction and operand forwarding mitigate the resulting stalls.

CPU · Computer Architecture · Performance
16 min read
vivo Internet Technology
Mar 10, 2021 · Fundamentals

CPU Performance Optimization Using Top‑Down Micro‑architecture Analysis (TMAM)

The article demonstrates how Top‑down Micro‑architecture Analysis Methodology (TMAM) can quickly pinpoint CPU bottlenecks—such as front‑end, back‑end, and bad speculation stalls—in a simple C++ accumulation loop, and shows that applying targeted compiler, alignment, and branch‑prediction optimizations reduces runtime by roughly 34 % while increasing retiring slots.

C++ · CPU performance · Microarchitecture
20 min read