Tag: branch prediction


OPPO Kernel Craftsman
Dec 1, 2023 · Fundamentals

Performance Optimization: Register Access, Assembly Basics, and CPU Pipeline Techniques

The article explains how performance can be dramatically improved by keeping frequently used data in CPU registers instead of memory, understanding basic assembly syntax and instruction types, using branch‑prediction hints, and exploiting the CPU pipeline to reduce stalls and wasted cycles.

CPU registers · Performance Optimization · Pipeline
12 min read
Tencent Cloud Developer
Oct 19, 2023 · Fundamentals

Profile-Guided Optimization (PGO) Principles and Practice in Go and C++

Profile‑Guided Optimization (PGO) collects runtime profiling data and recompiles the program for higher performance, reducing branch mispredictions and improving code layout; Go gained built‑in PGO support in 1.21 with typical gains of around 5%, while C++ deployments report 15‑18% QPS improvements plus devirtualization benefits, and future work targets deeper block ordering and register allocation.

C++ · Go · PGO
16 min read
OPPO Kernel Craftsman
Aug 19, 2022 · Fundamentals

Superscalar Processor Architecture and Performance Modeling for Mobile Devices

Modern mobile CPUs are superscalar, combining deep pipelines, branch prediction, and register renaming with out‑of‑order issue, execute, write‑back, and commit stages to boost instruction‑level parallelism; performance modeling via CPI and hardware counters helps engineers work around power, memory, and compiler limitations to write efficient code.

CPU · Mobile Processor · Pipeline
13 min read
IT Services Circle
Apr 1, 2022 · Fundamentals

Using likely/unlikely Macros for Performance Optimization in the Linux Kernel

This article explains how the Linux kernel’s likely and unlikely macros, which wrap GCC’s __builtin_expect, guide branch prediction to improve cache utilization and pipeline efficiency, and demonstrates their impact with sample code and assembly analysis.

C++ · Linux kernel · Performance Optimization
7 min read
Refining Core Development Skills
Mar 30, 2022 · Fundamentals

Understanding Linux Kernel likely/unlikely Macros for Performance Optimization

This article explains how the Linux kernel's likely and unlikely macros, which wrap __builtin_expect, guide the compiler's branch prediction to improve cache usage and pipeline efficiency, and demonstrates the effect with sample C code and assembly output.

C++ · Kernel · Optimization
9 min read
IT Services Circle
Feb 22, 2022 · Fundamentals

Why Sorting an Array Speeds Up Summation: CPU Pipeline, Hazards, and Branch Prediction Explained

The article examines a well‑known StackOverflow puzzle in which sorting a random array before summation yields a six‑fold speedup, explains the behavior through the classic five‑stage CPU pipeline and its structural, data, and control hazards, and shows how branch prediction and operand forwarding mitigate the resulting stalls.

CPU · Computer Architecture · Performance
16 min read
vivo Internet Technology
Mar 10, 2021 · Fundamentals

CPU Performance Optimization Using Top‑Down Micro‑architecture Analysis (TMAM)

The article demonstrates how Top‑down Micro‑architecture Analysis Methodology (TMAM) can quickly pinpoint CPU bottlenecks—such as front‑end, back‑end, and bad speculation stalls—in a simple C++ accumulation loop, and shows that applying targeted compiler, alignment, and branch‑prediction optimizations reduces runtime by roughly 34 % while increasing retiring slots.

C++ · CPU performance · Microarchitecture
20 min read