Profiling Rust Applications with macOS Instruments Time Profiler
This article explains how to use the macOS Instruments Time Profiler to perform CPU‑time profiling of Rust programs, demonstrates a sample π‑calculation benchmark, shows the required Cargo configuration, walks through recording and inspecting trace files, and applies the method to diagnose performance regressions in the Rspack project.
After submitting a large PR to improve SourceMapDevToolPlugin in Rspack, the author observed a significant performance regression and decided to investigate using macOS Instruments.
Instruments, built into Xcode, provides a suite of analysis tools; the article focuses on the Time Profiler, which samples stack traces at configurable intervals (e.g., 1 ms) to reveal where CPU time is spent without heavily impacting the program.
To generate meaningful data, a Rust benchmark that approximates π using the Leibniz series is prepared. The function is deliberately marked with #[inline(never)] so that the optimizer does not fold it into main and it stays visible as its own frame in the profile:
#[inline(never)]
fn calculate_pi(iterations: u64) -> f64 {
    let mut pi: f64 = 0.0;
    let mut denominator: f64 = 1.0;
    for i in 0..iterations {
        if i % 2 == 0 {
            pi += 4.0 / denominator;
        } else {
            pi -= 4.0 / denominator;
        }
        denominator += 2.0;
    }
    pi
}

fn main() {
    let pi = calculate_pi(1_000_000_000);
    println!("Calculated Pi is: {}", pi);
}

The Cargo.toml is configured to emit debug information in release builds:
[profile.release]
debug = 1 # enable debug info
strip = false # keep symbols

Running the binary with cargo run --release ensures the profile reflects the optimized code that end users will execute.
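Before recording a trace, a plain wall-clock measurement can serve as a rough cross-check for what the profiler later reports. The sketch below is not part of the original benchmark: it reuses calculate_pi as shown above and wraps the call in std::time::Instant. The smaller iteration count and the use of std::hint::black_box are assumptions made only to keep the example quick and to discourage the optimizer from precomputing the result.

```rust
use std::hint::black_box;
use std::time::Instant;

// Same Leibniz-series function as in the benchmark above.
#[inline(never)]
fn calculate_pi(iterations: u64) -> f64 {
    let mut pi: f64 = 0.0;
    let mut denominator: f64 = 1.0;
    for i in 0..iterations {
        if i % 2 == 0 {
            pi += 4.0 / denominator;
        } else {
            pi -= 4.0 / denominator;
        }
        denominator += 2.0;
    }
    pi
}

fn main() {
    // Assumption: a smaller count than the article's 1_000_000_000,
    // chosen only so this sketch finishes quickly.
    let iterations: u64 = 10_000_000;

    let start = Instant::now();
    // black_box hides the argument from the optimizer.
    let pi = calculate_pi(black_box(iterations));
    let elapsed = start.elapsed();

    println!("pi ~ {pi} after {iterations} iterations in {elapsed:?}");
}
```

If the elapsed time printed here is far from the total the Time Profiler shows for the same run, that usually suggests the trace was recorded against a different build profile.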
To record a profile, the following command is used:
xcrun xctrace record --template 'Time Profiler' --output ./output.trace --launch -- /path/to/your/rust/project/target/release/your_binary

After the trace is generated, it can be opened with open ./output.trace, which launches Instruments and displays the sampled CPU usage. In the example, the calculate_pi function consumes about 4.90 seconds (≈99.9% of total CPU time), clearly identifying the hotspot.
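The figures a sampling profiler reports can be read as simple arithmetic: the time attributed to a frame is roughly the number of samples that landed in it multiplied by the sampling interval. The following sketch illustrates that relationship. The sample counts are hypothetical, picked to roughly match the 4.90 s and 99.9% figures above under an assumed 1 ms interval; they are not taken from the actual trace.

```rust
/// Approximate CPU time attributed to a frame: samples x sampling interval.
fn samples_to_seconds(samples: u64, interval_ms: f64) -> f64 {
    samples as f64 * interval_ms / 1000.0
}

/// Share of all samples that landed in a frame, as a percentage.
fn share_percent(frame_samples: u64, total_samples: u64) -> f64 {
    100.0 * frame_samples as f64 / total_samples as f64
}

fn main() {
    // Hypothetical counts, assuming a 1 ms sampling interval.
    let interval_ms = 1.0;
    let frame_samples: u64 = 4_900; // samples inside calculate_pi
    let total_samples: u64 = 4_905; // samples in the whole process

    let seconds = samples_to_seconds(frame_samples, interval_ms);
    let share = share_percent(frame_samples, total_samples);
    println!("calculate_pi: ~{seconds:.2} s, ~{share:.1}% of samples");
}
```

Because the numbers come from sampling, they are estimates: very short functions may receive few or no samples even though they did run.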
The same technique is applied to the Rspack codebase. Comparing the main branch with a development branch revealed that the process_assets_stage_dev_tooling method took 2.22 seconds on the development branch versus 1.06 seconds on main, with most of the extra time spent in the source.map call inside a filter_map iteration.
The regression stemmed from unintentionally replacing a parallel iterator (par_iter from Rayon) with a sequential iter, eliminating multi‑threaded execution of source.map. Restoring par_iter eliminated the slowdown and restored benchmark numbers.
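The difference between the two iteration styles can be illustrated with a small, self-contained sketch. It is not Rspack's code and does not use Rayon: to stay runnable without external crates, the parallel version is approximated with std::thread::scope and chunked work. The function expensive_map is a hypothetical stand-in for a CPU-heavy call such as source.map, and the item and worker counts are arbitrary.

```rust
use std::thread;
use std::time::Instant;

// Hypothetical stand-in for an expensive per-asset call such as source.map.
fn expensive_map(input: u64) -> u64 {
    let mut acc = input;
    for _ in 0..200_000 {
        acc = acc
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
    }
    acc
}

// Sequential version: every item is processed on the calling thread.
fn map_sequential(items: &[u64]) -> Vec<u64> {
    items.iter().map(|&x| expensive_map(x)).collect()
}

// Parallel approximation: split the slice into chunks, one scoped thread each.
// Rayon's par_iter does this with a work-stealing pool; this is a simplification.
fn map_parallel(items: &[u64], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|&x| expensive_map(x)).collect::<Vec<u64>>()
                })
            })
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<u64>>()
    })
}

fn main() {
    let items: Vec<u64> = (0..64).collect();
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);

    let start = Instant::now();
    let sequential = map_sequential(&items);
    println!("sequential: {:?}", start.elapsed());

    let start = Instant::now();
    let parallel = map_parallel(&items, workers);
    println!("parallel ({workers} workers): {:?}", start.elapsed());

    assert_eq!(sequential, parallel);
}
```

Both versions return identical results, which is why a swap from one to the other can slip through review and tests unnoticed; only a profile or a benchmark exposes the difference.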
In conclusion, the author reflects that while Rust offers high performance, achieving optimal results requires deep understanding of its tooling and concurrency primitives.
References:
Rust Profiling with Instruments and FlameGraph on macOS (CPU/Time)
Apple Instruments Help
Rspack Development Guide – Profiling