Master Linux Observability: Quick Guide to BCC Tools for Performance Debugging
This tutorial introduces the BPF Compiler Collection (BCC) suite, explains how to install it, lists essential Linux commands, and provides step‑by‑step examples of each BCC tool for fast performance analysis, fault isolation, and network troubleshooting on Linux systems.
In the previous article we introduced the revolutionary eBPF technology in Linux. Writing raw eBPF programs is complex, so developers created the BPF Compiler Collection (BCC) toolkit to let us stand on the shoulders of giants.
BCC provides many useful tools and examples for efficient kernel tracing and program manipulation. This article gives an overall guide on using BCC tools to quickly solve performance, fault‑diagnosis, and network problems (the principles of eBPF and BCC are omitted here; they will be covered later).
The tutorial assumes BCC is already installed and that tools such as <code>execsnoop</code> run successfully. For installation instructions, refer to the previous article (or the Lima article for macOS).
0. Before Using BCC
Before using BCC you should be familiar with basic Linux commands. The following commands are essential; if you are unsure of their meaning, ask ChatGPT.
<code>uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top</code>

1. General Performance Analysis
Below is a checklist of BCC tools for performance inspection. These tools are located in the <code>tools</code> directory of the BCC git repository.
1.1 execsnoop
<code>execsnoop</code> prints a line for each new process. It helps identify short-lived processes that may consume CPU but are invisible to most periodic monitoring tools such as <code>top</code>. It traces <code>exec()</code> rather than <code>fork()</code>, so it captures most new processes but misses those that only fork.
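To generate events execsnoop can catch, a tiny workload that forks and exec()s short-lived children is handy. The sketch below (plain Python, not part of BCC; function name is ours) runs <code>/bin/true</code> a few times, far too briefly for <code>top</code> to notice:

```python
import os

def spawn_short_lived(n=5, prog="/bin/true"):
    """Fork-and-exec `prog` n times, reaping each child."""
    exit_codes = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:                     # child: replace our image via exec()
            os.execv(prog, [prog])       # this exec() is what execsnoop traces
        _, status = os.waitpid(pid, 0)   # parent: reap the short-lived child
        exit_codes.append(os.WEXITSTATUS(status))
    return exit_codes
```

Run <code>execsnoop</code> in one terminal and this script in another; each exec()'d child should appear as a line, while a process that only fork()s would not.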
<code># ./execsnoop
PCOMM            PID    RET ARGS
supervise        9660     0 ./run
supervise        9661     0 ./run
mkdir            9662     0 /bin/mkdir -p ./main
run              9663     0 ./run
[...]</code>

1.2 opensnoop
<code>opensnoop</code> prints a line for each <code>open()</code> system call, showing detailed information. The opened files reveal how an application works (data files, config files, logs, etc.). Frequent attempts to open non-existent files can cause performance degradation.
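As an illustration of the failed-open pattern, the hypothetical workload below hammers a non-existent config path; opensnoop would show each attempt with a non-zero ERR (2 = ENOENT). The path and function name are invented for this sketch:

```python
def probe_missing(path="/etc/myapp/config.yaml", attempts=100):
    """Repeatedly try to open `path`; return how many opens failed."""
    failures = 0
    for _ in range(attempts):
        try:
            with open(path):
                pass
        except FileNotFoundError:  # open() failed with ENOENT (ERR = 2)
            failures += 1          # each failure still costs a full syscall
    return failures
```

Even though every call fails, each one is a full round trip into the kernel, which is why a hot loop like this shows up as overhead.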
<code># ./opensnoop
PID    COMM          FD ERR PATH
1565   redis-server   5   0 /proc/1565/stat
1603   snmpd          9   0 /proc/net/dev
1603   snmpd         11   0 /proc/net/if_inet6
[...]</code>

1.3 ext4slower (or btrfsslower, xfsslower, zfsslower)
<code>ext4slower</code> traces ext4 file-system operations, times them, and prints only those that exceed a threshold. It is useful for identifying slow disk I/O at the file-system layer, which is otherwise hard to correlate with application-level latency. Similar tools exist for other file systems, and <code>fileslower</code> traces all VFS operations (at higher overhead).
<code># ./ext4slower
Tracing ext4 operations slower than 10 ms
TIME     COMM  PID    T BYTES  OFF_KB  LAT(ms) FILENAME
06:35:01 cron  16464  R 1249   0       16.05   common-auth
06:35:01 cron  16463  R 1249   0       16.04   common-auth
06:35:01 cron  16465  R 1249   0       16.03   common-auth
06:35:01 cron  16465  R 4096   0       10.62   login.defs
[...]</code>

1.4 biolatency
<code>biolatency</code> tracks disk I/O latency (time from device issue to completion) and prints a histogram when the tool exits (Ctrl-C or a timeout). It reveals the full distribution of latency, exposing outliers and multimodal patterns that average-only tools like <code>iostat</code> hide.
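biolatency groups latencies into power-of-two buckets, which is why the rows below read 0 -> 1, 2 -> 3, 4 -> 7, and so on. A minimal Python sketch of that bucketing (function name is ours, not BCC's):

```python
def log2_bucket(usecs):
    """Return the (low, high) bounds of the power-of-two slot for a latency."""
    if usecs <= 1:
        return (0, 1)               # the first slot covers 0 and 1
    k = usecs.bit_length() - 1      # floor(log2(usecs))
    return (2 ** k, 2 ** (k + 1) - 1)

def histogram(latencies_usecs):
    """Count samples per bucket, like the histogram biolatency prints."""
    counts = {}
    for us in latencies_usecs:
        b = log2_bucket(us)
        counts[b] = counts.get(b, 0) + 1
    return counts
```

Log2 buckets keep the map tiny and fixed-size in kernel space while still exposing the shape of the distribution.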
<code># ./biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs       : count  distribution
       0 -> 1    : 0      |                    |
       2 -> 3    : 0      |                    |
       4 -> 7    : 0      |                    |
       8 -> 15   : 0      |                    |
      16 -> 31   : 0      |                    |
      32 -> 63   : 0      |                    |
      64 -> 127  : 1      |                    |
     128 -> 255  : 12     |********            |
[...]</code>

1.5 biosnoop
<code>biosnoop</code> prints a line for each disk I/O, including its latency. It allows detailed inspection of I/O patterns, such as reads queuing behind writes. When the system performs many I/Os, the output can be very verbose.
<code># ./biosnoop
TIME(s)      COMM       PID   DISK  T SECTOR    BYTES LAT(ms)
0.000004001  supervise  1950  xvda1 W 13092560  4096  0.74
0.000178002  supervise  1950  xvda1 W 13092432  4096  0.61
0.001469001  supervise  1956  xvda1 W 13092440  4096  1.24
[...]</code>

1.6 cachestat
<code>cachestat</code> prints a one-line summary every second (or at a custom interval) showing file-system cache statistics. It helps identify low cache-hit ratios and provides clues for performance tuning.
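The percentage columns are simple derivations from the per-interval counters. A simplified sketch of the READ_HIT% arithmetic (the real tool's formula also adjusts for dirty pages; the function name is ours):

```python
def read_hit_pct(hits, misses):
    """Rough READ_HIT%: page-cache hits as a share of all read accesses."""
    total = hits + misses
    return round(100.0 * hits / total, 1) if total else 0.0
```

A persistently low value here means the workload's hot set does not fit in the page cache, pushing reads through to disk.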
<code># ./cachestat
HITS  MISSES  DIRTIES  READ_HIT%  WRITE_HIT%  BUFFERS_MB  CACHED_MB
1074       4       13      94.9%       2.9%           1        223
2195       1       70      89.2%       5.6%           1        143
[...]</code>

1.7 tcpconnect
<code>tcpconnect</code> prints a line for each active TCP connection (i.e., initiated via <code>connect()</code>), showing the source and destination addresses. It helps locate unexpected connections that may indicate misconfiguration or intrusion.
<code># ./tcpconnect
PID    COMM    IP SADDR           DADDR          DPORT
1479   telnet   4 127.0.0.1       127.0.0.1      23
1469   curl     4 10.201.219.236  54.245.105.25  80
[...]</code>

1.8 tcpaccept
<code>tcpaccept</code> prints a line for each passive TCP connection (i.e., received via <code>accept()</code>), showing the remote and local addresses and the local port.
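One way to see tcpconnect and tcpaccept side by side is a loopback connection: the client's <code>connect()</code> shows up in tcpconnect and the server's <code>accept()</code> in tcpaccept. A minimal self-contained workload (plain Python, not a BCC tool; function name is ours):

```python
import socket

def one_connection():
    """Accept a single localhost connection; return the peer's address."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))        # ephemeral port chosen by the kernel
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.socket()
    client.connect(("127.0.0.1", port))  # tcpconnect logs this event
    conn, peer = server.accept()         # tcpaccept logs this event

    conn.close(); client.close(); server.close()
    return peer[0]
```

Running both tools while this executes should produce one line in each, with matching addresses and port.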
<code># ./tcpaccept
PID   COMM  IP RADDR           LADDR          LPORT
907   sshd   4 192.168.56.119  192.168.56.10  22
[...]</code>

1.9 tcpretrans
<code>tcpretrans</code> prints a line for each TCP retransmission, including the source/destination addresses and the kernel's TCP state. Retransmissions cause latency and throughput issues; analyzing their patterns can reveal network problems or kernel overload.
<code># ./tcpretrans
TIME     PID  IP LADDR:LPORT        T> RADDR:RPORT         STATE
01:55:05 0     4 10.153.223.157:22  R> 69.53.245.40:34619  ESTABLISHED
[...]</code>

1.10 runqlat
<code>runqlat</code> measures how long threads spend waiting on the CPU run queue and prints a histogram, quantifying the time lost to CPU saturation.
<code># ./runqlat
Tracing run queue latency... Hit Ctrl-C to end.
^C
     usecs      : count  distribution
       0 -> 1   : 233    |***********                             |
       2 -> 3   : 742    |************************************    |
       4 -> 7   : 203    |**********                              |
[...]</code>

1.11 profile
<code>profile</code> is a sampling CPU profiler that periodically captures stack traces and reports a summary of unique stacks with occurrence counts, helping identify the code paths that consume CPU.
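Conceptually, profile's summary step just folds identical sampled stacks into counts. A toy sketch of that aggregation (sample data and function name invented for illustration):

```python
from collections import Counter

def fold_stacks(samples):
    """Group identical stacks (tuples of frames) and count occurrences."""
    return Counter(tuple(stack) for stack in samples)

# Five samples caught the same hot path, one caught something else.
samples = [("main", "compress", "crc32")] * 5 + [("main", "read_input")]
```

Because sampling is statistical, a stack's count is proportional to the CPU time its code path consumed, at a fraction of the overhead of tracing every call.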
<code># ./profile
Sampling at 49 Hertz of all threads by user + kernel stack...Hit Ctrl‑C to end.
^C
00007f31d76c3251 [unknown]
- sign-file (8877)
1
ffffffff813d0af8 __clear_user
ffffffff813d5277 iov_iter_zero
...
00007f12a133e830 __libc_start_main
083e258d4c544155 [unknown]
- func_ab (13549)
5
[...]</code>

2. Observability with Generic Tools
In addition to the performance‑focused tools above, the following generic BCC utilities provide broader observability capabilities.
<code>trace
argdist
funccount</code>

2.1 trace
Example: tracing file ownership changes by monitoring the <code>chown</code>, <code>fchown</code>, and <code>lchown</code> system calls (kernel entry points <code>SyS_[f|l]chown</code>; on newer kernels these are named <code>__x64_sys_chown</code> and so on). The command prints the arguments and the invoking process's UID.
<code>$ trace.py \
'p::SyS_chown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
'p::SyS_fchown "fd = %d, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
'p::SyS_lchown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid'
PID TID COMM FUNC
1269255 1269 python3.6 SyS_lchown file = /tmp/dotsync-usis ...
1269441 1269 zstd SyS_chown file = /tmp/dotsync-vic7...
[...]</code>

2.2 argdist
<code>argdist</code> probes a specified function and aggregates an argument's values into a histogram or frequency count, revealing a parameter's distribution without attaching a debugger.
Example: measuring typical memory allocation sizes in an application.
<code># ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size'
[01:42:29] p:c:malloc(size_t size):size_t:size
COUNT EVENT
1 size = 16
2 size = 16
3 size = 16
4 size = 16
^C</code>

Another example: building a histogram of buffer sizes passed to <code>write()</code> across the whole system.
<code># ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
[01:45:22] p:c:write(int fd, void *buf, size_t len):size_t:len
     len        : count  distribution
       2 -> 3   : 2      |*************                           |
       8 -> 15  : 2      |*************                           |
      32 -> 63  : 28     |****************************************|
      64 -> 127 : 12     |*****************                       |
[...]</code>

2.3 funccount
<code>funccount</code> traces functions, tracepoints, or USDT probes matching a pattern and, on exit, prints a summary of call counts. Example: counting calls to all kernel functions starting with <code>vfs_</code>.
<code># ./funccount 'vfs_*'
Tracing...Ctrl‑C to end.
^C
FUNC COUNT
vfs_create 1
vfs_rename 1
vfs_fsync_range 2
vfs_lock_file 30
vfs_fstatat 152
vfs_fstat 154
vfs_write 166
vfs_getattr_nosec 262
vfs_getattr 262
vfs_open 264
vfs_read 470
Detaching...</code>

Big Data Technology Tribe
Focused on computer science and cutting‑edge tech, we distill complex knowledge into clear, actionable insights. We track tech evolution, share industry trends and deep analysis, helping you keep learning, boost your technical edge, and ride the digital wave forward.