Mastering Sparse Files on Linux: Detect, Compress, and Transfer Huge Logs Efficiently
Learn how to recognize Linux sparse files, use commands like du, ls, tar, and find to accurately assess their true disk usage, and efficiently copy or compress massive log files without consuming unnecessary space.
Yesterday a developer asked me to retrieve a log file. The server showed the log size as 9.0G, and after compression it became 744M, which seemed reasonable. However, later the developer claimed the log was actually 100G.
<code>[root@xxxxx apps]# du -hs smartorder.log
9.0G smartorder.log</code>
After compressing:
<code>[root@xxxxx apps]# du -hs smartorder.log.tar.gz
744M smartorder.log.tar.gz</code>
When I later checked the file size with ls using a block size in gigabytes, it reported 103G:
<code>[root@xxxxx apps]# ls -l --block-size=G smartorder.log
-rw-r--r-- 1 root root 103G Oct 21 09:00 smartorder.log</code>
This discrepancy is due to the file being a sparse file, a feature of Unix-like file systems (and NTFS) that delays allocating disk space. A sparse file starts with no allocated data blocks; blocks are allocated only as data is actually written (on NTFS, in 64 KB increments). That is why ls, which reports the apparent size, shows 103G, while du, which reports the blocks actually allocated, shows only 9.0G.
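You can reproduce the effect yourself with truncate, which extends a file without writing any data. A minimal sketch (the file name and size are illustrative, assuming GNU coreutils):

```shell
# Create a 1 GiB sparse file: no data blocks are allocated yet
tmpdir=$(mktemp -d)
truncate -s 1G "$tmpdir/sparse.img"

ls -lh "$tmpdir/sparse.img"   # apparent size: 1.0G
du -h  "$tmpdir/sparse.img"   # actual disk usage: ~0

# stat shows both at once: %s = apparent bytes, %b = 512-byte blocks allocated
stat -c 'apparent=%s bytes, allocated=%b blocks' "$tmpdir/sparse.img"

rm -rf "$tmpdir"
```

The apparent size is what ls reports; the allocated blocks are what du counts, which is exactly the gap seen with smartorder.log above.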
To handle sparse files efficiently, the cp command provides the --sparse=WHEN option, where WHEN is auto, always, or never. Setting it to never forces the holes to be written out as literal zeros, eliminating sparseness.
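A minimal sketch of the difference between the modes, using an illustrative 100M test file (assumes GNU cp and stat):

```shell
tmpdir=$(mktemp -d)
truncate -s 100M "$tmpdir/hole.img"   # fully sparse: 100M apparent, ~0 on disk

# auto/always keep the holes; never fills them in with zeros
cp --sparse=always "$tmpdir/hole.img" "$tmpdir/still-sparse.img"
cp --sparse=never  "$tmpdir/hole.img" "$tmpdir/dense.img"

# dense.img now occupies the full ~100M on disk; the others ~0
du -h "$tmpdir/hole.img" "$tmpdir/still-sparse.img" "$tmpdir/dense.img"

rm -rf "$tmpdir"
```

A --sparse=never copy is handy when the destination (a pipe, some network filesystems, or a tool that cannot seek) does not support holes.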
Other tools that can preserve sparseness include tar, cpio, and rsync. For example, using tar with its S (sparse) flag:
<code>[root@bibang-server apps]# tar cSf smartorder.log.tar smartorder.log
[root@bibang-server apps]# ls -l --block-size=G smartorder.log.tar
-rw-r--r-- 1 root root 10G Oct 21 09:57 smartorder.log.tar</code>
How do you find sparse files on the system, or verify whether a given file is sparse?
You can use find with the %S printf format to display the sparseness ratio (BLOCK-SIZE * st_blocks / st_size). Values less than 1.0 indicate a sparse file.
<code>[root@xxxxx apps]# find ./smartorder.log -type f -printf "%S\t%p\n"
0.0886597 ./smartorder.log</code>
To list all sparse files on the filesystem:
<code>find / -type f -printf "%S\t%p\n" | gawk '$1 < 1.0 {print}'</code>
In summary, recognizing that a large log file is actually a sparse file explains the unexpected size discrepancy, and the appropriate commands let you copy, archive, and compress such files without wasting disk space.
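As an end-to-end check of the steps above, this sketch creates a known sparse file and confirms that find reports a ratio below 1.0. The -xdev flag is an addition not in the one-liner above; it keeps a system-wide sweep on a single filesystem so pseudo-filesystems such as /proc are skipped (assumes GNU findutils):

```shell
tmpdir=$(mktemp -d)
truncate -s 1G "$tmpdir/app.log"          # 1 GiB hole, nothing on disk
printf 'real data' >> "$tmpdir/app.log"   # append a little real data at the end

# %S prints BLOCK-SIZE*st_blocks/st_size; well below 1.0 here
find "$tmpdir" -type f -printf '%S\t%p\n'

# Same sweep restricted to one filesystem, keeping only sparse files
find "$tmpdir" -xdev -type f -printf '%S\t%p\n' | gawk '$1 < 1.0 {print}'

rm -rf "$tmpdir"
```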
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.