Mastering Sparse Files on Linux: Detect, Compress, and Transfer Huge Logs Efficiently
Learn how to recognize Linux sparse files, use commands like du, ls, tar, and find to accurately assess their true disk usage, and efficiently copy or compress massive log files without consuming unnecessary space.
Yesterday a developer asked me to retrieve a log file. The server showed the log size as 9.0G, and after compression it became 744M, which seemed reasonable. However, later the developer claimed the log was actually 100G.
<code>[root@xxxxx apps]# du -hs smartorder.log
9.0G smartorder.log</code>
After compressing:
<code>[root@xxxxx apps]# du -hs smartorder.log.tar.gz
744M smartorder.log.tar.gz</code>
When I later checked the file size with ls using a block size in gigabytes, it reported 103G:
<code>[root@xxxxx apps]# ls -l --block-size=G smartorder.log
-rw-r--r-- 1 root root 103G Oct 21 09:00 smartorder.log</code>
This discrepancy is due to the file being a sparse file, a feature of Unix-like file systems (and NTFS) that delays allocating disk space. A sparse file starts with no allocated data blocks; blocks are allocated only as data is actually written (on NTFS, in 64 KB increments). That is why ls, which reports the apparent size, shows 103G, while du, which reports the blocks actually allocated, shows only 9.0G.
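You can reproduce the effect yourself with truncate, which extends a file without writing any data. A minimal sketch (the file name and size are illustrative, assuming GNU coreutils):

```shell
# Create a 1 GiB sparse file: no data blocks are allocated yet
tmpdir=$(mktemp -d)
truncate -s 1G "$tmpdir/sparse.img"

ls -lh "$tmpdir/sparse.img"   # apparent size: 1.0G
du -h  "$tmpdir/sparse.img"   # actual disk usage: ~0

# stat shows both at once: %s = apparent bytes, %b = 512-byte blocks allocated
stat -c 'apparent=%s bytes, allocated=%b blocks' "$tmpdir/sparse.img"

rm -rf "$tmpdir"
```

The apparent size is what ls reports; the allocated blocks are what du counts, which is exactly the gap seen with smartorder.log above.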
To handle sparse files efficiently, the cp command provides the --sparse=WHEN option, where WHEN is auto, always, or never. Setting it to never forces the holes to be written out as literal zeros, eliminating sparseness.
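A minimal sketch of the difference between the modes, using an illustrative 100M test file (assumes GNU cp and stat):

```shell
tmpdir=$(mktemp -d)
truncate -s 100M "$tmpdir/hole.img"   # fully sparse: 100M apparent, ~0 on disk

# auto/always keep the holes; never fills them in with zeros
cp --sparse=always "$tmpdir/hole.img" "$tmpdir/still-sparse.img"
cp --sparse=never  "$tmpdir/hole.img" "$tmpdir/dense.img"

# dense.img now occupies the full ~100M on disk; the others ~0
du -h "$tmpdir/hole.img" "$tmpdir/still-sparse.img" "$tmpdir/dense.img"

rm -rf "$tmpdir"
```

A --sparse=never copy is handy when the destination (a pipe, some network filesystems, or a tool that cannot seek) does not support holes.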
Other tools that can preserve sparseness include tar, cpio, and rsync. For example, using tar with its S (sparse) flag:
<code>[root@bibang-server apps]# tar cSf smartorder.log.tar smartorder.log
[root@bibang-server apps]# ls -l --block-size=G smartorder.log.tar
-rw-r--r-- 1 root root 10G Oct 21 09:57 smartorder.log.tar</code>
How do you find sparse files on the system, or verify whether a given file is sparse?
You can use find with the %S printf format to display the sparseness ratio (BLOCK-SIZE * st_blocks / st_size). Values less than 1.0 indicate a sparse file.
<code>[root@xxxxx apps]# find ./smartorder.log -type f -printf "%S\t%p\n"
0.0886597 ./smartorder.log</code>
To list all sparse files on the filesystem:
<code>find / -type f -printf "%S\t%p\n" | gawk '$1 < 1.0 {print}'</code>
In summary, recognizing that a large log file is actually a sparse file explains the unexpected size discrepancy, and the appropriate commands let you copy, archive, and compress such files without wasting disk space.
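As an end-to-end check of the steps above, this sketch creates a known sparse file and confirms that find reports a ratio below 1.0. The -xdev flag is an addition not in the one-liner above; it keeps a system-wide sweep on a single filesystem so pseudo-filesystems such as /proc are skipped (assumes GNU findutils):

```shell
tmpdir=$(mktemp -d)
truncate -s 1G "$tmpdir/app.log"          # 1 GiB hole, nothing on disk
printf 'real data' >> "$tmpdir/app.log"   # append a little real data at the end

# %S prints BLOCK-SIZE*st_blocks/st_size; well below 1.0 here
find "$tmpdir" -type f -printf '%S\t%p\n'

# Same sweep restricted to one filesystem, keeping only sparse files
find "$tmpdir" -xdev -type f -printf '%S\t%p\n' | gawk '$1 < 1.0 {print}'

rm -rf "$tmpdir"
```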
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.