
Mastering Sparse Files on Linux: Detect, Compress, and Transfer Huge Logs Efficiently

Learn how to recognize Linux sparse files, use commands like du, ls, tar, and find to accurately assess their true disk usage, and efficiently copy or compress massive log files without consuming unnecessary space.

Efficient Ops

Yesterday a developer asked me to retrieve a log file. The server showed the log size as 9.0G, and after compression it became 744M, which seemed reasonable. However, later the developer claimed the log was actually 100G.

<code>[root@xxxxx apps]# du -hs smartorder.log
9.0G smartorder.log</code>

After compressing:

<code>[root@xxxxx apps]# du -hs smartorder.log.tar.gz
744M smartorder.log.tar.gz</code>

When I later checked the file size with <code>ls</code>, using a block size in gigabytes, it reported 103G:

<code>[root@xxxxx apps]# ls -l --block-size=G smartorder.log
-rw-r--r-- 1 root root 103G Oct 21 09:00 smartorder.log</code>

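The discrepancy arises because the log is a sparse file. Sparse files are supported by most Unix-like file systems and by NTFS: regions that have never been written (holes) occupy no data blocks, and space is allocated only as data is actually written. <code>ls -l</code> reports the file's logical length (103G here), while <code>du</code> reports the space actually allocated (9.0G). The allocation granularity depends on the file system; on Linux it is typically the file system block size, and the 64KB increment sometimes quoted is associated with NTFS.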
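The difference between logical length and allocated space is easy to reproduce. The following is a minimal sketch, assuming GNU coreutils (<code>truncate</code>, <code>stat</code>, <code>du</code>) and a file system that supports holes; the scratch file is hypothetical and is removed at the end.

```shell
#!/bin/sh
# Sketch: create a throwaway sparse file and compare its two sizes.
f=$(mktemp)   # hypothetical scratch file, not part of the original incident

# Extend the file to 100 MiB without writing any data: the whole file is one hole.
truncate -s 100M "$f"

# st_size: the logical length, which is what ls -l shows.
apparent=$(stat -c %s "$f")
# Allocated blocks times the unit stat uses for them: what du shows.
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# du can report either figure.
du -h "$f"                   # allocated space
du -h --apparent-size "$f"   # logical length

rm -f "$f"
```

On the log from this article, <code>du -h --apparent-size smartorder.log</code> would be expected to agree with the <code>ls</code> figure rather than the 9.0G that plain <code>du</code> printed.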

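To handle sparse files efficiently, the <code>cp</code> command provides the <code>--sparse=WHEN</code> option, where WHEN is <code>auto</code> (the default), <code>always</code>, or <code>never</code>. With <code>always</code>, <code>cp</code> creates a hole in the destination wherever the source contains a long enough run of zero bytes. With <code>never</code>, every byte is written out, so the copy loses its sparseness and occupies its full logical size on disk.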
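As a rough illustration of the two extremes, the sketch below copies the same zero-filled file once with <code>--sparse=always</code> and once with <code>--sparse=never</code>. It assumes GNU <code>cp</code> and a file system that supports holes; all file names are temporary and hypothetical.

```shell
#!/bin/sh
# Sketch: the same source copied with and without hole creation.
src=$(mktemp); holes=$(mktemp); full=$(mktemp)

# 20 MiB of zero bytes actually written out, so the source itself is not sparse.
dd if=/dev/zero of="$src" bs=1M count=20 status=none

cp --sparse=always "$src" "$holes"   # zero runs become holes
cp --sparse=never  "$src" "$full"    # every byte is written

# Logical length is identical for both copies.
size_holes=$(stat -c %s "$holes")
size_full=$(stat -c %s "$full")
# Allocated space in KiB differs.
holes_kb=$(du -k "$holes" | cut -f1)
full_kb=$(du -k "$full" | cut -f1)

echo "always: $size_holes bytes long, $holes_kb KiB on disk"
echo "never:  $size_full bytes long, $full_kb KiB on disk"

rm -f "$src" "$holes" "$full"
```

Both copies should report the same length, while the <code>always</code> copy should occupy far less space on disk.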

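Other tools can preserve sparseness as well, provided you ask for it: <code>tar</code> with <code>-S</code> (<code>--sparse</code>), <code>cpio</code> with <code>--sparse</code>, and <code>rsync</code> with <code>-S</code> (<code>--sparse</code>). For example, using <code>tar</code>: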

<code>[root@bibang-server apps]# tar cSf smartorder.log.tar smartorder.log
[root@bibang-server apps]# ls -l --block-size=G smartorder.log.tar
-rw-r--r-- 1 root root 10G Oct 21 09:57 smartorder.log.tar</code>
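
The <code>S</code> flag combines with compression, and GNU <code>tar</code> recreates the holes on extraction. The following sketch assumes GNU <code>tar</code> with gzip available; <code>app.log</code> is a hypothetical stand-in for a sparse log, built in a temporary directory.

```shell
#!/bin/sh
# Sketch: archive a sparse file with tar -S, compress it, and restore it.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical stand-in for a sparse log: 1 MiB of data followed by a 99 MiB hole.
dd if=/dev/urandom of=app.log bs=1M count=1 status=none
truncate -s 100M app.log

# -S (--sparse) records holes instead of archiving runs of zeros; -z adds gzip.
tar -czSf app.log.tar.gz app.log

# Extraction recreates the holes in the restored file.
mkdir restored
tar -xzf app.log.tar.gz -C restored

orig_size=$(stat -c %s app.log)
restored_size=$(stat -c %s restored/app.log)
restored_kb=$(du -k restored/app.log | cut -f1)
archive_kb=$(du -k app.log.tar.gz | cut -f1)

echo "original length: $orig_size bytes"
echo "restored length: $restored_size bytes, $restored_kb KiB on disk"
echo "archive:         $archive_kb KiB on disk"

cd / && rm -rf "$work"
```

The restored file should keep its 100 MiB logical length while occupying only about as much disk space as the data that was actually written.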

How do you find sparse files on the system, or check whether a given file is sparse?

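You can use <code>find</code> with the <code>%S</code> directive of <code>-printf</code> to display a file's sparseness ratio, computed as BLOCKSIZE*st_blocks/st_size, that is, allocated bytes divided by logical length. Values below 1.0 normally indicate a sparse file. For this log the ratio is about 0.09, which is consistent with roughly 9.0G allocated out of a 103G logical size.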

<code>[root@xxxxx apps]# find ./smartorder.log -type f -printf "%S\t%p\n"
0.0886597 ./smartorder.log</code>

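To list sparse files across the whole system (searching from <code>/</code> also walks other mounts and pseudo-file systems such as <code>/proc</code>, so expect it to be slow and to print some errors):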

<code>find / -type f -printf "%S\t%p\n" | gawk '$1 < 1.0 {print}'</code>
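
A slightly more defensive variant can be wrapped in a function. This is a sketch under stated assumptions, not a definitive implementation: <code>list_sparse</code> is a hypothetical helper name, and it relies on GNU <code>find</code> and any POSIX <code>awk</code>. It stays on one file system with <code>-xdev</code>, skips empty files, and prints the logical size next to the ratio.

```shell
#!/bin/sh
# list_sparse DIR
# Print "ratio<TAB>bytes<TAB>path" for files under DIR whose sparseness ratio is below 1.0.
# list_sparse is a hypothetical helper name introduced for this sketch.
list_sparse() {
    # LC_ALL=C keeps the decimal point a dot so awk compares the ratio numerically.
    # -xdev stays on DIR's file system; ! -empty skips zero-length files.
    LC_ALL=C find "$1" -xdev -type f ! -empty -printf '%S\t%s\t%p\n' 2>/dev/null |
        awk -F'\t' '$1 < 1.0'
}
```

For example, <code>list_sparse /var/log</code> would limit the search to the log directory's own file system instead of scanning from <code>/</code>.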

In summary, recognizing that a large log file is actually a sparse file explains the unexpected size discrepancy, and using the appropriate commands allows you to handle such files without wasting disk space.
