
Understanding File System Inodes, Block Indexing, and Sparse Files: Why the cp Command Can Be Extremely Fast

The article explains how Linux file systems use inodes and multi‑level block indexing to separate a file's logical size from its physical storage, illustrating why copying a seemingly huge file with the cp command can complete instantly when the file is sparse.

Top Architect

A senior architect recounts a puzzling case where a colleague used the cp command to copy a 100 GB file in less than a second, prompting an investigation into the file system behavior.

Running ls -lh shows the file as 100 GB, but du -sh ./test.txt reports only 2 MB. stat ./test.txt shows a Size of 107374182400 bytes (100 GiB) with only 4096 Blocks; stat counts blocks in 512-byte units, so just 2 MB is actually allocated. The logical size and the real disk usage differ by several orders of magnitude.

The article clarifies that the "Size" field reflects the file's logical length, while the "Blocks" field indicates the physical space actually occupied on disk.
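The same two numbers can be read programmatically. The following is a minimal sketch, assuming a Linux system where st_blocks is counted in 512-byte units and a file system that supports holes; the helper names size_report and demo and the 100 MiB size are illustrative choices, not part of the original article.

```python
import os
import tempfile

SECTOR = 512  # assumption: st_blocks is reported in 512-byte units (true on Linux)


def size_report(path):
    """Return (logical_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * SECTOR


def demo(logical_size=100 * 1024 * 1024):
    """Create a file that consists of one big hole and report both sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        with open(path, "wb") as f:
            # Extends the file length without writing any data blocks.
            f.truncate(logical_size)
        return size_report(path)
```

Calling demo() returns the logical size first and the allocated size second. On a file system with sparse-file support the second value should be far smaller than the first, mirroring the gap between the Size and Blocks fields of stat.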

To make the concept intuitive, the file system is likened to a luggage storage service: the file name is the label, the inode is the metadata tag, the file data are the luggage, and the disk is the storage room.

In a classic Unix-like file system (the ext2/ext3 layout is the usual example), data are stored in fixed-size blocks, usually 4 KB. An inode contains 12 direct pointers, each naming one data block, plus three indirect pointers (single, double, and triple) that reference blocks full of further pointers. This lets the system address very large files while keeping the inode itself small.
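To make the lookup concrete, here is a small sketch of how a logical block number could be mapped to an index level and a path of pointer slots. It assumes 12 direct pointers and 1,024 pointers per index block; the function name locate_block is hypothetical and does not correspond to any kernel API.

```python
def locate_block(block_index, direct=12, ptrs_per_block=1024):
    """Map a logical block number to (level_name, path_of_slot_indices).

    Assumed layout: `direct` direct pointers, then single, double and
    triple indirect trees with `ptrs_per_block` pointers per index block.
    """
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    if block_index < direct:
        return ("direct", (block_index,))

    n = block_index - direct       # position relative to the first indirect block
    span = ptrs_per_block          # number of data blocks covered by this level
    for depth, name in enumerate(("single", "double", "triple"), start=1):
        if n < span:
            # Split n into `depth` slot indices, most significant first.
            path = tuple(
                (n // ptrs_per_block ** level) % ptrs_per_block
                for level in range(depth - 1, -1, -1)
            )
            return (name, path)
        n -= span
        span *= ptrs_per_block
    raise ValueError("block index beyond the triple-indirect range")
```

For example, locate_block(11) stays in the direct pointers, locate_block(12) is slot 0 of the single-indirect block, and locate_block(12 + 1024) is the first block reached through the double-indirect tree.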

With 4 KB blocks and 4-byte block pointers, one index block holds 1,024 pointers. The 12 direct pointers therefore cover 48 KB, the single-indirect level adds 4 MB, the double-indirect level 4 GB, and the triple-indirect level 4 TB. Each extra level multiplies the reachable space by 1,024 without making the inode any larger.
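The arithmetic behind these figures can be checked with a few lines. This sketch assumes 4 KB blocks, 4-byte pointers, and 12 direct pointers; level_capacities is an illustrative name.

```python
def level_capacities(block_size=4096, pointer_size=4, direct=12):
    """Return the bytes addressable through each index level."""
    ptrs = block_size // pointer_size  # pointers that fit in one index block
    return {
        "direct": direct * block_size,
        "single": ptrs * block_size,
        "double": ptrs ** 2 * block_size,
        "triple": ptrs ** 3 * block_size,
    }
```

With the default parameters the four values are 48 KiB, 4 MiB, 4 GiB, and 4 TiB, and their sum is the largest file this layout can map.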

A file is sparse when its logical size is large but most of its blocks were never allocated. Such unallocated ranges are called holes, and they read back as zeros. When cp detects holes it copies only the allocated blocks and skips the rest, so the operation finishes quickly. The article shows an example where a file reports a size of 1 TB + 4 KB but actually contains only two 4 KB data blocks, so its data occupies just 8 KB on disk.
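One way a copy tool can skip holes is to ask the kernel where the data extents are, using lseek with SEEK_DATA and SEEK_HOLE. The following is a minimal sketch of that technique, not the actual implementation of cp; it assumes Linux (where os.SEEK_DATA and os.SEEK_HOLE are available), and sparse_copy and demo_copy are hypothetical names.

```python
import errno
import os
import tempfile


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy only the data extents of src_path; return the bytes written."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        size = os.fstat(src).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(src, offset, os.SEEK_DATA)  # next data extent
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains before EOF
                    break
                raise
            end = os.lseek(src, start, os.SEEK_HOLE)  # end of that extent
            pos = start
            while pos < end:
                buf = os.pread(src, min(chunk, end - pos), pos)
                if not buf:
                    break
                written = os.pwrite(dst, buf, pos)
                pos += written
                copied += written
            offset = end
        os.ftruncate(dst, size)  # restore the logical length, including a trailing hole
    finally:
        os.close(src)
        os.close(dst)
    return copied


def demo_copy(gap=8 * 1024 * 1024):
    """Build a file with data at both ends and a hole between, then copy it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"head")
            f.seek(gap)          # seeking past EOF and writing leaves a hole
            f.write(b"tail")
        copied = sparse_copy(src, dst)
        with open(src, "rb") as a, open(dst, "rb") as b:
            identical = a.read() == b.read()
        return os.path.getsize(src), os.path.getsize(dst), copied, identical
```

demo_copy() returns the source size, the destination size, the number of bytes actually written, and whether the two files read back identically. On a file system that reports holes, the bytes written should be a small fraction of the logical size, which is the effect that makes cp appear instantaneous.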

Key takeaways: the inode stores the file's logical size and a list of block pointers; physical storage is allocated only for blocks that contain data; and sparse files allow large logical sizes with minimal disk consumption, which explains the surprising speed of cp on such files.

Tags: Linux, file system, inode, block indexing, sparse file, cp command