Why cp Copies a 100GB File Instantly: Sparse Files and Inode Basics
An unexpectedly fast copy of a 100 GB file with the cp command is a good way into sparse files, where the logical size differs from physical disk usage, and into how file systems use inodes, block allocation, and multi‑level indexing to manage storage efficiently.
Thoughts Triggered by cp
A colleague was shocked that copying a 100 GB file with cp finished in less than a second. The ls -lh command confirmed the file size, but the copy speed seemed impossible for a typical SATA disk.
<code># ls -lh
-rw-r--r-- 1 root root 100G Mar 6 12:22 test.txt</code>
Timing the copy showed:
<code># time cp ./test.txt ./test.txt.cp
real 0m0.107s
user 0m0.008s
sys 0m0.085s</code>
A SATA drive can write at about 150 MB/s, so copying 100 GB should take roughly 11 minutes, not a fraction of a second.
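The back-of-the-envelope estimate can be reproduced with shell arithmetic. This is only a rough sketch: it takes the 150 MB/s figure from the text as given and assumes 1 GB = 1024 MB; real drives vary.

```shell
# Rough estimate: how long should a fully written 100 GB copy take?
size_mb=$((100 * 1024))          # 100 GB expressed in MB (assuming 1 GB = 1024 MB)
rate_mb_s=150                    # assumed sustained SATA write speed, from the text
seconds=$((size_mb / rate_mb_s)) # integer division
minutes=$((seconds / 60))
echo "expected: about ${seconds} s (~${minutes} min)"
```

Against that estimate, the measured 0.107 s cannot be a copy of 100 GB of real data.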
Running du -sh ./test.txt reported only about 2 MB, indicating that the apparent size does not reflect actual disk usage.
<code># du -sh ./test.txt
2.0M ./test.txt</code>
The stat command showed:
<code># stat ./test.txt
File: ./test.txt
Size: 107374182400 Blocks: 4096 IO Block: 4096 regular file
...</code>
Key observations:
Size is the logical file size (what users see).
Blocks represents the disk space actually allocated. stat counts it in 512‑byte units, so 4096 blocks is about 2 MB, which matches the du output.
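The two numbers can be compared directly with a little arithmetic. The sketch below copies the values from the stat output above and assumes that Blocks is counted in 512-byte units (GNU stat reports the unit through its %B format).

```shell
# Values copied from the stat output above.
size_bytes=107374182400
blocks=4096
unit=512   # assumption: Blocks is counted in 512-byte units

allocated_bytes=$((blocks * unit))
echo "logical size:   $((size_bytes / 1024 / 1024 / 1024)) GiB"
echo "allocated size: $((allocated_bytes / 1024 / 1024)) MiB"
```

On a real file, the same three values are available from GNU stat as `stat -c '%s %b %B' FILE`.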
This discrepancy led to the discussion of file systems.
File System Basics
A file system is simply a container for storing data, analogous to a luggage storage service: the file name is the label, metadata is the tag, the file itself is the luggage, the storage room is the disk, and the overall management mechanism is the file system.
Space management involves dividing the disk into fixed‑size blocks (typically 4 KB). Data is stored in these blocks, and an inode records which blocks belong to a file.
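As a small illustration of block granularity, the sketch below computes how many 4 KB blocks a fully written file occupies. The helper name blocks_needed and the 10000-byte file size are made up for this example.

```shell
block_size=4096   # 4 KB blocks, as in the text

# Number of blocks needed to hold a fully written file of $1 bytes (rounds up).
blocks_needed() {
  echo $(( ($1 + block_size - 1) / block_size ))
}

file_bytes=10000                          # hypothetical file size for illustration
n=$(blocks_needed "$file_bytes")
slack=$(( n * block_size - file_bytes ))  # unused bytes in the last block
echo "$file_bytes bytes -> $n blocks, $slack bytes unused in the last block"
```

Because space is handed out in whole blocks, even a one-byte file occupies a full block.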
inode / block concepts
Inodes contain metadata and an array of block pointers. Direct pointers store up to 12 block numbers (≈48 KB). Larger files use indirect pointers:
Direct index (12 pointers)
Single indirect (points to a block that holds more pointers)
Double indirect
Triple indirect
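To make the four levels concrete, here is a minimal sketch that maps a logical block number within a file to the index level that holds its pointer. It assumes 4 KB blocks and 4-byte block pointers, so one index block holds 1024 pointers; the function name level_for_block is invented for this example and is not part of any file system tool.

```shell
ptrs=1024   # assumption: 4 KB index block / 4-byte pointer = 1024 pointers per block

# Print which index level holds the pointer for logical block number $1 (0-based).
level_for_block() {
  n=$1
  if [ "$n" -lt 12 ]; then
    echo direct
  elif [ "$n" -lt $((12 + ptrs)) ]; then
    echo single
  elif [ "$n" -lt $((12 + ptrs + ptrs * ptrs)) ]; then
    echo double
  elif [ "$n" -lt $((12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs)) ]; then
    echo triple
  else
    echo "out of range"
  fi
}

level_for_block 11      # last block reachable through a direct pointer
level_for_block 12      # first block that needs the single indirect block
level_for_block 1036    # 12 + 1024: first block under the double indirect block
```

Small files are served entirely by the direct pointers; the deeper levels only come into play as the file grows.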
Capacity calculations (with 4 KB blocks and 4‑byte block pointers, one index block holds 1024 pointers):
Direct: 12 × 4 KB = 48 KB
Single indirect: 1024 × 4 KB ≈ 4 MB
Double indirect: 1024 × 4 MB ≈ 4 GB
Triple indirect: 1024 × 4 GB ≈ 4 TB
Thus the block index of a typical ext2 inode can address a single file of up to about 4 TB.
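The capacity figures can be checked with shell arithmetic. This sketch assumes 4 KB blocks and 4-byte block pointers and looks only at what the index structure can address; a real file system may impose other limits.

```shell
block=4096                   # block size in bytes (4 KB, as in the text)
ptr=4                        # assumption: 4-byte block pointers
per_block=$((block / ptr))   # pointers per index block

direct=$((12 * block))
single=$((per_block * block))
double=$((per_block * single))
triple=$((per_block * double))
total=$((direct + single + double + triple))

echo "direct: $((direct / 1024)) KiB"
echo "single: $((single / 1024 / 1024)) MiB"
echo "double: $((double / 1024 / 1024 / 1024)) GiB"
echo "triple: $((triple / 1024 / 1024 / 1024 / 1024)) TiB"
echo "total:  $total bytes"
```

The triple indirect level dominates the total, which is why the overall limit is usually rounded to 4 TB.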
Why cp Is So Fast
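The observed file is a sparse file: its logical size is 100 GB, but only about 2 MB of blocks are actually allocated. Unwritten regions (holes) do not allocate physical blocks. The gap can be far more extreme: a file with a logical size of 1 TB + 4 KB may hold real data in only two 4 KB blocks (8 KB in total).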
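A scaled-down version of such a file is easy to create. The sketch below assumes GNU coreutils (dd, stat) and a file system that supports sparse files. It writes a few bytes at the start of a scratch file and a few more 64 MiB in, leaving a hole in between; the file name comes from mktemp and the sizes are arbitrary.

```shell
f=$(mktemp)                       # scratch file; the path is arbitrary
printf 'head' > "$f"              # 4 bytes of data in the first block

# Write 4 more bytes at offset 16384 * 4096 = 64 MiB. dd seeks past the end
# without writing the gap; conv=notrunc keeps the existing first block.
printf 'tail' | dd of="$f" bs=4096 seek=16384 conv=notrunc 2>/dev/null

logical=$(stat -c %s "$f")                                # logical size in bytes
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))  # allocated bytes
echo "logical=$logical allocated=$allocated"
rm -f "$f"
```

On a file system with sparse-file support, the allocated figure should be a few kilobytes while the logical size is just over 64 MiB.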
When copying such a file, cp skips the holes and copies only the allocated blocks, so the operation finishes quickly.
Key point: The file size stored in the inode is just a metadata attribute; actual disk usage depends on the number of allocated blocks.
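The copy behaviour can be tried on a small scratch file. This is a sketch assuming GNU coreutils: truncate sets the logical size without writing data, and cp accepts --sparse=auto (the default), --sparse=always, or --sparse=never. The paths and the 64 MiB size are arbitrary.

```shell
src=$(mktemp)
dst="$src.cp"
truncate -s 64M "$src"             # logical size 64 MiB, no data written
printf 'data' | dd of="$src" bs=4096 conv=notrunc 2>/dev/null  # 4 bytes at the start

cp --sparse=always "$src" "$dst"   # GNU cp: create holes instead of writing runs of zeros

# The copy must be byte-for-byte identical and have the same logical size.
if cmp -s "$src" "$dst"; then same=yes; else same=no; fi
src_size=$(stat -c %s "$src")
dst_size=$(stat -c %s "$dst")
echo "identical=$same src=$src_size dst=$dst_size"
stat -c '%n: %b blocks of %B bytes' "$src" "$dst"
rm -f "$src" "$dst"
```

Replacing --sparse=always with --sparse=never makes cp write every zero byte, which is the slow copy the colleague was expecting.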
Summary
File systems achieve efficient storage by:
Dividing the disk into fixed‑size blocks.
Using inodes to map a file to its blocks.
Allocating blocks lazily, allowing sparse files where logical size exceeds physical usage.
Together, these three mechanisms explain why copying a seemingly huge file can be nearly instantaneous.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.