Understanding Sparse Files and Why cp Can Copy a 100 GB File Instantly on Linux

This article explains how Linux file systems represent file size versus actual disk usage, demonstrates the difference between Size and Blocks using du and stat, describes inode and multi‑level block indexing, and shows why copying a sparse 100 GB file with cp finishes in a fraction of a second.

Top Architect

cp Triggered a Thought

A colleague copied a 100 GB file with cp and it finished in less than a second, which contradicts the expected transfer time on a SATA disk (about 11 minutes for 100 GB at 150 MB/s).
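The 11-minute estimate can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a sustained 150 MB/s and binary units (GiB and MiB), and ignores caching and seek overhead; the function name is illustrative only.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def expected_copy_seconds(size_bytes, throughput_bytes_per_s):
    """Time to move size_bytes at a sustained throughput (no caches, no seeks)."""
    return size_bytes / throughput_bytes_per_s


seconds = expected_copy_seconds(100 * GIB, 150 * MIB)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
# prints: 683 s, about 11.4 minutes
```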

Analyzing the File

Running du -sh ./test.txt reported only 2 M, while stat ./test.txt showed a Size of 107374182400 bytes (100 GB) and Blocks of 4096. stat counts Blocks in 512-byte units, so 4096 blocks is 2 MB, which matches the du output. The key points are:

Size is the logical file size visible to users.

Blocks reflects the physical space actually allocated on disk.
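The same two numbers are available programmatically. The sketch below reads them through Python's os.stat, assuming a Linux system where st_blocks is counted in 512-byte units; the helper name size_and_usage and the temporary file are hypothetical and used only for illustration.

```python
import os
import tempfile

STAT_BLOCK_UNIT = 512  # st_blocks is counted in 512-byte units on Linux


def size_and_usage(path):
    """Return (logical size in bytes, allocated bytes on disk) for path."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * STAT_BLOCK_UNIT


# The article's figure: 4096 blocks of 512 bytes each is 2 MiB.
article_usage = 4096 * STAT_BLOCK_UNIT

# Demo on a small temporary file (not the article's test.txt).
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"hello")
    tmp.flush()
    logical, allocated = size_and_usage(tmp.name)
    print(f"logical={logical} bytes, allocated={allocated} bytes")
```

For an ordinary file, the allocated figure is usually the logical size rounded up to whole blocks; for a sparse file it can be far smaller than the logical size.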

File System Basics

A file system is a container for data, similar to a luggage storage service: the file name is the label, metadata is the tag, the file itself is the luggage, and the disk is the storage room.

Space Management

Data is stored in fixed‑size blocks (typically 4 KB). Storing each file in a single contiguous area is simple and works for tiny files, but it wastes space as larger files grow, shrink, and get deleted. Therefore, the file system splits data into blocks and records their locations in an inode.
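The block arithmetic behind this is simple. The following sketch assumes the 4 KB block size mentioned above; the function names are illustrative and not part of any file system API.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def blocks_needed(file_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks required to hold file_bytes (ceiling division)."""
    return -(-file_bytes // block_size)


def slack_bytes(file_bytes, block_size=BLOCK_SIZE):
    """Unused bytes in the last block of the file."""
    return blocks_needed(file_bytes, block_size) * block_size - file_bytes


def block_of_offset(offset, block_size=BLOCK_SIZE):
    """Map a byte offset to (logical block index, offset inside that block)."""
    return offset // block_size, offset % block_size


print(blocks_needed(10_000))    # 3
print(slack_bytes(10_000))      # 2288
print(block_of_offset(10_000))  # (2, 1808)
```

A 10,000-byte file therefore occupies three blocks, with 2,288 bytes of the last block left unused.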

inode / block Concept

An inode contains metadata and an array of block pointers. In ext2 the array has 15 entries: the first 12 are direct pointers, the 13th is a single‑indirect pointer, the 14th a double‑indirect pointer, and the 15th a triple‑indirect pointer.

With 4 KB blocks and 4-byte block pointers, one index block holds 1024 pointers. The 12 direct pointers address 12 × 4 KB = 48 KB. The single‑indirect pointer adds 1024 × 4 KB = 4 MB. The double‑indirect pointer adds another factor of 1024, reaching 4 GB, and the triple‑indirect pointer reaches 4 TB. Thus the maximum file size this scheme can address is roughly 4 TB.
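The capacities above can be reproduced, and a byte offset can be classified by the pointer level that would reach it. This is a sketch of the arithmetic only, assuming 4 KB blocks and 4-byte block pointers; it does not read any real inode, and the names are illustrative.

```python
BLOCK_SIZE = 4096                            # assumed block size in bytes
POINTER_SIZE = 4                             # assumed size of one block pointer
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 pointers per index block
DIRECT_POINTERS = 12


def level_capacities():
    """Bytes addressable through each pointer level of the 15-entry array."""
    return {
        "direct": DIRECT_POINTERS * BLOCK_SIZE,
        "single": PTRS_PER_BLOCK * BLOCK_SIZE,
        "double": PTRS_PER_BLOCK ** 2 * BLOCK_SIZE,
        "triple": PTRS_PER_BLOCK ** 3 * BLOCK_SIZE,
    }


def pointer_level(offset):
    """Name the pointer level that maps the block containing this byte offset."""
    block = offset // BLOCK_SIZE
    levels = [
        ("direct", DIRECT_POINTERS),
        ("single", PTRS_PER_BLOCK),
        ("double", PTRS_PER_BLOCK ** 2),
        ("triple", PTRS_PER_BLOCK ** 3),
    ]
    for name, blocks_in_level in levels:
        if block < blocks_in_level:
            return name
        block -= blocks_in_level  # skip past the blocks covered by this level
    raise ValueError("offset is beyond what the pointer array can address")


for name, capacity in level_capacities().items():
    print(f"{name:>6}: {capacity} bytes")

print(pointer_level(0))                  # direct
print(pointer_level(48 * 1024))          # single
print(pointer_level(100 * 1024 ** 3 - 1))  # triple
```

Under these assumptions, the last byte of a 100 GB file falls in the triple‑indirect range, because the direct, single‑indirect, and double‑indirect levels together cover only a little over 4 GB.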

Why cp Is So Fast

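The demonstrated file is a sparse file: its logical size is 100 GB, but only about 2 MB of blocks are actually allocated. The filesystem allocates blocks only for regions that have been written; the unwritten gaps (holes) take no disk space and read back as zeros. As a more extreme illustration, a file with one 4 KB block written at offset 0 and another at offset 1 TB has a logical size of just over 1 TB, yet its data occupies only 8 KB on disk.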
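A sparse file is easy to produce: seek past the end of the file and write. The sketch below does this with a 64 MiB file in a temporary directory (sizes and names are chosen for illustration, not taken from the article) and assumes a file system that supports holes, as common Linux file systems do.

```python
import os
import tempfile


def write_with_hole(path, logical_size, tail=b"end"):
    """Write a few bytes at the start and at the end, leaving a hole between."""
    with open(path, "wb") as f:
        f.write(b"start")
        f.seek(logical_size - len(tail))  # seeking past EOF allocates nothing
        f.write(tail)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sparse.bin")
    logical_size = 64 * 1024 * 1024
    write_with_hole(path, logical_size)

    st = os.stat(path)
    reported_size = st.st_size
    allocated = st.st_blocks * 512  # st_blocks uses 512-byte units on Linux

    with open(path, "rb") as f:
        f.seek(logical_size // 2)
        middle = f.read(16)  # bytes read from inside the hole

print(f"size={reported_size} allocated={allocated} middle={middle!r}")
```

On a file system with hole support, the allocated figure should be a few kilobytes while the size is the full 64 MiB, and the bytes read from the middle are all zeros.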

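When cp copies a sparse file, it does not need to read and write the full 100 GB. GNU cp detects sparse source files by default (the --sparse=auto behavior), copies the logical size and the allocated data, and leaves the holes unallocated in the destination. Only the roughly 2 MB of real data is transferred, which is why the copy completes almost instantly.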
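The idea of a hole-skipping copy can be sketched with the SEEK_DATA and SEEK_HOLE options of lseek, which Python exposes as os.SEEK_DATA and os.SEEK_HOLE on Linux. This is an illustration of the technique under that assumption, not cp's actual implementation; sparse_copy and the demo file names are hypothetical.

```python
import errno
import os
import tempfile


def sparse_copy(src, dst, chunk=1 << 20):
    """Copy src to dst, transferring only data regions. Returns bytes copied."""
    fin = os.open(src, os.O_RDONLY)
    try:
        fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(fin).st_size
            copied = 0
            pos = 0
            while pos < size:
                try:
                    start = os.lseek(fin, pos, os.SEEK_DATA)  # next data region
                except OSError as exc:
                    if exc.errno == errno.ENXIO:  # only a trailing hole remains
                        break
                    raise
                end = os.lseek(fin, start, os.SEEK_HOLE)  # end of that region
                offset = start
                while offset < end:
                    buf = os.pread(fin, min(chunk, end - offset), offset)
                    if not buf:
                        break
                    written = os.pwrite(fout, buf, offset)
                    offset += written
                    copied += written
                pos = end
            os.ftruncate(fout, size)  # set logical size; skipped ranges stay holes
            return copied
        finally:
            os.close(fout)
    finally:
        os.close(fin)


with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "src.bin")
    dst = os.path.join(tmpdir, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"head")
        f.seek(8 * 1024 * 1024)
        f.write(b"tail")
        f.truncate(16 * 1024 * 1024)  # leave a trailing hole as well
    bytes_copied = sparse_copy(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same_content = a.read() == b.read()
    dst_size = os.path.getsize(dst)

print(f"copied {bytes_copied} of {dst_size} bytes, identical={same_content}")
```

On a file system that reports holes, the number of bytes copied is a small fraction of the logical size. On one that does not, the whole file is reported as data and the function degrades to an ordinary full copy.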

Summary

Linux file systems achieve this efficiency by:

Dividing the disk into fixed‑size blocks.

Using inodes to map a file’s logical blocks to physical blocks.

Allocating blocks lazily, only when data is actually written.

Together, these three mechanisms explain the discrepancy between reported file size and actual disk usage, and why copying a sparse file with cp can be dramatically faster than copying a fully allocated file.

Written by

Top Architect
