Understanding Git Internals: Objects, Trees, Commits, Tags, and Packfiles
This article explains how Git stores data. It covers the .git directory layout, the four object types (blob, tree, commit, tag), how objects are hashed and laid out on disk, and how Git packs objects to save space. Along the way it answers a common question: why does a second commit of a modified file store the full file again?
Git is used daily in software development, yet its storage model is rarely examined. The sections below walk through it from the bottom up.
When a file is modified and committed, Git does not store a diff; it records the complete new content as a blob object (zlib-compressed). The workflow starts in the working directory. git add writes the file's content into the object store as a blob and records it in the index (staging area). git commit then creates a commit object that records the top-level tree, the parent commit(s), the author and committer, and the message.
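Running git init creates a .git directory containing configuration, hooks, and, most importantly, the objects and refs directories and the HEAD file. The index file appears once something has been staged with git add. The objects directory stores loose objects under a two-level path derived from the object's hash: the first two hex characters name a subdirectory and the remaining characters name the file, e.g. .git/objects/ab/cdef... . This fan-out keeps any single directory from holding too many files.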
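The two-level layout described above can be sketched in a few lines of Python. This is an illustration only: the function name loose_object_path is hypothetical, and it assumes a repository using 40-character SHA-1 object IDs.

```python
from pathlib import PurePosixPath


def loose_object_path(git_dir: str, object_id: str) -> PurePosixPath:
    """Return where a loose object with this hex ID would be stored."""
    if len(object_id) != 40 or any(c not in "0123456789abcdef" for c in object_id):
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    # The first two hex characters name the subdirectory,
    # the remaining 38 name the file inside it.
    return PurePosixPath(git_dir) / "objects" / object_id[:2] / object_id[2:]


print(loose_object_path(".git", "83baae61804e65cc73a7201a7252750c76066a30"))
# prints .git/objects/83/baae61804e65cc73a7201a7252750c76066a30
```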
Git defines four object types:
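Blob: stores file contents only, with no file name or mode. Example: $ echo 'version 1' | git hash-object -w --stdin prints 83baae61804e65cc73a7201a7252750c76066a30
Tree: records one directory level as a list of entries, each with a mode, a name, and the ID of a blob (for a file) or another tree (for a subdirectory). Created with git write-tree after staging files.
Commit: points to a top-level tree and to zero or more parent commits, and includes the author, committer, timestamps, and message. Created with git commit-tree or the higher-level git commit.
Tag: an annotated tag object that points to another object (usually a commit) and records the tagger, date, and message, created with git tag -a v1.0.0 -m "test tag". A lightweight tag, created without -a, is only a ref and creates no object.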
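How an object ID is derived can be sketched as follows, assuming a SHA-1 repository: Git hashes a header of the form "<type> <size>" followed by a NUL byte and then the body. The helper names object_id and tree_body and the file name test.txt are hypothetical, and the entry sort is simplified (Git orders directory entries as if their names ended in "/").

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries in the tree object format."""
    out = b""
    # Simplified ordering: plain sort by name (assumption, see lead-in).
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        # Each entry is '<mode> <name>', a NUL byte, then the 20 raw ID bytes.
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


# Same content that `echo 'version 1'` pipes into git hash-object.
blob_id = object_id("blob", b"version 1\n")
print(blob_id)  # 83baae61804e65cc73a7201a7252750c76066a30

# A one-entry tree that names the blob; the blob itself carries no name.
tree_id = object_id("tree", tree_body([("100644", "test.txt", blob_id)]))
print(tree_id)
```

Note that the file name and mode live in the tree entry, which is why two files with identical content share a single blob.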
Each object can be inspected with git cat-file (e.g., git cat-file -p <hash> to view contents, git cat-file -t <hash> to view type).
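As a rough picture of what git cat-file does for a loose object, the sketch below inflates the zlib-compressed bytes and splits the header from the body. The function name parse_loose_object is hypothetical, and the sample bytes are built in memory rather than read from a real .git/objects file.

```python
import zlib


def parse_loose_object(compressed: bytes) -> tuple[str, bytes]:
    """Inflate a loose object and split it into (type, body)."""
    raw = zlib.decompress(compressed)
    # The header '<type> <size>' ends at the first NUL byte.
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


# Stand-in for the contents of a file under .git/objects/.
sample = zlib.compress(b"blob 10\x00version 1\n")
obj_type, body = parse_loose_object(sample)
print(obj_type)  # blob  (what `git cat-file -t` reports)
print(body)      # b'version 1\n'  (what `git cat-file -p` shows for a blob)
```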
Git initially stores objects as loose files, each holding one complete, zlib-compressed object. When many loose objects accumulate, or when git gc is run, Git packs them into a binary packfile (.git/objects/pack/pack-*.pack) along with an index file (pack-*.idx) that maps object IDs to offsets in the pack. Packing reduces disk usage by delta-compressing similar objects. Typically the newest version is stored in full for fast access, while older versions are stored as deltas against it.
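For a sense of the packfile's binary layout, here is a minimal sketch that reads only the 12-byte header: the signature PACK, a version number, and the object count, all in network byte order. The function name parse_pack_header is hypothetical, and the header bytes are synthetic rather than taken from a real pack.

```python
import struct


def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    # 4-byte signature, then two big-endian unsigned 32-bit integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


# Synthetic header: version 2, three objects.
fake_header = b"PACK" + struct.pack(">II", 2, 3)
version, count = parse_pack_header(fake_header)
print(version, count)  # 2 3
```

The object entries that follow the header, including the delta encoding, are more involved and are not covered by this sketch.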
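Running git verify-pack -v .git/objects/pack/pack-*.idx lists each packed object with its type, size, size in the pack, and offset. For objects stored as deltas it also shows the delta depth and the ID of the base object.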
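The per-object lines of that output can be turned into records with a small parser. The sketch below assumes SHA-1 object IDs and the documented column order (ID, type, size, size in pack, offset, then depth and base ID for deltas). PackEntry and parse_verify_pack_line are hypothetical names, and the sample lines use made-up IDs.

```python
from typing import NamedTuple, Optional


class PackEntry(NamedTuple):
    object_id: str
    obj_type: str
    size: int
    size_in_pack: int
    offset: int
    depth: Optional[int] = None    # set only for deltified objects
    base_id: Optional[str] = None  # set only for deltified objects


def parse_verify_pack_line(line: str) -> Optional[PackEntry]:
    """Parse one object line; return None for summary or status lines."""
    fields = line.split()
    if len(fields) not in (5, 7) or len(fields[0]) != 40:
        return None
    entry = PackEntry(fields[0], fields[1], int(fields[2]), int(fields[3]), int(fields[4]))
    if len(fields) == 7:
        entry = entry._replace(depth=int(fields[5]), base_id=fields[6])
    return entry


# Made-up sample: one full object, one delta against it, one summary line.
full_id, delta_id = "a" * 40, "b" * 40
lines = [
    f"{full_id} blob   22054 5799 1463",
    f"{delta_id} blob   9 20 7262 1 {full_id}",
    "non delta: 1 object",
]
entries = [e for e in map(parse_verify_pack_line, lines) if e is not None]
for e in entries:
    print(e.object_id[:7], "delta of " + e.base_id[:7] if e.base_id else "stored in full")
```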
In summary, a second commit of a modified file stores the complete new content as a new blob. Git may later pack this blob together with others, typically keeping the newest version in full and storing the older version as a delta to save space.
Xueersi Online School Tech Team