Understanding Variable-Length Data Structures and Storage Costs in PostgreSQL
This article explains how variable‑length data structures are stored in databases, illustrates PostgreSQL's page and row formats, and compares the storage overhead of DEFAULT NULL versus DEFAULT '' fields, showing that NULL can reduce space dramatically when many columns are nullable.
In database systems, variable‑length data is commonly stored using a structure that contains a length field followed by a flexible array, for example:
struct { int len; char data[1]; }Without alignment this structure occupies 5 bytes, while with typical alignment it occupies 8 bytes.
Even when no actual data is stored, the length field still requires four bytes on disk, which can be visualized as:
|00|00|00|00|PostgreSQL (and many other databases) stores data files as a series of pages. Each page begins with a header, followed by a line‑pointer array and other auxiliary information, and then contains one or more rows. A simplified page layout looks like:
|.......20‑byte page header.......|
|...line‑pointer array and other info...|
|.....row......|
|.....row......|Each row has an internal format consisting of three parts: a row descriptor, a null‑bitmap, and the actual field data.
Row descriptor
Null bitmap
Actual field data
The row descriptor records the number of fields and their lengths. The null‑bitmap is a bit array that indicates which fields are NULL; a bit value of 1 means the corresponding field is NULL, otherwise it is not.
For example, if a row has 30 fields and some of them are NULL, a 32‑bit bitmap is stored at the beginning of the row, costing at most field count / 8 bytes, with a theoretical minimum of a single bit.
Cost comparison:
DEFAULT NULL : For a table with 32 columns that frequently contain NULLs, each row may incur up to 4 bytes of overhead to store the null‑bitmap.
DEFAULT '' : Each empty string literal requires 4 bytes, so a table with N columns incurs 4 × N bytes per row.
In summary, using DEFAULT NULL can reduce storage by roughly a factor of 32 compared with DEFAULT '' when many columns are nullable, and the savings increase proportionally with the number of columns.
Author: He Weiping, Search Technology Researcher and Database Researcher at Qunar, translator of the first Chinese PostgreSQL manual and contributor to Programming Perl (3rd edition).
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.