Performance Impact of Using TIMESTAMP vs VARCHAR for Date‑Time Data in Large Tables
Replacing a VARCHAR column that stores timestamp strings with a native TIMESTAMP type in a massive sorting table reduced query time from over 20 minutes to 39 seconds, demonstrating a 30‑plus‑fold speedup due to smaller storage, faster CPU comparisons, and reduced I/O.
A large table used for product ranking grew from millions to tens of millions of rows, causing the basic sorting query to expand from under ten minutes to more than forty minutes. After analyzing the schema, it was found that the timestamp column was stored as a VARCHAR string despite having an index.
By altering the column to a native TIMESTAMP type, the query time dropped dramatically from over twenty minutes (≈1200 s) to just 39 seconds.
According to the ISO standard, a timestamp string such as '2013-04-07 12:33:20.000+08' occupies 26 bytes. Databases can store timestamps as integers or floating‑point numbers: a 4‑byte integer for a timezone‑less, millisecond‑less timestamp, or an 8‑byte value for a full‑precision timestamp with timezone. In contrast, a VARCHAR representation requires 4 bytes for length plus the actual data length, resulting in roughly 23–30 bytes per value. This means VARCHAR storage is 5–7 times larger than native timestamp storage.
From a CPU perspective, comparing an 8‑byte integer involves a few memory accesses and a single register comparison. Comparing a 23‑ or 30‑byte string requires multiple 4‑byte chunks, effectively doubling the number of memory accesses and leading to 12–16 times slower comparisons.
On the I/O side, ten million rows of 8‑byte timestamps occupy about 80 MB, while the same number of 23‑byte or 30‑byte strings occupy roughly 230 MB and 300 MB respectively. This translates to three to four times more disk blocks (≈160 k vs 460 k‑600 k blocks of 512 bytes), further degrading performance.
Summing the storage, CPU, and I/O factors yields an expected speedup of 36–64×, which aligns closely with the observed reduction from 20 minutes to 39 seconds (≈30×). The conclusion is clear: whenever a native data type exists, it should be used instead of storing equivalent data as strings.
Author: He Weiping, Search and Database Researcher at Qunar, contributor to the Chinese PostgreSQL manual and Programming Perl translation.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.