Row Store vs Column Store in openGauss: Comparison, Advantages, Disadvantages, and Experimental Evaluation
This article explains the fundamental differences between row‑based and column‑based storage in databases, compares their read/write performance, presents experimental results on openGauss showing space and speed trade‑offs, and offers practical guidance on when to choose each storage model.
Many beginners learn databases using relational tables, which store data in a row‑oriented format (row store) where each record’s fields are placed together on disk.
Some databases also support column‑oriented storage (column store), which stores each column’s values contiguously on disk.
Storage Comparison
The differences are illustrated in the figure below.
In a row store, the attribute values of a record are stored together, followed by the next record’s values. In a column store, all values of a single attribute are stored consecutively, giving each column its own contiguous space.
Consider which model is more suitable for queries and which for updates.
Write‑Performance Comparison
1) Row store writes are performed in a single operation. Because the write is handled by the operating system’s file system, success or failure is guaranteed and data integrity is clear.
2) Column store must split a row into individual columns for storage, resulting in many more write operations and additional disk‑head movements, so write latency is higher. Consequently, row store has a clear advantage for writes.
3) Data modification is essentially a write operation, so row store also outperforms column store for updates.
Read‑Performance Comparison
1) Row store typically reads an entire row; when only a few columns are needed, extra columns are read and later filtered in memory, adding overhead.
2) Column store reads only the required column(s) or a contiguous segment, eliminating redundancy and making it especially efficient for projection queries.
3) Because each column contains homogeneous data types, parsing is straightforward and CPU‑efficient, unlike row store where mixed types require frequent type conversion.
4) Uniform column data enables effective compression; different columns can use different algorithms, improving I/O performance.
Advantages & Disadvantages
The choice of storage type is the first step in table design and should be driven by business requirements. Both models have pros and cons, and the appropriate one depends on the workload.
Row Store
Column Store
Advantages
Data kept together; INSERT/UPDATE are simple.
Only the needed columns are read during queries.
Projection is highly efficient.
Any column can serve as an index.
Disadvantages
When selecting a subset of columns, all data is still read.
Selected columns must be re‑assembled after reading.
INSERT/UPDATE are more complex.
Point queries are not well‑suited.
Applicable Scenarios
Point queries (few records, index‑based simple queries).
Workloads with frequent INSERT/UPDATE/DELETE.
Analytical queries (OLAP, data‑warehouse) where few columns are aggregated over many rows.
Ad‑hoc queries where filter conditions are uncertain and full‑table scans are needed.
Experimental Evaluation
openGauss supports hybrid storage; the storage orientation can be specified at table creation. The following experiment was performed on a Huawei Cloud server with openGauss Enterprise 3.0.0 and openEuler 20.03.
Two tables were created: custom1 (row store) and custom2 (column store). 500,000 rows were inserted into each.
# create table custom1 (id integer, name varchar2(20));
CREATE TABLE
# create table custom2 (id integer, name varchar2(20)) with (orientation = column);
CREATE TABLE
# insert into custom1 select n,'testtt'||n from generate_series(1,500000) n;
INSERT 0 500000
# insert into custom2 select * from custom1;
INSERT 0 500000Storage size comparison:
# \d+
List of relations
Schema | Name | Type | Owner | Size | Storage
--------+---------+-------+-------+--------+--------------------------------------
public | custom1 | table | omm | 24 MB | {orientation=row,compression=no}
public | custom2 | table | omm | 3104 kB| {orientation=column,compression=low}The column‑store table occupies roughly one‑seventh of the row‑store table’s space.
Insert latency for a single new record:
# explain analyze insert into custom1 values(1,'zhang3');
QUERY PLAN
... Total runtime: 0.135 ms
# explain analyze insert into custom2 values(1,'zhang3');
QUERY PLAN
... Total runtime: 0.207 msFinally, the test tables were dropped:
# drop table custom1;
DROP TABLE
# drop table custom2;
DROP TABLESelection Recommendations
Frequent updates → prefer row store.
Frequent small inserts → row store; bulk inserts → column store.
Wide tables with few queried columns → column store; narrow tables with most columns queried → row store.
If queries touch only a small fraction of columns (<50% of total), column store is advantageous.
Column store offers higher compression but consumes more CPU.
Precautions
Column stores have many restrictions: they do not support arrays, generated columns, global temporary tables, or foreign keys, and the supported data types are fewer than in row stores. Always consult the database documentation before using column‑oriented tables.
Interested readers can further test scenarios such as creating very wide tables or performing extensive UPDATE operations.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.