
Core Technologies of OLAP Systems: Storage, Computation, Optimizer, and Emerging Trends

This article systematically examines the core technologies of OLAP systems, covering storage models, columnar formats, indexing, distributed storage architectures, query execution steps, optimizer designs, and emerging trends such as real‑time analytics, HTAP, cloud‑native deployment, and hardware acceleration.

Big Data Technology Architecture

OLAP systems are widely used in BI, reporting, ad‑hoc analysis, and ETL data‑warehouse scenarios. This article analyzes OLAP from a systematic perspective, extracting commonalities across industry implementations and organizing the discussion into four chapters: storage, computation, optimizer, and trends.

Storage

Row‑oriented storage follows the N‑ary Storage Model (NSM) and is suitable for OLTP workloads (e.g., InnoDB B+‑tree clustered indexes). Column‑oriented storage evolved from the Decomposition Storage Model (DSM) and PAX, becoming popular after the C‑Store paper for OLAP workloads because it can read only required columns, achieve higher compression, and reduce I/O.
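The NSM/DSM distinction can be made concrete with a toy sketch (illustrative only, not any engine's actual on-disk layout): the same table stored row-wise versus column-wise, and what each layout must touch to answer an analytical aggregate.

```python
# Toy illustration of NSM (row store) vs DSM (column store) layouts.
rows = [  # NSM: each record is contiguous -- good for point lookups/updates
    (1, "2020-11-11", 9.99),
    (2, "2020-11-12", 4.50),
    (3, "2020-11-13", 7.25),
]

columns = {  # DSM: each attribute is contiguous -- good for scans of few columns
    "uid":   [1, 2, 3],
    "date":  ["2020-11-11", "2020-11-12", "2020-11-13"],
    "price": [9.99, 4.50, 7.25],
}

# An OLAP-style aggregate touches one attribute: the column store scans one
# contiguous array, while the row store must decode every full record.
total = sum(columns["price"])
total_nsm = sum(r[2] for r in rows)
```

Both produce the same answer; the difference is how many bytes each layout has to read to get there, which is exactly the I/O advantage column stores exploit.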

For primitive types, column stores apply encodings such as bitmap, varint, delta, RLE, and dictionary encoding, often combined with general‑purpose block compression (e.g., LZ4, Snappy, ZSTD) that trades CPU cycles for smaller size. Complex nested types use the Dremel Striping/Assembly algorithm (as in Parquet) with definition and repetition levels. Modern OLAP formats like Parquet and ORC use a Data Block + Header/Footer layout: RowGroup → ColumnChunk → Page, each with metadata (min/max, statistics, dictionaries) that enables column pruning and optimizer hints.
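Two of the encodings above, RLE and dictionary encoding, are simple enough to sketch in a few lines of Python (a minimal illustration of the idea, not any format's actual wire encoding):

```python
def rle_encode(values):
    """Run-length encode a column as [(value, run_length), ...]."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [tuple(r) for r in runs]

def dict_encode(values):
    """Dictionary-encode a column: (dictionary, list of small integer codes)."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

col = ["cn", "cn", "cn", "us", "us", "cn"]
# rle_encode(col)  -> [("cn", 3), ("us", 2), ("cn", 1)]
# dict_encode(col) -> (["cn", "us"], [0, 0, 0, 1, 1, 0])
```

Real formats layer the two: dictionary codes are small integers, which then compress further under RLE or bit-packing before the general-purpose codec runs.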

Indexing varies by format. Parquet and ORC rely mainly on lightweight per‑column metadata (min/max statistics, optional Bloom filters), while systems such as Alibaba ADB add bitmap, inverted, or Bloom‑filter indexes at the page level. Ordered columns additionally enable sparse B‑Tree or Masstree indexes, making point queries and range scans efficient.
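The min/max metadata mentioned above drives "zone map" style pruning: a page whose [min, max] range cannot satisfy the predicate is skipped without reading its data. A toy sketch in the spirit of Parquet/ORC statistics (the page structure here is hypothetical):

```python
# Each page carries min/max metadata alongside its (lazily readable) rows.
pages = [
    {"min": 1,   "max": 100, "rows": range(1, 101)},
    {"min": 101, "max": 200, "rows": range(101, 201)},
    {"min": 201, "max": 300, "rows": range(201, 301)},
]

def scan_gt(pages, threshold):
    """Return values > threshold, skipping pages ruled out by metadata."""
    hits, pages_read = [], 0
    for page in pages:
        if page["max"] <= threshold:   # whole page below the predicate: prune
            continue
        pages_read += 1                # only now would the page's data be read
        hits.extend(v for v in page["rows"] if v > threshold)
    return hits, pages_read

values, pages_read = scan_gt(pages, 250)  # only the last page is touched
```

For a selective predicate, two of the three pages are never read; at warehouse scale this same check runs per RowGroup, per page, and per file.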

Distributed Storage

To overcome single‑node limits, files are sharded across machines using strategies like round‑robin, constant, random, range, or hash. Two main architectures exist:

Storage‑compute integrated (shared‑nothing) engines such as ClickHouse, Baidu Palo/Doris, and Greenplum, which store data locally on SSD/NVM and replicate it for fault tolerance.

Storage‑compute separated architectures that keep data in distributed file systems (GFS, HDFS) or object stores (S3, OSS, GCS) and run compute engines (Spark SQL, Presto, Impala) that read these files.

Example sharding by UID range:

shard1              shard2
+---------------+  +---------------+
|uid|   date    |  |uid|   date    |
+---------------+  +---------------+
| 1 | 2020-11-11|  | 3 | 2020-11-13|
| 2 | 2020-11-12|  | 4 | 2020-11-14|
+---------------+  +---------------+

Example sharding by UID hash (uid mod 2):

shard1              shard2
+---------------+  +---------------+
|uid|   date    |  |uid|   date    |
+---------------+  +---------------+
| 1 | 2020-11-11|  | 2 | 2020-11-12|
| 3 | 2020-11-13|  | 4 | 2020-11-14|
+---------------+  +---------------+

Adding a second partitioning level, e.g., hashing on uid and then range‑partitioning each shard by date, splits the two shards above into four files; a query constrained on both uid and date then needs to open only one of them, making partition pruning more effective.
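The two-level scheme can be sketched as follows (a minimal illustration using the uid/date table above; the shard count and date cutoff are arbitrary choices for the example):

```python
# Two-level partitioning: hash uid into 2 shards, then range-partition each
# shard by date, yielding 4 files.
records = [
    (1, "2020-11-11"), (2, "2020-11-12"),
    (3, "2020-11-13"), (4, "2020-11-14"),
]

def partition(records, num_shards=2, date_cutoff="2020-11-13"):
    files = {}  # (shard, date_range) -> rows in that file
    for uid, date in records:
        shard = uid % num_shards                          # level 1: hash on uid
        bucket = "old" if date < date_cutoff else "new"   # level 2: range on date
        files.setdefault((shard, bucket), []).append((uid, date))
    return files

files = partition(records)  # 4 files, one row each for this tiny table
```

A query such as `WHERE uid = 1 AND date < '2020-11-13'` maps to exactly one key, `(1, "old")`, so the other three files are pruned before any I/O happens.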

Computation

SQL is the de facto language for OLAP. A full query goes through lexical and syntactic parsing, AST generation, validation, conversion to relational algebra, logical‑plan creation, optimizer‑driven (often cost‑based) rewriting, physical‑plan generation, and finally execution.

Execution engines fall into two streams:

Offline batch engines (Hive on MapReduce, Spark SQL, MaxCompute) that provide fault tolerance and can handle massive data but have high latency.

MPP engines (Greenplum, Presto, Impala, Alibaba ADB, Redshift) that schedule short tasks without heavyweight resource managers, offering low‑latency interactive queries.

Physical plans are DAGs of operators; the leaf is typically a TableScan. Execution models include the Volcano iterator model (open/next/close) and vectorized execution, which processes batches of rows to improve CPU pipeline utilization and enable SIMD instructions. Dynamic code generation (e.g., Java ASM, C++ LLVM) can further reduce interpreter overhead.
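The Volcano model described above is small enough to sketch directly: each operator exposes open/next/close and pulls one row at a time from its child (a minimal Python illustration; a vectorized engine would instead pass batches of rows or column arrays between the same operators).

```python
# Volcano iterator model: a Filter pulling rows from a TableScan leaf.
class Scan:
    def __init__(self, rows): self.rows = rows
    def open(self): self.it = iter(self.rows)
    def next(self):
        return next(self.it, None)   # None signals end-of-stream
    def close(self): pass

class Filter:
    def __init__(self, child, pred): self.child, self.pred = child, pred
    def open(self): self.child.open()
    def next(self):
        row = self.child.next()      # pull from child until a row qualifies
        while row is not None and not self.pred(row):
            row = self.child.next()
        return row
    def close(self): self.child.close()

plan = Filter(Scan([(1, 10), (2, 5), (3, 20)]), lambda r: r[1] > 8)
plan.open()
out = []
row = plan.next()
while row is not None:
    out.append(row)
    row = plan.next()
plan.close()
# out == [(1, 10), (3, 20)]
```

The per-row virtual call in `next()` is exactly the interpreter overhead that vectorized execution and code generation aim to amortize away.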

Optimizer

Optimizers transform relational algebra trees using rule‑based (RBO) or cost‑based (CBO) approaches. RBO applies heuristics such as partition pruning, column pruning, predicate/project push‑down, aggregation push‑down, limit/sort push‑down, constant folding, and sub‑query flattening. CBO explores alternative equivalent trees using statistics (row count, NDV, histograms, index availability) and selects the plan with the lowest estimated cost. Popular frameworks include Apache Calcite (Volcano model) and Orca (Cascades model).
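One RBO rule from the list above, predicate push-down, can be sketched on a toy plan tree (the node classes and rule function here are hypothetical, not any real optimizer's API):

```python
# Toy rule-based rewrite: push a Filter below a Project when the predicate
# references only columns the Project keeps.
from dataclasses import dataclass

@dataclass
class ScanNode:
    table: str

@dataclass
class ProjectNode:
    child: object
    cols: tuple

@dataclass
class FilterNode:
    child: object
    col: str
    value: int

def push_down_filter(plan):
    """RBO rule: Filter(Project(x)) -> Project(Filter(x)) when legal."""
    if (isinstance(plan, FilterNode)
            and isinstance(plan.child, ProjectNode)
            and plan.col in plan.child.cols):
        proj = plan.child
        return ProjectNode(FilterNode(proj.child, plan.col, plan.value),
                           proj.cols)
    return plan  # rule does not apply; leave the plan unchanged

before = FilterNode(ProjectNode(ScanNode("t"), ("uid", "date")), "uid", 42)
after = push_down_filter(before)
# after: ProjectNode(FilterNode(ScanNode("t"), "uid", 42), ("uid", "date"))
```

After the rewrite the filter runs at the scan, so fewer rows flow into the projection; a CBO would instead generate both trees and keep whichever the cost model scores lower.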

Trends

Real‑time analytics: Hybrid serving/analytical processing (HSAP) integrates streaming writes directly into OLAP tables, eliminating data duplication and reducing latency to seconds.

HTAP: Combining transactional (OLTP) and analytical (OLAP) workloads in a single system, often via multi‑replica designs where one replica is optimized for analytics.

Cloud‑native: Disaggregated storage‑compute architectures enable elastic scaling, lower cost, and easier deployment on public clouds.

Hardware acceleration: Leveraging NVM, RDMA, FPGA, GPU, and SIMD to boost I/O bandwidth, reduce latency, and accelerate vectorized query execution.

Multi‑modal data: Extending OLAP to semi‑structured (JSON, ARRAY) and unstructured (vector) data, supporting richer analytics.

The article concludes with references to seminal papers and open‑source projects that underpin these technologies.
