Inside Presto 2.0: The Native C++ Query Engine Explained

This article provides a detailed technical overview of Presto 2.0, the native C++ query engine built on the Velox library, covering its motivation, vectorized architecture, memory management, performance benchmarks from Meta and IBM, and deployment practices for large‑scale data warehouses.

Past Memory Big Data

Jun 27, 2024

Over the past three years, engineers from Meta, Ahana (now IBM), Intel and ByteDance collaborated to create Velox, an execution engine library designed for composable data systems. The effort produced a C++‑based Presto worker, originally called Project Prestissimo and now named Presto 2.0.

Motivation

Presto 2.0 completely rewrites the Presto query execution engine, aiming for a 3‑4× boost in performance and scalability by moving from Java to modern C++. The shift to vectorized execution aligns with industry trends such as Databricks Photon and Apache DataFusion.

Vectorized Engine Features

The engine leverages SIMD, adaptive runtime optimizations, intelligent I/O prefetching and merging, and a custom memory manager that avoids Java garbage‑collection pauses. IBM’s recent TPC‑DS benchmark results are consistent with the claimed performance gains.

Architecture

The native worker replaces the Java worker while exposing the same data and control APIs to the coordinator. A Presto cluster still consists of a coordinator, multiple workers and a Hive metastore. Apart from a few differences inside the worker node, the C++ cluster behaves like the Java cluster.

Integration with Velox

Presto Native delegates most query‑processing work to the Velox library. When a worker receives a plan fragment from the coordinator, it translates that fragment into a Velox plan‑node tree. Velox then splits the tree into pipelines of operators, and each pipeline is executed by one or more drivers. Operators exchange data as Velox vectors, which represent columns and can be encoded as Flat, Constant or Dictionary, depending on which layout suits the data best.
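To make the three encodings concrete, here is a minimal Python sketch. The class names, the `value_at` method and the `materialize` helper are hypothetical and chosen only for illustration; they are not the Velox C++ API. The sketch assumes that a Flat vector stores one value per row, a Constant vector stores a single value for all rows, and a Dictionary vector stores per‑row indices into a smaller base vector.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FlatVector:
    """One stored value per row."""
    values: List[Any]

    def __len__(self) -> int:
        return len(self.values)

    def value_at(self, row: int) -> Any:
        return self.values[row]


@dataclass
class ConstantVector:
    """A single value that logically repeats for every row."""
    value: Any
    size: int

    def __len__(self) -> int:
        return self.size

    def value_at(self, row: int) -> Any:
        if not 0 <= row < self.size:
            raise IndexError(row)
        return self.value


@dataclass
class DictionaryVector:
    """Per-row indices into a smaller base vector of distinct values."""
    indices: List[int]
    base: FlatVector

    def __len__(self) -> int:
        return len(self.indices)

    def value_at(self, row: int) -> Any:
        return self.base.value_at(self.indices[row])


def materialize(vector) -> List[Any]:
    """Decode any of the three encodings into a plain list of row values."""
    return [vector.value_at(row) for row in range(len(vector))]


# The same logical column expressed in two encodings.
flat_column = FlatVector(["us", "us", "de", "us"])
dict_column = DictionaryVector([0, 0, 1, 0], FlatVector(["us", "de"]))
```

Both `flat_column` and `dict_column` decode to the same rows, but the dictionary form stores each distinct string only once, which is the kind of layout saving the encodings are meant to provide.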

Operators adapt at runtime based on vector characteristics—for example, using array indexing for low‑cardinality group keys instead of a hash table, or reordering filter predicates according to selectivity—thereby improving efficiency and scalability.
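The group‑key adaptation can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Velox code: the function name `sum_by_key` and the `max_array_range` threshold are hypothetical, keys are assumed to be integers, and the aggregate is a simple sum. When the observed key range is small, each key maps directly to an array slot; otherwise the sketch falls back to a hash table (a Python `dict`).

```python
from typing import Dict, List


def sum_by_key(keys: List[int], values: List[int],
               max_array_range: int = 1024) -> Dict[int, int]:
    """Sum values per integer key, choosing the lookup structure at runtime."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    if not keys:
        return {}

    lo, hi = min(keys), max(keys)
    key_range = hi - lo + 1

    if key_range <= max_array_range:
        # Low-cardinality path: the key itself (minus an offset) is the slot,
        # so no hashing or collision handling is needed.
        sums = [0] * key_range
        seen = [False] * key_range
        for key, value in zip(keys, values):
            slot = key - lo
            sums[slot] += value
            seen[slot] = True
        return {lo + slot: sums[slot]
                for slot in range(key_range) if seen[slot]}

    # General path: a hash table handles arbitrary, sparse key ranges.
    totals: Dict[int, int] = {}
    for key, value in zip(keys, values):
        totals[key] = totals.get(key, 0) + value
    return totals
```

Both paths return the same result; only the lookup structure differs. A call such as `sum_by_key([3, 5, 3], [10, 20, 30])` takes the array path, while keys spread over a range wider than `max_array_range` take the hash path.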

Memory Management and Arbitration

Every operator owns a memory pool that tracks intermediate structures such as RowContainers and HashTables. The Velox memory manager monitors per‑operator and system‑wide usage. When an operator requests more memory than is available, arbitration may spill other operators’ state, temporarily slowing query progress but keeping the workload within memory limits. Priorities can be adjusted to give users control over arbitration behavior.
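The arbitration idea can be illustrated with a toy model in Python. Everything here is hypothetical: the names `OperatorPool`, `MemoryArbitrator` and `reserve`, and the policy of spilling the largest other pool first, are assumptions made for the sketch rather than a description of how Velox selects victims. Spilling is modelled as moving a pool's bytes from `used_bytes` to `spilled_bytes`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OperatorPool:
    """Tracks the memory held by one operator."""
    name: str
    used_bytes: int = 0
    spilled_bytes: int = 0


class MemoryArbitrator:
    """Toy arbitrator: spills other operators' state to satisfy a request."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.pools: Dict[str, OperatorPool] = {}

    def add_pool(self, name: str) -> OperatorPool:
        pool = OperatorPool(name)
        self.pools[name] = pool
        return pool

    def used_bytes(self) -> int:
        return sum(pool.used_bytes for pool in self.pools.values())

    def reserve(self, name: str, num_bytes: int) -> List[str]:
        """Grant num_bytes to pool `name`; return names of pools spilled."""
        requester = self.pools[name]
        shortfall = self.used_bytes() + num_bytes - self.capacity_bytes

        # Assumed policy: largest other pool is spilled first.
        victims = sorted(
            (p for p in self.pools.values()
             if p is not requester and p.used_bytes > 0),
            key=lambda p: p.used_bytes,
            reverse=True,
        )
        reclaimable = sum(p.used_bytes for p in victims)
        if shortfall > reclaimable:
            # Fail before touching any state if spilling cannot help.
            raise MemoryError(f"cannot reserve {num_bytes} bytes for {name}")

        spilled: List[str] = []
        for victim in victims:
            if shortfall <= 0:
                break
            shortfall -= victim.used_bytes
            victim.spilled_bytes += victim.used_bytes
            victim.used_bytes = 0
            spilled.append(victim.name)

        requester.used_bytes += num_bytes
        return spilled
```

With a capacity of 100 bytes and pools holding 60 and 30 bytes, a third operator asking for 30 bytes forces the 60‑byte pool to spill, after which the request fits. A priority field could be added to the sort key to mimic the adjustable priorities mentioned above.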

Deployment at Meta

Meta runs one of the world’s largest data warehouses on Presto Native. The engine serves interactive dashboards and short batch jobs (under 20 minutes). Deploying Prestissimo and Velox reduced hardware usage threefold and increased query speed by 1.5‑2×. Stability was established through extensive testing, including fuzzers, Presto verification, data‑writer validation and shadow testing against live workloads. Wall‑clock time improved by about 1.5× and CPU time by 2‑3×.

Deployment at IBM

IBM integrates the native engine into its watsonx.data platform, which offers open‑format storage (Parquet, Iceberg) and multi‑engine support (Presto, Spark, Milvus). The engine includes a high‑performance Parquet/Iceberg reader, S3 support, JWT/TLS authentication and catalog integration. Early TPC‑DS benchmarks at 1 TB, 10 TB and 100 TB scale factors show encouraging performance gains.

Conclusion

The Presto C++ vectorized engine advances the Presto ecosystem by exploiting Velox’s composable components, delivering 2‑3× performance over the Java implementation, and gaining broad community support from Meta, IBM, Uber, ByteDance, Pinterest, Intel and others. Ongoing community meetings continue to drive design improvements and broader adoption.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Past Memory Big Data

A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.
