
Step-by-Step Guide: Integrating Presto with Velox on macOS (Build, Configure, and Run)

This article explains why the CPU has become the bottleneck in data analytics, introduces the Velox vectorized execution engine, and gives a detailed, from-scratch tutorial: downloading the Presto source, syncing Velox, fixing build paths, compiling the Java and C++ components, configuring CLion and IntelliJ, launching the servers, and running SQL queries. It also notes stability concerns.

Past Memory Big Data

Over the past decade, storage speeds have risen from 50 MB/s (HDD) to 16 GB/s (NVMe) and network speeds from 1 Gbps to 100 Gbps, but CPU clock rates have stagnated around 3 GHz, making CPU the main bottleneck for data analytics. To address this, many vectorized execution engines have been created, such as Photon, ClickHouse, Apache Doris, Intel Gazelle, and Facebook's Velox.

Using Velox with Presto

Velox is a unified execution engine written in C++ that can be integrated with many compute engines. Within Facebook, Velox is integrated with Presto (project name Prestissimo, open‑source), Spark (project Spruce, not open‑source), and other systems. Because Presto is Java‑based and Velox is C++‑based, direct calls are impossible; Facebook created the Prestissimo project to provide a C++ implementation of Presto's HTTP REST interface, handling worker‑worker serialization, coordinator‑worker orchestration, and status endpoints. Prestissimo receives a Presto plan fragment from the Java coordinator, converts it to a Velox plan, and executes it.

Code Download and Compilation

Note: The steps below are demonstrated on an Apple M1 Pro running macOS Monterey; other platforms may differ.

Download the Presto source code:

git clone https://github.com/prestodb/presto.git
cd presto
# Build the Java modules, skipping tests and using 12 build threads
./mvnw clean install -DskipTests -T12
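Before starting the long Maven and CMake builds, it can save time to confirm that the command-line tools used in this walkthrough are on the PATH. The sketch below is a hypothetical helper (check_tools is not part of Presto or Velox), and the tool list in the usage comment is an assumption drawn from the commands that appear in this article.

```shell
# check_tools: hypothetical helper, not part of Presto or Velox.
# Prints every tool that is missing and returns non-zero if any is absent.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "missing: $ct_tool" >&2
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Usage (assumed tool list; adjust to your setup):
# check_tools git java make cmake ninja ccache
```

Reporting all missing tools in one pass, rather than stopping at the first, avoids repeated trial-and-error runs.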

Sync the Velox submodule:

# Run from the root of the presto checkout
make -C presto-native-execution submodules
git submodule sync --recursive
git submodule update --init --recursive
# The output shows Velox submodule checked out at commit 2c7eea574d3d7c3d3307528b08c67a77f4636f99
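To confirm that the submodule landed on the commit mentioned above, the checked-out hash can be compared with the expected one. This is a minimal sketch: commit_matches is a hypothetical helper, and the usage comment assumes the Velox submodule lives at presto-native-execution/velox, as the later build steps suggest.

```shell
# commit_matches: hypothetical helper. Succeeds when one hash is a prefix of
# the other, so abbreviated and full hashes compare equal.
commit_matches() {
    # Empty input never matches.
    if [ -z "$1" ] || [ -z "$2" ]; then
        return 1
    fi
    case "$1" in
        "$2"*) return 0 ;;
    esac
    case "$2" in
        "$1"*) return 0 ;;
    esac
    return 1
}

# Usage (run from the presto repository root after the submodule update):
# actual=$(git -C presto-native-execution/velox rev-parse HEAD)
# commit_matches "$actual" 2c7eea574d3d7c3d3307528b08c67a77f4636f99 \
#     && echo "Velox submodule is at the expected commit"
```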

Initialize dependencies (install fizz, thrift, antlr, glog, etc.) by running the provided script:

cd presto-native-execution
sudo chown -R $(whoami) /usr/local/{bin,lib,sbin}
chmod u+w /usr/local/{bin,lib,sbin}
./scripts/setup-macos.sh

Compile Velox:

cd velox
make debug

The build creates _build/debug with the compiled libraries.

Compile Prestissimo (the C++ Presto server):

# Return from velox to the presto-native-execution directory
cd ..
make debug

The build initially fails because the Thrift headers are not found.

Fix the missing include path by adding /usr/local/include to the CMake include directories (edit presto-native-execution/CMakeLists.txt):

include_directories(SYSTEM /usr/local/include)

Re-run make debug and the compilation succeeds, producing presto_cpp/main/presto_server.
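Before editing CMakeLists.txt, it may be worth checking which prefix actually holds the Thrift headers on your machine. The sketch below uses a hypothetical helper, find_include_dir, and assumes the headers sit in a thrift subdirectory of the include directory. The second candidate in the usage comment, /opt/homebrew/include, is the default Homebrew prefix on Apple Silicon and is listed only as an assumption about where else to look.

```shell
# find_include_dir: hypothetical helper. Prints the first candidate directory
# that contains the given subdirectory or file, and fails if none does.
find_include_dir() {
    fid_name=$1
    shift
    for fid_dir in "$@"; do
        if [ -e "$fid_dir/$fid_name" ]; then
            echo "$fid_dir"
            return 0
        fi
    done
    return 1
}

# Usage:
# find_include_dir thrift /usr/local/include /opt/homebrew/include
```

If the helper prints a directory other than /usr/local/include, that directory is probably the one to pass to include_directories instead.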

Launching Java and C++ PrestoServers

There are two ways to start the C++ PrestoServer:

Manually, via an IDE such as CLion or directly from the command line.

Automatically, when launching the Java PrestoServer.

Manual Launch with CLion

Open the presto-native-execution project in CLion, then set the following CMake options:

-DTREAT_WARNINGS_AS_ERRORS=1 -DENABLE_ALL_WARNINGS=1 -DCMAKE_PREFIX_PATH="/usr/local" -DPRESTO_ENABLE_PARQUET="OFF" -GNinja -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DVELOX_BUILD_TESTING=ON -DCMAKE_BUILD_TYPE=Debug

Set the build directory to _build/debug and apply the changes.

After reloading the CMake project, create a Run/Debug configuration for the presto_server target with program arguments:

--logtostderr=1 --v=1 --etc_dir=/path/to/presto-native-execution/etc

Set the working directory to the presto-native-execution root and run the configuration. The server logs “Announcement succeeded: 202”, indicating a successful start. To run several servers on one machine, give each its own http-server.http.port value in etc/config.properties.
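One way to prepare a second server is to copy the etc directory and rewrite only the port. The sketch below is an assumption-laden example: set_http_port is a hypothetical helper, etc-worker2 and port 7778 are illustrative values, and other per-node settings may also need to differ in a real setup.

```shell
# set_http_port: hypothetical helper. Writes a copy of a config.properties file
# with http-server.http.port set to the given value (the key is appended if it
# was absent). The destination must be a different file from the source.
set_http_port() {
    shp_src=$1
    shp_dst=$2
    shp_port=$3
    # Keep every line except an existing port setting.
    grep -v '^http-server\.http\.port=' "$shp_src" > "$shp_dst" || true
    echo "http-server.http.port=$shp_port" >> "$shp_dst"
}

# Usage (from the presto-native-execution root; names and port are examples):
# cp -R etc etc-worker2
# set_http_port etc/config.properties etc-worker2/config.properties 7778
# Then start the second server with --etc_dir pointing at etc-worker2.
```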

Automatic Launch from Java

Create an IntelliJ Application Run/Debug configuration (e.g., HiveExternalWorkerQueryRunner) with:

Main class: com.facebook.presto.hive.HiveExternalWorkerQueryRunner

VM options:

-ea -Xmx5G -XX:+ExitOnOutOfMemoryError -Duser.timezone=America/Bahia_Banderas -Dhive.security=legacy

Environment variables:

PRESTO_SERVER=/path/to/presto_cpp/main/presto_server;DATA_DIR=/path/to/data;WORKER_COUNT=2

Use classpath of the presto-native-execution module.

Running this configuration starts the Java PrestoServer and spawns the specified number of C++ workers.
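A wrong PRESTO_SERVER path or a malformed WORKER_COUNT is easy to miss in an IDE dialog, so a quick pre-flight check in a terminal can help. This is a minimal sketch with a hypothetical helper, check_worker_env. It only validates the three variables named above and makes no claim about how the query runner itself interprets them.

```shell
# check_worker_env: hypothetical pre-flight check for PRESTO_SERVER, DATA_DIR
# and WORKER_COUNT. Returns non-zero and explains the first problem found.
check_worker_env() {
    if [ ! -f "${PRESTO_SERVER:-}" ] || [ ! -x "${PRESTO_SERVER:-}" ]; then
        echo "PRESTO_SERVER is not an executable file: ${PRESTO_SERVER:-<unset>}" >&2
        return 1
    fi
    if [ -z "${DATA_DIR:-}" ]; then
        echo "DATA_DIR is not set" >&2
        return 1
    fi
    case "${WORKER_COUNT:-}" in
        ''|*[!0-9]*)
            echo "WORKER_COUNT must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$WORKER_COUNT" -le 0 ]; then
        echo "WORKER_COUNT must be a positive integer" >&2
        return 1
    fi
    return 0
}

# Usage:
# export PRESTO_SERVER=/path/to/presto_cpp/main/presto_server
# export DATA_DIR=/path/to/data WORKER_COUNT=2
# check_worker_env && echo "environment looks consistent"
```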

Running SQL Queries

With both servers running, launch the Presto CLI:

presto-cli/target/presto-cli-*-executable.jar --catalog hive --schema tpch

Examples:

presto:tpch> show schemas;
presto:tpch> use tpch;
presto:tpch> show tables;
presto:tpch> select count(*) from customer;
presto:tpch> select count(*) from lineitem;
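The same checks can be run without the interactive prompt by passing a statement through the CLI's --execute option. The sketch below wraps that in a hypothetical helper, count_rows. The PRESTO_CLI variable is an assumption introduced here and must point at the executable CLI jar.

```shell
# count_rows: hypothetical wrapper around the Presto CLI. PRESTO_CLI must hold
# the path to the executable CLI jar; $1 is the table to count.
count_rows() {
    "$PRESTO_CLI" --catalog hive --schema tpch \
        --execute "select count(*) from $1"
}

# Usage, with both servers running (run from the presto repository root):
# PRESTO_CLI=$(ls presto-cli/target/presto-cli-*-executable.jar)
# for table in customer lineitem; do count_rows "$table"; done
```

Scripting the queries this way makes it easier to rerun them after each worker restart.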

Query parsing and plan generation happen in the Java server, while the resulting plan fragments are executed by the C++ server. The C++ server may crash occasionally during testing.

Observations

In practice, the C++ PrestoServer can be unstable, frequently throwing exceptions and terminating. Nevertheless, Velox shows promise as a reusable vectorized engine for Presto and other compute frameworks. Both Velox and Prestissimo are still evolving, and production‑ready stability may require more time.

To start the C++ server directly from the terminal:

/Users/iteblog/data/code/apache/presto/presto-native-execution/_build/debug/presto_cpp/main/presto_server --logtostderr=1 --v=1 --etc_dir=/Users/iteblog/data/code/apache/presto/presto-native-execution/etc

Successful startup is indicated by the log line “Announcement succeeded: 202”.
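When the server is started from a script rather than watched by hand, the startup line can be polled for in a log file. This is a minimal sketch: wait_for_announcement is a hypothetical helper, and server.log in the usage comment is an assumed file name that captures the server's stderr output.

```shell
# wait_for_announcement: hypothetical helper. Polls a log file once per second
# until the startup line appears or the timeout (in seconds, default 60) expires.
wait_for_announcement() {
    wfa_log=$1
    wfa_left=${2:-60}
    while [ "$wfa_left" -gt 0 ]; do
        if grep -q 'Announcement succeeded: 202' "$wfa_log" 2>/dev/null; then
            return 0
        fi
        sleep 1
        wfa_left=$((wfa_left - 1))
    done
    return 1
}

# Usage (from the presto-native-execution root):
# ./_build/debug/presto_cpp/main/presto_server --logtostderr=1 --v=1 \
#     --etc_dir="$PWD/etc" 2> server.log &
# wait_for_announcement server.log 120 && echo "worker is up"
```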
Written by

Past Memory Big Data

A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.