
Optimizing Database Expression Evaluation with JIT Technology Using Gandiva

The article explains how database expression evaluation, especially in WHERE and SELECT clauses, can be accelerated by replacing interpreted AST traversal with Just-In-Time compilation using Apache Gandiva, which uses LLVM to generate SIMD-optimized machine code for Arrow columnar data. It also discusses extensions such as timestamp and array support, higher-order functions, and UDF registration.

Sohu Tech Products

This article describes how JIT (Just-In-Time) compilation can speed up expression evaluation in databases.

The main content includes:

1. What the expression evaluation problem is and how databases evaluate expressions

2. Basic concepts of JIT compilation technology and why JIT is needed

3. Gandiva expression compiler - how to use Gandiva's JIT compilation technology to accelerate computation

4. Q&A

Expression Evaluation Problem: In database queries, expressions are evaluated during filtering (WHERE clauses) and projection (SELECT clauses). The traditional approach parses each expression into an Abstract Syntax Tree (AST) and interprets it by traversing the tree depth-first. This approach has three costs: frequent virtual function calls that cause CPU branch mispredictions, dynamic type checks repeated during computation, and recursive function calls that interrupt the execution flow.
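
To make these costs concrete, here is a minimal Python sketch of a tree-walking interpreter for the predicate price * qty > 100. The class names (Expr, Col, Lit, BinOp), the column names, and the sample rows are illustrative assumptions and are not taken from any particular database engine. Every row triggers a chain of recursive, dynamically dispatched eval calls plus a runtime type check.

```python
import operator
from dataclasses import dataclass

OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}


class Expr:
    """Base AST node (hypothetical); subclasses override eval, the 'virtual' call."""

    def eval(self, row):
        raise NotImplementedError


@dataclass
class Col(Expr):
    name: str

    def eval(self, row):
        return row[self.name]


@dataclass
class Lit(Expr):
    value: object

    def eval(self, row):
        return self.value


@dataclass
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr

    def eval(self, row):
        # Two recursive, dynamically dispatched calls per node, per row.
        lhs = self.left.eval(row)
        rhs = self.right.eval(row)
        # The operand types are re-checked for every row.
        for operand in (lhs, rhs):
            if not isinstance(operand, (int, float)):
                raise TypeError(f"unsupported operand: {operand!r}")
        return OPS[self.op](lhs, rhs)


# WHERE price * qty > 100
predicate = BinOp(">", BinOp("*", Col("price"), Col("qty")), Lit(100))

rows = [
    {"price": 30, "qty": 2},
    {"price": 60, "qty": 2},
    {"price": 10, "qty": 11},
]
selected = [row for row in rows if predicate.eval(row)]
```

With these three rows, the interpreter performs five eval calls and two type-check loops per row, even though the shape of the expression never changes between rows.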

Three Approaches to Database Expression Evaluation: The first approach keeps interpreted execution but adds vectorization, so that each operator processes a batch of values rather than a single row. The second compiles expressions to bytecode that a virtual machine executes. The third is JIT compilation, which converts the AST to an intermediate representation and then to machine code at runtime.
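
The first two approaches can be sketched together in a few lines of Python. This is an illustrative assumption about how such a design might look, not the implementation of any specific engine: the tuple-based tree encoding, the opcode names, and the functions compile_to_bytecode and run are all hypothetical. The tree is flattened once into a linear program, and each instruction then processes a whole column instead of one row.

```python
import operator

BINARY_OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}


def compile_to_bytecode(expr):
    """Flatten a nested-tuple expression into postfix instructions (once per query).

    Tree encoding (hypothetical): ("col", name), ("lit", value), (op, left, right).
    """
    kind = expr[0]
    if kind in ("col", "lit"):
        return [("push_" + kind, expr[1])]
    _, left, right = expr
    return compile_to_bytecode(left) + compile_to_bytecode(right) + [("apply", kind)]


def run(program, columns, num_rows):
    """Stack VM in which every instruction works on a full column of values."""
    stack = []
    for opcode, arg in program:
        if opcode == "push_col":
            stack.append(columns[arg])
        elif opcode == "push_lit":
            stack.append([arg] * num_rows)
        elif opcode == "apply":
            rhs = stack.pop()
            lhs = stack.pop()
            fn = BINARY_OPS[arg]
            # One dispatch per operator per batch, not per row.
            stack.append([fn(a, b) for a, b in zip(lhs, rhs)])
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


columns = {"price": [30, 60, 10], "qty": [2, 2, 11]}
expr = (">", ("*", ("col", "price"), ("col", "qty")), ("lit", 100))

program = compile_to_bytecode(expr)
mask = run(program, columns, num_rows=3)
```

The dispatch cost is now paid once per instruction per batch, but the VM still interprets opcodes in a loop. Removing that loop entirely is what the third approach, JIT compilation, aims for.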

Gandiva JIT Implementation: Apache Gandiva is an expression compiler built on the LLVM compiler framework and optimized for the Apache Arrow columnar memory format. The execution flow is: Expression Tree → Gandiva Expression Compiler generates LLVM IR → JIT compiler converts the IR to machine code → the machine code executes on Apache Arrow Record Batches.
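
The shape of this pipeline can be imitated in plain Python. The sketch below is only an analogy and does not use Gandiva or LLVM: generated Python source stands in for LLVM IR, Python's built-in compile stands in for the JIT backend, and a dict of lists stands in for an Arrow record batch. The functions emit and jit_compile and the name kernel are hypothetical.

```python
ALLOWED_OPS = {"+", "-", "*", ">", "<"}


def emit(expr):
    """Translate a nested-tuple expression into a Python source fragment.

    Tree encoding (hypothetical): ("col", name), ("lit", value), (op, left, right).
    The fragment reads element i of each referenced column.
    """
    kind = expr[0]
    if kind == "col":
        if not expr[1].isidentifier():
            raise ValueError(f"invalid column name: {expr[1]!r}")
        return f"{expr[1]}[i]"
    if kind == "lit":
        return repr(expr[1])
    op, left, right = expr
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"({emit(left)} {op} {emit(right)})"


def jit_compile(expr, column_names):
    """Generate and compile one function specialized for this expression.

    Column names must not be 'i' or 'num_rows', which the kernel uses itself.
    """
    params = ", ".join(column_names)
    source = (
        f"def kernel({params}, num_rows):\n"
        f"    return [{emit(expr)} for i in range(num_rows)]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["kernel"], source


expr = (">", ("*", ("col", "price"), ("col", "qty")), ("lit", 100))
kernel, source = jit_compile(expr, ["price", "qty"])

# Stand-in for a record batch: one list per column.
batch = {"price": [30, 60, 10], "qty": [2, 2, 11]}
result = kernel(batch["price"], batch["qty"], 3)
```

The generated kernel contains no tree traversal, no opcode dispatch, and no per-node function calls: the expression has been fused into a single loop body. In Gandiva the equivalent loop is native machine code, which is also what makes SIMD vectorization possible.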

Gandiva provides over 100 built-in functions, including arithmetic and boolean operators, and supports SIMD/AVX vectorization. The speaker's team extended Gandiva with timestamp support, more than 20 array-related functions, higher-order function support, and a UDF registration mechanism.
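
The article does not describe how the team's UDF registration works, so the following is only a generic sketch of the idea, assuming a registry keyed by function name and argument types. REGISTRY, register, resolve, and the type strings are hypothetical and do not correspond to Gandiva's actual API.

```python
REGISTRY = {}


def register(name, arg_types, return_type):
    """Decorator that records a function under its (name, argument types) signature."""

    def decorator(fn):
        REGISTRY[(name, tuple(arg_types))] = (return_type, fn)
        return fn

    return decorator


@register("add", ("int64", "int64"), "int64")
def add_int64(a, b):
    return a + b


# A user-defined, array-related function (hypothetical example).
@register("array_length", ("list<int64>",), "int64")
def array_length(values):
    return len(values)


def resolve(name, arg_types):
    """Look up the implementation that matches the call's argument types."""
    try:
        return REGISTRY[(name, tuple(arg_types))]
    except KeyError:
        signature = f"{name}({', '.join(arg_types)})"
        raise LookupError(f"no function registered for {signature}") from None


return_type, fn = resolve("array_length", ["list<int64>"])
lengths = [fn(values) for values in [[1, 2, 3], [], [7]]]
```

Keying on the full signature lets an expression compiler pick the right overload while it builds the expression, so no type dispatch is left for execution time.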
