Big Data 17 min read

Understanding Join Algorithms in Presto: Theory, Implementation, and Engineering Practices

The article explains Presto’s join processing by detailing the business need to limit multi‑table joins, then describing nested‑loop, sort‑merge, and hash join algorithms with Java examples, and finally showing how the Volcano model, columnar pages, and planner integration enable scalable, efficient OLAP join execution.

vivo Internet Technology

Apr 13, 2022

Understanding Join Algorithms in Presto: Theory, Implementation, and Engineering Practices

In the series "Exploring the Presto SQL Engine", this second article focuses on the principles and implementation of Join operations in Presto. It starts by describing the business need to limit multi‑table joins for performance reasons, citing Alibaba’s development guidelines that forbid joins on more than three tables and require indexed join columns.

The article then explains the fundamental semantics of Join in relational databases and introduces the Cartesian product (Cross Join) as the simplest case. A concrete Java example demonstrates how a nested loop produces the Cartesian product:

List<Tuple> r = newArrayList(
        new Tuple(newArrayList(1,"a")),
        new Tuple(newArrayList(2,"b")));
List<Tuple> s = newArrayList(
        new Tuple(newArrayList(3,"c")),
        new Tuple(newArrayList(4,"d")));
int cnt = 0;
for (Tuple ri : r) {
    for (Tuple si : s) {
        Tuple c = new Tuple().merge(ri).merge(si);
        System.out.println(++cnt + ": " + c);
    }
}
/**
 * out:
 * 1: [1, a, 3, c]
 * 2: [1, a, 4, d]
 * 3: [2, b, 3, c]
 * 4: [2, b, 4, d]
 */

From this simple nested loop, the article derives the Nested Loop Join algorithm, highlighting its O(m·n) complexity and discussing practical optimizations such as driving the join with the smaller table and using block‑nested loops to reduce I/O.

Next, the Sort‑Merge Join is presented. By sorting both inputs first, the algorithm reduces the comparison count to O(m log m + n log n + m + n), which can be further lowered to O(m + n) when indexes guarantee ordered data. Visual diagrams illustrate the two‑phase process (Sort then Merge).

The article then covers the Hash Join . It explains why, in large‑scale data warehouses, maintaining sorted data is costly, and how hashing the smaller table enables fast equality lookups with O(m + n) complexity. A Java snippet shows how to build a Page in Presto using column‑oriented storage:

import com.facebook.presto.common.Page;
import com.facebook.presto.common.PageBuilder;
import com.facebook.presto.common.block.Block;
import com.facebook.presto.common.block.BlockBuilder;
import com.facebook.presto.common.type.BigintType;
import com.facebook.presto.common.type.Type;
import com.facebook.presto.common.type.VarcharType;
import com.google.common.collect.Lists;
import io.airlift.slice.Slice;
import java.util.List;
import static io.airlift.slice.Slices.utf8Slice;

public class PageBlockDemo {
    private static Page buildPage(List<Type> types, List<Object[]> dataSet) {
        PageBuilder pageBuilder = new PageBuilder(types);
        for (Object[] row : dataSet) {
            pageBuilder.declarePosition();
            for (int column = 0; column < types.size(); column++) {
                BlockBuilder out = pageBuilder.getBlockBuilder(column);
                Object colVal = row[column];
                if (colVal == null) {
                    out.appendNull();
                } else {
                    Type type = types.get(column);
                    Class<?> javaType = type.getJavaType();
                    if (javaType == long.class) {
                        type.writeLong(out, (long) colVal);
                    } else if (javaType == Slice.class) {
                        type.writeSlice(out, utf8Slice((String) colVal));
                    } else {
                        throw new UnsupportedOperationException("not implemented");
                    }
                }
            }
        }
        Page page = pageBuilder.build();
        pageBuilder.reset();
        return page;
    }
    private static void readColumn(List<Type> types, Page page) {
        for (int column = 0; column < types.size(); column++) {
            Block block = page.getBlock(column);
            Type type = types.get(column);
            Class<?> javaType = type.getJavaType();
            System.out.print("column[" + type.getDisplayName() + "]>>");
            List<Object> colList = Lists.newArrayList();
            for (int pos = 0; pos < block.getPositionCount(); pos++) {
                if (javaType == long.class) {
                    colList.add(block.getLong(pos));
                } else if (javaType == Slice.class) {
                    colList.add(block.getSlice(pos, 0, block.getSliceLength(pos)).toStringUtf8());
                } else {
                    throw new UnsupportedOperationException("not implemented");
                }
            }
            System.out.println(colList);
        }
    }
    public static void main(String[] args) {
        List<Type> types = Lists.newArrayList(BigintType.BIGINT, VarcharType.VARCHAR);
        List<Object[]> dataSet = Lists.newArrayList(
                new Object[]{1L, "aa"},
                new Object[]{2L, "ba"},
                new Object[]{3L, "cc"},
                new Object[]{4L, "dd"});
        Page page = buildPage(types, dataSet);
        readColumn(types, page);
    }
}
// column[bigint]>>[1, 2, 3, 4]
// column[varchar]>>[aa, ba, cc, dd]

The discussion then introduces the Volcano Model , an extensible and parallel query evaluation architecture originally described by Goetz Graefe. It emphasizes operator separation (open‑next‑close iterator pattern) and dynamic pipeline assembly. Diagrams illustrate how Scan, Select, and Project operators are combined, and how Presto adapts the model with column‑oriented Pages and vectorized execution.

Subsequent sections describe the engineering prerequisites for implementing Join in Presto, such as parsing Join syntax with ANTLR, binding tables and columns, choosing the appropriate Join algorithm (Hash, Sort‑Merge, or Nested Loop), and constructing the execution pipeline (Parser → Binding → Planner → Executor). A code excerpt shows how Presto’s NestedLoopPageBuilder builds the Cartesian product of two Pages using Run‑Length‑Encoded blocks.

Finally, the article summarizes the key takeaways: understanding Join fundamentals, the Volcano execution framework, columnar storage, and the practical considerations for scaling Join operations in OLAP workloads. It points to future topics like multi‑table Join ordering, semi‑join optimizations, and handling data skew.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data SQL Hash Join Join Algorithms Volcano Model

Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.