Databases 7 min read

Cross‑Database JOIN and Distributed Query Planning in DBLE Middleware

This tutorial explains how DBLE splits tables using different rules, demonstrates cross‑database INNER, LEFT, and RIGHT JOINs with jump‑hash sharding, and details the distributed query plan and sorting optimization that reduces join complexity in a middleware environment.

Aikesheng Open Source Community

Dec 9, 2019

Cross‑Database JOIN and Distributed Query Planning in DBLE Middleware

In the previous episode we introduced complex query capabilities; this episode continues by explaining tables that use different sharding rules.

We have four nodes that are split by modulo; the table tb_mod contains five rows. If values 6 and 7 existed, they would be distributed as shown. After applying the jump‑hash algorithm, rows 1, 2, 3 reside on dn2 and rows 4, 6 on dn1.

When we need to JOIN tb_mod with the jump‑hash table, sending the JOIN directly would be incorrect because only dn2's row 1 and dn1's row 4 can match; rows 2 and 3 would be dropped. This leads us to explore cross‑database JOINs.

Cross‑Database JOIN

We demonstrate three examples: INNER JOIN, LEFT JOIN, and RIGHT JOIN. Since the id and code fields are identical in the jump‑hash table, we use id for the join condition.

Executing the queries returns rows 1, 2, 3, and 4. The LEFT JOIN behaves as expected, returning left‑side rows even when the right side has no match; the RIGHT JOIN works similarly. Adding EXPLAIN shows the distributed query plan.

Distributed Query Plan

To keep the terminal output readable, we simplify the result by selecting only necessary columns and shorten the horizontal width. Adding \G rotates the output for easier viewing.

The plan initially shows 12 rows; the first four rows are identical except for the datanode column, indicating that the query was dispatched to the four nodes. After the nodes return results, the middleware merges and sorts them. Sorting is explained later.

Subsequently, the second table (jump‑hash) is queried on two nodes, and their results are also merged. The middleware distinguishes nodes using suffixes (e.g., dn2_1, dn2_2). After a shuffle phase, the middleware performs the actual JOIN by merging the shuffled result sets. The third shuffle is merely a forwarding step, after which the final JOIN result is produced.

This example shows that the JOIN is performed entirely within the DBLE middleware: each node collects and sorts its data, then the middleware merges and joins the datasets.

Sorting Implementation

The reason for sorting becomes clear: without sorting, the middleware would need to perform a Cartesian product between the left and right tables. By sorting each side, the join can be executed as a linear merge, comparing only the leading rows. If the leading rows do not match, the left row is discarded, dramatically reducing the number of comparisons.

This optimization reduces the time complexity from O(N × M) (Cartesian product) to O(N + M), at the cost of additional sorting work on the MySQL side. The goal is to push as much computation as possible to the underlying databases, keeping the middleware lightweight.

That concludes today's introduction.

DBLE and related project code repositories:

https://github.com/actiontech/dble

https://github.com/actiontech/dble-docs-cn

https://github.com/actiontech/dble-test-suite

Course inquiries:

WeChat public account: ActiontechOSS

Official technical group (ID: 669663113)

Click “Read the original article” to get the latest lessons in the “DBLE Quick Start” series!

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

sql Query Planning DBLE jump hash distributed join

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.