Databases 9 min read

How CDW PG Optimizer Finds Optimal Join Order for Multi-Table Queries

CDW PG’s optimizer determines the most efficient join order for multi‑table OLAP queries by combining bottom‑up dynamic programming for smaller joins with a genetic algorithm for larger ones, while jointly selecting scan paths, join algorithms, and data‑distribution strategies to minimize execution cost.

Tencent Cloud Developer
Tencent Cloud Developer
Tencent Cloud Developer
How CDW PG Optimizer Finds Optimal Join Order for Multi-Table Queries

This article introduces how CDW PG, Tencent's self-developed next-generation distributed database, uses MPP cluster architecture to achieve industry-leading data analysis query processing capabilities for PB-level massive data OLAP scenarios.

In OLAP scenarios, multi-table join queries are one of the main query types. CDW PG supports various join types including left join, right join, inner join, and full join. For left join and right join, the join order is fixed, so there are relatively fewer path options. However, for inner join, the join order can be swapped, and different join orders may produce huge performance differences.

The article explores how CDW PG's optimizer finds the optimal join order path when the number of tables in join queries keeps increasing, thereby generating an efficient query plan.

In database optimization, the optimizer faces the problem of selecting a good scan path from all possibilities. For single-table queries, we usually only need to choose the scan path with the smaller cost. However, in multi-table join scenarios, besides determining the scan path for each table, we also need to determine the join algorithm and join order used.

CDW PG supports three table join algorithms: Hashjoin, Mergejoin, and Nestloop. Typically, the scan path, join algorithm, and join order influence each other. The article provides examples of three-table joins to illustrate different possible plans.

CDW PG's optimizer uses two methods: bottom-up dynamic programming and random genetic algorithms, depending on the number of joined tables. The dynamic programming algorithm first uses subproblem solutions repeatedly to reduce calculations and lower problem complexity, then constructs the final problem's optimal solution from subproblem optimal solutions.

The genetic algorithm implementation includes initializing populations, selecting chromosomes, crossover and mutation operations, and eliminating chromosomes. In CDW PG, users can set GUC parameters enable_geqo to choose whether to enable genetic algorithms, and set geqo_threshold to choose when to use genetic algorithms based on the number of joined tables.

CDW PG adopts MPP architecture with two types of data distribution: Shard distribution and Replication distribution. The article explains how different distribution types affect join query performance and provides suggestions for data distribution selection to improve join query performance.

Query OptimizationDistributed Databasegenetic algorithmdynamic programmingCDW PGJoin OptimizationMPP architecture
Tencent Cloud Developer
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.