Master SQL SELECT: 10 Steps to Understand Execution Order and Joins
This comprehensive guide explains how SQL's declarative nature, the mismatch between syntax and execution order, table references, various join types, derived tables, GROUP BY behavior, and key keywords like DISTINCT, UNION, ORDER BY, and OFFSET work together to help readers master SELECT statements.
This article, originally saved years ago, revisits fundamental concepts of the SQL SELECT statement, targeting readers who use SQL at work but lack deep understanding, those proficient in SQL yet unfamiliar with its logical execution, and instructors teaching SQL.
People who use SQL at work but do not fully understand it.
People who can use SQL fluently but do not grasp its logical syntax.
People who want to teach SQL to others.
1. SQL is a declarative language
Remember the concept of "declaration": SQL tells the database what result you want, not how to obtain it. The execution engine determines the steps to retrieve the data.
SELECT first_name, last_name FROM employees WHERE salary > 100000We care only about employees whose salary exceeds 100 000, not where the records come from.
The difficulty many feel comes from thinking in an imperative way—telling the computer step‑by‑step what to do.
2. SQL syntax order differs from execution order
The order in which SQL keywords appear is not the order in which they are executed.
<code>SELECT [DISTINCT]
FROM
WHERE
GROUP BY
HAVING
UNION
ORDER BY</code>Execution proceeds as:
<code>FROM
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
UNION
ORDER BY</code>Key points:
FROM is the first step; the engine loads data from disk into a buffer.
SELECT is evaluated after most other clauses, which explains why aliases defined in SELECT cannot be used in WHERE.
Example illustrating alias misuse:
<code>SELECT A.x + A.y AS z FROM A WHERE z = 10
-- z cannot be used here because SELECT is evaluated last</code>To reuse an alias you must repeat the expression or use a derived table, CTE, or view.
3. The core of SQL is table references (FROM clause)
According to the SQL standard:
<code><from clause> ::= FROM <table reference> [ { , <table reference> } ... ]</code>The FROM clause produces a combined table (Cartesian product) of all referenced tables.
<code>FROM a, b</code>If table a has 3 columns and b has 5, the result has 8 columns. The Cartesian product of rows yields 3 × 5 = 15 rows when a has 3 rows and b has 5 rows.
4. Flexible table references make SQL powerful
JOIN is a special form of table reference, not part of SELECT.
<code><table reference> ::= <table name> | <derived table> | <joined table></code>Example:
<code>FROM a1 JOIN a2 ON a1.id = a2.id, b</code>This produces a combined table with columns from a1, a2, and b.
5. Prefer explicit JOINs over commas
Using commas to join tables can lead to confusing, hard‑to‑read queries and errors. Explicit JOINs keep related tables close together and make the intent clear.
<code>FROM a, b, c, d, e, f, g, h
WHERE a.a1 = b.bx
AND a.a2 = c.c1
AND d.d1 = b.bc</code>JOIN also distinguishes inner, left, right, and full outer joins.
6. Different join operations
SQL defines five basic join types:
<code>EQUI JOIN
SEMI JOIN
ANTI JOIN
CROSS JOIN
DIVISION</code>EQUI JOIN includes:
INNER JOIN (or simply JOIN)
OUTER JOIN (LEFT, RIGHT, FULL)
SEMI JOIN returns rows from the left side that have matching rows on the right side, often expressed with IN or EXISTS:
<code>FROM author
WHERE author.id IN (SELECT book.author_id FROM book)</code>ANTI JOIN is the opposite, using NOT IN or NOT EXISTS.
<code>FROM author
WHERE author.id NOT IN (SELECT book.author_id FROM book)</code>CROSS JOIN produces the Cartesian product of two tables.
<code>FROM author CROSS JOIN book</code>DIVISION is the inverse of JOIN and is rarely used in everyday queries.
7. Derived tables act like variables
A derived table is a subquery placed in the FROM clause:
<code>FROM (SELECT * FROM author) a</code>They can be given an alias and reused, similar to a temporary view.
<code>WITH a AS (
SELECT first_name, last_name, CURRENT_DATE - date_of_birth AS age
FROM author
)
SELECT * FROM a WHERE age > 10000;</code>8. GROUP BY operates on table references
Applying GROUP BY to a FROM result creates a new table reference that aggregates rows.
<code>SELECT A.x, A.y, SUM(A.z)
FROM A
GROUP BY A.x, A.y</code>Columns not included in GROUP BY must be aggregated. MySQL relaxes this rule, which can be confusing.
10. Simple keywords: DISTINCT, UNION, ORDER BY, OFFSET
Set operations:
DISTINCT – removes duplicate rows after projection.
UNION – concatenates two subqueries and removes duplicates.
UNION ALL – concatenates without removing duplicates.
EXCEPT – removes rows of the second query from the first.
INTERSECT – keeps only rows present in both queries.
Ordering operations:
ORDER BY is the final step of a query and determines the output order. OFFSET…FETCH (or LIMIT…OFFSET, TOP…START AT) controls pagination across different database systems.
Source: Translated from the original Chinese article originally published on the "Data and Cloud" public account.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.