Databases 9 min read

Mastering SQL Performance: Indexes, Hashing, and Parallel Access

This article reviews advanced SQL optimization techniques—including using indexes to avoid sorting, reasons an optimizer may skip indexes, hash‑based access methods, and parallel query execution—while summarizing a four‑part series on DBMS‑agnostic performance tuning.

360 Zhihui Cloud Developer

Jul 16, 2019

Mastering SQL Performance: Indexes, Hashing, and Parallel Access

We briefly overview additional aspects of SQL optimization not covered in earlier articles, summarizing the four‑part series on SQL performance.

1. Use Indexes to Avoid Sorting

Sorting can be costly; DBMS may need to sort for certain queries. By creating indexes on columns used for sorting, the optimizer can use them to avoid sorting. Sorting may be required by clauses such as DISTINCT, UNION, GROUP BY, and ORDER BY.

DISTINCT requires ordered results to eliminate duplicates.

UNION (and INTERSECT, EXCEPT) requires sorting to remove duplicates.

GROUP BY needs ordering for aggregation.

ORDER BY explicitly requests sorted output.

Example query:

SELECT last_name, first_name, middle_initial, empno, position
FROM employee
WHERE position in ('MANAGER','DIRECTOR','VICE PRESIDENT')
ORDER BY last_name;

If an index exists on last_name, the query can use it to avoid sorting, reducing CPU and I/O costs. Whether using an index is faster depends on row count, sort speed, and index characteristics.

When using UNION, consider UNION ALL to skip duplicate elimination if duplicates are not a concern.

2. Why the Optimizer Might Not Choose an Index

Various reasons can cause the optimizer to avoid an index, such as lack of search predicates, joining many tables, outdated statistics, very small tables, stored procedures with cached plans, or missing suitable predicates.

Some tables may legitimately have no indexes, e.g., when every row is always scanned or the table is tiny with no primary key.

3. Hash Access

Some DBMS support hash access, where a hash function maps a key to a storage location, allowing direct retrieval with typically one I/O. Hashing is useful for small data sets and random I/O but is uncommon in relational systems.

4. Parallel Access

The optimizer can choose parallel execution, employing I/O parallelism, CPU parallelism, or system parallelism to run multiple concurrent tasks, improving performance for I/O‑bound or CPU‑bound queries. Parallelism is not a cure‑all; the optimizer also prepares fallback single‑stream plans.

Summary

This series provides a DBMS‑agnostic introduction to SQL performance and optimization, emphasizing the importance of proper indexes, access methods, and parallel execution to achieve efficient query plans.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization Indexes Hashing Parallel Query

Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.