Understanding Data Indexes, B+Tree vs Hash Indexes, Table Partitioning, and MySQL Optimization Techniques
This article explains how ordered data indexes improve query efficiency, compares B+Tree and hash indexes, discusses table partitioning versus sharding, outlines MVCC read types, examines row‑level lock pros and cons, and provides practical MySQL optimization tips including key vs index differences and engine choices.
1. Why Using Data Indexes Improves Efficiency
Index entries are kept in sorted order, so a query can locate a value without reading every index record. In the best case a lookup behaves like a binary search and needs on the order of log₂(N) comparisons, against roughly N for a full scan.
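The difference between scanning and searching sorted data can be sketched in a few lines of Python. This is only an illustration over an in-memory list, not how MySQL stores an index; the function names linear_lookup and binary_lookup are made up for the example.

```python
def linear_lookup(keys, target):
    """Check every entry in turn until the target is found; count comparisons."""
    steps = 0
    for i, k in enumerate(keys):
        steps += 1
        if k == target:
            return i, steps
    return -1, steps

def binary_lookup(sorted_keys, target):
    """Binary search over sorted keys; count how many probes are made."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return (lo if found else -1), steps

keys = list(range(0, 2_000_000, 2))          # 1,000,000 sorted keys
pos_scan, scan_steps = linear_lookup(keys, 1_999_998)
pos_bin, bin_steps = binary_lookup(keys, 1_999_998)
print(scan_steps, bin_steps)
```

For the last key in a million-entry list, the scan touches every entry while the binary search needs at most 20 probes.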
2. Differences Between B+Tree Indexes and Hash Indexes
A B+Tree is a balanced multi-way search tree in which every leaf sits at the same depth, so each lookup follows a path of the same length from the root. The leaf nodes are linked to one another by pointers, which keeps the entries in order and makes sequential and range scans cheap.
A hash index applies a hash function to the key and uses the resulting hash value to jump straight to the matching entry, with no tree to descend. The entries, however, are stored in no particular order.
3. Advantages of Hash Indexes
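For equality lookups a hash index is typically faster than a B+Tree, because it reaches the entry in roughly constant time. The advantage holds only when the key has few duplicate values: rows that share a key value land in the same bucket and have to be walked one by one, which degrades performance.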
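The effect of duplicate key values can be seen with a toy chained hash index. The sketch below is an assumption-laden model (a Python dict of lists, with the hypothetical helper build_hash_index), not MySQL's actual hash index implementation.

```python
from collections import defaultdict

def build_hash_index(values):
    """Toy chained hash index: hash(value) -> list of row ids in that bucket."""
    buckets = defaultdict(list)
    for row_id, value in enumerate(values):
        buckets[hash(value)].append(row_id)
    return buckets

unique_col = [f"user{i}" for i in range(1000)]                 # nearly every value distinct
low_cardinality_col = ["male" if i % 2 else "female" for i in range(1000)]

longest_unique = max(len(chain) for chain in build_hash_index(unique_col).values())
longest_dup = max(len(chain) for chain in build_hash_index(low_cardinality_col).values())
print(longest_unique, longest_dup)
```

With distinct values each bucket stays short, whereas the two-valued column produces chains of 500 row ids that an equality lookup has to walk.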
4. Scenarios Where Hash Indexes Are Not Suitable
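Range queries are unsupported, because hash values preserve no ordering of the keys.
Index-based sorting is unavailable for the same reason.
Composite indexes cannot use the left-most prefix rule, because the hash is computed over all indexed columns together.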
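These three limits can be illustrated by comparing a Python dict (standing in for a hash index) with a sorted list (standing in for B+Tree leaf entries). The table, column names, and helper functions are hypothetical and exist only for this sketch.

```python
import bisect

# Rows keyed by a composite (last_name, first_name) index.
rows = [("li", "an"), ("li", "bo"), ("wang", "cy"), ("zhao", "di")]

# Hash-style index: the whole composite key is hashed, so ordering is lost.
hash_index = {key: row_id for row_id, key in enumerate(rows)}

# Ordered index: sorted (key, row_id) pairs.
ordered_index = sorted((key, row_id) for row_id, key in enumerate(rows))

def hash_equal(last, first):
    """Equality on the full key: a single direct lookup."""
    return hash_index.get((last, first))

def hash_range(lo_last, hi_last):
    """No ordering is available, so every entry has to be examined."""
    return sorted(rid for key, rid in hash_index.items() if lo_last <= key[0] <= hi_last)

def ordered_range(lo_last, hi_last):
    """Seek to the first matching entry, then read forward until past the range."""
    start = bisect.bisect_left(ordered_index, ((lo_last,),))
    result = []
    for key, row_id in ordered_index[start:]:
        if key[0] > hi_last:
            break
        result.append(row_id)
    return result

def ordered_prefix(last):
    """Left-most prefix lookup: a range whose two bounds are the same first column."""
    return ordered_range(last, last)

print(hash_equal("wang", "cy"), ordered_prefix("li"), hash_index.get(("li",)))
```

The full key is found directly through the dict, but a lookup on the prefix ("li",) alone returns nothing, while the sorted structure answers both the prefix and the range query with a seek followed by a short forward scan.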
5. What Is Table Partitioning?
Table partitioning splits a logical table into multiple physical partitions based on defined rules, while appearing as a single table to the application.
6. Difference Between Partitioning and Sharding
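Partitioning keeps a single logical table that the database splits into multiple physical parts; the application still queries one table name. Sharding creates multiple independent tables (e.g., splitting order data into one table per month), and the application has to choose the right table for each query.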
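From the application's side the difference shows up in which table name appears in the SQL. A minimal sketch, assuming a hypothetical orders table and a made-up orders_YYYY_MM naming scheme for the sharded variant:

```python
def shard_table_for(created_at):
    """Sharding: the application derives the physical table name itself."""
    year, month = created_at[:4], created_at[5:7]
    return f"orders_{year}_{month}"

def build_query(created_at, partitioned):
    # With partitioning the SQL always names the one logical table;
    # with sharding the table name changes with the data being queried.
    table = "orders" if partitioned else shard_table_for(created_at)
    return f"SELECT * FROM {table} WHERE created_at = ?"

print(build_query("2024-03-15", partitioned=True))
print(build_query("2024-03-15", partitioned=False))
```

In the partitioned case the routing to a physical part is left to the database; in the sharded case that routing logic lives in application code.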
7. Benefits of Table Partitioning
Allows storage of larger data sets across multiple devices.
Improves query efficiency by scanning only relevant partitions; aggregation can be parallelized.
Facilitates maintenance tasks such as bulk deletion of entire partitions.
Helps avoid specific bottlenecks (e.g., InnoDB index mutex, ext3 inode lock contention).
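Two of these benefits, scanning only the relevant partition and dropping a whole partition at once, can be modelled with a toy in-memory table. The class PartitionedOrders and its monthly partitioning rule are assumptions made for this sketch; it does not reproduce MySQL's partitioning engine.

```python
from collections import defaultdict

class PartitionedOrders:
    """Toy range-partitioned table: one physical bucket per 'YYYY-MM' month."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, order_id, created_at):
        # The partitioning rule: route each row by the month of created_at.
        self.partitions[created_at[:7]].append((order_id, created_at))

    def select_month(self, month):
        """Partition pruning: only the matching partition is read."""
        part = self.partitions.get(month, [])
        return list(part), len(part)          # rows, number of rows scanned

    def drop_partition(self, month):
        """Bulk deletion: discard a whole partition without touching other rows."""
        return len(self.partitions.pop(month, []))

orders = PartitionedOrders()
for i, day in enumerate(["2024-01-05", "2024-01-20", "2024-02-03", "2024-03-11"]):
    orders.insert(i, day)

rows, scanned = orders.select_month("2024-02")
dropped = orders.drop_partition("2024-01")
print(rows, scanned, dropped)
```

The February query reads one row out of four, and removing January is a single operation on its partition rather than a row-by-row delete.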
8. MVCC Read Types
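Snapshot read: reads a version of the record that is visible to the transaction (possibly a historical version) without acquiring any lock. In InnoDB, a plain SELECT is a snapshot read.
Current read: reads the latest version of the record and acquires a lock so that other transactions cannot modify it concurrently. SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, DELETE, and INSERT are current reads.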
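The two read types can be contrasted with a heavily simplified version chain. The sketch below assumes that visibility is decided by comparing transaction ids, which is a simplification of InnoDB's read-view mechanism; the class VersionedRow and its methods are hypothetical.

```python
class VersionedRow:
    """Toy row with a version chain, oldest version first."""

    def __init__(self, value, txn_id):
        self.versions = [(txn_id, value)]     # (id of committing txn, value)
        self.locked_by = None

    def write(self, value, txn_id):
        self.versions.append((txn_id, value))

    def snapshot_read(self, snapshot_txn_id):
        """Newest version committed at or before the snapshot; takes no lock."""
        for txn_id, value in reversed(self.versions):
            if txn_id <= snapshot_txn_id:
                return value
        return None

    def current_read(self, reader_txn_id):
        """Latest version, and the row is locked for the reader."""
        if self.locked_by not in (None, reader_txn_id):
            raise RuntimeError("row is locked by another transaction")
        self.locked_by = reader_txn_id
        return self.versions[-1][1]

row = VersionedRow("v1", txn_id=1)
row.write("v2", txn_id=5)

old_view = row.snapshot_read(snapshot_txn_id=3)   # snapshot taken before txn 5
latest = row.current_read(reader_txn_id=7)        # locks the row for txn 7
print(old_view, latest, row.locked_by)
```

The snapshot read still sees the older value and leaves the row unlocked, while the current read returns the newest value and blocks a second transaction that attempts its own current read.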
9. Pros and Cons of Row‑Level Locking
Pros:
Fewer lock conflicts when many threads access different rows.
A rollback has fewer changes to undo.
A single row can be locked for a long time without blocking the rest of the table.
Cons:
Consumes more memory than page‑ or table‑level locks.
When used on most of a table, it can be slower due to the overhead of acquiring many locks.
May degrade performance for large‑scale GROUP BY or full‑table scans.
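The main trade-off, fewer conflicts in exchange for more lock bookkeeping, can be shown with a toy lock manager. LockManager and its granularity switch are invented for this sketch and ignore lock modes, waiting, and deadlock handling.

```python
class LockManager:
    """Toy exclusive-lock table with either row or table granularity."""

    def __init__(self, granularity):
        self.granularity = granularity        # "row" or "table"
        self.held = {}                        # lock key -> owning transaction

    def _key(self, table, row_id):
        return (table, row_id) if self.granularity == "row" else (table,)

    def acquire(self, txn, table, row_id):
        """Return True if txn now holds the lock, False if another txn owns it."""
        owner = self.held.setdefault(self._key(table, row_id), txn)
        return owner == txn

row_locks = LockManager("row")
table_locks = LockManager("table")

# Two transactions update different rows of the same table.
row_results = [row_locks.acquire("t1", "orders", 1), row_locks.acquire("t2", "orders", 2)]
table_results = [table_locks.acquire("t1", "orders", 1), table_locks.acquire("t2", "orders", 2)]
print(row_results, len(row_locks.held))
print(table_results, len(table_locks.held))
```

With row granularity both transactions proceed but two lock entries are kept; with table granularity only one entry exists and the second transaction is blocked. A statement that touches most of a table would create one entry per row under row locking, which is where the memory and acquisition overhead comes from.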
10. MySQL Optimization Tips
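On MySQL 5.7 and earlier, the query cache can speed up identical repeated queries. It was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so on current versions repeated results have to be cached outside the server, for example in the application.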
Use EXPLAIN to analyze query plans and index usage.
Apply LIMIT 1 when only a single row is needed to stop scanning early.
Create indexes on searchable fields.
Prefer ENUM over VARCHAR for columns that hold a small, fixed set of values.
Use prepared statements to improve performance and protect against SQL injection.
Consider vertical partitioning (splitting a wide table by columns) and choose the storage engine that fits the workload.
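Several of these tips (an index on the searched column, LIMIT 1, a parameterized statement, and inspecting the plan) can be tried in a few lines. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; MySQL's EXPLAIN output and placeholder syntax differ, and the users table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO users (email, status) VALUES (?, ?)",
    [("a@example.com", "active"), ("b@example.com", "banned")],
)
# Index the column that queries search on.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

def find_user(email):
    # The ? placeholder binds the value as data, so it is never parsed as SQL.
    # LIMIT 1 lets the engine stop after the first matching row.
    cur = conn.execute(
        "SELECT id, status FROM users WHERE email = ? LIMIT 1", (email,)
    )
    return cur.fetchone()

hit = find_user("a@example.com")
injected = find_user("' OR '1'='1")     # treated as a literal string, matches nothing

# SQLite's counterpart to EXPLAIN: shows how the query would be executed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?", ("a@example.com",)
).fetchall()
print(hit, injected)
for step in plan:
    print(step)
```

The injection attempt returns no row because the bound value is compared as a plain string. The plan rows are worth reading to confirm whether the index on email is actually used.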
11. Difference Between Key and Index
In MySQL's DDL syntax, KEY and INDEX are synonyms. Conceptually they differ: a key is a constraint on the data (primary, unique, or foreign) that is backed by an index, so it serves both data integrity and lookup. A plain index is only a physical access structure that exists to accelerate queries (e.g., prefix or full-text indexes) and imposes no constraint by itself.
12. MyISAM vs InnoDB
InnoDB supports transactions and foreign keys; MyISAM does not.
InnoDB uses a clustered index: the row data is stored in the leaf nodes of the primary-key index. MyISAM stores data and indexes in separate files.
InnoDB does not keep an exact row count, so SELECT COUNT(*) without a WHERE clause has to scan an index; MyISAM stores the count and returns it immediately.
MyISAM has long supported full-text indexes; InnoDB has supported them only since MySQL 5.6.
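The storage difference between the two engines can be pictured with plain dictionaries and a list. This is a conceptual model only (real engines use B+Tree pages and files on disk), and all names in it are made up for the illustration.

```python
# InnoDB-style layout: the primary-key index holds the full rows (clustered),
# and a secondary index maps its key to the primary-key value.
clustered_pk = {
    1: {"id": 1, "email": "a@example.com"},
    2: {"id": 2, "email": "b@example.com"},
}
secondary_email = {"a@example.com": 1, "b@example.com": 2}

def innodb_find_by_email(email):
    pk = secondary_email[email]        # first lookup: secondary index -> primary key
    return clustered_pk[pk]            # second lookup: clustered index -> row

# MyISAM-style layout: rows live in a separate data file, and every index
# stores a position in that file.
data_file = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
myisam_email_index = {"a@example.com": 0, "b@example.com": 1}

def myisam_find_by_email(email):
    offset = myisam_email_index[email]  # index lookup -> row position
    return data_file[offset]            # fetch the row from the data file

print(innodb_find_by_email("b@example.com"))
print(myisam_find_by_email("b@example.com"))
```

Both lookups return the same row, but in the clustered model a secondary-index search passes through the primary key, whereas in the separate-file model every index points directly at a row position.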
13. Database Table Creation Considerations
Field Naming and Configuration
Avoid unrelated fields; use consistent, meaningful naming conventions.
Avoid abbreviations and mixed‑case names; prefer snake_case.
Do not use reserved words as field names.
Keep field names and data types consistent: a field that appears in several tables should use the same name and type in each.
Choose numeric types carefully and allocate sufficient length for text fields.
Special System Fields and Post‑Creation Advice
Add deletion markers (e.g., deleted_by, deleted_at).
Implement versioning mechanisms.
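A deletion marker and a version column might be used together as sketched below. The example runs on Python's built-in sqlite3 module as a stand-in for MySQL; the article table, its columns, and the helper functions are assumptions for illustration, and the version column is used here for optimistic concurrency, which is only one possible reading of "versioning".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE article (
        id         INTEGER PRIMARY KEY,
        title      TEXT    NOT NULL DEFAULT '',
        version    INTEGER NOT NULL DEFAULT 0,
        deleted_at TEXT,            -- NULL means the row is live
        deleted_by TEXT
    )
""")
conn.execute("INSERT INTO article (id, title) VALUES (1, 'draft')")

def update_title(article_id, new_title, expected_version):
    """Optimistic update: succeeds only if the row is live and unchanged."""
    cur = conn.execute(
        "UPDATE article SET title = ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND deleted_at IS NULL",
        (new_title, article_id, expected_version),
    )
    return cur.rowcount == 1

def soft_delete(article_id, user):
    """Mark the row as deleted instead of removing it."""
    conn.execute(
        "UPDATE article SET deleted_at = datetime('now'), deleted_by = ? WHERE id = ?",
        (user, article_id),
    )

first = update_title(1, "first edit", expected_version=0)
stale = update_title(1, "stale edit", expected_version=0)   # version is now 1
soft_delete(1, "admin")
after_delete = update_title(1, "late edit", expected_version=1)
print(first, stale, after_delete)
```

The second update fails because it carries an outdated version number, and the third fails because the row is marked deleted even though it is still physically present.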
Table Structure Rationality
Handle multi‑value fields by normalizing into separate tables.
Separate large data fields (e.g., long text) into auxiliary tables.
Other Recommendations
Store large data fields in separate tables to improve performance.
Prefer VARCHAR over CHAR for variable‑length strings.
Define primary keys; avoid NULLable fields by setting sensible defaults.
Build indexes on unique and non-null columns, but limit the number of indexes: each additional index has to be maintained on every insert and update.
IT Xianyu