
Eight Major Database Optimization Strategies for Backend Engineers

This article outlines eight comprehensive database optimization approaches—reducing data volume, using space for performance, selecting appropriate storage systems, and related techniques such as data archiving, intermediate tables, serialization, sharding, distributed caching, read‑write splitting, and CQRS—to help backend engineers systematically address common performance bottlenecks.

Code Ape Tech Column

Hello everyone, I am Chen.

For backend engineers, database performance problems are often the first major headache they encounter. A systematic methodology makes it possible to quickly and accurately choose a suitable optimization for 80–90% of performance issues.

Why is the database slow?

Diagram: the nature of slowness breaks down into search time complexity (driven by the search algorithm and the storage data structure), data volume (addressed by data partitioning), and high load (busy CPU and disk).

Regardless of relational or NoSQL databases, three main factors affect query performance: search time complexity, total data volume, and high load.


The search time complexity is determined by two aspects: the search algorithm and the storage data structure.


For relational databases, the index is typically a B+Tree with O(log n) search over row-oriented storage; since the algorithm and data structure are fixed, optimization usually focuses on reducing data volume.
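As a back-of-envelope illustration of why B+Tree search stays cheap, tree height grows only logarithmically with row count. The fanout of roughly 1,000 keys per node below is an assumed, illustrative figure, not a MySQL constant:

```python
# Rough B+Tree height estimate: each level multiplies addressable rows
# by the fanout, so height grows with log base `fanout` of the row count.
def btree_height(rows: int, fanout: int = 1000) -> int:
    height = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        height += 1
    return height

print(btree_height(10**6))  # 2 levels for a million rows
print(btree_height(10**9))  # 3 levels for a billion rows
```

Even a billion rows needs only a handful of page reads per lookup, which is why reducing total data volume matters more than changing the algorithm.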

High load stems from high concurrency, complex queries, and insufficient server resources, often mitigated by clustering or data redundancy.

At which layer should we think about optimization?

The stack consists of four layers, from top to bottom: hardware, storage system, storage structure, and concrete implementation. Optimizing the lower layers (e.g., the concrete implementation) is cheap and yields immediate gains, while changes higher up the stack cost more and deliver less per unit of effort.

Therefore, prioritize concrete implementation (e.g., adding indexes). If no room remains, move up to storage structure (e.g., sharding, compression) and then to storage system (e.g., switching to NoSQL).

This article focuses on the middle two layers: storage structure and storage system.

Eight Solution Summary

The core of database optimization can be categorized into three principles: reduce data volume, use space to improve performance, and choose the right storage system. These correspond, respectively, to the three root causes: data volume, high load, and search time complexity.

Solutions can bring short‑term benefits with low implementation cost but may incur technical debt, or long‑term benefits with higher upfront cost but better scalability.

Static data changes infrequently and needs only simple queries; dynamic data changes frequently and requires more complex filtering.

Reduce Data Volume

Four typical approaches: data serialization storage, data archiving, intermediate/result tables, and sharding (database‑and‑table partitioning).

Data Archiving

Move infrequently accessed data to historical tables; use OPTIMIZE TABLE in MySQL if space needs to be reclaimed, noting that it locks the table.
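A minimal archiving job can be sketched as a copy-then-delete inside one transaction. The `orders`/`orders_history` schema and the cutoff column are hypothetical; sqlite3 stands in here for a real database connection:

```python
import sqlite3

def archive_old_rows(conn, cutoff):
    """Move rows older than `cutoff` from the hot table to the history table."""
    cur = conn.cursor()
    # Copy cold rows first, then delete them, all in one transaction.
    cur.execute(
        "INSERT INTO orders_history SELECT * FROM orders WHERE created_at < ?",
        (cutoff,),
    )
    cur.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount  # rows archived

# Illustrative setup with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")
conn.execute("CREATE TABLE orders_history (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2021-01-01"), (3, "2025-01-01")],
)
moved = archive_old_rows(conn, "2024-01-01")
print(moved)  # 2
```

A production job would also batch the deletes to keep transactions short and avoid long lock holds.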

Intermediate (Result) Tables

Run batch jobs to materialize complex query results into a physical table, dramatically compressing data for reporting purposes.
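A batch job of this kind might look like the following sketch, where a daily sales summary is materialized so that reports hit the small result table instead of the raw rows. Table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-06-01", 10.0), ("2024-06-01", 5.0), ("2024-06-02", 7.5)],
)
conn.execute("CREATE TABLE sales_daily (day TEXT PRIMARY KEY, total REAL)")

def refresh_summary(conn):
    # Rebuild the intermediate table from scratch; a real job would run
    # on a schedule and often refresh incrementally.
    conn.execute("DELETE FROM sales_daily")
    conn.execute(
        "INSERT INTO sales_daily SELECT day, SUM(amount) FROM sales GROUP BY day"
    )
    conn.commit()

refresh_summary(conn)
rows = conn.execute("SELECT day, total FROM sales_daily ORDER BY day").fetchall()
print(rows)  # [('2024-06-01', 15.0), ('2024-06-02', 7.5)]
```

Reports now scan one row per day rather than every sale, which is where the "compression" comes from.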

Data Serialization Storage

Store non‑relational data as serialized blobs (e.g., JSON) to keep the row count low; suitable for data that does not require field‑level queries.
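As a sketch of this pattern, a user's preferences can be packed into a single JSON column instead of one row per attribute. The schema and field names are hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY, data TEXT)")

# One row holds the whole profile as a serialized blob.
profile = {"theme": "dark", "language": "en", "notifications": {"email": True}}
conn.execute("INSERT INTO user_profile VALUES (?, ?)", (42, json.dumps(profile)))

# Reads deserialize in application code; no field-level SQL queries needed.
row = conn.execute(
    "SELECT data FROM user_profile WHERE user_id = ?", (42,)
).fetchone()
loaded = json.loads(row[0])
print(loaded["theme"])  # dark
```

The trade-off is explicit: row count stays low, but you give up the ability to filter or index on individual fields inside the blob.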

Sharding (Database & Table Partitioning)

Classic but costly approach; recommended only when other methods fail. Prefer NoSQL alternatives when possible.

Treat sharding as a last resort.

Prefer NoSQL for scalability and performance.

Use table sharding for large data volume, database sharding for high concurrency.

Plan for future scaling; avoid over‑engineering.

Splitting Methods

Two main split types: vertical (business‑oriented) and horizontal (technical‑oriented).

Vertical splitting reduces business coupling and can lower row size, improving I/O.

Horizontal splitting creates identical tables distributed by a sharding key, solving data‑volume issues.

Routing Methods

After horizontal splitting, queries must use the sharding key to locate the correct physical table. Common routing strategies are range, hash, and shard‑mapping tables.

Range: split by intervals (e.g., by month); easy to manage but prone to data skew.

Hash: uniform distribution via modulo; fast, but queries without the sharding key must scan every shard.

Shard-mapping table: an auxiliary table maps alternative query fields to the sharding key, enabling indirect lookups.
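The range and hash strategies can be sketched in a few lines. Here `user_id` is the assumed sharding key and the table name patterns are illustrative:

```python
NUM_SHARDS = 4

def route_by_hash(user_id: int) -> str:
    # Modulo hash spreads rows uniformly, but a query that lacks
    # user_id has no way to pick a shard and must fan out to all of them.
    return f"orders_{user_id % NUM_SHARDS}"

def route_by_range(created_month: str) -> str:
    # One table per month: trivial to manage and to archive,
    # but hot months concentrate load (data skew).
    return f"orders_{created_month}"

print(route_by_hash(12345))        # orders_1
print(route_by_range("2024_06"))   # orders_2024_06
```

A shard-mapping table adds one indirection on top of this: look up the sharding key by the alternative field first, then route as above.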

Use Space to Improve Performance

Two typical solutions for high‑load scenarios: distributed caching and read‑write splitting (one‑master‑many‑slaves).

Distributed Caching

Caching layers include client cache, API service local cache, and distributed cache. The latter typically uses NoSQL key‑value stores such as Memcached or Redis.

Common cache strategies: Cache‑Aside, Read/Write‑Through, Write‑Back. Cache‑Aside is most widely used.

Key cautions: avoid overusing the cache, and watch for cache penetration (repeated queries for data that does not exist) and cache breakdown (a burst of concurrent requests when a hot key expires). Caching empty results with a short TTL mitigates penetration; serializing the rebuild of a hot key (e.g., with a mutex or logical expiration) mitigates breakdown.
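A minimal Cache-Aside sketch, with an in-process dict standing in for Redis and a sentinel value caching "not found" to blunt penetration (all names here are illustrative):

```python
import time

fake_db = {"user:1": {"name": "Chen"}}  # stands in for the real database
cache = {}                               # key -> (value, expires_at)
MISS = object()                          # cached marker for "row does not exist"

def get(key, ttl=60, miss_ttl=5):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        value = entry[0]
        return None if value is MISS else value
    # Cache miss: read through to the database (the "aside" step).
    value = fake_db.get(key)
    if value is None:
        # Cache the empty result briefly so repeated lookups for
        # non-existent keys don't keep hitting the database.
        cache[key] = (MISS, time.time() + miss_ttl)
        return None
    cache[key] = (value, time.time() + ttl)
    return value

print(get("user:1"))    # {'name': 'Chen'}
print(get("user:999"))  # None, and the miss is now cached
```

In Cache-Aside, writes invalidate (delete) the cached key rather than updating it, letting the next read repopulate the cache.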

One‑Master‑Many‑Slaves (Read‑Write Splitting)

Deploy multiple read‑only replicas to offload read traffic. Routing can be handled by code or middleware. This is an effective short‑term solution but incurs higher hardware cost due to full data duplication.
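When routing is handled in code, the core of it is a small dispatcher: non-SELECT statements go to the master, reads are spread across replicas. The connection objects below are placeholders for real connection pools:

```python
import itertools

class Router:
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # round-robin reads

    def connection_for(self, sql: str):
        # Writes (and anything that is not a SELECT) must hit the master;
        # reads rotate across the replicas.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.master

router = Router("master", ["replica-1", "replica-2"])
print(router.connection_for("SELECT * FROM orders"))       # replica-1
print(router.connection_for("UPDATE orders SET status=1")) # master
print(router.connection_for("select 1"))                   # replica-2
```

A real router also has to account for replication lag, e.g., pinning a session to the master right after it writes.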

Choose the Appropriate Storage System

NoSQL types include key‑value, document, column, graph, and search engine. Different storage systems dictate search algorithms and data structures, addressing performance, concurrency, and scalability challenges of relational databases.

Examples: Elasticsearch uses inverted indexes for fast full‑text search; Redis hash provides O(1) lookups with in‑memory storage.

Two main patterns: CQRS and storage replacement.

CQRS

CQS (Command‑Query Separation) states that a method should either modify state or query it, not both. CQRS extends this by using a relational database for writes and a separate system (e.g., NoSQL) for reads, combining ACID guarantees with high‑performance queries.

Implementation difficulty lies in data synchronization between the two stores.

Data Synchronization Methods

Synchronization can be push‑based (CDC or domain events) or pull‑based (periodic polling). CDC captures changes at the database level; domain events are emitted by the application.
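The push-based, domain-event variant can be sketched as follows. The command side writes to its own store, then publishes an event that a subscriber uses to update the read store; every name here is illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EventBus:
    handlers: List[Callable] = field(default_factory=list)

    def subscribe(self, handler: Callable):
        self.handlers.append(handler)

    def publish(self, event: dict):
        for handler in self.handlers:
            handler(event)

write_store = {}  # stands in for the relational (command) database
read_store = {}   # stands in for the NoSQL (query) store

bus = EventBus()
bus.subscribe(lambda e: read_store.__setitem__(e["id"], e["payload"]))

def create_order(order_id, payload):
    write_store[order_id] = payload                     # command side: ACID write
    bus.publish({"id": order_id, "payload": payload})   # push change to read side

create_order("o-1", {"total": 99})
print(read_store["o-1"])  # {'total': 99}
```

In production the bus would be a message queue and delivery is asynchronous, so the read side is eventually consistent; CDC achieves the same push without touching application code by tailing the database log.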

Replace (Select) Storage System

Understanding NoSQL strengths and weaknesses enables informed decisions when swapping storage. A migration strategy should include an intermediate version with data sync and feature toggles to ensure consistency before fully switching.

Conclusion

The eight solutions each fit specific scenarios; there is no one‑size‑fits‑all. Most involve data synchronization, which introduces consistency considerations, making them best suited for read‑heavy workloads or carefully managed write‑read patterns.

Written by

Code Ape Tech Column

Former Ant Group P8 engineer and pure technologist, sharing full‑stack Java, job interview, and career advice through this column. Site: java-family.cn
