Challenges and Strategies for Database Vertical and Horizontal Sharding
The article examines the increasing storage bottleneck in large-scale web systems, explains vertical and horizontal database sharding, discusses the technical difficulties they introduce—including join rewrites, transaction handling, and ORM integration—and proposes practical approaches to minimize service impact and manage distributed transactions.
The storage bottleneck is now entering deep water; if a website has reached the stage of vertical and horizontal database sharding, the technical difficulty challenges increase significantly.
First, let’s review the definitions of vertical and horizontal sharding.
Vertical sharding: distributing data of different business units in a single database into separate databases.
Horizontal sharding: splitting data of the same business unit across multiple databases according to certain rules.
Vertical sharding is coarse‑grained, moving whole tables to different databases; horizontal sharding is finer, splitting a single table across databases, making it technically more difficult. Simpler solutions should be preferred; complex techniques only when unavoidable.
The author, originally a Java engineer, describes their company’s simple DB modeling practices—no foreign keys, limited stored procedures, and a preference for Java programs for data synchronization—highlighting that databases serve primarily for storage, and as systems scale, storage demands dominate over computation.
Returning to sharding, what problems arise after horizontal sharding? Two main differences: join queries need changes, and certain CRUD operations that involve multiple tables lose transactional integrity, risking rollback failures.
Solution to the first issue is relatively simple; many methods exist. Method 1: When vertically sharding, list join queries involving the split table; rewrite weakly dependent joins, and split strongly dependent tables together. This may increase coupling and lead to redundant tables across databases.
Method 2: After sharding, rewrite affected join queries. Since reads are low‑cost, cross‑database joins can be replaced by multiple queries merged in memory; however, this approach requires extensive checking of all related SQL and carries risk.
Method 3: Both methods suffer from high coupling between DB and business logic; therefore, an ORM‑based decoupling is needed. Traditional ORM tools (Hibernate, MyBatis) target single databases and don’t solve sharding, so a custom cross‑database ORM must be built, focusing on query handling.
Two ways ORM can interact with the service layer: (1) Define a custom query language like HQL, which is hard to implement and often avoided; (2) Provide service‑layer methods that encapsulate specific DB operations, keeping service code stable despite DB changes. The second is preferred.
Vertical sharding also turns local transactions into distributed ones, making distributed transaction solutions complex. The author lists some industry approaches: XA protocol, CAP/BASE theory, and Paxos. He notes that Paxos underlies Zookeeper’s consistency mechanism.
Distributed transactions are essentially about ensuring atomicity and rollback; many problems can be solved at the business level rather than by heavyweight technical solutions.
In summary: (1) When storage is the bottleneck, limit database computation features; (2) Migrate necessary computation to the application layer; (3) Design a data‑access layer to decouple service from DB changes; (4) This layer plus sharded DB forms a distributed database solution, increasing development difficulty; (5) Handle distributed transactions case‑by‑case, avoiding unnecessary complexity.
Source: 51CTO (http://developer.51cto.com/art/201501/464238_2.htm)
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.