Practical Guide to Application and Database Splitting for Large‑Scale Systems
This article walks through why monolithic systems need to be split and how to do it safely: assessing business complexity, defining service boundaries, performing vertical and horizontal database sharding, adopting a global ID generator, migrating data, and preserving consistency and stability during cut‑over with stop‑write or dual‑write strategies.
1. Why split? Tight coupling between applications, poor extensibility, legacy code, limited system scalability, and accumulating technical debt make monolithic systems fragile and hard to evolve.
2. Preparation before splitting includes multi‑dimensional analysis of business complexity, understanding existing domain models, and reaching consensus among product, development, and operations to define clear service boundaries with high cohesion and low coupling.
3. Defining boundaries follows the single‑responsibility principle: like the “Huluwa” (Calabash) brothers, each with a distinct ability, every new service should stand on its own yet combine with the others into a coherent platform.
4. Database splitting covers vertical sharding (separating tables into different databases) and horizontal sharding (splitting large tables such as message tables). New tables should use utf8mb4 charset and include all necessary indexes.
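Horizontal sharding needs a routing rule that maps a sharding key to a physical table. A minimal sketch, assuming a hypothetical `message` table split by `user_id` into 16 shards (the real shard count and key depend on data volume and access patterns):

```python
def route_message_table(user_id: int, shard_count: int = 16) -> str:
    """Map a user_id to a horizontally sharded message table.

    Hypothetical naming scheme (message_00 .. message_15); sharding by the
    same key used in queries keeps single-user reads on one shard.
    """
    suffix = user_id % shard_count
    return f"message_{suffix:02d}"

# All messages for one user land in the same shard, so per-user queries
# never fan out across tables.
print(route_message_table(12345))
```

Picking a power-of-two shard count makes a later re-split (16 → 32) a matter of moving half of each shard's rows.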
5. Global ID generation replaces auto‑increment primary keys to avoid key conflicts during migration. Options include snowflake (https://github.com/twitter/snowflake), a dedicated MySQL sequence table using auto_increment, or a dual‑table odd/even stepping strategy.
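The snowflake layout packs a 41‑bit millisecond timestamp, a 10‑bit worker id, and a 12‑bit per‑millisecond sequence into one 64‑bit integer. A minimal sketch of that scheme (the field widths follow the original Twitter design; the epoch constant is an arbitrary assumption):

```python
import threading
import time


class SnowflakeLike:
    """Simplified snowflake-style ID generator:
    41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    EPOCH = 1600000000000  # custom epoch in ms (illustrative choice)

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024, "worker id must fit in 10 bits"
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.sequence
```

IDs from one worker are strictly increasing, and distinct worker ids keep IDs from colliding across instances, which is what makes them safe to use across both old and new tables during migration.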
6. Data migration steps – create new tables, perform full data sync during low‑traffic windows, then use binlog‑based incremental sync tools (e.g., Alibaba Canal or Otter) ensuring the binlog position is captured before the full load.
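The ordering matters: the replication position must be recorded *before* the full copy begins, so that writes landing during the copy are replayed rather than lost, and the replay must be idempotent because it overlaps the copy. A toy simulation of that sequence (in production the change stream would come from Canal or Otter, not a Python list):

```python
# Simulated migration: capture the binlog position before the full copy,
# then replay every change from that position as an idempotent upsert.

def migrate(old_table: dict, binlog: list, start_pos: int) -> dict:
    new_table = dict(old_table)         # 1) full snapshot copy
    for pos, key, value in binlog:      # 2) incremental replay
        if pos >= start_pos:            #    from the captured position
            new_table[key] = value      #    idempotent upsert: replaying
    return new_table                    #    the same event twice is safe

old = {1: "a", 2: "b"}
log = [(100, 2, "b2"), (101, 3, "c")]   # writes that arrived mid-copy
print(migrate(old, log, start_pos=100))
```

If the position were captured *after* the full load instead, the write at position 100 could fall into the gap between snapshot and replay and silently disappear.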
7. SQL refactoring is required because cross‑database joins are unsupported; strategies include eliminating joins, using global tables, adding redundant fields, or performing in‑memory joins via RPC or local cache.
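An in-memory join replaces `orders JOIN users` with two independent fetches and a merge in application code. A minimal sketch, where the `orders` list and `users_by_id` map stand in for results of hypothetical RPC or DAO calls against the now-separate databases:

```python
# In-memory join: fetch rows from each database independently (ideally a
# single batched lookup per side), then stitch them together in code.

def join_orders_with_users(orders: list, users_by_id: dict) -> list:
    """Attach user_name to each order, tolerating missing users."""
    return [
        {**order, "user_name": users_by_id.get(order["user_id"], {}).get("name")}
        for order in orders
    ]

orders = [{"id": 1, "user_id": 7}, {"id": 2, "user_id": 8}]
users = {7: {"name": "alice"}, 8: {"name": "bob"}}
print(join_orders_with_users(orders, users))
```

The key discipline is batching: collect all `user_id`s first and fetch them in one call, rather than issuing one RPC per order row.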
8. Cut‑over strategies – (a) stop‑write: pause writes, switch reads, and migrate; (b) dual‑write: write to both old and new tables for a short period, then disable binlog sync. Each has trade‑offs in risk, latency, and rollback complexity.
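In the dual-write window, the old store remains the source of truth; a failed write to the new store is recorded for later compensation rather than failing the user request. A minimal sketch with plain dicts standing in for the two databases (store names and error handling are illustrative):

```python
import logging

# Dual-write sketch: write old first (source of truth), then mirror to
# new best-effort. A failure on the new side is logged for a compensation
# job to repair, so the request itself never fails because of the mirror.

def dual_write(old_store: dict, new_store: dict, key, value) -> None:
    old_store[key] = value        # source of truth during cut-over
    try:
        new_store[key] = value    # best-effort mirror write
    except Exception:
        logging.exception("dual-write to new store failed, key=%r", key)

old, new = {}, {}
dual_write(old, new, "msg:1", {"body": "hi"})
print(old == new)
```

Note the interaction with binlog sync called out above: dual-writing and binlog replay must not both be active for the same rows, or the new store sees every write twice from two paths, which is only safe if both are strictly idempotent.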
9. Consistency after split can be achieved via distributed transactions (generally avoided), message‑based compensation, or scheduled compensation jobs to achieve eventual consistency.
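A scheduled compensation job periodically compares the two stores and re-copies any rows that diverged, driving the pair toward eventual consistency. A minimal sketch (in production the scan would be chunked by primary-key range and rate-limited):

```python
# Compensation job: treat the old store as authoritative and repair any
# divergent rows in the new store; returns how many rows were fixed so
# the job can alert when divergence stops trending toward zero.

def compensate(old_store: dict, new_store: dict) -> int:
    fixed = 0
    for key, value in old_store.items():
        if new_store.get(key) != value:
            new_store[key] = value
            fixed += 1
    return fixed

old = {1: "a", 2: "b"}
new = {1: "a", 2: "stale"}
print(compensate(old, new))
```

Running this on a schedule is the "avoid distributed transactions" trade-off in concrete form: accept a bounded window of inconsistency in exchange for never holding locks across databases.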
10. Post‑split stability relies on defensive programming, clear interface contracts, flow control, caching strategies, SOP‑driven incident response, and continuous monitoring of resource usage (CPU, memory, DB QPS, cache hit rates).
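Flow control is one of the few stability measures above that reduces to a small algorithm. A minimal token-bucket sketch (parameters are illustrative; production systems typically use a shared limiter such as a gateway or Redis-backed counter):

```python
import time


class TokenBucket:
    """Minimal token-bucket flow control: refill at `rate` tokens/sec up
    to `capacity`; a request proceeds only if a token is available."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

`capacity` bounds the burst a downstream service must absorb, while `rate` bounds its sustained load; tuning both per dependency is what keeps one slow service from cascading.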
11. Key takeaways – prepare thoroughly, decompose complex work into testable steps, anticipate Murphy’s Law, and maintain SOPs to handle unexpected failures.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as re‑architecting systems with internet technologies. Architects with ideas and a willingness to share are welcome to exchange and learn together.