
System Refactoring Case Study: From Monolithic to Distributed Architecture and Database Redesign

This article details a comprehensive system refactoring project that transformed a monolithic, all‑in‑one architecture into a distributed, micro‑service‑based design with a completely rebuilt database model, covering problem analysis, solution design, phased implementation, testing, and rollout strategies.

Wukong Talks Architecture

Hello, I am Wukong.

We often encounter system or module refactoring in our work. Today I'd like to share a system refactoring project I went through.

01 Background

The background for the refactoring: the original system used a monolithic, all‑in‑one architecture. As the business grew rapidly, user traffic surged, request volumes multiplied, and problems emerged one after another. The original architecture diagram is shown below.

02 Pain Points

The typical problems encountered were:

Severe module coupling, unable to scale quickly with traffic increase.

Database tables with mixed responsibilities and unclear positioning (e.g., payment orders and product orders stored in one table, with a single status field representing two different order flows, causing status anomalies).

Complex SQL and cross‑table joins leading to slow queries and frequent database alerts.

No separation of services and domains, so the system and its interfaces were tightly coupled; a single point of failure could bring down the entire system.

Slow interface responses, poor system stability, frequent data loss and corruption.

Bulky product requirement versions, many business scenarios, scattered logic, slow iteration speed.

High frequency of customer complaints, difficult issue tracing, R&D exhausted by troubleshooting.

Faced with these issues, two options were considered:

Continue iterating the existing system, which would require more manpower and effort to maintain stability and iteration speed.

Fully rebuild the system, requiring investment and potentially affecting short‑term business iteration.

Given the long‑term product roadmap and the current system's bottlenecks, a complete system rebuild was decided on.

I was assigned as the project lead, and leadership set the following requirements:

During rapid business growth and system refactoring, maintain business iteration speed, possibly increasing staff.

Design the new system to accommodate three years of growth in user traffic and data volume.

Ensure no impact on users or business during the switch, avoiding data loss or corruption.

With the task defined, the next step was to consider how to execute it.

03 Solution

System refactoring is a complex engineering effort, akin to changing an aircraft engine mid‑flight; it requires thorough planning to ensure safety.

Based on the problems and goals, the following technical principles were established:

Adopt a distributed architecture, fully separating modules for independent deployment and evolution.

Completely redesign the database model to support business expansion, aligned with the distributed architecture.

Consolidate business logic, defining clear domain boundaries and unified service interfaces.

Implement dual‑write between old and new databases to ensure stability and prevent data loss.

Run old and new systems in parallel, using gray‑scale traffic control until the old system is decommissioned.

04 Implementation

Requirement and Interface Analysis

With the overall goals and technical direction set, the implementation phase began.

Since core bottlenecks were in the order module, a phased approach was adopted, starting with order‑related functionality.

First, business requirements and product functions were clarified using historical documents and real‑time product simulations.

Functional requirements needed to be broken down to the interface level, mapping upstream callers and downstream dependencies to fully cover the order module.

This analysis informed the new database table design.

Data Model Considerations

Key considerations for the new database model included:

Large data volume solution: sharding. The order table was split into 64 tables based on user‑ID modulo, each handling up to 50 million rows, supporting three‑year growth. Additional query dimensions (time, region) are handled via Elasticsearch indexes.

Primary key strategy: distributed ID generation using a Snowflake‑like approach.

Cross‑table query solution: service‑level aggregation. New design prohibits cross‑table joins; required data is fetched from single tables and aggregated in code.

Dual‑write between old and new models: a switch‑controlled mechanism handles writes to both models, allowing flexible toggling.
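The sharding and primary‑key rules above can be sketched as follows. The 64‑table split by user‑ID modulo and the snowflake‑like ID approach are from the design; the table‑name prefix, the epoch, and the common bit layout (41‑bit millisecond timestamp, 10‑bit worker ID, 12‑bit sequence) are illustrative assumptions.

```python
import time

SHARD_COUNT = 64  # order table split into 64 shards, per the design

def shard_for(user_id: int) -> str:
    """Route a user's orders to one of 64 tables by user-ID modulo."""
    return f"t_order_{user_id % SHARD_COUNT:02d}"

class SnowflakeLike:
    """Snowflake-style 64-bit ID: 41-bit ms timestamp | 10-bit worker | 12-bit seq.
    The bit layout is the widely used one, assumed here; the article only names
    the approach."""
    def __init__(self, worker_id: int, epoch_ms: int = 1288834974657):
        assert 0 <= worker_id < 1024  # 10-bit worker ID
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        now = int(time.time() * 1000)
        if now == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence within one ms
            if self.seq == 0:                   # sequence exhausted: spin to next ms
                while now <= self.last_ms:
                    now = int(time.time() * 1000)
        else:
            self.seq = 0
        self.last_ms = now
        return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.seq
```

Routing every read and write through `shard_for` keeps queries single‑table; the additional time‑ and region‑based lookups go to the Elasticsearch indexes instead.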

With the database model completed, the order module architecture was designed.

Architecture Design

The overall system architecture after redesign is illustrated below.

Key points:

Legacy app versions cannot be force‑upgraded for all users, so Nginx redirects the old version's interfaces to the new service layer.

The interface service layer is split to handle different front‑ends (App, web admin) and consolidates authentication and encryption.

The business logic layer abstracts order‑related logic, providing aggregated services (order details, order list) and may call other domain services.

The domain service layer encapsulates CRUD operations for its own tables.

The order database is separated from the original monolithic database, with potential further domain splitting.
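The aggregated "order details" service in the business logic layer can be sketched like this. The domain‑service names and record fields are hypothetical, but the pattern is the one the design prescribes: each domain service reads only its own tables, and the business logic layer composes the results in code instead of using SQL joins.

```python
# Hypothetical domain services: each reads only its own tables (no cross-table joins).
def fetch_order(order_id):        # order-domain service
    return {"order_id": order_id, "user_id": 42, "sku_id": 7, "status": "PAID"}

def fetch_user(user_id):          # user-domain service
    return {"user_id": user_id, "name": "alice"}

def fetch_product(sku_id):        # product-domain service
    return {"sku_id": sku_id, "title": "keyboard"}

def order_details(order_id):
    """Business-logic layer: aggregate several single-table reads in code."""
    order = fetch_order(order_id)
    user = fetch_user(order["user_id"])
    product = fetch_product(order["sku_id"])
    return {**order, "user_name": user["name"], "product_title": product["title"]}
```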

During implementation, switch mechanisms were added to enable seamless transition.

Phase 1 architecture diagram:

Phase 1 design highlights:

A switch in the all‑in‑one app controls whether it calls the new interface service layer or continues using the original direct database access.

Switches in domain services (order‑domain1‑service, order‑domain2‑service, other‑domain‑service) control read/write access to the old monolithic database and the new order database.

Normal flow after Phase 1 launch:

Nginx redirects the old order interfaces to the new service layer, switching traffic over.

The all‑in‑one app switch is turned on to use the new business logic layer.

Domain services continue reading from the old database while writing to both databases; the new order database is write‑only at this stage.

This validates the service and interface call chain; any issue can be quickly reverted via switches.
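The switch‑controlled dual‑write of Phase 1 can be sketched as follows, with in‑memory dicts standing in for the two databases. The class and switch names are illustrative; the behavior — reads stay on the old store while every write lands in both — is the Phase 1 flow described above.

```python
class DualWriteRepo:
    """Switch-controlled repository: writes go to the old store and, when the
    write switch is on, also to the new store; reads come from whichever side
    the read switch selects. Dicts stand in for the two databases."""
    def __init__(self):
        self.old_db, self.new_db = {}, {}
        self.write_new = True   # Phase 1: dual-write enabled
        self.read_new = False   # Phase 1: still read the old database

    def save(self, order_id, order):
        self.old_db[order_id] = order         # old path stays authoritative
        if self.write_new:
            self.new_db[order_id] = order     # shadow write to the new model

    def load(self, order_id):
        return (self.new_db if self.read_new else self.old_db).get(order_id)

repo = DualWriteRepo()
repo.save(1001, {"status": "PAID"})
assert repo.load(1001) == {"status": "PAID"}  # served from the old store
repo.read_new = True                          # Phase 2: flip reads to the new store
assert repo.load(1001) == {"status": "PAID"}  # same data, now from the new model
```

Because each toggle is independent, any problem can be rolled back instantly by flipping a switch rather than redeploying.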

Phase 2 architecture diagram:

Phase 2 focuses on data‑layer verification, ensuring data written to the new model is correct.

Key actions in Phase 2:

In order domain services, keep write access to the old database but disable reads.

Enable both read and write on the new order database.

Full call and data paths now use the new services and tables; product‑level validation compares new and old data to confirm correctness.

If issues arise, switches allow immediate fallback to the original path.

After Phase 2 validation, remaining tasks include removing dual‑write code, switches, and legacy logic.
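Phase 2's new‑versus‑old comparison can be done with a simple reconciliation pass over both models. The function and field names here are illustrative; the idea is to surface rows that are missing on one side or differ between the two models before the old path is removed.

```python
def reconcile(old_rows, new_rows, key="order_id"):
    """Compare old- and new-model rows; report mismatches for investigation."""
    old_by_key = {r[key]: r for r in old_rows}
    new_by_key = {r[key]: r for r in new_rows}
    missing = sorted(set(old_by_key) - set(new_by_key))
    extra = sorted(set(new_by_key) - set(old_by_key))
    diffs = sorted(k for k in old_by_key.keys() & new_by_key.keys()
                   if old_by_key[k] != new_by_key[k])
    return {"missing_in_new": missing, "extra_in_new": extra, "field_diffs": diffs}

report = reconcile(
    [{"order_id": 1, "status": "PAID"}, {"order_id": 2, "status": "SHIPPED"}],
    [{"order_id": 1, "status": "PAID"}, {"order_id": 2, "status": "PAID"}],
)
# order 2 differs between the models and should be investigated before cutover
```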

Project Execution

With the architecture finalized, a detailed refactoring plan was created, securing resources and defining milestones.

Beyond order module changes, the project required coordination with dependent teams, resource allocation, and stakeholder alignment.

Development tasks covered interface changes, data migration compatibility, message queue and cache compatibility.

Post‑development, simulated data migrations verified that legacy data could be imported without functional issues.

Comprehensive testing included:

Automated interface test cases for repeated validation.

Cross‑testing by QA to cover missed scenarios.

Replay of live traffic for logical verification.

Gray‑scale traffic in pre‑release environments for end‑to‑end validation.
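The live‑traffic replay step above can be sketched as running each recorded request through both implementations and collecting disagreements. The handlers here are hypothetical stand‑ins; in practice they would call the old monolith and the new service.

```python
def replay_and_diff(recorded, old_handler, new_handler):
    """Replay recorded requests against the old and the new implementation
    and collect every request whose responses disagree."""
    mismatches = []
    for req in recorded:
        if old_handler(req) != new_handler(req):
            mismatches.append(req)
    return mismatches

# Hypothetical handlers: the new one is seeded with a bug for one input.
old = lambda req: req["user_id"] % 64
new = lambda req: req["user_id"] % 64 if req["user_id"] != 3 else 0
bad = replay_and_diff([{"user_id": u} for u in range(5)], old, new)
```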

Release planning involved detailed step‑by‑step procedures, time estimates, responsibility assignments, risk hypotheses with mitigation plans, and coordinated traffic switching.

Monitoring systems tracked service, interface, and performance metrics during rollout.

Gradual gray‑scale traffic increase confirmed normal operation, completing the refactoring successfully.
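The gradual gray‑scale increase can be implemented by hashing a stable user key into 100 buckets and comparing against a rollout percentage. The hash choice (MD5) and bucket count are assumptions; any stable hash works.

```python
import hashlib

def in_gray_scale(user_id: int, percent: int) -> bool:
    """Route a stable percentage of users to the new system.
    A user's bucket never changes, so raising `percent` only adds users."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# Ramp 1% -> 10% -> 50% -> 100%: users admitted at 10% stay admitted at 50%.
ten = {u for u in range(1000) if in_gray_scale(u, 10)}
fifty = {u for u in range(1000) if in_gray_scale(u, 50)}
assert ten <= fifty
```

Keeping bucket assignment stable is what makes fallback safe: lowering the percentage removes exactly the most recently admitted cohorts.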

Over roughly six months, the entire monolithic service was fully split and refactored, with ongoing iterative improvements such as gateway integration, further order service decomposition, Elasticsearch query replacement, and business‑logic layer middle‑platform evolution.

05 Summary

Key steps in the refactoring process:

Analyze current system issues and prioritize critical pain points.

Define refactoring goals, direction, and constraints.

Identify core technical solutions and feasibility.

Map requirements, scenarios, and upstream/downstream dependencies.

Design a clear and complete technical solution.

Create a detailed project plan, lock resources, and drive milestones.

Conduct full‑process testing and validation.

Prepare a comprehensive release plan.

Perform essential gray‑scale verification.

System refactoring is labor‑intensive, but it is also a significant challenge that sharpens overall engineering capability; grappling with real‑world problems drives learning and growth.

If you have similar experiences or thoughts, feel free to share them.

Tags: backend engineering, migration, microservices, distributed architecture, system refactoring, database redesign
Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
