Design and Evolution of a Distributed Accounting System for High‑Volume Transaction Processing
This article details the background, architectural evolution, design challenges, and distributed implementation of an accounting system that automates the processing of millions of transaction records across thousands of accounts, highlighting how splitting accounts, workflows, and bills improves performance and reliability.
Background
As the company grew, transaction volume surged and more than 1,000 corporate accounts (virtual and cash) were created, generating a massive daily manual accounting workload. To reduce labor costs, the payment services R&D team conceived an automated accounting system.
Architecture Evolution
The accounting system has gone through three major versions: (1) single-machine, single-threaded processing, which took six hours for millions of records; (2) single-machine, multi-threaded processing, which cut the time to four hours but remained limited by the hardware of a single node; (3) a distributed approach that batches work across multiple nodes, delivering high availability, timeliness, and accuracy.
Design Challenges
Three core difficulties were identified: extremely large bill files (up to 600 MB per account), a large number of accounts (more than 1,000), and a long, time-critical processing pipeline that must complete shortly after payment providers generate the bills. The pipeline spans file storage, format handling (ZIP/Excel), content parsing, data enrichment, and final validation and storage.
To address these, the system adopts a “split” strategy—splitting accounts, workflows, and bills—to enable parallel processing.
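The split strategy can be sketched as turning each (account, bill) pair into an independent task that runs in parallel. This is a minimal illustration, not the system's actual code; the names `process_bill` and `run_split` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def process_bill(account_id, bill_id):
    # Placeholder for the real per-bill pipeline:
    # fetch, parse, enrich, validate, store.
    return (account_id, bill_id, "done")

def run_split(accounts, bills_for_account, workers=8):
    # Account-level and bill-level splitting: every (account, bill)
    # pair becomes one task, so nothing is processed serially per account.
    tasks = [(a, b) for a in accounts for b in bills_for_account(a)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: process_bill(*t), tasks))
```

Because the tasks share no state, the same splitting carries over directly from threads on one machine to jobs across a cluster.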
System Design
Single-Machine Design
Both the single-threaded and multi-threaded versions suffered from memory pressure during the reconciliation phase, where millions of records were compared in nested loops, consuming more than two hours and exhausting memory.
Distributed Design
The distributed solution revolves around a distributed job system. Upstream clusters handle bill acquisition, archival, parsing, and fee/discount aggregation, while downstream clusters perform data enrichment, classification, validation, error handling, and persistence. Job configuration is stored in Zookeeper, and the master node dispatches bill-pull tasks to upstream workers, achieving bill-level splitting.
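The master's dispatch step can be sketched as round-robin assignment of one bill-pull task per job config to upstream worker queues. This is a simplified stand-in: in the real system the configs live in Zookeeper, and the `dispatch` function and config fields shown here are assumptions for illustration.

```python
from collections import deque
from itertools import cycle

def dispatch(job_configs, num_workers):
    # One queue per upstream worker; the master assigns bill-pull
    # tasks round-robin, which yields bill-level splitting.
    queues = [deque() for _ in range(num_workers)]
    worker_ids = cycle(range(num_workers))
    for cfg in job_configs:
        queues[next(worker_ids)].append({
            "account": cfg["account"],
            # Hypothetical default; the article lists FTP/SFTP/HTTPS.
            "protocol": cfg.get("protocol", "SFTP"),
        })
    return queues
```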
Overall Processing Flow
Job configuration: jobs are configured in the control center and pushed to Zookeeper for task distribution.
Job scheduling: upstream workers enqueue tasks, and the scheduler triggers bill‑pull jobs.
Bill pulling: the gateway constructs protocols (FTP/SFTP/HTTPS) to retrieve bills and stores raw files.
Bill parsing: ZIP and Excel formats are normalized, fees and discounts are summed, and standardized bill data is produced.
Data distribution: the core module shards millions of bill records to downstream clusters for enrichment, validation, classification, error handling, and final storage.
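The data-distribution step above can be sketched as hash-based sharding: each record is routed to one downstream node by hashing a stable key, so every node enriches, validates, and stores its own slice independently. The field name `txn_id` and the use of MD5 are assumptions for illustration.

```python
import hashlib

def shard(records, num_nodes):
    # Route each record to exactly one downstream node by hashing a
    # stable key, so the same record always lands on the same node.
    shards = [[] for _ in range(num_nodes)]
    for rec in records:
        h = int(hashlib.md5(rec["txn_id"].encode()).hexdigest(), 16)
        shards[h % num_nodes].append(rec)
    return shards
```

Keying the shard on the record rather than on arrival order keeps the distribution deterministic, which matters for retries and error handling.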
Distributed Reconciliation
Reconciliation compares two data sets to find their differences. The single-machine algorithm uses set union and intersection, which is inefficient at large volumes. The distributed approach splits the bill set A into single-element subsets A1 … An and checks each against the transaction set B in parallel across the cluster, marking matches; the records left unmatched form the difference set of mismatched records.
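A minimal sketch of this reconciliation, using a thread pool as a stand-in for the worker cluster: each bill record is one single-element subset Ai, checked against an index of set B, and the unmatched records form the difference set. The function name `reconcile` and the `id` field are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def reconcile(bill_records, txn_records, workers=8):
    # Index the transaction set B once so each A_i check is O(1).
    txn_index = {t["id"] for t in txn_records}

    def check(bill):
        # One single-element subset A_i per task; return the record
        # only if it has no match in B.
        return bill if bill["id"] not in txn_index else None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [b for b in pool.map(check, bill_records) if b is not None]
```

In the real system each check runs on a cluster node rather than a thread, but the structure is the same: independent per-record checks whose leftovers are exactly the mismatches.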
Conclusion and Outlook
The system has saved the company approximately 1.305 million CNY in labor costs, streamlined receipt verification, improved efficiency, ensured data completeness, timeliness, and high availability, and reduced external communication overhead. Challenges remain, however, in reconciling offline transaction data and non-standard payment flows, which require ongoing optimization.
Tongcheng Travel Technology Center