Design and Evolution of a Distributed Accounting System for High‑Volume Transaction Processing
This article details the background, architectural evolution, design challenges, and distributed implementation of an accounting system that automates the processing of millions of transaction records across thousands of accounts, highlighting how splitting accounts, workflows, and bills improves performance and reliability.
Background
As the company grew, transaction volume surged and more than 1,000 corporate accounts (virtual and cash) were created, generating a massive daily manual accounting workload. To reduce labor costs, the payment services R&D team conceived an automated accounting system.
Architecture Evolution
The accounting system has gone through three major versions: (1) single-machine, single-threaded processing, which took six hours for millions of records; (2) single-machine, multi-threaded processing, which cut the time to four hours but remained limited by the hardware of a single node; (3) a distributed approach that batches work across multiple nodes, delivering high availability, timeliness, and accuracy.
Design Challenges
Three core difficulties were identified: extremely large bill files (up to 600 MB per account), a large number of accounts (more than 1,000), and a long, time-critical processing pipeline that must complete shortly after payment providers generate the bills. The pipeline spans file storage, format handling (ZIP/Excel), content parsing, data enrichment, and final validation and storage.
To address these, the system adopts a “split” strategy—splitting accounts, workflows, and bills—to enable parallel processing.
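The split strategy can be sketched as turning each (account, bill) pair into an independent task that runs in parallel. This is a minimal illustration, not the system's actual code; the names `process_bill` and `run_split` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def process_bill(account_id, bill_id):
    # Placeholder for the real per-bill pipeline:
    # fetch, parse, enrich, validate, store.
    return (account_id, bill_id, "done")

def run_split(accounts, bills_for_account, workers=8):
    # Account-level and bill-level splitting: every (account, bill)
    # pair becomes one task, so nothing is processed serially per account.
    tasks = [(a, b) for a in accounts for b in bills_for_account(a)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: process_bill(*t), tasks))
```

Because the tasks share no state, the same splitting carries over directly from threads on one machine to jobs across a cluster.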
System Design
Single-Machine Design
Both the single-threaded and multi-threaded versions suffered from memory pressure during the reconciliation phase, where millions of records were compared in nested loops, consuming more than two hours and exhausting memory.
Distributed Design
The distributed solution revolves around a distributed job system. Upstream clusters handle bill acquisition, archival, parsing, and fee/discount aggregation, while downstream clusters perform data enrichment, classification, validation, error handling, and persistence. Job configuration is stored in Zookeeper, and the master node dispatches bill-pull tasks to upstream workers, achieving bill-level splitting.
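The master's dispatch step can be sketched as round-robin assignment of one bill-pull task per job config to upstream worker queues. This is a simplified stand-in: in the real system the configs live in Zookeeper, and the `dispatch` function and config fields shown here are assumptions for illustration.

```python
from collections import deque
from itertools import cycle

def dispatch(job_configs, num_workers):
    # One queue per upstream worker; the master assigns bill-pull
    # tasks round-robin, which yields bill-level splitting.
    queues = [deque() for _ in range(num_workers)]
    worker_ids = cycle(range(num_workers))
    for cfg in job_configs:
        queues[next(worker_ids)].append({
            "account": cfg["account"],
            # Hypothetical default; the article lists FTP/SFTP/HTTPS.
            "protocol": cfg.get("protocol", "SFTP"),
        })
    return queues
```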
Overall Processing Flow
Job configuration: jobs are configured in the control center and pushed to Zookeeper for task distribution.
Job scheduling: upstream workers enqueue tasks, and the scheduler triggers bill‑pull jobs.
Bill pulling: the gateway constructs protocols (FTP/SFTP/HTTPS) to retrieve bills and stores raw files.
Bill parsing: ZIP and Excel formats are normalized, fees and discounts are summed, and standardized bill data is produced.
Data distribution: the core module shards millions of bill records to downstream clusters for enrichment, validation, classification, error handling, and final storage.
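The data-distribution step above can be sketched as hash-based sharding: each record is routed to one downstream node by hashing a stable key, so every node enriches, validates, and stores its own slice independently. The field name `txn_id` and the use of MD5 are assumptions for illustration.

```python
import hashlib

def shard(records, num_nodes):
    # Route each record to exactly one downstream node by hashing a
    # stable key, so the same record always lands on the same node.
    shards = [[] for _ in range(num_nodes)]
    for rec in records:
        h = int(hashlib.md5(rec["txn_id"].encode()).hexdigest(), 16)
        shards[h % num_nodes].append(rec)
    return shards
```

Keying the shard on the record rather than on arrival order keeps the distribution deterministic, which matters for retries and error handling.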
Distributed Reconciliation
Reconciliation compares two data sets to find their differences. The single-machine algorithm uses set union and intersection, which is inefficient at large volumes. The distributed approach splits the bill set A into single-element subsets A1 … An and checks each against the transaction set B in parallel across the cluster, marking matches; the records left unmatched form the difference set of mismatched records.
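A minimal sketch of this reconciliation, using a thread pool as a stand-in for the worker cluster: each bill record is one single-element subset Ai, checked against an index of set B, and the unmatched records form the difference set. The function name `reconcile` and the `id` field are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def reconcile(bill_records, txn_records, workers=8):
    # Index the transaction set B once so each A_i check is O(1).
    txn_index = {t["id"] for t in txn_records}

    def check(bill):
        # One single-element subset A_i per task; return the record
        # only if it has no match in B.
        return bill if bill["id"] not in txn_index else None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [b for b in pool.map(check, bill_records) if b is not None]
```

In the real system each check runs on a cluster node rather than a thread, but the structure is the same: independent per-record checks whose leftovers are exactly the mismatches.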
Conclusion and Outlook
The system has saved the company approximately 1.305 million CNY in labor costs, streamlined receipt verification, improved efficiency, ensured data completeness, timeliness, and high availability, and reduced external communication overhead. Challenges remain, however, in reconciling offline transaction data and non-standard payment flows, which require ongoing optimization.
Tongcheng Travel Technology Center