Evolution of Payment Reconciliation Architecture: From MySQL to StarRocks with Flink and DataX
This article describes how a payment reconciliation system progressed from a simple MySQL‑based solution through a Hive‑based big‑data approach to a high‑performance StarRocks architecture, detailing the integration of Flink, DataX, and SQL adaptations that dramatically improved query speed, cost, and operational efficiency.
Reconciliation is essential for enterprises to verify financial transactions, manage inventory, and assess business performance, requiring reliable cross‑system data matching among payment, order, finance, settlement, and rights systems.
1. Payment Closed‑Loop – Although end users see a simple order‑pay flow, the backend involves notifications, settlement, accounting, and report generation, forming a complex payment closed‑loop.
2. Reconciliation Architecture Evolution
2.1 Reconciliation 1.0 – All in MySQL – Data from multiple servers is synchronized to a single MySQL instance for cross‑database queries, but large tables (billions of rows) cause queries to take hours, making the approach inefficient.
2.2 Reconciliation 2.0 – Big Data (Hive) – Using ETL, data from various business systems is loaded into a data warehouse for cross‑table analysis, improving scalability but still suffering from latency and high resource consumption.
2.3 Reconciliation 2.0 Limitations – T+1 data sync and inability to handle adjustment or supplemental accounting scenarios limit its applicability.
3. Reconciliation 3.0 – StarRocks Acceleration
Facing growing data volume and finer‑grained billing requirements, the team evaluated OLAP solutions and selected StarRocks for its billion‑row join capability, sub‑second query speed, real‑time ingestion, and MySQL‑compatible protocol.
3.1 StarRocks Architecture – StarRocks replaces Hive, retaining most of the existing SQL‑based tasks with minor adjustments; scheduled jobs and data validation tasks ensure consistency between MySQL and StarRocks.
3.2 Model Selection – Choosing the appropriate data model (detail, aggregate, update, primary‑key) is crucial to avoid duplicate‑key issues that previously caused reconciliation errors.
3.3 Flink Real‑Time Synchronization – DataX handles bulk data migration, while Flink captures binlog for incremental sync; custom Flink jobs write directly to StarRocks when needed.
Key implementation steps include creating StarRocks tables, defining corresponding Flink tables, and building Flink SQL jobs to stream data.
4. SQL Adaptation – All 18 reconciliation scenarios required SQL adjustments; thanks to StarRocks’ MySQL compatibility, the effort was completed within a day, with only minor syntax changes for sub‑queries.
5. Results – The most complex reconciliation query dropped from 1 hour to 50 seconds, achieving 50‑70× performance gains, halving storage costs, and reducing daily manual effort from three people to one.
6. Summary – Manual data merging is being replaced by automated, scheduled reconciliation and reporting; leveraging StarRocks’ materialized views will further boost efficiency and sustain continuous improvement.
HomeTech
HomeTech tech sharing
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.