Understanding the Challenges of Distributed Transactions in Microservices
The article explains that distributed transactions in micro‑service architectures are difficult because they must guarantee atomicity across heterogeneous services, handle time‑outs, and reconcile results using unique transaction IDs, undo‑logs, or NoSQL strategies, while balancing locking, availability, and consistency trade‑offs.
In the era of micro‑service architecture, distributed transactions have become a widely discussed yet difficult problem. This article examines why distributed transactions are hard, focusing on the need for atomicity across multiple services and the complications introduced by high concurrency and NoSQL storage.
Problem Description
Distributed transactions require that all sub‑operations either succeed together or have no effect. The core difficulty lies in handling time‑outs: a transaction may be partially completed, and without clear knowledge of each sub‑task’s outcome, it is impossible to decide the correct follow‑up action.
Key Challenges
Identifying the real execution status of each sub‑task (for example, sub‑tasks A and B of one transaction).
Providing a reliable way to reconcile the results, i.e. keeping a written record ("立字据").
Avoiding confusion with consensus protocols such as Paxos or Raft, which address different problems.
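The first challenge can be made concrete with a small sketch. It assumes each sub-task call ends in one of three states, where a timeout maps to an "unknown" state; the `Outcome` enum and `next_action` function are hypothetical names used only for illustration.

```python
import enum


class Outcome(enum.Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"  # timed out: the work may or may not have been applied


def next_action(outcome_a: Outcome, outcome_b: Outcome) -> str:
    """Decide the follow-up for a two-step transaction (sub-tasks A and B)."""
    outcomes = (outcome_a, outcome_b)
    if Outcome.UNKNOWN in outcomes:
        # Neither commit nor rollback is safe until the real status is known.
        return "reconcile"
    if all(o is Outcome.SUCCEEDED for o in outcomes):
        return "done"
    if all(o is Outcome.FAILED for o in outcomes):
        return "nothing to undo"
    return "compensate"  # one side applied, the other did not
```

As long as any sub-task is in the unknown state, the only defensible action in this sketch is to reconcile first, which is why the rest of the article focuses on making reconciliation possible.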
Illustration of Consensus vs. Distributed Transaction
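In consensus, multiple nodes carry out the same task:
Node 1 completes task 1
Node 2 completes task 1
Node 3 completes task 1
In contrast, a distributed transaction binds multiple distinct tasks:
Node 1 completes task 1
Node 1 completes task 2
Node 1 completes task 3
Timeout Handling and Reconciliation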
The article proposes a "record" ("立字据") approach: each sub‑task must generate a unique transaction ID that can be used for later reconciliation.
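A minimal sketch of this idea follows, assuming UUIDs as the transaction ID format and an in-memory receipt table. `new_txid`, `SubTask`, and `was_applied` are hypothetical names, and a real service would have to persist the receipt atomically with the state change.

```python
import uuid


def new_txid() -> str:
    """Create a globally unique transaction ID (UUID4 is one common choice)."""
    return uuid.uuid4().hex


class SubTask:
    """Hypothetical sub-task that keeps a receipt for every txid it applied."""

    def __init__(self):
        self.balance = 0
        self._receipts = {}  # txid -> amount applied

    def apply(self, txid, amount):
        # Assumption: in a real service both writes are persisted atomically.
        self.balance += amount
        self._receipts[txid] = amount

    def was_applied(self, txid):
        """Reconciliation query: did this txid take effect here?"""
        return txid in self._receipts


task = SubTask()
txid = new_txid()
task.apply(txid, 100)  # imagine the caller timed out waiting for this reply
```

After the timeout, the caller can ask `task.was_applied(txid)` instead of guessing, which turns an unknown outcome into a definite one.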
1. Local Log Persistence
Logging each operation locally and collecting logs centrally can help, but it still cannot determine whether a DB operation succeeded without additional information such as WAL (Write‑Ahead Log) entries.
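begin local transaction
add business SQL: the SQL for the concrete business logic
add undo-log table SQL: insert one undo-log row whose key contains the transaction ID // this table is queried later for reverse lookup
commit local transaction
2. SQL Undo‑Log Based Reconciliation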
Frameworks like Seata create an undo‑log table within the same local transaction, allowing the coordinator to query the undo‑log to know whether the original SQL succeeded.
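The undo-log idea can be sketched with Python's built-in `sqlite3` module. The `account` and `undo_log` tables, their columns, and the `debit` and `was_applied` helpers are assumptions made for this illustration; this is not Seata's actual schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE undo_log (txid TEXT PRIMARY KEY, before_balance INTEGER NOT NULL);
    INSERT INTO account (id, balance) VALUES (1, 100);
""")


def debit(conn, txid, account_id, amount):
    """Apply the business SQL and its undo-log row in one local transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        (before,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO undo_log (txid, before_balance) VALUES (?, ?)",
            (txid, before),
        )


def was_applied(conn, txid):
    """Coordinator-side check: the row exists only if the debit committed."""
    row = conn.execute(
        "SELECT 1 FROM undo_log WHERE txid = ?", (txid,)
    ).fetchone()
    return row is not None


debit(conn, "tx-001", 1, 30)
```

Because both statements commit or roll back together, the presence of the `undo_log` row is reliable evidence that the business update took effect. In this sketch the primary key on `txid` also makes a retried `debit` with the same ID fail and roll back rather than debit twice.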
3. NoSQL Persistence
If the NoSQL store lacks transaction support, two strategies are suggested:
Attach the transaction ID to the operation log and provide a real‑time log‑query API.
Embed the transaction ID array directly in the stored value (e.g., a protobuf definition).
syntax = "proto3";

message ServiceTable {
  repeated string txids = 1; // array of transaction IDs, rolled over at a fixed size
  int32 field1 = 2;
  int32 field2 = 3;
  int32 field3 = 4;
  int32 field4 = 5;
  // ...
}
4. External Interface Calls
When a distributed transaction involves external services, the external side must also provide reconciliation and idempotency capabilities.
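One way an external service could offer both capabilities is to remember recently applied transaction IDs in a bounded, rolling window, much like the rolling ID array described for NoSQL values. The sketch below is an assumption-laden illustration: `ExternalService`, `add_points`, and `query` are hypothetical names, and the state lives in memory only.

```python
from collections import deque


class ExternalService:
    """Hypothetical external endpoint that is idempotent and reconcilable."""

    def __init__(self, max_txids=1000):
        self.points = 0
        # Rolling window of recently applied txids, oldest evicted first.
        self._applied = deque(maxlen=max_txids)

    def add_points(self, txid, amount):
        """Apply at most once per txid; a retry with the same txid is a no-op."""
        if txid in self._applied:
            return "duplicate"
        self.points += amount
        self._applied.append(txid)
        return "applied"

    def query(self, txid):
        """Reconciliation endpoint: report whether this txid took effect."""
        return txid in self._applied


svc = ExternalService()
svc.add_points("tx-42", 10)
svc.add_points("tx-42", 10)  # retry after a timeout: no double credit
```

The window size is a trade-off: once a transaction ID has been evicted, the service can no longer recognize a late retry, so reconciliation has to finish while the ID is still in the window.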
5. Typical Complex Micro‑service Scenario
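begin transaction
task 1: modify data A // SQL storage
task 2: modify data B // NoSQL storage
task 3: call in-system service interface C // may itself be a called service
task 4: call service interface D owned by another department
end transaction
Large‑scale systems may need to split such a transaction into multiple parallel sub‑transactions, each with its own locking and coordination strategy.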
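// four kinds of data: A, B, C, D
parallel transaction 1: modify A, B, C
parallel transaction 2: modify B, D
parallel transaction 3: modify B, A
parallel transaction 4: modify C, D
Service dependency graphs further complicate matters: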
SvrE ---|
SvrB----\
SvrC--- SvrA data table A
SvrD----/
6. Locking vs. Availability Trade‑off
When a timeout occurs, one can sacrifice availability by locking the involved user/resource until the transaction fully completes (a TCC‑style approach).
try phase: acquire the lock
confirm / commit ... repeat until the result is known to be OK
unlock only after completion is confirmed
Alternatively, for scenarios where strict locking is undesirable, eventual consistency with idempotent operations (e.g., setting a VIP flag) can be used.
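The lock-based option can be sketched as follows, assuming an in-memory lock table and a confirm step that signals a timeout by raising `TimeoutError`. `UserLocks` and `run_tcc` are hypothetical names, and the `confirm` callable is assumed to be idempotent so that retrying it is safe.

```python
class UserLocks:
    """Hypothetical in-memory lock table keyed by user ID."""

    def __init__(self):
        self._owner = {}  # user_id -> txid currently holding the lock

    def try_lock(self, user_id, txid):
        # Re-entrant for the same txid so a retried try phase stays idempotent.
        return self._owner.setdefault(user_id, txid) == txid

    def unlock(self, user_id, txid):
        if self._owner.get(user_id) == txid:
            del self._owner[user_id]


def run_tcc(locks, user_id, txid, confirm, max_attempts=3):
    """Try: lock. Confirm: retry. Unlock only once confirm is known to be OK."""
    if not locks.try_lock(user_id, txid):
        return "rejected"  # another transaction holds this user
    for _ in range(max_attempts):
        try:
            confirm()
        except TimeoutError:
            continue  # outcome unknown: keep the lock and try again
        locks.unlock(user_id, txid)
        return "confirmed"
    return "pending"  # still locked; a background job must finish the confirm
```

The "pending" result is where availability is sacrificed: the user stays locked, and new transactions for that user are rejected until some later retry or reconciliation job confirms the outcome and releases the lock.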
7. Preconditions for Using Distributed Transactions
Each sub‑task must support idempotency and reconciliation.
A clear transaction‑ID‑based locking mechanism must be defined.
The business’s consistency requirements must be understood (strong vs. eventual).
Prefer low‑intrusiveness solutions and evaluate cost‑benefit.
Conclusion
Distributed transactions are inherently complex; they require unique transaction IDs, idempotent and reconcilable sub‑tasks, and careful lock management. When the business truly needs strong consistency, appropriate approaches (TCC, 2PC, etc.) can be employed, but only after thorough analysis of trade‑offs, performance impact, and operational overhead.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.