
Understanding the Challenges of Distributed Transactions in Microservices

The article explains that distributed transactions in micro‑service architectures are difficult because they must guarantee atomicity across heterogeneous services, handle time‑outs, and reconcile results using unique transaction IDs, undo‑logs, or NoSQL strategies, while balancing locking, availability, and consistency trade‑offs.

Tencent Cloud Developer

Jun 25, 2024

In the era of micro‑service architecture, distributed transactions have become a widely discussed yet difficult problem. This article examines why distributed transactions are hard, focusing on the need for atomicity across multiple services and the complications introduced by high concurrency and NoSQL storage.

Problem Description

Distributed transactions require that all sub‑operations either succeed together or have no effect. The core difficulty lies in handling time‑outs: a transaction may be partially completed, and without clear knowledge of each sub‑task’s outcome, it is impossible to decide the correct follow‑up action.
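To make the timeout problem concrete, here is a minimal Python sketch of how a caller might classify a sub-task call. The `Outcome` enum, `call_with_timeout` helper, and the two sample tasks are hypothetical names introduced for illustration; the point is only that a timeout yields a third state, "unknown", which is neither success nor failure.

```python
import concurrent.futures
import enum
import time


class Outcome(enum.Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"  # timed out: the sub-task may or may not have taken effect


def call_with_timeout(task, timeout_s):
    """Run task() in a worker thread and classify the result as the caller sees it."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task)
    try:
        future.result(timeout=timeout_s)
        return Outcome.SUCCEEDED
    except concurrent.futures.TimeoutError:
        # The worker is still running; the caller cannot tell how it will end.
        return Outcome.UNKNOWN
    except Exception:
        return Outcome.FAILED
    finally:
        pool.shutdown(wait=False)


def slow_task():
    """Hypothetical sub-task that outlives the caller's patience."""
    time.sleep(0.2)


def failing_task():
    """Hypothetical sub-task that fails with a clear error."""
    raise ValueError("rejected")
```

Only the `UNKNOWN` case needs reconciliation: the other two already tell the caller what to do next.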

Key Challenges

Identifying the real execution status of each sub‑task (A and B).

Providing a reliable way to reconcile the results, that is, a written record of what each sub‑task actually did.

Avoiding confusion with consensus protocols such as Paxos or Raft, which address different problems.

Illustration of Consensus vs. Distributed Transaction

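A consensus protocol has several nodes agree on the same task:

Node 1 completes task 1
Node 2 completes task 1
Node 3 completes task 1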

In contrast, a distributed transaction binds multiple distinct tasks:

Node 1 completes task 1
Node 1 completes task 2
Node 1 completes task 3

Timeout Handling and Reconciliation

The article proposes a "written record" approach: each sub‑task must generate a unique transaction ID that can be used for later reconciliation.
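As a rough sketch of how such a record could drive the follow-up decision after a timeout, the Python below has each participant remember the transaction IDs it applied and lets a coordinator ask about them. The `Participant` class, the `reconcile` function, and the three status strings are assumptions made for illustration, not part of any framework.

```python
import uuid


class Participant:
    """Toy sub-task owner that records every transaction ID it has applied."""

    def __init__(self):
        self.applied = {}  # txid -> payload; stands in for a durable record

    def execute(self, txid, payload):
        # The record is written together with the effect itself.
        self.applied[txid] = payload

    def query(self, txid):
        """Reconciliation lookup: did this transaction ID take effect here?"""
        return txid in self.applied


def new_txid():
    """Generate a unique transaction ID (here simply a random UUID)."""
    return uuid.uuid4().hex


def reconcile(txid, participants):
    """Decide the follow-up action from what each participant recorded."""
    applied = [p.query(txid) for p in participants]
    if all(applied):
        return "committed"    # nothing more to do
    if not any(applied):
        return "not_started"  # safe to abandon or retry from scratch
    return "partial"          # retry the missing sub-tasks or compensate the others
```

Without the per-participant `query`, the coordinator could not tell the three cases apart.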

1. Local Log Persistence

Logging each operation locally and collecting logs centrally can help, but it still cannot determine whether a DB operation succeeded without additional information such as WAL (Write‑Ahead Log) entries.

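2. SQL Undo‑Log Based Reconciliation

Frameworks like Seata write a row to an undo‑log table within the same local transaction as the business SQL, so the coordinator can query the undo log by transaction ID to learn whether the original SQL committed.

begin local transaction
  add business SQL: the SQL for the specific business logic
  add undo-log table SQL: insert one undo-log row whose key contains the transaction ID // used for reverse lookup on this table
commit local transaction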

3. NoSQL Persistence

If the NoSQL store lacks transaction support, two strategies are suggested:

Attach the transaction ID to the operation log and provide a real‑time log‑query API.

Embed the transaction ID array directly in the stored value (e.g., a protobuf definition).

syntax = "proto3";

message ServiceTable {
  repeated string txids = 1; // transaction ID array, kept as a rolling window of a fixed size
  int32 field1 = 2;
  int32 field2 = 3;
  int32 field3 = 4;
  int32 field4 = 5;
  // ...
}
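A minimal Python sketch of the embedded-ID strategy follows, using a plain dict in place of the NoSQL store. The window size `MAX_TXIDS`, the helper names, and the single `field1` update are assumptions for illustration. A real store would also need a compare-and-set or similar guard around the read-modify-write, which this sketch omits.

```python
MAX_TXIDS = 3  # assumed rolling-window size


def apply_update(store, key, txid, delta):
    """Apply delta to field1 at most once per txid.

    The new value and the updated txid window are written in one single-key
    put, so they become visible together.
    """
    record = store.get(key, {"txids": [], "field1": 0})
    if txid in record["txids"]:
        return False  # already applied: a retry is a no-op
    store[key] = {
        "txids": (record["txids"] + [txid])[-MAX_TXIDS:],  # keep only the newest IDs
        "field1": record["field1"] + delta,
    }
    return True


def txid_recorded(store, key, txid):
    """Reconciliation lookup against the IDs embedded in the stored value."""
    return txid in store.get(key, {}).get("txids", [])
```

One consequence of the rolling window is that an ID older than the last `MAX_TXIDS` writes can no longer be found, so reconciliation has to happen before the ID rolls out.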

4. External Interface Calls

When a distributed transaction involves external services, the external side must also provide reconciliation and idempotency capabilities.
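What "reconciliation and idempotency capabilities" might look like on the external side can be sketched as below. The `ExternalService` class and its `charge` and `lookup` methods are hypothetical; a real external system would expose equivalents over its own API.

```python
class ExternalService:
    """Toy external service with an idempotent call and a reconciliation query."""

    def __init__(self):
        self._results = {}      # txid -> result of the first successful call
        self.total_charged = 0

    def charge(self, txid, amount):
        """Idempotent: replaying the same txid returns the stored result."""
        if txid in self._results:
            return self._results[txid]
        self.total_charged += amount
        result = {"txid": txid, "charged": amount}
        self._results[txid] = result
        return result

    def lookup(self, txid):
        """Reconciliation query: None means the txid never took effect."""
        return self._results.get(txid)
```

With both in place, a caller that timed out can either ask `lookup(txid)` or simply call `charge` again with the same ID.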

5. Typical Complex Micro‑service Scenario

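begin transaction
  task 1: modify data A // SQL storage
  task 2: modify data B // NoSQL storage
  task 3: call in-system service interface C // possibly a downstream (callee) service
  task 4: call service interface D owned by another department
end transaction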

Large‑scale systems may need to split such a transaction into multiple parallel sub‑transactions, each with its own locking and coordination strategy.

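// four kinds of data: A, B, C, D
parallel transaction 1: modify A, B, C
parallel transaction 2: modify B, D
parallel transaction 3: modify B, A
parallel transaction 4: modify C, D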
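The four overlapping transactions above touch shared data in different orders (for example A, B, C versus B, A), which is a classic recipe for deadlock if each one locks items as it goes. One common mitigation, shown here as an assumption rather than something the article prescribes, is to acquire locks in a fixed global order. In-process `threading.Lock` objects stand in for whatever distributed lock a real system would use.

```python
import threading

LOCKS = {name: threading.Lock() for name in "ABCD"}
DATA = {name: 0 for name in "ABCD"}


def run_transaction(keys):
    """Lock the touched items in sorted order, update them, then release."""
    ordered = sorted(set(keys))
    for k in ordered:
        LOCKS[k].acquire()
    try:
        for k in keys:
            DATA[k] += 1  # placeholder for the real modification
    finally:
        for k in reversed(ordered):
            LOCKS[k].release()


def run_all():
    """Run the four parallel transactions from the example concurrently."""
    plans = [["A", "B", "C"], ["B", "D"], ["B", "A"], ["C", "D"]]
    threads = [threading.Thread(target=run_transaction, args=(p,)) for p in plans]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(DATA)
```

Because every transaction requests A before B before C before D, no two of them can each hold a lock the other is waiting for.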

Service dependency graphs further complicate matters:

SvrE ---|
       SvrB----\
       SvrC--- SvrA data table A
       SvrD----/

6. Locking vs. Availability Trade‑off

When a timeout occurs, one can sacrifice availability by locking the involved user/resource until the transaction fully completes (a TCC‑style approach).

try phase: acquire the lock
confirm/commit ... repeat until the outcome is definitely OK
unlock only after completion is confirmed
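The lock-until-confirmed flow can be sketched as follows. The `Resource` class, `try_phase`, and `confirm_until_ok` are hypothetical names, and the sketch assumes the `confirm` callback is idempotent, since it may run several times for the same transaction ID.

```python
class Resource:
    """Toy user or resource that can be locked by one transaction at a time."""

    def __init__(self):
        self.locked_by = None


def try_phase(res, txid):
    """Try phase: take the lock, or refuse if another transaction holds it."""
    if res.locked_by not in (None, txid):
        return False
    res.locked_by = txid
    return True


def confirm_until_ok(res, txid, confirm, max_attempts=5):
    """Retry confirm(txid); release the lock only after a definite success."""
    for _ in range(max_attempts):
        try:
            confirm(txid)
        except TimeoutError:
            continue  # outcome unknown: keep the lock and try again
        res.locked_by = None
        return True
    return False  # still locked; left for a background retry or an operator
```

The availability cost is visible in the last line: while the outcome stays unknown, every other transaction on that resource is turned away.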

Alternatively, for scenarios where strict locking is undesirable, eventual consistency with idempotent operations (e.g., setting a VIP flag) can be used.
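For the lock-free alternative, the sketch below retries an idempotent operation until it is known to have succeeded. The `set_vip` and `retry_until_done` helpers are hypothetical, and `ConnectionError` merely stands in for a lost response.

```python
def set_vip(users, user_id):
    """Idempotent: setting the flag twice leaves the same state as setting it once."""
    users.setdefault(user_id, {})["vip"] = True


def retry_until_done(op, attempts):
    """Re-run an idempotent operation until one attempt returns without error.

    Returns the number of the successful attempt, or None if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            op()
            return attempt
        except ConnectionError:
            continue  # the write may already have landed; retrying is harmless
    return None
```

Retrying is safe here only because repeating `set_vip` cannot change the result; an operation such as "add 30 days of VIP" would need a transaction ID check first.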

7. Preconditions for Using Distributed Transactions

Each sub‑task must support idempotency and reconciliation.

A clear transaction‑ID‑based locking mechanism must be defined.

The business’s consistency requirements must be understood (strong vs. eventual).

Prefer low‑intrusiveness solutions and evaluate cost‑benefit.

Conclusion

Distributed transactions are inherently complex; they require unique transaction IDs, idempotent and reconcilable sub‑tasks, and careful lock management. When the business truly needs strong consistency, appropriate frameworks (TCC, 2PC, etc.) can be employed, but only after thorough analysis of trade‑offs, performance impact, and operational overhead.

