
Distributed Transaction Solutions: XA, TCC, and SAGA with DTM Implementation

This article analyzes the challenges of distributed transactions in a micro‑service order system, compares XA, TCC, and SAGA patterns, and details how the open‑source DTM framework is applied to achieve reliable order creation while handling crashes, rollbacks, idempotency, and network anomalies.

360 Tech Engineering

1. Project Background

During the evolution of a new internal trading platform at 360 Group, many business functions (store, product, order, coupon, red-packet, user, payment, fulfillment, after-sale, etc.) were split into independent micro-services written in different languages. When a user submits an order, the backend must create the order, deduct inventory, redeem coupons, red packets, and points, and create a payment record.

In a monolith, all of these steps could run inside a single local database transaction. After the split into micro-services, they span several services and databases: a process may fail midway, some steps may need to be undone after others have already committed, and the same request may be delivered more than once. Keeping the data consistent therefore requires a distributed transaction solution.

2. Distributed Solution Options

Because order creation must be able to roll back, message-based patterns (which only drive a transaction forward to completion) were excluded, leaving three classic distributed transaction models: XA, TCC, and SAGA.

XA

XA, originally defined by Tuxedo and later standardized by X/Open, defines an interface between a global transaction manager (TM) and local resource managers (RM). Most mainstream databases (MySQL, Oracle, SQL Server, PostgreSQL) support XA.

XA works in two phases:

Prepare phase – TM asks all participants if they can commit.

Commit/Rollback phase – If all participants agree, TM sends commit; otherwise it sends rollback.

While XA guarantees strong consistency, each participant keeps its database locks until the global commit or rollback arrives, so a slow participant lowers the concurrency of every resource involved in the transaction.
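A toy, in-memory coordinator in Python makes the two phases concrete. It is not the XA interface itself (on MySQL, for instance, each participant would be driven with `XA START`, `XA END`, `XA PREPARE`, and `XA COMMIT` or `XA ROLLBACK` statements); the `Participant` class and its `can_commit` flag are hypothetical stand-ins for resource managers.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical stand-in for one resource manager (RM), e.g. one service's database."""
    name: str
    can_commit: bool = True   # the vote this RM will cast in phase 1
    state: str = "active"     # active -> prepared -> committed / rolled_back

    def prepare(self) -> bool:
        # Phase 1: a real RM durably records its changes and keeps its locks until phase 2.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"    # a real RM releases its locks here

    def rollback(self) -> None:
        self.state = "rolled_back"  # ...or here


def two_phase_commit(participants: list[Participant]) -> bool:
    """Toy transaction manager (TM): returns True if the global transaction committed."""
    # Phase 1: all() stops at the first "no", so later participants are never asked to prepare.
    all_prepared = all(p.prepare() for p in participants)
    # Phase 2: unanimous "yes" -> commit everywhere; otherwise roll back everywhere.
    for p in participants:
        if all_prepared:
            p.commit()
        else:
            p.rollback()
    return all_prepared


if __name__ == "__main__":
    orders = Participant("order_db")
    stock = Participant("stock_db", can_commit=False)  # e.g. the stock DB cannot prepare
    print(two_phase_commit([orders, stock]), orders.state, stock.state)
    # prints: False rolled_back rolled_back
```

The window between a participant's prepare and the coordinator's phase 2 decision is exactly the period during which a real database keeps its row locks.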

TCC

TCC (Try‑Confirm‑Cancel) was introduced by Pat Helland in 2007. It consists of three phases:

Try – Perform business checks and reserve required resources.

Confirm – After all tries succeed, execute the actual business logic using the reserved resources.

Cancel – If any try fails, release the reserved resources.
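To illustrate what the three phases ask of a participant, here is a hypothetical inventory service in Python that tracks a separate `frozen` quantity. The class, method, and field names are assumptions for this sketch, and state is held in memory rather than in a database.

```python
class InsufficientStock(Exception):
    """Raised by Try when the business check fails."""


class InventoryTcc:
    """Hypothetical TCC participant: Try freezes stock, Confirm consumes it, Cancel releases it."""

    def __init__(self, available: int):
        self.available = available  # stock that can still be sold
        self.frozen = 0             # stock reserved by Try but not yet confirmed

    def try_deduct(self, qty: int) -> None:
        # Try: business check plus reservation; nothing is sold yet.
        if self.available < qty:
            raise InsufficientStock(f"need {qty}, have {self.available}")
        self.available -= qty
        self.frozen += qty

    def confirm(self, qty: int) -> None:
        # Confirm: consume only what Try reserved.
        self.frozen -= qty

    def cancel(self, qty: int) -> None:
        # Cancel: return the reservation to sellable stock.
        self.frozen -= qty
        self.available += qty
```

The extra `frozen` bookkeeping is the kind of resource-reservation change that every TCC participant needs.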

Characteristics of TCC:

High concurrency, no long‑term locks.

Requires additional Try/Confirm/Cancel APIs, increasing development effort.

Provides stronger consistency than SAGA, preventing “paid-but-canceled” orders.

Best suited for order-type businesses whose intermediate states must be tightly constrained.

Our legacy codebase could not be quickly refactored to support resource reservation, so TCC was ruled out.

SAGA

SAGA, first described by Hector Garcia-Molina and Kenneth Salem in 1987, splits a long transaction into a series of short, compensatable sub-transactions coordinated by a saga orchestrator. If all sub-transactions succeed, the saga completes; otherwise the compensating actions of the completed sub-transactions are executed in reverse order.

SAGA has two phases:

Action – Execute the forward operation without reserving resources.

Compensate – If a step fails, invoke the compensating operations of the steps that have already completed to roll them back.
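The action/compensate contract can be sketched as a small in-process orchestrator. This is a simplified Python illustration, not DTM's implementation: `Step` and `run_saga` are hypothetical names, and the sketch assumes a failed action left no side effects, so only the steps that already completed are compensated, newest first.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # forward operation, no reservation
    compensate: Callable[[], None]  # semantic undo of the action


def run_saga(steps: list[Step]) -> bool:
    """Run each action in order; on the first failure, compensate completed steps newest first."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step is assumed to have changed nothing, so only completed steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True
```

A production coordinator such as DTM also persists its progress and keeps retrying compensations until they succeed; this sketch calls each compensation once.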

Characteristics of SAGA:

High concurrency, no long‑term locks.

Requires definition of both forward and compensating actions; development effort is larger than XA but smaller than TCC.

Weaker consistency – a paid‑but‑canceled order may still appear.

Because SAGA lets existing business APIs serve as the forward actions and only requires new compensating interfaces for the steps that may need rollback, it needed minimal changes to our existing business logic. We therefore selected SAGA and accepted its risk of temporary data inconsistency.

3. Practicing SAGA with DTM in Order Creation

1. Choosing DTM as the Distributed Transaction Manager

We evaluated open-source solutions and chose DTM, which is written in Go, over Seata: DTM supports both HTTP and gRPC and offers mature SDKs for multiple languages, which suits our multi-language services, whereas Seata is primarily oriented toward the Java ecosystem.

After integrating DTM, the order flow becomes:

The API creates a SAGA transaction, registers multiple branches (each with an action and a compensate endpoint), and submits the saga to DTM. DTM then executes all actions; if any action fails (e.g., insufficient inventory), DTM triggers the corresponding compensations and rolls back the global transaction.
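The submission itself is just data. The sketch below builds a saga body for the order flow: every service host and endpoint path is hypothetical, and the field names (`gid`, `trans_type`, `steps`, `payloads`) follow our reading of DTM's HTTP protocol and should be checked against the DTM documentation. In practice one of DTM's SDKs would typically construct and submit this body.

```python
import json
import uuid

# Hypothetical service hosts; real addresses would come from service discovery.
SERVICES = {
    "order": "http://order-svc",
    "inventory": "http://inventory-svc",
    "coupon": "http://coupon-svc",
    "red_packet": "http://red-packet-svc",
    "points": "http://points-svc",
    "payment": "http://payment-svc",
}

# (service, action path, compensate path) for each saga branch, in execution order.
BRANCHES = [
    ("order", "/order/create", "/order/cancel"),
    ("inventory", "/inventory/deduct", "/inventory/restore"),
    ("coupon", "/coupon/use", "/coupon/return"),
    ("red_packet", "/red-packet/use", "/red-packet/return"),
    ("points", "/points/deduct", "/points/refund"),
    ("payment", "/payment/create", "/payment/close"),
]


def build_order_saga(order: dict) -> dict:
    """Build a saga submission body: one action URL and one compensate URL per branch."""
    payload = json.dumps(order)
    return {
        "gid": uuid.uuid4().hex,  # global transaction id, generated locally for illustration
        "trans_type": "saga",
        "steps": [
            {"action": SERVICES[svc] + act, "compensate": SERVICES[svc] + comp}
            for svc, act, comp in BRANCHES
        ],
        "payloads": [payload] * len(BRANCHES),  # the same order payload goes to every branch
    }
```

Keeping the branch list as data makes the pairing of each forward call with its undo explicit and easy to review.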

Process crash handling – DTM persists the state of each global transaction, so if a service crashes mid-saga, DTM keeps retrying the unfinished branch until it succeeds.

Rollback handling – When an action fails, DTM records completed branches and invokes their compensations.
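DTM performs these retries on the server side. The retry-until-success idea itself can be sketched as a loop with capped exponential backoff; `TransientError`, the delay values, and the injectable `sleep` parameter are assumptions for illustration, not DTM settings.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying: timeouts, 5xx responses, a crashed service."""


def call_with_retry(
    call: Callable[[], str],
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Invoke `call` until it returns, backing off exponentially between attempts."""
    delay = base_delay
    while True:
        try:
            return call()
        except TransientError:
            sleep(delay)                       # wait before retrying the same branch
            delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
        # Any other exception (a definitive business failure) propagates immediately.
```

Two consequences follow. A definitive business failure such as insufficient inventory must not be classified as retryable, otherwise the saga keeps retrying instead of rolling back. And because a branch may be called more than once, every branch must be idempotent.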

Network timing uncertainty adds three more problems: duplicate requests (idempotency), empty rollback, and hanging.

2. Dealing with Duplicate Requests, Empty Rollback, and Hanging

Distributed systems face NPC issues (Network Delay, Process Pause, Clock Drift). These can cause out‑of‑order execution, leading to:

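Empty compensation (empty rollback) – A compensate request arrives although the action never ran, for example because the action request timed out before reaching the service.

Hanging – A delayed action arrives after its compensation has already run; if executed, it would consume resources that nothing will ever release.

Duplicate requests – Retries can deliver the same request more than once, so handlers must be idempotent: repeats must not produce cumulative side effects.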

To address these, we rely on a transaction‑event log that records the state of each branch.

3. DTM Solution in the Inventory Deduction Service

DTM provides a “branch barrier” mechanism. A local table dtm_barrier stores a unique key composed of global‑transaction‑id, branch‑id, and operation (action/compensate). The workflow is:

Start a local DB transaction.

Insert-ignore a row for the current operation; if no row is inserted because the key already exists, the operation has already been processed and the business logic is skipped (idempotency).

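If the operation is a compensate, also insert-ignore a row for the corresponding action. If that insert succeeds, the action never ran, so the business logic is skipped (preventing empty compensation); the new row also turns any late-arriving action into a no-op (preventing hanging).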

Execute business logic inside the transaction; commit on success, rollback on failure.

Because the Java SDK of DTM did not yet support SAGA branch barriers, we implemented the same principle ourselves.
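The barrier workflow can be sketched in Python with SQLite. This is not the team's Java implementation or DTM's SDK: `call_with_barrier` is a hypothetical name, SQLite's `INSERT OR IGNORE` stands in for the `INSERT IGNORE` described in the workflow, and the table keeps only the three key columns named in the text.

```python
import sqlite3
from typing import Callable

# Simplified barrier table: only the three key columns named in the text.
BARRIER_SCHEMA = """
CREATE TABLE IF NOT EXISTS dtm_barrier (
    gid       TEXT NOT NULL,  -- global transaction id
    branch_id TEXT NOT NULL,
    op        TEXT NOT NULL,  -- 'action' or 'compensate'
    UNIQUE (gid, branch_id, op)
)
"""


def _insert_ignore(cur: sqlite3.Cursor, gid: str, branch_id: str, op: str) -> bool:
    """Try to record (gid, branch_id, op); return True only if the row is new."""
    # SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE.
    cur.execute(
        "INSERT OR IGNORE INTO dtm_barrier (gid, branch_id, op) VALUES (?, ?, ?)",
        (gid, branch_id, op),
    )
    return cur.rowcount == 1


def call_with_barrier(
    conn: sqlite3.Connection,
    gid: str,
    branch_id: str,
    op: str,
    business: Callable[[sqlite3.Cursor], None],
) -> bool:
    """Run `business` inside the barrier; return True if the business logic actually ran."""
    with conn:  # one local transaction: commit on normal exit, roll back if business() raises
        cur = conn.cursor()
        if not _insert_ignore(cur, gid, branch_id, op):
            # Row already exists: a duplicate request, or an action arriving after its compensate.
            return False
        if op == "compensate" and _insert_ignore(cur, gid, branch_id, "action"):
            # The action never ran: skip (empty compensation); the new row also blocks a late action.
            return False
        business(cur)
        return True
```

Keeping the barrier insert and the business update in one local transaction is what makes the scheme safe: if the business logic fails, its barrier row is rolled back with it, so a later retry can run the operation again.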

Using this barrier we achieve:

Empty compensation control – If compensate runs without a prior action, the barrier prevents business logic execution.

Idempotency control – The unique key prevents duplicate inserts, ensuring an operation runs only once.

Hanging prevention – A compensate that has already run has inserted the action's row, so a late action's insert is ignored and its business logic is skipped.

4. Summary

As business complexity grows, many systems migrate from monoliths to micro-services, making distributed transactions a critical challenge; without a suitable framework, handling them adds significant development and operational overhead. By delegating transaction coordination to DTM, developers can focus on core business logic while DTM handles consistency, retries, and compensation, and its branch barrier takes care of idempotency, empty compensation, and hanging.
