Design and Implementation of seadt-SAGA Distributed Transaction Framework
This article describes seadt‑SAGA, Shopee's Go‑based distributed transaction framework, which implements the SAGA pattern in two modes: orchestration, where a central coordinator drives the flow, and collaboration, where the flow logic is embedded in the business code. It explains the state‑machine design, compares vector clocks with sequence numbers for ordering rollbacks, and discusses the trade‑offs of each mode.
Abstract
seadt is a Golang‑based distributed transaction solution developed by the Shopee Financial Products team. A previous article introduced the TCC model; this article focuses on the design and implementation of seadt‑SAGA.
1. SAGA Introduction
SAGA originates from the database paper “Sagas” (Garcia‑Molina and Salem, 1987). A SAGA transaction consists of a series of short‑lived local transactions, each paired with a compensating transaction. If any step fails, the SAGA stops and the compensating transactions of the steps already executed are run, so that the system returns to a state equivalent to the one before the SAGA began.
1.1 SAGA Application Scenarios
SAGA is suitable for long‑running business processes such as cash‑loan applications, where multiple steps (quota deduction, coupon usage, insurance purchase, disbursement, etc.) must either all succeed or be compensated.
1.2 SAGA Implementation Modes
Two modes are described:
Orchestration mode: The transaction state machine is submitted to a Transaction Coordinator (TC) as a DSL definition. The TC drives forward execution, compensation, and error handling.
Collaboration mode: The business code (Transaction Manager, TM) orchestrates the forward flow itself; the TC only performs compensation.
Orchestration mode workflow
Business starts a SAGA.
TM requests TC to start the global transaction.
TC executes the first branch (e.g., quota deduction).
If the branch fails, TC triggers compensation and ends the SAGA.
If the branch succeeds, TC proceeds to the next branch (e.g., coupon usage).
Compensation is performed similarly for subsequent failures.
When all branches succeed, the SAGA ends.
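The workflow above can be sketched as a small coordinator loop. This is only an illustration, not seadt's engine: the `SubTx`, `Definition`, `Invoker`, and `Run` names are hypothetical, the JSON field names follow the DSL sample in this section, and the sketch assumes a linear, acyclic step sequence. It also assumes that the failed branch itself is compensated, because a failure such as a timeout may have been partially applied.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SubTx mirrors one entry of the DSL's "tx_seq" array.
type SubTx struct {
	ID       string `json:"sub_tx_id"`
	Commit   string `json:"commit_method"`
	Callback string `json:"callback_method"`
	Next     string `json:"next_step"`
}

// Definition mirrors the top-level DSL document.
type Definition struct {
	Name  string  `json:"seadt_name"`
	Steps []SubTx `json:"tx_seq"`
}

// Invoker is a hypothetical stand-in for the call the TC makes to a participant.
type Invoker func(method string) error

// Run starts at the first step and follows next_step until "End".
// On failure it calls the callback of every attempted step, newest first.
func Run(dsl []byte, invoke Invoker) error {
	var def Definition
	if err := json.Unmarshal(dsl, &def); err != nil {
		return fmt.Errorf("parse DSL: %w", err)
	}
	if len(def.Steps) == 0 {
		return nil
	}
	byID := make(map[string]SubTx, len(def.Steps))
	for _, s := range def.Steps {
		byID[s.ID] = s
	}

	var attempted []SubTx
	// fail compensates in reverse order and wraps the cause.
	fail := func(at string, cause error) error {
		for i := len(attempted) - 1; i >= 0; i-- {
			// A real TC would persist state and retry; this sketch ignores the error.
			_ = invoke(attempted[i].Callback)
		}
		return fmt.Errorf("saga %s rolled back at %s: %w", def.Name, at, cause)
	}

	id := def.Steps[0].ID
	for id != "End" {
		step, ok := byID[id]
		if !ok {
			return fail(id, fmt.Errorf("unknown step %q", id))
		}
		attempted = append(attempted, step)
		if err := invoke(step.Commit); err != nil {
			return fail(id, err)
		}
		id = step.Next
	}
	return nil
}
```

Because the failed step is compensated as well, the callbacks in this sketch must tolerate being called for a commit that never took effect.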
Sample code (Go) for the orchestration mode:
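import (
    "context"
    "seadt"
)

// BizFunc asks the TC to start the SAGA defined by the "BizTrans" DSL.
func BizFunc(ctx context.Context, reqID string) {
    sagaName := "BizTrans" // must match "seadt_name" in the DSL
    seadt.StartSAGATx(ctx, sagaName, reqID)
}
DSL definition (JSON‑like) used by the TC: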
// DSL pseudo‑code
{
    "seadt_name": "BizTrans",
    "tx_seq": [
        {
            "sub_tx_id": "UseQuota",
            "commit_method": "/Module_Name/UseQuota",
            "callback_method": "/Module_Name/UnUseQuota",
            "pre_step": "Start",
            "next_step": "UseCoupon"
        },
        {
            "sub_tx_id": "UseCoupon",
            "commit_method": "/Module_Name/UseCoupon",
            "callback_method": "/Module_Name/UnUseCoupon",
            "pre_step": "UseQuota",
            "next_step": "End"
        }
    ]
}
Orchestration mode advantages
Visualizable workflows with DSL.
Centralized management and monitoring.
Flexible forward/reverse retry strategies.
Participants only need to expose atomic forward and compensation interfaces.
Orchestration mode disadvantages
Steeper learning curve for state‑machine concepts.
Higher integration cost for existing services.
Complex state‑machine engine development.
Higher development effort.
2. Collaboration Mode
In this mode, the TM embeds the SAGA orchestration logic locally. The forward flow runs in the TM, while the TC only drives compensation.
Collaboration mode workflow:
TM initiates the SAGA.
TM calls quota service (branch A).
If branch A fails, TM reports failure to TC, which triggers compensation.
If branch A succeeds, TM calls coupon service (branch B).
If branch B fails, TM reports to TC, which compensates both branches.
If all succeed, the SAGA commits.
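The division of labour in this workflow can be sketched as follows. Everything here is hypothetical and in-memory (`Coordinator`, `Branch`, `RunSaga`, and `ErrBranchFailed` are not seadt APIs): the TM calls each branch itself, and the stand-in TC only records branches and compensates them, newest first, once the TM reports a failure. As in the steps above, the failed branch is compensated together with the earlier ones.

```go
package main

import "errors"

// ErrBranchFailed is a placeholder error for trying out the sketch.
var ErrBranchFailed = errors.New("branch failed")

// Compensation undoes one branch; in a real system this would be a remote call.
type Compensation func() error

// Coordinator is a hypothetical stand-in for the TC. In collaboration mode it
// never calls forward methods; it only records branches and compensates them.
type Coordinator struct {
	branches []Compensation // in registration order
}

// Register records a branch before the TM invokes its forward method.
func (c *Coordinator) Register(comp Compensation) {
	c.branches = append(c.branches, comp)
}

// Rollback compensates every registered branch, newest first.
func (c *Coordinator) Rollback() {
	for i := len(c.branches) - 1; i >= 0; i-- {
		_ = c.branches[i]() // a real TC would persist state and retry on error
	}
}

// Branch pairs a forward call with its compensation.
type Branch struct {
	Forward    func() error
	Compensate Compensation
}

// RunSaga is the TM side: it drives the forward flow itself and reports the
// first failure to the TC, which then compensates all registered branches.
func RunSaga(tc *Coordinator, branches ...Branch) error {
	for _, b := range branches {
		tc.Register(b.Compensate)
		if err := b.Forward(); err != nil {
			tc.Rollback()
			return err
		}
	}
	return nil
}
```

Registering a branch before calling it means that a branch whose outcome is unknown, for example after a timeout, is still covered by the rollback.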
Sample code for the collaboration mode:
func BizTrans(ctx context.Context, userId, loanId, couponId, principal string) {
    // The TM runs the forward flow inside the closure; the TC compensates on failure.
    saga.WithGlobalTransaction(ctx, func(ctx context.Context) {
        // Branch 1: use quota
        biz.RefAccount().UseQuota(ctx, userId, loanId, principal)
        // Branch 2: use coupon
        biz.RefPromotion().UseCoupon(ctx, userId, couponId)
        // Local DB operation (construction of record is omitted in the original)
        dao.ClFileAuthDAO().Insert(ctx, record)
    }, &seadt_model.Option{TimeOutSecond: 10, TransactionName: "loan_apply"})
}
Collaboration mode advantages
Easy integration – business code directly defines the flow.
Low integration cost; existing seadt‑TCC components can be reused.
Collaboration mode disadvantages
Lack of centralized visual management.
Business logic tightly coupled with transaction flow.
No built‑in forward retry; must be handled manually.
Ordered rollback is harder to guarantee.
3. State Machines
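Both SAGA and TCC have a global transaction state machine and a branch transaction state machine. A SAGA branch can be in one of three states: Prepared, Confirmed, or Canceled. A Confirmed branch can still transition to Canceled, because compensation is applied to a forward step that has already committed.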
State matrices for TM↔TC and RM↔RM are provided in the original article.
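As a rough illustration of the branch state machine, the sketch below encodes the three branch states and a transition check. Only the Confirmed to Canceled transition is stated in the text; the transitions out of Prepared, the terminal Canceled state, and the acceptance of a repeated state (so that retried messages stay idempotent) are assumptions of this sketch, and all names are hypothetical.

```go
package main

import "fmt"

// BranchState is the state of one SAGA branch transaction.
type BranchState int

const (
	Prepared BranchState = iota
	Confirmed
	Canceled
)

func (s BranchState) String() string {
	switch s {
	case Prepared:
		return "Prepared"
	case Confirmed:
		return "Confirmed"
	case Canceled:
		return "Canceled"
	default:
		return "Unknown"
	}
}

// allowed lists the transitions assumed by this sketch.
var allowed = map[BranchState][]BranchState{
	Prepared:  {Confirmed, Canceled},
	Confirmed: {Canceled}, // compensation of an already committed branch
	Canceled:  {},         // terminal
}

// Transition returns the new state, or an error if the move is not allowed.
// Re-applying the current state is accepted as a no-op.
func Transition(from, to BranchState) (BranchState, error) {
	if from == to {
		return from, nil
	}
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}
```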
4. Rollback Strategies
Two approaches to determine the happened‑before relationship for ordered rollback:
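Vector Clock: Captures causal ordering across nodes, so causally related branches can be rolled back in order while independent branches can be rolled back in parallel.
Transaction Sequence Number: Assigns a monotonically increasing ID to each branch when it registers with the TC. Rolling back in descending ID order guarantees a total order, but the IDs cannot tell which branches ran in parallel, so those branches are rolled back one after another as well.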
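To make the vector-clock option concrete, here is a minimal sketch of the standard comparison rule: clock a happened before clock b when every component of a is less than or equal to the matching component of b and at least one is strictly smaller. The type and function names are illustrative and do not come from seadt.

```go
package main

// VectorClock maps a node ID to the number of events observed from that node.
// A missing entry counts as zero.
type VectorClock map[string]uint64

// Order is the causal relationship between two clocks.
type Order int

const (
	Before     Order = iota // a happened before b
	After                   // b happened before a
	Equal                   // identical clocks
	Concurrent              // neither happened before the other
)

// Compare applies the component-wise vector clock rule to a and b.
func Compare(a, b VectorClock) Order {
	aLess, bLess := false, false
	for node, av := range a {
		bv := b[node]
		if av < bv {
			aLess = true
		} else if av > bv {
			bLess = true
		}
	}
	// Nodes that appear only in b: a's component is implicitly zero.
	for node, bv := range b {
		if _, ok := a[node]; !ok && bv > 0 {
			aLess = true
		}
	}
	switch {
	case aLess && bLess:
		return Concurrent
	case aLess:
		return Before
	case bLess:
		return After
	default:
		return Equal
	}
}
```

With such a comparison, branches that compare as Concurrent could be compensated in parallel, while a Before/After pair would be compensated in reverse causal order.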
The team evaluated both; the vector‑clock approach offers precise ordering but adds complexity, while the sequence‑number approach is simpler and fits the current architecture. Consequently, seadt‑SAGA adopts the transaction sequence number method.
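The sequence-number approach can be sketched in a few lines. This is an assumption-laden illustration rather than seadt's code: `Sequencer`, `Branch`, and `RollbackOrder` are hypothetical names, and the counter lives in memory, whereas a real TC would persist it with the global transaction.

```go
package main

import (
	"sort"
	"sync/atomic"
)

// Branch is a registered branch transaction.
type Branch struct {
	ID  string
	Seq uint64 // assigned at registration, strictly increasing
}

// Sequencer hands out sequence numbers for one global transaction.
type Sequencer struct {
	next uint64
}

// Register assigns the next sequence number to a branch.
func (s *Sequencer) Register(id string) Branch {
	return Branch{ID: id, Seq: atomic.AddUint64(&s.next, 1)}
}

// RollbackOrder returns a copy of branches sorted by descending sequence
// number, so the most recently registered branch is compensated first.
func RollbackOrder(branches []Branch) []Branch {
	out := make([]Branch, len(branches))
	copy(out, branches)
	sort.Slice(out, func(i, j int) bool { return out[i].Seq > out[j].Seq })
	return out
}
```

The resulting order is total, which keeps the rollback logic simple at the cost of compensating serially even those branches that originally ran in parallel.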
5. References
SAGA paper: https://github.com/mltds/sagas-report
Partial order theory: https://zh.wikipedia.org/wiki/偏序关系
Total order theory: https://zh.wikipedia.org/wiki/全序关系
Happened‑before: https://en.wikipedia.org/wiki/Happened-before
Logical and vector clocks: https://writings.sh/post/logical-clocks
Shopee Tech Team