Product Middle Platform Workflow Orchestration Engine: Use Cases, Architecture, and High‑Availability Solutions
Tencent’s product middle platform employs a self‑built, stateless workflow orchestration engine—configurable via drag‑and‑drop or DSL—to coordinate massive product processing and audit tasks, using load‑balancing, retry, rate‑limiting, circuit‑breaker and service isolation strategies that ensure high availability, performance, and horizontal scalability on TKE.
This article introduces the Tencent Advertising product middle platform and explains how its self‑built workflow orchestration engine handles a range of business scenarios while delivering high availability, high performance, and high scalability (the "three highs"), improving development efficiency and system stability.
Use Cases
1. Product processing: Manages a catalog of nearly 4 billion products, with a daily processing volume of more than 80 million items. The engine coordinates tasks such as category recognition, attribute extraction, brand identification, tagging, understanding, auditing, image/video transfer, and creative generation.
2. Product audit governance: Provides a comprehensive set of audit units (public feature audit, baseline style audit, low‑quality style audit, infringement audit, prohibited content audit, etc.) and supports asynchronous human‑review callbacks within the workflow.
Why Use a Workflow Orchestration Engine?
Complex business processes involve multiple services that must be coordinated for sequencing, parallelism, timeout handling, retries, circuit breaking, and distributed consistency. An orchestration engine abstracts these concerns, allowing developers to focus on high‑cohesion, low‑coupling service nodes.
Building a Workflow
Two approaches are provided:
• Visual drag‑and‑drop editing: Users compose tasks (e.g., SCF (Serverless Cloud Function), Kafka producer, HTTP interface) by dragging nodes in the console. An example shows a workflow with an image‑save task followed by an audit task.
• Code‑based creation using a DSL: The workflow is defined in a JSON‑like DSL. The following example defines a Parallel state whose single branch runs an image‑save task followed by an audit task; once the branch completes, the workflow proceeds to a final DAG end node.
{
  "Comment": "Business A",
  "StartAt": "Parallel",
  "States": {
    "Parallel": {
      "Type": "Parallel",
      "Next": "FinalState",
      "Branches": [
        {
          "StartAt": "ImageSave",
          "States": {
            "ImageSave": {
              "Type": "Task",
              "Comment": "Image transfer",
              "Resource": "resource address; supports the HTTP, Kafka, and Serverless protocols",
              "Next": "NewAudit"
            },
            "NewAudit": {
              "Type": "Task",
              "Comment": "Audit workflow",
              "Resource": "resource address; supports the HTTP, Kafka, and Serverless protocols",
              "InputPath": "$.inputAswStr",
              "End": true
            }
          }
        }
      ]
    },
    "FinalState": {
      "Type": "Task",
      "Comment": "DAG end node",
      "Resource": "resource address; supports the HTTP, Kafka, and Serverless protocols",
      "End": true
    }
  }
}
Engine Architecture
The engine runs on Tencent Cloud TKE using Docker containers and is stateless, enabling dynamic scaling based on resource pressure.
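To illustrate how an engine might interpret the DSL shown earlier, here is a minimal Python sketch that walks the state machine and lists the order in which tasks would be visited. It is not the platform's implementation: the function name `execution_order` and the placeholder `Resource` values are assumptions made for illustration, and the branches of a Parallel state are flattened in listed order instead of being run concurrently.

```python
# Illustrative only: a tiny walker for the DSL structure shown in this article.
# The "Resource" values are placeholders, not real addresses.
WORKFLOW = {
    "StartAt": "Parallel",
    "States": {
        "Parallel": {
            "Type": "Parallel",
            "Next": "FinalState",
            "Branches": [
                {
                    "StartAt": "ImageSave",
                    "States": {
                        "ImageSave": {"Type": "Task", "Resource": "placeholder", "Next": "NewAudit"},
                        "NewAudit": {"Type": "Task", "Resource": "placeholder", "End": True},
                    },
                }
            ],
        },
        "FinalState": {"Type": "Task", "Resource": "placeholder", "End": True},
    },
}


def execution_order(machine):
    """Return task names in the order a simple sequential walk visits them."""
    order = []
    name = machine["StartAt"]
    while name is not None:
        state = machine["States"][name]
        if state["Type"] == "Parallel":
            # A real engine would dispatch branches concurrently;
            # this sketch simply walks them one after another.
            for branch in state["Branches"]:
                order.extend(execution_order(branch))
        else:
            order.append(name)
        # Stop at an End state; otherwise follow the Next pointer.
        name = None if state.get("End") else state.get("Next")
    return order


print(execution_order(WORKFLOW))
```

Because each state only names its successor, a stateless engine instance can resume a workflow from any state as long as the current state name and its input are persisted elsewhere.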
High‑Availability (Three‑High) Solutions
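1. Load‑balancing strategies – round‑robin, least‑connection, hash, random, and weighted round‑robin. The scheduler distributes DAG tasks to executors according to the selected policy and checks executor health every second. When selection is weighted by load, the probability of choosing an executor is proportional to its remaining capacity, as in this example with three executors at loads a%, b%, and c%: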
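| Executor | Load | Remaining Capacity | Selection Probability |
|----------|------|--------------------|-----------------------|
| 1 | a% | 100% - a% | (100 - a) / [(100 - a) + (100 - b) + (100 - c)] |
| 2 | b% | 100% - b% | (100 - b) / [(100 - a) + (100 - b) + (100 - c)] |
| 3 | c% | 100% - c% | (100 - c) / [(100 - a) + (100 - b) + (100 - c)] |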
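The selection probabilities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scheduler's actual code: the function names `selection_probabilities` and `pick_executor` and the sample load values are hypothetical, and loads are assumed to be percentages between 0 and 100.

```python
import random


def selection_probabilities(loads):
    """Map executor id -> probability proportional to remaining capacity.

    `loads` maps executor id -> current load in percent (0 to 100).
    """
    # Remaining capacity is 100 minus the load, clamped so it is never negative.
    remaining = {ex: max(0.0, 100.0 - load) for ex, load in loads.items()}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no executor has remaining capacity")
    return {ex: cap / total for ex, cap in remaining.items()}


def pick_executor(loads, rng=random):
    """Draw one executor at random, weighted by remaining capacity."""
    probs = selection_probabilities(loads)
    executors = list(probs)
    weights = [probs[ex] for ex in executors]
    return rng.choices(executors, weights=weights, k=1)[0]


# Sample values (assumed): executors 1, 2, 3 at 80%, 50%, and 20% load.
loads = {1: 80.0, 2: 50.0, 3: 20.0}
probs = selection_probabilities(loads)
print(probs)
print(pick_executor(loads))
```

With these sample loads the remaining capacities are 20, 50, and 80, so the least loaded executor is chosen with probability 80/150, while busier executors still receive some traffic.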
2. Interface retry strategy – detects error codes, decides whether to retry, and configures retry interval, max attempts, and back‑off rate. Example configuration:
{
  "Comment": "Business A",
  "StartAt": "unit_a",
  "States": {
    "unit_a": {
      "Type": "Task",
      "Comment": "Audit unit A",
      "Resource": "resource address; supports the HTTP, Kafka, and Serverless protocols",
      "Retry": [
        {
          "ErrorEquals": ["StatesTimeout"],
          "IntervalSeconds": 1,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        }
      ],
      "End": true
    }
  }
}
3. Rate limiting and circuit breaking – rate limiting prevents overload, and circuit breaking fails fast on calls to faulty services, which avoids cascading failures. The engine also uses message‑queue buffering for back‑pressure.
4. Service isolation – clusters are physically isolated along hot‑spot and user‑scenario dimensions, so that a failure in one service does not affect the others.
Performance and Scalability
The engine has undergone extensive stress testing, with optimizations to storage I/O, large‑object handling, concurrency, and component usage. Because it is stateless and containerized, it supports automatic horizontal scaling on TKE.
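To make the retry configuration from strategy 2 concrete, the following Python sketch shows one plausible reading of `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`. It rests on assumptions the article does not confirm: `MaxAttempts` is taken to count retries after the first call, each wait is the previous wait multiplied by `BackoffRate`, and the exception class `StatesTimeout` and the helper names are hypothetical stand-ins for the engine's error code and internals.

```python
import time


class StatesTimeout(Exception):
    """Hypothetical stand-in for the engine's "StatesTimeout" error code."""


def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Waits before each retry: interval, interval * rate, interval * rate^2, ..."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]


def call_with_retry(task, interval_seconds=1, max_attempts=2, backoff_rate=2.0,
                    retry_on=(StatesTimeout,), sleep=time.sleep):
    """Call `task`, retrying on the listed errors with exponential back-off."""
    for delay in retry_delays(interval_seconds, max_attempts, backoff_rate):
        try:
            return task()
        except retry_on:
            # Only errors listed in `retry_on` are retried; others propagate.
            sleep(delay)
    # Final attempt after all retries are used up; any error propagates.
    return task()


# With the values from the configuration above (1, 2, 2.0):
print(retry_delays(1, 2, 2.0))
```

Under this reading, the configuration above allows at most three calls in total, with waits of 1 second and then 2 seconds between them. Passing a different `sleep` function makes the helper easy to exercise without real delays.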
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.