Why Orchestrator Beats Agentic Loop: Architecture of LLM Decision‑Execution Separation

The Orchestrator pattern reduces LLM calls from seven to two for a three-agent query, cutting latency from 4.2 s to 1.1 s and cost from $0.12 to $0.03 (about 75%). It does so by separating routing and synthesis from deterministic execution, and it supports single, parallel, and sequential agent strategies.

DeepHub IMBA

Cost and Latency Comparison

A simple agentic loop is a while loop in which the LLM decides, calls a tool, observes the result, and decides again. For a three-agent query this takes seven LLM calls, 4.2 seconds of latency, and $0.12. The orchestrator handles the same query with only two LLM calls, in 1.1 seconds, for $0.03: a cost reduction of about 75%.
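The percentages behind these figures can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted above; the helper name `pct_reduction` is illustrative.

```python
def pct_reduction(before: float, after: float) -> float:
    """Return the percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# Figures quoted in the comparison: (agentic loop, orchestrator)
figures = {
    "llm_calls": (7, 2),
    "latency_s": (4.2, 1.1),
    "cost_usd": (0.12, 0.03),
}

# Round to whole percentages for readability
reductions = {
    name: round(pct_reduction(before, after))
    for name, (before, after) in figures.items()
}
print(reductions)
```

On these inputs the call count drops by roughly 71%, latency by roughly 74%, and cost by roughly 75%.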

Orchestrator Architecture Overview

User Query
    ↓
[STEP 1: ROUTE]   ← one LLM call: "Which agents should handle this?"
    ↓
[STEP 2: EXECUTE] ← deterministic Python code calls the selected agents
    ↓
[STEP 3: SYNTHESIZE] ← one LLM call: "Compose a final answer"
    ↓
Final Answer

Only the routing and synthesis stages involve the LLM; execution is pure Python and therefore free of LLM latency and cost.

Agent Registry as Discovery Protocol

REGISTRY = {
    "data_agent__get_report": {
        "agent": "Data Agent",
        "description": "Fetch the latest report for a given entity",
        "execute": get_report,
    },
    "analytics_agent__get_trends": {
        "agent": "Analytics Agent",
        "description": "Get historical trends and anomaly detection",
        "execute": get_trends,
    },
    "config_agent__check_config": {
        "agent": "Config Agent",
        "description": "Check system configuration for a given component",
        "execute": check_config,
    },
}

In production the registry can be stored in Redis or a database; agents register via HTTP POST.
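The article does not show how agents are added to the registry or how TOOL_DEFINITIONS (used by the router below) is built. One possible approach, sketched here under assumptions, is an in-memory `register` decorator plus a `build_tool_definitions` helper that derives function-tool schemas from the registry. Both helper names are hypothetical, the agent body is a stub, and the empty `parameters` schema assumes agents take no arguments, as the `execute()` calls in this article do.

```python
from typing import Any, Callable, Dict, List

REGISTRY: Dict[str, Dict[str, Any]] = {}


def register(name: str, agent: str, description: str) -> Callable:
    """Decorator that adds a callable to the in-memory registry (hypothetical helper)."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = {"agent": agent, "description": description, "execute": fn}
        return fn
    return decorator


@register("data_agent__get_report", "Data Agent",
          "Fetch the latest report for a given entity")
def get_report() -> dict:
    # Stub payload; a real agent would query a data source.
    return {"status": "ok", "report": "stub"}


def build_tool_definitions(registry: Dict[str, Dict[str, Any]]) -> List[dict]:
    """Convert registry entries into function-tool schemas for the router call."""
    return [
        {
            "type": "function",
            "function": {
                "name": name,
                "description": entry["description"],
                # Assumption: agents take no arguments in this sketch.
                "parameters": {"type": "object", "properties": {}},
            },
        }
        for name, entry in registry.items()
    ]


TOOL_DEFINITIONS = build_tool_definitions(REGISTRY)
```

Deriving the tool list from the registry keeps the router's view of available agents in sync with what the executor can actually call.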

Step 1 – Deterministic Routing LLM Call

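The router runs with temperature=0.0 so that tool selection is as repeatable as possible (a temperature of zero minimizes sampling variance, although hosted models may still not return byte-identical output every time). Its system prompt instructs the LLM to choose tools only and never answer the user directly.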

SYSTEM_PROMPT = """You are a query router. Your ONLY job is to decide which tool(s) to call.

Rules:
- If the query needs ONE agent, call that one tool.
- If the query needs MULTIPLE INDEPENDENT agents, call all of them.
- If the query needs steps IN ORDER, call plan_execution.

Do NOT answer the user's question — just pick tools."""

response = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": query}],
    tools=TOOL_DEFINITIONS,
    tool_choice="auto",
    temperature=0.0,
)

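# Tool calls live on the first choice's message; the field is None if the model picked no tool
tool_calls = response.choices[0].message.tool_calls or []
tool_names = [tc.function.name for tc in tool_calls]
if not tool_names:
    raise ValueError("router returned no tool calls")
if "plan_execution" in tool_names:
    mode = "sequential"
elif len(tool_names) == 1:
    mode = "single"
else:
    mode = "parallel"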

The LLM returns a structured tool‑call list: a single tool → single mode; multiple tools → parallel mode; the special plan_execution tool → sequential mode.
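The mapping from tool calls to execution mode is easy to isolate into a pure function, which makes it testable without calling the LLM. The sketch below assumes a helper named `select_mode` (not part of the original article) and adds a `"none"` result for the case where the router picks no tool, which the article does not cover.

```python
from typing import List


def select_mode(tool_names: List[str]) -> str:
    """Map the router's tool-call names to an execution mode."""
    if not tool_names:
        # Assumption: the caller decides how to handle a router that picked nothing.
        return "none"
    if "plan_execution" in tool_names:
        return "sequential"
    if len(tool_names) == 1:
        return "single"
    return "parallel"


print(select_mode(["data_agent__get_report"]))
print(select_mode(["data_agent__get_report", "analytics_agent__get_trends"]))
print(select_mode(["plan_execution"]))
```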

Step 2 – Execution Engine (No LLM)

The executor runs the selected agents using pure Python. Three execution strategies are supported:

Single – directly invoke the chosen agent:

result = REGISTRY[tool_name]["execute"]()

Parallel – run all agents concurrently:

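import concurrent.futures

# Submit every selected agent to the pool, then collect results keyed by tool name
with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(REGISTRY[name]["execute"]) for name in tool_names}
    results = {name: f.result() for name, f in futures.items()}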

Sequential – walk the plan in order (a linear chain, the simplest form of DAG), so that later steps can depend on earlier results:

results = {}  # collected per tool so later steps can read earlier output
for step in plan:
    results[step["tool"]] = REGISTRY[step["tool"]]["execute"]()
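The loop above runs steps in order but does not show how a later step would read an earlier result. The following is a minimal sketch of one way to do that. The plan format (a `depends_on` list per step), the `run_plan` helper, and the stub agents that accept an `upstream` dict are all assumptions made for illustration, not the article's API.

```python
from typing import Any, Dict, List


def run_plan(plan: List[dict], registry: Dict[str, dict]) -> Dict[str, Any]:
    """Run steps in order, handing each step the results of the steps it depends on."""
    results: Dict[str, Any] = {}
    for step in plan:
        deps = step.get("depends_on", [])
        missing = [dep for dep in deps if dep not in results]
        if missing:
            raise ValueError(f"step {step['tool']} is missing results from {missing}")
        upstream = {dep: results[dep] for dep in deps}
        results[step["tool"]] = registry[step["tool"]]["execute"](upstream)
    return results


# Stub agents with hypothetical payloads; each accepts the upstream results dict.
def get_trends(upstream: dict) -> dict:
    return {"anomalies": ["cpu_spike"]}


def check_config(upstream: dict) -> dict:
    anomalies = upstream["analytics_agent__get_trends"]["anomalies"]
    return {"checked_for": anomalies}


registry = {
    "analytics_agent__get_trends": {"execute": get_trends},
    "config_agent__check_config": {"execute": check_config},
}
plan = [
    {"tool": "analytics_agent__get_trends"},
    {"tool": "config_agent__check_config", "depends_on": ["analytics_agent__get_trends"]},
]
results = run_plan(plan, registry)
print(results)
```

Validating dependencies before each step means a malformed plan fails fast instead of surfacing as a KeyError inside an agent.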

In production the executor can be swapped for asyncio.gather with HTTP calls, keeping the pipeline deterministic and observable.
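A sketch of what the asyncio.gather variant might look like follows. To stay self-contained it replaces the HTTP calls with `asyncio.sleep` stubs; the `ASYNC_REGISTRY` name, the `run_parallel` helper, and the payloads are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def get_report() -> dict:
    await asyncio.sleep(0.01)  # stands in for an HTTP round trip to the agent
    return {"report": "stub"}


async def get_trends() -> dict:
    await asyncio.sleep(0.01)  # stands in for an HTTP round trip to the agent
    return {"trends": "stub"}


ASYNC_REGISTRY: Dict[str, Callable[[], Awaitable[Any]]] = {
    "data_agent__get_report": get_report,
    "analytics_agent__get_trends": get_trends,
}


async def run_parallel(tool_names: List[str]) -> Dict[str, Any]:
    # gather returns results in argument order, so zip pairs each name with its own result
    outputs = await asyncio.gather(*(ASYNC_REGISTRY[name]() for name in tool_names))
    return dict(zip(tool_names, outputs))


results = asyncio.run(
    run_parallel(["data_agent__get_report", "analytics_agent__get_trends"])
)
print(results)
```

In a real deployment each stub would be an awaitable HTTP request made with an async client; the gathering logic would stay the same.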

Step 3 – Synthesis LLM Call

Agent outputs are JSON; a second LLM call (temperature 0.7) converts them into a natural‑language answer.

response = client.chat.completions.create(
    model=deployment,
    messages=[
        {"role": "system", "content": "Summarize the agent results into a clear, helpful answer."},
        {"role": "user", "content": f"User asked: {query}\nResults: {json.dumps(results)}"},
    ],
    temperature=0.7,
)

The routing LLM uses temperature=0.0 for precision, while synthesis uses temperature=0.7 for readability.
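Building the synthesis messages in a small helper keeps the prompt in one place and makes it testable without an LLM. The sketch below assumes a helper named `build_synthesis_messages` (not from the original) and passes `default=str` to `json.dumps` so that values such as timestamps in agent output do not raise a serialization error; the sample results are made up.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def build_synthesis_messages(query: str, results: Dict[str, Any]) -> List[dict]:
    """Build the chat messages for the synthesis call from the agent results."""
    # default=str converts values json cannot encode natively (e.g. datetime) to strings
    payload = json.dumps(results, default=str)
    return [
        {"role": "system",
         "content": "Summarize the agent results into a clear, helpful answer."},
        {"role": "user", "content": f"User asked: {query}\nResults: {payload}"},
    ]


# Hypothetical agent output containing a non-JSON-native value
sample_results = {
    "data_agent__get_report": {"generated_at": datetime(2024, 1, 2, 3, 4, 5)},
}
messages = build_synthesis_messages("Current system metrics?", sample_results)
print(messages[1]["content"])
```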

Three Query Types, Three Execution Modes

Examples illustrate how the same pipeline handles different queries:

Single – "Current system metrics?" → router selects data_agent__get_report → executor runs it → synthesizer writes a summary.

Parallel – "Give me metrics and trend analysis" → router selects two agents → executor runs both concurrently → synthesizer merges results.

Sequential – "Check anomalies and pull config if any" → router selects plan_execution → executor runs analytics_agent then config_agent in order → synthesizer explains the chain.

All three queries share the same three‑function pipeline:

decision = route_query(client, deployment, query)   # LLM call 1
results  = execute(decision)                     # deterministic, no LLM
answer   = synthesize(client, deployment, query, results)  # LLM call 2
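The article does not show the body of `execute`. A minimal sketch of the dispatch follows, assuming the routing decision is a dict with `mode` and `tools` keys (a hypothetical shape) and using stub agents in place of real ones.

```python
import concurrent.futures
from typing import Any, Dict


# Stub agents with made-up payloads
def get_report() -> dict:
    return {"report": "stub"}


def get_trends() -> dict:
    return {"trends": "stub"}


REGISTRY = {
    "data_agent__get_report": {"execute": get_report},
    "analytics_agent__get_trends": {"execute": get_trends},
}


def execute(decision: dict) -> Dict[str, Any]:
    """Run the selected agents according to the routing decision; no LLM involved."""
    mode, tools = decision["mode"], decision["tools"]
    if mode == "parallel":
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(REGISTRY[name]["execute"]) for name in tools}
            return {name: f.result() for name, f in futures.items()}
    if mode in ("single", "sequential"):
        # One tool, or several in plan order: a plain loop covers both cases.
        return {name: REGISTRY[name]["execute"]() for name in tools}
    raise ValueError(f"unknown mode: {mode}")


single = execute({"mode": "single", "tools": ["data_agent__get_report"]})
parallel = execute({
    "mode": "parallel",
    "tools": ["data_agent__get_report", "analytics_agent__get_trends"],
})
print(single, parallel)
```

Rejecting unknown modes explicitly keeps a malformed routing decision from silently running nothing.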

Conclusion

The LLM acts as the "brain" to plan (one call).

Application code acts as the "hands" to execute the plan deterministically.

The LLM acts again as the "mouth" to synthesize the final answer (one call).

This separation enables the orchestrator to handle hundreds of requests per second, cache deterministic routes (temperature 0.0), and optionally skip the synthesis step when raw JSON is desired, making it far more suitable for production than a traditional agentic loop.
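Route caching and the optional synthesis skip could be sketched as follows. The cache key (lowercased, whitespace-normalized query text), the `make_cached_router` and `finish` helpers, and the fake router are assumptions for illustration; a production cache would also need eviction and invalidation when the registry changes.

```python
import json
from typing import Any, Callable, Dict, List


def make_cached_router(route_fn: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a routing function so repeated queries reuse the earlier decision."""
    cache: Dict[str, dict] = {}

    def cached(query: str) -> dict:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key not in cache:
            cache[key] = route_fn(query)
        return cache[key]

    return cached


def finish(results: Dict[str, Any], raw: bool,
           synthesize_fn: Callable[[Dict[str, Any]], str]) -> str:
    """Return raw JSON when requested; otherwise run the synthesis step."""
    return json.dumps(results) if raw else synthesize_fn(results)


calls: List[str] = []


def fake_route(query: str) -> dict:
    # Stands in for the routing LLM call; records each invocation.
    calls.append(query)
    return {"mode": "single", "tools": ["data_agent__get_report"]}


router = make_cached_router(fake_route)
first = router("Current system metrics?")
second = router("  current   SYSTEM metrics? ")
print(len(calls), finish({"a": 1}, raw=True, synthesize_fn=lambda r: "summary"))
```

With both applied, a repeated query that wants raw JSON costs no LLM calls at all.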
