Oracle-and-worker pattern

TL;DR

Not every subtask in an agentic workflow requires your best model. The oracle-and-worker pattern uses a powerful model for planning, routing, and complex reasoning, and delegates well-defined subtasks to smaller, cheaper, faster models.
The oracle (large model) handles ambiguity, decomposition, complex multi-step reasoning, and synthesis. Workers (small models) handle predictable, well-scoped tasks: extraction, classification, formatting, summarization of known-structure inputs.
The cost difference is significant. GPT-4-class tokens are roughly 10-30x more expensive than GPT-3.5-class tokens. On high-volume workflows, savings track the share of token volume routed to workers: moving half of it saves a little under 50%, and reaching 70-80% means routing roughly three quarters or more.
The pattern requires you to define clear, narrow task specifications for worker calls. Ambiguous tasks sent to workers fail or produce incorrect results. That's the oracle's job to prevent.
Works best when most work in a pipeline is structured execution, not reasoning. If most tasks require judgment, the routing overhead won't pay off.

The problem it solves

A document processing pipeline uses GPT-4 for every operation: extract fields from a structured form, classify the document type, translate a code identifier, summarize a section, validate a data format. Each operation costs GPT-4 tokens even though most of the tasks are simple pattern-matching that a much smaller model handles just as well.

At moderate scale (say 10,000 documents per day), the gap between using GPT-4 for everything and routing simple tasks to a GPT-3.5-class model can reach several thousand dollars per month in API costs, with minimal quality difference on the structured subtasks.

The oracle-and-worker pattern is essentially load routing: use the expensive resource for tasks that actually require it, and route everything else to cheaper alternatives.
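
A quick way to sanity-check a savings estimate is to compute it from two numbers: the share of token volume moved to workers and the price ratio between the two models. The sketch below is illustrative only. The function name `blended_savings` is made up for this example, and it assumes cost scales linearly with tokens and ignores the extra tokens the oracle spends on planning.

```python
def blended_savings(worker_share: float, price_ratio: float) -> float:
    """Fraction of cost saved when `worker_share` of token volume moves to a
    worker model that is `price_ratio` times cheaper than the oracle.

    Assumption: cost is linear in tokens and planning overhead is ignored.
    """
    if not 0.0 <= worker_share <= 1.0:
        raise ValueError("worker_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    # Routed tokens now cost 1/price_ratio of what they did before.
    return worker_share * (1.0 - 1.0 / price_ratio)


if __name__ == "__main__":
    for share in (0.5, 0.8, 0.9):
        print(f"share={share:.0%} savings={blended_savings(share, 20):.1%}")
```

Under these assumptions the saving can never exceed the share of volume routed, so the worker share matters more than the exact price ratio once that ratio is large.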

What is it?

An oracle model receives the top-level task and produces a structured work plan: a list of subtasks with their types, inputs, and assigned worker models. Workers execute individual subtasks in parallel where possible. The oracle then synthesizes the worker outputs into a final response.

The split maps naturally to the difference in what models are good at:

Capability	Oracle (large model)	Worker (small model)
Decomposing ambiguous requests	Excellent	Poor
Complex multi-step reasoning	Excellent	Unreliable
JSON extraction from structured input	Adequate	Excellent
Text classification (fixed categories)	Adequate	Excellent
Spelling/grammar correction	Adequate	Excellent
Synthesis across multiple inputs	Excellent	Poor
Cost per thousand tokens	High	Low

How it works

The oracle phase

The oracle receives the task and produces a structured execution plan. It also decides which worker to use for each subtask:

ORACLE_SYSTEM = """
You plan and route tasks to workers. Given a task, produce a JSON plan.
Each step must have: step_id, worker_type, input, output_key.
Worker types: "extractor", "classifier", "summarizer", "validator".
Only assign a step to a worker if it is fully specified and requires no judgment.
"""

plan = oracle_llm.generate(task=user_request, system=ORACLE_SYSTEM)
# Returns structured JSON like:
# {
#   "steps": [
#     {"step_id": 1, "worker_type": "extractor", "input": ..., "output_key": "extracted_fields"},
#     {"step_id": 2, "worker_type": "classifier", "input": ..., "output_key": "doc_type"},
#   ],
#   "synthesis_instruction": "Combine extracted fields with document type to produce..."
# }
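
Because the plan comes back as model-generated JSON, it is worth validating before any worker runs. The following is a minimal sketch, assuming the plan shape shown in the comment above. The `Step` dataclass, `ALLOWED_WORKERS`, and `parse_plan` are names invented for this example, and `input` is treated as a plain string for simplicity.

```python
import json
from dataclasses import dataclass

# Worker types listed in the oracle's system prompt.
ALLOWED_WORKERS = {"extractor", "classifier", "summarizer", "validator"}


@dataclass(frozen=True)
class Step:
    step_id: int
    worker_type: str
    input: str
    output_key: str


def parse_plan(raw: str) -> tuple[list[Step], str]:
    """Parse the oracle's JSON plan and reject malformed steps early."""
    data = json.loads(raw)
    steps: list[Step] = []
    seen_keys: set[str] = set()
    for item in data["steps"]:
        step = Step(**item)  # TypeError if a field is missing or unexpected
        if step.worker_type not in ALLOWED_WORKERS:
            raise ValueError(f"unknown worker type: {step.worker_type!r}")
        if step.output_key in seen_keys:
            raise ValueError(f"duplicate output_key: {step.output_key!r}")
        seen_keys.add(step.output_key)
        steps.append(step)
    return steps, data.get("synthesis_instruction", "")
```

Failing here, before any worker call, keeps a malformed plan from turning into a batch of wasted or misrouted requests.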

The worker phase

Workers are usually small models with tight, specialized system prompts. Each worker runs only the subtask the oracle assigned to it.
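
One way to keep worker prompts tight is a fixed registry keyed by worker type. This is a sketch rather than a prescribed setup: the prompt wording, `WORKER_PROMPTS`, and `build_worker_request` are all hypothetical, and the returned dictionary stands in for whatever request format your model client expects.

```python
# Hypothetical system prompts, one per worker type named in the oracle prompt.
WORKER_PROMPTS = {
    "extractor": "Extract the requested fields from the input. Reply with JSON only.",
    "classifier": "Assign the input to exactly one of the given categories. Reply with the category name only.",
    "summarizer": "Summarize the input in at most three sentences.",
    "validator": "Check the input against the given format. Reply with 'valid' or 'invalid'.",
}


def build_worker_request(worker_type: str, step_input: str) -> dict:
    """Pair a step's input with the narrow system prompt for its worker type."""
    try:
        system = WORKER_PROMPTS[worker_type]
    except KeyError:
        raise ValueError(f"unknown worker type: {worker_type!r}") from None
    return {"system": system, "input": step_input}
```

Keeping the prompts in one place makes it easy to see at a glance that each worker does exactly one thing.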

Workers run in parallel when dependencies allow:

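import asyncio

async def run_workers(plan: list[Step]) -> dict:
    # Every step in `plan` is launched at once, so pass only steps whose dependencies are already resolved.
    tasks = [run_worker(step) for step in plan]
    results = await asyncio.gather(*tasks)  # gather returns results in input order
    return {step.output_key: result for step, result in zip(plan, results)}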
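
When some steps depend on others, one option is to run them in waves: each wave contains every step whose dependencies have finished. The sketch below assumes a `depends_on` field on each step, which is not part of the plan schema shown earlier, and uses a stub `run_worker` in place of a real model call.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: int
    output_key: str
    depends_on: list[int] = field(default_factory=list)  # hypothetical field


async def run_worker(step: Step, context: dict) -> str:
    # Stand-in for a real worker-model call; `context` holds earlier outputs.
    await asyncio.sleep(0)
    return f"result-{step.step_id}"


async def run_in_waves(steps: list[Step]) -> dict:
    """Run steps concurrently in waves, respecting `depends_on`."""
    done: set[int] = set()
    results: dict = {}
    pending = list(steps)
    while pending:
        ready = [s for s in pending if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("unresolvable or circular dependencies")
        outputs = await asyncio.gather(*(run_worker(s, results) for s in ready))
        for s, out in zip(ready, outputs):
            results[s.output_key] = out
            done.add(s.step_id)
        pending = [s for s in pending if s.step_id not in done]
    return results
```

Wave scheduling is simpler than a full task graph and loses a little parallelism, since a step waits for its whole wave rather than only for its own dependencies.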

Re-routing to oracle on worker failure

Workers fail when the oracle sends them ambiguous or under-specified inputs. Detect this and escalate to the oracle:

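result = await run_worker(step)
# `confidence` is assumed to be a score the worker reports alongside its output.
if result.confidence < WORKER_CONFIDENCE_THRESHOLD:
    # Escalate this specific step to the oracle instead of passing a weak result downstream.
    result = await oracle_llm.generate(task=step, system="Handle this step directly.")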

This prevents worker failures from silently propagating as bad structured outputs. You want loud failures that re-route to a better model, not quiet hallucinated placeholders.
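
A self-reported confidence score can be unreliable, so it may help to pair it with a structural check on the output. The sketch below is one possible shape, not a definitive implementation: `WorkerResult`, `needs_escalation`, `run_with_escalation`, and the threshold value are all assumptions made for this example, and the worker and oracle are passed in as async callables so the logic stays independent of any particular client.

```python
import json
from dataclasses import dataclass

WORKER_CONFIDENCE_THRESHOLD = 0.8  # illustrative value; tune per task


@dataclass
class WorkerResult:
    output: str
    confidence: float


def needs_escalation(result: WorkerResult, required_keys: set[str]) -> bool:
    """Escalate on low confidence, invalid JSON, or missing required keys."""
    if result.confidence < WORKER_CONFIDENCE_THRESHOLD:
        return True
    try:
        parsed = json.loads(result.output)
    except json.JSONDecodeError:
        return True
    if not isinstance(parsed, dict):
        return True
    return not required_keys.issubset(parsed.keys())


async def run_with_escalation(step, run_worker, run_oracle, required_keys):
    """Try the worker first; hand the step to the oracle if the result fails checks."""
    result = await run_worker(step)
    if needs_escalation(result, required_keys):
        result = await run_oracle(step)
    return result
```

The structural check catches the case the confidence score misses: a worker that is confidently wrong about the output format.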

When not to route to workers

Route to the oracle when:

The task has an ambiguous structure ("parse this but it might be in several formats").
The task calls for a judgment call, not pattern matching.
The step depends on earlier worker steps that have not resolved yet.
A worker returned low-confidence output that the oracle needs to handle directly.

Workers are only effective when the task specification is complete and unambiguous. If you're unsure whether a subtask qualifies, keep it in the oracle.
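
The criteria above can be encoded as an explicit guard that defaults to the oracle. This is only a sketch: the `Subtask` flags and `choose_executor` are hypothetical names, and how each flag gets set (by the oracle, by heuristics, or by hand) is left open.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    has_fixed_schema: bool       # input format is known and unambiguous
    needs_judgment: bool         # requires a decision, not pattern matching
    dependencies_resolved: bool  # all upstream outputs are available


def choose_executor(task: Subtask) -> str:
    """Return "worker" only when every criterion is met; otherwise "oracle"."""
    if task.has_fixed_schema and not task.needs_judgment and task.dependencies_resolved:
        return "worker"
    return "oracle"
```

Defaulting to the oracle means an unclassified or borderline subtask costs more but does not fail quietly.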

When to use it

