Oracle-and-worker pattern
Learn how to route tasks between a powerful orchestrator model and cost-efficient worker models, combining reasoning quality with execution speed and lower inference costs.
TL;DR
- Not every subtask in an agentic workflow requires your best model. The oracle-and-worker pattern uses a powerful model for planning, routing, and complex reasoning, and delegates well-defined subtasks to smaller, cheaper, faster models.
- The oracle (large model) handles ambiguity, decomposition, complex multi-step reasoning, and synthesis. Workers (small models) handle predictable, well-scoped tasks: extraction, classification, formatting, summarization of known-structure inputs.
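- The cost difference is significant. GPT-4-class tokens are roughly 10-30x more expensive than GPT-3.5-class tokens. Savings scale with the share of token volume you move, not the number of subtasks: routing half of all tokens to workers cuts spend by a little under 50%, and reaching the 70-80% range means routing roughly 80% of tokens, before counting the oracle's planning and synthesis overhead.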
- The pattern requires you to define clear, narrow task specifications for worker calls. Ambiguous tasks sent to workers fail or produce incorrect results. Preventing that is the oracle's job.
- Works best when most work in a pipeline is structured execution, not reasoning. If most tasks require judgment, the routing overhead won't pay off.
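The savings arithmetic is easy to check yourself. Here is a minimal sketch, assuming cost is proportional to token volume and ignoring the oracle's own planning and synthesis tokens; `blended_savings`, `worker_share`, and `price_ratio` are names made up for this example.

```python
def blended_savings(worker_share: float, price_ratio: float) -> float:
    """Fraction of spend saved versus sending every token to the oracle.

    worker_share: fraction of total token volume routed to workers (0..1).
    price_ratio:  oracle price per token divided by worker price per token.
    """
    if not 0.0 <= worker_share <= 1.0:
        raise ValueError("worker_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    # Oracle-only cost is normalized to 1.0 per unit of token volume.
    blended_cost = (1.0 - worker_share) + worker_share / price_ratio
    return 1.0 - blended_cost


if __name__ == "__main__":
    for share in (0.5, 0.8, 0.9):
        for ratio in (10, 30):
            saved = blended_savings(share, ratio)
            print(f"worker share {share:.0%}, price ratio {ratio}x: saves {saved:.1%}")
```

The thing to notice is that the price ratio matters less than the share: once workers are 10x cheaper, making them 30x cheaper adds only a few points, while moving more volume to them adds a lot.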
The problem it solves
A document processing pipeline uses GPT-4 for every operation: extract fields from a structured form, classify the document type, translate a code identifier, summarize a section, validate a data format. Each operation costs GPT-4 tokens even though most of the tasks are simple pattern-matching that a much smaller model handles just as well.
At moderate scale (say 10,000 documents per day), the difference between using GPT-4 for everything and routing simple tasks to a GPT-3.5-class model is several thousand dollars per month in API costs, with minimal quality difference for the structured subtasks.
The oracle-and-worker pattern is essentially load routing: use the expensive resource for tasks that actually require it, and route everything else to cheaper alternatives.
What is it?
An oracle model receives the top-level task and produces a structured work plan: a list of subtasks with their types, inputs, and assigned worker models. Workers execute individual subtasks in parallel where possible. The oracle then synthesizes the worker outputs into a final response.
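The split follows where each model tier is the better fit once cost and latency are factored in. A large model can do the structured tasks too, but rarely so much better that it justifies the price: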
| Capability | Oracle (large model) | Worker (small model) |
|---|---|---|
| Decomposing ambiguous requests | Excellent | Poor |
| Complex multi-step reasoning | Excellent | Unreliable |
| JSON extraction from structured input | Adequate | Excellent |
| Text classification (fixed categories) | Adequate | Excellent |
| Spelling/grammar correction | Adequate | Excellent |
| Synthesis across multiple inputs | Excellent | Poor |
| Cost per thousand tokens | High | Low |
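One way to read the table is as a static routing rule. The sketch below is only an illustration: the task-kind keys and the `Tier` enum are invented names, and a real pipeline would likely route on richer signals than a string label.

```python
from enum import Enum


class Tier(Enum):
    ORACLE = "oracle"
    WORKER = "worker"


# Task kinds mirror the rows of the table; the key names are illustrative.
ROUTING_TABLE: dict[str, Tier] = {
    "decompose_request": Tier.ORACLE,
    "multi_step_reasoning": Tier.ORACLE,
    "synthesis": Tier.ORACLE,
    "json_extraction": Tier.WORKER,
    "classification": Tier.WORKER,
    "spelling_grammar": Tier.WORKER,
}


def tier_for(task_kind: str) -> Tier:
    # Unknown task kinds fall back to the oracle: when in doubt, keep the
    # work with the model that can handle ambiguity.
    return ROUTING_TABLE.get(task_kind, Tier.ORACLE)


if __name__ == "__main__":
    for kind in ("classification", "synthesis", "something_new"):
        print(kind, "->", tier_for(kind).value)
```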
How it works
The oracle phase
The oracle receives the task and produces a structured execution plan. It also decides which worker to use for each subtask:
ORACLE_SYSTEM = """
You plan and route tasks to workers. Given a task, produce a JSON plan.
Each step must have: step_id, worker_type, input, output_key.
Worker types: "extractor", "classifier", "summarizer", "validator".
Only assign a step to a worker if it is fully specified and requires no judgment.
"""

# oracle_llm stands for your client for the large model; call this from async code.
plan = await oracle_llm.generate(task=user_request, system=ORACLE_SYSTEM)
# Returns structured JSON like:
# {
#   "steps": [
#     {"step_id": 1, "worker_type": "extractor", "input": ..., "output_key": "extracted_fields"},
#     {"step_id": 2, "worker_type": "classifier", "input": ..., "output_key": "doc_type"}
#   ],
#   "synthesis_instruction": "Combine extracted fields with document type to produce..."
# }
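The oracle's output is still model-generated text, so it is worth validating before anything runs. Here is a minimal sketch of a parser for the plan shape shown above; the `Step` dataclass, `parse_plan`, and the duplicate-key check are assumptions for illustration, not part of any particular library.

```python
import json
from dataclasses import dataclass
from typing import Any

ALLOWED_WORKERS = {"extractor", "classifier", "summarizer", "validator"}


@dataclass(frozen=True)
class Step:
    step_id: int
    worker_type: str
    input: Any
    output_key: str


def parse_plan(raw: str) -> tuple[list[Step], str]:
    """Parse the oracle's JSON plan, raising ValueError on anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"plan is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("steps"), list):
        raise ValueError("plan must be an object with a 'steps' list")

    steps: list[Step] = []
    seen_keys: set[str] = set()
    for item in data["steps"]:
        try:
            step = Step(item["step_id"], item["worker_type"],
                        item["input"], item["output_key"])
        except (KeyError, TypeError) as exc:
            raise ValueError(f"step is missing a required field: {item!r}") from exc
        if not isinstance(step.worker_type, str) or step.worker_type not in ALLOWED_WORKERS:
            raise ValueError(f"unknown worker_type: {step.worker_type!r}")
        if not isinstance(step.output_key, str):
            raise ValueError(f"output_key must be a string: {step.output_key!r}")
        if step.output_key in seen_keys:
            # Two steps writing to the same key would silently overwrite each other.
            raise ValueError(f"duplicate output_key: {step.output_key!r}")
        seen_keys.add(step.output_key)
        steps.append(step)
    return steps, data.get("synthesis_instruction", "")
```

Rejecting a bad plan here is cheaper than discovering it after several worker calls have already been paid for.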
The worker phase
Workers are usually small models with tight, specialized system prompts. Each worker runs only the narrowly scoped step the oracle assigned to it.
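A worker call might look roughly like the sketch below. Everything here is an assumption made for illustration: the prompt wording, the `LLMClient` protocol, and the `EchoClient` stand-in that lets the example run without a model. Note that this `run_worker` takes the client as an explicit argument, which the shorter snippets in this article leave implicit.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Protocol

# Hypothetical prompts: one narrow instruction per worker type.
WORKER_PROMPTS = {
    "extractor": "Extract the requested fields from the input. Reply with JSON only.",
    "classifier": "Assign the input to exactly one of the given categories. Reply with the label only.",
    "summarizer": "Summarize the input in at most three sentences.",
    "validator": "Check the input against the given format. Reply with 'valid' or 'invalid'.",
}


class LLMClient(Protocol):
    async def generate(self, task: Any, system: str) -> str: ...


@dataclass(frozen=True)
class Step:
    step_id: int
    worker_type: str
    input: Any
    output_key: str


async def run_worker(step: Step, client: LLMClient) -> str:
    """Run one step on the small model, using the prompt for its worker type."""
    try:
        system = WORKER_PROMPTS[step.worker_type]
    except KeyError:
        raise ValueError(f"no worker registered for {step.worker_type!r}") from None
    return await client.generate(task=step.input, system=system)


class EchoClient:
    """Stand-in client so the sketch runs without a model behind it."""

    async def generate(self, task: Any, system: str) -> str:
        return f"[{system.split('.')[0]}] {task}"


if __name__ == "__main__":
    step = Step(1, "classifier", "Invoice #123 from ACME", "doc_type")
    print(asyncio.run(run_worker(step, EchoClient())))
```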
Workers run in parallel when dependencies allow:
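import asyncio

async def run_workers(plan: list[Step]) -> dict:
    # Assumes every step in `plan` is independent; dependent steps belong in a later batch.
    tasks = [run_worker(step) for step in plan]
    results = await asyncio.gather(*tasks)  # results come back in the same order as `tasks`
    return {step.output_key: result for step, result in zip(plan, results)}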
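The snippet above launches every step at once, so it only covers the case where no step needs another step's output. One possible way to honor dependencies is to run the plan in waves. The sketch below assumes a `depends_on` field listing the `output_key`s a step waits for, which is not part of the plan format shown earlier, and passes the accumulated results to each worker call.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class Step:
    step_id: int
    worker_type: str
    input: Any
    output_key: str
    depends_on: tuple[str, ...] = ()  # assumed field: output_keys this step waits for


# A worker runner receives the step plus the results gathered so far.
RunWorker = Callable[[Step, dict[str, Any]], Awaitable[Any]]


async def run_in_waves(plan: list[Step], run_worker: RunWorker) -> dict[str, Any]:
    """Run steps in batches; each batch holds every step whose inputs are ready."""
    results: dict[str, Any] = {}
    pending = list(plan)
    while pending:
        ready = [s for s in pending if all(k in results for k in s.depends_on)]
        if not ready:
            # Nothing can run: a cycle, or a reference to an output_key nobody produces.
            stuck = [s.step_id for s in pending]
            raise ValueError(f"unresolvable dependencies for steps {stuck}")
        outputs = await asyncio.gather(*(run_worker(s, results) for s in ready))
        for step, output in zip(ready, outputs):
            results[step.output_key] = output
        done = {s.step_id for s in ready}
        pending = [s for s in pending if s.step_id not in done]
    return results


async def demo_worker(step: Step, results: dict[str, Any]) -> str:
    await asyncio.sleep(0)  # stand-in for a model call
    return f"{step.worker_type} done"


if __name__ == "__main__":
    plan = [
        Step(1, "extractor", "raw form text", "fields"),
        Step(2, "classifier", "raw form text", "doc_type"),
        Step(3, "validator", "check fields", "report", depends_on=("fields", "doc_type")),
    ]
    print(asyncio.run(run_in_waves(plan, demo_worker)))
```

Waves are simpler than a full task graph and waste a little parallelism, since a step waits for its whole wave even if its own inputs finished early. For short plans that trade-off is usually fine.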
Re-routing to oracle on worker failure
Workers fail when the oracle sends them ambiguous or under-specified inputs. Detect this and escalate to the oracle:
result = await run_worker(step)
if result.confidence < WORKER_CONFIDENCE_THRESHOLD:
    # Escalate to the oracle for this specific step
    result = await oracle_llm.generate(task=step, system="Handle this step directly.")
This prevents worker failures from silently propagating as bad structured outputs. You want loud failures that re-route to a better model, not quiet hallucinated placeholders.
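A slightly fuller version of the escalation path might also treat worker exceptions as a reason to escalate and log every hand-off. This is a sketch under assumptions: the `WorkerResult` shape, the threshold value, and the idea that the worker wrapper can supply a `confidence` score at all (for example from a schema check or log-probabilities) are illustrative, not something a model API gives you by default.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

WORKER_CONFIDENCE_THRESHOLD = 0.8  # illustrative value; tune against your own evals


@dataclass
class WorkerResult:
    value: Any
    confidence: float  # assumed to be supplied by the worker wrapper
    handled_by: str = "worker"


async def run_with_escalation(
    step: Any,
    run_worker: Callable[[Any], Awaitable[WorkerResult]],
    run_oracle: Callable[[Any], Awaitable[Any]],
    threshold: float = WORKER_CONFIDENCE_THRESHOLD,
) -> WorkerResult:
    """Try the worker first; hand the step to the oracle on an error or low confidence."""
    try:
        result = await run_worker(step)
        if result.confidence >= threshold:
            return result
        reason = f"confidence {result.confidence:.2f} below {threshold:.2f}"
    except Exception as exc:  # worker crashed or returned unparseable output
        reason = f"worker error: {exc!r}"
    logger.warning("escalating step to oracle (%s)", reason)  # keep the failure loud
    value = await run_oracle(step)
    # The oracle's answer is treated as final here, so confidence is set to 1.0.
    return WorkerResult(value=value, confidence=1.0, handled_by="oracle")
```

Recording `handled_by` makes it possible to measure the escalation rate later. If most steps of one type end up at the oracle, that type probably should not be routed to workers in the first place.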
When not to route to workers
Route to the oracle when:
- The worker would receive a task with ambiguous structure ("parse this but it might be in several formats").
- The worker needs to make a judgment call, not match a pattern.
- The step depends on earlier steps whose outputs aren't resolved yet, so its input can't be fully specified.
- A worker returned low-confidence output that the oracle needs to handle directly.
Workers are only effective when the task specification is complete and unambiguous. If you're unsure whether a subtask qualifies, keep it in the oracle.
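The checklist above can be expressed as a small predicate. This is only a sketch: the four flags on `StepSpec` are hypothetical, and something (the oracle's plan, or earlier pipeline stages) would have to populate them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSpec:
    # All four flags are hypothetical; the planner would have to set them.
    input_format_known: bool
    needs_judgment: bool
    dependencies_resolved: bool
    prior_low_confidence: bool = False


def route(spec: StepSpec) -> str:
    """Return "worker" only when every condition from the checklist is met."""
    if not spec.input_format_known:
        return "oracle"  # ambiguous structure
    if spec.needs_judgment:
        return "oracle"  # judgment call, not pattern matching
    if not spec.dependencies_resolved:
        return "oracle"  # input cannot be fully specified yet
    if spec.prior_low_confidence:
        return "oracle"  # a worker already struggled with this step
    return "worker"


if __name__ == "__main__":
    print(route(StepSpec(True, False, True)))   # fully specified step
    print(route(StepSpec(False, False, True)))  # input format is uncertain
```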
When to use it