Progressive tool discovery

TL;DR

Agents with 50+ tools waste 5-10K tokens on tool schemas that go unused in 90% of interactions. Progressive discovery starts with ~5 core tools and a search_tools meta-tool.
When the agent needs a capability it doesn't have, it calls search_tools("send email") to find and load the relevant tool schema on demand.
Token math: 50 tools at ~200 tokens each = 10,000 tokens upfront. Progressive loading: 5 core tools (1,000 tokens) + 200 per discovered tool. Even a task that discovers 7 extra tools costs 1,000 + 7 × 200 = 2,400 tokens instead of 10,000.
Tool descriptions must be search-friendly with clear keywords, because the agent can only find tools it knows how to ask for.
Limitation: if the agent doesn't know a tool category exists, it won't search for it. Hint at available categories in the system prompt to bridge the discovery gap.

The Problem It Solves

You build an agent with access to 60 tools: file operations, database queries, email, calendar, Slack, GitHub, Jira, monitoring, deployment, analytics, and more. The agent's system prompt now contains 12,000 tokens just from tool definitions. Every single request pays this tax, even when the user asks "what time is my next meeting?" and the agent needs exactly one tool (calendar lookup).

This is not a theoretical problem. I've profiled agents where 40% of every API call's input tokens were tool schemas the agent never touched. At $3 per million input tokens, an agent handling 50,000 requests/day that carries roughly 10,000 unused schema tokens per request sends 500 million wasted tokens a day. That is about $1,500/day, or roughly $45,000/month, spent on tool definitions nobody used.

The problem is not only cost; quality suffers too. Large tool catalogs confuse the model. When the agent sees 60 possible actions, it occasionally picks the wrong tool, especially when tool names are similar (update_ticket vs update_issue vs edit_task). Fewer visible tools means fewer selection errors.

What Is It?

Progressive tool discovery starts the agent with a small set of always-available tools and a special meta-tool (search_tools) that lets the agent find additional tools when it realizes it needs capabilities beyond the core set. Tools are loaded into context on demand, not upfront.

Think of it as a library catalog. You don't carry every book in the library when you walk in. You consult the catalog (the meta-tool). When you need a book on organic chemistry, you look it up in the catalog, find its location, and retrieve just that book. The library has 100,000 books, but you only carry the 3 you're actively reading.
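To make the token budget concrete, here is a minimal sketch of the arithmetic from the TL;DR. It assumes the article's rough figure of ~200 tokens per tool schema; the function names are illustrative, and the sketch ignores the tokens spent on search queries and results.

```python
TOKENS_PER_SCHEMA = 200  # rough per-tool schema size used in this article


def upfront_tokens(total_tools: int, per_schema: int = TOKENS_PER_SCHEMA) -> int:
    """Cost of loading every tool schema into every request."""
    return total_tools * per_schema


def progressive_tokens(core_tools: int, discovered_tools: int,
                       per_schema: int = TOKENS_PER_SCHEMA) -> int:
    """Cost of the core set plus only the schemas discovered during the task."""
    return (core_tools + discovered_tools) * per_schema


if __name__ == "__main__":
    full = upfront_tokens(50)
    for discovered in (0, 1, 7):
        used = progressive_tokens(5, discovered)
        print(f"{discovered} discovered: {used} tokens vs {full} upfront "
              f"({1 - used / full:.0%} saved)")
```

The savings shrink as more tools are discovered, which is why the number of schemas kept in context matters later on.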

How It Works

Tool tier classification

Tools are organized into three tiers based on usage frequency. Tier classification comes from production data: log which tools the agent actually calls across 10,000+ interactions and rank by frequency.

Tier 1 (always loaded): The 3-7 tools the agent uses in over 50% of interactions. These are the bread-and-butter operations: read files, write files, search, run commands. The search_tools meta-tool itself is always Tier 1.

Tier 2 (on-demand): Tools used in 5-50% of interactions. Specialized but common: email, calendar, git operations, database queries, API calls. These are the primary targets for progressive discovery.

Tier 3 (rare): Tools used in under 5% of interactions. Admin operations, deployment, billing, user management. Loading these upfront is almost always waste.

I've found that in most production agents, Tier 1 covers 5-7 tools and handles 70-80% of all requests without any discovery needed. The search_tools meta-tool fires on the remaining 20-30%.

Don't guess tiers, measure them

Tier classification based on intuition is wrong surprisingly often. A tool you think is Tier 2 might be Tier 1 for a specific user segment. Log actual tool call frequency for at least 2 weeks before setting tiers, and re-evaluate quarterly.
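As a rough sketch of what "measure them" could look like, the function below assigns tiers from logged tool calls using the thresholds above (over 50%, 5-50%, under 5%). The log format (one list of tool names per interaction) and the `known_tools` parameter are assumptions for illustration, not a prescribed schema.

```python
from collections import Counter


def classify_tiers(interactions, known_tools=(), tier1_min=0.50, tier2_min=0.05):
    """Map each tool name to tier 1, 2, or 3 by the share of interactions that call it.

    interactions: hypothetical log format, one list of tool names per interaction.
    known_tools: tools that exist but may never appear in the log (they land in tier 3).
    """
    total = len(interactions)
    # Count each tool at most once per interaction, so a tool called five
    # times in one conversation does not look more common than it is.
    seen_in = Counter(tool for calls in interactions for tool in set(calls))
    tiers = {}
    for tool in set(known_tools) | set(seen_in):
        share = seen_in[tool] / total if total else 0.0
        if share > tier1_min:
            tiers[tool] = 1
        elif share >= tier2_min:
            tiers[tool] = 2
        else:
            tiers[tool] = 3
    return tiers


if __name__ == "__main__":
    # Tiny synthetic log; a real one would span thousands of interactions.
    log = ([["read_file"]] * 29
           + [["read_file", "send_email"]] * 10
           + [["read_file", "deploy_service"]])
    print(classify_tiers(log, known_tools=["manage_billing"]))
```

Running the same function per user segment is one way to catch a tool that is Tier 2 overall but Tier 1 for a specific group.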

The search_tools meta-tool

The meta-tool is the bridge between "I need a capability" and "here's the tool for it." The agent calls search_tools with a natural language query, and the system returns matching tool schemas.

The implementation can be simple (keyword matching against tool descriptions) or sophisticated (embedding-based semantic search). In practice, keyword matching with TF-IDF works surprisingly well because tool descriptions are short and keyword-dense.

The critical design decision is the return format. The meta-tool should return the full tool schema (parameters, types, descriptions), not just a tool name. Returning only names requires a second round trip to get the schema. Returning the full schema in the search results cuts the discovery process from two round trips to one.
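Here is a minimal sketch of a keyword-based search_tools that scores with TF-IDF and returns full schemas in a single call. The catalog entries, parameter shapes, and the smoothed IDF formula are assumptions chosen for illustration; a production version would likely use a real schema format and a tested search library.

```python
import math
import re
from collections import Counter

# Hypothetical on-demand catalog; names, descriptions, and parameters are illustrative.
CATALOG = [
    {"name": "send_email",
     "description": "Send an email message to one or more recipients",
     "parameters": {"to": "string", "subject": "string", "body": "string"}},
    {"name": "create_calendar_event",
     "description": "Create a calendar event or meeting with attendees",
     "parameters": {"title": "string", "start": "string", "end": "string"}},
    {"name": "deploy_service",
     "description": "Deploy a service to the staging or production environment",
     "parameters": {"service": "string", "environment": "string"}},
]


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_tools(query, catalog=CATALOG, top_k=3):
    """Return the full schemas of the best-matching tools, scored with TF-IDF."""
    docs = [Counter(tokenize(f"{tool['name']} {tool['description']}")) for tool in catalog]
    n_docs = len(docs)
    # Document frequency: how many tool descriptions contain each term.
    doc_freq = Counter(term for doc in docs for term in doc)
    query_terms = set(tokenize(query))
    scored = []
    for tool, doc in zip(catalog, docs):
        length = sum(doc.values())
        score = 0.0
        for term in query_terms:
            if term in doc:
                tf = doc[term] / length
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += tf * idf
        if score > 0:
            scored.append((score, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Full schemas, not just names, so the agent can call the tool immediately.
    return [tool for _, tool in scored[:top_k]]


if __name__ == "__main__":
    for tool in search_tools("send email"):
        print(tool["name"], tool["parameters"])
```

Because the tool name is tokenized together with the description, a query like "send email" matches send_email even if the description uses different wording.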

Session-level caching

Once the agent discovers a tool, it stays available for the rest of the conversation. The agent doesn't need to re-discover send_email every time it wants to send another message. This amortizes the discovery cost over multi-turn interactions.

The cache is session-scoped, not global. Each new conversation starts fresh with only Tier 1 tools. This prevents context creep: a conversation that touched 30 tools shouldn't burden the next conversation with all 30 schemas.

Cache sizing matters more than you'd expect. I've seen agents accumulate 15-20 discovered tools over a long conversation, pushing context usage back up toward the "load everything" baseline. The fix is a cache eviction policy: keep only the 10 most recently used tools in the active context, and move older ones back to discoverable status. The agent can re-discover them in one round trip if needed.
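The eviction policy described above can be sketched as a small least-recently-used cache. The class and method names are hypothetical, and the limit of 10 is simply the figure from the paragraph above; the right limit depends on schema sizes and context budget.

```python
from collections import OrderedDict


class SessionToolCache:
    """Session-scoped set of discovered tool schemas with LRU eviction."""

    def __init__(self, max_tools=10):
        self.max_tools = max_tools
        self._schemas = OrderedDict()  # oldest entry first, most recent last

    def add(self, schema):
        """Load a discovered schema; return names evicted back to discoverable status."""
        name = schema["name"]
        self._schemas[name] = schema
        self._schemas.move_to_end(name)
        evicted = []
        while len(self._schemas) > self.max_tools:
            old_name, _ = self._schemas.popitem(last=False)
            evicted.append(old_name)
        return evicted

    def touch(self, name):
        """Mark a tool as just used. Returns False if it must be re-discovered."""
        if name not in self._schemas:
            return False
        self._schemas.move_to_end(name)
        return True

    def active_schemas(self):
        """Schemas to include in the next request, alongside the Tier 1 tools."""
        return list(self._schemas.values())


if __name__ == "__main__":
    cache = SessionToolCache(max_tools=2)
    cache.add({"name": "send_email"})
    cache.add({"name": "create_calendar_event"})
    cache.touch("send_email")                      # send_email is now most recent
    print(cache.add({"name": "deploy_service"}))   # evicts the least recently used
```

Creating one cache instance per conversation keeps the state session-scoped: a new conversation gets a new, empty cache.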

For the interview: the caching strategy is a great detail to mention. It shows you understand context window management and the difference between session state and persistent state.

Figure (interactive diagram; steps: Task Received, Try Core Tools, Search Tools, Load Schema, Execute Tool). Progressive discovery flow: the agent starts with core tools, discovers additional capabilities mid-task, and caches them for reuse.

Bridging the discovery gap

The biggest weakness of progressive discovery is the "unknown unknowns" problem. If the agent doesn't know a tool category exists, it won't think to search for it. An agent asked to "deploy the staging environment" might not realize there's a deploy_service tool available if it has never encountered deployment tools before.
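The TL;DR suggests bridging this gap by hinting at available categories in the system prompt. A minimal sketch of generating such a hint follows; the category names, example phrases, and the wording of the hint are all assumptions, and the text should be tuned for the model in use.

```python
def build_category_hint(categories):
    """Render a short system-prompt section naming searchable tool categories.

    categories: maps a category name to a few example search phrases (hypothetical).
    """
    lines = ["Additional tools are available through search_tools. Categories:"]
    for name in sorted(categories):
        examples = ", ".join(categories[name])
        lines.append(f"- {name} (e.g. {examples})")
    lines.append("If a task needs one of these, call search_tools with a short query first.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_category_hint({
        "deployment": ["deploy service", "rollback release"],
        "email": ["send email"],
        "calendar": ["create meeting"],
    }))
```

A hint like this costs a few dozen tokens per request, far less than the schemas it stands in for, and it gives the agent the vocabulary it needs to search for deployment tools in the first place.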
