Progressive tool discovery
Start agents with a minimal toolset and let them discover additional tools on demand as task complexity reveals the need, avoiding context window bloat from unused tool schemas.
TL;DR
- Agents with 50+ tools waste 5-10K tokens on tool schemas that go unused in 90% of interactions. Progressive discovery starts with ~5 core tools and a search_tools meta-tool.
- When the agent needs a capability it doesn't have, it calls search_tools("send email") to find and load the relevant tool schema on demand.
- Token math: 50 tools at ~200 tokens each = 10,000 tokens upfront. Progressive loading: 5 core tools (1,000 tokens) + 200 per discovered tool. A typical task that discovers 7 additional tools costs 1,000 + 7 × 200 = 2,400 tokens instead of 10,000.
- Tool descriptions must be search-friendly with clear keywords, because the agent can only find tools it knows how to ask for.
- Limitation: if the agent doesn't know a tool category exists, it won't search for it. Hint at available categories in the system prompt to bridge the discovery gap.
The Problem It Solves
You build an agent with access to 60 tools: file operations, database queries, email, calendar, Slack, GitHub, Jira, monitoring, deployment, analytics, and more. The agent's system prompt now contains 12,000 tokens just from tool definitions. Every single request pays this tax, even when the user asks "what time is my next meeting?" and the agent needs exactly one tool (calendar lookup).
This is not a theoretical problem. I've profiled agents where 40% of every API call's input tokens were tool schemas the agent never touched. At $3 per million input tokens, an agent handling 50,000 requests/day pays about $150/day for every 1,000 unused schema tokens per request. With roughly 10,000 unused schema tokens per request, that's about $1,500/day, or $45,000/month in pure waste.
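The token and cost arithmetic can be sketched in a few lines. This is a back-of-the-envelope model, not a measurement: the 200-tokens-per-tool average and the $3 per million price come from the article, while the function names are made up for illustration.

```python
TOKENS_PER_TOOL = 200  # rough average schema size used in the TL;DR


def upfront_tokens(total_tools: int) -> int:
    """Schema tokens paid on every request when all tools are loaded upfront."""
    return total_tools * TOKENS_PER_TOOL


def progressive_tokens(core_tools: int, discovered_tools: int) -> int:
    """Schema tokens in context with core tools plus tools discovered so far."""
    return (core_tools + discovered_tools) * TOKENS_PER_TOOL


def daily_cost_usd(unused_tokens_per_request: int,
                   requests_per_day: int,
                   usd_per_million_tokens: float) -> float:
    """Daily spend on schema tokens the agent never uses."""
    total_tokens = unused_tokens_per_request * requests_per_day
    return total_tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    print(upfront_tokens(50))                   # 50 tools loaded upfront
    print(progressive_tokens(5, 7))             # 5 core + 7 discovered
    print(daily_cost_usd(1_000, 50_000, 3.0))   # cost per 1,000 unused tokens
```

Plugging in your own measured unused-token count per request is more useful than any generic estimate, since real schemas vary widely in size.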
The cost problem is compounded by a quality problem: large tool catalogs confuse the model. When the agent sees 60 possible actions, it occasionally picks the wrong tool, especially when tool names are similar (update_ticket vs update_issue vs edit_task). Fewer visible tools means fewer selection errors.
What Is It?
Progressive tool discovery starts the agent with a small set of always-available tools and a special meta-tool (search_tools) that lets the agent find additional tools when it realizes it needs capabilities beyond the core set. Tools are loaded into context on demand, not upfront.
Think of it as a library. You don't carry every book with you when you walk in. You consult the catalog (the meta-tool). When you need a book on organic chemistry, you look it up in the catalog, find its location, and retrieve just that book. The library has 100,000 books, but you only carry the 3 you're actively reading.
How It Works
Tool tier classification
Tools are organized into three tiers based on usage frequency. Tier classification comes from production data: log which tools the agent actually calls across 10,000+ interactions and rank by frequency.
Tier 1 (always loaded): The 3-7 tools the agent uses in over 50% of interactions. These are the bread-and-butter operations: read files, write files, search, run commands. The search_tools meta-tool itself is always Tier 1.
Tier 2 (on-demand): Tools used in 5-50% of interactions. Specialized but common: email, calendar, git operations, database queries, API calls. These are the primary targets for progressive discovery.
Tier 3 (rare): Tools used in under 5% of interactions. Admin operations, deployment, billing, user management. Loading these upfront is almost always waste.
I've found that in most production agents, Tier 1 covers 5-7 tools and handles 70-80% of all requests without any discovery needed. The search_tools meta-tool fires on the remaining 20-30%.
Don't guess tiers, measure them
Tier classification based on intuition is wrong surprisingly often. A tool you think is Tier 2 might be Tier 1 for a specific user segment. Log actual tool call frequency for at least 2 weeks before setting tiers, and re-evaluate quarterly.
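As a rough sketch of how tiers could be derived from logs, the function below counts the share of interactions in which each tool was called and applies the thresholds described above (over 50%, 5-50%, under 5%). The log format (one list of tool names per interaction) and the function name are assumptions for illustration.

```python
from collections import Counter


def classify_tiers(interactions, all_tools=(), tier1_min=0.50, tier2_min=0.05):
    """Assign each tool to tier 1, 2, or 3 by share of interactions using it.

    interactions: iterable of lists of tool names called in one interaction.
    all_tools: optional full catalog, so never-called tools land in tier 3.
    """
    interactions = list(interactions)
    total = len(interactions)
    counts = Counter()
    for calls in interactions:
        counts.update(set(calls))  # count a tool once per interaction

    tiers = {}
    for tool in set(counts) | set(all_tools):
        share = counts[tool] / total if total else 0.0
        if share > tier1_min:
            tiers[tool] = 1      # always loaded
        elif share >= tier2_min:
            tiers[tool] = 2      # on-demand
        else:
            tiers[tool] = 3      # rare
    return tiers


if __name__ == "__main__":
    logs = (
        [["read_file"]] * 60
        + [["read_file", "send_email"]] * 20
        + [["search"]] * 18
        + [["deploy_service"]] * 2
    )
    print(classify_tiers(logs, all_tools=["billing_report"]))
```

Running the same classification per user segment is one way to catch the case where a tool is Tier 2 overall but Tier 1 for a specific group.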
The search_tools meta-tool
The meta-tool is the bridge between "I need a capability" and "here's the tool for it." The agent calls search_tools with a natural language query, and the system returns matching tool schemas.
The implementation can be simple (keyword matching against tool descriptions) or sophisticated (embedding-based semantic search). In practice, keyword matching with TF-IDF works surprisingly well because tool descriptions are short and keyword-dense.
The critical design decision is the return format. The meta-tool should return the full tool schema (parameters, types, descriptions), not just a tool name. Returning only names requires a second round trip to get the schema. Returning the full schema in the search results cuts the discovery process from two round trips to one.
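A minimal sketch of such a meta-tool, assuming a small in-memory catalog and a hand-rolled TF-IDF-style keyword score. The catalog entries, schema layout, and scoring formula are illustrative assumptions, not a specific framework's API. Note that it returns full schemas, so discovery takes one round trip.

```python
import math
import re
from collections import Counter

# Hypothetical catalog of on-demand tools; the schema layout is an assumption.
CATALOG = {
    "send_email": {
        "description": "Send an email message to one or more recipients.",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "create_calendar_event": {
        "description": "Create a calendar event with a title, start time, and attendees.",
        "parameters": {"title": "string", "start": "string", "attendees": "array"},
    },
    "deploy_service": {
        "description": "Deploy a service to the staging or production environment.",
        "parameters": {"service": "string", "environment": "string"},
    },
}


def _tokens(text):
    """Lowercase alphanumeric tokens; underscores in tool names act as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_tools(query, catalog=CATALOG, top_k=3):
    """Return full schemas of the best keyword matches for a query."""
    docs = {
        name: Counter(_tokens(name + " " + schema["description"]))
        for name, schema in catalog.items()
    }
    n_docs = len(docs)
    scores = {}
    for name, term_counts in docs.items():
        score = 0.0
        for term in set(_tokens(query)):
            if term not in term_counts:
                continue
            doc_freq = sum(1 for d in docs.values() if term in d)
            idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0  # smoothed IDF
            score += term_counts[term] * idf
        if score > 0:
            scores[name] = score
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Full schema (not just the name) so the agent can call the tool right away.
    return [{"name": name, **catalog[name]} for name in ranked]


if __name__ == "__main__":
    for tool in search_tools("send email"):
        print(tool["name"], tool["parameters"])
```

A query with no matching keywords returns an empty list, which is the cue for the agent to rephrase or tell the user the capability is unavailable.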
Session-level caching
Once the agent discovers a tool, it stays available for the rest of the conversation. The agent doesn't need to re-discover send_email every time it wants to send another message. This amortizes the discovery cost over multi-turn interactions.
The cache is session-scoped, not global. Each new conversation starts fresh with only Tier 1 tools. This prevents context creep: a conversation that touched 30 tools shouldn't burden the next conversation with all 30 schemas.
Cache sizing matters more than you'd expect. I've seen agents accumulate 15-20 discovered tools over a long conversation, pushing context usage back up toward the "load everything" baseline. The fix is a cache eviction policy: keep only the 10 most recently used tools in the active context, and move older ones back to discoverable status. The agent can re-discover them in one round trip if needed.
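The eviction policy can be sketched as a small session-scoped LRU cache. This is one possible shape under stated assumptions: the class and method names are hypothetical, and the limit of 10 discovered tools is the figure suggested above.

```python
from collections import OrderedDict


class SessionToolCache:
    """Per-conversation tool set: core tools plus an LRU of discovered tools."""

    def __init__(self, core_tools, max_discovered=10):
        self.core = dict(core_tools)          # Tier 1, never evicted
        self.max_discovered = max_discovered
        self._discovered = OrderedDict()      # oldest use first, newest last

    def add(self, name, schema):
        """Register a tool returned by search_tools for this session."""
        if name in self.core:
            return
        self._discovered[name] = schema
        self._discovered.move_to_end(name)
        while len(self._discovered) > self.max_discovered:
            # Evicted tools become discoverable again via search_tools.
            self._discovered.popitem(last=False)

    def touch(self, name):
        """Mark a discovered tool as just used."""
        if name in self._discovered:
            self._discovered.move_to_end(name)

    def active_tools(self):
        """Schemas to include in the next model call."""
        return {**self.core, **self._discovered}


if __name__ == "__main__":
    cache = SessionToolCache({"search_tools": {}}, max_discovered=2)
    cache.add("send_email", {})
    cache.add("create_calendar_event", {})
    cache.touch("send_email")            # send_email is now most recent
    cache.add("deploy_service", {})      # evicts create_calendar_event
    print(list(cache.active_tools()))
```

Creating a fresh SessionToolCache per conversation gives the session scoping described above: nothing discovered in one conversation leaks into the next.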
For the interview: the caching strategy is a great detail to mention. It shows you understand context window management and the difference between session state and persistent state.
Bridging the discovery gap
The biggest weakness of progressive discovery is the "unknown unknowns" problem. If the agent doesn't know a tool category exists, it won't think to search for it. An agent asked to "deploy the staging environment" might not realize there's a deploy_service tool available if it has never encountered deployment tools before.
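The TL;DR suggests hinting at available categories in the system prompt. One way to do that, sketched here with made-up category names and wording, is to generate a short hint listing categories and a few keywords per category rather than any full schema.

```python
def build_category_hint(categories):
    """Build a short system-prompt snippet listing discoverable tool categories.

    categories: dict mapping a category name to a list of example keywords.
    Only names and keywords are included, never full tool schemas.
    """
    lines = [
        "Additional tools are available on demand. "
        "Call search_tools with a short query to load them. Categories:"
    ]
    for category, keywords in sorted(categories.items()):
        lines.append(f"- {category}: {', '.join(keywords)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical categories for illustration only.
    hint = build_category_hint({
        "deployment": ["deploy", "rollback", "staging"],
        "email": ["send", "draft", "inbox"],
        "calendar": ["meeting", "schedule", "availability"],
    })
    print(hint)
```

A hint like this costs a few dozen tokens, which is small next to the schemas it stands in for, and gives the agent the vocabulary it needs to search for deployment tools it has never seen.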