Language agent tree search (LATS)

TL;DR

LATS combines an LLM with Monte Carlo Tree Search (MCTS): the agent explores a tree of action sequences, using environment feedback (not just self-evaluation) to score and backtrack.
The LLM plays three roles simultaneously: action proposer, state evaluator, and self-reflector that learns from failed branches.
LATS achieved 94.4% on HumanEval (vs 80.1% for ReAct) and state-of-the-art on WebShop, proving that search + environment feedback dramatically outperforms single-pass agents.
The UCB1 selection formula balances exploitation of high-scoring branches with exploration of under-visited ones, borrowed directly from game-tree search.
The cost is significant: 20-100x more LLM calls per task than ReAct. Reserve this for high-value tasks where getting the right answer matters more than speed.

Your coding agent receives a task: "Write a function that merges two sorted linked lists." It generates code, runs the tests, and two out of five test cases fail. The agent looks at the error, patches the code, and now three tests fail. It patches again and introduces a new bug. After six iterations, the agent is stuck in a local minimum, making tiny edits to fundamentally broken logic with no way to start fresh from a different approach.

This is the core failure of linear agent execution. ReAct-style agents take one action, observe the result, and take the next action. They never back up. Once the agent commits to a flawed approach (say, an iterative solution when recursion would be cleaner), every subsequent action is constrained by that initial choice. The agent has no mechanism to say "this entire approach is wrong, let me try something fundamentally different."

I've watched coding agents spend 15+ tool calls patching bugs that a human developer would solve in one attempt, simply by stepping back and choosing a different algorithm. The agents weren't stupid. They were trapped by a linear execution model that doesn't support backtracking.

LATS (Language Agent Tree Search), introduced by Zhou et al. (2023), solves this by treating agent execution as a search problem. Instead of one linear chain of actions, the agent builds a tree of possible action sequences, evaluates branches using real environment feedback (test results, tool outputs, web page content), and backtracks to explore alternatives when a branch fails. When it backtracks, it generates a reflection on why the branch failed, and that reflection informs future attempts.

What Is It?

LATS merges LLM-based agents with Monte Carlo Tree Search (MCTS) to create an agent that explores, evaluates, backtracks, and learns from its mistakes during a single task.

Think of it like a chess engine that also plays the game. In MCTS for chess, the engine explores future board positions, evaluates them with a heuristic, and focuses its search on the most promising lines of play. LATS does the same thing, but the "board positions" are agent states (accumulated context, tool results, partial solutions), the "moves" are LLM-generated actions, and the "heuristic" is a combination of LLM self-evaluation and real environment feedback.

The critical difference between LATS and Tree-of-Thought (ToT) is the source of evaluation signal. ToT asks the LLM "how good does this partial solution look?" LATS actually executes the action in the environment (runs the code, navigates the webpage, calls the API) and uses the real result to score the branch. This is like the difference between a chess player imagining what a position might look like versus actually playing the move and seeing what happens.

Where LATS Sits in the Landscape

Strategy	Search Structure	Evaluation Signal	Backtracking	Learning from Failures
ReAct	Linear chain	Environment feedback	None	None
Reflexion	Linear + retry	Environment feedback	Restart only	Global reflection
Tree-of-Thought	Tree search	LLM self-evaluation	Branch-level	None
LATS	Tree search	Environment + LLM	Branch-level	Per-branch reflection

LATS is the only approach that combines all four capabilities: tree-structured search, real environment feedback, branch-level backtracking, and learned reflections that transfer knowledge between branches.

How It Works

LATS follows the four phases of Monte Carlo Tree Search, adapted for LLM agents. Each phase uses the LLM in a different role.

Phase 1: Selection (UCB1)

Starting from the root, the algorithm traverses the tree by selecting child nodes that maximize the Upper Confidence Bound (UCB1) formula:

UCB1(node) = V(node) / N(node) + c * sqrt(ln(N(parent)) / N(node))

The first term (V/N) is exploitation: prefer nodes with high average reward. The second term (c * sqrt(...)) is exploration: prefer nodes that haven't been visited much. The constant c controls the balance, typically set to 1.4 (the theoretical optimum for reward in [0,1]).

This formula is what prevents LATS from getting stuck. Even if branch A has been performing well, if branch B has barely been explored, the exploration bonus will eventually push the algorithm to try B. I've seen this rescue agents from local optima where they kept refining a mediocre approach when a fundamentally better one existed one branch away.

Phase 2: Expansion (LLM as Action Proposer)

At a selected leaf node, the LLM generates n candidate next actions. Unlike ToT where thoughts are pure reasoning, LATS actions are executable: "write function merge_sorted_lists with a two-pointer approach", "run pytest test_merge.py", "edit line 15 to fix the off-by-one error."

The LLM receives the current state (all accumulated context, tool results, and any reflections from failed sibling branches) and proposes multiple distinct actions. The key word is distinct. If the LLM proposes three nearly identical code edits, the tree search wastes its branching budget. High temperature (0.7-1.0) and explicit diversity instructions in the prompt help.

Each candidate action creates a new child node. The child node's state includes the parent's state plus the action and its execution result.

Phase 3: Simulation (LLM as Evaluator + Environment Feedback)

This is where LATS diverges most sharply from pure MCTS. In game-playing MCTS, simulation means running random rollouts to a terminal state. LATS replaces random rollouts with two forms of evaluation:

Environment feedback: Execute the action and observe the real result. Run the code and check test outcomes. Navigate the webpage and check if the target element appeared. Call the API and verify the response schema. This provides ground-truth signal that no amount of LLM self-evaluation can match.
LLM state evaluation: The LLM scores the resulting state on a scale (typically 1-10 or binary success/fail). This is useful for intermediate states where environment feedback is partial, like "3 out of 5 tests pass" combined with "the remaining failures look like edge cases, not fundamental logic errors."

The combined score goes into the node's value estimate. I always weight environment feedback more heavily than LLM self-evaluation. A test that passes is worth more than the LLM saying "this looks correct."

Phase 4: Backpropagation (Update + Reflection)

The simulation score propagates back up the tree, updating the value V and visit count N of every ancestor node. This is standard MCTS backpropagation.

But LATS adds a critical twist: when a branch reaches a terminal failure state (all tests fail, the agent hits a dead end), the LLM generates a reflection. The reflection is a short analysis of what went wrong and why. "The recursive approach failed because it doesn't handle the case where one list is empty. Future attempts should add a base case check for null head pointers."

This reflection is stored and injected into the context of sibling branches during their expansion phase. When the algorithm backtracks to try a different approach, the new branch starts with knowledge of what already failed and why. This is the "learning" component that separates LATS from vanilla tree search.

TL;DR

LATS combines an LLM with Monte Carlo Tree Search (MCTS): the agent explores a tree of action sequences, using environment feedback (not just self-evaluation) to score and backtrack.
The LLM plays three roles simultaneously: action proposer, state evaluator, and self-reflector that learns from failed branches.
LATS achieved 94.4% on HumanEval (vs 80.1% for ReAct) and state-of-the-art on WebShop, proving that search + environment feedback dramatically outperforms single-pass agents.
The UCB1 selection formula balances exploitation of high-scoring branches with exploration of under-visited ones, borrowed directly from game-tree search.
The cost is significant: 20-100x more LLM calls per task than ReAct. Reserve this for high-value tasks where getting the right answer matters more than speed.

The Problem It Solves

What Is It?

LATS merges LLM-based agents with Monte Carlo Tree Search (MCTS) to create an agent that explores, evaluates, backtracks, and learns from its mistakes during a single task.

Where LATS Sits in the Landscape

Strategy	Search Structure	Evaluation Signal	Backtracking	Learning from Failures
ReAct	Linear chain	Environment feedback	None	None
Reflexion	Linear + retry	Environment feedback	Restart only	Global reflection
Tree-of-Thought	Tree search	LLM self-evaluation	Branch-level	None
LATS	Tree search	Environment + LLM	Branch-level	Per-branch reflection

Environment feedback: Execute the action and observe the real result. Run the code and check test outcomes. Navigate the webpage and check if the target element appeared. Call the API and verify the response schema. This provides ground-truth signal that no amount of LLM self-evaluation can match.
LLM state evaluation: The LLM scores the resulting state on a scale (typically 1-10 or binary success/fail). This is useful for intermediate states where environment feedback is partial, like "3 out of 5 tests pass" combined with "the remaining failures look like edge cases, not fundamental logic errors."

Phase 4: Backpropagation (Update + Reflection)

The simulation score propagates back up the tree, updating the value V and visit count N of every ancestor node. This is standard MCTS backpropagation.

Language agent tree search (LATS)

TL;DR

The Problem It Solves

What Is It?

Where LATS Sits in the Landscape

How It Works

Phase 1: Selection (UCB1)

Phase 2: Expansion (LLM as Action Proposer)

Phase 3: Simulation (LLM as Evaluator + Environment Feedback)

Phase 4: Backpropagation (Update + Reflection)

Continue Reading with Premium

Comments

Language agent tree search (LATS)

TL;DR

The Problem It Solves

What Is It?

Where LATS Sits in the Landscape

How It Works

Phase 1: Selection (UCB1)

Phase 2: Expansion (LLM as Action Proposer)

Phase 3: Simulation (LLM as Evaluator + Environment Feedback)

Phase 4: Backpropagation (Update + Reflection)

Continue Reading with Premium

Comments