Structured output

TL;DR

Structured output forces the LLM to produce valid JSON, XML, or any schema-compliant format by constraining the token sampler at generation time.
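OpenAI's structured output mode achieves this via constrained decoding: tokens that would violate the schema are masked before sampling, so the model cannot emit them. Anthropic's tool use with input schemas gives the same guarantee only when strict schema enforcement is enabled; otherwise the schema guides the model but is not enforced token by token.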
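For open-source models, libraries like Outlines intercept the logit distribution and mask out tokens that would violate the schema. Instructor takes a different route: it validates each response against a Pydantic model and re-prompts the model when validation fails.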
Structured output eliminates the "parse and retry" anti-pattern that adds 200-500ms latency and wastes tokens on malformed responses.
JSON mode alone is not enough for production: you need schema enforcement (specific fields, types, enums), not just "some valid JSON."
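To make the logit-masking idea concrete, here is a minimal sketch of constrained greedy decoding. Everything in it is an assumption for illustration: the nine-token `VOCAB`, the hand-written `GRAMMAR` state machine for the tiny schema {"amount": <number>}, and the `chatty_model` stand-in that always prefers prose and a quoted amount. Real libraries compile the state machine from a JSON Schema or regex and apply it to a real tokenizer; this only shows the masking step.

```python
import json
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["Here", " is", "{", "}", '"amount"', ":", "149.99", '"149.99"', "<eos>"]

# Hand-written state machine for the tiny schema {"amount": <number>}.
# Each state maps to (tokens allowed next, state after emitting one of them).
GRAMMAR = {
    "start": ({"{"}, "key"),
    "key": ({'"amount"'}, "colon"),
    "colon": ({":"}, "value"),
    "value": ({"149.99"}, "close"),  # bare number only; the quoted string is masked
    "close": ({"}"}, "end"),
    "end": ({"<eos>"}, "done"),
}


def mask_logits(logits, allowed):
    """Set the logit of every disallowed token to -inf so it can never be picked."""
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def constrained_greedy_decode(logits_fn):
    """Greedy decoding where each step only sees schema-legal tokens."""
    state, pieces = "start", []
    while state != "done":
        allowed, next_state = GRAMMAR[state]
        masked = mask_logits(logits_fn(pieces), allowed)
        token = VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
        if token != "<eos>":
            pieces.append(token)
        state = next_state
    return "".join(pieces)


def chatty_model(prefix):
    """Stand-in for a model that prefers a prose preamble and a quoted amount."""
    return [5.0, 4.0, 1.0, 1.0, 1.0, 1.0, 0.5, 3.0, 0.0]


output = constrained_greedy_decode(chatty_model)
parsed = json.loads(output)  # parses cleanly because every step was schema-legal
```

Without the mask, greedy decoding on `chatty_model` would start with "Here". With the mask, the highest-scoring legal token wins at every step, so the preamble and the quoted amount are never reachable.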

Your invoice extraction pipeline looks clean in staging. You prompt the model to return JSON with invoice_number, amount, and currency, and it does. Then you deploy and JSONDecodeError starts appearing in your logs.

The model returns something like "Here is the JSON you requested:\n```json\n{...}\n```" for about 2% of inputs. Another 1.5% return amount as the string "149.99" rather than a float, so downstream arithmetic throws a TypeError. Your response.choices[0].message.content parser expected raw JSON and received a markdown code block.
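Both failure modes are easy to reproduce locally. The sketch below is illustrative only: the response strings are hand-written stand-ins for model output, and `total_with_tax` and `failure_mode` are hypothetical helpers, not part of any SDK.

```python
import json

FENCE = "`" * 3  # built at runtime so the example contains no literal code fence

INVOICE = '{"invoice_number": "INV-2024-001", "amount": 149.99, "currency": "USD"}'

# Stand-in for the markdown-wrapped response described above.
fenced_response = f"Here is the JSON you requested:\n{FENCE}json\n{INVOICE}\n{FENCE}"

# Stand-in for the response where amount arrives as a string.
string_amount_response = INVOICE.replace("149.99", '"149.99"')


def total_with_tax(invoice, rate=0.2):
    """Downstream arithmetic that assumes amount is numeric."""
    return invoice["amount"] * (1 + rate)


def failure_mode(content):
    """Return the name of the exception the naive pipeline hits, or None."""
    try:
        invoice = json.loads(content)  # expects raw JSON, nothing else
        total_with_tax(invoice)
    except json.JSONDecodeError:
        return "JSONDecodeError"
    except TypeError:
        return "TypeError"
    return None
```

Calling `failure_mode` on the clean payload returns `None`, while the two stand-in responses fail at different stages: the fenced one at parse time, the string-typed one only later, during arithmetic.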

# What you asked for (and received in staging):
{"invoice_number": "INV-2024-001", "amount": 149.99, "currency": "USD"}

# What production returned at 2am on a Friday:
"Here is the JSON you requested:\n```json\n{...}\n```"

Mode	Guarantees	Failure modes eliminated	Production parse success
No constraint	Nothing	None	~94-97%
JSON mode	Syntactically valid JSON	Markdown fences, prose preambles	~99%
Structured output (strict)	Fields, types, enums match schema	Wrong types, missing required fields, invalid enum values	~99.9%
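The three rows of the table can be read as increasingly strict checks on the same response. The sketch below classifies a response by the strongest guarantee it happens to satisfy. It is a post-hoc check for illustration, not constrained decoding, and the `REQUIRED_FIELDS` mapping and the `ALLOWED_CURRENCIES` enum are assumptions about what the invoice schema would contain.

```python
import json

# Hypothetical invoice schema: field name -> accepted Python types.
REQUIRED_FIELDS = {
    "invoice_number": str,
    "amount": (int, float),
    "currency": str,
}
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # assumed enum values


def highest_guarantee_met(content):
    """Return 'none', 'json', or 'schema' for a raw model response."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return "none"  # not even syntactically valid JSON
    if not isinstance(data, dict):
        return "json"
    for field, types in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            return "json"  # valid JSON, but a field is missing or mistyped
    if data["currency"] not in ALLOWED_CURRENCIES:
        return "json"  # valid JSON, but the enum value is not allowed
    return "schema"
```

A response with amount as the string "149.99" lands in the "json" tier: it would pass JSON mode and still break downstream arithmetic, which is the gap that schema enforcement closes.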

The problem it solves

What is it?

How it works

Token-level constrained decoding

JSON mode vs schema-enforced mode

Open-source: Outlines, Instructor, and SGLang