AI/ML · June 20, 2026 · 4 min read

Structured output: Implementing Deterministic JSON Schema Validation

Structured output is critical for stable LLM apps. Learn how to use Zod schemas and validation patterns to move past unreliable text and into production-grade data.

Tags: LLM, Zod, TypeScript, AI Engineering, structured output, prompt engineering, AI, RAG

Last month, I spent about three days debugging a "can't read property of undefined" error in our production pipeline. It turned out the LLM had decided to return a "null" field when it couldn't find a piece of data, even though my prompt explicitly told it to return an empty string. That was the moment I stopped trusting the LLM to follow instructions and started forcing it to follow a schema.

If you’re building production-grade AI tools, you quickly learn that reliable structured output is the difference between a demo that works 80% of the time and a product that works 99.9% of the time.

Why Prompt Engineering Isn't Enough

We first tried solving this with clever prompt engineering. We appended phrases like "You must return a valid JSON object" and included an example JSON schema in the system prompt. It worked—until it didn't.

LLMs are probabilistic, not deterministic. Even with high-quality models like GPT-4o or Claude 3.5 Sonnet, a "temperature" setting of 0 isn't a silver bullet. You’ll inevitably hit edge cases where the model hallucinates a trailing comma or wraps the JSON in markdown code blocks, breaking your downstream parsers.

Relying on prompts for LLM schema enforcement is like trying to enforce type safety in TypeScript by writing comments in your code. It might help the developer, but the compiler doesn't care. You need a runtime validation layer.
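As a minimal sketch of the first step in such a layer, the snippet below parses a model reply without throwing and tolerates one of the failure modes mentioned above, a markdown fence around the JSON. The helper names (stripCodeFence, parseModelJson) are hypothetical, and the fence handling is an assumption about what your model tends to emit. Anything else, such as a trailing comma, is reported as a failure rather than silently patched.

```typescript
// Either the parsed value or a readable reason for the failure.
type JsonResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: string };

// Three backticks, built indirectly so this snippet contains no literal fence.
const FENCE = "`".repeat(3);

// Remove a surrounding markdown code fence (optionally tagged "json") if present.
function stripCodeFence(raw: string): string {
  const trimmed = raw.trim();
  const pattern = new RegExp(`^${FENCE}(?:json)?\\s*\\n([\\s\\S]*?)\\n?${FENCE}$`, "i");
  const match = trimmed.match(pattern);
  return match?.[1]?.trim() ?? trimmed;
}

// Parse the reply without throwing, so the caller can decide whether to retry.
function parseModelJson(raw: string): JsonResult {
  try {
    return { ok: true, value: JSON.parse(stripCodeFence(raw)) };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning a result object instead of throwing keeps the "bad output" path explicit at the call site, which is where the retry or fallback decision belongs.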

The Zod Approach to Schema Enforcement


The best way to handle this is by using a schema definition library like Zod. Zod allows you to define your data structure once and use it to both validate the LLM's output and generate the instructions for the model.

Here is a simplified version of what we use to ensure we get exactly what we need:


TYPESCRIPT
import { z } from 'zod';

// Single source of truth for the shape we expect back from the model
const UserPreferencesSchema = z.object({
  theme: z.enum(['light', 'dark', 'system']),
  notifications: z.boolean(),
  tags: z.array(z.string()).max(5),
});

// Convert the schema to JSON Schema text to inject into the prompt.
// z.toJSONSchema() is available in Zod 4; on Zod 3, a converter such as the
// zod-to-json-schema package fills the same role.
const schemaInstructions = `Return a JSON object that matches this schema: ${JSON.stringify(z.toJSONSchema(UserPreferencesSchema))}`;

By defining the schema as a source of truth, you can write a wrapper function that attempts to parse the response. If the LLM returns invalid JSON or violates a schema constraint, you can catch the error, feed it back to the model, and ask for a correction.
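Here is a rough sketch of what such a wrapper's core could look like. To keep it dependency-free, it is written against a small structural interface, SchemaLike, rather than Zod itself. The assumption is that a Zod schema's safeParse() result fits that shape, which is worth checking against your Zod version. The names parseAgainstSchema, buildCorrection, and themeOnly are hypothetical.

```typescript
// Structural stand-in for a schema validator. Assumption: a Zod schema's
// safeParse() result is compatible with this shape.
interface SchemaLike<T> {
  safeParse(data: unknown):
    | { success: true; data: T }
    | { success: false; error: { message: string } };
}

type Parsed<T> =
  | { ok: true; data: T }
  | { ok: false; correctionPrompt: string };

// Build the follow-up message that asks the model to fix its own reply.
function buildCorrection(raw: string, problem: string): string {
  return [
    "Your previous reply did not match the required schema.",
    `Problem: ${problem}`,
    `Previous reply: ${raw}`,
    "Reply again with only the corrected JSON object.",
  ].join("\n");
}

// Parse the raw reply, validate it, and prepare a correction prompt on failure.
function parseAgainstSchema<T>(raw: string, schema: SchemaLike<T>): Parsed<T> {
  let candidate: unknown;
  try {
    candidate = JSON.parse(raw);
  } catch {
    return { ok: false, correctionPrompt: buildCorrection(raw, "The reply was not valid JSON.") };
  }
  const result = schema.safeParse(candidate);
  if (result.success) return { ok: true, data: result.data };
  return { ok: false, correctionPrompt: buildCorrection(raw, result.error.message) };
}

// Hand-written stand-in validator for the demo; a real schema would go here.
const themeOnly: SchemaLike<{ theme: string }> = {
  safeParse(data) {
    const theme = (data as { theme?: unknown } | null)?.theme;
    return typeof theme === "string"
      ? { success: true, data: { theme } }
      : { success: false, error: { message: "theme: expected a string" } };
  },
};
```

With this in place, a reply like {"theme": null} no longer surfaces three days later as an undefined-property error. It fails at the boundary and comes back with a prompt you can send straight to the model.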

Implementing Deterministic JSON Schema Validation

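To get structured output you can rely on, use the tool-calling or structured-output features that the major API providers offer. These are more than conveniences: they constrain what the model is able to emit.

How strong that constraint is depends on the feature. OpenAI's json_object response format (JSON mode) guarantees syntactically valid JSON but does not enforce your schema. Schema-level guarantees come from Structured Outputs, where you supply a JSON Schema with strict mode enabled and decoding is restricted to tokens that fit it. Tool use, whether on OpenAI or Anthropic, steers the model strongly toward the tool's input schema, but unless a strict mode is available and enabled, treat conformance as very likely rather than guaranteed.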
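As one illustration of provider-side enforcement, this is roughly what a schema-constrained request body looks like with the json_schema response format on OpenAI's Chat Completions API. Treat it as a sketch: the model name, the prompts, and the user_preferences schema name are placeholders, and the JSON Schema is a hand-written mirror of the preferences schema from earlier in this post.

```typescript
// JSON Schema mirroring the preferences example. The five-tag limit is
// deliberately left out and enforced at runtime instead, because keyword
// support in strict mode varies; check the provider's documentation.
const userPreferencesJsonSchema = {
  type: "object",
  properties: {
    theme: { type: "string", enum: ["light", "dark", "system"] },
    notifications: { type: "boolean" },
    tags: { type: "array", items: { type: "string" } },
  },
  required: ["theme", "notifications", "tags"],
  additionalProperties: false,
} as const;

// Request body for a Chat Completions call. Model and prompts are placeholders.
const requestBody = {
  model: "gpt-4o",
  messages: [
    { role: "system", content: "Extract the user's preferences." },
    { role: "user", content: "Dark mode please, no notifications, tag it 'work'." },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "user_preferences",
      strict: true, // ask the API to constrain decoding to the schema
      schema: userPreferencesJsonSchema,
    },
  },
};
```

Note the gap this leaves: the request can pin down the shape, but a rule like "at most five tags" may still need to live in your runtime validator.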

However, even with these features, you shouldn't skip validation. I recommend a three-layer defense:

System-Level Enforcement: Use native tool/JSON mode to force the model to output valid structure.
Runtime Validation: Use Zod to validate the result immediately. If the schema's parse() call throws (or safeParse() reports a failure), log the raw output for debugging.
Feedback Loops: If your application allows it, treat schema violations as "soft failures." Use the error message from Zod to tell the LLM exactly why its output failed (e.g., "The 'tags' array contained 6 items, but the limit is 5").
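For the feedback-loop layer, a small formatter is usually enough to turn validation issues into something the model can act on. The sketch below assumes each issue carries a path and a message, which is my understanding of the entries in a ZodError's issues array. IssueLike and formatIssues are hypothetical names.

```typescript
// Minimal shape of a validation issue. Assumption: this mirrors the `path`
// and `message` fields on the entries of a ZodError's `issues` array.
interface IssueLike {
  path: ReadonlyArray<PropertyKey>;
  message: string;
}

// Turn a list of issues into a short, explicit correction message.
function formatIssues(issues: ReadonlyArray<IssueLike>): string {
  const lines = issues.map((issue) => {
    const location = issue.path.length > 0 ? issue.path.map(String).join(".") : "(root)";
    return `- ${location}: ${issue.message}`;
  });
  return [
    "Your JSON failed validation:",
    ...lines,
    "Return the full corrected JSON object only.",
  ].join("\n");
}
```

Naming the exact field path matters more than the wording. A message like "tags: too many items" gives the model a specific target, while "invalid output" tends to produce a second guess that fails in a different place.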

Handling Trade-offs and Latency

There is a cost to this approach. Every retry adds latency. If you are building a RAG pipeline, check out LLM Guardrails for Production: Input Validation and Output Filtering to see how you can move validation logic closer to the edge.

We’ve found that by combining Zod with a "retry-once" policy, we solve about 95% of our parsing errors without significantly increasing the user-facing latency. If it fails twice, we fail gracefully rather than asking the LLM to fix its own output indefinitely.
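A retry-once policy can be sketched in a few lines. This is an illustration under assumptions, not a drop-in implementation: callModel is a hypothetical function that sends your prompt (plus optional feedback) and resolves to raw text, check is whatever parse-and-validate step you already have, and fallback is the value you are willing to serve when both attempts fail.

```typescript
type Attempt<T> = { ok: true; data: T } | { ok: false; feedback: string };

// Try once, retry once with feedback, then fall back instead of looping.
async function withOneRetry<T>(
  callModel: (feedback?: string) => Promise<string>, // hypothetical model client
  check: (raw: string) => Attempt<T>,                // parse + validate step
  fallback: T,                                       // served if both attempts fail
): Promise<{ data: T; attempts: number; degraded: boolean }> {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= 2; attempt++) {
    const result = check(await callModel(feedback));
    if (result.ok) return { data: result.data, attempts: attempt, degraded: false };
    feedback = result.feedback; // the second call gets the validator's complaint
  }
  return { data: fallback, attempts: 2, degraded: true };
}
```

The degraded flag is there so the caller can log or surface the fallback path. Capping the loop at two attempts bounds the worst-case latency at two model calls.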

Moving Beyond Simple Schemas

As your apps grow, you'll eventually need to handle complex, nested objects. Don't try to cram a 50-field schema into a single prompt. If you find yourself doing that, you probably need to break your prompt into smaller, logical units.

Think of it like database normalization. If the LLM is struggling to produce a giant JSON object, split the task into two sequential calls. The first call gets the high-level summary; the second call fills in the specific details.
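A sketch of that two-call split might look like the following. Everything here is hypothetical scaffolding: CallModel stands in for your model client and is assumed to resolve to already-parsed JSON, Outline and Details are made-up shapes, and the two parse functions are where a schema's parse method would plug in.

```typescript
// Hypothetical model client: sends a prompt, resolves to the parsed JSON reply.
type CallModel = (prompt: string) => Promise<unknown>;

interface Outline { title: string; sections: string[] }
interface Details { notes: Record<string, string> }

async function extractInTwoCalls(
  source: string,
  callModel: CallModel,
  parseOutline: (reply: unknown) => Outline, // e.g. a schema's parse method
  parseDetails: (reply: unknown) => Details,
): Promise<{ outline: Outline; details: Details }> {
  // Call 1: small schema, high-level structure only.
  const outline = parseOutline(
    await callModel(`Return the title and section names as JSON for:\n${source}`),
  );
  // Call 2: uses the validated outline to ask for the specifics.
  const details = parseDetails(
    await callModel(
      `For sections ${JSON.stringify(outline.sections)}, return one note per section as JSON for:\n${source}`,
    ),
  );
  return { outline, details };
}
```

Because the first result is validated before the second prompt is built, a malformed outline stops the pipeline early instead of contaminating the detail call.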

I’m still experimenting with how to handle streaming structured output. While libraries like ai (from Vercel) have made massive strides here, streaming valid JSON while maintaining strict schema compliance is still a bit of a "wild west" in some contexts. I often find myself falling back to non-streaming responses for critical data extraction tasks just to ensure the validation logic has the full payload to work with.
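If the transport is streamed anyway, one conservative option is to buffer the chunks and validate only once the payload is complete. The sketch below assumes the stream is exposed as an AsyncIterable of text chunks, which is an assumption about your client rather than any specific library's API. collectThenParse is a hypothetical name, and parse is where a schema's parse method would go.

```typescript
// Buffer every chunk, then parse and validate the complete payload in one go.
async function collectThenParse<T>(
  chunks: AsyncIterable<string>,
  parse: (value: unknown) => T, // e.g. a schema's parse method; throws on mismatch
): Promise<T> {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += chunk;
  }
  return parse(JSON.parse(buffer));
}
```

This gives up incremental rendering for the validated fields, which is the same trade the non-streaming fallback makes.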

Ultimately, the goal isn't to make the LLM perfect. The goal is to build a system that is robust enough to handle the LLM's imperfections. Keep your schemas strict, validate everything at the boundary, and never assume the model will do exactly what you asked for on the first try.
