Structured output is critical for stable LLM apps. Learn how to use Zod schemas and validation patterns to move past unreliable text and into production-grade data.

Last month, I spent about three days debugging a "can't read property of undefined" error in our production pipeline. It turned out the LLM had decided to return null for a field whenever it couldn't find the data, even though my prompt explicitly told it to return an empty string. That was the moment I stopped trusting the LLM to follow instructions and started forcing it to follow a schema.
If you're building production-grade AI tools, you quickly learn that reliable structured output is the difference between a demo that works 80% of the time and a product that works 99.9% of the time.
We first tried solving this with clever prompt engineering. We appended phrases like "You must return a valid JSON object" and included an example JSON schema in the system prompt. It worked—until it didn't.
LLMs are probabilistic, not deterministic. Even with high-quality models like GPT-4o or Claude 3.5 Sonnet, a "temperature" setting of 0 isn't a silver bullet. You’ll inevitably hit edge cases where the model hallucinates a trailing comma or wraps the JSON in markdown code blocks, breaking your downstream parsers.
Relying on prompts for LLM schema enforcement is like trying to enforce type safety in TypeScript by writing comments in your code. It might help the developer, but the compiler doesn't care. You need a runtime validation layer.

The best way to handle this is by using a schema definition library like Zod. Zod allows you to define your data structure once and use it to both validate the LLM's output and generate the instructions for the model.
Here is a simplified version of what we use to ensure we get exactly what we need:
```typescript
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const UserPreferencesSchema = z.object({
  theme: z.enum(['light', 'dark', 'system']),
  notifications: z.boolean(),
  tags: z.array(z.string()).max(5),
});

// JSON.stringify(UserPreferencesSchema.shape) would only dump Zod's internal
// definitions, so convert the schema to JSON Schema before injecting it into
// the prompt. (Zod v4 also provides a built-in z.toJSONSchema().)
const schemaInstructions = `Return a JSON object that matches this schema: ${JSON.stringify(
  zodToJsonSchema(UserPreferencesSchema),
)}`;
```
By defining the schema as a source of truth, you can write a wrapper function that attempts to parse the response. If the LLM returns invalid JSON or violates a schema constraint, you can catch the error, feed it back to the model, and ask for a correction.
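A minimal sketch of that wrapper's parsing step follows. It assumes only that the schema exposes a Zod-style `safeParse` method; the `Validator` interface and the `parseModelOutput` and `stripCodeFence` names are hypothetical, chosen so the sketch runs without any particular model SDK. A Zod schema such as `UserPreferencesSchema` should satisfy `Validator` structurally. The sketch also strips a surrounding markdown code fence, one of the failure modes mentioned earlier, before calling `JSON.parse`.

```typescript
// Minimal structural type that Zod schemas satisfy: safeParse returns either
// typed data or an error whose message can be sent back to the model.
interface Validator<T> {
  safeParse(
    data: unknown,
  ): { success: true; data: T } | { success: false; error: { message: string } };
}

type ParseResult<T> = { ok: true; data: T } | { ok: false; error: string };

// Remove a surrounding markdown code fence (three backticks, optionally
// tagged "json") if the model wrapped its answer in one.
function stripCodeFence(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  return match ? match[1] : trimmed;
}

// Hypothetical wrapper: parse the raw completion, then validate it.
function parseModelOutput<T>(raw: string, schema: Validator<T>): ParseResult<T> {
  let json: unknown;
  try {
    json = JSON.parse(stripCodeFence(raw));
  } catch (err) {
    // Syntax problems such as trailing commas land here.
    return { ok: false, error: `Invalid JSON: ${(err as Error).message}` };
  }
  const result = schema.safeParse(json);
  return result.success
    ? { ok: true, data: result.data }
    : { ok: false, error: `Schema violation: ${result.error.message}` };
}
```

On failure, the `error` string is what you would include in a follow-up message asking the model to correct its previous answer.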
To get genuinely structured output, use the tool-calling or "JSON mode" features that the major API providers offer. These aren't just conveniences; they're constraints.
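With OpenAI's Structured Outputs (the `json_schema` response format with `strict: true`), decoding is constrained so the model cannot emit tokens that fall outside the specified schema. It's not just "trying" to follow your rules; the sampler itself rules out malformed output. Looser modes give looser guarantees: OpenAI's `json_object` mode only promises syntactically valid JSON, and tool calling without a strict or constrained mode steers the model toward your argument schema rather than guaranteeing it, so check what your provider, Anthropic included, actually enforces.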
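To make the distinction concrete, here is a sketch of an OpenAI Chat Completions request body that asks for Structured Outputs rather than plain JSON mode. The model name, schema name, and message contents are placeholders. Strict mode expects every property to be listed in `required` and `additionalProperties` to be `false`; the five-tag cap from the Zod schema is deliberately left out here.

```typescript
// Request body for OpenAI's Chat Completions endpoint using Structured Outputs.
// Placeholder values: model name, schema name, and message content.
const requestBody = {
  model: "gpt-4o",
  messages: [
    { role: "system", content: "Extract the user's preferences." },
    { role: "user", content: "Dark mode please, no notifications, I'm into hiking." },
  ],
  response_format: {
    // "json_object" would only guarantee syntactically valid JSON;
    // "json_schema" with strict: true constrains decoding to this schema.
    type: "json_schema",
    json_schema: {
      name: "user_preferences",
      strict: true,
      schema: {
        type: "object",
        properties: {
          theme: { type: "string", enum: ["light", "dark", "system"] },
          notifications: { type: "boolean" },
          tags: { type: "array", items: { type: "string" } },
        },
        // Strict mode: every property is required and no extras are allowed.
        required: ["theme", "notifications", "tags"],
        additionalProperties: false,
      },
    },
  },
};
```

Even with this in place, the response still deserves a `UserPreferencesSchema.parse()` check: the JSON Schema above doesn't encode the `.max(5)` limit on tags, and a refusal or a response cut off by a token limit can still reach your code.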
However, even with these features, you shouldn't skip validation. I recommend a three-layer defense:
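1. Constrain generation with the provider's structured output, JSON mode, or tool-calling features.
2. Validate every response against the Zod schema; on failure, feed the error back to the model and ask for a correction.
3. If `.parse()` still fails, log the raw output for debugging.
There is a cost to this approach: every retry adds latency. If you are building a RAG pipeline, check out "LLM Guardrails for Production: Input Validation and Output Filtering" to see how you can move validation logic closer to the edge.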
We've found that by combining Zod with a "retry-once" policy, we solve about 95% of our parsing errors without significantly increasing user-facing latency. If it fails twice, we fail gracefully rather than letting the LLM try to fix its own output indefinitely.
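The retry-once policy with graceful failure could look roughly like the sketch below. `callModel` and `validate` are hypothetical injected functions (your SDK call, and a parser that wraps something like a Zod `safeParse`), `generateStructured` is an illustrative name, and `fallback` is whatever safe default your feature can live with. The raw output of each failed attempt is logged for debugging.

```typescript
type Validation<T> = { ok: true; data: T } | { ok: false; error: string };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical signature for your model client: messages in, raw text out.
type CallModel = (messages: ChatMessage[]) => Promise<string>;

async function generateStructured<T>(
  callModel: CallModel,
  validate: (raw: string) => Validation<T>,
  messages: ChatMessage[],
  fallback: T,
  maxRetries = 1, // "retry-once": at most two model calls in total
): Promise<{ data: T; usedFallback: boolean }> {
  const history = [...messages];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(history);
    const result = validate(raw);
    if (result.ok) return { data: result.data, usedFallback: false };

    // Keep the raw output for debugging, then show the model its mistake.
    console.error(`Attempt ${attempt + 1} failed: ${result.error}`, { raw });
    history.push(
      { role: "assistant", content: raw },
      {
        role: "user",
        content: `Your previous response was invalid (${result.error}). Return only corrected JSON.`,
      },
    );
  }
  // Fail gracefully instead of looping until the model gets it right.
  return { data: fallback, usedFallback: true };
}
```

Returning a flag alongside the fallback lets the caller decide whether to show a degraded UI, queue the request for later, or surface an error.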
As your apps grow, you'll eventually need to handle complex, nested objects. Don't try to cram a 50-field schema into a single prompt. If you find yourself doing that, you probably need to break your prompt into smaller, logical units.
Think of it like database normalization. If the LLM is struggling to produce a giant JSON object, split the task into two sequential calls. The first call gets the high-level summary; the second call fills in the specific details.
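A sketch of that split, under stated assumptions: `callJson` is a hypothetical helper that sends a prompt and returns the reply after running it through the supplied parser (in a real pipeline, something like the parse-and-retry wrapper described earlier), `parseSummary` and `parseDetails` are hand-written stand-ins for a Zod schema's `.parse()`, and the `Summary`/`SectionDetail` shapes are illustrative. The point is that each call carries a small schema, and the second prompt receives the first call's validated output as context.

```typescript
// Illustrative shapes for a large extraction task split into two calls.
interface Summary {
  title: string;
  sections: string[];
}
interface SectionDetail {
  section: string;
  keyPoints: string[];
}

// Hand-written stand-ins for a Zod schema's .parse(): return typed data or throw.
function parseSummary(value: unknown): Summary {
  const v = (value ?? {}) as Partial<Summary>;
  if (typeof v.title !== "string" || !Array.isArray(v.sections)) {
    throw new Error("Summary needs a title and a sections array");
  }
  return { title: v.title, sections: v.sections.map(String) };
}

function parseDetails(value: unknown): SectionDetail[] {
  if (!Array.isArray(value)) throw new Error("Details must be an array");
  return value.map((d) => {
    if (typeof d?.section !== "string" || !Array.isArray(d?.keyPoints)) {
      throw new Error("Each detail needs a section and keyPoints");
    }
    return { section: d.section, keyPoints: d.keyPoints.map(String) };
  });
}

// Hypothetical helper: sends a prompt and returns the reply after parsing it.
type CallJson = <T>(prompt: string, parse: (value: unknown) => T) => Promise<T>;

async function extractInTwoSteps(
  callJson: CallJson,
  sourceText: string,
): Promise<{ summary: Summary; details: SectionDetail[] }> {
  // Call 1: high-level structure only, a small schema the model handles reliably.
  const summary = await callJson(
    `Summarize the text as JSON { "title": string, "sections": string[] }:\n${sourceText}`,
    parseSummary,
  );
  // Call 2: fill in the details, using the validated summary as context.
  const details = await callJson(
    `For each section in ${JSON.stringify(summary.sections)}, return JSON ` +
      `[{ "section": string, "keyPoints": string[] }]:\n${sourceText}`,
    parseDetails,
  );
  return { summary, details };
}
```

If the second call fails validation, only that smaller step needs a retry; the already-validated summary is reused.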
I'm still experimenting with how to handle streaming structured output. While libraries like the Vercel AI SDK (the `ai` package) have made massive strides here, streaming valid JSON while maintaining strict schema compliance is still a bit of a "wild west" in some contexts. I often fall back to non-streaming responses for critical data-extraction tasks so that the validation logic has the full payload to work with.
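One middle ground is to forward chunks to the UI as they arrive but parse and validate only once the stream has ended. In this sketch the stream is any `AsyncIterable<string>` of text deltas, `onChunk` is a hypothetical progress callback, and `collectThenValidate` is an illustrative name; none of them is tied to a specific SDK.

```typescript
type Validation<T> = { ok: true; data: T } | { ok: false; error: string };

async function collectThenValidate<T>(
  stream: AsyncIterable<string>,
  validate: (value: unknown) => Validation<T>,
  onChunk: (text: string) => void = () => {},
): Promise<Validation<T>> {
  let buffer = "";
  for await (const chunk of stream) {
    buffer += chunk;
    onChunk(chunk); // e.g. drive a "generating..." preview; nothing is trusted yet
  }
  // Only the complete payload is parsed and validated.
  let parsed: unknown;
  try {
    parsed = JSON.parse(buffer);
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${(err as Error).message}` };
  }
  return validate(parsed);
}
```

The user still sees progress, but downstream code only ever receives data that passed validation on the full payload.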
Ultimately, the goal isn't to make the LLM perfect. The goal is to build a system that is robust enough to handle the LLM's imperfections. Keep your schemas strict, validate everything at the boundary, and never assume the model will do exactly what you asked for on the first try.