Getting reliable structured output from an LLM is the difference between a prototype and a product. Learn how to enforce JSON schemas effectively.

Last Thursday, our internal data extraction pipeline crashed because an LLM decided to wrap its JSON response in markdown code blocks and add a conversational "Here is your data!" prefix. We spent three hours debugging why our parser was throwing syntax errors, eventually realizing that relying on prompt-based instructions alone is a fool's errand.
Getting reliable structured output from an LLM is the difference between a toy project and a production-grade application. If you’re building features that need to talk to your database or trigger reliable background jobs, you cannot afford non-deterministic output.
Most developers start by appending "Return valid JSON" to their system prompt. In our experience this works about 85% of the time. The rest of the time you get truncated responses, trailing commas, or the dreaded "Sure, I can help with that!" preamble.
When we first built our extraction service, we tried regex-based cleaning. We'd strip everything before the first { and after the last }. It broke the moment the surrounding text contained braces of its own, such as nested object fragments that looked like the payload's delimiters. It's a fragile, high-maintenance way to handle data.
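To make the failure concrete, here is a minimal sketch of that brace-stripping approach. The function names `naiveExtract` and `tryParse` and the sample strings are made up for illustration; this is not our production code.

```javascript
// Naive cleanup: keep everything from the first "{" to the last "}".
function naiveExtract(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  return text.slice(start, end + 1);
}

// Returns the parsed value, or null if the cleaned string is not valid JSON.
function tryParse(text) {
  const candidate = naiveExtract(text);
  if (candidate === null) return null;
  try {
    return JSON.parse(candidate);
  } catch {
    return null;
  }
}

// A chatty preamble is handled fine.
const ok = tryParse('Here is your data! {"name": "Ada", "age": 36}');

// Braces in the surrounding prose widen the slice and break parsing.
const broken = tryParse('I filled in the {name} field: {"name": "Ada", "age": 36}');

// A truncated response has no closing brace to anchor on.
const truncated = tryParse('{"name": "Ada", "age": ');
```

The first call parses, but the other two return `null`, and each new failure shape needs another special case in the cleaner.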
Instead of fighting the model's output, you need to constrain the sampling process. Modern LLM APIs now support "JSON Mode" or "Structured Outputs." The two are not the same: JSON mode only guarantees syntactically valid JSON, while Structured Outputs restricts token generation to a schema you supply.
For instance, if you're using OpenAI's gpt-4o, you can define a response_format using a JSON schema. With strict mode enabled, decoding is constrained to that schema, so the model cannot emit tokens that violate your structure.
    import OpenAI from "openai";

    // Reads OPENAI_API_KEY from the environment by default.
    const openai = new OpenAI();

    const completion = await openai.chat.completions.create({
      model: "gpt-4o-2024-08-06",
      messages: [{ role: "user", content: "Extract user info." }],
      response_format: {
        type: "json_schema",
        json_schema: {
          name: "user_profile",
          strict: true, // without this, schema adherence is not guaranteed
          schema: {
            type: "object",
            properties: {
              name: { type: "string" },
              age: { type: "number" },
            },
            required: ["name", "age"],
            additionalProperties: false,
          },
        },
      },
    });
This approach is significantly more reliable than prompt-engineering your way to a result. It’s roughly 10x more stable in my experience than using standard temperature: 0 settings alone.

Even with schema enforcement, your application needs to handle cases where the model refuses to answer or returns an empty object. You should treat LLM responses like external API calls—always validate them before they hit your business logic.
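As a rough sketch of that defensive handling, the function below inspects a Chat Completions response before anything reaches business logic. It assumes the OpenAI response shape (`choices[0].finish_reason`, `message.refusal`, `message.content`). The function name `readStructuredReply` and the `{ ok, reason }` return shape are my own illustrative choices, and other providers expose these signals differently.

```javascript
// Classify a chat completion before handing it to business logic.
// Assumes the OpenAI Chat Completions response shape.
function readStructuredReply(completion) {
  const choice = completion?.choices?.[0];
  if (!choice) return { ok: false, reason: "no_choice" };

  // The model declined to answer; content will not match the schema.
  if (choice.message?.refusal) {
    return { ok: false, reason: "refusal", detail: choice.message.refusal };
  }

  // Output was cut off at the token limit, so the JSON may be incomplete.
  if (choice.finish_reason === "length") {
    return { ok: false, reason: "truncated" };
  }

  let data;
  try {
    data = JSON.parse(choice.message?.content ?? "");
  } catch (err) {
    return { ok: false, reason: "invalid_json", detail: err.message };
  }

  // Syntactically valid but useless: treat an empty object as a failure.
  if (data === null || typeof data !== "object" || Object.keys(data).length === 0) {
    return { ok: false, reason: "empty" };
  }
  return { ok: true, data };
}
```

Returning a tagged result instead of throwing lets the caller decide which failures are worth retrying (truncation, for example) and which should go straight to a human or a fallback path (refusals).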
I recommend using a library like Zod in TypeScript or Pydantic in Python to define your schema. Even if the LLM claims to follow a schema, you should re-validate the output at the application boundary. If the validation fails, don't just log it and move on; use a retry strategy similar to the one we use for reliable background jobs.
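A minimal sketch of that validate-then-retry loop follows. To keep it dependency-free, a hand-written `validateUserProfile` check stands in for a Zod or Pydantic schema, and `callModel` is a hypothetical async function you supply that returns the model's raw text. The attempt count and backoff delays are placeholder values, not recommendations.

```javascript
// Stand-in for a Zod/Pydantic schema: returns a list of problems (empty = valid).
function validateUserProfile(value) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return ["expected an object"];
  }
  const errors = [];
  if (typeof value.name !== "string" || value.name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  if (typeof value.age !== "number" || !Number.isFinite(value.age)) {
    errors.push("age must be a finite number");
  }
  return errors;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// callModel is a hypothetical async function that returns the model's raw text.
async function extractWithRetry(callModel, { maxAttempts = 3, baseDelayMs = 200 } = {}) {
  let lastErrors = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const parsed = JSON.parse(await callModel(attempt));
      lastErrors = validateUserProfile(parsed);
      if (lastErrors.length === 0) return parsed;
    } catch (err) {
      // Covers both network errors from callModel and JSON syntax errors.
      lastErrors = [`attempt failed: ${err.message}`];
    }
    // Exponential backoff between attempts; no wait after the final one.
    if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  throw new Error(`validation failed after ${maxAttempts} attempts: ${lastErrors.join("; ")}`);
}
```

Throwing after the final attempt, rather than returning a partial object, keeps invalid data from slipping into the database and gives your job runner a clear failure to alert on.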
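If you want to stop fighting your parser, the approach above comes down to three steps: enable the provider's native schema enforcement, re-validate every response at the application boundary, and retry when validation fails.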

While these tools are great, they aren't magic. I’ve noticed that when I force a very complex schema, the model's reasoning quality sometimes dips. It’s as if the model is spending so much "compute" trying to adhere to the JSON structure that it loses track of the actual task.
Next time, I plan to experiment with "Few-Shot" prompting combined with schema enforcement to see if I can improve the quality of the extraction. We’re also looking into smaller, local models for simple classification tasks, as they can be easier to constrain via grammar-based sampling (like llama.cpp grammars) than massive black-box models.
Are you using schema enforcement, or are you still manually cleaning strings? Either way, stop relying on the "Please return JSON" prompt—it's time to move to structural constraints.
Does JSON mode cost more? Generally, no. You pay for tokens as usual. Check your provider's pricing, though: the schema may count toward your input tokens, and some providers add latency to the first request with a new schema while they process it.
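What if the model ignores my schema? With the provider's native "Structured Outputs" in strict mode, the model cannot produce output that violates the schema. Plain "JSON mode" only guarantees syntactically valid JSON, so fields can still be missing or mistyped. When a schema-enforced request does fail, it's usually because the request was too complex, the response was cut off at the token limit, or the model hit a guardrail and refused.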
Should I use Pydantic or Zod? Use whichever matches your backend stack. Both are excellent for ensuring your LLM output actually matches your database models.