AI/ML | June 20, 2026 | 4 min read

Getting reliable structured output from an LLM in production

Getting reliable structured output from an LLM is the difference between a prototype and a product. Learn how to enforce JSON schemas effectively.

Tags: LLM, AI, JSON, Engineering, Python, TypeScript, RAG, Prompt Engineering

Last Thursday, our internal data extraction pipeline crashed because an LLM decided to wrap its JSON response in markdown code blocks and add a conversational "Here is your data!" prefix. We spent three hours debugging why our parser was throwing syntax errors, eventually realizing that relying on prompt-based instructions alone is a fool's errand.

Getting reliable structured output from an LLM is the difference between a toy project and a production-grade application. If you're building features that write to your database or trigger background jobs, you cannot afford output whose format changes from one call to the next.

The Problem with "Please return JSON"

Most developers start by appending "Return valid JSON" to their system prompt. In our experience that works about 85% of the time, until it doesn't. You'll eventually run into truncated responses, trailing commas, or the dreaded "Sure, I can help with that!" preamble.

When we first built our extraction service, we tried regex-based cleaning: strip everything before the first { and after the last }. It broke the moment the response contained braces that weren't part of the object we wanted, such as a second JSON snippet or a stray { in the model's commentary. It's a fragile, high-maintenance way to handle data.
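Here is a minimal JavaScript sketch of that cleanup heuristic (naiveExtract is a hypothetical name, not code from our service). It recovers the object when the only braces in the reply belong to the JSON, and fails as soon as the model's prose contains a brace of its own:

```javascript
// Hypothetical sketch of the "first { to last }" cleanup heuristic.
function naiveExtract(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) {
    // Also the outcome for a truncated reply that never closes its object.
    throw new Error("no JSON object found");
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Works: the only braces in the reply belong to the JSON object.
const ok = 'Here is your data!\n```json\n{"name": "Ada", "age": 36}\n```';
console.log(naiveExtract(ok)); // { name: 'Ada', age: 36 }

// Breaks: the trailing prose contains a brace, so the slice ends in the wrong place.
const bad = 'Here is your data!\n{"name": "Ada", "age": 36}\nNote: use {name} as the key.';
try {
  naiveExtract(bad);
} catch (err) {
  console.log("parse failed:", err.message);
}
```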

Enforcing schemas with constrained generation

Instead of fighting the model's output after the fact, constrain the sampling process itself. Many LLM APIs now offer a "JSON mode", which guarantees syntactically valid JSON, and some go further with "Structured Outputs", which restrict generation so that every token must fit a schema you supply.

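For instance, with recent OpenAI gpt-4o snapshots such as gpt-4o-2024-08-06, you can pass a JSON schema in response_format and set strict to true. In strict mode the API constrains decoding, so the model cannot emit tokens that violate your structure. It can still refuse, or be cut off by the token limit, but it won't invent keys or drop required ones.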


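```javascript
// Run as an ES module (uses top-level await).
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const completion = await openai.chat.completions.create({
  model: "gpt-4o-2024-08-06",
  messages: [
    { role: "system", content: "Extract the user's name and age." },
    { role: "user", content: "Ada is 36 years old." },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "user_profile",
      strict: true, // enables schema-constrained decoding
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "number" },
        },
        required: ["name", "age"],
        additionalProperties: false,
      },
    },
  },
});

// A refusal arrives in a separate field instead of as JSON content.
const { message } = completion.choices[0];
if (message.refusal) throw new Error(`Model refused: ${message.refusal}`);
const profile = JSON.parse(message.content);
```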

This approach is significantly more reliable than prompt-engineering your way to a result. In my experience it's roughly 10x more stable than relying on temperature: 0 alone.
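Schema enforcement and temperature 0 solve different problems. Temperature 0 roughly means "always pick the most likely next token", which makes output repeatable but not necessarily valid; constrained decoding removes invalid tokens before that pick happens. The toy JavaScript sketch below shows the difference. Everything in it is invented for illustration: the scores, the toyScores and decode names, and the two-label schema. Real engines score a full vocabulary and compile the schema into a grammar or automaton rather than listing every valid output.

```javascript
// Toy illustration only: greedy (temperature 0) decoding vs constrained decoding.
const END = "<end>";

// Made-up next-token scores for a tiny classification task.
function toyScores(prefix) {
  if (prefix === "") return { "Sure! ": 2.0, '{"label":"': 1.5 };
  if (prefix.endsWith('"label":"')) return { spam: 0.9, ham: 1.1, maybe: 1.4 };
  if (/(spam|ham|maybe)$/.test(prefix)) return { '"}': 1.0, '", ': 1.2 };
  return { [END]: 1.0 };
}

// Every output this toy "schema" accepts: {"label": "spam" | "ham"}.
const ALLOWED = ['{"label":"spam"}', '{"label":"ham"}'];

// A token is allowed if the output so far plus that token can still become a valid output.
function isAllowed(prefix, token) {
  if (token === END) return ALLOWED.includes(prefix);
  return ALLOWED.some((out) => out.startsWith(prefix + token));
}

function decode(constrained, maxSteps = 10) {
  let output = "";
  for (let step = 0; step < maxSteps; step++) {
    let candidates = Object.entries(toyScores(output));
    if (constrained) {
      // Mask out every token that would make the output violate the schema.
      candidates = candidates.filter(([token]) => isAllowed(output, token));
      if (candidates.length === 0) throw new Error("no schema-valid token available");
    }
    // Greedy choice: take the highest-scoring remaining token.
    const [best] = candidates.reduce((a, b) => (b[1] > a[1] ? b : a));
    if (best === END) break;
    output += best;
  }
  return output;
}

console.log(JSON.stringify(decode(false))); // "Sure! " (repeatable, but not JSON)
console.log(decode(true)); // {"label":"ham"}
```

Note that the constrained run also skips "maybe" even though it is the highest-scoring label: masking happens before the greedy pick, so a likely-but-invalid token never gets chosen.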

Handling failures when things go wrong


Even with schema enforcement, your application needs to handle cases where the model refuses to answer, stops early because it hit the token limit, or returns an empty object. Treat LLM responses like responses from any external API: validate them before they reach your business logic.
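A minimal sketch of those checks in JavaScript, assuming the response shape of OpenAI's Chat Completions API (a finish_reason on each choice, and a message.refusal field that is set when a Structured Outputs request is declined). The readStructuredReply function and StructuredOutputError class are hypothetical names, and other providers report refusals and truncation differently.

```javascript
// Sketch: vet one choice from an OpenAI-style chat completion before its
// content reaches business logic. Names here are hypothetical.
class StructuredOutputError extends Error {
  constructor(reason, detail) {
    super(`${reason}: ${detail}`);
    this.name = "StructuredOutputError";
    this.reason = reason; // machine-readable cause, useful for metrics and retries
  }
}

function readStructuredReply(choice) {
  const { finish_reason: finishReason, message } = choice;

  // The model declined; Structured Outputs reports this in message.refusal.
  if (message.refusal) {
    throw new StructuredOutputError("refusal", message.refusal);
  }
  // Generation stopped at the token limit, so the JSON is likely incomplete.
  if (finishReason === "length") {
    throw new StructuredOutputError("truncated", "stopped at max tokens");
  }

  let parsed;
  try {
    parsed = JSON.parse(message.content ?? "");
  } catch (err) {
    throw new StructuredOutputError("invalid_json", err.message);
  }

  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new StructuredOutputError("not_object", JSON.stringify(parsed));
  }
  // "{}" parses cleanly but carries no data.
  if (Object.keys(parsed).length === 0) {
    throw new StructuredOutputError("empty", "{}");
  }
  return parsed;
}

const sample = {
  finish_reason: "stop",
  message: { content: '{"name":"Ada","age":36}', refusal: null },
};
console.log(readStructuredReply(sample)); // { name: 'Ada', age: 36 }
```

A caller can then branch on err.reason: a truncated reply might be retried with a larger token budget, while a refusal usually isn't worth retrying unchanged.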

I recommend using a library like Zod in TypeScript or Pydantic in Python to define your schema. Even if the LLM claims to follow a schema, you should re-validate the output at the application boundary. If the validation fails, don't just log it and move on; use a retry strategy similar to the one we use for reliable background jobs.
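Here is a minimal sketch of that boundary check plus a retry loop in JavaScript. To stay dependency-free it uses a hand-written validateUserProfile where you would normally call a Zod schema's safeParse; validateUserProfile, extractWithRetry, their options, and the backoff numbers are all hypothetical, not our production values. The generate argument stands in for whatever async function calls your model and returns the raw string.

```javascript
// Hypothetical hand-written check standing in for a Zod or Pydantic schema.
function validateUserProfile(value) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return { success: false, errors: ["expected an object"] };
  }
  const errors = [];
  if (typeof value.name !== "string" || value.name.length === 0) errors.push("name must be a non-empty string");
  if (typeof value.age !== "number" || !Number.isFinite(value.age)) errors.push("age must be a number");
  const extra = Object.keys(value).filter((k) => k !== "name" && k !== "age");
  if (extra.length > 0) errors.push(`unexpected keys: ${extra.join(", ")}`);
  return errors.length === 0 ? { success: true, data: value } : { success: false, errors };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Parses and validates each reply; on failure, logs it and retries with
// exponential backoff. Errors thrown by generate() itself propagate unchanged.
async function extractWithRetry(generate, { maxAttempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate();
    let parsed;
    try {
      parsed = JSON.parse(raw);
    } catch (err) {
      lastError = `invalid JSON: ${err.message}`; // don't try to repair it
    }
    if (parsed !== undefined) {
      const result = validateUserProfile(parsed);
      if (result.success) return result.data;
      lastError = result.errors.join("; ");
    }
    // Keep the malformed output for later analysis instead of dropping it.
    console.warn(`attempt ${attempt} failed: ${lastError}`, { raw });
    if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  throw new Error(`no valid output after ${maxAttempts} attempts: ${lastError}`);
}

// Simulated model: the first reply is malformed, the second is valid.
const replies = ['Sure! {"name": "Ada"}', '{"name": "Ada", "age": 36}'];
extractWithRetry(async () => replies.shift(), { baseDelayMs: 10 })
  .then((profile) => console.log(profile)); // { name: 'Ada', age: 36 }
```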

Getting reliable structured output: A checklist

If you want to stop fighting your parser, follow these steps:

Use Native Tools: Always prefer provider-level "Structured Output" modes (like OpenAI's JSON schema or Anthropic's tool use) over raw text generation.
Keep Schemas Flat: Deeply nested JSON structures give the model more places to go wrong, and some providers limit how deeply a strict schema may nest. If you can flatten your data, do it.
Validate at the Edge: Never trust the model's output blindly. Pass the raw string through a schema validator immediately upon receipt.
Fail Fast: If the schema validation fails, don't try to "fix" the JSON. Throw an error, log the malformed output for analysis, and trigger a retry.
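If you want "Keep Schemas Flat" to be more than a guideline, a small lint step can measure nesting before a schema ships. This is a hypothetical helper, not part of any library, and it only follows properties and items, so schemas that rely on $ref, anyOf, or similar keywords would need more handling. The depth budget you enforce is a judgment call.

```javascript
// Hypothetical lint helper: count object/array nesting levels in a JSON Schema.
function schemaDepth(schema) {
  if (schema === null || typeof schema !== "object") return 0;
  let childDepth = 0;
  for (const child of Object.values(schema.properties ?? {})) {
    childDepth = Math.max(childDepth, schemaDepth(child));
  }
  if (schema.items) childDepth = Math.max(childDepth, schemaDepth(schema.items));
  // Only containers add a level; scalar types contribute nothing.
  const isContainer = schema.type === "object" || schema.type === "array";
  return (isContainer ? 1 : 0) + childDepth;
}

const nested = {
  type: "object",
  properties: {
    user: {
      type: "object",
      properties: {
        address: { type: "object", properties: { city: { type: "string" } } },
      },
    },
  },
};

// Same information, flattened into prefixed top-level keys.
const flattened = {
  type: "object",
  properties: { user_name: { type: "string" }, address_city: { type: "string" } },
};

console.log(schemaDepth(nested), schemaDepth(flattened)); // 3 1
```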

Why I'm still skeptical


While these tools are great, they aren't magic. I’ve noticed that when I force a very complex schema, the model's reasoning quality sometimes dips. It’s as if the model is spending so much "compute" trying to adhere to the JSON structure that it loses track of the actual task.

Next time, I plan to experiment with "Few-Shot" prompting combined with schema enforcement to see if I can improve the quality of the extraction. We’re also looking into smaller, local models for simple classification tasks, as they can be easier to constrain via grammar-based sampling (like llama.cpp grammars) than massive black-box models.
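For reference, one common way to wire few-shot prompting and schema enforcement together is to send the examples as earlier user/assistant turns alongside the strict schema. This is a hypothetical sketch of the request body in OpenAI's Chat Completions format, not something we've benchmarked; buildExtractionRequest and the example data are made up. You would pass the returned object to openai.chat.completions.create.

```javascript
// Hypothetical sketch: few-shot turns plus a strict JSON schema in one request.
const userProfileSchema = {
  type: "object",
  properties: { name: { type: "string" }, age: { type: "number" } },
  required: ["name", "age"],
  additionalProperties: false,
};

function buildExtractionRequest(text, examples) {
  const shots = examples.flatMap(({ input, output }) => [
    { role: "user", content: input },
    // Assistant turns show the exact JSON we expect back.
    { role: "assistant", content: JSON.stringify(output) },
  ]);
  return {
    model: "gpt-4o-2024-08-06",
    messages: [
      { role: "system", content: "Extract the person's name and age." },
      ...shots,
      { role: "user", content: text },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "user_profile", strict: true, schema: userProfileSchema },
    },
  };
}

const request = buildExtractionRequest("Grace turned 85 last week.", [
  { input: "Ada is 36 years old.", output: { name: "Ada", age: 36 } },
]);
console.log(request.messages.length); // 4
```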

Are you using schema enforcement, or are you still manually cleaning strings? Either way, stop relying on the "Please return JSON" prompt—it's time to move to structural constraints.

FAQ

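Does JSON mode cost more? Generally, no. You pay for input and output tokens as usual, and the schema you send may add a little to the input. The more noticeable cost tends to be latency: with OpenAI's Structured Outputs, the first request that uses a new schema can take longer while the schema is processed.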

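What if the model ignores my schema? With strict Structured Outputs, decoding is constrained to your schema, so the model cannot return output that violates it. Plain JSON mode is weaker: it guarantees syntactically valid JSON, not JSON that matches your schema. When a strict request does fail, it's usually because the schema was too complex or used unsupported features, the model hit a guardrail and refused, or the response was cut off by the token limit.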

Should I use Pydantic or Zod? Use whichever matches your backend stack. Both are excellent for ensuring your LLM output actually matches your database models.
