AI/ML | June 22, 2026 | 3 min read

LLM Streaming with Partial JSON Reconstruction for Better UI

LLM streaming with partial JSON reconstruction keeps your AI interfaces fast. Learn to parse incomplete tokens and update UI components in real time.

Tags: LLM, streaming, frontend, JSON, React, AI engineering, AI, RAG, Prompt Engineering

Last month, I spent about three days debugging a "stuttering" chat interface that felt sluggish despite using high-speed streaming. The issue wasn't the API latency; it was that we were waiting for the entire JSON payload to finish before rendering anything. If you're building production AI tools, LLM streaming isn't just about showing text character-by-character; it’s about providing immediate, structured feedback to the user.

When you need to stream structured data, the standard JSON.parse() approach fails immediately because the stream is, by definition, invalid JSON until the very last byte arrives. To bridge this gap, you need to implement a parser that can handle partial chunks.

Why standard parsing fails

We first tried concatenating chunks and wrapping JSON.parse() in a try-catch block. It worked for small objects, but as soon as the LLM generated a nested list or a long string, every intermediate parse threw an error and the UI stayed blank until the last chunk arrived.
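To make the failure concrete, here is a small sketch that replays a response in three chunks and records whether the accumulated text parses at each step. The payload and the chunk boundaries are made up for illustration.

```javascript
// Simulated chunks of one JSON response (boundaries are arbitrary).
const chunks = ['{"title": "Rep', 'ort", "items": [1, 2', ', 3]}'];

// For each chunk, report whether the text accumulated so far is valid JSON.
function parseAttempts(parts) {
  let accumulated = "";
  return parts.map((part) => {
    accumulated += part;
    try {
      JSON.parse(accumulated);
      return true;
    } catch {
      return false; // every incomplete prefix ends up here
    }
  });
}

const attempts = parseAttempts(chunks);
console.log(attempts); // [ false, false, true ]
```

Only the final attempt succeeds, so a UI driven by this loop has nothing to render until the stream is over.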

If you want to master the basics of this approach, I highly recommend checking out my guide on LLM Streaming Structured Data: Real-Time Parsing Guide. It covers the fundamental state machine approach required to handle these edge cases.

Implementing Partial JSON Reconstruction

To make this work, you need a library that can parse or repair incomplete JSON. I’ve had success with jsonrepair and similar libraries that attempt to close open brackets and quotes automatically.
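As a rough illustration of what "closing open brackets and quotes" involves, here is a minimal sketch. The names closePartialJson and parsePartialJson are hypothetical, and this is not how jsonrepair is implemented; a real library handles far more cases (partial literals such as `tru`, keys without values, and so on). This version simply falls back to a caller-supplied value whenever the repaired text still does not parse.

```javascript
// Append whatever closers are needed to make a truncated JSON prefix balanced.
function closePartialJson(text) {
  const stack = []; // closers we still owe, innermost last
  let inString = false;
  let escaped = false;

  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }

  let repaired = text;
  if (inString) {
    if (escaped) repaired = repaired.slice(0, -1); // drop a dangling backslash
    repaired += '"';
  }
  repaired = repaired.replace(/,\s*$/, ""); // drop a trailing comma
  return repaired + stack.reverse().join("");
}

// Parse a prefix; return `fallback` when even the repaired text is invalid.
function parsePartialJson(text, fallback = {}) {
  try {
    return JSON.parse(closePartialJson(text));
  } catch {
    return fallback;
  }
}

const midString = parsePartialJson('{"title": "Rep');
const midArray = parsePartialJson('{"items": [1, 2,');
const midKey = parsePartialJson('{"items": [1, 2], "no', { items: [1, 2] });
console.log(midString, midArray, midKey);
```

Passing the last good value as the fallback means a prefix that cuts off in the middle of a key leaves the UI unchanged instead of blanking it.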

Here is a simplified pattern for how we handle this in our React components:


JAVASCRIPT
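import { useEffect, useState } from 'react';
import { parsePartialJson } from './utils/parser';

const useStreamingJson = (stream) => {
  const [data, setData] = useState({});

  useEffect(() => {
    let accumulated = "";
    let cancelled = false;
    const reader = stream.getReader();
    // One decoder for the whole stream, so multi-byte characters split
    // across chunks are decoded correctly.
    const decoder = new TextDecoder();

    const read = async () => {
      while (!cancelled) {
        const { done, value } = await reader.read();
        if (done) break;

        accumulated += decoder.decode(value, { stream: true });
        try {
          setData(parsePartialJson(accumulated));
        } catch (e) {
          // Not parseable yet: keep the last good value and wait for more data.
        }
      }
    };
    read().catch((err) => {
      if (!cancelled) console.error('Stream read failed', err);
    });

    // Stop reading if the component unmounts or the stream changes.
    return () => {
      cancelled = true;
      reader.cancel().catch(() => {});
    };
  }, [stream]);

  return data;
};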

This approach allows the UI to update as the model generates tokens. The key is ensuring your frontend performance doesn't tank because you're triggering a re-render on every single token. We usually add a debounce or a requestAnimationFrame throttle to limit updates to ~60fps.
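One way to sketch that throttle: keep only the newest parsed value and flush it at most once per frame. createFrameThrottle is a hypothetical helper name, and the `schedule` parameter is an assumption added so the example also runs outside a browser; by default it uses requestAnimationFrame when available and falls back to a 16 ms timer otherwise.

```javascript
// Coalesce many updates into at most one callback per scheduled frame.
function createFrameThrottle(callback, schedule) {
  const scheduleFrame =
    schedule ??
    (typeof requestAnimationFrame === "function"
      ? (fn) => requestAnimationFrame(fn)
      : (fn) => setTimeout(fn, 16));

  let latest;
  let pending = false;

  return function push(value) {
    latest = value; // only the newest value matters
    if (pending) return; // a flush is already queued for this frame
    pending = true;
    scheduleFrame(() => {
      pending = false;
      callback(latest);
    });
  };
}

// Demo with a manual scheduler standing in for the browser's frame loop.
const rendered = [];
const frameQueue = [];
const pushUpdate = createFrameThrottle(
  (value) => rendered.push(value),
  (fn) => frameQueue.push(fn)
);

pushUpdate({ tokens: 1 });
pushUpdate({ tokens: 2 });
pushUpdate({ tokens: 3 });
const queuedFrames = frameQueue.length; // three updates, one queued flush
frameQueue.shift()(); // simulate one frame
console.log(queuedFrames, rendered);
```

In the hook, the call to setData would go through a function like pushUpdate, so three tokens arriving within the same frame cause a single re-render with the latest object.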

Handling Schema Mismatches

Even with a good parser, your structured output might drift. If your schema expects a number but the LLM starts spitting out a string, your UI will crash. I’ve written extensively about Getting reliable structured output from an LLM in production to help mitigate these common failure modes.

When you're dealing with token generation speeds that can hit 50-100 tokens per second, validation becomes expensive. I prefer to validate the final object fully only once the stream completes. During the stream, I treat the data as "optimistic" and keep the UI in a "loading/streaming" state.
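A minimal sketch of that split, assuming a made-up schema of { title: string, score: number, items: string[] }. The validator is hand-written to keep the example self-contained, and validateReport and reduceStreamState are hypothetical names; in practice a schema library would replace the manual checks.

```javascript
// Full validation against the assumed schema; run once, after the stream ends.
function validateReport(value) {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, errors: ["expected an object"] };
  }
  const errors = [];
  if (typeof value.title !== "string") errors.push("title must be a string");
  if (typeof value.score !== "number" || Number.isNaN(value.score)) {
    errors.push("score must be a number");
  }
  if (
    !Array.isArray(value.items) ||
    !value.items.every((item) => typeof item === "string")
  ) {
    errors.push("items must be an array of strings");
  }
  return { ok: errors.length === 0, errors };
}

// Reduce stream events into UI state. Partial data is accepted optimistically.
function reduceStreamState(state, event) {
  if (event.type === "partial") {
    return { status: "streaming", data: event.data, errors: [] };
  }
  if (event.type === "done") {
    const result = validateReport(state.data);
    return {
      status: result.ok ? "complete" : "invalid",
      data: state.data,
      errors: result.errors,
    };
  }
  return state;
}

// The model drifted: score arrived as a string instead of a number.
let streamState = { status: "idle", data: {}, errors: [] };
streamState = reduceStreamState(streamState, {
  type: "partial",
  data: { title: "Report", score: "9", items: ["a"] },
});
const statusWhileStreaming = streamState.status;
streamState = reduceStreamState(streamState, { type: "done" });
console.log(statusWhileStreaming, streamState.status, streamState.errors);
```

The UI can render partial data freely while the status is "streaming" and only has to handle the "invalid" case once, at the end.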

The Trade-offs of Partial Parsing

There's a hidden cost to partial JSON parsing. By attempting to fix broken JSON on the fly, you might accidentally interpret a hallucination as a valid field.

Memory and CPU Overhead: Storing and re-parsing the full accumulated string on every chunk gets expensive if the response is large (e.g., a massive JSON object), because each chunk triggers another pass over everything received so far.
UI Flickering: If your parsing logic isn't deterministic, the UI might jump around as the parser guesses the structure of incomplete keys.
Complexity: You now have two sources of truth: the raw stream and the parsed object.

If you find that your schemas are becoming too complex, it’s often better to hand validation to a schema library that does the heavy lifting, such as Zod. You can see how we handle that in Structured output: Implementing Deterministic JSON Schema Validation.

Lessons Learned

I’m still not 100% happy with how we handle "interrupted" streams: when the network cuts out, we’re left with a half-baked object that isn't quite valid. Next time, I think I’ll implement a more robust buffer that persists the last valid state to a local store, rather than relying solely on memory.
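I haven't built this yet, but a first sketch of that buffer might look like the following. createSnapshotStore is a hypothetical name, and the `storage` argument is assumed to expose the same getItem/setItem/removeItem methods as the browser's localStorage; the in-memory stand-in below exists only so the example runs anywhere.

```javascript
// Persist the last successfully parsed object under a single key.
function createSnapshotStore(storage, key) {
  return {
    save(data) {
      storage.setItem(key, JSON.stringify({ data, savedAt: Date.now() }));
    },
    load() {
      const raw = storage.getItem(key);
      if (raw === null) return null;
      try {
        return JSON.parse(raw).data;
      } catch {
        return null; // corrupted snapshot: treat it as missing
      }
    },
    clear() {
      storage.removeItem(key);
    },
  };
}

// In-memory stand-in with the same method names as localStorage.
function createMemoryStorage() {
  const entries = new Map();
  return {
    getItem: (k) => (entries.has(k) ? entries.get(k) : null),
    setItem: (k, v) => entries.set(k, String(v)),
    removeItem: (k) => entries.delete(k),
  };
}

const snapshots = createSnapshotStore(createMemoryStorage(), "stream:last-valid");
snapshots.save({ title: "Report", items: [1, 2] });
const recovered = snapshots.load(); // what we would show after a dropped connection
console.log(recovered);
```

In a browser, passing window.localStorage as `storage` should work the same way; the snapshot would be saved after each successful partial parse and cleared once the stream completes and validates.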

Streaming is a game of managing expectations. If you show the user the data as it arrives, they’re much more patient with the model’s generation time. Just don't let the complexity of the parser become the bottleneck that slows down your application.
