Mahamudul Hasan Rubel
AI/ML · June 22, 2026 · 3 min read

LLM Streaming with Partial JSON Reconstruction for Better UI

LLM streaming with partial JSON reconstruction keeps your AI interfaces fast. Learn to parse incomplete tokens and update UI components in real time.

Tags: LLM, streaming, frontend, JSON, React, AI engineering, AI, RAG, Prompt Engineering

Last month, I spent about three days debugging a "stuttering" chat interface that felt sluggish despite using high-speed streaming. The issue wasn't the API latency; it was that we were waiting for the entire JSON payload to finish before rendering anything. If you're building production AI tools, LLM streaming isn't just about showing text character-by-character; it’s about providing immediate, structured feedback to the user.

When you need to stream structured data, the standard JSON.parse() approach fails immediately because the stream is, by definition, invalid JSON until the very last byte arrives. To bridge this gap, you need to implement a parser that can handle partial chunks.

Why standard parsing fails

We first tried simply concatenating chunks and wrapping JSON.parse() in a try-catch block. It worked for small objects that arrived in a single chunk, but as soon as the LLM generated a nested list or a long string that spanned several chunks, every intermediate parse threw an error, and the UI stayed blank until the stream finished.
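To see the failure concretely, here is a small sketch that replays a response in chunks and records what JSON.parse() does after each one. The payload and the `parseAttempts` helper are made up for illustration.

```javascript
// Simulated chunks of one streamed response (hypothetical payload).
const chunks = ['{"title": "Str', 'eaming", "tags": ["a",', ' "b"]}'];

// Accumulate chunks and report whether the text so far parses as JSON.
function parseAttempts(parts) {
  let accumulated = '';
  return parts.map((part) => {
    accumulated += part;
    try {
      JSON.parse(accumulated);
      return 'ok';
    } catch (e) {
      // Every incomplete prefix ends up here.
      return 'SyntaxError';
    }
  });
}

console.log(parseAttempts(chunks)); // ['SyntaxError', 'SyntaxError', 'ok']
```

Only the final attempt succeeds, which is why a plain try-catch gives the user nothing to look at until the whole response has arrived.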

If you want to master the basics of partial parsing, I highly recommend my post LLM Streaming Structured Data: Real-Time Parsing Guide. It covers the fundamental state machine approach required to handle these edge cases.

Implementing Partial JSON Reconstruction

To make this work, you need a library that can parse or repair incomplete JSON. I’ve had success with jsonrepair and similar libraries, which attempt to close open brackets and quotes automatically.
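To show what "closing open brackets and quotes" means in practice, here is a minimal hand-rolled sketch. It is not how jsonrepair is implemented, and the names `closePartialJson` and `parsePartialJsonSketch` are hypothetical. It only handles unterminated strings, trailing commas, and unclosed objects or arrays; anything else (a half-typed key, a truncated literal like `tru`) still throws, so the caller should keep its last good state.

```javascript
// Hypothetical helper: best-effort completion of a truncated JSON string.
function closePartialJson(text) {
  const closers = [];   // pending '}' and ']' in the order they were opened
  let inString = false;
  let escaped = false;

  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') closers.push('}');
    else if (ch === '[') closers.push(']');
    else if (ch === '}' || ch === ']') closers.pop();
  }

  let repaired = text;
  if (inString) {
    // Drop a dangling backslash, then terminate the open string.
    if (escaped) repaired = repaired.slice(0, -1);
    repaired += '"';
  } else {
    // A trailing comma would make the closed structure invalid.
    repaired = repaired.replace(/,\s*$/, '');
  }
  // Close the innermost structure first.
  return repaired + closers.reverse().join('');
}

// Throws a SyntaxError when the repaired text is still not valid JSON.
function parsePartialJsonSketch(text) {
  return JSON.parse(closePartialJson(text));
}

console.log(parsePartialJsonSketch('{"title": "Hel'));    // { title: 'Hel' }
console.log(parsePartialJsonSketch('{"items": [1, 2,'));  // { items: [ 1, 2 ] }
```

In production I would still reach for a maintained library; the sketch is only meant to make the mechanism visible.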

Here is a simplified pattern for how we handle this in our React components:

JAVASCRIPT
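import { useEffect, useState } from 'react';
import { parsePartialJson } from './utils/parser';

const useStreamingJson = (stream) => {
  const [data, setData] = useState({});

  useEffect(() => {
    let accumulated = "";
    let cancelled = false;
    const reader = stream.getReader();
    // Reuse one decoder so multi-byte characters split across chunks decode correctly
    const decoder = new TextDecoder();

    const read = async () => {
      while (!cancelled) {
        const { done, value } = await reader.read();
        if (done) break;

        accumulated += decoder.decode(value, { stream: true });
        try {
          // Best-effort parse of the incomplete payload
          setData(parsePartialJson(accumulated));
        } catch (e) {
          // Not parseable yet: keep the last good state and wait for more chunks
        }
      }
    };
    read().catch((err) => console.error('Stream read failed', err));

    // Stop reading when the component unmounts or the stream prop changes
    return () => {
      cancelled = true;
      reader.cancel().catch(() => {});
    };
  }, [stream]);

  return data;
};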

This approach allows the UI to update as the model generates tokens. The key is ensuring your frontend performance doesn't tank because you're triggering a re-render on every single token. We usually add a debounce or a requestAnimationFrame throttle to limit updates to ~60fps.
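A requestAnimationFrame throttle can be sketched roughly like this. `createFrameThrottle` is a hypothetical helper, not part of React or any library; it keeps only the newest value and applies it at most once per frame. The scheduler is injectable so the sketch also runs outside the browser, where it falls back to a 16 ms timer.

```javascript
// Use requestAnimationFrame in the browser, a ~16 ms timer elsewhere.
const defaultSchedule = (cb) =>
  typeof requestAnimationFrame === 'function'
    ? requestAnimationFrame(cb)
    : setTimeout(cb, 16);

// Hypothetical helper: coalesce many updates into at most one per frame.
function createFrameThrottle(apply, schedule = defaultSchedule) {
  let pending = false;
  let latest;

  return (value) => {
    latest = value;        // always remember the newest value
    if (pending) return;   // a flush is already scheduled for this frame
    pending = true;
    schedule(() => {
      pending = false;
      apply(latest);       // intermediate values are intentionally dropped
    });
  };
}
```

Inside the hook, that would mean creating `const pushData = createFrameThrottle(setData)` once per effect and calling `pushData(partial)` instead of `setData(partial)`.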

Handling Schema Mismatches

Even with a good parser, your structured output might drift. If your schema expects a number but the LLM starts spitting out a string, your UI will crash. I’ve written extensively about Getting reliable structured output from an LLM in production to help mitigate these common failure modes.

When you're dealing with token generation speeds that can hit 50-100 tokens per second, validation becomes expensive. I prefer to validate the final object fully only once the stream completes. During the stream, I treat the data as "optimistic" and keep the UI in a "loading/streaming" state.
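As a rough sketch of "validate once at the end": the stream handler keeps passing optimistic partial objects to the UI, and only the completion step runs a full check. The schema shape, `validateFinal`, and `finishStream` are all hypothetical, and the hand-rolled typeof check stands in for a real schema library.

```javascript
// Hypothetical schema: field name -> expected `typeof` result.
const recipeSchema = { title: 'string', servings: 'number' };

// Check every schema field once and collect readable errors.
function validateFinal(obj, schema) {
  const errors = [];
  for (const [key, expected] of Object.entries(schema)) {
    const actual = typeof obj?.[key];
    if (actual !== expected) {
      errors.push(`${key}: expected ${expected}, got ${actual}`);
    }
  }
  return { ok: errors.length === 0, errors };
}

// Called once when the stream reports done; partial updates skip this entirely.
function finishStream(finalObject, schema) {
  const result = validateFinal(finalObject, schema);
  return result.ok
    ? { status: 'done', data: finalObject }
    : { status: 'error', errors: result.errors };
}

console.log(finishStream({ title: 'Soup', servings: 'four' }, recipeSchema));
// { status: 'error', errors: [ 'servings: expected number, got string' ] }
```

The UI can then move from its "streaming" state to either "done" or "error" in a single transition instead of re-validating on every chunk.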

The Trade-offs of Partial Parsing

There's a hidden cost to partial JSON parsing. By attempting to fix broken JSON on the fly, you might accidentally interpret a hallucination as a valid field.

  1. Memory and CPU Overhead: Storing and re-parsing the full accumulated string on every chunk gets expensive if the response is large (e.g., a massive JSON object), because the total parsing work grows roughly quadratically with the response length.
  2. UI Flickering: If your parsing logic isn't deterministic, the UI might jump around as the parser guesses the structure of incomplete keys.
  3. Complexity: You now have two sources of truth: the raw stream and the parsed object.

If you find that your schemas are becoming too complex, it’s often better to switch to a tool that handles the heavy lifting, such as Zod. You can see how we handle that in Structured output: Implementing Deterministic JSON Schema Validation.

Lessons Learned

I’m still not 100% happy with how we handle "interrupted" streams: when the network cuts out, we’re left with a half-baked object that isn't quite valid. Next time, I think I’ll implement a more robust buffer that persists the last valid state to a local store, rather than relying solely on memory.
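I haven't built this yet, so treat the following as a sketch of the idea rather than something we run in production. `createCheckpoint` is a hypothetical helper, and the store is assumed to be anything with getItem/setItem, such as window.localStorage.

```javascript
// Hypothetical helper: persist the last successfully parsed state under one key.
function createCheckpoint(store, key) {
  return {
    // Call this after each successful partial parse.
    save(value) {
      store.setItem(key, JSON.stringify(value));
    },
    // Call this after an interrupted stream to recover the last good state.
    restore() {
      const raw = store.getItem(key);
      return raw === null || raw === undefined ? undefined : JSON.parse(raw);
    },
  };
}

// In-memory stand-in for a browser store, used here only for the demo.
const memory = new Map();
const demoStore = {
  getItem: (k) => (memory.has(k) ? memory.get(k) : null),
  setItem: (k, v) => memory.set(k, v),
};

const checkpoint = createCheckpoint(demoStore, 'chat:last-valid');
checkpoint.save({ title: 'Stream', items: [1, 2] });
console.log(checkpoint.restore()); // { title: 'Stream', items: [ 1, 2 ] }
```

Writing to the store on every chunk would add its own cost, so the save call would probably need the same kind of throttling as the UI updates.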

Streaming is a game of managing expectations. If you show the user the data as it arrives, they’re much more patient with the model’s generation time. Just don't let the complexity of the parser become the bottleneck that slows down your application.
