Tool: AI Chunking Simulator

Simulates how RAG (Retrieval-Augmented Generation) systems split your content into chunks for vector search.

What it does

RAG systems power AI search products like Perplexity, ChatGPT web search, and Google AI Overviews. They work by splitting web pages into chunks, embedding them as vectors, and retrieving the most relevant chunks to answer questions. This tool shows how your page is likely to be chunked and whether those chunks still make sense on their own.

How it works

Step 1: Extract & tokenize

Fetches your page HTML, extracts the main content text, and tokenizes it using a word-based tokenizer (approximating GPT-style token counts).
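
The tokenizer's exact rules are not documented here. As a minimal sketch, a word-based estimate could look like the following. It assumes whitespace-separated words scaled by about 4/3 tokens per word, a common rule of thumb for English text rather than a figure taken from this tool. The names estimate_tokens and TOKENS_PER_WORD are hypothetical.

```python
import re

# Assumption: about 4/3 tokens per English word (a rule of thumb,
# not a value confirmed by this tool).
TOKENS_PER_WORD = 4 / 3


def split_words(text: str) -> list[str]:
    """Split text into whitespace-separated words."""
    return re.findall(r"\S+", text)


def estimate_tokens(text: str) -> int:
    """Approximate a GPT-style token count from the word count."""
    return round(len(split_words(text)) * TOKENS_PER_WORD)
```

An estimate like this is cheap to compute but drifts from a real subword tokenizer on code, URLs, and non-English text, so the counts are best read as approximate.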

Step 2: Split at 3 sizes

The content is recursively split into chunks at three sizes commonly used in RAG systems:

Chunk Size	Use Case
256 tokens	Fine-grained retrieval — precise answers to specific questions
512 tokens	Balanced — most common in production RAG systems
1024 tokens	Coarse retrieval — broad context for complex questions

Splitting is recursive: first at headings (H1/H2/H3), then at paragraph breaks when a section exceeds the chunk size, then at sentence boundaries when a paragraph still exceeds it.
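
The recursive order described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual implementation: it assumes the extracted text marks H1/H2/H3 headings with leading "#" characters, counts one token per word, and does not merge small neighbouring pieces back together. The names split_recursive, count_tokens, and SPLITTERS are hypothetical.

```python
import re


def count_tokens(text: str) -> int:
    # Stand-in counter (assumption): one token per whitespace-separated word.
    return len(text.split())


# Separators tried in order: heading lines, blank lines, sentence ends.
SPLITTERS = [
    re.compile(r"\n(?=#{1,3} )"),   # before a line starting with #, ## or ###
    re.compile(r"\n\s*\n"),         # paragraph break
    re.compile(r"(?<=[.!?])\s+"),   # whitespace after sentence punctuation
]


def split_recursive(text: str, max_tokens: int, level: int = 0) -> list[str]:
    """Split text, moving to a finer separator only when a piece is too large."""
    text = text.strip()
    if not text:
        return []
    # Stop when the piece fits, or when no finer separator is left
    # (a single long sentence then stays as one oversized chunk).
    if count_tokens(text) <= max_tokens or level == len(SPLITTERS):
        return [text]
    chunks: list[str] = []
    for piece in SPLITTERS[level].split(text):
        chunks.extend(split_recursive(piece, max_tokens, level + 1))
    return chunks
```

Running the function once per target size (256, 512, 1024) would give the three chunk sets. A fuller version would probably also pack adjacent small pieces into one chunk up to the size limit.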

Step 3: Score chunk health

Each chunk set is scored starting from 100, with penalties for quality issues:

Penalty	Points	Meaning
Mid-sentence split	−5 per chunk	A sentence was cut in half between chunks, losing meaning.
Structural split	−3 per chunk	A list, table, or code block was split across chunks.
Tiny chunk (<50 tokens)	−2 per chunk	Chunk too small to carry meaningful information.
Oversized chunk	−3 per chunk	Chunk exceeds the target size significantly.
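
The penalty table can be read as a simple subtraction from 100. The sketch below applies it to one chunk set. Two points are assumptions: the score is clamped at 0, and "significantly" oversized is taken to mean more than 1.25 times the target size, a threshold this page does not state. The Chunk fields and the oversize_factor parameter are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    token_count: int
    ends_mid_sentence: bool = False  # a sentence continues in the next chunk
    splits_structure: bool = False   # a list, table, or code block was cut


def health_score(chunks: list[Chunk], target_size: int,
                 oversize_factor: float = 1.25) -> int:
    """Start at 100 and subtract the documented per-chunk penalties."""
    score = 100
    for chunk in chunks:
        if chunk.ends_mid_sentence:
            score -= 5  # mid-sentence split
        if chunk.splits_structure:
            score -= 3  # structural split
        if chunk.token_count < 50:
            score -= 2  # tiny chunk
        # Assumption: the oversize threshold is not documented.
        if chunk.token_count > target_size * oversize_factor:
            score -= 3  # oversized chunk
    return max(0, score)  # assumption: the score never drops below 0
```

For example, a 256-token set containing one 30-token chunk that ends mid-sentence loses 5 + 2 = 7 points for that chunk alone.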

How to improve chunk health

Use clear heading structure: H2/H3 headings create natural chunking boundaries.
Keep sections self-contained: Each section should make sense on its own.
Avoid very long paragraphs: Break text into digestible paragraphs of 3–5 sentences.
Put key information early: The first chunk is often the most important for retrieval.

API endpoint

GET /api/chunking?url=https://example.com/page
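
A hedged sketch of calling the endpoint from Python with the standard library. The host in BASE_URL is a placeholder, since this page gives only the path; replace it with the real host of the tool. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host (assumption): only the /api/chunking path is documented.
BASE_URL = "https://tool.example"


def build_chunking_url(page_url: str) -> str:
    """Build the request URL, percent-encoding the page URL as a query value."""
    query = urllib.parse.urlencode({"url": page_url})
    return f"{BASE_URL}/api/chunking?{query}"


def fetch_chunking_report(page_url: str, timeout: float = 10.0) -> dict:
    """Send the GET request and decode the JSON body."""
    with urllib.request.urlopen(build_chunking_url(page_url), timeout=timeout) as resp:
        return json.load(resp)
```

Encoding the url parameter matters when the page URL carries its own query string, because an unencoded "&" or "?" would otherwise be parsed as part of the API request.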

JSON output

chunks_256, chunks_512, chunks_1024 — arrays of chunk objects
Each chunk: text, token_count, boundary_type
health_score — 0–100 overall chunk health
total_tokens — total token count of the extracted content
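
To show how these fields might be consumed, the sketch below summarizes a response. The sample payload is invented for illustration, including its boundary_type values and score; only the field names come from the list above. The summarize function is a hypothetical helper.

```python
import json

# Invented sample payload: field names follow the documented output,
# all values are made up for illustration.
SAMPLE = json.loads("""
{
  "chunks_256": [
    {"text": "Intro ...", "token_count": 40, "boundary_type": "heading"},
    {"text": "Details ...", "token_count": 60, "boundary_type": "paragraph"}
  ],
  "chunks_512": [
    {"text": "Intro ... Details ...", "token_count": 100, "boundary_type": "heading"}
  ],
  "chunks_1024": [
    {"text": "Intro ... Details ...", "token_count": 100, "boundary_type": "heading"}
  ],
  "health_score": 96,
  "total_tokens": 100
}
""")


def summarize(report: dict) -> dict:
    """Count the chunks and find the largest chunk in each chunk set."""
    summary = {}
    for key in ("chunks_256", "chunks_512", "chunks_1024"):
        counts = [chunk["token_count"] for chunk in report.get(key, [])]
        summary[key] = {"chunks": len(counts), "largest": max(counts, default=0)}
    return summary
```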