What is Token Bloat?

When boilerplate overwhelms useful content, AI crawlers waste their context window on noise.

Definition

Token Bloat Ratio is the ratio of a page's total HTML size to the size of its useful visible text, both measured in bytes. It's computed during HTML analysis as:

TokenBloatRatio = round(TotalHTMLBytes / UsefulTextBytes, 1)

A ratio of 10× means only 10% of the page's bytes are useful content. The other 90% is HTML tags, scripts, navigation, inline JSON state, and other boilerplate.
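The formula above can be sketched in Python. This is a minimal illustration, not the analyzer's actual implementation: the names `VisibleTextExtractor`, `ratio_from_bytes`, and `token_bloat_ratio` are hypothetical, and "useful text" is assumed here to mean any text outside script, style, and noscript elements, which is cruder than a real extractor that would also discount navigation and other boilerplate.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text found outside <script>, <style>, and <noscript>.

    Assumption: this stands in for "useful visible text"; the real
    analysis may use a stricter definition.
    """

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def ratio_from_bytes(total_html_bytes: int, useful_text_bytes: int) -> float:
    """TokenBloatRatio = round(TotalHTMLBytes / UsefulTextBytes, 1)."""
    if useful_text_bytes <= 0:
        # No useful text at all: treat the page as infinitely bloated.
        return float("inf")
    return round(total_html_bytes / useful_text_bytes, 1)


def token_bloat_ratio(html: str) -> float:
    """Estimate the ratio for an HTML string using UTF-8 byte counts."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    useful_text = " ".join(extractor.parts)
    return ratio_from_bytes(
        len(html.encode("utf-8")),
        len(useful_text.encode("utf-8")),
    )


if __name__ == "__main__":
    print(ratio_from_bytes(1000, 100))        # 10.0
    print(token_bloat_ratio("<p>Hello</p>"))  # 12 bytes / 5 bytes -> 2.4
```

Returning infinity for a page with no useful text is a design choice made for this sketch; the source does not say how that case is handled.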

Why it matters for AI

When AI crawlers extract content from a page, high bloat means the useful information is buried in noise. RAG (Retrieval-Augmented Generation) systems that chunk your pages will create chunks full of navigation links and boilerplate rather than your actual content.

Token Bloat also feeds into the Extractability score via Bloat Efficiency:

BloatEfficiency = clamp((100 / TokenBloatRatio) × 5, 0, 100)

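This means a page with 20× bloat gets a BloatEfficiency of only 25/100 (100 / 20 × 5 = 25), which drags down the overall Extractability score. Because the result is clamped at 100, any ratio of 5× or lower earns the full score.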
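As a quick check on the Bloat Efficiency formula, here is a short Python sketch. The function name `bloat_efficiency` is hypothetical, and rejecting non-positive ratios is an assumption of this sketch rather than documented behavior.

```python
def bloat_efficiency(token_bloat_ratio: float) -> float:
    """BloatEfficiency = clamp(100 / TokenBloatRatio * 5, 0, 100)."""
    if token_bloat_ratio <= 0:
        # Assumption: a ratio must be positive to be meaningful.
        raise ValueError("token_bloat_ratio must be positive")
    raw = 100 / token_bloat_ratio * 5
    # Clamp the raw value into the 0..100 range.
    return max(0.0, min(100.0, raw))


if __name__ == "__main__":
    for ratio in (2, 5, 10, 20, 50):
        print(f"{ratio}x -> {bloat_efficiency(ratio)}")
    # 2x -> 100.0
    # 5x -> 100.0
    # 10x -> 50.0
    # 20x -> 25.0
    # 50x -> 10.0
```

The score falls off inversely with the ratio: doubling the bloat from 10× to 20× halves the efficiency from 50 to 25.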

What "good" looks like

Common causes