Definition
Extractability is a derived score (0–100) that combines four signals to predict whether an AI system will successfully extract your primary content — or get confused by noise, empty shells, or unstructured text.
How SEODiff computes it
Extractability is a weighted composite of four components:
Extractability = 0.30 × Structure + 0.25 × Schema + 0.25 × Rendering + 0.20 × BloatEfficiency
Where BloatEfficiency is derived from the Token Bloat Ratio:
BloatEfficiency = clamp((100 / TokenBloatRatio) × 5, 0, 100)
This means a page with 20× token bloat gets a BloatEfficiency of 25, while a page with 5× bloat or less gets 100.
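The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not SEODiff's actual implementation: the function names are hypothetical, the three component scores are assumed to already be on a 0–100 scale, and rejecting a non-positive bloat ratio is an assumption the formula itself does not state.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def bloat_efficiency(token_bloat_ratio: float) -> float:
    """BloatEfficiency = clamp((100 / TokenBloatRatio) * 5, 0, 100)."""
    if token_bloat_ratio <= 0:
        # Assumption: a ratio of zero or less is treated as invalid input.
        raise ValueError("token_bloat_ratio must be positive")
    return clamp(100 / token_bloat_ratio * 5, 0, 100)


def extractability(structure: float, schema: float, rendering: float,
                   token_bloat_ratio: float) -> float:
    """Weighted composite of the four components (weights sum to 1.0)."""
    return (0.30 * structure
            + 0.25 * schema
            + 0.25 * rendering
            + 0.20 * bloat_efficiency(token_bloat_ratio))
```

With hypothetical component scores of 80 (Structure), 60 (Schema), and 100 (Rendering) and a 20× token bloat ratio, the sketch gives 0.30 × 80 + 0.25 × 60 + 0.25 × 100 + 0.20 × 25 = 69.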
Why it matters
A page can be accessible (bots aren't blocked) but still have terrible extractability. Common scenarios:
- Empty JavaScript shell: The page loads and returns 200, but the HTML body is just <div id="root"></div>. Ghost ratio is high, extractability is near zero.
- Buried content: The real content exists but is surrounded by 50 KB of navigation, footer, inline JSON state, and ad markup. Token bloat is high, so extractability suffers.
- Flat text wall: The content is there but has no headings, no lists, and no semantic structure. AI systems can't parse the information hierarchy.
- No schema: Without JSON-LD, AI crawlers must guess what entities, products, or topics your page represents.
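As a rough illustration of the first scenario, the sketch below checks whether raw HTML carries almost no visible text before any JavaScript runs. It is a simple heuristic built on Python's standard html.parser module, not SEODiff's ghost ratio or token bloat calculation; the helper names and the 200-character threshold are assumptions chosen for the example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the text a non-rendering crawler would see in the raw HTML."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_empty_shell(html: str, min_chars: int = 200) -> bool:
    """Hypothetical check: flag pages with fewer than min_chars of visible text."""
    return len(visible_text(html)) < min_chars
```

A page whose body is only <div id="root"></div> is flagged by this check, while a page with a heading and a few paragraphs of text is not. The threshold is arbitrary and would need tuning against real pages.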
Score interpretation
- 80–100: AI systems can reliably extract your primary content.
- 60–79: Content is partially extractable but some signals are weak.
- 40–59: Significant extraction issues — AI may misrepresent your content.
- Below 40: AI crawlers will likely fail to extract meaningful content.
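The bands above can be expressed as a small lookup. This is a sketch with a hypothetical function name; because the listed ranges are given as whole numbers, treating each lower bound as the cut-off for fractional scores (for example, 79.5 falling in the 60–79 band) is an assumption.

```python
def interpret_extractability(score: float) -> str:
    """Map an Extractability score (0-100) to its interpretation band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "AI systems can reliably extract your primary content."
    if score >= 60:
        return "Content is partially extractable but some signals are weak."
    if score >= 40:
        return "Significant extraction issues; AI may misrepresent your content."
    return "AI crawlers will likely fail to extract meaningful content."
```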