Interpretation
Low extractability usually means one of three things: the page is an “empty shell” until JavaScript runs, the main content is missing or fragmented, or the DOM is dominated by non-content markup.
Common causes
- Client-side rendering hides primary text behind JS-only flows.
- Missing headings/titles or unstable page structure.
- Overwhelming boilerplate (often correlated with token bloat).
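To tell which of these causes applies to a given page, one option is to inspect the raw HTML exactly as it arrives, before any JavaScript runs. The sketch below is only an illustration built on Python's standard `html.parser`; the names `InitialHtmlProbe` and `probe_initial_html` are hypothetical, and treating text inside `<main>` or `<article>` as "main content" is an assumption that will not hold for every site.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is never readable content.
SKIP_TAGS = {"script", "style", "template"}
# Assumption: primary content lives inside <main> or <article>.
MAIN_TAGS = {"main", "article"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class InitialHtmlProbe(HTMLParser):
    """Collects rough content signals from HTML without executing JS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.main_depth = 0
        self.in_title = False
        self.title = ""
        self.heading_count = 0
        self.total_chars = 0
        self.main_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return  # ignore markup nested in skipped elements
        elif tag in MAIN_TAGS:
            self.main_depth += 1
        elif tag in HEADING_TAGS:
            self.heading_count += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in MAIN_TAGS:
            if self.main_depth:
                self.main_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        if self.in_title:
            self.title += text
            return
        self.total_chars += len(text)
        if self.main_depth:
            self.main_chars += len(text)


def probe_initial_html(html):
    """Return a dict of signals for one page's initial HTML."""
    probe = InitialHtmlProbe()
    probe.feed(html)
    probe.close()
    share = probe.main_chars / probe.total_chars if probe.total_chars else 0.0
    return {
        "has_title": bool(probe.title),
        "heading_count": probe.heading_count,
        "text_chars": probe.total_chars,
        "main_text_share": share,
    }
```

Read loosely, a `text_chars` near zero points to client-side rendering, a missing title or a `heading_count` of zero points to structural gaps, and a low `main_text_share` suggests boilerplate is crowding out the content. Any thresholds would need to be tuned per site.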
Fixes
- Server-render or pre-render primary content for crawlers.
- Keep headings, titles, and main text stable and present in initial HTML.
- Reduce noise (navigation, banners, repeated boilerplate) so the main content dominates the DOM.
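As one way to picture the first two fixes, the sketch below builds a page on the server so that the title, headings, and body text are already present in the initial HTML, with scripts loaded afterwards as an enhancement. It assumes a plain string-building approach rather than any particular framework; `render_article` and the `/app.js` path are hypothetical names used only for illustration.

```python
from html import escape


def render_article(title, sections):
    """Render a page whose primary content is in the initial HTML.

    `sections` is a list of (heading, paragraphs) tuples, where
    `paragraphs` is a list of plain-text strings.
    """
    parts = [
        "<!doctype html>",
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{escape(title)}</title>",
        "</head>",
        "<body>",
        "<main>",
        f"<h1>{escape(title)}</h1>",
    ]
    for heading, paragraphs in sections:
        parts.append(f"<h2>{escape(heading)}</h2>")
        parts.extend(f"<p>{escape(p)}</p>" for p in paragraphs)
    parts += [
        "</main>",
        # Hypothetical script path: JS enhances the page but is not
        # needed to read the content above.
        '<script src="/app.js" defer></script>',
        "</body>",
        "</html>",
    ]
    return "\n".join(parts)


page = render_article(
    "Pricing & plans",
    [("Free tier", ["Up to 3 projects."])],
)
```

Because the same function produces the same markup on every request, the headings and main text stay stable, and keeping everything inside a single `<main>` element leaves little room for unrelated markup to dominate.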