The definitive data on how generative AI interprets the web. Original, large-scale studies on AI visibility, hallucination risk, and citation authority — published as open, ungated HTML.
We analyzed 1,000,111 domains and computed each one's AI-Trust Score, the first PageRank-style authority score designed for LLMs. 46.8% of the web is invisible to generative search. The Facebook paradox, the full leaderboard, and an engineering playbook for escaping the Graveyard.
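The AI-Trust weighting itself is not specified here. For orientation only, here is classic PageRank computed by power iteration over a hypothetical domain link graph. The `links` input, the 0.85 damping factor, and the tolerance are assumptions, not the study's parameters.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Classic PageRank by power iteration (illustrative, not the AI-Trust formula).

    links maps each domain to the list of domains it links to.
    Dangling domains (no outlinks) spread their rank uniformly over all domains.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical four-domain link graph.
links = {
    "a.com": ["b.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com", "b.com"],
    "d.com": ["c.com"],
}
scores = pagerank(links)
```

In this example, d.com has no inbound links and there are no dangling domains, so its score stays at the teleport floor (1 - 0.85) / 4, about 0.0375.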
Rigorous empirical validation of the AI-Crawler Reality Index (ACRI). Shadow RAG calibration on 926 domains with 4,630 queries shows that ACRI predicts retrieval success (Spearman ρ = 0.629). This is the scoring methodology behind all our studies.
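Spearman's ρ is the Pearson correlation computed on ranks rather than raw values, so it rewards any monotonic relationship between a score and an outcome. Here is a minimal standard-library sketch with average ranks for ties. The input lists are hypothetical, and this is not the study's own code.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)

# Hypothetical per-domain scores vs. retrieval success counts.
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```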
When AI can't find your pricing, it invents a price. Paired extraction experiments on 50 production domains show that ACRI-optimized pages reduce the hallucination rate by 62% and cut token costs by 52%. The brand safety imperative for clean HTML.
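Percentages like these are relative reductions over paired measurements, where the same page is extracted before and after optimization. The study's exact estimator is not stated. This sketch assumes a pooled reduction, 1 - sum(after) / sum(before), and uses hypothetical per-domain counts.

```python
def pooled_reduction(before, after):
    """Pooled relative reduction over paired measurements: 1 - sum(after) / sum(before)."""
    if len(before) != len(after) or not before:
        raise ValueError("before and after must be non-empty and paired")
    total_before = sum(before)
    if total_before == 0:
        raise ValueError("baseline total must be non-zero")
    return 1 - sum(after) / total_before

def fold_change(reduction):
    """Express a relative reduction (0.62 means 62%) as an 'N times lower' factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)

# Hypothetical hallucinated-field counts per domain: raw page vs. optimized page.
raw_errors = [8, 5, 10, 7]
clean_errors = [3, 2, 4, 3]
r = pooled_reduction(raw_errors, clean_errors)  # 1 - 12/30, about 0.6
```

Percentage and fold-change figures are not interchangeable. A 62% reduction corresponds to roughly 2.6 times fewer errors (fold_change(0.62) is about 2.63), while a threefold cut corresponds to a reduction of about 67%.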
Your React app looks perfect in Chrome. To ChatGPT, it's an empty page. Pure client-side-rendered (CSR) apps have a 97% ghost ratio. A framework-by-framework analysis and the server-side rendering (SSR) remediation playbook that restores AI visibility.
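Ghost ratio is not defined in this summary. This minimal sketch assumes it means the share of visible words on the rendered page that are missing from the raw server HTML, which is what a crawler sees when it does not execute JavaScript. Producing the rendered HTML, for example with a headless browser, is out of scope, and the helper names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def visible_words(markup):
    """Lower-cased word counts of the visible text in an HTML string."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return Counter(re.findall(r"\w+", " ".join(parser.parts).lower()))

def ghost_ratio(raw_html, rendered_html):
    """Assumed definition: fraction of rendered words absent from the raw HTML."""
    rendered = visible_words(rendered_html)
    total = sum(rendered.values())
    if total == 0:
        return 0.0
    present = sum((visible_words(raw_html) & rendered).values())
    return 1 - present / total

# A pure CSR shell: the server sends an empty mount point plus a script.
raw = '<html><body><div id="root"></div><script>render()</script></body></html>'
rendered = '<html><body><div id="root"><h1>Pricing</h1><p>Pro plan 20 dollars</p></div></body></html>'
```

Here ghost_ratio(raw, rendered) is 1.0 because none of the rendered text exists in the server response.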
Paired extraction experiments showing that cleaning up HTML structure reduces LLM token cost by 52% and cuts the hallucination rate threefold. The "Golden Semantic String" methodology and deterministic remediation protocol.
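The remediation protocol itself is not reproduced here. As a rough illustration of the general idea, this sketch unwraps non-semantic tags, drops attributes, scripts, and styles, and compares token counts before and after. The `KEEP` whitelist and the `approx_tokens` proxy are assumptions. A real measurement would use the target model's own tokenizer.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: semantic tags kept; all other tags are unwrapped.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li",
        "table", "tr", "th", "td", "a"}
DROP_CONTENT = {"script", "style"}  # removed together with their contents

class SemanticCleaner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self._drop = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._drop += 1
        elif not self._drop and tag in KEEP:
            href = dict(attrs).get("href") if tag == "a" else None
            # Keep only href on links; every other attribute is discarded.
            self.out.append(f'<a href="{escape(href)}">' if href else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._drop = max(0, self._drop - 1)
        elif not self._drop and tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._drop and data.strip():
            self.out.append(" ".join(data.split()))  # collapse whitespace

def clean_html(markup):
    cleaner = SemanticCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return " ".join(cleaner.out)

def approx_tokens(text):
    """Crude proxy: word runs plus single punctuation marks (not an LLM tokenizer)."""
    return len(re.findall(r"\w+|[^\w\s]", text))

raw = ('<div class="wrapper" data-x="1"><style>.a{color:red}</style>'
       '<p class="lead">Pro plan <span class="price">$20</span></p>'
       '<script>track()</script></div>')
cleaned = clean_html(raw)  # '<p> Pro plan $20 </p>'
```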
Shannon entropy analysis of 100,000 sites reveals the median page wastes 55% of tokens on structural overhead. Pages in the top decile for structural efficiency show 2.3× higher AI citation rates. The 10-point engineering noise-reduction checklist.
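The exact formulas behind these figures are not given. This sketch assumes structural overhead is the share of tokens spent inside markup tags and computes Shannon entropy, H = Σ p · log2(1/p), over the token distribution. Repetitive boilerplate concentrates probability on a few tokens and lowers H. The regex tokenizer is a crude stand-in for an LLM tokenizer.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Crude proxy tokenizer: word runs and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def shannon_entropy(tokens):
    """Shannon entropy of the token distribution, in bits per token."""
    n = len(tokens)
    if n == 0:
        return 0.0
    counts = Counter(tokens)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def structural_overhead(markup):
    """Assumed definition: fraction of tokens that fall inside <...> tags."""
    total = len(tokenize(markup))
    if total == 0:
        return 0.0
    # '<' and '>' are non-word characters, so no token spans a tag boundary.
    tag_tokens = sum(len(tokenize(t)) for t in re.findall(r"<[^>]*>", markup))
    return tag_tokens / total

print(structural_overhead("<p>Hi</p>"))  # 0.875 (7 of 8 tokens are markup)
```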
Every study follows the same principles: large-scale data collection, statistical rigor, full transparency, and actionable engineering recommendations. We publish as ungated HTML because if we locked our research behind a gated PDF, AI models couldn't read it either.
Every finding in these papers can be measured for your specific site. Run a free scan to see your ACRI score, ghost ratio, hallucination risk, and AI-Trust ranking.
Or explore the 1M-domain Radar to see how any site scores.