SEODiff Research

The definitive data on how generative AI interprets the web. Original, large-scale studies on AI visibility, hallucination risk, and citation authority — published as open, ungated HTML.

6 studies · 1,000,111 domains analyzed · February 2026

1M+ Domains Crawled
46.8% In AI Graveyard
865K Citation Edges
6 Research Papers
ρ = 0.629 ACRI Validation
1

The Great AI Disconnect: Why Domain Authority is Dead in the Era of Generative Search

We analyzed 1,000,111 domains and computed their AI-Trust Score — the first PageRank-style authority metric designed for LLMs. 46.8% of the web is invisible to generative search. The Facebook paradox, the full leaderboard, and an engineering playbook to escape the Graveyard.

1M Domains · 46.8% Graveyard · AI-Trust Score · Leaderboard
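The AI-Trust Score pipeline itself is not published here; as an illustration of the underlying idea, this is a minimal PageRank sketch over a toy citation-edge list. The domain names, damping factor, and iteration count are all hypothetical, not the study's actual parameters.

```python
# Minimal PageRank sketch over a domain citation graph.
# Edge list and damping are illustrative; not the real AI-Trust pipeline.

def pagerank(edges, damping=0.85, iters=50):
    nodes = {n for edge in edges for n in edge}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Dangling nodes spread their rank evenly across the graph.
            targets = out[src] or list(nodes)
            share = damping * rank[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        rank = nxt
    return rank

edges = [("a.com", "b.com"), ("a.com", "c.com"),
         ("b.com", "c.com"), ("c.com", "a.com")]
scores = pagerank(edges)  # c.com outranks b.com: two inlinks vs one
```

The same iteration scales to an 865K-edge graph; the interesting engineering question is which edges count as "citations" in LLM output, which is what the paper addresses.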
2

The Science of ACRI: How Technical Structure Predicts AI Retrieval

Rigorous empirical validation of the AI-Crawler Reality Index. Shadow RAG calibration on 926 domains with 4,630 queries shows that ACRI predicts retrieval success (Spearman ρ = 0.629). The scoring methodology behind all our studies.

926 Domains · ρ = 0.629 · Shadow RAG · 5 Pillars
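The validation statistic here is standard Spearman rank correlation. A pure-Python sketch with tie-aware ranking; the ACRI scores and retrieval-success values below are made up for illustration, not the 926-domain data.

```python
# Spearman rank correlation, the statistic used to validate ACRI
# against retrieval success. Sample data is hypothetical.

def ranks(xs):
    # 1-based ranks, averaging over ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

acri = [82, 45, 67, 30, 91]              # hypothetical ACRI scores
retrieval = [0.9, 0.4, 0.7, 0.2, 0.95]   # hypothetical retrieval success
rho = spearman(acri, retrieval)  # → 1.0 on this perfectly monotonic toy data
```

At real-world scale the correlation is never perfect; ρ = 0.629 over 4,630 queries is the study's reported figure.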
3

Hallucination Risk: How Structural Noise Causes LLMs to Invent Facts

When AI can't find your pricing, it invents one. Paired extraction experiments on 50 production domains show that ACRI-optimised pages reduce hallucination by 62% and cut token costs by 52%. The brand safety imperative for clean HTML.

3.5× Hallucination · −62% After Fix · 50 Domains · Brand Safety
4

Ghost Content: How Client-Side Rendering Erases Pages from AI

Your React app looks perfect in Chrome. To ChatGPT, it's an empty page. Pure CSR apps have a 97% ghost ratio. Framework-by-framework analysis and the SSR remediation playbook that restores AI visibility.

97% Ghost Ratio · CSR vs SSR · +34 ACRI After Fix · Framework Guide
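One way to approximate a ghost ratio is to compare the visible text in the raw server response against the text a browser ultimately renders. A sketch under that assumption — in practice `rendered_text` would come from a headless browser, and the CSR shell below is an illustrative example, not a measured page:

```python
# "Ghost ratio" sketch: share of rendered text missing from raw HTML.
# rendered_text would normally come from a headless browser session.
import re

def visible_text(html):
    # Drop script/style bodies, then strip remaining tags.
    html = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())

def ghost_ratio(raw_html, rendered_text):
    raw = visible_text(raw_html)
    if not rendered_text:
        return 0.0
    return max(0.0, 1 - len(raw) / len(rendered_text))

# A typical pure-CSR shell: the server ships an empty root div.
csr_shell = ('<html><body><div id="root"></div>'
             '<script src="app.js"></script></body></html>')
rendered = ("Acme Widgets. Pricing starts at $49/month. "
            "Contact sales for enterprise plans.")
ratio = ghost_ratio(csr_shell, rendered)  # → 1.0: all text is client-rendered
```

An AI crawler that does not execute JavaScript sees only `visible_text(csr_shell)` — nothing — which is the mechanism behind the 97% figure for pure CSR apps.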
5

Extraction Lab: How HTML Structure Determines LLM Fact Extraction

Paired extraction experiments proving that cleaning HTML structure reduces LLM token cost by 52% and cuts hallucination rate by 3×. The "Golden Semantic String" methodology and deterministic remediation protocol.

−52% Token Cost · 3× Fewer Errors · 50 Paired Tests · Reproducible
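The token-cost effect can be illustrated with a rough whitespace tokenizer on a noisy/clean pair. Real experiments would use a model tokenizer, and the markup below is a made-up example of the pattern, not data from the study:

```python
# Token-cost sketch for the paired-extraction idea: the same fact costs
# fewer tokens once structural noise is stripped. Whitespace splitting
# stands in for a model tokenizer; markup is hypothetical.
import re

def rough_tokens(s):
    return len(re.findall(r"\S+", s))

noisy = ('<div class="row col-md-12 js-track" data-analytics="pricing-block">'
         '<div class="inner"><span class="label">Price:</span>'
         '<span class="value">$49/month</span></div></div>')
clean = "<p>Price: $49/month</p>"

saving = 1 - rough_tokens(clean) / rough_tokens(noisy)
```

Both snippets carry one fact (the price); the difference is pure structural overhead, which is what the "Golden Semantic String" remediation removes.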
6

Information Theory & The Generative Web: Why DOM Noise is the New Blocked Crawl

Shannon entropy analysis of 100,000 sites reveals the median page wastes 55% of tokens on structural overhead. Pages in the top decile for structural efficiency show 2.3× higher AI citation rates. The 10-point engineering noise-reduction checklist.

100K Sites · 55% Token Waste · 2.3× Citation Rate · Shannon Entropy
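The "55% of tokens on structural overhead" measurement can be approximated as the fraction of a page's payload that is markup rather than visible text. A character-count sketch — characters stand in for tokens, and the page is a hypothetical example:

```python
# Structural-overhead sketch: what fraction of the payload is markup
# rather than visible text. A crude analogue of the token-waste metric;
# character counts approximate tokens, and the page is illustrative.
import re

def overhead_fraction(html):
    text = re.sub(r"<[^>]+>", "", html)
    text = " ".join(text.split())
    return 1 - len(text) / len(html)

page = '<div class="wrap"><div class="inner"><p>Hello world</p></div></div>'
frac = overhead_fraction(page)  # ≈ 0.84: most bytes are wrappers, not content
```

A page in the top decile for structural efficiency pushes this fraction down, which is the property the study links to the 2.3× citation rate.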

Our Research Methodology

Every study follows the same principles: large-scale data collection, statistical rigor, full transparency, and actionable engineering recommendations. We publish as ungated HTML — because if we gated our research behind a PDF, AI models couldn't read it either.

📊 Data at Scale
1M+ domains crawled. 865K citation edges. Studies sized for statistical significance.
🔬 Reproducible
Full methodologies, API endpoints, and evaluation protocols published with every paper.
🔓 Ungated HTML
No PDFs, no email walls. Semantic HTML so AI models can read and cite our research.
🛠️ Actionable
Every paper ends with an engineering playbook. Data science you can immediately apply.

Apply This Research to Your Domain

Every finding in these papers can be measured for your specific site. Run a free scan to see your ACRI score, ghost ratio, hallucination risk, and AI-Trust ranking.

Scan Your Domain Free →

Or explore the 1M-domain Radar to see how any site scores.