API reference (v1)

SEODiff's API is the product. The dashboard and CI integrations are clients of the same API documented here.

Authentication

Most endpoints require an API key passed in the Authorization header:

Authorization: Bearer <your_api_key>

Get your key from the API Keys page in your account. Keys use the sd_live_ or sd_test_ prefix.

Base URL

All paths are relative to:

https://seodiff.io/api/v1

https://api.seodiff.io/api/v1 also works (same backend).

Quick test

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/me

Rate limits

API requests are rate-limited per account tier. Free accounts get 60 requests per minute. Pro and Enterprise get higher limits. Exceeding the limit returns 429 Too Many Requests.
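When you hit the limit, back off and retry. A minimal client-side sketch (the exponential-backoff policy below is our suggestion, not something the API mandates; the transport is injected so the control flow is clear):

```python
import time

def with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429 with exponential backoff.

    send: zero-argument callable returning (status_code, body).
    Delays grow as base_delay * 2**attempt (1s, 2s, 4s, ...).
    """
    for attempt in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        sleep(base_delay * 2 ** attempt)
    return send()  # final attempt; caller handles a lingering 429

# Stubbed transport: two 429s, then success.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []
status, body = with_backoff(lambda: next(responses), sleep=delays.append)
# status == 200 after two retries; recorded delays were [1.0, 2.0]
```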

Account & Plan

GET /api/v1/me

Returns your account info, plan details, and checkout URL (for plan upgrades).

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/me

Response:

{
  "account": { "id": "...", "name": "[email protected]" },
  "plan": {
    "tier": "pro",
    "max_sites": 10,
    "max_pages_per_scan": 5000,
    "max_deep_audit_pages": 10000
  },
  "checkout_url": "https://..."
}

GET /api/v1/audit

List recent API key audit events (key created, rotated, revoked, etc.).

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/audit

Sites & Monitoring

GET /api/v1/sites

List all monitored sites for your account.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/sites

Response:

[
  {
    "id": "...",
    "base_url": "https://example.com",
    "enabled": true,
    "schedule": "nightly"
  }
]

POST /api/v1/sites

Add or update a monitored site.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://example.com",
    "enabled": true,
    "schedule": "nightly"
  }' \
  https://seodiff.io/api/v1/sites

POST /api/v1/monitor/keepalive

Send a keepalive ping for a monitored site. Keeps monitoring active if schedule-based probes are paused.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"base_url":"https://example.com"}' \
  https://seodiff.io/api/v1/monitor/keepalive

Scanning & CI/CD

Recommended first call

If you are integrating for the first time, start with POST /api/v1/validate using wait=true. It gives a single response with pass/fail plus links to all artifacts.

POST /api/v1/scan

Enqueue a surface scan. Returns 202 Accepted with an id and a status_url.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://preview.example.com",
    "render_js": false,
    "lighthouse": false
  }' \
  https://seodiff.io/api/v1/scan

Response:

{
  "id": "s_abc123...",
  "status_url": "/api/v1/scans/s_abc123.../status"
}

POST /api/v1/validate

CI-friendly scan wrapper. When wait=true, blocks until complete and returns pass/fail.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://preview.example.com",
    "preset": "fast",
    "fail_on": "fetch_errors,non200_status,schema_missing_required",
    "max_issue_rate": 10,
    "wait": true,
    "timeout_seconds": 180
  }' \
  https://seodiff.io/api/v1/validate

Response (wait=true):

{
  "pass": true,
  "reason": "",
  "failing": {},
  "report_url": "/scans/s_abc123/report.html",
  "json_url": "/scans/s_abc123/findings.json",
  "summary_markdown_url": "/api/v1/scans/s_abc123/summary.md"
}

pass (boolean): Whether the scan passed all checks
reason (string): Human-readable failure reason (empty on pass)
failing (object): Failing keys with details
report_url (string): HTML report link
json_url (string): JSON findings export
summary_markdown_url (string): Markdown summary for PR comments

Returns 200 for pass, 409 for fail. May return 202 if the scan hasn't completed within the timeout.
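In CI, the HTTP status alone can gate a deploy. A sketch of that mapping (the exit-code convention is illustrative, not part of the API):

```python
def ci_exit_code(http_status: int) -> int:
    """Map POST /api/v1/validate HTTP statuses to a CI exit code.

    200 -> 0 (pass), 409 -> 1 (fail the build),
    202 -> 2 (scan still running; treat as inconclusive and retry or fail).
    """
    if http_status == 200:
        return 0
    if http_status == 409:
        return 1
    if http_status == 202:
        return 2
    return 1  # any other status: fail closed

# e.g. ci_exit_code(409) returns 1, failing the pipeline step
```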

GET /api/v1/scans/{id}/summary.md (requires domain verification)

Markdown summary suitable for GitHub PR comments.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/scans/s_abc123/summary.md

GET /api/v1/scans/{id}/findings.json (requires domain verification)

Normalized findings as JSON for downstream tooling.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/scans/s_abc123/findings.json

GET /api/v1/scans/{id}/findings.csv (requires domain verification)

Normalized findings as CSV.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/scans/s_abc123/findings.csv

GET /api/v1/incidents

List recent drift incidents detected by monitoring.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/incidents

GET /api/v1/templates?base_url=...

List template identifiers detected for a monitored site.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/templates?base_url=https://example.com"

Response:

{
  "templates": ["/product/*", "/collections/*"]
}

GET /api/v1/timeline?base_url=...&template=...

Drift timeline for a given base URL and template. The template value should match an entry from /api/v1/templates.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/timeline?base_url=https://example.com&template=/product/*"

GET /api/v1/template-drift-summaries?base_url=...

Aggregated template drift summaries across all templates for a site.

GET /api/v1/project-overview?base_url=...

Dashboard aggregate for a project. Returns project summary cards, recent scans, and Search Console data.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/project-overview?base_url=https://example.com"

Deep Audit (Pro)

POST /api/v1/deep-audit (requires domain verification)

Start a full-site deep crawl. Requires domain verification and Pro (or higher) plan.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://example.com",
    "crawl_scope": "deep_audit",
    "max_pages": 500,
    "render_js": false,
    "respect_robots": true,
    "crawl_speed": "normal",
    "include_patterns": [],
    "exclude_patterns": []
  }' \
  https://seodiff.io/api/v1/deep-audit

Response:

{
  "job_id": "da_abc123...",
  "status_url": "/api/v1/deep-audit/da_abc123...",
  "report_url": "/api/v1/deep-audit/da_abc123.../report"
}

base_url (string): Required. URL to crawl.
crawl_scope (string): deep_audit (default) or full_site (Enterprise).
max_pages (integer): Max pages to crawl (plan-limited).
render_js (boolean): Enable JavaScript rendering.
respect_robots (boolean): Obey robots.txt (default true).
crawl_speed (string): slow, normal, or fast.
include_patterns (string[]): URL patterns to include (glob).
exclude_patterns (string[]): URL patterns to exclude (glob).

GET /api/v1/deep-audit/

List deep-audit jobs for your account.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/

GET /api/v1/deep-audit/{id}

Get job status, progress percentage, and metadata.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/da_abc123

Response:

{
  "job_id": "da_abc123",
  "status": "complete",
  "progress": 100,
  "base_url": "https://example.com",
  "pages_crawled": 342,
  "started_at": "2025-01-15T10:00:00Z",
  "finished_at": "2025-01-15T10:05:23Z"
}

GET /api/v1/deep-audit/{id}/report

HTML report of the deep audit (job must be complete).

GET /api/v1/deep-audit/{id}/json

Raw JSON result with all crawled page data.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/da_abc123/json

GET /api/v1/deep-audit/{id}/summary

Lightweight summary payload. Includes top-level metrics, issue counts, and crawl stats.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/da_abc123/summary

GET /api/v1/deep-audit/{id}/graph

Template-level internal link graph payload for visualization.

GET /api/v1/deep-audit/{id}/url-pagerank

URL-level internal PageRank scores and distribution summary.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/da_abc123/url-pagerank

GET /api/v1/deep-audit/{id}/link-heat

Link heatmap data showing internal link equity distribution across templates and URLs.

GET /api/v1/deep-audit/{id}/full-audit

Full audit aggregate payload for enterprise-style views. Combines all deep-audit sub-reports.

GET /api/v1/deep-audit/{id}/agent-md

Structured Markdown export designed for LLM and AI agent consumption.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/da_abc123/agent-md

GET /api/v1/deep-audit/{id}/pseo-agent

Programmatic SEO diagnostics payload for coding agents. Includes template stats, placeholder detection, hallucination rates, schema drift analysis, and actionable fix lists.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/da_abc123/pseo-agent

Response (abbreviated):

{
  "meta": { "job_id": "da_abc123", "base_url": "https://example.com" },
  "health": "B",
  "health_score": 72,
  "templates": [ ... ],
  "data_integrity": {
    "placeholder_outbreak": { "severity": "warning" },
    "schema_type_drift": { "severity": "ok" }
  },
  "top_fixes": [ ... ]
}

GET /api/v1/deep-audit/{id}/diff

Scan-over-scan diff JSON comparing this audit to the previous one for the same domain. Returns detected changes across 23 detection algorithms.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/deep-audit/da_abc123/diff

GET /api/v1/deep-audit/{id}/diff-report

HTML diff report showing changes between consecutive deep audits.

GET /api/v1/project/{id}/graph (legacy)

Redirects to /api/v1/deep-audit/{id}/graph.

GET /api/v1/project/{id}/url_pagerank (legacy)

Redirects to /api/v1/deep-audit/{id}/url-pagerank.

GET /api/v1/project/{id}/full_audit (legacy)

Redirects to /api/v1/deep-audit/{id}/full-audit.

Extraction Rules (Pro)

GET /api/v1/extraction-rules?base_url=...

List custom extraction rules for a site.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com"

POST /api/v1/extraction-rules?base_url=...

Create or update a custom extraction rule.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "field_name": "price",
    "selector_type": "css",
    "selector": ".product-price",
    "expected_type": "number",
    "required": true
  }' \
  "https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com"

DELETE /api/v1/extraction-rules?base_url=...&field_name=...

Delete an extraction rule by field name.

curl -X DELETE -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com&field_name=price"

POST /api/v1/extraction-rules/validate

Dry-run a rule against sampled pages before saving.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://example.com",
    "rule": {
      "field_name": "price",
      "selector_type": "css",
      "selector": ".product-price",
      "expected_type": "number",
      "required": true
    }
  }' \
  https://seodiff.io/api/v1/extraction-rules/validate

Domain Verification

Some endpoints (deep-audit, scan exports) require you to prove ownership of the domain. You can verify via DNS TXT record or by connecting Google Search Console.

GET/POST /api/v1/domain-verify/challenge

Get or create a verification challenge for a domain. Accepts either GET with query params or POST with JSON body.

# GET with query param
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/domain-verify/challenge?domain=example.com"

# POST with JSON body
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"domain":"example.com"}' \
  https://seodiff.io/api/v1/domain-verify/challenge

Response:

{
  "domain": "example.com",
  "token": "seodiff-verify=abc123...",
  "verified": false,
  "dns_host": "_seodiff-verify.example.com",
  "record_type": "TXT",
  "record_value": "seodiff-verify=abc123...",
  "instructions": "Add a DNS TXT record..."
}
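The dns_host and record_value fields map directly onto a TXT record. A small helper that formats the challenge response as a BIND-style zone-file line (zone-file syntax is standard DNS, not SEODiff-specific; the helper name is ours):

```python
def txt_record_line(challenge: dict) -> str:
    """Format a domain-verify challenge as a BIND-style zone-file line."""
    return '{host}. IN TXT "{value}"'.format(
        host=challenge["dns_host"], value=challenge["record_value"])

challenge = {
    "dns_host": "_seodiff-verify.example.com",
    "record_value": "seodiff-verify=abc123",
}
# txt_record_line(challenge)
# -> '_seodiff-verify.example.com. IN TXT "seodiff-verify=abc123"'
```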

POST /api/v1/domain-verify/confirm

Confirm domain verification after adding the DNS TXT record.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"domain":"example.com"}' \
  https://seodiff.io/api/v1/domain-verify/confirm

GET /api/v1/domain-verify/status?domain=...

Check whether a domain is verified for your account.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/domain-verify/status?domain=example.com"

Google Search Console (Pro)

GET /api/v1/gsc/connect?base_url=...

Start Google Search Console OAuth flow. Returns an auth_url to redirect the user to.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/gsc/connect?base_url=https://example.com"

GET /api/v1/gsc/properties?base_url=...

List available and selected Search Console properties for a connected site.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/gsc/properties?base_url=https://example.com"

POST /api/v1/gsc/select-property

Set the active Search Console property for a site.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://example.com",
    "property": "sc-domain:example.com"
  }' \
  https://seodiff.io/api/v1/gsc/select-property

POST /api/v1/gsc/sync

Trigger a fresh Search Console data sync for a site.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"base_url":"https://example.com"}' \
  https://seodiff.io/api/v1/gsc/sync

GET/POST /api/v1/gsc/diagnostics

GSC indexability diagnostics summary. Requires an active GSC connection. Accepts either GET with query params or POST with JSON body.

# GET with query param
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/gsc/diagnostics?base_url=https://example.com"

# POST with JSON body
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"base_url":"https://example.com"}' \
  https://seodiff.io/api/v1/gsc/diagnostics

Alerts

GET /api/v1/alerts

List alerts for your account.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/alerts

GET /api/v1/alerts/unread

Get count of unread alerts.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/alerts/unread

Response:

{"unread": 3}

POST /api/v1/alerts/dismiss

Dismiss a single alert by ID.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"id":"alert_abc123"}' \
  https://seodiff.io/api/v1/alerts/dismiss

POST /api/v1/alerts/dismiss-all

Dismiss all alerts.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/alerts/dismiss-all

GET /api/v1/alerts/preferences

Get alert notification preferences.

POST /api/v1/alerts/preferences

Update alert notification preferences.

Schema Drift

GET /api/v1/schema-drift?base_url=...

Get schema drift analysis for a monitored site.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/schema-drift?base_url=https://example.com"

GET /api/v1/schema-drift/diff?base_url=...&from=...&to=...

Get schema diff between two snapshots.

GET /api/v1/schema-drift/timeline?base_url=...

Schema change timeline for a site.

Indexation & IndexNow

Cloudflare & User-Agent

If your HTTP client sends a default or generic User-Agent (e.g. Python urllib), Cloudflare may block requests to seodiff.io with a 403. Fix: set a descriptive User-Agent header (e.g. MyApp/1.0), or use api.seodiff.io as an alternative host, which bypasses Cloudflare’s browser-integrity check. All /api/v1/* endpoints work on both hosts.

GET /api/v1/indexation-health?base_url=...

Indexation health summary for a site. Requires GSC connection.

curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  "https://seodiff.io/api/v1/indexation-health?base_url=https://example.com"

GET /api/v1/indexation-health/inspect?url=...

Inspect indexation status for a specific URL.

GET /api/v1/indexnow/settings?base_url=...

Get IndexNow configuration for a site.

POST /api/v1/indexnow/settings/update

Update IndexNow settings (enable/disable, key, thresholds).

POST /api/v1/indexnow/push

Push URLs to search engines via IndexNow protocol.

Pre-configuration required

Before using this endpoint, you must configure an IndexNow key for the site via POST /api/v1/indexnow/settings/update (or the dashboard). The key file must be hosted at your domain root (e.g. https://example.com/{key}.txt) for search engines to verify ownership.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://example.com",
    "urls": [
      "https://example.com/page-1",
      "https://example.com/page-2"
    ]
  }' \
  https://seodiff.io/api/v1/indexnow/push
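The IndexNow protocol only accepts URLs belonging to the host that serves the key file, so it is worth validating the batch client-side before pushing. A sketch (the helper name is ours):

```python
from urllib.parse import urlparse

def off_host_urls(base_url, urls):
    """Return the URLs whose host differs from base_url's host.

    IndexNow rejects URLs that do not belong to the host serving
    the key file, so these would fail at submit time.
    """
    host = urlparse(base_url).netloc
    return [u for u in urls if urlparse(u).netloc != host]

bad = off_host_urls("https://example.com", [
    "https://example.com/page-1",
    "https://other.example.org/page-2",
])
# bad contains only the off-host URL
```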

GET /api/v1/indexnow/log?base_url=...

Get IndexNow push log with submission history and status codes.

Visibility & Radar (Public)

These endpoints are publicly accessible without authentication.

GET /api/v1/visibility/domain/{domain}

Get ACRI visibility score and metadata for any domain.

curl https://seodiff.io/api/v1/visibility/domain/example.com

Response:

{
  "domain": "example.com",
  "acri_score": 78.5,
  "tranco_rank": 1234,
  "categories": ["technology"],
  "last_crawled": "2025-01-15T08:00:00Z"
}

GET /api/v1/visibility/search?q=...

Search for domains in the visibility index.

curl "https://seodiff.io/api/v1/visibility/search?q=example"

GET /api/v1/visibility/leaderboard?category=...

Leaderboard of top domains by ACRI score, optionally filtered by category.

GET /api/v1/radar/scanner/status

Live ticker showing recent radar scans, freshness stats, and queue depth.

curl https://seodiff.io/api/v1/radar/scanner/status

GET /api/v1/radar/scanner/pulse

Industry pulse: daily movers, biggest ACRI changes, and trending domains.

curl https://seodiff.io/api/v1/radar/scanner/pulse

Agentic Evaluation (Pro)

Give “web eyes” to your AI coding agent

Instead of using curl to fetch raw HTML (which blows out the LLM’s context window), call this endpoint. SEODiff crawls the pages, runs your assertions, computes SEO/GEO metrics, and returns a token-compressed summary designed for LLM consumption.

POST /api/v1/agent/evaluate

Evaluate programmatic SEO pages at scale. Provide URLs (explicit list or sitemap + pattern), custom assertions, and get back a pass/fail verdict with clustered errors and a structural DOM fingerprint — all in a compact JSON payload your AI agent can reason about.

Request

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base_url": "https://staging.example.com",
    "url_pattern": "/etf/*",
    "sample_size": 30,
    "assertions": [
      {"rule": "contains_string", "value": "Dividend History", "severity": "critical"},
      {"rule": "not_contains_string", "value": "undefined", "severity": "critical"},
      {"rule": "min_word_count", "value": 500, "severity": "warning"},
      {"rule": "selector_exists", "value": "table.holdings", "severity": "critical"},
      {"rule": "has_schema"},
      {"rule": "no_placeholders"},
      {"rule": "max_js_ghost_ratio", "value": 0.1, "severity": "critical"}
    ],
    "wait": false
  }' \
  https://seodiff.io/api/v1/agent/evaluate

base_url (string): Base URL for sitemap discovery and pattern matching. Required unless urls is provided.
urls (string[]): Explicit list of URLs to evaluate. Overrides sitemap discovery.
url_pattern (string): Glob pattern to filter discovered URLs (e.g. /etf/*, /trail/*/details).
sample_size (integer): Max pages to evaluate (default 20, max 200).
assertions (array): Custom assertions to run against each page (see below).
url_assertions (object): Per-URL assertion overrides. Keys are full URLs, values are assertion arrays. When present for a URL, these replace the global assertions for that URL. Example: {"https://example.com/old-page": [{"rule": "status_code", "value": 410}]}
baseline_eval_id (string): Previous evaluation_id to compare against. If provided, the response includes a regressions array showing any metric degradations (e.g. dropped H1 coverage, ACRI regression, new placeholder leaks).
wait (boolean): Block until evaluation completes (default true). When false, returns 202 Accepted with a status_url for polling.
timeout_seconds (integer): Max wait time (default 120, max 300).
cache_bust (boolean): Append a unique query parameter to each URL to bypass CDN and server-side caches. Useful for evaluating freshly deployed pages.

Synchronous mode is the default

wait defaults to true — the request blocks until evaluation completes (up to timeout_seconds, default 120s). This is ideal for small-to-medium evaluations (≤20 pages). No polling required; you get the full result in one call.

LLM Timeout Safety: use wait: false

Most LLM tool-call APIs (OpenAI, Anthropic, local IDE extensions) have HTTP timeouts of 30–60 seconds. For large evaluations (30+ pages), set wait: false. The endpoint returns instantly with a status_url. Instruct your agent: “Poll the status_url every 5 seconds until status is passed or failed.”
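A polling loop is only a few lines. A sketch with the HTTP call injected so the control flow is clear (fetch_status stands in for a GET on the status_url):

```python
import time

def poll_evaluation(fetch_status, interval=5.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll an /api/v1/agent/evaluate status_url until it leaves 'processing'.

    fetch_status: zero-arg callable returning the parsed JSON status object.
    Returns the final payload, or raises TimeoutError.
    """
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        if payload.get("status") != "processing":
            return payload
        if clock() > deadline:
            raise TimeoutError("evaluation still processing")
        sleep(interval)

# Stubbed example: two 'processing' polls, then a terminal result.
states = iter([{"status": "processing"},
               {"status": "processing"},
               {"status": "passed"}])
result = poll_evaluation(lambda: next(states), sleep=lambda s: None)
# result["status"] == "passed"
```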

Assertion rules

contains_string (string): Visible text must contain this string
not_contains_string (string): Visible text must NOT contain this string (substring match)
not_contains_word (string): Visible text must NOT contain this word as a whole word (word-boundary match, case-insensitive). Unlike not_contains_string, this won’t match inside other words — e.g. "undefined" won’t match “fee structures remain undefined” but will match the standalone word “undefined” as rendered placeholder text. Ideal for avoiding false positives on English prose that naturally uses words like “undefined”, “null”, or “NaN”.
html_contains_string (string): Raw HTML must contain this string (searches full source including <script>, <meta> tags, JSON-LD, etc.). Use this when asserting on elements not in visible text.
html_not_contains_string (string): Raw HTML must NOT contain this string
min_word_count (integer): Minimum word count
max_word_count (integer): Maximum word count
status_code (integer): Expected HTTP status (e.g. 200)
has_schema (no value): At least one JSON-LD schema block
has_schema_type (string): JSON-LD must contain a specific @type (e.g. BreadcrumbList, Product). Case-insensitive.
min_schema_count (integer): Minimum number of JSON-LD blocks
has_h1 (no value): Page must have an H1
has_meta_description (no value): Page must have a meta description
selector_exists (CSS selector): A CSS selector that must match ≥1 element
selector_count (integer): CSS selector (in selector field) must match ≥N elements
regex_match (regex): Visible text must match this regex
regex_not_match (regex): Visible text must NOT match this regex
no_placeholders (no value): No pSEO placeholder leaks ({{var}}, undefined, NaN, etc.). Scans visible text only — excludes <script>, <style>, <code>, <pre>, <svg>, and aria-hidden="true" elements. Safe for React/Next.js RSC payloads, URL-encoded strings, and SVG chart labels. Matches show surrounding text for easy triage. ERROR is matched in all-caps only. AI hallucination patterns require full phrases (e.g. “as an AI language model”, “as an AI, I” — not “as an AI” followed by a regular word like “infrastructure”). Hyphenated-word safe: null inside compound words like “non-null” is excluded automatically. N/A context-aware: bare N/A values only trigger as placeholders when ≥3 occurrences are found on a page (to avoid false positives on data tables where “N/A” is a legitimate “not applicable” value). Supports an optional ignore array to whitelist terms: {"rule": "no_placeholders", "ignore": ["error", "n/a"]}.
min_acri (integer): Minimum ACRI score (0–100)
max_token_bloat (float): Maximum HTML-to-text ratio
no_noindex (no value): Page must not have noindex
max_js_ghost_ratio (float, 0–1): Maximum JS ghost ratio. Detects pages that require JavaScript to render content (React/Next.js/Vue/Angular/Svelte). A ratio of 0.95 means the page is almost invisible to HTML-only crawlers. Use 0.1 to ensure proper SSR.
has_canonical (no value): Page must have a <link rel="canonical"> tag
no_duplicate_h1 (no value): Page must have at most one <h1> element
response_ok (no value): Convenience alias for status_code: 200. Requires no value field.

Each assertion accepts an optional severity: "critical" (default) or "warning".

threshold / value alias

Both "value" and "threshold" are accepted in assertion objects. If both are present, value takes precedence. Example: {"rule": "max_token_bloat", "threshold": 12} is equivalent to {"rule": "max_token_bloat", "value": 12}.

type / rule alias

Both "rule" and "type" are accepted as the assertion key. If you write {"type": "has_schema_type", "value": "BreadcrumbList"}, it works identically to {"rule": "has_schema_type", "value": "BreadcrumbList"}.

contains_string vs html_contains_string

contains_string searches visible text only (scripts, styles, and framework payloads are stripped). To assert on JSON-LD schema types, meta tags, or other elements inside <script> or <head>, use html_contains_string or the dedicated has_schema_type assertion.

regressions field

When baseline_eval_id is provided, the response always includes a regressions array — even during processing status. If there are no regressions, it will be an empty array ([]), never null.
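Because of that guarantee, a baseline gate needs no null-checks. A one-line sketch (the helper name is ours):

```python
def regressions_clean(result: dict) -> bool:
    """True when a baseline comparison shows no regressions.

    Per the API contract, 'regressions' is always a list (possibly
    empty) when baseline_eval_id was supplied, so plain truthiness
    is safe. Without a baseline the key is absent and we pass.
    """
    return not result.get("regressions", [])

# regressions_clean({"regressions": []}) -> True
```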

Response (synchronous, wait: true)

{
  "evaluation_id": "eval_17120...",
  "status": "failed",
  "pages_evaluated": 30,
  "pass_rate": 87,
  "duration_ms": 4521,
  "summary": "Evaluation failed: 26/30 pages passed. critical assertion \"not_contains_string\"=\"undefined\" failed on 4/30 pages.",

  "failed_assertions": [
    {
      "rule": "not_contains_string",
      "value": "undefined",
      "severity": "critical",
      "failure_count": 4,
      "failure_rate": "4/30 pages",
      "example_url": "https://staging.example.com/etf/GLD",
      "context": "Found \"undefined\" in: …<div class='dividend-yield'>undefined</div>…"
    }
  ],

  "regressions": [
    {
      "metric": "schema_coverage",
      "previous": "100%",
      "current": "87%",
      "delta": "-13%",
      "diagnosis": "Schema coverage dropped — JSON-LD blocks may have been removed."
    }
  ],

  "metrics": {
    "avg_word_count": 1234,
    "avg_acri": 72,
    "schema_coverage": "87%",
    "h1_coverage": "100%",
    "meta_desc_coverage": "93%",
    "avg_token_bloat": 8.2,
    "non_200_count": 0,
    "error_count": 0,
    "placeholder_pages": 2
  },

  "structural_fingerprint": "H1(SPY ETF Overview) > H2(Holdings) > Table(50 rows) > H2(Dividend History) > Chart > Footer",

  "failing_pages": [
    {
      "url": "https://staging.example.com/etf/GLD",
      "http_status": 200,
      "word_count": 12,
      "acri": 31,
      "failed_assertions": ["not_contains_string:undefined", "min_word_count:500"],
      "structural_fingerprint": "H1(GLD) > Div(error) > [EXPECTED table.holdings MISSING]"
    }
  ]
}

Response (async, wait: false)

When wait: false, the endpoint returns 202 Accepted immediately:

{
  "evaluation_id": "eval_17120...",
  "status": "processing",
  "status_url": "/api/v1/agent/evaluate/eval_17120...",
  "pages_planned": 30
}

GET /api/v1/agent/evaluate/{evaluation_id}

Poll evaluation status. While processing, returns a lightweight status object. Once complete, returns the full evaluation result (same schema as the synchronous response above).

# Poll until complete
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
  https://seodiff.io/api/v1/agent/evaluate/eval_17120...

Evaluation results are cached for 1 hour. Completed evaluations can be referenced as baseline_eval_id in subsequent evaluations to detect regressions.

status: passed, failed, or processing (async mode)
status_url: Polling URL (async mode only). GET this URL to check progress or retrieve the completed result.
pass_rate: Percentage of pages that passed all assertions (0–100)
summary: One-paragraph human/LLM-readable summary of the evaluation
failed_assertions: Aggregated assertion failures clustered by rule (not per-page). Includes exact context snippets showing where each failure occurred.
regressions: Metric degradations vs. the baseline_eval_id (only present when a baseline is provided). Tracks: pass_rate, avg_acri, avg_word_count, schema_coverage, h1_coverage, meta_desc_coverage, non_200_count, placeholder_pages, avg_token_bloat.
metrics: Averaged SEO metrics across all evaluated pages
structural_fingerprint: Compact DOM skeleton of the first page (token-compressed). For failing pages, missing selectors are annotated: [EXPECTED .selector MISSING]
failing_pages: Details of pages that failed (max 20, for token budget)
sample_note: Present when sample_size truncated the URL list. Shows how many URLs were evaluated vs. discovered, e.g. “Evaluated 20 of 275 URLs (sample_size=20). Increase sample_size for broader coverage.”

Workflow: Test-Driven SEO with an AI Agent

The autonomous fix loop

1. You ask your AI agent: “Add a dividend section to the ETF template. Verify it works across edge cases.”

2. The agent writes code, pushes to staging.

3. The agent calls POST /api/v1/agent/evaluate with wait: false and assertions like not_contains_string: undefined, selector_exists: .dividend-table, and max_js_ghost_ratio: 0.1.

4. The agent polls GET /api/v1/agent/evaluate/{id} every 5 seconds until status is passed or failed.

5. SEODiff finds GLD (Gold ETFs have no dividends) renders “undefined”. The failing page’s fingerprint shows: H1(GLD) > [EXPECTED .dividend-table MISSING].

6. The agent reads the compact JSON, adds a null-check, re-pushes, re-evaluates — this time passing baseline_eval_id from the first run to catch regressions. Status: passed, 0 regressions.

7. The agent reports back: “Feature deployed and verified across 30 edge cases.”
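The loop above reduces to plain control flow. A sketch where evaluate and apply_fix stand in for the real API call and the agent's code edits (names are ours):

```python
def fix_loop(evaluate, apply_fix, max_rounds=5):
    """Re-evaluate after each fix until the evaluation passes.

    evaluate(baseline_id) -> result dict with 'status' and 'evaluation_id'.
    apply_fix(result) pushes a code change based on the failure details.
    Each retry passes the previous evaluation_id as the baseline so
    regressions are caught (step 6 above).
    """
    baseline_id = None
    for _ in range(max_rounds):
        result = evaluate(baseline_id)
        if result["status"] == "passed":
            return result
        apply_fix(result)
        baseline_id = result["evaluation_id"]
    raise RuntimeError("still failing after max_rounds")

# Stub: first run fails, second passes.
runs = iter([
    {"status": "failed", "evaluation_id": "eval_1"},
    {"status": "passed", "evaluation_id": "eval_2"},
])
fixes = []
final = fix_loop(lambda baseline: next(runs), fixes.append)
# final["status"] == "passed"; exactly one fix was applied
```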

Macro vs. Micro

/agent/evaluate (Micro): Use in your daily coding prompts. The agent runs this during the coding loop on 30 staging pages to verify a feature PR before merging.

/deep-audit/{id}/pseo-agent (Macro): Use for weekly portfolio maintenance. An agent runs this on a full 10,000-page deep audit to find widespread architectural rot.

Agentic QA (Human Review, Pro)

Visual and semantic QA for individual pages. Supports three modes: manual (you define the tasks), chaos (autonomous exploratory QA — the LLM decides what to test), and profile-template (auto-generate assertion rules from golden pages).

Chaos mode: “find what I didn’t think to look for”

Set exploration_mode: "chaos" and the LLM autonomously generates QA tasks covering data trust, template integrity, visual sanity, UX confusion, and SEO meta — without you writing a single assertion. Optionally provide a persona to unlock domain-specific probes (finance, outdoor/trail, recipe).

POST /api/v1/agent/human-review

Submit a page for multi-modal QA. Returns immediately with a review_id and status_url for async polling. The endpoint fetches the page, optionally captures a screenshot, and runs QA tasks through text and vision LLMs.

Request (manual mode)

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/etf/SPY",
    "qa_tasks": [
      {
        "id": "holdings_sum",
        "modality": "text_llm",
        "prompt": "Check if the portfolio weights in the holdings table sum to 100%"
      },
      {
        "id": "chart_match",
        "modality": "vision_vlm",
        "prompt": "Does the performance chart visually match the stated YTD return?"
      }
    ]
  }' \
  https://seodiff.io/api/v1/agent/human-review

Request (chaos mode)

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/etf/SPY",
    "exploration_mode": "chaos",
    "persona": "You are a skeptical financial analyst"
  }' \
  https://seodiff.io/api/v1/agent/human-review

No qa_tasks needed. The LLM auto-generates 5+ tasks spanning text and vision modalities.

| Field | Type | Description |
| --- | --- | --- |
| url | string | Required (or use urls). The page URL to evaluate. |
| urls | string[] | Batch mode: array of URLs to review with shared config. Each gets its own review_id. Max 50. |
| exploration_mode | string | Set to "chaos" for autonomous exploratory QA. When set, qa_tasks is optional (auto-generated if empty). |
| persona | string | Optional persona for the LLM (e.g. "You are a senior Mountain Guide and UX critic"). In chaos mode, triggers domain-specific probes. Keywords: financ*/investor/stock/trading/portfolio/analyst, hiker/outdoor/trail, chef/cook/recipe. |
| qa_tasks | array | Explicit QA tasks. Each has id (string), modality (text_llm, vision_vlm, or data_interpreter), and prompt (string). Omit in chaos mode to auto-generate. |
| strictness | string | low, medium (default), or high. Controls how aggressively the LLM flags issues. |
| wait | boolean | Default false (async — returns immediately). Set true to block until complete. |
| depth | string | "quick" skips vision tasks for faster iteration (text_llm only, ~3 tasks). Default is full mode (text + vision, 5+ tasks). Only affects chaos-mode auto-generated tasks. |
| model | string | Override inference model. Routing is automatic: gemini-* / gemma-* → Google AI Studio; llama3.1-* / qwen-* / gpt-oss-* / zai-glm-* → Cerebras Cloud; org/model or llama-3.* → Groq Cloud; anything else → local Ollama. Default: server-configured model (currently gemma4:26b). See model comparison below. |
| timeout_sec | integer | Max seconds for the review. If exceeded, completed tasks are returned and remaining tasks are marked "model unavailable: timeout expired". Range: 30–1500 (25 min). Default: 20 min (sync) / 25 min (async). |
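The documented prefix routing for the model parameter can be sketched as a small client-side helper. This is an illustrative reimplementation of the rules stated above, not SEODiff's actual dispatcher:

```python
def route_model(name: str) -> str:
    """Resolve which provider a model name routes to, per the
    documented prefix rules (illustrative sketch only)."""
    if name.startswith(("gemini-", "gemma-")):
        return "google"          # Google AI Studio
    if name.startswith(("llama3.1-", "qwen-", "gpt-oss-", "zai-glm-")):
        return "cerebras"        # Cerebras Cloud
    if "/" in name or name.startswith("llama-3."):
        return "groq"            # Groq Cloud (org/model or llama-3.*)
    return "ollama"              # anything else → local Ollama
```

Note the order matters: qwen-3-235b-a22b-instruct-2507 (hyphen) goes to Cerebras, while qwen/qwen3-32b (org/model slash) goes to Groq.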

Auto-generated chaos tasks

When exploration_mode: "chaos" is set, the following tasks are always generated:

| Task ID | Modality | What it checks |
| --- | --- | --- |
| chaos_content_trust | text_llm | Implausible claims, contradictory numbers, suspiciously round data |
| chaos_template_integrity | text_llm | Placeholder text, Lorem ipsum, data from wrong page instance |
| chaos_seo_meta | text_llm | Title/meta/H1 uniqueness — template stamps vs. dynamic insertion |
| chaos_visual_sanity | vision_vlm | Overlapping elements, broken images, charts that don't match data |
| chaos_ux_confusion | vision_vlm | 5-second test: can you tell what this page is about and what to do next? |

Persona-triggered extras: chaos_financial_data (finance personas), chaos_trail_data (outdoor personas), chaos_recipe_data (cooking personas).
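The persona-keyword matching above can be approximated as follows. The keyword groups come from the persona field documentation; the exact matching logic server-side is an assumption:

```python
import re

# Keyword groups from the persona docs → extra chaos task (illustrative mapping).
PERSONA_PROBES = [
    (r"financ|investor|stock|trading|portfolio|analyst", "chaos_financial_data"),
    (r"hiker|outdoor|trail", "chaos_trail_data"),
    (r"chef|cook|recipe", "chaos_recipe_data"),
]

def persona_extras(persona: str) -> list:
    """Return the persona-triggered extra task IDs for a given persona string."""
    p = persona.lower()
    return [task for pattern, task in PERSONA_PROBES if re.search(pattern, p)]
```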

Hallucination guard (text_llm)

After the LLM produces findings, each issue’s element field is cross-referenced against the raw page text. If the claimed element text does not appear anywhere on the page, the issue is demoted to severity: "info" and its description is prefixed with [unverified]. This prevents pure LLM hallucinations (e.g. “copyright says © 2020” when no such text exists) from being reported as high-severity findings.
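The guard described above amounts to a simple cross-reference pass. A minimal sketch, assuming a plain substring check against the raw page text (the server may use fuzzier matching):

```python
def apply_hallucination_guard(issues, page_text):
    """Demote findings whose claimed element text is absent from the page.
    Illustrative reimplementation of the documented guard, not server code."""
    guarded = []
    for issue in issues:
        if issue.get("element") and issue["element"] not in page_text:
            # Element text not found anywhere on the page: demote and flag.
            issue = {**issue,
                     "severity": "info",
                     "description": "[unverified] " + issue["description"]}
        guarded.append(issue)
    return guarded
```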

Chaos mode duration expectations

Full mode (default): 2–15 minutes per page. The 2 vision_vlm tasks require headless Chromium screenshot capture + VLM inference, which is CPU-intensive. Expect ~2–5 min with Google AI models, ~10–15 min with local Ollama on CPU-only servers.

Quick mode ("depth": "quick"): 3–60 seconds per page. Skips all vision_vlm tasks and runs only text_llm tasks. Recommended for batch validation of programmatic pages where visual checks are less critical.

For multi-page chaos reviews, submit jobs concurrently and poll each status_url independently.

Response

Initial response (async):

{
  "review_id": "hr_8528b6edc53b7454",
  "status": "processing",
  "status_url": "/api/v1/agent/human-review/hr_8528b6edc53b7454",
  "url": "https://example.com/etf/SPY",
  "overall_passed": null,
  "duration_ms": 0
}
overall_passed is null while processing

overall_passed is null until the review completes. Only treat it as a boolean (true/false) when status is "complete".
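A polling loop that respects this contract might look like the sketch below. The fetch callable stands in for your HTTP client (e.g. a wrapper around requests.get with the Authorization header); it is injected here so the loop itself is client-agnostic. The interval and cap are arbitrary choices, not API requirements:

```python
import time

def wait_for_review(fetch, status_url, interval_sec=5, max_polls=240):
    """Poll a human-review status_url until the review finishes.

    `fetch(url)` must return the parsed JSON body for a GET on that URL.
    overall_passed is only meaningful once status is "complete".
    """
    for _ in range(max_polls):
        body = fetch(status_url)
        if body["status"] in ("complete", "failed"):
            return body
        time.sleep(interval_sec)
    raise TimeoutError("review did not finish after %d polls" % max_polls)
```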

Batch mode response

When urls array is provided, the response wraps multiple reviews:

{
  "batch": true,
  "count": 3,
  "reviews": [
    { "review_id": "hr_a1b2c3d4e5f6a7b8", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_a1b2c3d4e5f6a7b8", "url": "https://example.com/page-1" },
    { "review_id": "hr_b2c3d4e5f6a7b8c9", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_b2c3d4e5f6a7b8c9", "url": "https://example.com/page-2" },
    { "review_id": "hr_c3d4e5f6a7b8c9d0", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_c3d4e5f6a7b8c9d0", "url": "https://example.com/page-3" }
  ]
}

Each review can be polled independently via its status_url.

GET /api/v1/agent/human-review/{review_id}

Poll review status. Once complete, returns the full result:

{
  "review_id": "hr_8528b6edc53b7454",
  "status": "complete",
  "url": "https://example.com/etf/SPY",
  "human_verdict": "4 of 6 QA checks failed.",
  "overall_passed": false,
  "qa_results": [
    {
      "task_id": "chaos_content_trust",
      "modality": "text_llm",
      "prompt": "Examine all claims, numbers, and data points...",
      "category": "data",
      "passed": false,
      "issues": [
        {
          "element": "Holdings Table",
          "description": "Portfolio weights sum to 103.2%, exceeding 100%.",
          "severity": "critical"
        }
      ],
      "notes": "The top-10 holdings alone sum to 37%. Full table sums to 103.2%."
    }
  ],
  "page_context": {
    "title": "SPY ETF Overview",
    "word_count": 1234,
    "has_h1": true
  },
  "duration_ms": 47294,
  "model": "gemma4:26b"
}
| Field | Description |
| --- | --- |
| status | processing, complete, or failed |
| human_verdict | One-line summary (e.g. "4 of 6 QA checks failed.") |
| overall_passed | true/false when complete; null while status is processing |
| qa_results | Array of results, one per QA task. Each has task_id, modality, prompt, category, passed, issues[], and notes. |
| issues[].severity | critical, warning, or info |
| page_context | Page metadata: title, word_count, has_h1 |
| model | LLM model used (e.g. gemma4:26b) |
| duration_ms | Total wall-clock time in milliseconds |
| tasks_total | (Polling only) Total QA tasks planned |
| tasks_completed | (Polling only) Tasks finished so far |
| progress_pct | (Polling only) Completion percentage (0–100) |
| est_remain_sec | (Polling only) Estimated seconds remaining, based on elapsed time per completed task. 0 when unknown (no tasks completed yet). |
| current_step | (Polling only) Human-readable progress step (e.g. "running vision_vlm (2 tasks)", "capturing screenshot") |
| retry_after_sec | Present (default: 30) when one or more QA tasks were skipped due to model unavailability. Agents should wait this many seconds before retrying the review. |

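The est_remain_sec heuristic (elapsed time per completed task, extrapolated over the remaining tasks) is simple enough to reproduce client-side. A sketch under the assumptions stated in the field description:

```python
def estimate_remaining_sec(elapsed_sec, tasks_completed, tasks_total):
    """Estimate seconds remaining, mirroring the documented heuristic.
    Returns 0 when no task has completed yet (unknown)."""
    if tasks_completed <= 0:
        return 0
    per_task = elapsed_sec / tasks_completed
    return round(per_task * (tasks_total - tasks_completed))
```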
When to use chaos mode vs. /agent/evaluate

Chaos mode is for exploratory, open-ended QA on individual pages — finding issues you didn’t know to look for. It uses LLM reasoning (text + vision) and is best for spot-checking programmatic pages in production.

/agent/evaluate is for deterministic, rule-based validation at scale — running 17 assertion types across 200 pages. Use it in CI/CD pipelines where you know exactly what to check.

POST /api/v1/agent/profile-template

Automated baseline profiler. Feed 2–10 “golden” page URLs from the same pSEO template, and the LLM reverse-engineers the template’s structure to generate assertion rules you can feed into /agent/human-review or /agent/evaluate.

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/etf/SPY",
      "https://example.com/etf/QQQ",
      "https://example.com/etf/IWM"
    ],
    "template_name": "etf_page"
  }' \
  https://seodiff.io/api/v1/agent/profile-template
| Field | Type | Description |
| --- | --- | --- |
| urls | string[] | Required. 2–10 golden page URLs from the same template. |
| template_name | string | Human-friendly label for the template. |
| focus_areas | string[] | Optional focus: data_accuracy, layout, seo_meta, etc. |
| wait | boolean | Default false (async). Set true to block until complete. Async returns 202 Accepted with a status_url for polling at /api/v1/agent/profile-template/{job_id}. |
| model | string | Override inference model. Same routing as /agent/human-review: Google AI (gemini-* / gemma-*) auto-routes; local Ollama names accepted. |

Profile Template Response

{
  "template_name": "etf_page",
  "page_count": 3,
  "generated_assertions": [
    {
      "rule": "The H1 must follow the pattern: '[Ticker] ETF Overview'",
      "category": "seo",
      "modality": "text_llm",
      "confidence": 0.95,
      "reason": "All 3 pages follow this H1 pattern consistently."
    },
    {
      "rule": "A holdings table with at least 10 rows must be present",
      "category": "structural",
      "modality": "text_llm",
      "confidence": 0.9,
      "reason": "Core functional component present in all sampled pages."
    }
  ],
  "variance_notes": "Dynamic slots: [Ticker], [Fund Name], [AUM], [Expense Ratio]...",
  "duration_ms": 289825
}
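Generated assertions can be fed back into /agent/human-review as qa_tasks. A sketch of that transformation; the id scheme and the confidence cutoff are illustrative choices, not part of the API contract:

```python
def assertions_to_qa_tasks(profile, min_confidence=0.8):
    """Turn a profile-template response into qa_tasks for /agent/human-review.
    Drops low-confidence rules; id naming is a local convention."""
    tasks = []
    for i, a in enumerate(profile["generated_assertions"]):
        if a["confidence"] < min_confidence:
            continue
        tasks.append({
            "id": "%s_rule_%d" % (profile["template_name"], i),
            "modality": a["modality"],
            "prompt": "Verify this template rule holds: " + a["rule"],
        })
    return tasks
```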

Google AI Studio Models

Both /agent/human-review and /agent/profile-template support Google AI Studio models via the model parameter. When a model name starts with gemini- or gemma-, the request is automatically routed to Google’s OpenAI-compatible endpoint.

Setup: Set the SEODIFF_GOOGLE_AI_API_KEY environment variable on the server with your Google AI Studio API key. No other configuration is needed.

| Model | RPM | RPD | Vision | Best for |
| --- | --- | --- | --- | --- |
| gemini-2.5-flash | 5 | 20 | Yes | Powerful reasoning (may 503 under free-tier demand) |
| gemini-3-flash-preview | 5 | 20 | Yes | Latest generation, multimodal |
| gemini-3.1-flash-lite-preview | 15 | 500 | Yes | Best daily limit, fast (<10s) |
| gemma-4-31b-it | 15 | 1,500 | No | High-quality text, high volume |
| gemma-4-26b-a4b-it | 15 | 1,500 | No | High-volume text tasks |
Vision support

Gemini models support vision_vlm tasks (screenshot analysis). Gemma models are text-only — vision tasks will fall back to the server-configured VLM. For full chaos mode (text + vision), use a Gemini model or omit the model field to use the default.

Auto-retry on 503 / rate limits

If Google AI Studio returns HTTP 503 (high demand) or 429 (rate limited), SEODiff automatically retries up to 3 times with exponential backoff (5s, 10s, 15s). The Retry-After header from upstream providers (Groq, Cerebras, etc.) is respected when present. If all attempts fail, affected QA tasks are reported as “Model unavailable” and the response includes a retry_after_sec field (default: 30) so agents can schedule a retry.
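Clients can mirror the same policy when calling SEODiff itself. A minimal sketch: do_request stands in for your HTTP call and returns (status_code, headers, body); the 5s/10s/15s schedule matches the server-side policy documented above, and Retry-After takes precedence when present:

```python
import time

def call_with_retry(do_request, max_attempts=3, backoff=(5, 10, 15)):
    """Retry on 429/503 with exponential backoff, honouring Retry-After.
    Illustrative client-side mirror of the documented retry policy."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, headers, body = do_request()
        if status not in (429, 503):
            return status, body
        # Prefer the upstream Retry-After hint; fall back to the schedule.
        wait = float(headers.get("Retry-After", backoff[min(attempt, len(backoff) - 1)]))
        time.sleep(wait)
    return status, body  # last failure after exhausting attempts
```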

Example: Google AI model override

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/etf/SPY",
    "exploration_mode": "chaos",
    "model": "gemini-2.5-flash"
  }' \
  https://seodiff.io/api/v1/agent/human-review

Cerebras Cloud Models

Model names starting with llama3.1-, qwen-, gpt-oss-, or zai-glm- are routed to Cerebras Cloud, which provides extremely fast inference via wafer-scale hardware.

Setup: Set the SEODIFF_CEREBRAS_API_KEY environment variable on the server. No other configuration is needed.

| Model | Context | Vision | Best for |
| --- | --- | --- | --- |
| llama3.1-8b | 8K | No | Fastest option (<1s). Good for quick text QA at scale. |
| qwen-3-235b-a22b-instruct-2507 | 65K | No | Best accuracy/speed balance. 235B MoE, deep reasoning in ~3s. |
| gpt-oss-120b | 128K | No | Large context window, general-purpose. |
| zai-glm-4.7 | 32K | No | Premium quality, highest per-token cost. |
Cerebras is text-only

Cerebras models do not support vision_vlm tasks. Vision tasks will fall back to the server-configured VLM (local Ollama). For full chaos mode (text + vision), use a Gemini model or the default.

Example: Cerebras model override

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/recipe/carbonara",
    "exploration_mode": "chaos",
    "model": "qwen-3-235b-a22b-instruct-2507"
  }' \
  https://seodiff.io/api/v1/agent/human-review

Groq Cloud Models

Models using org/model format (e.g. meta-llama/llama-4-scout-17b-16e-instruct) or starting with llama-3. are routed to Groq Cloud, which provides fast LPU-accelerated inference with a generous free tier.

Setup: Set the SEODIFF_GROQ_API_KEY environment variable on the server. No other configuration is needed.

| Model | Context | Vision | Free-tier RPM / TPM | Best for |
| --- | --- | --- | --- | --- |
| meta-llama/llama-4-scout-17b-16e-instruct | 128K | No | 30 / 30K | Highest free-tier throughput. Great overall quality. |
| llama-3.3-70b-versatile | 128K | No | 30 / 12K | Largest free model. Best accuracy for complex QA. |
| llama-3.1-8b-instant | 128K | No | 30 / 6K | Fastest free option. Good for quick text QA. |
| qwen/qwen3-32b | 32K | No | 60 / 6K | Highest RPM. Strong reasoning. |
| moonshotai/kimi-k2-instruct | 128K | No | 60 / 10K | High RPM + throughput. Long context. |
Groq is text-only

Groq models do not support vision_vlm tasks. Vision tasks will fall back to the server-configured VLM. For full chaos mode (text + vision), use a Gemini model or the default.

Groq free-tier rate limits

The free tier has per-minute and per-day limits (RPM/RPD/TPM/TPD). If you hit a 429 response, the Retry-After header tells you how long to wait. For high-volume work, consider models with higher RPM like qwen/qwen3-32b (60 RPM) or use batch mode with smaller payloads.

Example: Groq model override

curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/wireless-headphones",
    "exploration_mode": "chaos",
    "model": "meta-llama/llama-4-scout-17b-16e-instruct"
  }' \
  https://seodiff.io/api/v1/agent/human-review

Model Comparison

Benchmark results for a standardized QA task (recipe page, chaos-style prompt, single text_llm task). Speed measured end-to-end from API call to parsed response.

| Model | Provider | Speed | Accuracy | Cost | Notes |
| --- | --- | --- | --- | --- | --- |
| llama3.1-8b | Cerebras | ~500ms | Medium | $0.10/M tokens | Fastest. Occasional false positives on valid data. Best for high-volume quick scans. |
| qwen-3-235b | Cerebras | ~3s | High | Preview (free) | Best speed/accuracy balance. Catches nutrition inconsistencies, analytical reasoning. |
| meta-llama/llama-4-scout-17b-16e-instruct | Groq | ~500ms | High | Free tier | Highest free-tier throughput (30K TPM). 17B MoE with 128K context. |
| llama-3.3-70b-versatile | Groq | ~400ms | High | Free tier | Largest free Groq model. Concise, accurate QA. Best quality per free token. |
| llama-3.1-8b-instant | Groq | ~250ms | Medium | Free tier | Fastest free Groq model. Some noise in output. Good for quick scans. |
| qwen/qwen3-32b | Groq | ~1–2s | High | Free tier | Highest RPM (60). Uses thinking tokens. Strong analytical reasoning. |
| gemini-2.5-flash | Google AI | ~10s | Highest | Free tier | Deep analysis with thinking tokens. Best for thorough QA. May 503 under demand. |
| gemma4:26b | Local Ollama | ~2–5 min | High | Free (self-hosted) | Default. No external API calls. Requires server CPU/GPU. Best for privacy-sensitive work. |
Choosing a model

Speed priority: llama3.1-8b (sub-second). Accuracy priority: gemini-2.5-flash (deep reasoning). Best balance: qwen-3-235b-a22b-instruct-2507 (fast + smart). Free + fast: meta-llama/llama-4-scout-17b-16e-instruct (Groq). Privacy: gemma4:26b (local, no data leaves your server).

Automatic vision acceleration

When SEODIFF_GOOGLE_AI_API_KEY is configured and the server’s default VLM is local Ollama, vision tasks (vision_vlm) are automatically routed to Gemini 2.5 Flash via Google AI Studio. This cuts vision latency from ~2–5 minutes (CPU inference) to ~10 seconds (cloud). No configuration or model override needed — it happens transparently. Text-only models (Cerebras, Gemma) still fall back correctly.

Automatic text acceleration

When the server’s default text LLM is local Ollama, text tasks (text_llm) are automatically routed to a fast cloud provider. Priority: Groq (llama-3.3-70b-versatile, ~400ms) → Cerebras (qwen-3-235b, ~3s) → Google AI (gemini-2.5-flash, ~10s) → local Ollama (fallback). This cuts text_llm from ~3–10 minutes (CPU) to under 1 second. To force local inference, set SEODIFF_LLM_BASE_URL to a non-localhost URL or use the model parameter to explicitly select a local model like gemma4:26b.
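The acceleration priority chain can be sketched as a first-available lookup. This simplification treats "available" as "API key env var is set", which glosses over the real health checks; the priority order matches the documentation above:

```python
# Priority order from the docs; availability is simplified to "key is set".
TEXT_PRIORITY = [
    ("groq", "SEODIFF_GROQ_API_KEY"),
    ("cerebras", "SEODIFF_CEREBRAS_API_KEY"),
    ("google", "SEODIFF_GOOGLE_AI_API_KEY"),
]

def pick_text_provider(env):
    """Return the first configured cloud provider, else fall back to Ollama."""
    for provider, key in TEXT_PRIORITY:
        if env.get(key):
            return provider
    return "ollama"
```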

AI Readiness Tools (Free tier)

These tools analyze how well a page is optimized for AI consumption. Most are free-tier and accept a URL in the POST body.

POST /api/v1/aes

AI Extractability Score (AES). Analyzes how easily AI systems can extract structured data from a page.

curl -X POST -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/page"}' \
  https://seodiff.io/api/v1/aes

POST /api/v1/chunking

RAG chunking simulator. Shows how the page would be chunked for retrieval-augmented generation.

POST /api/v1/entity-schema

Entity schema generator. Extracts and suggests JSON-LD structured data for a page.

POST /api/v1/training-data

LLM training data auditor. Analyzes quality signals for training corpus inclusion.

POST /api/v1/crawler-health

Robots.txt and crawler health check. Validates bot access and directives.

POST /api/v1/ai-crawler-sim

AI crawler simulation. Shows what an AI bot sees when crawling the page.

POST /api/v1/ai-answer-preview

AI answer preview. Simulates how an LLM would summarize or reference the page.

POST /api/v1/entropy

Structural entropy analysis. Measures content structure quality for machine consumption.

POST /api/v1/hallucination-test

Hallucination risk checker. Evaluates whether page content may induce LLM hallucinations.

POST /api/v1/llmstxt/validate

Validate a site's llms.txt file against the emerging standard.

POST /api/v1/llmstxt/generate

Generate an llms.txt file for a site.

Error Model

Errors return JSON with an error field:

{
  "error": "domain not verified for this account"
}
| Status | Meaning |
| --- | --- |
| 400 | Invalid input (missing/malformed fields) |
| 401 | Missing or invalid API key |
| 403 | Plan-gated feature or insufficient permissions |
| 404 | Resource not found |
| 409 | Validation failed (scan did not pass) |
| 429 | Rate limit exceeded. Includes Retry-After header (seconds) and retry_after_seconds field in the JSON body. |
| 502 | Upstream error (database, crawl engine) |
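Clients typically collapse this table into a retry/no-retry decision. A sketch of one reasonable mapping (the action names are local conventions, not part of the API):

```python
def classify_error(status, body):
    """Map an error response to a client-side action; illustrative only."""
    message = (body or {}).get("error", "")
    if status in (429, 502):
        return ("retry", message)            # transient: back off and retry
    if status in (401, 403):
        return ("check_credentials", message)  # key or plan problem
    if status in (400, 404, 409):
        return ("fix_request", message)      # caller must change the request
    return ("fail", message)
```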

Access Scoping

API keys are scoped to your account. You can only access:

Domain verification is required for deep-audit creation and scan exports. Verify domains via DNS TXT or by connecting Google Search Console.

Pass/fail behavior

API-first by design

The dashboard and CI/CD automation are clients of the same API. This keeps behavior consistent and allows SEODiff to evolve heuristics without changing your integration surface.

Integrations & Examples

Developer Hub
Central index of all code examples, integrations, and guides.
Assertion Glossary
17 assertion rules with problem explanations and API payloads.
Code Examples
Copy-paste examples in cURL, Python, Node.js, Go, and PHP.
CI/CD Integrations
GitHub Actions, GitLab CI, Vercel, Jenkins, and more.
Agent & IDE Guides
System prompts for Cursor, Copilot, Cline, Windsurf, LangChain.

Related