Most endpoints require an API key passed in the Authorization header:
Authorization: Bearer <your_api_key>
Get your key from the API Keys page in your account. Keys use the sd_live_ or sd_test_ prefix.
All paths are relative to:
https://seodiff.io/api/v1
https://api.seodiff.io/api/v1 also works (same backend).
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/me
API requests are rate-limited per account tier. Free accounts get 60 requests per minute. Pro and Enterprise get higher limits. Exceeding the limit returns 429 Too Many Requests.
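To stay under these limits, a client can back off and retry when it sees 429. A minimal Python sketch — the endpoint path, the backoff schedule, and the MyApp/1.0 User-Agent are illustrative assumptions, not part of the API:

```python
import time
import urllib.error
import urllib.request

API = "https://seodiff.io/api/v1"

def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Exponential backoff for 429 retries: 1s, 2s, 4s, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt))

def get_with_retry(path, api_key, max_attempts=5):
    """GET an endpoint, sleeping and retrying when the API returns 429."""
    req = urllib.request.Request(
        API + path,
        headers={
            "Authorization": f"Bearer {api_key}",
            "User-Agent": "MyApp/1.0",  # descriptive UA; see Cloudflare note
        },
    )
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_seconds(attempt))
```

If the API returns a Retry-After header on 429 responses (not documented here), honoring it would be preferable to a fixed schedule.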
Returns your account info, plan details, and checkout URL (for plan upgrades).
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/me
Response:
{
"account": { "id": "...", "name": "[email protected]" },
"plan": {
"tier": "pro",
"max_sites": 10,
"max_pages_per_scan": 5000,
"max_deep_audit_pages": 10000
},
"checkout_url": "https://..."
}
List recent API key audit events (key created, rotated, revoked, etc.).
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/audit
List all monitored sites for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/sites
Response:
[
{
"id": "...",
"base_url": "https://example.com",
"enabled": true,
"schedule": "nightly"
}
]
Add or update a monitored site.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"enabled": true,
"schedule": "nightly"
}' \
https://seodiff.io/api/v1/sites
Send a keepalive ping for a monitored site. Keeps monitoring active if schedule-based probes are paused.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"base_url":"https://example.com"}' \
https://seodiff.io/api/v1/monitor/keepalive
If you are integrating for the first time, start with POST /api/v1/validate using wait=true. It gives a single response with pass/fail plus links to all artifacts.
Enqueue a surface scan. Returns 202 Accepted with an id and a status_url.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://preview.example.com",
"render_js": false,
"lighthouse": false
}' \
https://seodiff.io/api/v1/scan
Response:
{
"id": "s_abc123...",
"status_url": "/api/v1/scans/s_abc123.../status"
}
CI-friendly scan wrapper. When wait=true, blocks until complete and returns pass/fail.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://preview.example.com",
"preset": "fast",
"fail_on": "fetch_errors,non200_status,schema_missing_required",
"max_issue_rate": 10,
"wait": true,
"timeout_seconds": 180
}' \
https://seodiff.io/api/v1/validate
Response (wait=true):
{
"pass": true,
"reason": "",
"failing": {},
"report_url": "/scans/s_abc123/report.html",
"json_url": "/scans/s_abc123/findings.json",
"summary_markdown_url": "/api/v1/scans/s_abc123/summary.md"
}
| Field | Type | Description |
|---|---|---|
| pass | boolean | Whether the scan passed all checks |
| reason | string | Human-readable failure reason (empty on pass) |
| failing | object | Failing keys with details |
| report_url | string | HTML report link |
| json_url | string | JSON findings export |
| summary_markdown_url | string | Markdown summary for PR comments |
Returns 200 for pass, 409 for fail. May return 202 if the scan hasn't completed within the timeout.
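A CI step can branch directly on those status codes. A minimal Python sketch — the helper names are hypothetical; only the status-code meanings (200 pass, 409 fail, 202 still running) come from the endpoint's documented behavior:

```python
def ci_outcome(http_status):
    """Map a /validate HTTP status code to a CI decision:
    200 = checks passed, 409 = checks failed, 202 = scan still running."""
    return {200: "pass", 409: "fail", 202: "pending"}.get(http_status, "error")

def exit_code(http_status):
    """Non-zero exit fails the CI job; a timeout (202) is treated as failure
    here so an incomplete scan never silently passes the pipeline."""
    return 0 if ci_outcome(http_status) == "pass" else 1
```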
Markdown summary suitable for GitHub PR comments.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/scans/s_abc123/summary.md
Normalized findings as JSON for downstream tooling.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/scans/s_abc123/findings.json
Normalized findings as CSV.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/scans/s_abc123/findings.csv
List recent drift incidents detected by monitoring.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/incidents
List template identifiers detected for a monitored site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/templates?base_url=https://example.com"
Response:
{
"templates": ["/product/*", "/collections/*"]
}
Drift timeline for a given base URL and template. The template value should match an entry from /api/v1/templates.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/timeline?base_url=https://example.com&template=/product/*"
Aggregated template drift summaries across all templates for a site.
Dashboard aggregate for a project. Returns project summary cards, recent scans, and Search Console data.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/project-overview?base_url=https://example.com"
Start a full-site deep crawl. Requires domain verification and Pro (or higher) plan.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"crawl_scope": "deep_audit",
"max_pages": 500,
"render_js": false,
"respect_robots": true,
"crawl_speed": "normal",
"include_patterns": [],
"exclude_patterns": []
}' \
https://seodiff.io/api/v1/deep-audit
Response:
{
"job_id": "da_abc123...",
"status_url": "/api/v1/deep-audit/da_abc123...",
"report_url": "/api/v1/deep-audit/da_abc123.../report"
}
| Parameter | Type | Description |
|---|---|---|
| base_url | string | Required. URL to crawl. |
| crawl_scope | string | deep_audit (default) or full_site (Enterprise). |
| max_pages | integer | Max pages to crawl (plan-limited). |
| render_js | boolean | Enable JavaScript rendering. |
| respect_robots | boolean | Obey robots.txt (default true). |
| crawl_speed | string | slow, normal, or fast. |
| include_patterns | string[] | URL patterns to include (glob). |
| exclude_patterns | string[] | URL patterns to exclude (glob). |
List deep-audit jobs for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/
Get job status, progress percentage, and metadata.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123
Response:
{
"job_id": "da_abc123",
"status": "complete",
"progress": 100,
"base_url": "https://example.com",
"pages_crawled": 342,
"started_at": "2025-01-15T10:00:00Z",
"finished_at": "2025-01-15T10:05:23Z"
}
HTML report of the deep audit (job must be complete).
Raw JSON result with all crawled page data.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/json
Lightweight summary payload. Includes top-level metrics, issue counts, and crawl stats.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/summary
Template-level internal link graph payload for visualization.
URL-level internal PageRank scores and distribution summary.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/url-pagerank
Link heatmap data showing internal link equity distribution across templates and URLs.
Full audit aggregate payload for enterprise-style views. Combines all deep-audit sub-reports.
Structured Markdown export designed for LLM and AI agent consumption.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/agent-md
Programmatic SEO diagnostics payload for coding agents. Includes template stats, placeholder detection, hallucination rates, schema drift analysis, and actionable fix lists.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/pseo-agent
Response (abbreviated):
{
"meta": { "job_id": "da_abc123", "base_url": "https://example.com" },
"health": "B",
"health_score": 72,
"templates": [ ... ],
"data_integrity": {
"placeholder_outbreak": { "severity": "warning" },
"schema_type_drift": { "severity": "ok" }
},
"top_fixes": [ ... ]
}
Scan-over-scan diff JSON comparing this audit to the previous one for the same domain. Returns detected changes across 23 detection algorithms.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/diff
HTML diff report showing changes between consecutive deep audits.
Redirects to /api/v1/deep-audit/{id}/graph.
Redirects to /api/v1/deep-audit/{id}/url-pagerank.
Redirects to /api/v1/deep-audit/{id}/full-audit.
List custom extraction rules for a site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com"
Create or update a custom extraction rule.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"field_name": "price",
"selector_type": "css",
"selector": ".product-price",
"expected_type": "number",
"required": true
}' \
"https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com"
Delete an extraction rule by field name.
curl -X DELETE -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com&field_name=price"
Dry-run a rule against sampled pages before saving.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"rule": {
"field_name": "price",
"selector_type": "css",
"selector": ".product-price",
"expected_type": "number",
"required": true
}
}' \
https://seodiff.io/api/v1/extraction-rules/validate
Some endpoints (deep-audit, scan exports) require you to prove ownership of the domain. You can verify via DNS TXT record or by connecting Google Search Console.
Get or create a verification challenge for a domain. Accepts either GET with query params or POST with JSON body.
# GET with query param
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/domain-verify/challenge?domain=example.com"
# POST with JSON body
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"domain":"example.com"}' \
https://seodiff.io/api/v1/domain-verify/challenge
Response:
{
"domain": "example.com",
"token": "seodiff-verify=abc123...",
"verified": false,
"dns_host": "_seodiff-verify.example.com",
"record_type": "TXT",
"record_value": "seodiff-verify=abc123...",
"instructions": "Add a DNS TXT record..."
}
Confirm domain verification after adding the DNS TXT record.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"domain":"example.com"}' \
https://seodiff.io/api/v1/domain-verify/confirm
Check whether a domain is verified for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/domain-verify/status?domain=example.com"
Start Google Search Console OAuth flow. Returns an auth_url to redirect the user to.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/gsc/connect?base_url=https://example.com"
List available and selected Search Console properties for a connected site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/gsc/properties?base_url=https://example.com"
Set the active Search Console property for a site.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"property": "sc-domain:example.com"
}' \
https://seodiff.io/api/v1/gsc/select-property
Trigger a fresh Search Console data sync for a site.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"base_url":"https://example.com"}' \
https://seodiff.io/api/v1/gsc/sync
GSC indexability diagnostics summary. Requires an active GSC connection. Accepts either GET with query params or POST with JSON body.
# GET with query param
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/gsc/diagnostics?base_url=https://example.com"
# POST with JSON body
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"base_url":"https://example.com"}' \
https://seodiff.io/api/v1/gsc/diagnostics
List alerts for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/alerts
Get count of unread alerts.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/alerts/unread
Response:
{"unread": 3}
Dismiss a single alert by ID.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"id":"alert_abc123"}' \
https://seodiff.io/api/v1/alerts/dismiss
Dismiss all alerts.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/alerts/dismiss-all
Get alert notification preferences.
Update alert notification preferences.
Get schema drift analysis for a monitored site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/schema-drift?base_url=https://example.com"
Get schema diff between two snapshots.
Schema change timeline for a site.
If your HTTP client uses a default or generic User-Agent (e.g. Python urllib), Cloudflare may block requests to seodiff.io with a 403. Fix: set a descriptive User-Agent header (e.g. MyApp/1.0), or use api.seodiff.io as an alternative host which bypasses Cloudflare’s browser-integrity check. All /api/v1/* endpoints work on both hosts.
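For example, with Python urllib the fix is one extra header. A minimal sketch — the sd_test_ key is a placeholder; calling urllib.request.urlopen(req) would send the request:

```python
import urllib.request

# A generic urllib User-Agent can be blocked by Cloudflare on seodiff.io
# with a 403. Setting a descriptive one (and/or using api.seodiff.io,
# which skips the browser-integrity check) avoids the block.
req = urllib.request.Request(
    "https://api.seodiff.io/api/v1/me",
    headers={
        "Authorization": "Bearer sd_test_xxx",  # placeholder key
        "User-Agent": "MyApp/1.0",
    },
)
```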
Indexation health summary for a site. Requires GSC connection.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/indexation-health?base_url=https://example.com"
Inspect indexation status for a specific URL.
Get IndexNow configuration for a site.
Update IndexNow settings (enable/disable, key, thresholds).
Push URLs to search engines via IndexNow protocol.
Before using this endpoint, you must configure an IndexNow key for the site via POST /api/v1/indexnow/settings/update (or the dashboard). The key file must be hosted at your domain root (e.g. https://example.com/{key}.txt) for search engines to verify ownership.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"urls": [
"https://example.com/page-1",
"https://example.com/page-2"
]
}' \
https://seodiff.io/api/v1/indexnow/push
Get IndexNow push log with submission history and status codes.
These endpoints are publicly accessible without authentication.
Get ACRI visibility score and metadata for any domain.
curl https://seodiff.io/api/v1/visibility/domain/example.com
Response:
{
"domain": "example.com",
"acri_score": 78.5,
"tranco_rank": 1234,
"categories": ["technology"],
"last_crawled": "2025-01-15T08:00:00Z"
}
Search for domains in the visibility index.
curl "https://seodiff.io/api/v1/visibility/search?q=example"
Leaderboard of top domains by ACRI score, optionally filtered by category.
Live ticker showing recent radar scans, freshness stats, and queue depth.
curl https://seodiff.io/api/v1/radar/scanner/status
Industry pulse: daily movers, biggest ACRI changes, and trending domains.
curl https://seodiff.io/api/v1/radar/scanner/pulse
Instead of using curl to fetch raw HTML (which blows out the LLM’s context window), call this endpoint. SEODiff crawls the pages, runs your assertions, computes SEO/GEO metrics, and returns a token-compressed summary designed for LLM consumption.
Evaluate programmatic SEO pages at scale. Provide URLs (explicit list or sitemap + pattern), custom assertions, and get back a pass/fail verdict with clustered errors and a structural DOM fingerprint — all in a compact JSON payload your AI agent can reason about.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://staging.example.com",
"url_pattern": "/etf/*",
"sample_size": 30,
"assertions": [
{"rule": "contains_string", "value": "Dividend History", "severity": "critical"},
{"rule": "not_contains_string", "value": "undefined", "severity": "critical"},
{"rule": "min_word_count", "value": 500, "severity": "warning"},
{"rule": "selector_exists", "value": "table.holdings", "severity": "critical"},
{"rule": "has_schema"},
{"rule": "no_placeholders"},
{"rule": "max_js_ghost_ratio", "value": 0.1, "severity": "critical"}
],
"wait": false
}' \
https://seodiff.io/api/v1/agent/evaluate
| Field | Type | Description |
|---|---|---|
| base_url | string | Base URL for sitemap discovery and pattern matching. Required unless urls is provided. |
| urls | string[] | Explicit list of URLs to evaluate. Overrides sitemap discovery. |
| url_pattern | string | Glob pattern to filter discovered URLs (e.g. /etf/*, /trail/*/details). |
| sample_size | integer | Max pages to evaluate (default 20, max 200). |
| assertions | array | Custom assertions to run against each page (see below). |
| url_assertions | object | Per-URL assertion overrides. Keys are full URLs, values are assertion arrays. When present for a URL, these replace the global assertions for that URL. Example: {"https://example.com/old-page": [{"rule": "status_code", "value": 410}]} |
| baseline_eval_id | string | Previous evaluation_id to compare against. If provided, the response includes a regressions array showing any metric degradations (e.g. dropped H1 coverage, ACRI regression, new placeholder leaks). |
| wait | boolean | Block until evaluation completes (default true). When false, returns 202 Accepted with a status_url for polling. |
| timeout_seconds | integer | Max wait time (default 120, max 300). |
| cache_bust | boolean | Append a unique query parameter to each URL to bypass CDN and server-side caches. Useful for evaluating freshly deployed pages. |
wait defaults to true — the request blocks until evaluation completes (up to timeout_seconds, default 120s). This is ideal for small-to-medium evaluations (≤20 pages). No polling required; you get the full result in one call.
When to use wait: false: Most LLM tool-call APIs (OpenAI, Anthropic, local IDE extensions) have HTTP timeouts of 30–60 seconds. For large evaluations (30+ pages), set wait: false. The endpoint returns instantly with a status_url. Instruct your agent: “Poll the status_url every 5 seconds until status is passed or failed.”
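That polling loop can be kept testable by injecting the HTTP call. A minimal Python sketch — the function name and defaults are illustrative; only the processing/passed/failed status values come from the docs:

```python
import time

def poll_evaluation(fetch_status, interval=5.0, timeout=300.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll an /agent/evaluate status_url until status leaves 'processing'.

    fetch_status: zero-arg callable that GETs the status_url and returns
    the parsed JSON object (injected so the loop is testable offline).
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result.get("status") != "processing":
            return result  # full evaluation payload: 'passed' or 'failed'
        if clock() >= deadline:
            raise TimeoutError("evaluation still processing after timeout")
        sleep(interval)
```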
| Rule | Value type | Description |
|---|---|---|
| contains_string | string | Visible text must contain this string |
| not_contains_string | string | Visible text must NOT contain this string (substring match) |
| not_contains_word | string | Visible text must NOT contain this word as a whole word (word-boundary match, case-insensitive). Unlike not_contains_string, this won’t match inside other words — e.g. "undefined" won’t match “fee structures remain undefined” but will match the standalone word “undefined” as rendered placeholder text. Ideal for avoiding false positives on English prose that naturally uses words like “undefined”, “null”, or “NaN”. |
| html_contains_string | string | Raw HTML must contain this string (searches full source including <script>, <meta> tags, JSON-LD, etc.). Use this when asserting on elements not in visible text. |
| html_not_contains_string | string | Raw HTML must NOT contain this string |
| min_word_count | integer | Minimum word count |
| max_word_count | integer | Maximum word count |
| status_code | integer | Expected HTTP status (e.g. 200) |
| has_schema | — | At least one JSON-LD schema block |
| has_schema_type | string | JSON-LD must contain a specific @type (e.g. BreadcrumbList, Product). Case-insensitive. |
| min_schema_count | integer | Minimum number of JSON-LD blocks |
| has_h1 | — | Page must have an H1 |
| has_meta_description | — | Page must have a meta description |
| selector_exists | CSS selector | A CSS selector that must match ≥1 element |
| selector_count | integer | CSS selector (in selector field) must match ≥N elements |
| regex_match | regex | Visible text must match this regex |
| regex_not_match | regex | Visible text must NOT match this regex |
| no_placeholders | — | No pSEO placeholder leaks ({{var}}, undefined, NaN, etc.). Scans visible text only — excludes <script>, <style>, <code>, <pre>, <svg>, and aria-hidden="true" elements. Safe for React/Next.js RSC payloads, URL-encoded strings, and SVG chart labels. Matches show surrounding text for easy triage. ERROR is matched in all-caps only. AI hallucination patterns require full phrases (e.g. “as an AI language model”, “as an AI, I” — not “as an AI” followed by a regular word like “infrastructure”). Hyphenated-word safe: null inside compound words like “non-null” is excluded automatically. N/A context-aware: bare N/A values only trigger as placeholders when ≥3 occurrences are found on a page (to avoid false positives on data tables where “N/A” is a legitimate “not applicable” value). Supports an optional ignore array to whitelist terms: {"rule": "no_placeholders", "ignore": ["error", "n/a"]}. |
| min_acri | integer | Minimum ACRI score (0–100) |
| max_token_bloat | float | Maximum HTML-to-text ratio |
| no_noindex | — | Page must not have noindex |
| max_js_ghost_ratio | float (0–1) | Maximum JS ghost ratio. Detects pages that require JavaScript to render content (React/Next.js/Vue/Angular/Svelte). A ratio of 0.95 means the page is almost invisible to HTML-only crawlers. Use 0.1 to ensure proper SSR. |
| has_canonical | — | Page must have a <link rel="canonical"> tag |
| no_duplicate_h1 | — | Page must have at most one <h1> element |
| response_ok | — | Convenience alias for status_code: 200. Requires no value field. |
Each assertion accepts an optional severity: "critical" (default) or "warning".
threshold ↔ value alias: Both "value" and "threshold" are accepted in assertion objects. If both are present, value takes precedence. Example: {"rule": "max_token_bloat", "threshold": 12} is equivalent to {"rule": "max_token_bloat", "value": 12}.
type ↔ rule alias: Both "rule" and "type" are accepted as the assertion key. If you write {"type": "has_schema_type", "value": "BreadcrumbList"}, it works identically to {"rule": "has_schema_type", "value": "BreadcrumbList"}.
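A client that accepts either spelling can normalize before sending. A minimal Python sketch of the documented alias precedence — the helper name is hypothetical:

```python
def normalize_assertion(assertion):
    """Normalize the documented aliases: 'type' → 'rule', 'threshold' → 'value'.

    When both 'value' and 'threshold' are present, 'value' wins, per the docs.
    """
    out = dict(assertion)
    if "rule" not in out and "type" in out:
        out["rule"] = out.pop("type")
    out.pop("type", None)       # drop leftover alias if both were given
    if "value" not in out and "threshold" in out:
        out["value"] = out.pop("threshold")
    out.pop("threshold", None)  # same for threshold
    return out
```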
contains_string searches visible text only (scripts, styles, and framework payloads are stripped). To assert on JSON-LD schema types, meta tags, or other elements inside <script> or <head>, use html_contains_string or the dedicated has_schema_type assertion.
When baseline_eval_id is provided, the response always includes a regressions array — even during processing status. If there are no regressions, it will be an empty array ([]), never null.
Response (wait: true):
{
"evaluation_id": "eval_17120...",
"status": "failed",
"pages_evaluated": 30,
"pass_rate": 87,
"duration_ms": 4521,
"summary": "Evaluation failed: 26/30 pages passed. critical assertion \"not_contains_string\"=\"undefined\" failed on 4/30 pages.",
"failed_assertions": [
{
"rule": "not_contains_string",
"value": "undefined",
"severity": "critical",
"failure_count": 4,
"failure_rate": "4/30 pages",
"example_url": "https://staging.example.com/etf/GLD",
"context": "Found \"undefined\" in: …<div class='dividend-yield'>undefined</div>…"
}
],
"regressions": [
{
"metric": "schema_coverage",
"previous": "100%",
"current": "87%",
"delta": "-13%",
"diagnosis": "Schema coverage dropped — JSON-LD blocks may have been removed."
}
],
"metrics": {
"avg_word_count": 1234,
"avg_acri": 72,
"schema_coverage": "87%",
"h1_coverage": "100%",
"meta_desc_coverage": "93%",
"avg_token_bloat": 8.2,
"non_200_count": 0,
"error_count": 0,
"placeholder_pages": 2
},
"structural_fingerprint": "H1(SPY ETF Overview) > H2(Holdings) > Table(50 rows) > H2(Dividend History) > Chart > Footer",
"failing_pages": [
{
"url": "https://staging.example.com/etf/GLD",
"http_status": 200,
"word_count": 12,
"acri": 31,
"failed_assertions": ["not_contains_string:undefined", "min_word_count:500"],
"structural_fingerprint": "H1(GLD) > Div(error) > [EXPECTED table.holdings MISSING]"
}
]
}
Response (wait: false): When wait: false, the endpoint returns 202 Accepted immediately:
{
"evaluation_id": "eval_17120...",
"status": "processing",
"status_url": "/api/v1/agent/evaluate/eval_17120...",
"pages_planned": 30
}
Poll evaluation status. While processing, returns a lightweight status object. Once complete, returns the full evaluation result (same schema as the synchronous response above).
# Poll until complete
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/agent/evaluate/eval_17120...
Evaluation results are cached for 1 hour. Completed evaluations can be referenced as baseline_eval_id in subsequent evaluations to detect regressions.
| Field | Description |
|---|---|
| status | passed, failed, or processing (async mode) |
| status_url | Polling URL (async mode only). GET this URL to check progress or retrieve the completed result. |
| pass_rate | Percentage of pages that passed all assertions (0–100) |
| summary | One-paragraph human/LLM-readable summary of the evaluation |
| failed_assertions | Aggregated assertion failures clustered by rule (not per-page). Includes exact context snippets showing where each failure occurred. |
| regressions | Metric degradations vs. the baseline_eval_id (only present when a baseline is provided). Tracks: pass_rate, avg_acri, avg_word_count, schema_coverage, h1_coverage, meta_desc_coverage, non_200_count, placeholder_pages, avg_token_bloat. |
| metrics | Averaged SEO metrics across all evaluated pages |
| structural_fingerprint | Compact DOM skeleton of the first page (token-compressed). For failing pages, missing selectors are annotated: [EXPECTED .selector MISSING] |
| failing_pages | Details of pages that failed (max 20, for token budget) |
| sample_note | Present when sample_size truncated the URL list. Shows how many URLs were evaluated vs. discovered, e.g. “Evaluated 20 of 275 URLs (sample_size=20). Increase sample_size for broader coverage.” |
1. You ask your AI agent: “Add a dividend section to the ETF template. Verify it works across edge cases.”
2. The agent writes code, pushes to staging.
3. The agent calls POST /api/v1/agent/evaluate with wait: false and assertions like not_contains_string: undefined, selector_exists: .dividend-table, and max_js_ghost_ratio: 0.1.
4. The agent polls GET /api/v1/agent/evaluate/{id} every 5 seconds until status is passed or failed.
5. SEODiff finds that GLD (gold ETFs pay no dividends) renders “undefined”. The failing page’s fingerprint shows: H1(GLD) > [EXPECTED .dividend-table MISSING].
6. The agent reads the compact JSON, adds a null-check, re-pushes, re-evaluates — this time passing baseline_eval_id from the first run to catch regressions. Status: passed, 0 regressions.
7. The agent reports back: “Feature deployed and verified across 30 edge cases.”
/agent/evaluate (Micro): Use in your daily coding prompts. The agent runs this during the coding loop on 30 staging pages to verify a feature PR before merging.
/deep-audit/{id}/pseo-agent (Macro): Use for weekly portfolio maintenance. An agent runs this on a full 10,000-page deep audit to find widespread architectural rot.
Visual and semantic QA for individual pages. Supports three modes: manual (you define the tasks), chaos (autonomous exploratory QA — the LLM decides what to test), and profile-template (auto-generate assertion rules from golden pages).
Set exploration_mode: "chaos" and the LLM autonomously generates QA tasks covering data trust, template integrity, visual sanity, UX confusion, and SEO meta — without you writing a single assertion. Optionally provide a persona to unlock domain-specific probes (finance, outdoor/trail, recipe).
Submit a page for multi-modal QA. Returns immediately with a review_id and status_url for async polling. The endpoint fetches the page, optionally captures a screenshot, and runs QA tasks through text and vision LLMs.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/etf/SPY",
"qa_tasks": [
{
"id": "holdings_sum",
"modality": "text_llm",
"prompt": "Check if the portfolio weights in the holdings table sum to 100%"
},
{
"id": "chart_match",
"modality": "vision_vlm",
"prompt": "Does the performance chart visually match the stated YTD return?"
}
]
}' \
https://seodiff.io/api/v1/agent/human-review
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/etf/SPY",
"exploration_mode": "chaos",
"persona": "You are a skeptical financial analyst"
}' \
https://seodiff.io/api/v1/agent/human-review
No qa_tasks needed. The LLM auto-generates 5+ tasks spanning text and vision modalities.
| Field | Type | Description |
|---|---|---|
| url | string | Required (or use urls). The page URL to evaluate. |
| urls | string[] | Batch mode: array of URLs to review with shared config. Each gets its own review_id. Max 50. |
| exploration_mode | string | Set to "chaos" for autonomous exploratory QA. When set, qa_tasks is optional (auto-generated if empty). |
| persona | string | Optional persona for the LLM (e.g. “You are a senior Mountain Guide and UX critic”). In chaos mode, triggers domain-specific probes. Keywords: financ*/investor/stock/trading/portfolio/analyst, hiker/outdoor/trail, chef/cook/recipe. |
| qa_tasks | array | Explicit QA tasks. Each has id (string), modality (text_llm, vision_vlm, or data_interpreter), and prompt (string). Omit in chaos mode to auto-generate. |
| strictness | string | low, medium (default), or high. Controls how aggressively the LLM flags issues. |
| wait | boolean | Default false (async — returns immediately). Set true to block until complete. |
| depth | string | "quick" skips vision tasks for faster iteration (text_llm only, ~3 tasks). Default is full mode (text + vision, 5+ tasks). Only affects chaos mode auto-generated tasks. |
| model | string | Override inference model. Routing is automatic: gemini-* / gemma-* → Google AI Studio, llama3.1-* / qwen-* / gpt-oss-* / zai-glm-* → Cerebras Cloud, org/model / llama-3.* → Groq Cloud, anything else → local Ollama. Default: server-configured model (currently gemma4:26b). See model comparison below. |
| timeout_sec | integer | Max seconds for the review. If exceeded, completed tasks are returned and remaining tasks are marked “model unavailable: timeout expired”. Range: 30–1500 (25 min). Default: 20 min (sync) / 25 min (async). |
When exploration_mode: "chaos" is set, the following tasks are always generated:
| Task ID | Modality | What it checks |
|---|---|---|
| chaos_content_trust | text_llm | Implausible claims, contradictory numbers, suspiciously round data |
| chaos_template_integrity | text_llm | Placeholder text, Lorem ipsum, data from wrong page instance |
| chaos_seo_meta | text_llm | Title/meta/H1 uniqueness — template stamps vs. dynamic insertion |
| chaos_visual_sanity | vision_vlm | Overlapping elements, broken images, charts that don’t match data |
| chaos_ux_confusion | vision_vlm | 5-second test: can you tell what this page is about and what to do next? |
Persona-triggered extras: chaos_financial_data (finance personas), chaos_trail_data (outdoor personas), chaos_recipe_data (cooking personas).
After the LLM produces findings, each issue’s element field is cross-referenced against the raw page text. If the claimed element text does not appear anywhere on the page, the issue is demoted to severity: "info" and its description is prefixed with [unverified]. This prevents pure LLM hallucinations (e.g. “copyright says © 2020” when no such text exists) from being reported as high-severity findings.
Full mode (default): 2–15 minutes per page. The 2 vision_vlm tasks require headless Chromium screenshot capture + VLM inference, which is CPU-intensive. Expect ~2–5 min with Google AI models, ~10–15 min with local Ollama on CPU-only servers.
Quick mode ("depth": "quick"): 3–60 seconds per page. Skips all vision_vlm tasks and runs only text_llm tasks. Recommended for batch validation of programmatic pages where visual checks are less critical.
For multi-page chaos reviews, submit jobs concurrently and poll each status_url independently.
Initial response (async):
{
"review_id": "hr_8528b6edc53b7454",
"status": "processing",
"status_url": "/api/v1/agent/human-review/hr_8528b6edc53b7454",
"url": "https://example.com/etf/SPY",
"overall_passed": null,
"duration_ms": 0
}
overall_passed is null until the review completes. Only treat it as a boolean (true/false) when status is "complete".
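The rule above is easy to get wrong in client code. A minimal sketch of a safe interpretation (the helper name `verdict` is ours, not part of any SEODiff SDK):

```python
def verdict(review: dict):
    """Interpret overall_passed safely: it is only a real boolean
    once status is "complete"; otherwise it is not meaningful."""
    if review["status"] == "complete":
        return bool(review["overall_passed"])
    if review["status"] == "failed":
        raise RuntimeError(review.get("human_verdict", "review failed"))
    return None  # still processing; keep polling

assert verdict({"status": "processing", "overall_passed": None}) is None
assert verdict({"status": "complete", "overall_passed": False}) is False
```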
When urls array is provided, the response wraps multiple reviews:
{
"batch": true,
"count": 3,
"reviews": [
{ "review_id": "hr_a1b2c3d4e5f6a7b8", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_a1b2c3d4e5f6a7b8", "url": "https://example.com/page-1" },
{ "review_id": "hr_b2c3d4e5f6a7b8c9", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_b2c3d4e5f6a7b8c9", "url": "https://example.com/page-2" },
{ "review_id": "hr_c3d4e5f6a7b8c9d0", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_c3d4e5f6a7b8c9d0", "url": "https://example.com/page-3" }
]
}
Each review can be polled independently via its status_url.
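A per-review polling loop can be sketched as follows. To keep the sketch offline and testable, the HTTP GET is injected as a `fetch` callable; in a real integration it would issue an authenticated GET against the review's status_url. The function name `poll_review` is ours, not part of any SEODiff SDK.

```python
import time

def poll_review(fetch, status_url, interval_sec=10, timeout_sec=1800):
    """Poll one review until its status leaves "processing" (sketch).

    fetch is any callable that GETs the status_url and returns the
    parsed JSON dict; injected so this sketch makes no network calls."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        review = fetch(status_url)
        if review["status"] != "processing":
            return review
        time.sleep(interval_sec)
    raise TimeoutError(f"review at {status_url} did not complete in time")

# Stub fetch that simulates a review completing on the second poll.
responses = iter([
    {"status": "processing", "overall_passed": None},
    {"status": "complete", "overall_passed": True},
])
result = poll_review(lambda url: next(responses),
                     "/api/v1/agent/human-review/hr_8528b6edc53b7454",
                     interval_sec=0)
print(result["overall_passed"])  # -> True
```

For a batch response, run this once per entry in `reviews[]`, ideally concurrently.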
Poll review status. Once complete, returns the full result:
{
"review_id": "hr_8528b6edc53b7454",
"status": "complete",
"url": "https://example.com/etf/SPY",
"human_verdict": "4 of 6 QA checks failed.",
"overall_passed": false,
"qa_results": [
{
"task_id": "chaos_content_trust",
"modality": "text_llm",
"prompt": "Examine all claims, numbers, and data points...",
"category": "data",
"passed": false,
"issues": [
{
"element": "Holdings Table",
"description": "Portfolio weights sum to 103.2%, exceeding 100%.",
"severity": "critical"
}
],
"notes": "The top-10 holdings alone sum to 37%. Full table sums to 103.2%."
}
],
"page_context": {
"title": "SPY ETF Overview",
"word_count": 1234,
"has_h1": true
},
"duration_ms": 47294,
"model": "gemma4:26b"
}
| Field | Description |
|---|---|
| status | processing, complete, or failed |
| human_verdict | One-line summary (e.g. “4 of 6 QA checks failed.”) |
| overall_passed | true/false when complete; null while status is processing |
| qa_results | Array of results, one per QA task. Each has task_id, modality, prompt, category, passed, issues[], and notes. |
| issues[].severity | critical, warning, or info |
| page_context | Page metadata: title, word_count, has_h1 |
| model | LLM model used (e.g. gemma4:26b) |
| duration_ms | Total wall-clock time in milliseconds |
| tasks_total | (Polling only) Total QA tasks planned |
| tasks_completed | (Polling only) Tasks finished so far |
| progress_pct | (Polling only) Completion percentage (0–100) |
| est_remain_sec | (Polling only) Estimated seconds remaining, based on elapsed time per completed task; 0 when unknown (no tasks completed yet) |
| current_step | (Polling only) Human-readable progress step (e.g. “running vision_vlm (2 tasks)”, “capturing screenshot”) |
| retry_after_sec | Present (default: 30) when one or more QA tasks were skipped due to model unavailability. Agents should wait this many seconds before retrying the review. |
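Agent-side handling of retry_after_sec can be sketched as below; the helper name `retry_delay` is ours. A delay of 0 means the response carried no retry hint and no resubmission is needed.

```python
def retry_delay(result: dict) -> int:
    """How long to wait before resubmitting a review whose QA tasks
    were skipped for model unavailability (sketch). Returns 0 when
    the response carries no retry_after_sec field."""
    return int(result.get("retry_after_sec", 0))

delay = retry_delay({"status": "complete", "retry_after_sec": 30})
# e.g. time.sleep(delay), then POST the same review payload again
```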
Chaos mode is for exploratory, open-ended QA on individual pages — finding issues you didn’t know to look for. It uses LLM reasoning (text + vision) and is best for spot-checking programmatic pages in production.
/agent/evaluate is for deterministic, rule-based validation at scale — running 17 assertion types across 200 pages. Use it in CI/CD pipelines where you know exactly what to check.
Automated baseline profiler. Feed 2–10 “golden” page URLs from the same pSEO template, and the LLM reverse-engineers the template’s structure to generate assertion rules you can feed into /agent/human-review or /agent/evaluate.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/etf/SPY",
"https://example.com/etf/QQQ",
"https://example.com/etf/IWM"
],
"template_name": "etf_page"
}' \
https://seodiff.io/api/v1/agent/profile-template
| Field | Type | Description |
|---|---|---|
| urls | string[] | Required. 2–10 golden page URLs from the same template. |
| template_name | string | Human-friendly label for the template. |
| focus_areas | string[] | Optional focus areas: data_accuracy, layout, seo_meta, etc. |
| wait | boolean | Default false (async). Set true to block until complete. Async returns 202 Accepted with a status_url for polling at /api/v1/agent/profile-template/{job_id}. |
| model | string | Override inference model. Same routing as /agent/human-review (gemini-* / gemma-* → Google AI Studio, and so on); local Ollama model names are also accepted. |
{
"template_name": "etf_page",
"page_count": 3,
"generated_assertions": [
{
"rule": "The H1 must follow the pattern: '[Ticker] ETF Overview'",
"category": "seo",
"modality": "text_llm",
"confidence": 0.95,
"reason": "All 3 pages follow this H1 pattern consistently."
},
{
"rule": "A holdings table with at least 10 rows must be present",
"category": "structural",
"modality": "text_llm",
"confidence": 0.9,
"reason": "Core functional component present in all sampled pages."
}
],
"variance_notes": "Dynamic slots: [Ticker], [Fund Name], [AUM], [Expense Ratio]...",
"duration_ms": 289825
}
Both /agent/human-review and /agent/profile-template support Google AI Studio models via the model parameter. When a model name starts with gemini- or gemma-, the request is automatically routed to Google’s OpenAI-compatible endpoint.
Setup: Set the SEODIFF_GOOGLE_AI_API_KEY environment variable on the server with your Google AI Studio API key. No other configuration is needed.
| Model | RPM | RPD | Vision | Best for |
|---|---|---|---|---|
| gemini-2.5-flash | 5 | 20 | Yes | Powerful reasoning (may 503 under free-tier demand) |
| gemini-3-flash-preview | 5 | 20 | Yes | Latest generation, multimodal |
| gemini-3.1-flash-lite-preview | 15 | 500 | Yes | Best daily limit, fast (<10s) |
| gemma-4-31b-it | 15 | 1,500 | No | High-quality text, high volume |
| gemma-4-26b-a4b-it | 15 | 1,500 | No | High-volume text tasks |
Gemini models support vision_vlm tasks (screenshot analysis). Gemma models are text-only — vision tasks will fall back to the server-configured VLM. For full chaos mode (text + vision), use a Gemini model or omit the model field to use the default.
If Google AI Studio returns HTTP 503 (high demand) or 429 (rate limited), SEODiff automatically retries up to 3 times with increasing backoff (5 s, 10 s, 15 s). The Retry-After header from upstream providers (Groq, Cerebras, etc.) is respected when present. If all attempts fail, affected QA tasks are reported as “Model unavailable” and the response includes a retry_after_sec field (default: 30) so agents can schedule a retry.
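The retry schedule described above can be sketched as a pure function (the name `retry_delays` is ours, not SEODiff's internal API): an upstream Retry-After value takes precedence, otherwise the documented 5 s / 10 s / 15 s schedule applies.

```python
def retry_delays(retry_after=None, attempts=3):
    """Delays between retry attempts (sketch of the documented
    behavior): Retry-After from the upstream provider wins when
    present; otherwise 5 s, 10 s, 15 s."""
    if retry_after is not None:
        return [retry_after] * attempts
    return [5 * n for n in range(1, attempts + 1)]

print(retry_delays())               # -> [5, 10, 15]
print(retry_delays(retry_after=12)) # -> [12, 12, 12]
```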
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/etf/SPY",
"exploration_mode": "chaos",
"model": "gemini-2.5-flash"
}' \
https://seodiff.io/api/v1/agent/human-review
Model names starting with llama3.1-, qwen-, gpt-oss-, or zai-glm- are routed to Cerebras Cloud, which provides extremely fast inference via wafer-scale hardware.
Setup: Set the SEODIFF_CEREBRAS_API_KEY environment variable on the server. No other configuration is needed.
| Model | Context | Vision | Best for |
|---|---|---|---|
| llama3.1-8b | 8K | No | Fastest option (<1s). Good for quick text QA at scale. |
| qwen-3-235b-a22b-instruct-2507 | 65K | No | Best accuracy/speed balance. 235B MoE, deep reasoning in ~3s. |
| gpt-oss-120b | 128K | No | Large context window, general-purpose. |
| zai-glm-4.7 | 32K | No | Premium quality, highest per-token cost. |
Cerebras models do not support vision_vlm tasks. Vision tasks will fall back to the server-configured VLM (local Ollama). For full chaos mode (text + vision), use a Gemini model or the default.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/recipe/carbonara",
"exploration_mode": "chaos",
"model": "qwen-3-235b-a22b-instruct-2507"
}' \
https://seodiff.io/api/v1/agent/human-review
Models using org/model format (e.g. meta-llama/llama-4-scout-17b-16e-instruct) or starting with llama-3. are routed to Groq Cloud, which provides fast LPU-accelerated inference with a generous free tier.
Setup: Set the SEODIFF_GROQ_API_KEY environment variable on the server. No other configuration is needed.
| Model | Context | Vision | Free-tier RPM / TPM | Best for |
|---|---|---|---|---|
| meta-llama/llama-4-scout-17b-16e-instruct | 128K | No | 30 / 30K | Highest free-tier throughput. Great overall quality. |
| llama-3.3-70b-versatile | 128K | No | 30 / 12K | Largest free model. Best accuracy for complex QA. |
| llama-3.1-8b-instant | 128K | No | 30 / 6K | Fastest free option. Good for quick text QA. |
| qwen/qwen3-32b | 32K | No | 60 / 6K | Highest RPM. Strong reasoning. |
| moonshotai/kimi-k2-instruct | 128K | No | 60 / 10K | High RPM + throughput. Long context. |
Groq models do not support vision_vlm tasks. Vision tasks will fall back to the server-configured VLM. For full chaos mode (text + vision), use a Gemini model or the default.
The free tier has per-minute and per-day limits (RPM/RPD/TPM/TPD). If you hit a 429 response, the Retry-After header tells you how long to wait. For high-volume work, consider models with a higher RPM such as qwen/qwen3-32b (60 RPM), or use batch mode with smaller payloads.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product/wireless-headphones",
"exploration_mode": "chaos",
"model": "meta-llama/llama-4-scout-17b-16e-instruct"
}' \
https://seodiff.io/api/v1/agent/human-review
Benchmark results for a standardized QA task (recipe page, chaos-style prompt, single text_llm task). Speed measured end-to-end from API call to parsed response.
| Model | Provider | Speed | Accuracy | Cost | Notes |
|---|---|---|---|---|---|
| llama3.1-8b | Cerebras | ~500ms | Medium | $0.10/M tokens | Fastest. Occasional false positives on valid data. Best for high-volume quick scans. |
| qwen-3-235b | Cerebras | ~3s | High | Preview (free) | Best speed/accuracy balance. Catches nutrition inconsistencies, analytical reasoning. |
| meta-llama/llama-4-scout-17b-16e-instruct | Groq | ~500ms | High | Free tier | Highest free-tier throughput (30K TPM). 17B MoE with 128K context. |
| llama-3.3-70b-versatile | Groq | ~400ms | High | Free tier | Largest free Groq model. Concise, accurate QA. Best quality per free token. |
| llama-3.1-8b-instant | Groq | ~250ms | Medium | Free tier | Fastest free Groq model. Some noise in output. Good for quick scans. |
| qwen/qwen3-32b | Groq | ~1–2s | High | Free tier | Highest RPM (60). Uses thinking tokens. Strong analytical reasoning. |
| gemini-2.5-flash | Google AI | ~10s | Highest | Free tier | Deep analysis with thinking tokens. Best for thorough QA. May 503 under demand. |
| gemma4:26b | Local Ollama | ~2–5 min | High | Free (self-hosted) | Default. No external API calls. Requires server CPU/GPU. Best for privacy-sensitive work. |
Speed priority: llama3.1-8b (sub-second). Accuracy priority: gemini-2.5-flash (deep reasoning). Best balance: qwen-3-235b-a22b-instruct-2507 (fast + smart). Free + fast: meta-llama/llama-4-scout-17b-16e-instruct (Groq). Privacy: gemma4:26b (local, no data leaves your server).
When SEODIFF_GOOGLE_AI_API_KEY is configured and the server’s default VLM is local Ollama, vision tasks (vision_vlm) are automatically routed to Gemini 2.5 Flash via Google AI Studio. This cuts vision latency from ~2–5 minutes (CPU inference) to ~10 seconds (cloud). No configuration or model override needed — it happens transparently. Text-only models (Cerebras, Gemma) still fall back correctly.
When the server’s default text LLM is local Ollama, text tasks (text_llm) are automatically routed to a fast cloud provider. Priority: Groq (llama-3.3-70b-versatile, ~400ms) → Cerebras (qwen-3-235b, ~3s) → Google AI (gemini-2.5-flash, ~10s) → local Ollama (fallback). This cuts text_llm from ~3–10 minutes (CPU) to under 1 second. To force local inference, set SEODIFF_LLM_BASE_URL to a non-localhost URL or use the model parameter to explicitly select a local model like gemma4:26b.
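The fallback chain above amounts to a fixed priority order over whichever providers have API keys configured. A sketch (the function name `pick_text_provider` is ours; the real server-side selection may differ):

```python
def pick_text_provider(configured: set) -> str:
    """Sketch of the hybrid-acceleration priority described above:
    the first configured cloud provider wins; local Ollama is the
    final fallback when no cloud key is set."""
    for provider in ("groq", "cerebras", "google-ai"):
        if provider in configured:
            return provider
    return "ollama"

print(pick_text_provider({"cerebras", "google-ai"}))  # -> cerebras
print(pick_text_provider(set()))                      # -> ollama
```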
These tools analyze how well a page is optimized for AI consumption. Most are free-tier and accept a URL in the POST body.
AI Extractability Score (AES). Analyzes how easily AI systems can extract structured data from a page.
curl -X POST -H "Content-Type: application/json" \
-d '{"url":"https://example.com/page"}' \
https://seodiff.io/api/v1/aes
RAG chunking simulator. Shows how the page would be chunked for retrieval-augmented generation.
Entity schema generator. Extracts and suggests JSON-LD structured data for a page.
LLM training data auditor. Analyzes quality signals for training corpus inclusion.
Robots.txt and crawler health check. Validates bot access and directives.
AI crawler simulation. Shows what an AI bot sees when crawling the page.
AI answer preview. Simulates how an LLM would summarize or reference the page.
Structural entropy analysis. Measures content structure quality for machine consumption.
Hallucination risk checker. Evaluates whether page content may induce LLM hallucinations.
Validate a site's llms.txt file against the emerging standard.
Generate an llms.txt file for a site.
Errors return JSON with an error field:
{
"error": "domain not verified for this account"
}
| Status | Meaning |
|---|---|
| 400 | Invalid input (missing/malformed fields) |
| 401 | Missing or invalid API key |
| 403 | Plan-gated feature or insufficient permissions |
| 404 | Resource not found |
| 409 | Validation failed (scan did not pass) |
| 429 | Rate limit exceeded. Includes a Retry-After header (seconds) and a retry_after_seconds field in the JSON body. |
| 502 | Upstream error (database, crawl engine) |
API keys are scoped to your account; you can only access resources that belong to that account.
Domain verification is required for deep-audit creation and scan exports. Verify domains via DNS TXT or by connecting Google Search Console.
With wait=true, SEODiff returns 200 for pass and 409 for fail (the JSON body always includes pass).

The dashboard and CI/CD automation are clients of the same API. This keeps behavior consistent and allows SEODiff to evolve heuristics without changing your integration surface.
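In a CI pipeline the wait=true response maps naturally onto an exit code. A sketch (the helper name `ci_exit_code` is ours; the status/body pairing follows the behavior documented above):

```python
def ci_exit_code(status_code: int, body: dict) -> int:
    """Map a wait=true validate response onto a CI exit code
    (sketch): 200 with pass → 0, 409 → 1, anything inconsistent
    or unexpected → 2."""
    if status_code == 200 and body.get("pass", False):
        return 0
    if status_code == 409:
        return 1
    return 2  # transport error, 5xx, or inconsistent body

print(ci_exit_code(200, {"pass": True}))   # -> 0
print(ci_exit_code(409, {"pass": False}))  # -> 1
```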