Most endpoints require an API key passed in the Authorization header:
Authorization: Bearer <your_api_key>
Get your key from the API Keys page in your account. Keys use the sd_live_ or sd_test_ prefix.
All paths are relative to:
https://seodiff.io/api/v1
https://api.seodiff.io/api/v1 also works (same backend).
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/me
API requests are rate-limited per account tier. Free accounts get 60 requests per minute. Pro and Enterprise get higher limits. Exceeding the limit returns 429 Too Many Requests.
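To stay under these limits, a client can back off and retry when it sees 429. A minimal Python sketch — the endpoint path, the backoff schedule, and the MyApp/1.0 User-Agent are illustrative assumptions, not part of the API:

```python
import time
import urllib.error
import urllib.request

API = "https://seodiff.io/api/v1"

def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Exponential backoff for 429 retries: 1s, 2s, 4s, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt))

def get_with_retry(path, api_key, max_attempts=5):
    """GET an endpoint, sleeping and retrying when the API returns 429."""
    req = urllib.request.Request(
        API + path,
        headers={
            "Authorization": f"Bearer {api_key}",
            "User-Agent": "MyApp/1.0",  # descriptive UA; see Cloudflare note
        },
    )
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_seconds(attempt))
```

If the API returns a Retry-After header on 429 responses (not documented here), honoring it would be preferable to a fixed schedule.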
Returns your account info, plan details, and checkout URL (for plan upgrades).
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/me
Response:
{
"account": { "id": "...", "name": "[email protected]" },
"plan": {
"tier": "pro",
"max_sites": 10,
"max_pages_per_scan": 5000,
"max_deep_audit_pages": 10000
},
"checkout_url": "https://..."
}
List recent API key audit events (key created, rotated, revoked, etc.).
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/audit
List all monitored sites for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/sites
Response:
[
{
"id": "...",
"base_url": "https://example.com",
"enabled": true,
"schedule": "nightly"
}
]
Add or update a monitored site.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"enabled": true,
"schedule": "nightly"
}' \
https://seodiff.io/api/v1/sites
Send a keepalive ping for a monitored site. Keeps monitoring active if schedule-based probes are paused.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"base_url":"https://example.com"}' \
https://seodiff.io/api/v1/monitor/keepalive
If you are integrating for the first time, start with POST /api/v1/validate using wait=true. It gives a single response with pass/fail plus links to all artifacts.
Enqueue a surface scan. Returns 202 Accepted with an id and a status_url.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://preview.example.com",
"render_js": false,
"lighthouse": false
}' \
https://seodiff.io/api/v1/scan
Response:
{
"id": "s_abc123...",
"status_url": "/api/v1/scans/s_abc123.../status"
}
CI-friendly scan wrapper. When wait=true, blocks until complete and returns pass/fail.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://preview.example.com",
"preset": "fast",
"fail_on": "fetch_errors,non200_status,schema_missing_required",
"max_issue_rate": 10,
"wait": true,
"timeout_seconds": 180
}' \
https://seodiff.io/api/v1/validate
Response (wait=true):
{
"pass": true,
"reason": "",
"failing": {},
"report_url": "/scans/s_abc123/report.html",
"json_url": "/scans/s_abc123/findings.json",
"summary_markdown_url": "/api/v1/scans/s_abc123/summary.md"
}
| Field | Type | Description |
|---|---|---|
| pass | boolean | Whether the scan passed all checks |
| reason | string | Human-readable failure reason (empty on pass) |
| failing | object | Failing keys with details |
| report_url | string | HTML report link |
| json_url | string | JSON findings export |
| summary_markdown_url | string | Markdown summary for PR comments |
Returns 200 for pass, 409 for fail. May return 202 if the scan hasn't completed within the timeout.
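A CI step can branch directly on those status codes. A minimal Python sketch — the helper names are hypothetical; only the status-code meanings (200 pass, 409 fail, 202 still running) come from the endpoint's documented behavior:

```python
def ci_outcome(http_status):
    """Map a /validate HTTP status code to a CI decision:
    200 = checks passed, 409 = checks failed, 202 = scan still running."""
    return {200: "pass", 409: "fail", 202: "pending"}.get(http_status, "error")

def exit_code(http_status):
    """Non-zero exit fails the CI job; a timeout (202) is treated as failure
    here so an incomplete scan never silently passes the pipeline."""
    return 0 if ci_outcome(http_status) == "pass" else 1
```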
Markdown summary suitable for GitHub PR comments.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/scans/s_abc123/summary.md
Normalized findings as JSON for downstream tooling.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/scans/s_abc123/findings.json
Normalized findings as CSV.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/scans/s_abc123/findings.csv
List recent drift incidents detected by monitoring.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/incidents
List template identifiers detected for a monitored site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/templates?base_url=https://example.com"
Response:
{
"templates": ["/product/*", "/collections/*"]
}
Drift timeline for a given base URL and template. The template value should match an entry from /api/v1/templates.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/timeline?base_url=https://example.com&template=/product/*"
Aggregated template drift summaries across all templates for a site.
Dashboard aggregate for a project. Returns project summary cards, recent scans, and Search Console data.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/project-overview?base_url=https://example.com"
Start a full-site deep crawl. Requires domain verification and Pro (or higher) plan.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"crawl_scope": "deep_audit",
"max_pages": 500,
"render_js": false,
"respect_robots": true,
"crawl_speed": "normal",
"include_patterns": [],
"exclude_patterns": []
}' \
https://seodiff.io/api/v1/deep-audit
Response:
{
"job_id": "da_abc123...",
"status_url": "/api/v1/deep-audit/da_abc123...",
"report_url": "/api/v1/deep-audit/da_abc123.../report"
}
| Parameter | Type | Description |
|---|---|---|
| base_url | string | Required. URL to crawl. |
| crawl_scope | string | deep_audit (default) or full_site (Enterprise). |
| max_pages | integer | Max pages to crawl (plan-limited). |
| render_js | boolean | Enable JavaScript rendering. |
| respect_robots | boolean | Obey robots.txt (default true). |
| crawl_speed | string | slow, normal, or fast. |
| include_patterns | string[] | URL patterns to include (glob). |
| exclude_patterns | string[] | URL patterns to exclude (glob). |
List deep-audit jobs for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/
Get job status, progress percentage, and metadata.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123
Response:
{
"job_id": "da_abc123",
"status": "complete",
"progress": 100,
"base_url": "https://example.com",
"pages_crawled": 342,
"started_at": "2025-01-15T10:00:00Z",
"finished_at": "2025-01-15T10:05:23Z"
}
HTML report of the deep audit (job must be complete).
Raw JSON result with all crawled page data.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/json
Lightweight summary payload. Includes top-level metrics, issue counts, and crawl stats.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/summary
Template-level internal link graph payload for visualization.
URL-level internal PageRank scores and distribution summary.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/url-pagerank
Link heatmap data showing internal link equity distribution across templates and URLs.
Full audit aggregate payload for enterprise-style views. Combines all deep-audit sub-reports.
Structured Markdown export designed for LLM and AI agent consumption.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/agent-md
Programmatic SEO diagnostics payload for coding agents. Includes template stats, placeholder detection, hallucination rates, schema drift analysis, and actionable fix lists.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/pseo-agent
Response (abbreviated):
{
"meta": { "job_id": "da_abc123", "base_url": "https://example.com" },
"health": "B",
"health_score": 72,
"templates": [ ... ],
"data_integrity": {
"placeholder_outbreak": { "severity": "warning" },
"schema_type_drift": { "severity": "ok" }
},
"top_fixes": [ ... ]
}
Scan-over-scan diff JSON comparing this audit to the previous one for the same domain. Returns detected changes across 23 detection algorithms.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/deep-audit/da_abc123/diff
HTML diff report showing changes between consecutive deep audits.
Redirects to /api/v1/deep-audit/{id}/graph.
Redirects to /api/v1/deep-audit/{id}/url-pagerank.
Redirects to /api/v1/deep-audit/{id}/full-audit.
List custom extraction rules for a site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com"
Create or update a custom extraction rule.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"field_name": "price",
"selector_type": "css",
"selector": ".product-price",
"expected_type": "number",
"required": true
}' \
"https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com"
Delete an extraction rule by field name.
curl -X DELETE -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/extraction-rules?base_url=https://example.com&field_name=price"
Dry-run a rule against sampled pages before saving.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"rule": {
"field_name": "price",
"selector_type": "css",
"selector": ".product-price",
"expected_type": "number",
"required": true
}
}' \
https://seodiff.io/api/v1/extraction-rules/validate
Some endpoints (deep-audit, scan exports) require you to prove ownership of the domain. You can verify via DNS TXT record or by connecting Google Search Console.
Get or create a verification challenge for a domain. Accepts either GET with query params or POST with JSON body.
# GET with query param
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/domain-verify/challenge?domain=example.com"
# POST with JSON body
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"domain":"example.com"}' \
https://seodiff.io/api/v1/domain-verify/challenge
Response:
{
"domain": "example.com",
"token": "seodiff-verify=abc123...",
"verified": false,
"dns_host": "_seodiff-verify.example.com",
"record_type": "TXT",
"record_value": "seodiff-verify=abc123...",
"instructions": "Add a DNS TXT record..."
}
Confirm domain verification after adding the DNS TXT record.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"domain":"example.com"}' \
https://seodiff.io/api/v1/domain-verify/confirm
Check whether a domain is verified for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/domain-verify/status?domain=example.com"
Start Google Search Console OAuth flow. Returns an auth_url to redirect the user to.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/gsc/connect?base_url=https://example.com"
List available and selected Search Console properties for a connected site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/gsc/properties?base_url=https://example.com"
Set the active Search Console property for a site.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"property": "sc-domain:example.com"
}' \
https://seodiff.io/api/v1/gsc/select-property
Trigger a fresh Search Console data sync for a site.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"base_url":"https://example.com"}' \
https://seodiff.io/api/v1/gsc/sync
GSC indexability diagnostics summary. Requires an active GSC connection. Accepts either GET with query params or POST with JSON body.
# GET with query param
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/gsc/diagnostics?base_url=https://example.com"
# POST with JSON body
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"base_url":"https://example.com"}' \
https://seodiff.io/api/v1/gsc/diagnostics
List alerts for your account.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/alerts
Get count of unread alerts.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/alerts/unread
Response:
{"unread": 3}
Dismiss a single alert by ID.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{"id":"alert_abc123"}' \
https://seodiff.io/api/v1/alerts/dismiss
Dismiss all alerts.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/alerts/dismiss-all
Get alert notification preferences.
Update alert notification preferences.
Get schema drift analysis for a monitored site.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/schema-drift?base_url=https://example.com"
Get schema diff between two snapshots.
Schema change timeline for a site.
If your HTTP client uses a default or generic User-Agent (e.g. Python urllib), Cloudflare may block requests to seodiff.io with a 403. Fix: set a descriptive User-Agent header (e.g. MyApp/1.0), or use api.seodiff.io as an alternative host which bypasses Cloudflare’s browser-integrity check. All /api/v1/* endpoints work on both hosts.
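For example, with Python urllib the fix is one extra header. A minimal sketch — the sd_test_ key is a placeholder; calling urllib.request.urlopen(req) would send the request:

```python
import urllib.request

# A generic urllib User-Agent can be blocked by Cloudflare on seodiff.io
# with a 403. Setting a descriptive one (and/or using api.seodiff.io,
# which skips the browser-integrity check) avoids the block.
req = urllib.request.Request(
    "https://api.seodiff.io/api/v1/me",
    headers={
        "Authorization": "Bearer sd_test_xxx",  # placeholder key
        "User-Agent": "MyApp/1.0",
    },
)
```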
Indexation health summary for a site. Requires GSC connection.
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
"https://seodiff.io/api/v1/indexation-health?base_url=https://example.com"
Inspect indexation status for a specific URL.
Get IndexNow configuration for a site.
Update IndexNow settings (enable/disable, key, thresholds).
Push URLs to search engines via IndexNow protocol.
Before using this endpoint, you must configure an IndexNow key for the site via POST /api/v1/indexnow/settings/update (or the dashboard). The key file must be hosted at your domain root (e.g. https://example.com/{key}.txt) for search engines to verify ownership.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://example.com",
"urls": [
"https://example.com/page-1",
"https://example.com/page-2"
]
}' \
https://seodiff.io/api/v1/indexnow/push
Get IndexNow push log with submission history and status codes.
These endpoints are publicly accessible without authentication.
Get ACRI visibility score and metadata for any domain.
curl https://seodiff.io/api/v1/visibility/domain/example.com
Response:
{
"domain": "example.com",
"acri_score": 78.5,
"tranco_rank": 1234,
"categories": ["technology"],
"last_crawled": "2025-01-15T08:00:00Z"
}
Search for domains in the visibility index.
curl "https://seodiff.io/api/v1/visibility/search?q=example"
Leaderboard of top domains by ACRI score, optionally filtered by category.
Live ticker showing recent radar scans, freshness stats, and queue depth.
curl https://seodiff.io/api/v1/radar/scanner/status
Industry pulse: daily movers, biggest ACRI changes, and trending domains.
curl https://seodiff.io/api/v1/radar/scanner/pulse
Instead of using curl to fetch raw HTML (which blows out the LLM’s context window), call this endpoint. SEODiff crawls the pages, runs your assertions, computes SEO/GEO metrics, and returns a token-compressed summary designed for LLM consumption.
Evaluate programmatic SEO pages at scale. Provide URLs (explicit list or sitemap + pattern), custom assertions, and get back a pass/fail verdict with clustered errors and a structural DOM fingerprint — all in a compact JSON payload your AI agent can reason about.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"base_url": "https://staging.example.com",
"url_pattern": "/etf/*",
"sample_size": 30,
"assertions": [
{"rule": "contains_string", "value": "Dividend History", "severity": "critical"},
{"rule": "not_contains_string", "value": "undefined", "severity": "critical"},
{"rule": "min_word_count", "value": 500, "severity": "warning"},
{"rule": "selector_exists", "value": "table.holdings", "severity": "critical"},
{"rule": "has_schema"},
{"rule": "no_placeholders"},
{"rule": "max_js_ghost_ratio", "value": 0.1, "severity": "critical"}
],
"wait": false
}' \
https://seodiff.io/api/v1/agent/evaluate
| Field | Type | Description |
|---|---|---|
| base_url | string | Base URL for sitemap discovery and pattern matching. Required unless urls is provided. |
| urls | string[] | Explicit list of URLs to evaluate. Overrides sitemap discovery. |
| url_pattern | string | Glob pattern to filter discovered URLs (e.g. /etf/*, /trail/*/details). |
| sample_size | integer | Max pages to evaluate (default 20, max 200). |
| assertions | array | Custom assertions to run against each page (see below). |
| url_assertions | object | Per-URL assertion overrides. Keys are full URLs, values are assertion arrays. When present for a URL, these replace the global assertions for that URL. Example: {"https://example.com/old-page": [{"rule": "status_code", "value": 410}]} |
| baseline_eval_id | string | Previous evaluation_id to compare against. If provided, the response includes a regressions array showing any metric degradations (e.g. dropped H1 coverage, ACRI regression, new placeholder leaks). |
| wait | boolean | Block until evaluation completes (default true). When false, returns 202 Accepted with a status_url for polling. |
| timeout_seconds | integer | Max wait time (default 120, max 300). |
| cache_bust | boolean | Append a unique query parameter to each URL to bypass CDN and server-side caches. Useful for evaluating freshly deployed pages. |
wait defaults to true — the request blocks until evaluation completes (up to timeout_seconds, default 120s). This is ideal for small-to-medium evaluations (≤20 pages). No polling required; you get the full result in one call.
When to use wait: false: Most LLM tool-call APIs (OpenAI, Anthropic, local IDE extensions) have HTTP timeouts of 30–60 seconds. For large evaluations (30+ pages), set wait: false. The endpoint returns instantly with a status_url. Instruct your agent: “Poll the status_url every 5 seconds until status is passed or failed.”
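That polling loop can be kept testable by injecting the HTTP call. A minimal Python sketch — the function name and defaults are illustrative; only the processing/passed/failed status values come from the docs:

```python
import time

def poll_evaluation(fetch_status, interval=5.0, timeout=300.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll an /agent/evaluate status_url until status leaves 'processing'.

    fetch_status: zero-arg callable that GETs the status_url and returns
    the parsed JSON object (injected so the loop is testable offline).
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result.get("status") != "processing":
            return result  # full evaluation payload: 'passed' or 'failed'
        if clock() >= deadline:
            raise TimeoutError("evaluation still processing after timeout")
        sleep(interval)
```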
| Rule | Value type | Description |
|---|---|---|
| contains_string | string | Visible text must contain this string |
| not_contains_string | string | Visible text must NOT contain this string (substring match) |
| not_contains_word | string | Visible text must NOT contain this word as a whole word (word-boundary match, case-insensitive). Unlike not_contains_string, this won’t match inside other words — e.g. "undefined" won’t match “fee structures remain undefined” but will match the standalone word “undefined” as rendered placeholder text. Ideal for avoiding false positives on English prose that naturally uses words like “undefined”, “null”, or “NaN”. |
| html_contains_string | string | Raw HTML must contain this string (searches full source including <script>, <meta> tags, JSON-LD, etc.). Use this when asserting on elements not in visible text. |
| html_not_contains_string | string | Raw HTML must NOT contain this string |
| min_word_count | integer | Minimum word count |
| max_word_count | integer | Maximum word count |
| status_code | integer | Expected HTTP status (e.g. 200) |
| has_schema | — | At least one JSON-LD schema block |
| has_schema_type | string | JSON-LD must contain a specific @type (e.g. BreadcrumbList, Product). Case-insensitive. |
| min_schema_count | integer | Minimum number of JSON-LD blocks |
| has_h1 | — | Page must have an H1 |
| has_meta_description | — | Page must have a meta description |
| selector_exists | CSS selector | A CSS selector that must match ≥1 element |
| selector_count | integer | CSS selector (in selector field) must match ≥N elements |
| regex_match | regex | Visible text must match this regex |
| regex_not_match | regex | Visible text must NOT match this regex |
| no_placeholders | — | No pSEO placeholder leaks ({{var}}, undefined, NaN, etc.). Scans visible text only — excludes <script>, <style>, <code>, <pre>, <svg>, and aria-hidden="true" elements. Safe for React/Next.js RSC payloads, URL-encoded strings, and SVG chart labels. Matches show surrounding text for easy triage. ERROR is matched in all-caps only. AI hallucination patterns require full phrases (e.g. “as an AI language model”, “as an AI, I” — not “as an AI” followed by a regular word like “infrastructure”). Hyphenated-word safe: null inside compound words like “non-null” is excluded automatically. N/A context-aware: bare N/A values only trigger as placeholders when ≥3 occurrences are found on a page (to avoid false positives on data tables where “N/A” is a legitimate “not applicable” value). Supports an optional ignore array to whitelist terms: {"rule": "no_placeholders", "ignore": ["error", "n/a"]}. |
| min_acri | integer | Minimum ACRI score (0–100) |
| max_token_bloat | float | Maximum HTML-to-text ratio |
| no_noindex | — | Page must not have noindex |
| max_js_ghost_ratio | float (0–1) | Maximum JS ghost ratio. Detects pages that require JavaScript to render content (React/Next.js/Vue/Angular/Svelte). A ratio of 0.95 means the page is almost invisible to HTML-only crawlers. Use 0.1 to ensure proper SSR. |
| has_canonical | — | Page must have a <link rel="canonical"> tag |
| no_duplicate_h1 | — | Page must have at most one <h1> element |
| response_ok | — | Convenience alias for status_code: 200. Requires no value field. |
Each assertion accepts an optional severity: "critical" (default) or "warning".
threshold ↔ value alias: Both "value" and "threshold" are accepted in assertion objects. If both are present, value takes precedence. Example: {"rule": "max_token_bloat", "threshold": 12} is equivalent to {"rule": "max_token_bloat", "value": 12}.
type ↔ rule alias: Both "rule" and "type" are accepted as the assertion key. If you write {"type": "has_schema_type", "value": "BreadcrumbList"}, it works identically to {"rule": "has_schema_type", "value": "BreadcrumbList"}.
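A client that accepts either spelling can normalize before sending. A minimal Python sketch of the documented alias precedence — the helper name is hypothetical:

```python
def normalize_assertion(assertion):
    """Normalize the documented aliases: 'type' → 'rule', 'threshold' → 'value'.

    When both 'value' and 'threshold' are present, 'value' wins, per the docs.
    """
    out = dict(assertion)
    if "rule" not in out and "type" in out:
        out["rule"] = out.pop("type")
    out.pop("type", None)       # drop leftover alias if both were given
    if "value" not in out and "threshold" in out:
        out["value"] = out.pop("threshold")
    out.pop("threshold", None)  # same for threshold
    return out
```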
contains_string searches visible text only (scripts, styles, and framework payloads are stripped). To assert on JSON-LD schema types, meta tags, or other elements inside <script> or <head>, use html_contains_string or the dedicated has_schema_type assertion.
When baseline_eval_id is provided, the response always includes a regressions array — even during processing status. If there are no regressions, it will be an empty array ([]), never null.
Response (wait: true):
{
"evaluation_id": "eval_17120...",
"status": "failed",
"pages_evaluated": 30,
"pass_rate": 87,
"duration_ms": 4521,
"summary": "Evaluation failed: 26/30 pages passed. critical assertion \"not_contains_string\"=\"undefined\" failed on 4/30 pages.",
"failed_assertions": [
{
"rule": "not_contains_string",
"value": "undefined",
"severity": "critical",
"failure_count": 4,
"failure_rate": "4/30 pages",
"example_url": "https://staging.example.com/etf/GLD",
"context": "Found \"undefined\" in: …<div class='dividend-yield'>undefined</div>…"
}
],
"regressions": [
{
"metric": "schema_coverage",
"previous": "100%",
"current": "87%",
"delta": "-13%",
"diagnosis": "Schema coverage dropped — JSON-LD blocks may have been removed."
}
],
"metrics": {
"avg_word_count": 1234,
"avg_acri": 72,
"schema_coverage": "87%",
"h1_coverage": "100%",
"meta_desc_coverage": "93%",
"avg_token_bloat": 8.2,
"non_200_count": 0,
"error_count": 0,
"placeholder_pages": 2
},
"structural_fingerprint": "H1(SPY ETF Overview) > H2(Holdings) > Table(50 rows) > H2(Dividend History) > Chart > Footer",
"failing_pages": [
{
"url": "https://staging.example.com/etf/GLD",
"http_status": 200,
"word_count": 12,
"acri": 31,
"failed_assertions": ["not_contains_string:undefined", "min_word_count:500"],
"structural_fingerprint": "H1(GLD) > Div(error) > [EXPECTED table.holdings MISSING]"
}
]
}
Response (wait: false): When wait: false, the endpoint returns 202 Accepted immediately:
{
"evaluation_id": "eval_17120...",
"status": "processing",
"status_url": "/api/v1/agent/evaluate/eval_17120...",
"pages_planned": 30
}
Poll evaluation status. While processing, returns a lightweight status object. Once complete, returns the full evaluation result (same schema as the synchronous response above).
# Poll until complete
curl -H "Authorization: Bearer $SEODIFF_API_KEY" \
https://seodiff.io/api/v1/agent/evaluate/eval_17120...
Evaluation results are cached for 1 hour. Completed evaluations can be referenced as baseline_eval_id in subsequent evaluations to detect regressions.
| Field | Description |
|---|---|
| status | passed, failed, or processing (async mode) |
| status_url | Polling URL (async mode only). GET this URL to check progress or retrieve the completed result. |
| pass_rate | Percentage of pages that passed all assertions (0–100) |
| summary | One-paragraph human/LLM-readable summary of the evaluation |
| failed_assertions | Aggregated assertion failures clustered by rule (not per-page). Includes exact context snippets showing where each failure occurred. |
| regressions | Metric degradations vs. the baseline_eval_id (only present when a baseline is provided). Tracks: pass_rate, avg_acri, avg_word_count, schema_coverage, h1_coverage, meta_desc_coverage, non_200_count, placeholder_pages, avg_token_bloat. |
| metrics | Averaged SEO metrics across all evaluated pages |
| structural_fingerprint | Compact DOM skeleton of the first page (token-compressed). For failing pages, missing selectors are annotated: [EXPECTED .selector MISSING] |
| failing_pages | Details of pages that failed (max 20, for token budget) |
| sample_note | Present when sample_size truncated the URL list. Shows how many URLs were evaluated vs. discovered, e.g. “Evaluated 20 of 275 URLs (sample_size=20). Increase sample_size for broader coverage.” |
1. You ask your AI agent: “Add a dividend section to the ETF template. Verify it works across edge cases.”
2. The agent writes code, pushes to staging.
3. The agent calls POST /api/v1/agent/evaluate with wait: false and assertions like not_contains_string: undefined, selector_exists: .dividend-table, and max_js_ghost_ratio: 0.1.
4. The agent polls GET /api/v1/agent/evaluate/{id} every 5 seconds until status is passed or failed.
5. SEODiff finds that GLD (gold ETFs pay no dividends) renders “undefined”. The failing page’s fingerprint shows: H1(GLD) > [EXPECTED .dividend-table MISSING].
6. The agent reads the compact JSON, adds a null-check, re-pushes, re-evaluates — this time passing baseline_eval_id from the first run to catch regressions. Status: passed, 0 regressions.
7. The agent reports back: “Feature deployed and verified across 30 edge cases.”
/agent/evaluate (Micro): Use in your daily coding prompts. The agent runs this during the coding loop on 30 staging pages to verify a feature PR before merging.
/deep-audit/{id}/pseo-agent (Macro): Use for weekly portfolio maintenance. An agent runs this on a full 10,000-page deep audit to find widespread architectural rot.
Visual and semantic QA for individual pages. Supports three modes: manual (you define the tasks), chaos (autonomous exploratory QA — the LLM decides what to test), and profile-template (auto-generate assertion rules from golden pages).
Set exploration_mode: "chaos" and the LLM autonomously generates QA tasks covering data trust, template integrity, visual sanity, UX confusion, and SEO meta — without you writing a single assertion. Optionally provide a persona to unlock domain-specific probes (finance, outdoor/trail, recipe).
Submit a page for multi-modal QA. Returns immediately with a review_id and status_url for async polling. The endpoint fetches the page, optionally captures a screenshot, and runs QA tasks through text and vision LLMs.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/etf/SPY",
"qa_tasks": [
{
"id": "holdings_sum",
"modality": "text_llm",
"prompt": "Check if the portfolio weights in the holdings table sum to 100%"
},
{
"id": "chart_match",
"modality": "vision_vlm",
"prompt": "Does the performance chart visually match the stated YTD return?"
}
]
}' \
https://seodiff.io/api/v1/agent/human-review
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/etf/SPY",
"exploration_mode": "chaos",
"persona": "You are a skeptical financial analyst"
}' \
https://seodiff.io/api/v1/agent/human-review
No qa_tasks needed. The LLM auto-generates 5+ tasks spanning text and vision modalities.
| Field | Type | Description |
|---|---|---|
| url | string | Required (or use urls). The page URL to evaluate. |
| urls | string[] | Batch mode: array of URLs to review with shared config. Each gets its own review_id. Max 50. |
| exploration_mode | string | Set to "chaos" for autonomous exploratory QA. When set, qa_tasks is optional (auto-generated if empty). |
| persona | string | Optional persona for the LLM (e.g. “You are a senior Mountain Guide and UX critic”). In chaos mode, triggers domain-specific probes. Keywords: financ*/investor/stock/trading/portfolio/analyst, hiker/outdoor/trail, chef/cook/recipe. |
| qa_tasks | array | Explicit QA tasks. Each has id (string), modality (text_llm, vision_vlm, or data_interpreter), and prompt (string). Omit in chaos mode to auto-generate. |
| strictness | string | low, medium (default), or high. Controls how aggressively the LLM flags issues. |
| wait | boolean | Default false (async — returns immediately). Set true to block until complete. |
| depth | string | "quick" skips vision tasks for faster iteration (text_llm only, ~3 tasks). Default is full mode (text + vision, 5+ tasks). Only affects chaos mode auto-generated tasks. |
| model | string | Override inference model. Routing is automatic: gemini-* / gemma-* → Google AI Studio, llama3.1-* / qwen-* / gpt-oss-* / zai-glm-* → Cerebras Cloud, org/model / llama-3.* → Groq Cloud, anything else → local Ollama. Default: server-configured model (currently gemma4:26b). See model comparison below. |
| timeout_sec | integer | Max seconds for the review. If exceeded, completed tasks are returned and remaining tasks are marked “model unavailable: timeout expired”. Range: 30–1500 (25 min). Default: 20 min (sync) / 25 min (async). |
When exploration_mode: "chaos" is set, the following tasks are always generated:
| Task ID | Modality | What it checks |
|---|---|---|
| chaos_content_trust | text_llm | Implausible claims, contradictory numbers, suspiciously round data |
| chaos_template_integrity | text_llm | Placeholder text, Lorem ipsum, data from wrong page instance |
| chaos_seo_meta | text_llm | Title/meta/H1 uniqueness — template stamps vs. dynamic insertion |
| chaos_visual_sanity | vision_vlm | Overlapping elements, broken images, charts that don’t match data |
| chaos_ux_confusion | vision_vlm | 5-second test: can you tell what this page is about and what to do next? |
Persona-triggered extras: chaos_financial_data (finance personas), chaos_trail_data (outdoor personas), chaos_recipe_data (cooking personas).
After the LLM produces findings, each issue’s element field is cross-referenced against the raw page text. If the claimed element text does not appear anywhere on the page, the issue is demoted to severity: "info" and its description is prefixed with [unverified]. This prevents pure LLM hallucinations (e.g. “copyright says © 2020” when no such text exists) from being reported as high-severity findings.
Full mode (default): 2–15 minutes per page. The 2 vision_vlm tasks require headless Chromium screenshot capture + VLM inference, which is CPU-intensive. Expect ~2–5 min with Google AI models, ~10–15 min with local Ollama on CPU-only servers.
Quick mode ("depth": "quick"): 3–60 seconds per page. Skips all vision_vlm tasks and runs only text_llm tasks. Recommended for batch validation of programmatic pages where visual checks are less critical.
For multi-page chaos reviews, submit jobs concurrently and poll each status_url independently.
Initial response (async):
{
"review_id": "hr_8528b6edc53b7454",
"status": "processing",
"status_url": "/api/v1/agent/human-review/hr_8528b6edc53b7454",
"url": "https://example.com/etf/SPY",
"overall_passed": null,
"duration_ms": 0
}
overall_passed is null until the review completes. Only treat it as a boolean (true/false) when status is "complete".
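The rule above is easy to get wrong in client code. A minimal sketch of a safe interpretation (the helper name `verdict` is ours, not part of any SEODiff SDK):

```python
def verdict(review: dict):
    """Interpret overall_passed safely: it is only a real boolean
    once status is "complete"; otherwise it is not meaningful."""
    if review["status"] == "complete":
        return bool(review["overall_passed"])
    if review["status"] == "failed":
        raise RuntimeError(review.get("human_verdict", "review failed"))
    return None  # still processing; keep polling

assert verdict({"status": "processing", "overall_passed": None}) is None
assert verdict({"status": "complete", "overall_passed": False}) is False
```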
When urls array is provided, the response wraps multiple reviews:
{
"batch": true,
"count": 3,
"reviews": [
{ "review_id": "hr_a1b2c3d4e5f6a7b8", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_a1b2c3d4e5f6a7b8", "url": "https://example.com/page-1" },
{ "review_id": "hr_b2c3d4e5f6a7b8c9", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_b2c3d4e5f6a7b8c9", "url": "https://example.com/page-2" },
{ "review_id": "hr_c3d4e5f6a7b8c9d0", "status": "processing", "status_url": "/api/v1/agent/human-review/hr_c3d4e5f6a7b8c9d0", "url": "https://example.com/page-3" }
]
}
Each review can be polled independently via its status_url.
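A per-review polling loop can be sketched as follows. To keep the sketch offline and testable, the HTTP GET is injected as a `fetch` callable; in a real integration it would issue an authenticated GET against the review's status_url. The function name `poll_review` is ours, not part of any SEODiff SDK.

```python
import time

def poll_review(fetch, status_url, interval_sec=10, timeout_sec=1800):
    """Poll one review until its status leaves "processing" (sketch).

    fetch is any callable that GETs the status_url and returns the
    parsed JSON dict; injected so this sketch makes no network calls."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        review = fetch(status_url)
        if review["status"] != "processing":
            return review
        time.sleep(interval_sec)
    raise TimeoutError(f"review at {status_url} did not complete in time")

# Stub fetch that simulates a review completing on the second poll.
responses = iter([
    {"status": "processing", "overall_passed": None},
    {"status": "complete", "overall_passed": True},
])
result = poll_review(lambda url: next(responses),
                     "/api/v1/agent/human-review/hr_8528b6edc53b7454",
                     interval_sec=0)
print(result["overall_passed"])  # -> True
```

For a batch response, run this once per entry in `reviews[]`, ideally concurrently.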
Poll review status. Once complete, returns the full result:
{
"review_id": "hr_8528b6edc53b7454",
"status": "complete",
"url": "https://example.com/etf/SPY",
"human_verdict": "4 of 6 QA checks failed.",
"overall_passed": false,
"qa_results": [
{
"task_id": "chaos_content_trust",
"modality": "text_llm",
"prompt": "Examine all claims, numbers, and data points...",
"category": "data",
"passed": false,
"issues": [
{
"element": "Holdings Table",
"description": "Portfolio weights sum to 103.2%, exceeding 100%.",
"severity": "critical"
}
],
"notes": "The top-10 holdings alone sum to 37%. Full table sums to 103.2%."
}
],
"page_context": {
"title": "SPY ETF Overview",
"word_count": 1234,
"has_h1": true
},
"duration_ms": 47294,
"model": "gemma4:26b"
}
| Field | Description |
|---|---|
| status | processing, complete, or failed |
| human_verdict | One-line summary (e.g. “4 of 6 QA checks failed.”) |
| overall_passed | true/false when complete; null while status is processing |
| qa_results | Array of results, one per QA task. Each has task_id, modality, prompt, category, passed, issues[], and notes. |
| issues[].severity | critical, warning, or info |
| page_context | Page metadata: title, word_count, has_h1 |
| model | LLM model used (e.g. gemma4:26b) |
| duration_ms | Total wall-clock time in milliseconds |
| tasks_total | (Polling only) Total QA tasks planned |
| tasks_completed | (Polling only) Tasks finished so far |
| progress_pct | (Polling only) Completion percentage (0–100) |
| est_remain_sec | (Polling only) Estimated seconds remaining, based on elapsed time per completed task; 0 when unknown (no tasks completed yet) |
| current_step | (Polling only) Human-readable progress step (e.g. “running vision_vlm (2 tasks)”, “capturing screenshot”) |
| retry_after_sec | Present (default: 30) when one or more QA tasks were skipped due to model unavailability. Agents should wait this many seconds before retrying the review. |
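Agent-side handling of retry_after_sec can be sketched as below; the helper name `retry_delay` is ours. A delay of 0 means the response carried no retry hint and no resubmission is needed.

```python
def retry_delay(result: dict) -> int:
    """How long to wait before resubmitting a review whose QA tasks
    were skipped for model unavailability (sketch). Returns 0 when
    the response carries no retry_after_sec field."""
    return int(result.get("retry_after_sec", 0))

delay = retry_delay({"status": "complete", "retry_after_sec": 30})
# e.g. time.sleep(delay), then POST the same review payload again
```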
Chaos mode is for exploratory, open-ended QA on individual pages — finding issues you didn’t know to look for. It uses LLM reasoning (text + vision) and is best for spot-checking programmatic pages in production.
/agent/evaluate is for deterministic, rule-based validation at scale — running 17 assertion types across 200 pages. Use it in CI/CD pipelines where you know exactly what to check.
Automated baseline profiler. Feed 2–10 “golden” page URLs from the same pSEO template, and the LLM reverse-engineers the template’s structure to generate assertion rules you can feed into /agent/human-review or /agent/evaluate.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/etf/SPY",
"https://example.com/etf/QQQ",
"https://example.com/etf/IWM"
],
"template_name": "etf_page"
}' \
https://seodiff.io/api/v1/agent/profile-template
| Field | Type | Description |
|---|---|---|
| urls | string[] | Required. 2–10 golden page URLs from the same template. |
| template_name | string | Human-friendly label for the template. |
| focus_areas | string[] | Optional focus areas: data_accuracy, layout, seo_meta, etc. |
| wait | boolean | Default false (async). Set true to block until complete. Async returns 202 Accepted with a status_url for polling at /api/v1/agent/profile-template/{job_id}. |
| model | string | Override inference model. Same routing as /agent/human-review (gemini-* / gemma-* → Google AI Studio, and so on); local Ollama model names are also accepted. |
{
"template_name": "etf_page",
"page_count": 3,
"generated_assertions": [
{
"rule": "The H1 must follow the pattern: '[Ticker] ETF Overview'",
"category": "seo",
"modality": "text_llm",
"confidence": 0.95,
"reason": "All 3 pages follow this H1 pattern consistently."
},
{
"rule": "A holdings table with at least 10 rows must be present",
"category": "structural",
"modality": "text_llm",
"confidence": 0.9,
"reason": "Core functional component present in all sampled pages."
}
],
"variance_notes": "Dynamic slots: [Ticker], [Fund Name], [AUM], [Expense Ratio]...",
"duration_ms": 289825
}
Both /agent/human-review and /agent/profile-template support Google AI Studio models via the model parameter. When a model name starts with gemini- or gemma-, the request is automatically routed to Google’s OpenAI-compatible endpoint.
Setup: Set the SEODIFF_GOOGLE_AI_API_KEY environment variable on the server with your Google AI Studio API key. No other configuration is needed.
| Model | RPM | RPD | Vision | Best for |
|---|---|---|---|---|
| gemini-2.5-flash | 5 | 20 | Yes | Powerful reasoning (may 503 under free-tier demand) |
| gemini-3-flash-preview | 5 | 20 | Yes | Latest generation, multimodal |
| gemini-3.1-flash-lite-preview | 15 | 500 | Yes | Best daily limit, fast (<10s) |
| gemma-4-31b-it | 15 | 1,500 | No | High-quality text, high volume |
| gemma-4-26b-a4b-it | 15 | 1,500 | No | High-volume text tasks |
Gemini models support vision_vlm tasks (screenshot analysis). Gemma models are text-only — vision tasks will fall back to the server-configured VLM. For full chaos mode (text + vision), use a Gemini model or omit the model field to use the default.
If Google AI Studio returns HTTP 503 (high demand) or 429 (rate limited), SEODiff automatically retries up to 3 times with increasing backoff (5 s, 10 s, 15 s). The Retry-After header from upstream providers (Groq, Cerebras, etc.) is respected when present. If all attempts fail, affected QA tasks are reported as “Model unavailable” and the response includes a retry_after_sec field (default: 30) so agents can schedule a retry.
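The retry schedule described above can be sketched as a pure function (the name `retry_delays` is ours, not SEODiff's internal API): an upstream Retry-After value takes precedence, otherwise the documented 5 s / 10 s / 15 s schedule applies.

```python
def retry_delays(retry_after=None, attempts=3):
    """Delays between retry attempts (sketch of the documented
    behavior): Retry-After from the upstream provider wins when
    present; otherwise 5 s, 10 s, 15 s."""
    if retry_after is not None:
        return [retry_after] * attempts
    return [5 * n for n in range(1, attempts + 1)]

print(retry_delays())               # -> [5, 10, 15]
print(retry_delays(retry_after=12)) # -> [12, 12, 12]
```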
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/etf/SPY",
"exploration_mode": "chaos",
"model": "gemini-2.5-flash"
}' \
https://seodiff.io/api/v1/agent/human-review
Model names starting with llama3.1-, qwen-, gpt-oss-, or zai-glm- are routed to Cerebras Cloud, which provides extremely fast inference via wafer-scale hardware.
Setup: Set the SEODIFF_CEREBRAS_API_KEY environment variable on the server. No other configuration is needed.
| Model | Context | Vision | Best for |
|---|---|---|---|
| llama3.1-8b | 8K | No | Fastest option (<1s). Good for quick text QA at scale. |
| qwen-3-235b-a22b-instruct-2507 | 65K | No | Best accuracy/speed balance. 235B MoE, deep reasoning in ~3s. |
| gpt-oss-120b | 128K | No | Large context window, general-purpose. |
| zai-glm-4.7 | 32K | No | Premium quality, highest per-token cost. |
Cerebras models do not support vision_vlm tasks. Vision tasks will fall back to the server-configured VLM (local Ollama). For full chaos mode (text + vision), use a Gemini model or the default.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/recipe/carbonara",
"exploration_mode": "chaos",
"model": "qwen-3-235b-a22b-instruct-2507"
}' \
https://seodiff.io/api/v1/agent/human-review
Models using org/model format (e.g. meta-llama/llama-4-scout-17b-16e-instruct) or starting with llama-3. are routed to Groq Cloud, which provides fast LPU-accelerated inference with a generous free tier.
Setup: Set the SEODIFF_GROQ_API_KEY environment variable on the server. No other configuration is needed.
| Model | Context | Vision | Free-tier RPM / TPM | Best for |
|---|---|---|---|---|
| meta-llama/llama-4-scout-17b-16e-instruct | 128K | No | 30 / 30K | Highest free-tier throughput. Great overall quality. |
| llama-3.3-70b-versatile | 128K | No | 30 / 12K | Largest free model. Best accuracy for complex QA. |
| llama-3.1-8b-instant | 128K | No | 30 / 6K | Fastest free option. Good for quick text QA. |
| qwen/qwen3-32b | 32K | No | 60 / 6K | Highest RPM. Strong reasoning. |
| moonshotai/kimi-k2-instruct | 128K | No | 60 / 10K | High RPM + throughput. Long context. |
Groq models do not support vision_vlm tasks. Vision tasks will fall back to the server-configured VLM. For full chaos mode (text + vision), use a Gemini model or the default.
The free tier has per-minute and per-day limits (RPM/RPD/TPM/TPD). If you hit a 429 response, the Retry-After header tells you how long to wait. For high-volume work, consider models with a higher RPM such as qwen/qwen3-32b (60 RPM), or use batch mode with smaller payloads.
curl -X POST -H "Authorization: Bearer $SEODIFF_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product/wireless-headphones",
"exploration_mode": "chaos",
"model": "meta-llama/llama-4-scout-17b-16e-instruct"
}' \
https://seodiff.io/api/v1/agent/human-review
Benchmark results for a standardized QA task (recipe page, chaos-style prompt, single text_llm task). Speed measured end-to-end from API call to parsed response.
| Model | Provider | Speed | Accuracy | Cost | Notes |
|---|---|---|---|---|---|
| llama3.1-8b | Cerebras | ~500ms | Medium | $0.10/M tokens | Fastest. Occasional false positives on valid data. Best for high-volume quick scans. |
| qwen-3-235b | Cerebras | ~3s | High | Preview (free) | Best speed/accuracy balance. Catches nutrition inconsistencies, analytical reasoning. |
| meta-llama/llama-4-scout-17b-16e-instruct | Groq | ~500ms | High | Free tier | Highest free-tier throughput (30K TPM). 17B MoE with 128K context. |
| llama-3.3-70b-versatile | Groq | ~400ms | High | Free tier | Largest free Groq model. Concise, accurate QA. Best quality per free token. |
| llama-3.1-8b-instant | Groq | ~250ms | Medium | Free tier | Fastest free Groq model. Some noise in output. Good for quick scans. |
| qwen/qwen3-32b | Groq | ~1–2s | High | Free tier | Highest RPM (60). Uses thinking tokens. Strong analytical reasoning. |
| gemini-2.5-flash | Google AI | ~10s | Highest | Free tier | Deep analysis with thinking tokens. Best for thorough QA. May 503 under demand. |
| gemma4:26b | Local Ollama | ~2–5 min | High | Free (self-hosted) | Default. No external API calls. Requires server CPU/GPU. Best for privacy-sensitive work. |
Speed priority: llama3.1-8b (sub-second). Accuracy priority: gemini-2.5-flash (deep reasoning). Best balance: qwen-3-235b-a22b-instruct-2507 (fast + smart). Free + fast: meta-llama/llama-4-scout-17b-16e-instruct (Groq). Privacy: gemma4:26b (local, no data leaves your server).
When SEODIFF_GOOGLE_AI_API_KEY is configured and the server’s default VLM is local Ollama, vision tasks (vision_vlm) are automatically routed to Gemini 2.5 Flash via Google AI Studio. This cuts vision latency from ~2–5 minutes (CPU inference) to ~10 seconds (cloud). No configuration or model override needed — it happens transparently. Text-only models (Cerebras, Gemma) still fall back correctly.
When the server’s default text LLM is local Ollama, text tasks (text_llm) are automatically routed to a fast cloud provider. Priority: Groq (llama-3.3-70b-versatile, ~400ms) → Cerebras (qwen-3-235b, ~3s) → Google AI (gemini-2.5-flash, ~10s) → local Ollama (fallback). This cuts text_llm from ~3–10 minutes (CPU) to under 1 second. To force local inference, set SEODIFF_LLM_BASE_URL to a non-localhost URL or use the model parameter to explicitly select a local model like gemma4:26b.
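The fallback chain above amounts to a fixed priority order over whichever providers have API keys configured. A sketch (the function name `pick_text_provider` is ours; the real server-side selection may differ):

```python
def pick_text_provider(configured: set) -> str:
    """Sketch of the hybrid-acceleration priority described above:
    the first configured cloud provider wins; local Ollama is the
    final fallback when no cloud key is set."""
    for provider in ("groq", "cerebras", "google-ai"):
        if provider in configured:
            return provider
    return "ollama"

print(pick_text_provider({"cerebras", "google-ai"}))  # -> cerebras
print(pick_text_provider(set()))                      # -> ollama
```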
These tools analyze how well a page is optimized for AI consumption. Most are free-tier and accept a URL in the POST body.
AI Extractability Score (AES). Analyzes how easily AI systems can extract structured data from a page.
curl -X POST -H "Content-Type: application/json" \
-d '{"url":"https://example.com/page"}' \
https://seodiff.io/api/v1/aes
RAG chunking simulator. Shows how the page would be chunked for retrieval-augmented generation.
Entity schema generator. Extracts and suggests JSON-LD structured data for a page.
LLM training data auditor. Analyzes quality signals for training corpus inclusion.
Robots.txt and crawler health check. Validates bot access and directives.
AI crawler simulation. Shows what an AI bot sees when crawling the page.
AI answer preview. Simulates how an LLM would summarize or reference the page.
Structural entropy analysis. Measures content structure quality for machine consumption.
Hallucination risk checker. Evaluates whether page content may induce LLM hallucinations.
Validate a site's llms.txt file against the emerging standard.
Generate an llms.txt file for a site.
Errors return JSON with an error field:
{
"error": "domain not verified for this account"
}
| Status | Meaning |
|---|---|
| 400 | Invalid input (missing/malformed fields) |
| 401 | Missing or invalid API key |
| 403 | Plan-gated feature or insufficient permissions |
| 404 | Resource not found |
| 409 | Validation failed (scan did not pass) |
| 429 | Rate limit exceeded. Includes a Retry-After header (seconds) and a retry_after_seconds field in the JSON body. |
| 502 | Upstream error (database, crawl engine) |
API keys are scoped to your account; you can only access resources that belong to that account.
Domain verification is required for deep-audit creation and scan exports. Verify domains via DNS TXT or by connecting Google Search Console.
With wait=true, SEODiff returns 200 for pass and 409 for fail (the JSON body always includes pass).

The dashboard and CI/CD automation are clients of the same API. This keeps behavior consistent and allows SEODiff to evolve heuristics without changing your integration surface.
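In a CI pipeline the wait=true response maps naturally onto an exit code. A sketch (the helper name `ci_exit_code` is ours; the status/body pairing follows the behavior documented above):

```python
def ci_exit_code(status_code: int, body: dict) -> int:
    """Map a wait=true validate response onto a CI exit code
    (sketch): 200 with pass → 0, 409 → 1, anything inconsistent
    or unexpected → 2."""
    if status_code == 200 and body.get("pass", False):
        return 0
    if status_code == 409:
        return 1
    return 2  # transport error, 5xx, or inconsistent body

print(ci_exit_code(200, {"pass": True}))   # -> 0
print(ci_exit_code(409, {"pass": False}))  # -> 1
```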