`POST /api/v1/agent/evaluate` endpoint. Each rule defines a deterministic pass/fail gate that runs against every sampled URL.

Check visible page text and raw HTML source for expected or forbidden strings, regex patterns, and placeholder leaks.
| Rule | Value | Description |
|---|---|---|
| `contains_string` | string | Visible text must contain this string. Case-sensitive. |
| `not_contains_string` | string | Visible text must NOT contain this string. Use it to catch leaked debug text, competitor names, or placeholder copy. |
| `html_contains_string` | string | Raw HTML source (including `<script>`, `<meta>`, and JSON-LD) must contain this string. |
| `html_not_contains_string` | string | Raw HTML source must NOT contain this string. |
| `regex_match` | regex | Visible text must match this Go-flavour regex (RE2 syntax). |
| `regex_not_match` | regex | Visible text must NOT match this regex. |
| `no_placeholders` | — | Detects pSEO placeholder leaks: `{{var}}`, `undefined`, `NaN`, `ERROR`, and common AI hallucination phrases. Strips `<svg>` and `aria-hidden="true"` elements to prevent false positives from chart labels. The AI pattern “as an AI” requires a full phrase boundary (it won’t match “as an AI infrastructure leader”). Supports an optional `ignore` array to whitelist terms. Alias: `no_placeholder` (backward-compatible). |
```json
{
  "assertions": [
    { "rule": "contains_string", "value": "Free shipping" },
    { "rule": "not_contains_string", "value": "Lorem ipsum" },
    { "rule": "no_placeholders", "ignore": ["{{example}}"] }
  ]
}
```
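To pre-check copy locally before calling the endpoint, the visible-text rules can be approximated as below. This is a minimal sketch, assuming you already have the page's visible text as a string; `check_text_rule`, `find_placeholders`, and the placeholder patterns are hypothetical stand-ins, not the service's implementation, and the AI-phrase detection is not reproduced. The `html_*` variants would apply the same substring test to raw HTML instead. Python's `re` is not RE2, so only patterns valid in both dialects (no lookarounds or backreferences, which RE2 rejects) should behave similarly.

```python
import re
from typing import List, Optional

# Hypothetical approximation of the documented placeholder checks; the real
# pattern list (including AI-phrase detection) is not reproduced here.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\{\{\s*[\w.]+\s*\}\}"),  # template variables such as {{city}}
    re.compile(r"\bundefined\b"),
    re.compile(r"\bNaN\b"),
    re.compile(r"\bERROR\b"),
]


def find_placeholders(text: str, ignore: Optional[List[str]] = None) -> List[str]:
    """Return placeholder-like substrings in text, skipping whitelisted terms."""
    ignored = set(ignore or [])
    hits = []
    for pattern in PLACEHOLDER_PATTERNS:
        for match in pattern.finditer(text):
            if match.group(0) not in ignored:
                hits.append(match.group(0))
    return hits


def check_text_rule(visible_text: str, assertion: dict) -> bool:
    """Evaluate one visible-text assertion; True means the gate passes."""
    rule = assertion["rule"]
    value = assertion.get("value")
    if rule == "contains_string":
        return value in visible_text  # case-sensitive substring test
    if rule == "not_contains_string":
        return value not in visible_text
    if rule == "regex_match":
        # Assumes "match" means "found anywhere in the text" (search semantics).
        return re.search(value, visible_text) is not None
    if rule == "regex_not_match":
        return re.search(value, visible_text) is None
    if rule in ("no_placeholders", "no_placeholder"):  # alias accepted
        return not find_placeholders(visible_text, assertion.get("ignore"))
    raise ValueError(f"rule not covered by this sketch: {rule}")


page_text = "Free shipping on all orders. Price: {{price}}"
checks = [
    {"rule": "contains_string", "value": "Free shipping"},
    {"rule": "not_contains_string", "value": "Lorem ipsum"},
    {"rule": "no_placeholders", "ignore": ["{{example}}"]},
]
print([check_text_rule(page_text, c) for c in checks])  # [True, True, False]
```

The last check fails because `{{price}}` leaked and is not on the `ignore` list.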
Enforce minimum and maximum word counts on visible page text. Useful for catching thin pages or content explosions in pSEO templates.
| Rule | Value | Description |
|---|---|---|
| `min_word_count` | integer | Page must have ≥ N words of visible text. |
| `max_word_count` | integer | Page must have ≤ N words. Guards against runaway template expansions. |
```json
{ "rule": "min_word_count", "value": 200 }
```
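A rough local equivalent of the word-count gates is sketched below. It assumes a "word" is a whitespace-separated token of visible text; the service's actual tokenizer is not documented here, so counts near the limit may differ. The function names are hypothetical.

```python
def visible_word_count(visible_text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(visible_text.split())


def check_word_count(visible_text: str, assertion: dict) -> bool:
    """Evaluate min_word_count / max_word_count; True means the gate passes."""
    # The limit may arrive as "value" or "threshold" (both are accepted).
    limit = assertion["value"] if "value" in assertion else assertion["threshold"]
    count = visible_word_count(visible_text)
    if assertion["rule"] == "min_word_count":
        return count >= limit
    if assertion["rule"] == "max_word_count":
        return count <= limit
    raise ValueError(f"rule not covered by this sketch: {assertion['rule']}")


thin_page = "Widget Pro. Buy now."
print(visible_word_count(thin_page))  # 4
print(check_word_count(thin_page, {"rule": "min_word_count", "value": 200}))  # False
```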
| Rule | Value | Description |
|---|---|---|
| `status_code` | integer | HTTP response status must equal this value. Common values: 200, 301, 410. |
```json
{ "rule": "status_code", "value": 200 }
```
Verify that JSON-LD schema blocks exist, contain the correct @type, or meet a minimum count.
| Rule | Value | Description |
|---|---|---|
| `has_schema` | — | At least one `<script type="application/ld+json">` block must exist. |
| `has_schema_type` | string | JSON-LD must contain a specific `@type` (e.g. `Product`, `BreadcrumbList`). Case-insensitive. |
| `min_schema_count` | integer | Minimum number of JSON-LD blocks on the page. |
```json
{
  "assertions": [
    { "rule": "has_schema" },
    { "rule": "has_schema_type", "value": "Product" },
    { "rule": "min_schema_count", "value": 2 }
  ]
}
```
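The schema rules can be approximated with Python's standard `html.parser` and `json` modules, as in the sketch below. Two assumptions to flag: the sketch counts one block per `<script type="application/ld+json">` element, and it collects `@type` values from nested objects and `@graph` arrays as well as the top level; whether the service inspects nested types is not documented here. All names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._inside = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def extract_jsonld_blocks(html: str) -> List[str]:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def schema_types(blocks: List[str]) -> List[str]:
    """Gather every @type value, walking nested objects and @graph arrays."""
    types: List[str] = []
    for raw in blocks:
        try:
            stack: List[Any] = [json.loads(raw)]
        except json.JSONDecodeError:
            continue  # this sketch silently skips malformed blocks
        while stack:
            node = stack.pop()
            if isinstance(node, dict):
                declared = node.get("@type")
                if isinstance(declared, str):
                    types.append(declared)
                elif isinstance(declared, list):
                    types.extend(t for t in declared if isinstance(t, str))
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)
    return types


def check_schema_rule(html: str, assertion: dict) -> bool:
    blocks = extract_jsonld_blocks(html)
    rule = assertion["rule"]
    if rule == "has_schema":
        return len(blocks) >= 1
    if rule == "min_schema_count":
        return len(blocks) >= assertion["value"]
    if rule == "has_schema_type":
        wanted = assertion["value"].lower()  # documented as case-insensitive
        return any(t.lower() == wanted for t in schema_types(blocks))
    raise ValueError(f"rule not covered by this sketch: {rule}")
```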
Check for the presence or count of specific DOM elements using CSS selectors.
| Rule | Value / Field | Description |
|---|---|---|
| `has_h1` | — | Page must have at least one `<h1>` element. |
| `has_meta_description` | — | Page must have a `<meta name="description">` tag. |
| `selector_exists` | `selector` | CSS selector must match ≥ 1 element. Use the `selector` field, for example `"selector": "nav.breadcrumb"`. |
| `selector_count` | `selector` + integer | CSS selector must match ≥ N elements. `value` is the minimum count and `selector` is the CSS selector. |
```json
{
  "assertions": [
    { "rule": "has_h1" },
    { "rule": "has_meta_description" },
    { "rule": "selector_exists", "selector": "nav.breadcrumb" },
    { "rule": "selector_count", "selector": ".product-card", "value": 5 }
  ]
}
```
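The tag-presence rules need only a tag counter, sketched below with the standard-library `HTMLParser`. It also covers `no_duplicate_h1` from the quality rules, since it uses the same `<h1>` count. The selector rules are left out because they need a CSS selector engine (for example BeautifulSoup's `select()` method), which is outside the standard library. Class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count <h1> elements and <meta name="description"> tags in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.h1_count = 0
        self.meta_description_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (dict(attrs).get("name") or "").lower() == "description":
            self.meta_description_count += 1
    # Self-closing tags such as <meta ... /> reach handle_starttag through the
    # default handle_startendtag implementation.


def check_dom_rule(html: str, assertion: dict) -> bool:
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    rule = assertion["rule"]
    if rule == "has_h1":
        return counter.h1_count >= 1
    if rule == "no_duplicate_h1":
        return counter.h1_count <= 1
    if rule == "has_meta_description":
        return counter.meta_description_count >= 1
    raise ValueError(f"rule not covered by this sketch: {rule}")


sample = '<head><meta name="description" content="Widgets"/></head><h1>A</h1><h1>B</h1>'
print(check_dom_rule(sample, {"rule": "has_h1"}))           # True
print(check_dom_rule(sample, {"rule": "no_duplicate_h1"}))  # False
```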
Enforce quality scores and rendering requirements.
| Rule | Value | Description |
|---|---|---|
| `min_acri` | integer (0–100) | Minimum AI Crawler Reality Index (ACRI) score. Pages below this score are poorly structured for AI crawlers. |
| `max_token_bloat` | float | Maximum HTML-to-text ratio. High values (>15) indicate bloated markup that wastes LLM context windows. |
| `no_noindex` | — | Page must NOT have a `noindex` robots directive (meta tag or `X-Robots-Tag` header). |
| `max_js_ghost_ratio` | float (0–1) | Maximum JS ghost ratio. Detects pages that require JavaScript to render content. A value of 0.1 effectively requires server-side rendering (SSR). |
| `has_canonical` | — | Page must have a `<link rel="canonical">` tag. Important for deduplicating pSEO pages and preventing index bloat. |
| `no_duplicate_h1` | — | Page must have at most one `<h1>` element; a page with none also passes, so combine it with `has_h1` to require exactly one. Multiple H1s confuse search engine heading hierarchy signals. |
| `response_ok` | — | Convenience alias for `status_code` with value `200`. No value needed. |
```json
{
  "assertions": [
    { "rule": "min_acri", "value": 60 },
    { "rule": "max_token_bloat", "threshold": 12 },
    { "rule": "no_noindex" },
    { "rule": "max_js_ghost_ratio", "value": 0.1 },
    { "rule": "response_ok" }
  ]
}
```
Note: `value` and `threshold` are accepted interchangeably in assertion objects; the example above uses `threshold` for `max_token_bloat`.
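Two of these gates can be approximated locally, as sketched below. The token-bloat formula is an assumption (raw HTML character length divided by visible-text character length); the documentation only calls it an HTML-to-text ratio. The `noindex` check looks at `<meta name="robots">` content and any `X-Robots-Tag` header. `limit_of` shows one way to honour the `value`/`threshold` interchangeability. All names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Mapping


def limit_of(assertion: dict):
    """Read the limit from "value" or "threshold" (documented as interchangeable)."""
    return assertion["value"] if "value" in assertion else assertion["threshold"]


def token_bloat(html: str, visible_text: str) -> float:
    """Assumed definition: raw HTML length divided by visible-text length."""
    return len(html) / max(len(visible_text), 1)  # guard against empty text


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append(attributes.get("content") or "")


def has_noindex(html: str, headers: Mapping[str, str]) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # HTTP header names are case-insensitive, so compare them lower-cased.
    header_values = [v for k, v in headers.items() if k.lower() == "x-robots-tag"]
    return any("noindex" in d.lower() for d in parser.directives + header_values)


def check_quality_rule(html: str, visible_text: str,
                       headers: Mapping[str, str], assertion: dict) -> bool:
    rule = assertion["rule"]
    if rule == "max_token_bloat":
        return token_bloat(html, visible_text) <= limit_of(assertion)
    if rule == "no_noindex":
        return not has_noindex(html, headers)
    raise ValueError(f"rule not covered by this sketch: {rule}")


print(check_quality_rule("x" * 120, "y" * 10, {},
                         {"rule": "max_token_bloat", "threshold": 12}))  # True
print(check_quality_rule("<p>ok</p>", "ok", {"X-Robots-Tag": "noindex"},
                         {"rule": "no_noindex"}))  # False
```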
Every assertion object supports these optional fields:
| Field | Type | Description |
|---|---|---|
| `severity` | string | `"critical"` (default) or `"warning"`. Critical failures fail the overall evaluation; warnings are reported but don't fail the run. |
| `selector` | string | CSS selector string (used by `selector_exists` and `selector_count`). |
| `ignore` | string[] | Whitelist of terms to skip (used by `no_placeholders`). |
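The `severity` semantics reduce to one rule: the run fails only if some assertion with `"critical"` severity fails. The sketch below shows that aggregation on hypothetical result objects; `AssertionResult` and the summary keys are illustrative and are not the endpoint's response schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AssertionResult:
    rule: str
    passed: bool
    severity: str = "critical"  # "critical" is the documented default


def overall_pass(results: List[AssertionResult]) -> bool:
    """Fail the run only when a critical assertion fails; warnings never do."""
    return all(r.passed or r.severity == "warning" for r in results)


def summarize(results: List[AssertionResult]) -> Dict[str, object]:
    failed = [r for r in results if not r.passed]
    return {
        "passed": overall_pass(results),
        "critical_failures": [r.rule for r in failed if r.severity != "warning"],
        "warnings": [r.rule for r in failed if r.severity == "warning"],
    }


run = [
    AssertionResult("has_h1", passed=True),
    AssertionResult("min_word_count", passed=False, severity="warning"),
]
print(summarize(run))
# {'passed': True, 'critical_failures': [], 'warnings': ['min_word_count']}
```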
Assertions can be applied globally (to all sampled URLs) or per-URL for fine-grained control:
```json
{
  "urls": ["https://example.com/product/1", "https://example.com/product/2"],
  "assertions": [
    { "rule": "has_h1" },
    { "rule": "min_word_count", "value": 100 }
  ],
  "url_assertions": {
    "https://example.com/product/1": [
      { "rule": "contains_string", "value": "Widget Pro" }
    ]
  }
}
```
When `url_assertions` contains an entry for a URL, that list replaces the global `assertions` for that URL; the two lists are not merged.
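The replace-not-merge behaviour can be expressed as a single lookup. In the minimal sketch below (`assertions_for` is a hypothetical helper), the payload mirrors the request example on this page: product/1 gets only its own `contains_string` check and loses `has_h1` and `min_word_count`.

```python
from typing import Dict, List


def assertions_for(url: str, payload: dict) -> List[dict]:
    """Per-URL assertions replace the global list entirely; nothing is merged."""
    per_url: Dict[str, List[dict]] = payload.get("url_assertions", {})
    if url in per_url:
        return per_url[url]
    return payload.get("assertions", [])


payload = {
    "urls": ["https://example.com/product/1", "https://example.com/product/2"],
    "assertions": [
        {"rule": "has_h1"},
        {"rule": "min_word_count", "value": 100},
    ],
    "url_assertions": {
        "https://example.com/product/1": [
            {"rule": "contains_string", "value": "Widget Pro"},
        ]
    },
}

for url in payload["urls"]:
    print(url, [a["rule"] for a in assertions_for(url, payload)])
# https://example.com/product/1 ['contains_string']
# https://example.com/product/2 ['has_h1', 'min_word_count']
```

To keep the global checks on an overridden URL, repeat them inside that URL's `url_assertions` list.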
Use Agentic QA (chaos mode) to let an AI adversarially probe your pages for issues that predefined rules can't catch.