Concepts

A mental model for how SEODiff scans sites, groups pages into templates, and turns changes into deterministic signals.


Scan

A scan is a bounded crawl of a public base URL. It produces a report (HTML) and a structured artifact (JSON) that are designed to diff cleanly over time.
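For an artifact to diff cleanly, the same content has to serialize to the same text every time. Below is a minimal Python sketch of one common way to get that. The artifact shape (a "pages" list whose items carry a "url" field) and the name serialize_artifact are hypothetical, not SEODiff's actual schema or code.

```python
import json
from typing import Any


def serialize_artifact(artifact: dict[str, Any]) -> str:
    """Render a scan artifact as canonical JSON text.

    Hypothetical artifact shape: a "pages" list whose items each carry a
    "url" field. This is illustrative only, not SEODiff's schema.
    """
    normalized = dict(artifact)
    # Order pages by URL so crawl order does not leak into the output.
    normalized["pages"] = sorted(artifact.get("pages", []), key=lambda page: page["url"])
    # sort_keys fixes key order; indent=2 keeps one value per line for line-based diffs.
    return json.dumps(normalized, sort_keys=True, indent=2) + "\n"
```

Two runs that find the same pages in a different order serialize to identical text, so a plain line diff only shows real changes.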

Sampling

SEODiff is intentionally sample-based. Instead of crawling every URL, it samples a representative set of pages. This keeps runs fast and makes diffs more stable.
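To show why a stable sample keeps diffs stable, here is a minimal sketch of deterministic hash-based sampling. This is an assumed technique chosen for illustration, not SEODiff's documented sampler; stable_sample and its seed parameter are hypothetical.

```python
import hashlib


def stable_sample(urls: list[str], n: int, seed: str = "seodiff") -> list[str]:
    """Pick up to n URLs deterministically (hypothetical helper).

    Each URL is ranked by the SHA-256 hash of seed + URL and the n lowest
    ranks are kept, so input order never affects the result.
    """
    def rank(url: str) -> str:
        return hashlib.sha256((seed + url).encode("utf-8")).hexdigest()

    unique = sorted(set(urls))  # drop duplicate URLs
    # Keep the n lowest-ranked URLs, then sort them for readable output.
    return sorted(sorted(unique, key=rank)[:n])
```

Because the sample is always the n lowest hash ranks, rerunning on the same URL list returns the same pages, and one newly published URL can displace at most one previously sampled page.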

Why sampling works for regressions

Regression detection cares about changes, not complete coverage.
Templates tend to affect many pages, so a small sample can reveal systemic issues.
Stable sampling produces stable pass/fail decisions in automation.

When to increase the sample

For large programmatic sites with many templates.
When you need higher confidence for a release.
When monitoring detects drift and you want more context.

Templates (patterns)

SEODiff groups sampled pages into templates (sometimes called “patterns”). This is what makes it useful for programmatic sites: a single template change can affect hundreds or thousands of URLs.
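Grouping can be approximated by normalizing URL paths into patterns. The sketch below uses a naive path heuristic (numeric segments become :id, segments of three or more hyphen-joined words become :slug). How SEODiff actually detects templates is not described here, so treat url_pattern and group_by_template as hypothetical stand-ins.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

_NUMERIC = re.compile(r"^\d+$")
# Three or more hyphen-joined words, e.g. "how-to-fix-schema".
_SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+){2,}$")


def url_pattern(url: str) -> str:
    """Collapse variable path segments into placeholders (naive heuristic)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    out = []
    for seg in segments:
        if _NUMERIC.match(seg):
            out.append(":id")
        elif _SLUG.match(seg.lower()):
            out.append(":slug")
        else:
            out.append(seg)
    return "/" + "/".join(out)


def group_by_template(urls: list[str]) -> dict[str, list[str]]:
    """Bucket URLs by their inferred pattern."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[url_pattern(url)].append(url)
    return dict(groups)
```

A real implementation would likely also look at page structure, since two templates can share a URL shape.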

Issue keys

Every check emits machine-readable issue keys (for example, schema_missing_required). Automation uses issue keys to decide pass/fail. Humans use the report to see examples, affected templates, and suggested fixes.
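Because issue keys are machine-readable, automation can aggregate them directly. A minimal sketch, assuming a hypothetical Finding record (url, template, issue_key) rather than SEODiff's real artifact schema; title_missing is an invented key used only for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape for illustration; not SEODiff's artifact schema.
    url: str
    template: str
    issue_key: str


def pages_per_issue(findings: list[Finding]) -> Counter:
    """Count distinct pages affected by each issue key."""
    # The set drops repeated (issue_key, url) pairs, e.g. one page failing a check twice.
    distinct = {(f.issue_key, f.url) for f in findings}
    return Counter(key for key, _url in distinct)


def templates_per_issue(findings: list[Finding]) -> dict[str, list[str]]:
    """List the templates in which each issue key appears."""
    result: dict[str, set[str]] = {}
    for f in findings:
        result.setdefault(f.issue_key, set()).add(f.template)
    return {key: sorted(templates) for key, templates in result.items()}
```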

Scores

Scores are designed to be directional and stable. In automation, prefer explicit fail rules (fail_on, max_issue_rate) over gating on score alone.
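Here is a sketch of an explicit fail rule that never consults the score. The semantics are assumptions for illustration: fail_on is treated as a set of issue keys that fail the run if they affect any sampled page, and max_issue_rate as the largest share of sampled pages any single issue key may affect. evaluate_gate is a hypothetical name.

```python
def evaluate_gate(
    pages_with_issue: dict[str, int],
    sampled_pages: int,
    fail_on: set[str],
    max_issue_rate: float,
) -> list[str]:
    """Return the reasons a run fails; an empty list means it passes."""
    if sampled_pages <= 0:
        raise ValueError("sampled_pages must be positive")
    reasons = []
    # Rule 1 (assumed semantics): any listed key on any page fails the run.
    for key in sorted(fail_on):
        if pages_with_issue.get(key, 0) > 0:
            reasons.append(f"{key}: listed in fail_on")
    # Rule 2 (assumed semantics): no key may affect more than max_issue_rate of the sample.
    for key, count in sorted(pages_with_issue.items()):
        rate = count / sampled_pages
        if rate > max_issue_rate:
            reasons.append(f"{key}: rate {rate:.2%} exceeds {max_issue_rate:.2%}")
    return reasons
```

Returning reasons instead of a bare boolean lets a CI job print exactly which rule tripped.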

Diffs and regressions

SEODiff is strongest when it can compare a scan to a previous scan or baseline. This comparison powers incidents in monitoring and the planned regression-only gates in CI/CD.
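A regression-only comparison can be sketched as a diff of affected-page counts per issue key between a baseline and the current scan. The function name and the count-based inputs are hypothetical. Comparing raw counts also assumes both scans sampled a similar number of pages, which is where stable sampling helps; rates would be safer if sample sizes differ.

```python
def regressions(baseline: dict[str, int], current: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return issue keys whose affected-page count grew since the baseline.

    Result maps issue_key -> (baseline_count, current_count). Keys missing
    from the baseline count as 0, so newly introduced issues are included.
    Issues that shrank or were fixed are not reported.
    """
    return {
        key: (baseline.get(key, 0), count)
        for key, count in sorted(current.items())
        if count > baseline.get(key, 0)
    }
```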