Concepts

A mental model for how SEODiff scans sites, groups pages into templates, and turns changes into deterministic signals.

Scan

A scan is a bounded crawl of a public base URL. It produces a report (HTML) and a structured artifact (JSON) that are designed to diff cleanly over time.

Sampling

SEODiff is intentionally sample-based. Instead of crawling every URL, it samples a representative set of pages. This keeps runs fast and makes diffs more stable.

Why sampling works for regressions

  • Regression detection cares about changes, not complete coverage.
  • Templates tend to affect many pages, so a small sample can reveal systemic issues.
  • Stable sampling produces stable pass/fail decisions in automation.
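One way to picture stable sampling is a hash-ranked selection, sketched below. This is an illustrative assumption, not SEODiff's documented algorithm: the function name `stable_sample`, the `seed` parameter, and the example URLs are all hypothetical.

```python
import hashlib

def stable_sample(urls, sample_size, seed="v1"):
    """Pick the same subset of URLs on every run by ranking them on a hash.

    Hypothetical helper: ranks each URL by SHA-256 of "seed:url" and keeps
    the lowest-ranked entries, so input order does not affect the result.
    """
    def rank(url):
        return hashlib.sha256(f"{seed}:{url}".encode("utf-8")).hexdigest()

    # De-duplicate first, then sort by the hash rank (not by URL text).
    return sorted(set(urls), key=rank)[:sample_size]

# Example discovered URLs (made up for illustration).
urls = [f"https://example.com/products/{i}" for i in range(1000)]

first = stable_sample(urls, 20)
second = stable_sample(list(reversed(urls)), 20)   # different discovery order
third = stable_sample(urls + ["https://example.com/new"], 20)  # one new URL
```

Under this scheme the sample does not depend on crawl order, and adding one URL to the site changes at most one sampled page, which is the kind of property that keeps pass/fail decisions from flapping between runs.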

When to increase the sample

  • When the site is large and programmatic (many templates).
  • When you need higher confidence for a release.
  • When monitoring detects drift and you want more context.

Templates (patterns)

SEODiff groups sampled pages into templates (sometimes called “patterns”). This is what makes it useful for programmatic sites: a single template change can affect hundreds or thousands of URLs.
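As a rough illustration of template grouping, the sketch below buckets URLs by the shape of their path. This is only an assumed heuristic for the sake of the example; how SEODiff actually detects templates is not described here, and `template_key`, `group_by_template`, and the `{id}`/`{slug}` placeholders are hypothetical names.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

def template_key(url):
    """Reduce a URL to a path pattern (hypothetical URL-shape heuristic)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    normalized = []
    for seg in segments:
        if re.fullmatch(r"\d+", seg):
            normalized.append("{id}")      # purely numeric segment
        elif re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)+", seg):
            normalized.append("{slug}")    # hyphenated lowercase slug
        else:
            normalized.append(seg)         # keep literal segments as-is
    return "/" + "/".join(normalized)

def group_by_template(urls):
    """Map each path pattern to the list of URLs that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[template_key(url)].append(url)
    return dict(groups)

pages = [
    "https://example.com/products/101",
    "https://example.com/products/202",
    "https://example.com/blog/how-to-fix-canonicals",
    "https://example.com/blog/schema-basics",
    "https://example.com/about",
]
groups = group_by_template(pages)
```

Here the five example pages collapse into three groups, so an issue found on one product page can be reported once against the product template rather than once per URL.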

Issue keys

Every check emits machine-readable issue keys (for example schema_missing_required). Automation uses issue keys to decide pass/fail. Humans use the report to see examples, affected templates, and suggested fixes.

Scores

Scores are designed to be directional and stable: they show whether a site is trending better or worse, not an exact measurement. In automation, prefer explicit fail rules (fail_on, max_issue_rate) over gating on score alone.
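To make the idea of explicit fail rules concrete, here is a minimal sketch of a gate. The names fail_on and max_issue_rate come from the text above, but their exact semantics are assumed for this example: fail_on is treated as a list of issue keys that fail the run on any occurrence, and max_issue_rate as the maximum allowed fraction of sampled pages with at least one issue. The `evaluate_gate` function and the input shape are hypothetical.

```python
def evaluate_gate(page_issues, fail_on, max_issue_rate):
    """Return (passed, reasons) for a scan.

    page_issues: dict mapping a page URL to the set of issue keys found on it
    (an assumed shape, not SEODiff's actual JSON artifact).
    """
    reasons = []

    # Rule 1: any occurrence of a listed issue key fails the gate.
    seen = set().union(*page_issues.values())
    for key in sorted(seen & set(fail_on)):
        reasons.append(f"fail_on matched: {key}")

    # Rule 2: the share of pages with at least one issue must stay in bounds.
    total = len(page_issues)
    affected = sum(1 for keys in page_issues.values() if keys)
    rate = affected / total if total else 0.0
    if rate > max_issue_rate:
        reasons.append(f"issue rate {rate:.2f} exceeds {max_issue_rate:.2f}")

    return (not reasons), reasons

page_issues = {
    "/products/101": {"schema_missing_required"},
    "/products/202": set(),
    "/blog/a": set(),
    "/about": set(),
}
passed, reasons = evaluate_gate(
    page_issues, fail_on=["schema_missing_required"], max_issue_rate=0.5
)
```

In this example the run fails because a listed issue key appears, even though only a quarter of the sampled pages are affected. Rules like these give a reason for every failure, which a single score threshold cannot.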

Diffs and regressions

SEODiff is strongest when it can compare a scan to a previous scan or baseline. This powers incidents in monitoring and (planned) regression-only gates in CI/CD.
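Comparing a scan to a baseline can be sketched as a diff over per-template issue counts. The data shape below (template name, then issue key, then affected page count) is an assumption made for illustration and is not SEODiff's actual JSON schema; `find_regressions`, the template names, and the title_too_long key are hypothetical.

```python
def find_regressions(baseline, current):
    """List (template, issue_key, before, after) where the count increased.

    Both arguments map template name -> {issue key -> affected page count}.
    Missing templates or keys in the baseline count as zero.
    """
    regressions = []
    for template, issues in sorted(current.items()):
        before = baseline.get(template, {})
        for key, count in sorted(issues.items()):
            previous = before.get(key, 0)
            if count > previous:
                regressions.append((template, key, previous, count))
    return regressions

baseline = {
    "/products/{id}": {"schema_missing_required": 0, "title_too_long": 2},
}
current = {
    "/products/{id}": {"schema_missing_required": 5, "title_too_long": 2},
    "/blog/{slug}": {"title_too_long": 1},
}
regressions = find_regressions(baseline, current)
```

Only increases are reported: the two pre-existing title_too_long pages on the product template are unchanged and therefore ignored, which is the behavior a regression-only gate would rely on.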
