Issue: Duplicate Clusters Medium

Groups of 2 or more pages with near-identical content within the scanned sample.

What this means

SEODiff groups pages with content similarity above the configured threshold (default 92%) into clusters. Each cluster represents a set of pages that are near-duplicates of each other. The JSON field is duplicate_clusters.

Detection condition

Triggered when DuplicateClusterSize ≥ 2 and a cluster ID is present. The maximum number of similarity comparisons is capped at 1,500 pairs by default. Suppressed when fewer than 10 pages are sampled.

Impact on scores

Severity weight: 7. Deductions: −12 on Content, dampened as heuristic. Large clusters amplify the impact because they affect multiple pages.

How to fix

  1. Review each cluster in your SEODiff report to understand what makes the pages similar.
  2. Differentiate content by adding unique data, sections, or context per page.
  3. Consolidate truly duplicate pages behind a single canonical URL.
  4. For pSEO: review your template to ensure sufficient unique data injection per page.