Issue: Exact Duplicates (Severity: High)

Multiple pages within the same template have identical content.

What this means

Two or more pages produce exactly the same text once boilerplate is stripped. This is especially common in programmatic SEO setups where a template fails to inject unique data for certain entries. In JSON output, the issue is reported in the exact_duplicates field.

Detection condition

Triggered when SimilarityKind == "exact". SEODiff computes text similarity using SimHash and flags pages that match at 100% as exact duplicates. The similarity threshold for near-duplicates is configurable (default 92%), but exact duplicates always count as an issue, regardless of information gain score.
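
To make the detection condition concrete, here is a minimal SimHash sketch in Python. It is not SEODiff's implementation: the tokenizer, the MD5 token hash, the 64-bit fingerprint size, and the names simhash, similarity, and classify are all assumptions made for illustration. Only the "exact" label and the 92% default come from the text above; "near" and "distinct" are hypothetical labels.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a SimHash fingerprint built from the word tokens of `text`."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # MD5 is used only as a stable token hash (an assumption, not SEODiff's choice).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def similarity(a: str, b: str, bits: int = 64) -> float:
    """Share of fingerprint bits that agree, from 0.0 to 1.0."""
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - distance / bits


def classify(a: str, b: str, near_threshold: float = 0.92) -> str:
    """Label a page pair; 0.92 mirrors the documented near-duplicate default."""
    score = similarity(a, b)
    if score == 1.0:
        return "exact"
    if score >= near_threshold:
        return "near"  # hypothetical label
    return "distinct"  # hypothetical label
```

Note that this sketch treats text as a bag of words, so two pages with the same words in a different order also produce identical fingerprints. A production system may tokenize differently, for example with word shingles.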

Impact on scores

Severity weight: 12. Deductions: −15 on Indexability, −18 on Content score, plus an additional −10 per duplicate cluster. Search engines may choose to index only one version, effectively wasting all other duplicate pages.
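
As a worked example of the arithmetic, the sketch below tallies the documented deductions for a given number of duplicate clusters. The function name and the returned keys are hypothetical. The text does not say which score the per-cluster deduction is applied to, so the sketch reports it separately instead of guessing.

```python
# Values taken from the section above.
INDEXABILITY_DEDUCTION = -15
CONTENT_DEDUCTION = -18
PER_CLUSTER_DEDUCTION = -10


def exact_duplicate_deductions(cluster_count: int) -> dict[str, int]:
    """Return the deductions for `cluster_count` exact-duplicate clusters.

    Assumption: the two base deductions apply once when at least one
    cluster exists, and the per-cluster deduction scales linearly.
    """
    if cluster_count <= 0:
        return {"indexability": 0, "content": 0, "per_cluster": 0}
    return {
        "indexability": INDEXABILITY_DEDUCTION,
        "content": CONTENT_DEDUCTION,
        "per_cluster": PER_CLUSTER_DEDUCTION * cluster_count,
    }
```

Under these assumptions, a site with three duplicate clusters would see −15 on Indexability, −18 on Content score, and a further −30 from the per-cluster deduction.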

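Common causes

The fixes below address three causes: templates rendered with empty or missing data, URL variants that serve the same page, and parameterized URLs that return the same content as the clean URL.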

How to fix

  1. Identify the duplicate clusters in your SEODiff report.
  2. If caused by empty data: ensure every page has unique, substantive content or return 404 for entries with no data.
  3. If caused by URL variants: add proper canonical tags pointing to the preferred URL.
  4. If caused by parameters: use rel="canonical" to point parameter URLs back to the clean URL.
  5. For programmatic SEO (pSEO): add guard logic that checks, before publishing, whether a page's template data is empty or identical to another entry's.
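
The canonical-tag and guard-logic steps above could be sketched as follows. This is an illustration under stated assumptions, not part of SEODiff: the function names, the 200-character minimum, and the choice to drop every query parameter are hypothetical, and a real site should keep any parameters that genuinely change page content.

```python
import hashlib
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment to get the clean URL.

    Assumption: no query parameter changes the page content.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page URL."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


def publish_blockers(pages: dict[str, str], min_chars: int = 200) -> dict[str, str]:
    """Map each URL that should not be published to the reason why.

    `pages` maps URL -> rendered body text (boilerplate already removed).
    `min_chars` is an arbitrary illustrative cutoff for "no data".
    """
    blockers: dict[str, str] = {}
    first_seen: dict[str, str] = {}  # content hash -> first URL with that text
    for url, body in pages.items():
        text = " ".join(body.split())  # normalize whitespace before comparing
        if len(text) < min_chars:
            blockers[url] = "empty or thin content"
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            blockers[url] = f"duplicate of {first_seen[digest]}"
        else:
            first_seen[digest] = url
    return blockers
```

A build step could call publish_blockers on the rendered pages and either skip the flagged URLs or serve a 404 for them, while canonical_tag covers the URL-variant and parameter cases.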