SEODiff groups pages with content similarity above the configured threshold (default 92%) into clusters. Each cluster represents a set of pages that are near-duplicates of each other. The JSON field is duplicate_clusters.
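As an illustration of threshold-based grouping, here is a minimal sketch. It is not SEODiff's implementation: the similarity metric (Jaccard overlap of three-word shingles), the transitive merging via union-find, and the names shingles, jaccard, and cluster_pages are all assumptions made for this example. Only the 0.92 default comes from the description above.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles for a page's text (assumed metric input)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets, in the range 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(pages, threshold=0.92):
    """Group pages whose pairwise similarity exceeds the threshold.

    pages: dict mapping URL -> page text (hypothetical input shape).
    Returns a list of sorted URL lists, keeping only groups of 2 or more.
    """
    parent = {url: url for url in pages}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    sets = {url: shingles(text) for url, text in pages.items()}
    for a, b in combinations(pages, 2):
        if jaccard(sets[a], sets[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for url in pages:
        groups.setdefault(find(url), []).append(url)
    return [sorted(g) for g in groups.values() if len(g) >= 2]
```

Because merging is transitive in this sketch, two pages can share a cluster through a common neighbor even if their own pairwise similarity is below the threshold. Whether SEODiff behaves this way is not stated here.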
Triggered when DuplicateClusterSize ≥ 2 and a cluster ID is present. Similarity comparisons are capped at 1,500 pairs by default. Suppressed when fewer than 10 pages are sampled.
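The trigger, cap, and suppression rules can be sketched as follows. This is a hedged illustration: PageResult, its field names, capped_pairs, and should_flag are hypothetical. How SEODiff chooses which pairs to compare once the cap is reached is not described here, so the sketch simply keeps the first pairs generated.

```python
from dataclasses import dataclass
from itertools import combinations, islice
from typing import Optional

MAX_PAIRS = 1500          # default comparison cap from the description above
MIN_SAMPLED_PAGES = 10    # below this sample size the check is suppressed


@dataclass
class PageResult:
    # Hypothetical per-page record; field names are assumptions.
    url: str
    duplicate_cluster_id: Optional[str]
    duplicate_cluster_size: int


def capped_pairs(urls, max_pairs=MAX_PAIRS):
    """Return at most max_pairs URL pairs to compare (first-come selection is assumed)."""
    return list(islice(combinations(urls, 2), max_pairs))


def should_flag(page, sampled_pages):
    """Apply the suppression rule first, then the trigger condition."""
    if sampled_pages < MIN_SAMPLED_PAGES:
        return False
    return page.duplicate_cluster_size >= 2 and bool(page.duplicate_cluster_id)
```

For example, a crawl of 100 pages has 4,950 possible pairs, so only the first 1,500 would be compared under this sketch.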
Severity weight: 7. Deductions: −12 on Content, dampened as heuristic. Larger clusters amplify the impact because they affect more pages.