Medium — Crawl Access
Blocked by robots.txt
Your robots.txt file is preventing Google (and other crawlers) from accessing pages you want indexed. This is one of the simplest GSC errors to fix — but one of the most costly when misconfigured, because blocked pages can never rank.
Common robots.txt Mistakes
- Blocking entire directories. `Disallow: /blog/` when you only meant to block one page.
- Wildcard overkill. `Disallow: /*?` blocks all URLs with query parameters, including legitimate paginated content.
- Staging rules in production. `Disallow: /` left over from a dev/staging environment. This blocks the entire site.
- Blocking CSS/JS resources. Google needs access to CSS and JS files to render pages. Blocking `/wp-content/` or `/static/` breaks rendering.
- User-agent mismatches. Rules under `User-agent: Googlebot` and `User-agent: *` behave differently: a crawler obeys only the most specific group that matches it, and typos in user-agent names fail silently.
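The first mistake is easy to demonstrate with Python's standard-library `urllib.robotparser` (a minimal sketch; the site and URLs are hypothetical). Note that `robotparser` implements classic prefix matching, so it cannot validate Google's `*`/`$` wildcard extensions like `Disallow: /*?`:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt with an over-broad directory rule.
rules = """\
User-agent: *
Disallow: /blog/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The rule blocks every URL under /blog/, not just one page.
print(rp.can_fetch("*", "https://example.com/blog/draft-post"))    # False
print(rp.can_fetch("*", "https://example.com/blog/popular-post"))  # False
print(rp.can_fetch("*", "https://example.com/about"))              # True
```

If the intent was to block a single draft, the rule should name that path (`Disallow: /blog/draft-post`) rather than the whole directory.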
How to Fix It
- Audit your robots.txt. Use the robots.txt report in GSC (which replaced the standalone robots.txt Tester) to verify which URLs are blocked.
- Use SEODiff's Crawl Access Checker to test access for Googlebot, GPTBot, ClaudeBot, and other crawlers simultaneously.
- Be specific. Instead of blocking entire directories, use precise path patterns.
- Allow CSS/JS/images. Never block `/static/`, `/assets/`, or `/wp-content/themes/`.
- Test before deploying. Always test robots.txt changes against your URL list before pushing to production.
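The "test before deploying" step can be automated. A sketch of a pre-deploy check using `urllib.robotparser` (the `MUST_ALLOW` list, `blocked_urls` helper, and candidate file are all hypothetical names for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of URLs that must stay crawlable after the change.
MUST_ALLOW = [
    "https://example.com/blog/popular-post",
    "https://example.com/static/app.js",
]

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the candidate robots.txt would block for `agent`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]

candidate = """\
User-agent: *
Disallow: /blog/drafts/
Disallow: /static/
"""

for url in blocked_urls(candidate, MUST_ALLOW):
    print("BLOCKED:", url)  # fail the deploy if anything prints
```

Here the check would flag `/static/app.js`, catching the CSS/JS mistake before it reaches production. Wiring this into CI so a non-empty result fails the build is one straightforward design choice.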
robots.txt vs meta robots vs noindex
These are different mechanisms with different effects:
- robots.txt Disallow — Prevents crawling. Google can't see the page at all. If external links point to it, Google may still index the URL (with no snippet).
- meta robots noindex — Allows crawling but prevents indexing. Google sees the page but won't show it in results.
- X-Robots-Tag: noindex — Same as meta robots, but set via HTTP header. Works for non-HTML files (PDFs, images).
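For the header-based variant, the server attaches the directive to the HTTP response itself. For example, a response serving a PDF you want crawled but not indexed might look like this (illustrative fragment; the exact headers depend on your server):

```
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
```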
Important: If you block a page in robots.txt AND have a noindex meta tag, Google will never see the noindex tag (because it can't crawl the page). The noindex is effectively invisible.