What robots.txt is
robots.txt is a plain text file at the root of your domain that tells search engine crawlers which parts of your site they may or may not crawl. It is the first door bots knock on when arriving at your site: they read robots.txt and decide which URLs to visit.
Basic syntax
- User-agent: which bot the rule applies to (Googlebot, Bingbot, * for all).
- Disallow: path or prefix to block.
- Allow: exception to a broader Disallow.
- Sitemap: absolute URL of your sitemap.xml.
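To make the four directives concrete, here is a minimal sketch that parses a small robots.txt with Python's standard-library urllib.robotparser. The file contents, the example.com domain and the /private/ paths are invented for illustration. One caveat: Python's parser applies the first matching rule, while Google applies the most specific one, so the Allow line is placed before the Disallow line, where both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow: the /private/ prefix is blocked for every bot.
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))  # False
# Allow: an exception carved out of the broader Disallow.
print(parser.can_fetch("Googlebot", "https://example.com/private/press-kit/logo.png"))  # True
# Paths that match no rule are crawlable by default.
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))  # True
# Sitemap: listed as an absolute URL (site_maps() needs Python 3.8+).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```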
Common use cases
- Open site: User-agent: * with an empty Disallow. Everything is crawlable.
- Block admin panel: Disallow: /admin/ so bots do not crawl the panel.
- Block staging: Disallow: / so no bot crawls your test site.
- Block heavy files: PDFs, ZIPs or exports you do not want bots to crawl.
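The use cases above can be written out as actual file contents and checked. This is a sketch under assumptions: the allowed() helper, the ExampleBot name and the /exports/ directory are hypothetical. It blocks heavy files by directory prefix because Python's urllib.robotparser does not interpret wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# One hypothetical file per use case.
OPEN_SITE = "User-agent: *\nDisallow:\n"              # empty Disallow: nothing blocked
BLOCK_ADMIN = "User-agent: *\nDisallow: /admin/\n"    # admin panel only
BLOCK_STAGING = "User-agent: *\nDisallow: /\n"        # every path on the host
BLOCK_EXPORTS = "User-agent: *\nDisallow: /exports/\n"  # directory of heavy files

print(allowed(OPEN_SITE, "/anything"))              # True
print(allowed(BLOCK_ADMIN, "/admin/users"))         # False
print(allowed(BLOCK_ADMIN, "/blog/"))               # True
print(allowed(BLOCK_STAGING, "/"))                  # False
print(allowed(BLOCK_EXPORTS, "/exports/data.zip"))  # False
```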
What robots.txt does NOT do
Many people confuse crawling with indexing. robots.txt only controls crawling (whether the bot visits the URL). If a URL blocked by robots.txt is linked from another site, Google can still index it and show it with an empty snippet. To keep something out of results, use meta robots noindex on the page itself, not robots.txt. The page must also stay crawlable: if robots.txt blocks it, the bot never fetches the page and never sees the noindex.
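As a sketch of what a noindex signal looks like in practice, the snippet below checks a page for a meta robots tag and, as an addition not covered above, for the X-Robots-Tag HTTP response header, which serves the same purpose for files that cannot carry a meta tag, such as PDFs. The RobotsMetaParser class and has_noindex() function are hypothetical helpers, and the check is simplified: it ignores bot-specific tags such as meta name="googlebot".

```python
from html.parser import HTMLParser
from typing import Optional

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def has_noindex(html: str, headers: Optional[dict] = None) -> bool:
    """Return True if the HTML or the response headers carry a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_directives

page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex(page))                                          # True
print(has_noindex("<html></html>", {"X-Robots-Tag": "noindex"}))  # True
print(has_noindex("<html></html>"))                               # False
```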
Common mistakes
- Blocking everything in production: leaving the staging Disallow: / in place and pushing it to prod is an SEO disaster.
- Blocking CSS or JS: Google needs these files to render and rank correctly. Do not block them.
- Treating robots.txt as security: anyone can read your robots.txt. Listing /admin/ there is like putting up a sign.
- Case sensitivity: paths are case-sensitive. /Admin/ and /admin/ are different.
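The case-sensitivity point is easy to verify. In this small sketch (ExampleBot and the login paths are made up), a rule for /admin/ leaves /Admin/ fully crawlable:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /admin/"])

# The rule matches the lowercase prefix only.
lower_blocked = not parser.can_fetch("ExampleBot", "/admin/login")
upper_blocked = not parser.can_fetch("ExampleBot", "/Admin/login")

print(lower_blocked)  # True
print(upper_blocked)  # False
```

If both spellings exist on your server, each one needs its own Disallow line.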
Location and validation
The file must live at the exact domain root: https://yoursite.com/robots.txt. It does not work in subfolders. Validate with Google Search Console (Settings > robots.txt) to see how Googlebot interprets it. Also test real URLs with URL inspection to confirm they are crawlable.
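Because the file is looked up per host, the robots.txt that governs any page can be derived from the page URL alone. The robots_url() helper below is a hypothetical illustration of that rule. It also shows that a subdomain such as shop.yoursite.com is a separate host with its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/post?x=1"))
# https://yoursite.com/robots.txt
print(robots_url("https://shop.yoursite.com/cart"))
# https://shop.yoursite.com/robots.txt
```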
Final checklist
Before pushing to production: confirm you are not blocking critical paths, the sitemap points to the correct https URL, and your staging site has a separate robots.txt that actually blocks everything. Keep the file simple: fewer rules mean less risk of mistakes.
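Parts of this checklist can be automated before a deploy. The following is a rough sketch, not a complete validator: check_robots(), blocks_everything() and the sample file contents are hypothetical, and Python's urllib.robotparser does not match rules exactly the way Googlebot does, so treat a clean result as a first pass and still confirm in Search Console.

```python
from urllib.robotparser import RobotFileParser

def _parse(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def check_robots(robots_txt, critical_paths, agent="Googlebot"):
    """Return a list of problems; an empty list means every check passed."""
    parser = _parse(robots_txt)
    problems = []
    for path in critical_paths:
        if not parser.can_fetch(agent, path):
            problems.append(f"critical path is blocked: {path}")
    sitemaps = parser.site_maps() or []  # site_maps() needs Python 3.8+
    if not sitemaps:
        problems.append("no Sitemap line found")
    for url in sitemaps:
        if not url.startswith("https://"):
            problems.append(f"sitemap is not an https URL: {url}")
    return problems

def blocks_everything(robots_txt, agent="Googlebot"):
    """Simplified check: True if the root path itself is disallowed."""
    return not _parse(robots_txt).can_fetch(agent, "/")

# Hypothetical production and staging files.
PRODUCTION = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""
STAGING = "User-agent: *\nDisallow: /\n"

print(check_robots(PRODUCTION, ["/", "/blog/", "/assets/site.css"]))  # []
print(blocks_everything(STAGING))     # True
print(blocks_everything(PRODUCTION))  # False
```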