You've probably seen "robots.txt" mentioned in some SEO checklist, opened your site's
version of the file, and found a handful of cryptic lines like User-agent and
Disallow staring back at you. It looks like something only a developer should
touch, so most people either ignore it completely or copy a random template and hope for
the best.
Neither approach is great. Robots.txt is one of the simplest files on your entire site once you understand what it's actually for — and getting it wrong, even by one character, can quietly keep search engines from crawling pages you wanted indexed all along.
Robots.txt is a plain-text file at the root of your domain that tells search engine
crawlers which parts of your site they're allowed or not allowed to crawl. It uses
simple rules like User-agent, Disallow, and Allow
to guide crawler behavior, and it can also point crawlers to your sitemap — but it
doesn't remove pages from search results by itself, and it can't hide anything from
human visitors.
What is robots.txt?
Robots.txt is a small, plain-text file that lives at the root of your domain — always at a
URL like https://example.com/robots.txt — and gives instructions to search
engine crawlers (also called bots or spiders) about which parts of your site they should
or shouldn't visit.
- It's a request, not a lock. Well-behaved crawlers like Googlebot read and follow it, but it relies on cooperation — it can't technically prevent access the way a password or firewall does.
- It controls crawling, not indexing. Blocking a page stops crawlers from fetching its content, but the URL itself can still appear in search results if something else links to it.
- It's written in simple directives. A handful of keywords (User-agent, Disallow, Allow, Sitemap) cover the vast majority of what any site needs.
- It's public. Anyone, not just search engines, can open the file at its URL, so it should never be used to hide sensitive information.
The practical takeaway: robots.txt is a traffic-directing sign for crawlers, not a security wall — and most sites only need a few short lines to use it correctly.
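To make the "request, not a lock" point concrete, here is a minimal Python sketch of what a cooperative crawler does before fetching a page. It uses the standard library's urllib.robotparser; the rules, the ExampleBot name, and the may_crawl helper are assumptions made up for illustration, not part of any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration. A real crawler would download them
# from https://example.com/robots.txt before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("https://example.com/blog/first-post"))  # True
print(may_crawl("https://example.com/admin/settings"))   # False
```

Nothing in this check stops a client that simply skips it, which is why the file works as a sign rather than a wall.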
Why robots.txt matters for SEO
A file this small has an outsized effect on how efficiently search engines understand your site:
- It protects your crawl budget. Large sites get a limited number of crawl visits; keeping bots away from admin pages, filters, or duplicate URLs leaves more of that budget for pages that matter.
- It reduces the chance of low-value pages cluttering search results. Internal search pages, staging folders, or thank-you pages rarely need to be crawled at all, though a blocked URL can still be indexed if other pages link to it.
- It points crawlers to your sitemap. A single Sitemap: line gives search engines a direct map of your site's structure instead of relying purely on discovery through links.
- One wrong line can block everything. Because the rules apply site-wide, a mistaken Disallow: / can accidentally tell every crawler to stay away from the entire domain.
Step-by-step: writing your first robots.txt
- Check if you already have one. Visit yourdomain.com/robots.txt in a browser. If a file loads, your site already has one, even if it's just a default from your CMS.
- Decide what, if anything, should be blocked. Most sites only need to block admin areas, internal search results, or staging folders, not entire content sections.
- Start with a User-agent line. Use User-agent: * to target all crawlers, or name a specific crawler if you need different rules for it.
- Add Disallow rules for anything crawlers shouldn't fetch. Each rule targets one path prefix, like Disallow: /admin/, and an empty Disallow: line means nothing is blocked.
- Add Allow rules only if you need an exception. Use Allow to permit a specific file or subfolder inside a path that's otherwise disallowed.
- Point to your sitemap. Add a line like Sitemap: https://example.com/sitemap.xml using the full, absolute URL.
- Upload it to your domain's root and test it. Use Google Search Console's robots.txt report (or a third-party validator) to confirm the file loads correctly and blocks only what you intended.
# Block admin and internal search, allow everything else
User-agent: *
Disallow: /admin/
Disallow: /search/
Allow: /admin/help-center/
Sitemap: https://example.com/sitemap.xml
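To see how the Allow exception in the file above plays out, here is a simplified Python sketch of the "most specific rule wins" precedence that Google documents for its crawlers: the longest matching path decides, and Allow wins a tie. The RULES list and the is_allowed function are hypothetical names for illustration, and the sketch assumes a single User-agent group with plain path prefixes (no * or $ wildcards).

```python
# Rules copied from the example file; the (kind, prefix) layout is an
# assumption chosen for this sketch.
RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/search/"),
    ("allow", "/admin/help-center/"),
]


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching prefix wins; on equal length, Allow beats Disallow."""
    best_len = -1
    allowed = True  # a path that matches no rule stays crawlable
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_won_by_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_won_by_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


print(is_allowed("/admin/users"))            # False
print(is_allowed("/admin/help-center/faq"))  # True
print(is_allowed("/search/?q=shoes"))        # False
print(is_allowed("/blog/first-post"))        # True
```

Some parsers, Python's urllib.robotparser among them, instead apply the first matching rule in file order, so the same file can be read differently by different tools. Testing against the crawler you care about is the safer habit.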
Common mistakes beginners make
1. Blocking the entire site by accident
Disallow: / tells crawlers to stay away from every single page on the domain.
It's meant for staging environments, not live sites, but it's easy to leave in place after
launch by mistake.
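A quick way to catch this before launch is to parse the file and ask whether the homepage is still crawlable. The sketch below uses Python's standard urllib.robotparser; the can_crawl helper, the ExampleBot name, and the two sample files are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether the user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # one slash: the whole site is off limits
BLOCK_NONE = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked

print(can_crawl(BLOCK_ALL, "https://example.com/"))   # False
print(can_crawl(BLOCK_NONE, "https://example.com/"))  # True
```

The two files differ by a single character, which is the whole risk in miniature.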
2. Assuming Disallow removes a page from Google
Blocking crawling doesn't guarantee de-indexing. A blocked URL that's linked to elsewhere can still show up in search results, sometimes without a description, since Google never got to read the page.
3. Putting the file in the wrong location
Robots.txt only works from the exact root of a domain or subdomain. A copy placed inside a
subfolder, like example.com/blog/robots.txt, is simply ignored by crawlers.
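As a rough illustration of where crawlers look, this Python sketch derives the robots.txt location from any page URL by keeping the scheme and host and discarding the rest. The robots_url_for name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Path, query, and fragment are dropped; only scheme and host survive.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?page=2"))
# https://example.com/robots.txt
print(robots_url_for("https://shop.example.com/cart"))
# https://shop.example.com/robots.txt
```

A file at example.com/blog/robots.txt never comes out of this calculation, and the second call shows why each subdomain needs its own file.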
4. Blocking CSS or JavaScript files
Disallowing folders that hold stylesheets or scripts can stop search engines from rendering the page the way visitors see it, which can hurt how well Google understands and ranks that page.
Real-world examples
How different site types typically use robots.txt in practice:
In each case, the rules follow the same handful of directives — only the paths and intent behind them change.
Robots.txt directives compared
A quick reference for the directives you'll actually use, what each one does, and when to reach for it.
| Directive | What it does | Common use |
|---|---|---|
| User-agent | Names which crawler the following rules apply to | User-agent: * for all crawlers |
| Disallow | Blocks crawling of a specific path | Admin panels, internal search, staging folders |
| Allow | Creates an exception inside a blocked path | Permitting one file inside a disallowed folder |
| Sitemap | Points crawlers to your sitemap's full URL | Helping crawlers discover your site structure |
| Crawl-delay | Requests a pause between crawler requests | Ignored by Google, honored by some other bots |
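For readers who want to see these directives read back programmatically, here is a small Python sketch using the standard urllib.robotparser module (site_maps() needs Python 3.8 or later). The sample file, the 10-second delay, and the ExampleBot name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file that uses each directive from the table once.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow: the admin path is off limits to every crawler.
print(parser.can_fetch("ExampleBot", "https://example.com/admin/"))  # False
# Crawl-delay: the requested pause in seconds, for bots that honor it.
print(parser.crawl_delay("ExampleBot"))  # 10
# Sitemap: the absolute URLs listed in the file.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```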
Generate your robots.txt right now — free
The Rebrixe Robots.txt Generator builds a clean, correctly formatted file from simple toggles and fields — no memorizing syntax, no guessing where slashes and wildcards go.
Frequently asked questions
- The file belongs at the root of your domain, at a URL like https://example.com/robots.txt. A copy placed in a subfolder is ignored, and each subdomain needs its own robots.txt file at its own root.
- You can give a named crawler its own group of rules and use User-agent: * for every other crawler, each with its own Allow and Disallow rules.
- A single Disallow: / can block an entire site from being crawled, and a missing character can make a rule match nothing at all. Because the file affects the whole domain, small errors have outsized consequences, so testing before publishing matters.
- A Sitemap: line pointing to your sitemap's full URL gives crawlers a direct path to your site's structure, even though it's technically optional and sitemaps can also be submitted separately in search console tools.