Robots.txt Explained for Beginners (2026 Guide)

You've probably seen "robots.txt" mentioned in some SEO checklist, opened your site's version of the file, and found a handful of cryptic lines like User-agent and Disallow staring back at you. It looks like something only a developer should touch, so most people either ignore it completely or copy a random template and hope for the best.

Neither approach is great. Robots.txt is one of the simplest files on your entire site once you understand what it's actually for — and getting it wrong, even by one character, can quietly keep search engines from crawling pages you wanted indexed all along.

Quick Answer

Robots.txt is a plain-text file at the root of your domain that tells search engine crawlers which parts of your site they're allowed or not allowed to crawl. It uses simple rules like User-agent, Disallow, and Allow to guide crawler behavior, and it can also point crawlers to your sitemap — but it doesn't remove pages from search results by itself, and it can't hide anything from human visitors.

What is robots.txt?

Robots.txt is a small, plain-text file that lives at the root of your domain — always at a URL like https://example.com/robots.txt — and gives instructions to search engine crawlers (also called bots or spiders) about which parts of your site they should or shouldn't visit.

It's a request, not a lock. Well-behaved crawlers like Googlebot read and follow it, but it relies on cooperation — it can't technically prevent access the way a password or firewall does.
It controls crawling, not indexing. Blocking a page stops crawlers from fetching its content, but the URL itself can still appear in search results if something else links to it.
It's written in simple directives. A handful of keywords — User-agent, Disallow, Allow, Sitemap — cover the vast majority of what any site needs.
It's public. Anyone, not just search engines, can open the file at its URL, so it should never be used to hide sensitive information.

The practical takeaway: robots.txt is a traffic-directing sign for crawlers, not a security wall — and most sites only need a few short lines to use it correctly.

Why robots.txt matters for SEO

A file this small has an outsized effect on how efficiently search engines understand your site:

It protects your crawl budget. Large sites get a limited number of crawl visits; keeping bots away from admin pages, filters, or duplicate URLs leaves more of that budget for pages that matter.
It keeps crawlers away from low-value pages. Internal search pages, staging folders, or thank-you pages rarely need to be crawled at all.
It points crawlers to your sitemap. A single Sitemap: line gives search engines a direct map of your site's structure instead of relying purely on discovery through links.
One wrong line can block everything. Because the rules apply site-wide, a mistaken Disallow: / can accidentally tell every crawler to stay away from the entire domain.

📊 Quick stat A single misplaced slash in a Disallow rule is a frequently cited cause of sudden, site-wide traffic drops. The file's simplicity is exactly why small errors go unnoticed until rankings fall.

Step-by-step: writing your first robots.txt

Check if you already have one. Visit yourdomain.com/robots.txt in a browser — if a file loads, your site already has one, even if it's just a default from your CMS.
Decide what, if anything, should be blocked. Most sites only need to block admin areas, internal search results, or staging folders — not entire content sections.
Start with a User-agent line. Use User-agent: * to target all crawlers, or name a specific crawler if you need different rules for it.
Add Disallow rules for anything crawlers shouldn't fetch. Each rule targets one path, like Disallow: /admin/, and an empty Disallow: line means nothing is blocked.
Add Allow rules only if you need an exception. Use Allow to permit a specific file or subfolder inside a path that's otherwise disallowed.
Point to your sitemap. Add a line like Sitemap: https://example.com/sitemap.xml using the full, absolute URL.
Upload it to your domain's root and test it. Check the robots.txt report in Google Search Console (which replaced the older robots.txt Tester), or use a third-party validator, to confirm the file loads correctly and blocks only what you intended.
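
The steps above can be sketched as a small Python helper that assembles the file from lists of paths. Treat it as an illustration rather than a full generator: the name build_robots_txt and its parameters are made up for this example, and it only writes a single User-agent group.

```python
def build_robots_txt(disallow, allow=(), sitemap=None, user_agent="*"):
    """Assemble robots.txt text for one User-agent group (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    if disallow:
        lines += [f"Disallow: {path}" for path in disallow]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    lines += [f"Allow: {path}" for path in allow]
    if sitemap:
        # The Sitemap line needs the full, absolute URL.
        if not sitemap.startswith(("http://", "https://")):
            raise ValueError("Sitemap must be a full, absolute URL")
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    disallow=["/admin/", "/search/"],
    allow=["/admin/help-center/"],
    sitemap="https://example.com/sitemap.xml",
)
print(robots_txt)
```

Saving the returned text as robots.txt and uploading it to the root of the domain covers the final step; testing the live file is still worth doing afterwards.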

Example robots.txt

# Block admin and internal search, allow everything else
User-agent: *
Disallow: /admin/
Disallow: /search/
Allow: /admin/help-center/

Sitemap: https://example.com/sitemap.xml
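
To see how the Allow exception in this file plays out, here is a rough Python sketch of the matching logic. It assumes the behavior Google documents, where the longest matching rule wins and Allow wins a tie; other crawlers may resolve conflicts differently. The function names are invented for this example, and the sketch ignores wildcards and only reads the User-agent: * group.

```python
EXAMPLE = """\
# Block admin and internal search, allow everything else
User-agent: *
Disallow: /admin/
Disallow: /search/
Allow: /admin/help-center/

Sitemap: https://example.com/sitemap.xml
"""


def parse_star_group(robots_txt):
    """Return (directive, path) pairs from the `User-agent: *` group."""
    rules, in_star_group = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif field in ("allow", "disallow") and in_star_group and value:
            # Rules with an empty value are skipped: they block nothing.
            rules.append((field, value))
    return rules


def is_allowed(rules, path):
    """Longest matching prefix wins; Allow wins when lengths are equal."""
    best = ("allow", "")  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > len(best[1])
            tie_for_allow = len(prefix) == len(best[1]) and directive == "allow"
            if longer or tie_for_allow:
                best = (directive, prefix)
    return best[0] == "allow"


rules = parse_star_group(EXAMPLE)
for path in ["/admin/", "/admin/help-center/faq", "/search/?q=shoes", "/blog/post"]:
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

The help-center path is allowed because its Allow rule is longer, and therefore more specific, than Disallow: /admin/.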

Common mistakes beginners make

1. Blocking the entire site by accident

Disallow: / tells crawlers to stay away from every single page on the domain. It's meant for staging environments, not live sites, but it's easy to leave in place after launch by mistake.
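
One way to catch this before launch is a small pre-deploy check. The Python sketch below is only an illustration: sitewide_blocks is a made-up function name, BadBot is a made-up crawler name, and the check simply flags any group containing a bare Disallow: / without evaluating Allow exceptions.

```python
def sitewide_blocks(robots_txt):
    """Return the user-agents whose group contains a bare `Disallow: /`."""
    blocked, agents, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.extend(a for a in agents if a not in blocked)
    return blocked


staging = "User-agent: *\nDisallow: /\n"
live = "User-agent: BadBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin/\n"

print(sitewide_blocks(staging))
print(sitewide_blocks(live))
```

If "*" shows up in the result for a live site, the file is almost certainly a leftover from staging.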

2. Assuming Disallow removes a page from Google

Blocking crawling doesn't guarantee de-indexing. A blocked URL that's linked to elsewhere can still show up in search results, sometimes without a description, since Google never got to read the page.

3. Putting the file in the wrong location

Robots.txt only works from the exact root of a domain or subdomain. A copy placed inside a subfolder, like example.com/blog/robots.txt, is simply ignored by crawlers.
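
A quick way to double-check the location is to work out which robots.txt URL governs a given page. This Python sketch uses only the standard library; robots_url is a name invented for the example, and shop.example.com is a hypothetical subdomain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL such as https://example.com/page")
    # Keep the scheme and host (including any subdomain), replace everything else.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?ref=home"))
print(robots_url("https://shop.example.com/cart"))
```

The two pages resolve to different files, which is why each subdomain needs its own copy.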

4. Blocking CSS or JavaScript files

Disallowing folders that hold stylesheets or scripts can stop search engines from rendering the page the way visitors see it, which can hurt how well Google understands and ranks that page.

💡 Pro tip After any change to robots.txt, re-test the file in Search Console before assuming it's working — a rule that looks correct can still match more, or less, than intended.

Real-world examples

How different site types typically use robots.txt in practice:

E-commerce store
Approach: blocking filter URLs. Result: fewer duplicate pages.
Disallows faceted navigation parameters like ?color= and ?sort= to stop endless duplicate URL variations from being crawled.

SaaS product site
Approach: blocking the app dashboard. Result: marketing pages stay clean.
Disallows /app/ and /account/ so logged-in product screens never compete with marketing pages in search results.

News publisher
Approach: sitemap-first setup. Result: fast content discovery.
Keeps Disallow rules minimal but lists a frequently updated sitemap so new articles get discovered quickly.

Agency staging site
Approach: full crawl block. Rule: Disallow: /
Blocks the entire staging subdomain from crawling until the site is ready to launch publicly.

In each case, the rules follow the same handful of directives — only the paths and intent behind them change.
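
Blocking parameters like the e-commerce example usually relies on wildcards: * matches any run of characters, and a trailing $ anchors the end of the URL. The Python sketch below shows roughly how such patterns match. The specific patterns (/*?color=, /*?sort=, /*.pdf$) are assumptions chosen for illustration, not rules taken from a real site, and pattern_to_regex and is_blocked are made-up names.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path_and_query, disallow_patterns):
    """True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in disallow_patterns)


patterns = ["/*?color=", "/*?sort="]

print(is_blocked("/shoes?color=red", patterns))
print(is_blocked("/shoes?size=9&color=red", patterns))
print(is_blocked("/shoes", patterns))
print(is_blocked("/docs/guide.pdf", ["/*.pdf$"]))
```

Note the second case: a pattern written as /*?color= only matches when color is the first parameter, which is exactly the kind of rule that matches less than intended.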

Robots.txt directives compared

A quick reference for the directives you'll actually use, what each one does, and when to reach for it.

Directive	What it does	Common use
User-agent	Names which crawler the following rules apply to	`User-agent: *` for all crawlers
Disallow	Blocks crawling of a specific path	Admin panels, internal search, staging folders
Allow	Creates an exception inside a blocked path	Permitting one file inside a disallowed folder
Sitemap	Points crawlers to your sitemap's full URL	Helping crawlers discover your site structure
Crawl-delay	Requests a pause between crawler requests	Ignored by Google, honored by some other bots
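
If you want to read these directives programmatically, Python's standard library ships a parser, urllib.robotparser. The sketch below feeds it a small file and reads back each directive. The file contents, including the Crawl-delay value of 10, are assumptions for the example, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow: is this URL off-limits for the given user-agent?
print(parser.can_fetch("*", "https://example.com/admin/login"))
print(parser.can_fetch("*", "https://example.com/pricing"))

# Crawl-delay: the requested pause in seconds, or None if not set.
print(parser.crawl_delay("*"))

# Sitemap: every sitemap URL listed in the file, or None if there are none.
print(parser.site_maps())
```

This parser applies rules in file order rather than by longest match, so its answers can differ from Google's when Allow and Disallow rules overlap.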

Generate your robots.txt right now — free

The Rebrixe Robots.txt Generator builds a clean, correctly formatted file from simple toggles and fields — no memorizing syntax, no guessing where slashes and wildcards go.

Frequently asked questions

Is robots.txt required for every website?

No. If a site doesn't have one, search engines simply assume they're allowed to crawl everything. A robots.txt file is only needed when there's something specific you want to guide crawlers away from or point them toward, like a sitemap.

Does Disallow in robots.txt actually keep a page out of Google?

Not necessarily. Disallow only blocks crawling, not indexing. If other sites link to a blocked page, Google can still index the URL without visiting it, sometimes showing it in results with no description. For pages that must never appear in search, use a noindex meta tag and leave the page crawlable: if robots.txt blocks the page, Google never gets to see the tag.

Where does robots.txt need to be placed?

At the root of the domain and nowhere else, such as https://example.com/robots.txt. A copy placed in a subfolder is ignored, and a file on one host doesn't apply to another; each subdomain needs its own robots.txt file at its own root.

Can robots.txt block a specific search engine but allow others?

Yes. Rules are grouped under a User-agent line, so a site can write one block targeting a specific crawler's name and a separate block with User-agent: * for every other crawler, each with its own Allow and Disallow rules.
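
Here is a minimal sketch of that setup, checked with Python's built-in urllib.robotparser. ExampleBot and OtherBot are made-up crawler names used only for illustration, and the rules are an assumed example rather than a recommendation.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# One crawler is blocked everywhere
User-agent: ExampleBot
Disallow: /

# Every other crawler is only kept out of /admin/
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ExampleBot", "OtherBot"):
    for url in ("https://example.com/blog/", "https://example.com/admin/"):
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(agent, url, verdict)
```

A crawler that finds a group with its own name follows that group and ignores the User-agent: * rules, which is why ExampleBot is blocked from /blog/ while OtherBot is not.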

What happens if I make a mistake in robots.txt?

A misplaced Disallow: / can block an entire site from being crawled, and a missing character can make a rule match nothing at all. Because the file affects the whole domain, small errors have outsized consequences, so testing before publishing matters.

Should my sitemap be listed in robots.txt?

It's a common and recommended practice. Adding a Sitemap: line pointing to your sitemap's full URL gives crawlers a direct path to your site's structure, even though it's technically optional and sitemaps can also be submitted separately in search console tools.

Can I use robots.txt to hide a page from competitors or the public?

No. Robots.txt is a public, plain-text file that anyone can view by visiting its URL, and it only asks well-behaved crawlers not to visit certain paths — it doesn't restrict human visitors or enforce privacy in any way.
