What Is robots.txt? (2026 Guide)

You've probably typed a URL like yoursite.com/robots.txt into a browser out of curiosity and landed on a plain text file with a handful of cryptic lines — "User-agent," "Disallow," "Sitemap." It looks minor. It isn't. That small file is one of the first things a search engine crawler reads before it touches anything else on your site.

Get it wrong and you can accidentally hide your entire site from Google. Leave it out entirely and crawlers just assume everything is fair game. Understanding what robots.txt actually does — and what it doesn't — is one of the fastest ways to avoid a silent, easy-to-miss SEO mistake.

Quick Answer

robots.txt is a plain text file placed at the root of a domain that tells search engine crawlers which parts of a site they're allowed to request. It uses simple rules like "Disallow" and "Allow" per user-agent, and can point crawlers to a sitemap. It doesn't remove pages from search results or restrict who can view a page — it only manages crawler access.

What is robots.txt, exactly?

robots.txt follows the Robots Exclusion Protocol, a long-standing convention that gives site owners a standardized way to talk to crawlers before those crawlers request anything else.

It's a request, not a lock. The file tells well-behaved crawlers what they shouldn't request. It has no technical ability to stop a browser, a person, or a non-compliant bot from reaching a URL directly.
It works by user-agent. Rules can target a specific crawler, like Googlebot, or apply to every bot using a wildcard, so different crawlers can be given different levels of access.
Two core directives do most of the work. Disallow marks a path as off-limits to crawling, while Allow carves out an exception inside a disallowed folder.
It can point to your sitemap. A Sitemap: line gives crawlers a direct path to your full list of URLs, independent of the disallow/allow rules.

In short: robots.txt manages traffic at the front door. It decides who gets waved through to crawl a page — it says nothing about whether that page can later show up in search results.
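
The four pieces above (user-agent groups, Disallow, Allow, and a Sitemap line) can be seen working together in a small sketch using Python's standard-library urllib.robotparser. The paths, bot names, and sitemap URL are illustrative assumptions, not rules from any real site. One caveat: Python's parser applies the first matching rule in a group, while Google documents picking the most specific match, so the Allow line is placed first here to make both readings agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Allow: /admin/help/
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may request `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, "https://yoursite.com" + path)


# Googlebot matches its own group, so only /drafts/ is off-limits to it.
print(allowed("Googlebot", "/drafts/post"))
print(allowed("Googlebot", "/admin/settings"))

# Any other bot falls back to the wildcard group.
print(allowed("OtherBot", "/admin/settings"))
print(allowed("OtherBot", "/admin/help/faq"))

# The Sitemap line is read independently of the groups.
print(parser.site_maps())
```

Note that a crawler with its own named group ignores the wildcard group entirely, which is why Googlebot can still request /admin/ in this sketch.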

Why robots.txt matters for SEO

A file this small has an outsized effect on how efficiently — and how safely — your site gets crawled:

It protects your crawl budget. Blocking low-value paths like internal search results or filtered URL parameters frees up crawler attention for pages that actually matter to rankings.
It keeps admin and staging areas out of the way. Login pages, cart pages, and staging subfolders rarely need to be crawled, and excluding them keeps crawl activity focused.
One mistake can block a whole site. A single stray Disallow: / left over from a staging environment can quietly stop Google from crawling a live production site.
It's the first file crawlers check. Search engines fetch robots.txt before crawling anything else on a domain, so its rules apply to every URL they request afterward. Crawlers usually cache the file (Google generally for up to 24 hours), so an edit can take a while to be picked up.

📊 Quick check When traffic suddenly drops to zero, look for a leftover Disallow: / that never got removed after launch before assuming a penalty or an algorithm update.

Step-by-step: creating and adding a robots.txt file

Decide what actually needs blocking. Most sites need very few rules — think admin paths, internal search, or duplicate parameter URLs, not entire sections.
List the user-agents you want to address. Use User-agent: * for a rule that applies to every crawler, or name a specific one if you need different treatment for it.
Write your Disallow and Allow rules. Each rule is a path, not a full URL — Disallow: /admin/ blocks that folder for the user-agent listed directly above it.
Add a Sitemap line. Point crawlers to your sitemap's full URL, such as Sitemap: https://yoursite.com/sitemap.xml, so they can discover your pages efficiently.
Save the file as robots.txt. It must be a plain text file (UTF-8 encoded), named exactly robots.txt in lowercase.
Upload it to the root of your domain. It has to be reachable at https://yoursite.com/robots.txt directly — a subfolder location won't be recognized.
Test it before trusting it. Use Google Search Console's robots.txt report to confirm Google can fetch and parse the file, and the URL Inspection tool to confirm specific URLs are allowed or blocked exactly as intended.
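
The steps above can be sketched as a short script: build the file text from a few rules, check it locally, and only then write it out. The function names, the blocked paths, and the sitemap URL are assumptions made for this example, and how the file actually gets uploaded depends on your host.

```python
from pathlib import Path
from urllib.robotparser import RobotFileParser


def build_robots_txt(groups: dict[str, list[str]], sitemap_url: str) -> str:
    """Render one group per user-agent, followed by a Sitemap line."""
    blocks = ["\n".join([f"User-agent: {agent}", *rules])
              for agent, rules in groups.items()]
    blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"


def save_robots_txt(content: str, web_root: str) -> Path:
    """Write the file as lowercase robots.txt in the given web root folder."""
    target = Path(web_root) / "robots.txt"
    target.write_text(content, encoding="utf-8")
    return target


# Hypothetical rules: block admin and internal search, leave the rest open.
content = build_robots_txt(
    {"*": ["Disallow: /admin/", "Disallow: /search/"]},
    "https://yoursite.com/sitemap.xml",
)

# Local self-check before anything is uploaded.
check = RobotFileParser()
check.parse(content.splitlines())
assert check.can_fetch("*", "https://yoursite.com/")
assert not check.can_fetch("*", "https://yoursite.com/admin/")

print(content)
# save_robots_txt(content, "/path/to/web/root")  # path is a placeholder
```

A local check like this catches typos early, but it doesn't replace testing the live file, since each search engine's parser has its own details.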

Common mistakes with robots.txt

1. Blocking the entire site by accident

A single leftover line like Disallow: / under a wildcard user-agent tells every crawler to skip the whole domain — a common survivor from a staging environment that never got cleaned up at launch.
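
A launch checklist can include an automated guard against this mistake. The sketch below is one possible approach, assuming the robots.txt text has already been loaded as a string; the function name is made up for this example. It uses the homepage as a proxy: if a crawler can't request /, the site is effectively blocked.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt: str, agent: str = "*") -> bool:
    """Return True if `agent` is not even allowed to request the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


staging_leftover = "User-agent: *\nDisallow: /\n"
normal_rules = "User-agent: *\nDisallow: /admin/\n"
empty_disallow = "User-agent: *\nDisallow:\n"  # an empty Disallow blocks nothing

for name, text in [("staging", staging_leftover),
                   ("normal", normal_rules),
                   ("empty", empty_disallow)]:
    print(name, blocks_everything(text))
```

Running a check like this in a deployment pipeline is a design choice, not a requirement; the point is that the one-line mistake is easy to detect mechanically.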

2. Using robots.txt to try to remove pages from search

Disallowing a URL stops crawling, but a page that's already indexed, or one that other sites link to, can still show up in results without a description. The correct tool for removal is a noindex directive, not a Disallow rule, and the page has to stay crawlable so that crawlers can actually read the noindex.

3. Blocking CSS and JavaScript files

Search engines render pages the way a browser does, which means they need access to stylesheets and scripts. Blocking them can cause a page to be evaluated as broken or poorly built even when it looks fine to a visitor.

4. Assuming it's a privacy or security control

The file itself is public and readable by anyone, so listing a sensitive path in it can actually draw attention to that path rather than hiding it — real access control needs authentication, not a crawler directive.

💡 Pro tip After any site migration or launch, re-check robots.txt first. It's one of the most common places a temporary "block everything" rule from staging quietly makes it into production.

Real-world examples

How different types of sites typically shape their robots.txt rules:

Blog

Block internal search

Disallow: /?s=

Keeps thin, duplicate internal search-result pages out of the crawl queue while leaving articles fully open.

E-commerce store

Block cart and checkout

Disallow: /cart/

Excludes session-specific, non-indexable pages while keeping every product and category page crawlable.

SaaS product

Block the app, allow the marketing site

Disallow: /app/

Separates the logged-in product experience from the public marketing pages that actually need to rank.

Small business site

Minimal rules, full sitemap

Allow: /

Leaves the whole site open to crawling and simply points to the sitemap for efficient discovery.

Notice the pattern: every rule that blocks anything targets a narrow, specific path, and the small business site blocks nothing at all. None of them reach for broad, sweeping rules, and that restraint is exactly what keeps robots.txt safe to use.
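
Keeping rules narrow depends on one detail worth seeing in code: a Disallow path is matched as a prefix of the URL path, so the trailing slash matters. This is a minimal sketch using Python's standard-library parser; the helper name and the sample URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(rule_path: str, url_path: str) -> bool:
    """Check `url_path` against a single wildcard-group Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch("*", url_path)


# With the trailing slash, only the folder and its contents are blocked.
print(is_blocked("/cart/", "/cart/checkout"))
print(is_blocked("/cart/", "/cartoons/"))

# Without it, anything that merely starts with "/app" is caught too.
print(is_blocked("/app", "/app/dashboard"))
print(is_blocked("/app", "/apple-pie-recipe"))
```

The last case is how a rule meant for one folder can silently block unrelated pages that happen to share its first few characters.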

robots.txt vs. other crawl controls

robots.txt is one of several tools for managing how search engines treat your pages. Here's how it compares to the others.

Method	Controls	Stops indexing?	Best for
robots.txt	Crawling access	No, not reliably	Managing crawl budget, keeping bots out of low-value paths
Meta robots noindex	Indexing of a page	Yes	Keeping a specific, crawlable page out of search results
X-Robots-Tag header	Indexing, non-HTML files	Yes	Non-HTML files like PDFs or images that can't hold a meta tag
XML sitemap	Discovery, not blocking	No	Helping crawlers find and prioritize pages you want indexed
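
To make the X-Robots-Tag row concrete, here is a hedged sketch of how a server might attach the header to PDF responses. It assumes a Python WSGI stack, which the article doesn't specify; on other stacks the same header is usually set in the web server's configuration. The names noindex_pdfs and demo_app are hypothetical.

```python
def noindex_pdfs(app):
    """WSGI middleware: add 'X-Robots-Tag: noindex' to responses for .pdf paths."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            if environ.get("PATH_INFO", "").lower().endswith(".pdf"):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns an empty 200 response."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b""]


application = noindex_pdfs(demo_app)
```

As with the meta tag, the header only works if the file stays crawlable: a crawler that is disallowed from fetching the PDF never sees the header.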

Generate your robots.txt right now — free

The Rebrixe robots.txt Generator builds a clean, correctly formatted file with the rules you choose — no account, no watermark, and nothing to write by hand.

Frequently asked questions

Is robots.txt the same as noindex?

No. robots.txt asks crawlers not to visit a URL, but a page blocked this way can still appear in search results without a snippet if other sites link to it. A noindex meta tag or header is the correct way to keep a page out of search results entirely, and it requires the page to be crawlable so the tag can be read.

Where does robots.txt need to live?

It must sit at the root of the domain, such as https://example.com/robots.txt. A copy placed in a subfolder is ignored, and a file on one subdomain applies only to that subdomain; each subdomain needs its own robots.txt if it needs rules at all.
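
The location rule can be expressed as a small function: for any page URL, the governing robots.txt is found by keeping the scheme and host and replacing everything else with /robots.txt. The function name and the sample URLs are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?page=2"))
print(robots_url("https://shop.example.com/cart/"))
```

The second call shows why a subdomain needs its own file: its pages resolve to a different robots.txt URL than the main domain's.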

Can robots.txt stop other people from seeing a page?

No. The file is publicly readable by anyone, human or bot, and it only carries weight with crawlers that choose to respect it. It is not a security or privacy tool, and sensitive URLs should never be listed there as a way to hide them.

Do all search engines obey robots.txt?

Major search engines like Google and Bing follow the standard, but obedience is voluntary by design, and some bots ignore the file entirely. Blocking abusive or unwanted bots reliably usually requires server-level rules in addition to robots.txt.

What happens if a site has no robots.txt at all?

Crawlers request /robots.txt, receive a 404, and treat the missing file as an open invitation: the entire site is assumed to be crawlable. That's a valid, safe state for many small sites and doesn't cause an error on its own.
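
The 404 case is one branch of a broader rule. The sketch below follows the general guidance in the Robots Exclusion Protocol standard (RFC 9309) for how a crawler should react to the status code it gets for robots.txt. It is a simplification with a made-up function name, and individual crawlers are known to treat some codes differently (Python's own urllib.robotparser, for instance, treats 401 and 403 as "disallow all").

```python
def crawl_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl policy (simplified)."""
    if 200 <= status < 300:
        return "parse the rules"
    if 300 <= status < 400:
        return "follow the redirect"
    if 400 <= status < 500:
        return "allow all"       # file unavailable: no restrictions assumed
    if 500 <= status < 600:
        return "disallow all"    # server error: assume nothing may be crawled
    return "unknown"


for code in (200, 301, 404, 503):
    print(code, crawl_policy(code))
```

The practical takeaway is that a missing file is harmless, while a robots.txt URL that returns server errors can stall crawling of the whole site.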

Can I test my robots.txt before publishing it?

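Yes, with one caveat. Google Search Console's robots.txt report shows the file Google last fetched and any parsing problems, and the URL Inspection tool shows whether a specific live URL is blocked. To check a draft before it goes live, run it through a third-party tester or an open-source robots.txt parser.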

Should I block CSS and JavaScript files in robots.txt?

Generally no. Search engines need to fetch these files to render the page the way a visitor sees it, and blocking them can lead to a page being misjudged as broken or lower quality than it actually is.
