Robots.txt Explained for Beginners

You've probably seen "robots.txt" mentioned in some SEO checklist, opened your site's version of the file, and found a handful of cryptic lines like User-agent and Disallow staring back at you. It looks like something only a developer should touch, so most people either ignore it completely or copy a random template and hope for the best.

Neither approach is great. Robots.txt is one of the simplest files on your entire site once you understand what it's actually for — and getting it wrong, even by one character, can quietly keep search engines from crawling pages you wanted indexed all along.

Quick Answer

Robots.txt is a plain-text file at the root of your domain that tells search engine crawlers which parts of your site they're allowed or not allowed to crawl. It uses simple rules like User-agent, Disallow, and Allow to guide crawler behavior, and it can also point crawlers to your sitemap — but it doesn't remove pages from search results by itself, and it can't hide anything from human visitors.

What is robots.txt?

Robots.txt is a small, plain-text file that lives at the root of your domain — always at a URL like https://example.com/robots.txt — and gives instructions to search engine crawlers (also called bots or spiders) about which parts of your site they should or shouldn't visit.

The practical takeaway: robots.txt is a traffic-directing sign for crawlers, not a security wall — and most sites only need a few short lines to use it correctly.

Why robots.txt matters for SEO

A file this small has an outsized effect on how efficiently search engines crawl and understand your site.

📊 Quick stat A single misplaced slash in a Disallow rule can block far more than you intended and cause a sudden, site-wide traffic drop. The file's simplicity is exactly why small errors go unnoticed until rankings fall.

Step-by-step: writing your first robots.txt

  1. Check if you already have one. Visit yourdomain.com/robots.txt in a browser — if a file loads, your site already has one, even if it's just a default from your CMS.
  2. Decide what, if anything, should be blocked. Most sites only need to block admin areas, internal search results, or staging folders — not entire content sections.
  3. Start with a User-agent line. Use User-agent: * to target all crawlers, or name a specific crawler if you need different rules for it.
  4. Add Disallow rules for anything crawlers shouldn't fetch. Each rule targets one path, like Disallow: /admin/, and an empty Disallow: line means nothing is blocked.
  5. Add Allow rules only if you need an exception. Use Allow to permit a specific file or subfolder inside a path that's otherwise disallowed.
  6. Point to your sitemap. Add a line like Sitemap: https://example.com/sitemap.xml using the full, absolute URL.
  7. Upload it to your domain's root and test it. Use the robots.txt report in Google Search Console (or a third-party validator) to confirm the file loads correctly and blocks only what you intended.

Example robots.txt

# Block admin and internal search, allow everything else
User-agent: *
Disallow: /admin/
Disallow: /search/
Allow: /admin/help-center/

Sitemap: https://example.com/sitemap.xml
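
To see why the Allow line in this example still works even though it sits under a broader Disallow, here is a minimal Python sketch of "most specific rule wins" matching, the precedence described in RFC 9309 and in Google's documentation. The function name is_allowed and the RULES list are illustrative, not part of any library, and the sketch only handles plain path prefixes (no wildcards).

```python
# Sketch of longest-match precedence for plain path prefixes.
# RULES mirrors the example file; is_allowed is a hypothetical helper.

RULES = [
    ("disallow", "/admin/"),
    ("disallow", "/search/"),
    ("allow", "/admin/help-center/"),
]


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match precedence."""
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


for p in ["/admin/", "/admin/help-center/faq", "/search/?q=shoes", "/blog/post"]:
    print(p, "->", "allowed" if is_allowed(p) else "blocked")
```

Not every parser follows this precedence. Python's built-in urllib.robotparser, for instance, applies rules in file order, so it is worth testing a file against the crawler you actually care about.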

Try the Rebrixe Robots.txt Generator (free): pick your rules, add your sitemap, and get a ready-to-upload file. No syntax to memorize.

Common mistakes beginners make

1. Blocking the entire site by accident

Disallow: / tells crawlers to stay away from every single page on the domain. It's meant for staging environments, not live sites, but it's easy to leave in place after launch by mistake.
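
The difference a single slash makes can be checked with Python's standard-library robots.txt parser. This is a small sketch; the helper name can_crawl and the example URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, agent="*"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


block_everything = ["User-agent: *", "Disallow: /"]
block_nothing = ["User-agent: *", "Disallow:"]

url = "https://example.com/blog/first-post"
print(can_crawl(block_everything, url))  # False: every path starts with "/"
print(can_crawl(block_nothing, url))     # True: an empty Disallow blocks nothing
```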

2. Assuming Disallow removes a page from Google

Blocking crawling doesn't guarantee de-indexing. A blocked URL that's linked to elsewhere can still show up in search results, sometimes without a description, since Google never got to read the page.

3. Putting the file in the wrong location

Robots.txt only works from the exact root of a domain or subdomain. A copy placed inside a subfolder, like example.com/blog/robots.txt, is simply ignored by crawlers.
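
One way to picture the rule: for any page URL, a crawler looks for robots.txt at the root of that same scheme and host. The sketch below derives that location with Python's standard library; robots_txt_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain); drop path and query.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?ref=1"))
print(robots_txt_url("https://shop.example.com/cart"))
```

The second call shows why a subdomain needs its own file: its pages resolve to a different robots.txt URL than the main domain's.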

4. Blocking CSS or JavaScript files

Disallowing folders that hold stylesheets or scripts can stop search engines from rendering the page the way visitors see it, which can hurt how well Google understands and ranks that page.
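
A quick way to catch this is to run a list of asset URLs through a parser and see which ones a rule would block. The sketch below assumes a hypothetical /assets/ folder and a helper named blocked_assets; swap in your own paths.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_lines, urls, agent="Googlebot"):
    """Return the URLs in `urls` that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Hypothetical rules and asset URLs, for illustration only.
robots_lines = ["User-agent: *", "Disallow: /assets/"]
asset_urls = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/blog/",
]

print(blocked_assets(robots_lines, asset_urls))
```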

💡 Pro tip After any change to robots.txt, re-test the file in Search Console before assuming it's working — a rule that looks correct can still match more, or less, than intended.

Real-world examples

How different site types typically use robots.txt in practice:

E-commerce store: Blocking filter URLs (Fewer duplicate pages)
Disallows faceted navigation parameters like ?color= and ?sort= to stop endless duplicate URL variations from being crawled.

SaaS product site: Blocking the app dashboard (Marketing pages stay clean)
Disallows /app/ and /account/ so logged-in product screens never compete with marketing pages in search results.

News publisher: Sitemap-first setup (Fast content discovery)
Keeps Disallow rules minimal but lists a frequently updated sitemap so new articles get discovered quickly.

Agency staging site: Full crawl block (Disallow: /)
Blocks the entire staging subdomain from crawling until the site is ready to launch publicly.

In each case, the rules follow the same handful of directives — only the paths and intent behind them change.
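
Parameter-based rules like the e-commerce example usually rely on wildcards: * matches any run of characters, and a trailing $ anchors the end of the URL. The sketch below translates such a pattern into a regular expression to show what it would match. The patterns /*?*color= and /*.pdf$ are hypothetical examples rather than rules taken from a real site, and rule_to_regex and matches are illustrative helper names.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    pieces = [re.escape(piece) for piece in rule_path.split("*")]
    pattern = ".*".join(pieces)
    if anchored:
        pattern += "$"
    return re.compile(pattern)


def matches(rule_path, url_path):
    """Return True if the rule pattern matches from the start of `url_path`."""
    return rule_to_regex(rule_path).match(url_path) is not None


print(matches("/*?*color=", "/shoes?color=red"))         # True
print(matches("/*?*color=", "/shoes?size=9&color=red"))  # True
print(matches("/*?*color=", "/shoes"))                   # False
print(matches("/*.pdf$", "/files/guide.pdf?download=1")) # False
```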

Robots.txt directives compared

A quick reference for the directives you'll actually use, what each one does, and when to reach for it.

Directive | What it does | Common use
User-agent | Names which crawler the following rules apply to | User-agent: * for all crawlers
Disallow | Blocks crawling of a specific path | Admin panels, internal search, staging folders
Allow | Creates an exception inside a blocked path | Permitting one file inside a disallowed folder
Sitemap | Points crawlers to your sitemap's full URL | Helping crawlers discover your site structure
Crawl-delay | Requests a pause between crawler requests | Ignored by Google, honored by some other bots
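
The Sitemap and Crawl-delay lines are easy to read back programmatically. This sketch uses Python's standard-library parser and assumes Python 3.8 or newer, since site_maps() was added in that release. The 10-second delay is an arbitrary illustrative value.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file contents; the delay value is an assumption.
robots_lines = [
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /admin/",
    "",
    "Sitemap: https://example.com/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(robots_lines)

print(parser.site_maps())       # ['https://example.com/sitemap.xml']
print(parser.crawl_delay("*"))  # 10
```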

Generate your robots.txt right now — free

The Rebrixe Robots.txt Generator builds a clean, correctly formatted file from simple toggles and fields — no memorizing syntax, no guessing where slashes and wildcards go.


Frequently asked questions

Does every website need a robots.txt file?
No. If a site doesn't have one, search engines simply assume they're allowed to crawl everything. A robots.txt file is only needed when there's something specific you want to guide crawlers away from or point them toward, like a sitemap.

Does Disallow remove a page from Google?
Not necessarily. Disallow only blocks crawling, not indexing. If other sites link to a blocked page, Google can still index the URL without visiting it, sometimes showing it in results with no description. Use a noindex meta tag for pages that must never appear in search, and leave those pages crawlable so search engines can actually see the tag.

Where should the robots.txt file go?
At the root of the host it applies to and nowhere else, such as https://example.com/robots.txt. A copy placed in a subfolder is ignored, and a file on one subdomain doesn't cover another, so each subdomain needs its own robots.txt file at its own root.

Can I set different rules for different crawlers?
Yes. Rules are grouped under a User-agent line, so a site can write one block targeting a specific crawler's name and a separate block with User-agent: * for every other crawler, each with its own Allow and Disallow rules.

What happens if I make a mistake in robots.txt?
A misplaced Disallow: / can block an entire site from being crawled, and a missing character can make a rule match nothing at all. Because the file affects the whole domain, small errors have outsized consequences, so testing before publishing matters.

Should I add my sitemap to robots.txt?
It's a common and recommended practice. Adding a Sitemap: line pointing to your sitemap's full URL gives crawlers a direct path to your site's structure, even though it's technically optional and sitemaps can also be submitted separately in search console tools.

Does robots.txt keep pages private?
No. Robots.txt is a public, plain-text file that anyone can view by visiting its URL, and it only asks well-behaved crawlers not to visit certain paths. It doesn't restrict human visitors or enforce privacy in any way.
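
The per-crawler grouping described in the answers above can be sketched with Python's standard-library parser. ExampleBot and OtherBot are made-up crawler names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# One group for a named crawler, one catch-all group for everyone else.
robots_lines = [
    "User-agent: ExampleBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /admin/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

checks = [
    ("ExampleBot", "https://example.com/blog/"),
    ("OtherBot", "https://example.com/blog/"),
    ("OtherBot", "https://example.com/admin/settings"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(agent, url, "->", verdict)
```

ExampleBot is blocked everywhere by its own group, while any other crawler falls through to the * group and is only kept out of /admin/.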
