Two lines in a robots.txt file. One says Disallow, the other says
Allow, and it's easy to assume they're simple opposites — block this, permit
that. Then a folder you thought was fully blocked shows up in Search Console as crawled
anyway, or a page you meant to keep open stops getting visited by Googlebot, and it's
clear the relationship between the two directives is less obvious than it looks.
The confusion isn't really about syntax. It's about how these two rules interact when they overlap, what they do and don't control, and why a single misplaced slash can quietly change what gets crawled sitewide.
Disallow tells crawlers not to request a path; Allow tells them they may, even inside a path that Disallow would otherwise block. When both rules could apply to the same URL, the most specific (longest) matching path wins, not whichever line comes first. Neither directive prevents a URL from being indexed — they only control crawling.
What do Allow and Disallow actually mean?
Both directives live inside a User-agent block in robots.txt and
describe URL paths, not files or pages by name. Each one tells a specific crawler, or all
crawlers, how to treat requests that match that path.
- Disallow blocks a path from being requested. Disallow: /admin/ tells a well-behaved crawler not to fetch any URL beginning with /admin/ at all.
- Allow carves out an exception inside a blocked path. Allow: /admin/help.html inside an otherwise disallowed /admin/ folder tells the crawler that this one file is still fair game.
- Nothing listed means nothing is blocked. By default, every path on a site is crawlable. Disallow is the only directive that removes access, so a robots.txt file with zero rules blocks nothing at all.
- Specificity decides conflicts, not order. When a URL matches both an Allow and a Disallow rule, engines that follow the standard robots.txt spec use whichever rule has the longer, more specific matching path.
For example, with Disallow: /private/ and Allow: /private/press-kit.pdf in the same User-agent block, the Allow line's path is longer and more specific than the Disallow line's, so /private/press-kit.pdf stays crawlable while the rest of /private/ stays blocked.
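The longest-match rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: it assumes plain prefix matching with no wildcards, and the function name is_allowed and the rule tuples are hypothetical names chosen for this example. When an Allow and a Disallow path have the same length, it lets Allow win, which is what the robots.txt standard (RFC 9309) recommends.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, path_prefix) tuples, for example
    ("disallow", "/private/"). Simplified: prefix matching only.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are ignored
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on an exact tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/press-kit.pdf"),
]

print(is_allowed(rules, "/private/press-kit.pdf"))  # True
print(is_allowed(rules, "/private/notes.html"))     # False
print(is_allowed(rules, "/about/"))                 # True
```

Reversing the order of the two rules gives the same answers, because the comparison depends only on path length and never on line position.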
Why getting this right matters
Robots.txt sits at the very top of the crawling pipeline. A rule set wrong here doesn't just affect one page — it can silently reshape what a search engine sees across an entire section of a site.
- An overly broad Disallow can hide real content. A single rule like Disallow: /blog without a trailing slash can unintentionally block /blog-archive/ and /blog2/ too, since it's matched as a prefix.
- A missing Allow can trap useful pages. If CSS and JS files sit inside a disallowed folder and no Allow rule frees them, Google may be unable to render the page properly.
- Disallow doesn't guarantee de-indexing. A blocked URL can still surface in search results if other sites link to it, since Google can index a URL without ever crawling its content.
- These directives are advisory, not enforced. Compliant crawlers like Googlebot and Bingbot respect them, but robots.txt provides no actual access control against bots that choose to ignore it.
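The trailing-slash problem from the first point above is easy to see with a plain prefix comparison. The sketch below is only an illustration: blocked_paths is a hypothetical helper, the path list is made up, and wildcards are not handled.

```python
def blocked_paths(disallow_prefix, paths):
    """Return the paths that a single Disallow prefix would match."""
    return [p for p in paths if p.startswith(disallow_prefix)]


paths = ["/blog/", "/blog/post-1", "/blog-archive/", "/blog2/", "/about/"]

print(blocked_paths("/blog", paths))
# ['/blog/', '/blog/post-1', '/blog-archive/', '/blog2/']
print(blocked_paths("/blog/", paths))
# ['/blog/', '/blog/post-1']
```

With the trailing slash, only URLs inside the /blog/ folder match; without it, every path that merely starts with the characters /blog is caught as well.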
Step-by-step: writing Allow and Disallow rules correctly
- List every path that genuinely shouldn't be crawled. Think admin panels, internal search results, staging folders, or duplicate parameterized URLs, not pages you simply don't want ranking.
- Write the broadest Disallow rule first. Block the parent folder, like Disallow: /account/, rather than listing every single file inside it one by one.
- Add Allow rules only for real exceptions. If one file or subfolder inside a blocked path still needs to be crawled, add a more specific Allow line pointing directly at it.
- Double-check trailing slashes and wildcards. /blog and /blog/ match different sets of URLs, and a stray * can widen a rule far beyond what was intended.
- Place the file at the domain root. robots.txt only takes effect at https://yoursite.com/robots.txt; a copy anywhere else in the folder structure is ignored.
- Test specific URLs before publishing. Check the paths you most care about against the rule set to confirm each one resolves the way you expect.
- Re-check after every site restructure. New folders, renamed sections, or a CMS migration can leave old Disallow rules blocking paths that no longer exist, or missing new ones that should be blocked.
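The "test specific URLs before publishing" step can be approximated with a short script that reads a draft file and compares each important path with the outcome you expect. The following is a rough sketch under stated assumptions: it only reads User-agent: * groups, uses prefix matching without the * and $ wildcards, and the names parse_star_group, can_crawl, DRAFT, and EXPECTED are hypothetical. It does not replace the testing tools that search engines provide.

```python
def parse_star_group(robots_txt):
    """Collect (directive, path) rules from `User-agent: *` groups."""
    rules, in_star_group, last_was_agent = [], False, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one group.
            in_star_group = value == "*" or (last_was_agent and in_star_group)
            last_was_agent = True
        else:
            last_was_agent = False
            if in_star_group and field in ("allow", "disallow") and value:
                rules.append((field, value))
    return rules


def can_crawl(rules, path):
    """Longest matching prefix wins; on a tie, Allow wins."""
    matches = [(len(p), d == "allow") for d, p in rules if path.startswith(p)]
    # max() picks the longest prefix; on a tie, True (Allow) sorts above False.
    return max(matches)[1] if matches else True


DRAFT = """\
User-agent: *
Disallow: /account/
Allow: /account/help/
Disallow: /search
"""

# Paths you care about, mapped to whether they should be crawlable.
EXPECTED = {
    "/account/settings": False,
    "/account/help/billing": True,
    "/search?q=shoes": False,
    "/products/shoes": True,
}

rules = parse_star_group(DRAFT)
failures = [p for p, want in EXPECTED.items() if can_crawl(rules, p) != want]
print(failures or "all paths resolve as expected")
```

Keeping the expected outcomes in one table makes the re-check after a site restructure cheap: update the draft, rerun the script, and look at whatever lands in failures.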
Common mistakes with Allow vs Disallow
1. Assuming Disallow removes a page from search results
Disallow only stops crawling. A blocked URL that's already linked from elsewhere can still appear in results. The correct way to keep something out of search entirely is a noindex tag or authentication, not robots.txt. A noindex tag only works if the page stays crawlable, since the crawler has to fetch the page to see the tag.
2. Blocking a folder without freeing the assets inside it
Disallowing a template or theme folder can accidentally block the CSS and JavaScript a page needs to render, causing Google to see a broken or incomplete layout during rendering.
3. Forgetting that paths are case-sensitive
Disallow: /Private/ does not block /private/. If a site uses
inconsistent capitalization in its URLs, each variation needs its own line.
4. Writing an Allow rule that's less specific than the Disallow it's meant to override
Because the longest matching path wins, an Allow rule that's shorter or less precise than the competing Disallow rule simply won't take effect, and the block stays in place.
Real-world examples
How Allow and Disallow are typically combined for common site structures:
Allow: /products/
Allow: /wp-admin/admin-ajax.php
(no Allow needed)
Allow vs Disallow compared
A side-by-side look at what each directive actually does, and where the common misconceptions creep in.
| Aspect | Disallow | Allow |
|---|---|---|
| Primary function | Blocks a path from being crawled | Permits a path, overriding a broader Disallow |
| Needed by default? | Only for paths you want blocked | Optional — only for exceptions |
| Controls indexing? | No — controls crawling only | No — controls crawling only |
| Conflict resolution | Loses to a more specific Allow rule | Wins over a less specific Disallow rule |
| Respected by all bots? | Only compliant crawlers | Only compliant crawlers |
Frequently asked questions
Paths in robots.txt are case-sensitive: /Folder/ and /folder/ are treated as two different paths and need to be listed separately if both should be affected.