How Robots.txt Rules Actually Work
Many SEOs misread how crawlers apply robots.txt rules. Here's the precise logic behind Allow/Disallow matching and the directives around it.
Most Specific Rule Wins
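When both an Allow and a Disallow rule match a URL, Google and other crawlers that follow RFC 9309 apply the rule with the longest matching path. The rule's position in the file doesn't matter. So for /admin/public/page.html, Allow: /admin/public/ beats Disallow: /admin/. If the matching Allow and Disallow rules are equally long, the Allow rule wins.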
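A minimal sketch of longest-match precedence, assuming rules have already been parsed into (directive, path_prefix) pairs. The function name is_allowed and the tuple layout are hypothetical, and wildcards are deliberately left out so only the precedence logic is visible.

```python
from typing import List, Tuple


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Longest-match precedence in the style of RFC 9309 (hypothetical helper).

    rules holds (directive, path_prefix) pairs, e.g. ("disallow", "/admin/").
    Prefixes are compared literally; wildcards are not handled in this sketch.
    """
    best_len = -1
    best_allow = True  # a path that no rule matches is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules don't take part
        allow = directive.lower() == "allow"
        # A longer match wins; on a tie, Allow wins as the less restrictive rule.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len, best_allow = len(prefix), allow
    return best_allow


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed("/admin/public/page.html", rules))  # True: 14-char Allow beats 7-char Disallow
print(is_allowed("/admin/settings", rules))          # False: only Disallow: /admin/ matches
```

Reversing the order of rules gives the same answers, which is the point: position in the file plays no part.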
Wildcards: * and $
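* matches any sequence of characters, including an empty one. $ anchors the rule to the end of the URL. For example, Disallow: /*.pdf$ blocks every URL ending in .pdf but not /pdf-guide/. Because the URL must end exactly at .pdf, a URL with a query string, such as /report.pdf?download=1, is not blocked.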
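One way to model these two wildcards is to translate a rule into a regular expression, sketched below. The helper names rule_to_regex and rule_matches are hypothetical; the approach assumes that only a trailing $ is special and that every other character is matched literally from the start of the path.

```python
import re


def rule_to_regex(pattern: str):
    """Compile a robots.txt path pattern into a regular expression (hypothetical helper).

    '*' becomes '.*' (any sequence, including an empty one); a trailing '$'
    anchors the match to the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # \Z matches only at the very end of the string.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    # re.match anchors at position 0, because rules always match from the path's beginning.
    return rule_to_regex(pattern).match(path) is not None


for path in ["/docs/report.pdf", "/pdf-guide/", "/report.pdf?download=1"]:
    print(path, rule_matches("/*.pdf$", path))
# /docs/report.pdf True
# /pdf-guide/ False
# /report.pdf?download=1 False
```

Dropping the $ (Disallow: /*.pdf) would also catch /report.pdf?download=1, since the rule then only needs ".pdf" to appear somewhere after the leading slash.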
User-agent Matching Order
A crawler first looks for a group whose User-agent line names it. If one exists, it follows only that group's rules, so nothing under User-agent: * carries over. The User-agent: * group is a fallback, used only when no group names the crawler.
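The selection step can be sketched as follows, assuming each group is stored as a (user-agent tokens, rule lines) pair. The select_rules function and that data layout are hypothetical. Following RFC 9309, the sketch compares names case-insensitively and merges several groups that name the same crawler.

```python
from typing import List, Tuple

Group = Tuple[List[str], List[str]]  # hypothetical layout: (user-agent tokens, rule lines)


def select_rules(groups: List[Group], product_token: str) -> List[str]:
    """Return the rules a crawler named product_token should obey (hypothetical helper)."""
    token = product_token.lower()
    # Rules from every group that names this crawler (case-insensitive), merged in order.
    specific = [rule for agents, rules in groups
                if token in (a.lower() for a in agents) for rule in rules]
    if specific:
        return specific  # the * group is ignored entirely
    # Fallback: the * group applies only when no group names the crawler.
    return [rule for agents, rules in groups if "*" in agents for rule in rules]


groups = [
    (["*"], ["Disallow: /private/"]),
    (["Googlebot"], ["Disallow: /drafts/"]),
]
print(select_rules(groups, "googlebot"))  # ['Disallow: /drafts/'] (/private/ is not inherited)
print(select_rules(groups, "Bingbot"))    # ['Disallow: /private/']
```

The first call shows the common surprise: a bot with its own group can crawl /private/ unless that group repeats the rule.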
Empty Disallow = Allow All
Disallow: with no value matches no URLs, so it means "allow everything." This is the standard way to give a bot you've named in its own User-agent group access to the entire site.
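A quick check with Python's standard-library urllib.robotparser, using a hypothetical robots.txt that opens the site to AhrefsBot and closes it to everyone else. Note that this parser applies rules in file order and has no wildcard support, so it is not a faithful model of longest-match precedence; it is used here only because this case involves neither.

```python
import urllib.robotparser

# Hypothetical robots.txt: full access for AhrefsBot, nothing for anyone else.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow:

User-agent: *
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The empty Disallow gives AhrefsBot the whole site...
print(parser.can_fetch("AhrefsBot", "https://example.com/any/page"))     # True
# ...while other bots fall back to the * group, which blocks everything.
print(parser.can_fetch("SomeOtherBot", "https://example.com/any/page"))  # False
```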
Case Sensitivity
The User-agent value is case-insensitive. But paths in Allow and Disallow are case-sensitive. /Admin/ and /admin/ are treated as different paths.
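A tiny illustration of case-sensitive path matching, assuming plain prefix rules with no wildcards or Allow lines. The is_blocked helper is hypothetical; it relies on Python's str.startswith, which compares case-sensitively.

```python
from typing import List


def is_blocked(path: str, disallow_prefixes: List[str]) -> bool:
    # Hypothetical helper: str.startswith compares case-sensitively, as robots.txt path matching does.
    return any(path.startswith(p) for p in disallow_prefixes if p)


print(is_blocked("/admin/users", ["/admin/"]))             # True
print(is_blocked("/Admin/users", ["/admin/"]))             # False: different case, different path
print(is_blocked("/Admin/users", ["/admin/", "/Admin/"]))  # True once both casings are listed
```

If a server answers the same content under several casings, each casing needs its own rule.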
Crawl-delay
Google ignores Crawl-delay, but Bing and some other crawlers honor it. The value is a number of seconds, generally treated as the minimum gap between requests. It's useful for reducing server load from aggressive crawlers such as AhrefsBot.
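Python's urllib.robotparser exposes crawl_delay(), which returns the delay for the group that matches a user agent, or None if that group sets none. The sketch below reads it from a hypothetical robots.txt and spaces requests accordingly; crawl_schedule is a hypothetical helper that assumes the simple "minimum seconds between requests" reading of the directive.

```python
import urllib.robotparser
from typing import List

# Hypothetical robots.txt that slows AhrefsBot down and leaves other bots unrestricted.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_schedule(delay_seconds: float, n_requests: int, start: float = 0.0) -> List[float]:
    """Hypothetical helper: earliest send times for n requests spaced delay_seconds apart."""
    return [start + i * delay_seconds for i in range(n_requests)]


parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("AhrefsBot")
print(delay)                                 # 10
print(parser.crawl_delay("SomeOtherBot"))    # None: the * group sets no delay
print(crawl_schedule(delay or 0, 3))         # [0.0, 10.0, 20.0]
```

A crawler that ignores the directive, as Googlebot does, simply never performs this lookup.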