A marketing site has one job: get pages indexed. A SaaS product has several jobs running on the same domain family at once — a marketing site that wants to rank, an app that customers log into, docs that should be searchable, an internal API, and probably a staging environment nobody meant to make public. A single generic robots.txt file, copied from a template, usually gets this wrong in one of two directions: it blocks too much and your pricing page stops ranking, or it blocks too little and Google starts crawling your login screen and dashboard routes.
Robots.txt for SaaS isn't a one-line fix. It's a small set of deliberate decisions about which parts of your product are content and which parts are application — and those decisions look different for every subdomain you run.
A SaaS robots.txt file should allow crawling of your marketing pages, blog, and docs, while disallowing the logged-in app, admin routes, internal API endpoints, and any staging or dev subdomain. Each subdomain — marketing site, app, docs — needs its own robots.txt at its own root, and the file should always point to your sitemap for the pages you do want indexed.
What robots.txt is, and what it isn't
Robots.txt is a plain-text file at the root of a domain that tells well-behaved crawlers which paths they're welcome to request and which ones to skip. It's a request, not a lock — it works because major search engines choose to respect it, not because it enforces anything.
- It controls crawling, not access. A Disallow rule stops Googlebot from fetching a URL. It does nothing to stop a person from opening that URL directly in a browser.
- It's per-hostname. A rule set at yoursaas.com has no effect on app.yoursaas.com or docs.yoursaas.com — each one is checked separately.
- It uses a small set of directives. User-agent targets a crawler, Disallow blocks a path pattern, Allow carves out an exception, and Sitemap points to your sitemap file.
- It's publicly readable. Anyone can view a site's robots.txt, so it shouldn't be used to list sensitive paths you don't want discovered; doing so can draw attention to them instead.
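The four directives can be seen working together in a short sketch. The hostname, the /app/ path, and the public /app/demo/ exception are hypothetical placeholders. Python's standard-library parser applies the first matching rule in file order, while Google applies the most specific match, so the Allow line is listed first here and both readings agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file for a marketing domain that also hosts the app under /app/.
ROBOTS_TXT = """\
User-agent: *
Allow: /app/demo/
Disallow: /app/
Sitemap: https://yoursaas.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "https://yoursaas.com" + path)


print(allowed("/pricing"))       # no rule matches, so crawling is allowed
print(allowed("/app/settings"))  # matched by Disallow: /app/
print(allowed("/app/demo/"))     # matched by the Allow exception
print(parser.site_maps())        # URLs listed on Sitemap lines
```

Note that the standard-library parser only does prefix matching, so it is a rough local check rather than a stand-in for how any particular search engine evaluates the file.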
Why robots.txt matters more for SaaS
A brochure site can often get away with an empty or near-empty robots.txt. SaaS products can't, because of how much surface area they run on:
- Dashboards look like pages to a crawler. Unless a rule blocks them, Googlebot will attempt to crawl your app's routes just like any other URL, spending crawl budget on screens that should never appear in search results.
- Staging environments are easy to forget. A dev or staging subdomain spun up for testing is still a public URL unless it's explicitly blocked or authenticated, and it competes with your production site for rankings if indexed.
- Docs and marketing content need to stay open. Overly broad Disallow rules — often copied from the app's robots.txt by mistake — can accidentally block the very content pages meant to bring in search traffic.
- Crawl budget is finite even for well-funded sites. Every dashboard, filter combination, or internal search-result URL that gets crawled is a request not spent on your pricing page or latest blog post.
Step-by-step: setting up robots.txt for a SaaS product
- Map every subdomain and path your product uses. List the marketing site, app/dashboard, docs, blog, API, status page, and any staging or dev environments — robots.txt decisions get made per hostname.
- Sort each area into "should rank" or "should not rank." Marketing pages, blog posts, pricing, and docs usually belong in the first group; login screens, account settings, internal tools, and staging belong in the second.
- Write Disallow rules for the app and internal paths. On the marketing domain, block the app path, admin routes, and any internal API path the app calls directly.
- Block staging and dev subdomains entirely. Add a robots.txt with Disallow: / under User-agent: * at the root of every non-production hostname, and keep it separate from the production file.
- Leave docs and blog open, with their own sitemap reference. If docs live on their own subdomain, give it a permissive robots.txt with a Sitemap line pointing to the docs sitemap.
- Add the sitemap directive to each production robots.txt. This isn't required, but it gives crawlers a direct path to the URLs you actually want indexed.
- Test each file with Google Search Console. Use the robots.txt report and URL Inspection tool per verified property to confirm the right paths are blocked and the right ones aren't.
- Re-check robots.txt whenever you ship a new subdomain or major route. A new internal tool or a newly public docs section both change what should be allowed or blocked.
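The steps above can be sketched as a small generator that turns a hostname inventory into one robots.txt per host. The hostnames, paths, and the shape of `INVENTORY` are hypothetical placeholders, and the sketch assumes each host is either fully blocked or open with a few disallowed path prefixes.

```python
# Hypothetical inventory: one entry per hostname, already sorted into
# production ("should rank" somewhere) and non-production hosts.
INVENTORY = {
    "yoursaas.com": {
        "production": True,
        "disallow": ["/app/", "/admin/", "/api/"],
        "sitemap": "https://yoursaas.com/sitemap.xml",
    },
    "docs.yoursaas.com": {
        "production": True,
        "disallow": [],
        "sitemap": "https://docs.yoursaas.com/sitemap.xml",
    },
    "staging.yoursaas.com": {"production": False},
}


def render_robots_txt(config: dict) -> str:
    """Build the robots.txt text for one hostname."""
    lines = ["User-agent: *"]
    if not config["production"]:
        # Non-production hosts block everything.
        lines.append("Disallow: /")
    elif config.get("disallow"):
        # Production hosts block only app and internal paths.
        lines += [f"Disallow: {path}" for path in config["disallow"]]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    if config["production"] and config.get("sitemap"):
        # Point crawlers at the pages that are meant to be indexed.
        lines += ["", f"Sitemap: {config['sitemap']}"]
    return "\n".join(lines) + "\n"


# One file per hostname, each to be served at that host's own /robots.txt.
FILES = {host: render_robots_txt(cfg) for host, cfg in INVENTORY.items()}
```

Keeping the inventory as data means a new subdomain shows up as a new entry to review, rather than as a file someone has to remember to create.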
Common mistakes SaaS teams make
1. Blocking the entire site by accident
A single stray Disallow: / under User-agent: *, often left over from a staging config, blocks every page on the domain from being crawled, including the homepage.
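One way to catch this before it ships is a small check in the deploy pipeline. The function below is a minimal sketch with a hypothetical name; it only looks for a bare `Disallow: /` in a group that applies to `*` and does not account for Allow exceptions that might partly offset it.

```python
def blocks_everything(robots_txt: str) -> bool:
    """Return True if a `User-agent: *` group contains a bare `Disallow: /`."""
    agents = []       # user-agents named by the group currently being read
    in_rules = False  # True once the current group has had a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A user-agent line after rules starts a new group.
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False
```

Run against the production file, a `True` result is a reason to stop the deploy; run against a staging file, `True` is the expected answer.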
2. Treating robots.txt as a security measure
Disallowing a path doesn't restrict who can open it; it only asks compliant crawlers not to fetch it. Sensitive routes still need authentication, and listing them in a public robots.txt can make them easier to find, not harder.
3. Forgetting that subdomains don't inherit rules
A well-configured robots.txt on the marketing domain has zero effect on app.yoursaas.com or docs.yoursaas.com — each subdomain needs its own file checked and maintained separately.
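The per-hostname rule is easy to see by computing which file governs a given page. This is a minimal sketch with a hypothetical helper name and placeholder URLs: only the scheme and host decide which robots.txt applies, never the path.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Scheme and host (including any port) pick the file; path and query do not.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in (
    "https://yoursaas.com/pricing?plan=pro",
    "https://app.yoursaas.com/dashboard",
    "https://docs.yoursaas.com/guides/setup#install",
):
    print(url, "->", robots_txt_url(url))
```

Three pages on three hostnames resolve to three different files, which is why a fix on one of them leaves the other two unchanged.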
4. Blocking CSS and JavaScript
Disallowing asset folders to "save crawl budget" can prevent Google from rendering pages correctly, since it fetches and executes the same resources a browser would to judge layout and content.
5. Combining Disallow with noindex on the same page
If a page is blocked with Disallow, crawlers never fetch it, which means they never see a noindex tag placed on it either — the two directives shouldn't be used together on a page you're trying to keep out of search.
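A quick audit can flag pages where the two collide. The sketch below assumes you already have a list of URLs that carry a noindex tag (from a CMS export or a site crawl, for example); the function name and sample URLs are hypothetical. It uses Python's standard-library parser, which matches by path prefix only, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex_urls(
    robots_txt: str, noindex_urls: list[str], agent: str = "Googlebot"
) -> list[str]:
    """Return the noindex URLs that `agent` is not allowed to fetch.

    A crawler that cannot fetch a page never sees the noindex tag on it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


robots = "User-agent: *\nDisallow: /app/\n"
tagged = [
    "https://yoursaas.com/app/login",  # noindex, but also disallowed
    "https://yoursaas.com/thank-you",  # noindex and crawlable
]
print(hidden_noindex_urls(robots, tagged))
```

For every URL the function returns, pick one approach: remove the Disallow so the noindex can be read, or keep the Disallow and accept that the tag has no effect.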
Real-world examples by subdomain
How the same SaaS product typically configures robots.txt differently across its subdomains:
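As an illustrative sketch only, here is what four such files might look like for a hypothetical product. The hostnames, paths, and the choice to block the whole app subdomain are assumptions, not taken from any real site. The files are held as Python strings so a sample path on each host can be checked with the standard-library parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by the hostname that would serve them.
EXAMPLES = {
    # Marketing site: open, except for app, admin, and internal API paths.
    "yoursaas.com": """\
User-agent: *
Disallow: /app/
Disallow: /admin/
Disallow: /api/

Sitemap: https://yoursaas.com/sitemap.xml
""",
    # Docs: fully open, with its own sitemap.
    "docs.yoursaas.com": """\
User-agent: *
Disallow:

Sitemap: https://docs.yoursaas.com/sitemap.xml
""",
    # App: nothing here is meant to rank (assumption: login lives here too).
    "app.yoursaas.com": """\
User-agent: *
Disallow: /
""",
    # Staging: blocked entirely, and ideally behind authentication as well.
    "staging.yoursaas.com": """\
User-agent: *
Disallow: /
""",
}


def can_crawl(host: str, path: str, agent: str = "Googlebot") -> bool:
    """Check `path` against the example file for `host`."""
    parser = RobotFileParser()
    parser.parse(EXAMPLES[host].splitlines())
    return parser.can_fetch(agent, f"https://{host}{path}")
```

For example, `can_crawl("yoursaas.com", "/pricing")` is allowed while `can_crawl("app.yoursaas.com", "/login")` is not, even though both hosts belong to the same product.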
Blocking methods compared
Robots.txt is one of several ways to keep a URL out of search results — here's how it compares to the others for a SaaS site.
| Method | Blocks crawling | Blocks indexing | Restricts access | Best for |
|---|---|---|---|---|
| Robots.txt Disallow | Yes | Not reliably; a blocked URL can still be indexed if other pages link to it | No | App routes, internal API paths, staging domains |
| Noindex meta tag | No | Yes, reliably | No | Thin pages that should stay crawlable but not ranked |
| Password protection / auth wall | Indirectly | Yes | Yes | Dashboards, account pages, anything genuinely private |
| Robots.txt + noindex combined | — | — | No | Not recommended — Disallow prevents the noindex tag from ever being seen |
Generate your SaaS robots.txt right now — free
The Rebrixe Robots.txt Generator builds a clean, correctly formatted robots.txt file for marketing sites, apps, docs, and staging environments alike. No account, no watermark — just answer a few questions and copy the result.