Robots.txt for SaaS: The Complete Setup Guide

A marketing site has one job: get pages indexed. A SaaS product has several jobs running on the same domain family at once — a marketing site that wants to rank, an app that customers log into, docs that should be searchable, an internal API, and probably a staging environment nobody meant to make public. A single generic robots.txt file, copied from a template, usually gets this wrong in one of two directions: it blocks too much and your pricing page stops ranking, or it blocks too little and Google starts crawling your login screen and dashboard routes.

Robots.txt for SaaS isn't a one-line fix. It's a small set of deliberate decisions about which parts of your product are content and which parts are application — and those decisions look different for every subdomain you run.

Quick answer

A SaaS robots.txt file should allow crawling of your marketing pages, blog, and docs, while disallowing the logged-in app, admin routes, internal API endpoints, and any staging or dev subdomain. Each subdomain — marketing site, app, docs — needs its own robots.txt at its own root, and the file should always point to your sitemap for the pages you do want indexed.

What robots.txt is, and what it isn't

Robots.txt is a plain-text file at the root of a domain that tells well-behaved crawlers which paths they're welcome to request and which ones to skip. It's a request, not a lock — it works because major search engines choose to respect it, not because it enforces anything.

# A minimal SaaS marketing-site robots.txt
User-agent: *
Disallow: /app/
Disallow: /admin/
Disallow: /api/
Allow: /
Sitemap: https://yoursaas.com/sitemap.xml
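
One way to sanity-check a file like the one above is to run it through Python's standard-library urllib.robotparser and ask which paths a crawler may fetch. This is a minimal sketch, not a replica of any search engine's parser: the sample paths are made up for illustration, and Python's parser applies the first matching rule in file order, whereas Google documents picking the most specific (longest) match. For this file both approaches agree.

```python
from urllib.robotparser import RobotFileParser

# The example marketing-site file from the article, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /app/
Disallow: /admin/
Disallow: /api/
Allow: /
Sitemap: https://yoursaas.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise each rule.
for path in ("/pricing", "/blog/launch-post", "/app/dashboard", "/api/v1/users"):
    allowed = parser.can_fetch("*", f"https://yoursaas.com{path}")
    print(f"{path}: {'allowed' if allowed else 'blocked'}")

# site_maps() is available on Python 3.8 and later.
print(parser.site_maps())
```

With this file, the two marketing paths come back allowed and the two application paths come back blocked.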

Why robots.txt matters more for SaaS

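A brochure site can often get away with an empty or near-empty robots.txt. SaaS products can't, because they run on far more surface area: a marketing site, a logged-in app, docs, an API, and often a staging environment, each with different crawling needs.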

📊 Quick stat: The most common robots.txt incident at fast-growing SaaS companies isn't a missing rule. It's a staging robots.txt file, set to block everything, getting copied straight into production during a deploy and quietly de-indexing the marketing site.

Step-by-step: setting up robots.txt for a SaaS product

  1. Map every subdomain and path your product uses. List the marketing site, app/dashboard, docs, blog, API, status page, and any staging or dev environments — robots.txt decisions get made per hostname.
  2. Sort each area into "should rank" or "should not rank." Marketing pages, blog posts, pricing, and docs usually belong in the first group; login screens, account settings, internal tools, and staging belong in the second.
  3. Write Disallow rules for the app and internal paths. On the marketing domain, block the app path, admin routes, and any internal API path the app calls directly.
  4. Block staging and dev subdomains entirely. Add a robots.txt with Disallow: / under User-agent: * at the root of every non-production hostname, and keep it separate from the production file.
  5. Leave docs and blog open, with their own sitemap reference. If docs live on their own subdomain, give it a permissive robots.txt with a Sitemap line pointing to the docs sitemap.
  6. Add the sitemap directive to each production robots.txt. This isn't required, but it gives crawlers a direct path to the URLs you actually want indexed.
  7. Test each file with Google Search Console. Use the robots.txt report and URL Inspection tool per verified property to confirm the right paths are blocked and the right ones aren't.
  8. Re-check robots.txt whenever you ship a new subdomain or major route. A new internal tool or a newly public docs section both change what should be allowed or blocked.
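
The steps above can be sketched as a small generator that keeps one robots.txt per hostname, so the staging file and the production file are never the same artifact. Everything here is illustrative: the HOSTS mapping, the role names, the render_robots helper, and the /sitemap.xml location are assumptions, not part of any particular deploy tool.

```python
# Hypothetical inventory from step 1: hostname -> role decided in step 2.
HOSTS = {
    "yoursaas.com": "marketing",
    "app.yoursaas.com": "app",
    "docs.yoursaas.com": "docs",
    "staging.yoursaas.com": "staging",
}


def render_robots(hostname: str, role: str) -> str:
    """Return robots.txt text for one hostname, based on its role."""
    lines = ["User-agent: *"]
    if role in ("app", "staging"):
        # Steps 3 and 4: nothing on these hosts should be crawled.
        lines.append("Disallow: /")
    elif role == "marketing":
        # Step 3: block application and internal paths, leave the rest open.
        lines += ["Disallow: /app/", "Disallow: /admin/", "Disallow: /api/", "Allow: /"]
    elif role == "docs":
        # Step 5: docs are meant to rank.
        lines.append("Allow: /")
    else:
        raise ValueError(f"unknown role: {role!r}")
    if role in ("marketing", "docs"):
        # Step 6: assumes each open host serves its sitemap at /sitemap.xml.
        lines.append(f"Sitemap: https://{hostname}/sitemap.xml")
    return "\n".join(lines) + "\n"


for hostname, role in HOSTS.items():
    print(f"--- {hostname} ---")
    print(render_robots(hostname, role))
```

Generating each file from the hostname's role, rather than copying files between environments, is one way to make the staging rule hard to ship to production by accident.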

Common mistakes SaaS teams make

1. Blocking the entire site by accident

A single stray Disallow: / under User-agent: * — often left over from a staging config — blocks every page on the domain from being crawled, including the homepage.
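
A deploy check can catch this before it ships. The sketch below uses Python's standard-library parser to ask whether the homepage is still crawlable; blocks_homepage is a hypothetical helper name, and the check covers only the root path rather than every page.

```python
from urllib.robotparser import RobotFileParser


def blocks_homepage(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text stops user_agent from fetching "/"."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


staging_style = "User-agent: *\nDisallow: /\n"
marketing_style = "User-agent: *\nDisallow: /app/\nAllow: /\n"

print(blocks_homepage(staging_style))    # True: the blanket rule blocks the root
print(blocks_homepage(marketing_style))  # False: only /app/ is blocked
```

Failing the production build when this returns True for the marketing domain is one possible guard; the threshold for what counts as a failure is a team decision.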

2. Treating robots.txt as a security measure

Disallowing a path doesn't restrict who can open it; it only asks compliant crawlers not to fetch it. Sensitive routes still need authentication, and listing them in a public robots.txt can make them easier to find, not harder.

3. Forgetting that subdomains don't inherit rules

A well-configured robots.txt on the marketing domain has zero effect on app.yoursaas.com or docs.yoursaas.com — each subdomain needs its own file checked and maintained separately.
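
To make the per-hostname rule concrete, here is a small sketch that derives which robots.txt governs a given URL. The robots_url helper is a made-up name; the point is only that the file location depends on the scheme and host, never on a parent domain.

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("https://yoursaas.com/pricing"))
# https://yoursaas.com/robots.txt
print(robots_url("https://app.yoursaas.com/settings/billing"))
# https://app.yoursaas.com/robots.txt
print(robots_url("https://docs.yoursaas.com/guides/setup"))
# https://docs.yoursaas.com/robots.txt
```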

4. Blocking CSS and JavaScript

Disallowing asset folders to "save crawl budget" can prevent Google from rendering pages correctly, since it fetches and executes the same resources a browser would to judge layout and content.

5. Combining Disallow with noindex on the same page

If a page is blocked with Disallow, crawlers never fetch it, which means they never see a noindex tag placed on it either — the two directives shouldn't be used together on a page you're trying to keep out of search.
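
This conflict is easy to check for mechanically. The sketch below assumes you already have a list of paths that carry a noindex tag (for example from a site crawl or your CMS); hidden_noindex_paths is a hypothetical helper that returns the ones a crawler would never get to read.

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex_paths(robots_txt: str, noindex_paths: list[str],
                         user_agent: str = "Googlebot") -> list[str]:
    """Return the noindex paths that robots.txt also blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in noindex_paths if not parser.can_fetch(user_agent, path)]


robots_txt = "User-agent: *\nDisallow: /app/\n"

# Hypothetical pages that both carry a noindex meta tag.
conflicts = hidden_noindex_paths(robots_txt, ["/app/login", "/old-landing-page"])
print(conflicts)  # ['/app/login']
```

For any path this returns, the usual fix is to pick one mechanism: remove the Disallow so the noindex can be seen, or drop the reliance on noindex and protect the page another way.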

💡 Pro tip: Before every production deploy, diff the robots.txt you're about to ship against the version that's currently live. A one-line change here can silently de-index an entire site, and it rarely shows up in a normal QA pass.
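
A pre-deploy diff along these lines can be done with Python's standard difflib module. This is a sketch under the assumption that both versions are available as text; how you obtain the live copy (an HTTP fetch, the previous build artifact) is left out, and robots_diff is a hypothetical helper name.

```python
import difflib


def robots_diff(live: str, candidate: str) -> list[str]:
    """Return unified-diff lines between the live and candidate robots.txt text."""
    return list(difflib.unified_diff(
        live.splitlines(),
        candidate.splitlines(),
        fromfile="robots.txt (live)",
        tofile="robots.txt (candidate)",
        lineterm="",
    ))


live = "User-agent: *\nDisallow: /app/\nAllow: /\n"
candidate = "User-agent: *\nDisallow: /\n"  # a staging file slipping into the release

changes = robots_diff(live, candidate)
for line in changes:
    print(line)

# An empty list means the file is unchanged; anything else deserves a human look.
if any(line == "+Disallow: /" for line in changes):
    print("WARNING: candidate adds a blanket Disallow: /")
```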

Real-world examples by subdomain

How the same SaaS product typically configures robots.txt differently across its subdomains:

Marketing site (yoursaas.com): mostly open. Allows the homepage, pricing, and blog; disallows /app/, /admin/, and internal API paths.
Application (app.yoursaas.com): fully blocked. Disallow: / at the root, since every route requires a login and none of it should be indexed.
Documentation (docs.yoursaas.com): fully open. Allows all crawling with its own sitemap reference, since docs pages are meant to rank and answer search queries.
Staging (staging.yoursaas.com): fully blocked. Disallow: / plus authentication, so a config mistake in one layer doesn't leave the environment fully exposed.

Blocking methods compared

Robots.txt is one of several ways to keep a URL out of search results — here's how it compares to the others for a SaaS site.

| Method | Blocks crawling | Blocks indexing | Restricts access | Best for |
|---|---|---|---|---|
| Robots.txt Disallow | Yes | Usually, not guaranteed | No | App routes, internal API paths, staging domains |
| Noindex meta tag | No | Yes, reliably | No | Thin pages that should stay crawlable but not ranked |
| Password protection / auth wall | Indirectly | Yes | Yes | Dashboards, account pages, anything genuinely private |

Robots.txt + noindex combined: no, not recommended. Disallow prevents the noindex tag from ever being seen.

Generate your SaaS robots.txt right now — free

The Rebrixe Robots.txt Generator builds a clean, correctly formatted robots.txt file for marketing sites, apps, docs, and staging environments alike. No account, no watermark — just answer a few questions and copy the result.


Frequently asked questions

Does each subdomain need its own robots.txt?
Yes. Crawlers look for robots.txt at the root of each individual hostname, so app.yoursaas.com, docs.yoursaas.com, and the main marketing domain each need their own file even if they belong to the same product.

Does robots.txt keep a page private or out of search results?
No. Robots.txt only asks well-behaved crawlers not to fetch a URL; it does not restrict access. If a page is already indexed, or if a URL is linked to from somewhere else, it can still appear in search unless it's protected by authentication or carries a noindex tag that crawlers are allowed to fetch.

If the app is already behind a login, is a Disallow rule still needed?
It's still worth adding a Disallow rule for the app path. Login screens, error pages, and redirect URLs under that path can otherwise get crawled and occasionally indexed, and blocking the path also saves crawl budget for pages that actually matter.

Why is a staging site showing up in search results?
This usually happens when a staging or dev subdomain was never given its own robots.txt disallowing all crawlers, or when the production robots.txt file was copied to staging without adjustment, leaving it wide open.

Does blocking CSS and JavaScript help SEO?
No, it typically hurts it. Google renders pages the way a browser does, and if it can't fetch the CSS or JS a page depends on, it may misjudge the page's layout or content quality, which can affect how that page ranks.

What's the difference between Disallow and noindex?
Disallow tells crawlers not to request the page at all, so they can't see a noindex tag placed on it. Noindex tells crawlers that already fetched the page not to include it in search results. Blocking a page with Disallow while also relying on noindex to remove it is contradictory, since the crawler never gets far enough to read the tag.

How can a robots.txt file be tested?
Google Search Console's robots.txt report shows the last fetched version of the file for a verified property and flags syntax issues, and the URL Inspection tool shows whether a specific page is blocked, indexed, or eligible for crawling.

Does robots.txt need updating every time a new page is added?
Only if the new page falls under a path pattern that's already being blocked or allowed by an existing rule. A new marketing or blog URL under an already-open path needs no change; a new internal tool or account-only feature under a new path should be reviewed before launch.
