Robots.txt for SaaS: The Complete Setup Guide (2026)

A marketing site has one job: get pages indexed. A SaaS product has several jobs running on the same domain family at once — a marketing site that wants to rank, an app that customers log into, docs that should be searchable, an internal API, and probably a staging environment nobody meant to make public. A single generic robots.txt file, copied from a template, usually gets this wrong in one of two directions: it blocks too much and your pricing page stops ranking, or it blocks too little and Google starts crawling your login screen and dashboard routes.

Robots.txt for SaaS isn't a one-line fix. It's a small set of deliberate decisions about which parts of your product are content and which parts are application — and those decisions look different for every subdomain you run.

Quick Answer

A SaaS robots.txt file should allow crawling of your marketing pages, blog, and docs, while disallowing the logged-in app, admin routes, internal API endpoints, and any staging or dev subdomain. Each subdomain — marketing site, app, docs — needs its own robots.txt at its own root, and the file should always point to your sitemap for the pages you do want indexed.

What robots.txt is, and what it isn't

Robots.txt is a plain-text file at the root of a domain that tells well-behaved crawlers which paths they're welcome to request and which ones to skip. It's a request, not a lock — it works because major search engines choose to respect it, not because it enforces anything.

It controls crawling, not access. A Disallow rule stops Googlebot from fetching a URL. It does nothing to stop a person from opening that URL directly in a browser.
It's per-hostname. A rule set at yoursaas.com has no effect on app.yoursaas.com or docs.yoursaas.com. Each hostname (and, strictly, each protocol and port) is checked separately.
It uses a small set of directives. User-agent targets a crawler, Disallow blocks a path pattern, Allow carves out an exception, and Sitemap points to your sitemap file.
It's publicly readable. Anyone can view a site's robots.txt, so it shouldn't be used to list sensitive paths you don't want discovered — doing so can draw attention to them instead.

# A minimal SaaS marketing-site robots.txt
User-agent: *
Disallow: /app/
Disallow: /admin/
Disallow: /api/
Allow: /

Sitemap: https://yoursaas.com/sitemap.xml
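You can sanity-check a file like this before uploading it with Python's standard-library `urllib.robotparser`. The sketch below parses the example from a string, so no network request is made, and asks whether a generic crawler may fetch a few sample URLs. The `load_rules` helper and the URLs are illustrative. One caveat: this parser applies rules in file order and ignores `*` and `$` wildcards inside paths, while Google uses the most specific matching rule. Treat it as a quick local check for simple prefix rules, not a replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# The minimal marketing-site file from above, held as a string.
MARKETING_ROBOTS = """\
# A minimal SaaS marketing-site robots.txt
User-agent: *
Disallow: /app/
Disallow: /admin/
Disallow: /api/
Allow: /

Sitemap: https://yoursaas.com/sitemap.xml
"""


def load_rules(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content from memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(MARKETING_ROBOTS)
    for url in (
        "https://yoursaas.com/pricing",
        "https://yoursaas.com/blog/launch-post",
        "https://yoursaas.com/app/dashboard",
        "https://yoursaas.com/api/v1/users",
    ):
        verdict = "allowed" if rules.can_fetch("*", url) else "blocked"
        print(f"{verdict:8} {url}")
    # Sitemap lines are collected separately from the user-agent groups.
    print("sitemaps:", rules.site_maps())
```

For this file, the script should report the pricing and blog URLs as allowed, the app and API URLs as blocked, and list the one sitemap URL.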

Why robots.txt matters more for SaaS

A brochure site can often get away with an empty or near-empty robots.txt. SaaS products can't, because of how much surface area they run on:

Dashboards look like pages to a crawler. Without a rule blocking them, Googlebot will crawl any app route it discovers through links or redirects, just like any other URL. That wastes crawl budget on screens that should never be indexed for anonymous visitors.
Staging environments are easy to forget. A dev or staging subdomain spun up for testing is still a public URL unless it's explicitly blocked or authenticated, and it competes with your production site for rankings if indexed.
Docs and marketing content need to stay open. Overly broad Disallow rules — often copied from the app's robots.txt by mistake — can accidentally block the very content pages meant to bring in search traffic.
Crawl budget is finite, and it matters more as a site grows. Every dashboard, filter combination, or internal search-result URL that gets crawled is a request not spent on your pricing page or latest blog post.

📊 Quick stat: One of the most common robots.txt incidents at fast-growing SaaS companies isn't a missing rule. It's a staging robots.txt file, set to block everything, that gets copied straight into production during a deploy and quietly de-indexes the marketing site.

Step-by-step: setting up robots.txt for a SaaS product

Map every subdomain and path your product uses. List the marketing site, app/dashboard, docs, blog, API, status page, and any staging or dev environments — robots.txt decisions get made per hostname.
Sort each area into "should rank" or "should not rank." Marketing pages, blog posts, pricing, and docs usually belong in the first group; login screens, account settings, internal tools, and staging belong in the second.
Write Disallow rules for the app and internal paths. On the marketing domain, block the app path, admin routes, and any internal API path the app calls directly.
Block staging and dev subdomains entirely. Add a robots.txt with Disallow: / under User-agent: * at the root of every non-production hostname, and keep it separate from the production file.
Leave docs and blog open, with their own sitemap reference. If docs live on their own subdomain, give it a permissive robots.txt with a Sitemap line pointing to the docs sitemap.
Add the sitemap directive to each production robots.txt. This isn't required, but it gives crawlers a direct path to the URLs you actually want indexed. Use the sitemap's full absolute URL, not a relative path.
Test each file with Google Search Console. Use the robots.txt report and URL Inspection tool per verified property to confirm the right paths are blocked and the right ones aren't.
Re-check robots.txt whenever you ship a new subdomain or major route. A new internal tool or a newly public docs section both change what should be allowed or blocked.
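The testing and re-checking steps can also be automated outside Search Console, for example as a pre-deploy check. The sketch below assumes you can get each hostname's robots.txt body as a string and that you keep a small table of representative paths with the result you expect for each. The hostnames, paths, and `check_hosts` function are hypothetical. It relies on Python's `urllib.robotparser`, which matches simple path prefixes in file order rather than applying Google's longest-match rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies per hostname. In a real pipeline these might
# come from build output or from https://<host>/robots.txt.
ROBOTS_BY_HOST = {
    "yoursaas.com": "User-agent: *\nDisallow: /app/\nDisallow: /admin/\nDisallow: /api/\n",
    "app.yoursaas.com": "User-agent: *\nDisallow: /\n",
    "docs.yoursaas.com": "User-agent: *\nAllow: /\n",
    "staging.yoursaas.com": "User-agent: *\nDisallow: /\n",
}

# Expected result for a few representative paths (True = crawlable).
EXPECTATIONS = {
    "yoursaas.com": {
        "/": True,
        "/pricing": True,
        "/app/settings": False,
        "/api/v1/me": False,
    },
    "app.yoursaas.com": {"/": False, "/login": False},
    "docs.yoursaas.com": {"/": True, "/guides/quickstart": True},
    "staging.yoursaas.com": {"/": False, "/pricing": False},
}


def check_hosts(robots_by_host, expectations, agent="Googlebot"):
    """Return (host, path, expected, actual) for every path that behaves unexpectedly."""
    failures = []
    for host, cases in expectations.items():
        parser = RobotFileParser()
        # A host with no entry parses as "no rules", meaning everything is crawlable.
        parser.parse(robots_by_host.get(host, "").splitlines())
        for path, expected in cases.items():
            actual = parser.can_fetch(agent, f"https://{host}{path}")
            if actual != expected:
                failures.append((host, path, expected, actual))
    return failures


if __name__ == "__main__":
    problems = check_hosts(ROBOTS_BY_HOST, EXPECTATIONS)
    for host, path, expected, actual in problems:
        print(f"{host}{path}: expected crawlable={expected}, got {actual}")
    print(f"{len(problems)} problem(s) found")
```

If the staging file were copied over the production one, the check would flag the homepage and pricing page as unexpectedly blocked. A CI job could fail the deploy whenever the list is non-empty.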

Try the Rebrixe Robots.txt Generator (free): answer a few questions about your setup and get a ready-to-use robots.txt file.

Common mistakes SaaS teams make

1. Blocking the entire site by accident

A single stray Disallow: / under User-agent: * — often left over from a staging config — blocks every page on the domain from being crawled, including the homepage.
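A cheap guard against this is a deploy-time check that fails if the homepage itself is no longer crawlable. Here is a minimal sketch using Python's `urllib.robotparser`, which handles plain prefix rules like these. The function name `blocks_homepage` is hypothetical, and the default agent string is only an example.

```python
from urllib.robotparser import RobotFileParser


def blocks_homepage(robots_text: str, agent: str = "Googlebot") -> bool:
    """True if the given crawler may not fetch "/" under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, "/")
```

`blocks_homepage("User-agent: *\nDisallow: /\n")` returns True. A file with an empty `Disallow:` line returns False, because an empty value allows everything. That makes it easy to confuse with `Disallow: /`, which blocks everything.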

2. Treating robots.txt as a security measure

Disallowing a path doesn't restrict who can open it; it only asks compliant crawlers not to fetch it. Sensitive routes still need authentication, and listing them in a public robots.txt can make them easier to find, not harder.

3. Forgetting that subdomains don't inherit rules

A well-configured robots.txt on the marketing domain has zero effect on app.yoursaas.com or docs.yoursaas.com — each subdomain needs its own file checked and maintained separately.

4. Blocking CSS and JavaScript

Disallowing asset folders to "save crawl budget" can prevent Google from rendering pages correctly, since it fetches and executes the same resources a browser would to judge layout and content.
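To catch this, you can check the CSS and JavaScript URLs a page loads against the current rules. The sketch below is a minimal version that assumes you have already collected those asset paths, for example from the page's `<link>` and `<script>` tags. The `/static/` layout and the `blocked_assets` function are hypothetical. It uses Python's `urllib.robotparser`, which matches simple path prefixes only.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_text, asset_urls, agent="Googlebot"):
    """Return the subset of asset URLs the crawler is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(agent, url)]
```

With `Disallow: /static/` in place, a page whose stylesheet and bundle live under `/static/` would have both reported. The usual fix is to remove that rule rather than to add exceptions one file at a time.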

5. Combining Disallow with noindex on the same page

If a page is blocked with Disallow, crawlers never fetch it, which means they never see a noindex tag placed on it either — the two directives shouldn't be used together on a page you're trying to keep out of search.
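This conflict can be detected mechanically: find pages whose HTML carries a robots `noindex` meta tag, then check whether robots.txt blocks them. The sketch below assumes you already have each page's HTML keyed by path. The helper names are hypothetical. It uses Python's `html.parser` for the meta tag and `urllib.robotparser`, which handles simple prefix rules, for the crawl check.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values can be None for bare attributes, hence the `or ""`.
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append((attributes.get("content") or "").lower())


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in directive for directive in finder.directives)


def unseen_noindex(robots_text, pages, agent="Googlebot"):
    """Paths whose HTML says noindex but which robots.txt blocks, so the tag is never read."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [path for path, html in pages.items()
            if has_noindex(html) and not parser.can_fetch(agent, path)]
```

Any path this reports needs one of two fixes. Either drop the Disallow rule so crawlers can see the noindex tag, or, if the page is genuinely private, put it behind authentication.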

💡 Pro tip: Before every production deploy, diff the live robots.txt against the previous version. A one-line change here can silently de-index an entire site, and it rarely shows up in a normal QA pass.
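That diff can be scripted with Python's standard `difflib`. The sketch below assumes you have the live file and the candidate file as strings. It produces a reviewable unified diff and flags any newly added bare `Disallow: /` line. The function names are hypothetical. The check is deliberately crude: it ignores which user-agent group the line belongs to, so a flagged change still needs a human look.

```python
import difflib


def robots_diff(live: str, candidate: str) -> list[str]:
    """Unified diff between the live robots.txt and the one about to ship."""
    return list(difflib.unified_diff(
        live.splitlines(),
        candidate.splitlines(),
        fromfile="live/robots.txt",
        tofile="candidate/robots.txt",
        lineterm="",
    ))


def adds_blanket_disallow(diff_lines: list[str]) -> bool:
    """True if the diff adds a bare 'Disallow: /' line (comments and spacing ignored)."""
    for line in diff_lines:
        if not line.startswith("+") or line.startswith("+++"):
            continue  # keep only added lines, skipping the file header
        rule = line[1:].split("#", 1)[0].strip().lower().replace(" ", "")
        if rule == "disallow:/":
            return True
    return False
```

A deploy step could print `robots_diff(...)` for review whenever it is non-empty and stop the release when `adds_blanket_disallow` returns True.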

Real-world examples by subdomain

How the same SaaS product typically configures robots.txt differently across its subdomains:

Area	Hostname	Crawling	Configuration
Marketing site	yoursaas.com	Mostly open	Allows the homepage, pricing, and blog; disallows /app/, /admin/, and internal API paths.
Application	app.yoursaas.com	Fully blocked	Disallow: / at the root, since every route requires a login and none of it should be indexed.
Documentation	docs.yoursaas.com	Fully open	Allows all crawling with its own sitemap reference, since docs pages are meant to rank and answer search queries.
Staging	staging.yoursaas.com	Fully blocked	Disallow: / plus authentication, so a config mistake in one layer doesn't leave the environment fully exposed.
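If your build already knows which role each hostname plays, the per-subdomain files can be generated from that mapping instead of copied by hand. Generating them this way also removes the staging-into-production copy risk. The sketch below assumes the four roles from the table. The hostnames, the `/sitemap.xml` location, and `render_robots` are hypothetical. Unknown roles raise an error rather than silently defaulting to an open file.

```python
# Hypothetical role for each hostname; adjust to your own subdomains.
HOST_ROLES = {
    "yoursaas.com": "marketing",
    "app.yoursaas.com": "app",
    "docs.yoursaas.com": "docs",
    "staging.yoursaas.com": "staging",
}

# Paths blocked on the marketing host (illustrative, not exhaustive).
MARKETING_DISALLOWS = ["/app/", "/admin/", "/api/"]


def render_robots(host: str, role: str) -> str:
    """Build the robots.txt body for one hostname based on its role."""
    if role in ("app", "staging"):
        # Nothing on these hosts should be crawled; staging also needs auth.
        return "User-agent: *\nDisallow: /\n"
    if role not in ("marketing", "docs"):
        raise ValueError(f"unknown role {role!r} for {host}")
    lines = ["User-agent: *"]
    if role == "marketing":
        lines += [f"Disallow: {path}" for path in MARKETING_DISALLOWS]
    # Sitemap must be an absolute URL; the /sitemap.xml location is an assumption.
    lines += ["Allow: /", "", f"Sitemap: https://{host}/sitemap.xml"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for host, role in HOST_ROLES.items():
        print(f"== {host} ({role})")
        print(render_robots(host, role))
```

Each generated body would still be uploaded to the root of its own hostname, and it is worth running through the same allow/block checks as any hand-written file.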

Blocking methods compared

Robots.txt is one of several ways to keep a URL out of search results — here's how it compares to the others for a SaaS site.

Method	Blocks crawling	Blocks indexing	Restricts access	Best for
Robots.txt Disallow	Yes	Usually, not guaranteed	No	App routes, internal API paths, staging domains
Noindex meta tag	No	Yes, reliably	No	Thin pages that should stay crawlable but not ranked
Password protection / auth wall	Indirectly	Yes	Yes	Dashboards, account pages, anything genuinely private
Robots.txt + noindex combined	Yes	Not reliably	No	Not recommended: Disallow prevents the noindex tag from ever being seen

Generate your SaaS robots.txt right now — free

The Rebrixe Robots.txt Generator builds a clean, correctly formatted robots.txt file for marketing sites, apps, docs, and staging environments alike. No account, no watermark — just answer a few questions and copy the result.

Frequently asked questions

Does every subdomain of a SaaS product need its own robots.txt file?

Yes. Crawlers look for robots.txt at the root of each individual hostname, so app.yoursaas.com, docs.yoursaas.com, and the main marketing domain each need their own file even if they belong to the same product.

Can robots.txt stop private customer data from appearing in Google?

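No. Robots.txt only asks well-behaved crawlers not to fetch a URL; it does not restrict access. A blocked URL can still appear in search if it's linked from somewhere else, and a page that's already indexed isn't removed just because it's disallowed. Private data needs authentication. For pages that only need to stay out of results, use a noindex tag on a page crawlers are still allowed to fetch.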

Should the app dashboard be blocked in robots.txt if it already requires login?

It's still worth adding a Disallow rule for the app path. Login screens, error pages, and redirect URLs under that path can otherwise get crawled and occasionally indexed, and blocking the path also saves crawl budget for pages that actually matter.

Why does my staging site keep showing up in Google search results?

This usually happens when a staging or dev subdomain was never given its own robots.txt disallowing all crawlers, or when the production robots.txt file was copied to staging without adjustment, leaving it wide open.

Will blocking CSS and JavaScript in robots.txt improve crawl efficiency?

No, and it usually backfires. Google renders pages the way a browser does, and if it can't fetch the CSS or JS a page depends on, it may misjudge the page's layout or content quality, which can affect how that page ranks.

What's the difference between Disallow in robots.txt and a noindex meta tag?

Disallow tells crawlers not to request the page at all, so they can't see a noindex tag placed on it. Noindex tells crawlers that already fetched the page not to include it in search results. Blocking a page with Disallow while also relying on noindex to remove it is contradictory, since the crawler never gets far enough to read the tag.

How do I know if my SaaS robots.txt is actually working correctly?

Google Search Console's robots.txt report shows the last fetched version of the file for a verified property and flags syntax issues, and the URL Inspection tool shows whether a specific page is blocked, indexed, or eligible for crawling.

Do I need to update robots.txt every time I launch a new product page?

Only if an existing rule already matches the new page's path and gives the wrong result, or if the page lives under a new path that should be blocked. A new marketing or blog URL under an already-open path needs no change; a new internal tool or account-only feature under a new path should be reviewed before launch.
