robots.txt Builder

Generate and validate robots.txt.

About robots.txt Builder

Build and validate robots.txt files. Add user-agents (Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, etc.), allow / disallow paths, and the sitemap URL. The validator flags syntax errors and warns about common mistakes. Drop the output at /robots.txt at your site root.

What a robots.txt should usually have

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart

# Major engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI crawlers — choose to allow or block
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap-index.xml

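The * group is the default for crawlers that have no group of their own. A crawler that matches a named group follows only that group and ignores the * rules, so in the example above Googlebot is not bound by the /admin/ rule; repeat any Disallow lines a named group should honor.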
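
A minimal sketch of how group selection plays out, using Python's standard-library urllib.robotparser as a stand-in for a compliant crawler. It is one parser's behavior, not Googlebot's own code, and the agent name SomeOtherBot is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Trimmed version of the example file: one default group, one named group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot without its own group falls back to the * group.
other_bot_ok = parser.can_fetch("SomeOtherBot", "https://yoursite.com/admin/")

# Googlebot matches its named group, so the * rules are not consulted.
googlebot_ok = parser.can_fetch("Googlebot", "https://yoursite.com/admin/")

print(other_bot_ok, googlebot_ok)
```

If the named group should also keep /admin/ private, it needs its own Disallow: /admin/ line.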

Common patterns

  • Block staging from indexing: User-agent: * with Disallow: / (better: serve an X-Robots-Tag: noindex header)
  • Block private routes: one Disallow line per path, for example /admin/, /api/, /cart
  • Allow major engines, block AI training: explicit allow for Googlebot/Bingbot, disallow for GPTBot/ClaudeBot
  • Point at sitemap: always include the Sitemap: directive
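
The patterns above can be combined programmatically. The following is a rough sketch, not this tool's actual generator: build_robots and the policy list are hypothetical names, and the paths and sitemap URL are the placeholders used elsewhere on this page.

```python
def build_robots(groups, sitemap=None):
    """Render robots.txt text.

    groups: list of (user_agent, allow_paths, disallow_paths) tuples.
    sitemap: optional absolute sitemap URL.
    """
    blocks = []
    for agent, allow, disallow in groups:
        lines = [f"User-agent: {agent}"]
        lines += [f"Allow: {path}" for path in allow]
        # One Disallow line per path; paths are never comma-separated.
        lines += [f"Disallow: {path}" for path in disallow]
        blocks.append("\n".join(lines))
    if sitemap:
        blocks.append(f"Sitemap: {sitemap}")
    return "\n\n".join(blocks) + "\n"


private = ["/admin/", "/api/", "/cart"]
policy = [
    ("*", [], private),
    # Named groups replace the * group, so private paths are repeated.
    ("Googlebot", ["/"], private),
    ("Bingbot", ["/"], private),
    # Opt out of AI training crawlers.
    ("GPTBot", [], ["/"]),
    ("ClaudeBot", [], ["/"]),
]

robots_txt = build_robots(policy, sitemap="https://yoursite.com/sitemap-index.xml")
print(robots_txt)
```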

Mistakes to avoid

  • Robots.txt as a security mechanism: it is not one. The file is public, so it also advertises every path it lists. Anything sensitive needs auth.
  • Disallowing CSS/JS: Google rendering needs them. Don’t block /_next/, /static/, /_astro/, etc.
  • Sitemap URL relative: it must be absolute (https://...).
  • Overlapping allow/disallow: most crawlers honor the most specific (longest) matching path; an explicit allow wins when two matches are equally long.
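
The overlap rule in the last bullet can be sketched as follows. This is an illustration of longest-match precedence with plain prefix rules (no wildcards), under the assumption that the crawler follows that convention; is_allowed is a hypothetical helper, and individual crawlers may behave differently.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled.

    rules: list of (directive, prefix) pairs, where directive is
    "allow" or "disallow". The longest matching prefix wins, and
    "allow" wins when two matching prefixes have the same length.
    """
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        longer = len(prefix) > best_length
        tie_for_allow = len(prefix) == best_length and is_allow
        if longer or tie_for_allow:
            best_length = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed(rules, "/admin/public/logo.png"))  # longer allow rule wins
print(is_allowed(rules, "/admin/users"))            # only the disallow matches
```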

Common workflows

New site launch. Build robots.txt with all major engines allowed, AI training blocked or allowed per policy, sitemap declared.

Audit an existing robots.txt. Paste it in to validate. The tool flags unrecognized directives and common errors.
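
As a rough idea of what such an audit involves, here is a small linter sketch. It is not this tool's validator: lint_robots, the KNOWN set, and the three checks are assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Directives this sketch treats as recognized (an assumption).
KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) problems."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN:
            problems.append((number, f"unrecognized directive '{field}'"))
        elif field == "user-agent":
            seen_agent = True
        elif field in {"allow", "disallow", "crawl-delay"} and not seen_agent:
            problems.append((number, f"'{field}' appears before any User-agent line"))
        elif field == "sitemap":
            parsed = urlparse(value)
            if parsed.scheme not in {"http", "https"} or not parsed.netloc:
                problems.append((number, "Sitemap URL must be absolute"))
    return problems


sample = "User-agent: *\nDisallow: /admin/\nSitemap: /sitemap.xml\nNoindex: /x\n"
for problem in lint_robots(sample):
    print(problem)
```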

Block staging from indexing. During pre-launch, deploy a Disallow: / robots.txt or (better) serve X-Robots-Tag: noindex headers.
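
One way to serve the header on a staging host is a thin wrapper around the application. The sketch below assumes a Python WSGI app; add_noindex and demo_app are hypothetical names, and the same effect is usually achievable in web server or CDN configuration instead.

```python
def add_noindex(app):
    """Wrap a WSGI app so every response carries X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        def start_with_noindex(status, headers, exc_info=None):
            # Copy the headers so the wrapped app's list is not mutated.
            headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_noindex)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real staging application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging build"]


staging_app = add_noindex(demo_app)
```

To try it locally, wsgiref.simple_server.make_server("", 8000, staging_app).serve_forever() serves the wrapped app on port 8000. Enable the wrapper only on staging so production pages stay indexable.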

Frequently asked questions

What does robots.txt actually do?
A polite request to crawlers. Compliant bots (Google, Bing, most AI crawlers) honor it. Malicious crawlers ignore it. For hard blocks, use authentication or rate-limiting.
Wildcard behavior?
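Disallow: /admin/* matches everything under /admin/, but so does plain Disallow: /admin/, because rules are prefix matches; the trailing * is redundant. The wildcard is more useful mid-path, as in Disallow: /*/private/. Most crawlers also accept $ to anchor the end of the URL, as in Disallow: /*.pdf$.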
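
A short sketch of how * and $ can be interpreted, by translating a rule path into a regular expression. rule_matches is a hypothetical helper written for illustration; real crawlers may differ in details such as percent-encoding.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt rule path matches a URL path.

    '*' matches any run of characters; a trailing '$' anchors the
    end of the URL. Without '$' the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


print(rule_matches("/admin/*", "/admin/users"))         # wildcard form
print(rule_matches("/admin/", "/admin/users"))          # plain prefix form
print(rule_matches("/*.pdf$", "/docs/guide.pdf"))       # ends in .pdf
print(rule_matches("/*.pdf$", "/docs/guide.pdf?dl=1"))  # query string follows
```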
How do I block AI crawlers?
Match on these User-agent tokens: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Google-Extended (training data), CCBot (Common Crawl). Give each agent its own group with Disallow: / to opt out.
Sitemap declaration?
Add a Sitemap: https://yoursite.com/sitemap.xml line. It stands on its own rather than belonging to a user-agent group, so it can go anywhere in the file; the bottom is conventional. It points search engines at your sitemap.
Where does robots.txt go?
Always at the site root: /robots.txt. Anywhere else, crawlers ignore it. Each host is treated separately, so a subdomain needs its own file.
Does noindex go in robots.txt?
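No. noindex is a meta tag (or X-Robots-Tag header). robots.txt blocks crawling; noindex blocks indexing. A crawler has to fetch a page to see its noindex, so do not Disallow a URL you want removed from the index.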

Last updated: 2025-01-15