Robots.txt Guide for UK Websites: How to Optimise and Use Examples

So, you’ve got a website in the UK and you keep hearing about this mysterious little file called robots.txt. What’s the deal with it, right? Well, it’s basically your site’s way of politely telling Google and other search engines, “Hey, here’s what you can and can’t look at.” In other words, it’s a tiny text file with a pretty big job, especially when it comes to SEO. Now, the good news is you don’t have to be a tech wizard to get it right. We’ll walk you through everything: why robots.txt matters for UK sites, how to create one that Google loves, and we’ll even throw in 20 real-life examples to make your life easier. So let’s dive into the ultimate robots.txt guide for UK websites—no stress, no jargon overload, just the good stuff!

What Is Robots.txt?

Robots.txt is basically your website’s “house rules” for search engines. It’s a tiny text file that sits in the root of your domain and tells crawlers—like Googlebot—what they can look at, and what they should leave alone. Nothing fancy, no code magic. Just a simple file that helps your site stay organised in Google’s eyes.

Think of it like this: if your website was an office, robots.txt would be the receptionist saying, “Feel free to explore this area, but please avoid that storage room, it’s a mess.”
That’s literally it. One small file, big control.

For UK websites, especially WordPress and small business sites, a clean robots.txt makes a real difference. It keeps Google from wasting time crawling junk pages (like search results, cart pages or random parameter URLs) and focuses it on the important stuff—your core pages that actually bring traffic and sales.

“A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google.” – Google for Developers

How Does Robots.txt Work?

At its core, robots.txt works like a polite conversation between your website and search engine crawlers. When Googlebot visits your site, the very first thing it checks is your robots.txt file—kind of like scanning the building rules before stepping inside.

The file contains simple “allow” and “disallow” instructions that tell crawlers which parts of your website they can access and which areas they should skip. No pressure, no strict enforcement — just clear guidance that most search engines happily follow.
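
To make that concrete, here's roughly what those instructions look like in a real file (a stripped-down sketch using a placeholder domain, based on the recommended setup later in this guide):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yourwebsite.co.uk/sitemap_index.xml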

Here’s the magic:
If your robots.txt is clean and well-structured, Google crawls your site faster, wastes less time on useless URLs, and indexes your important content more efficiently. But if your robots.txt is a mess? Google can end up crawling the wrong stuff, ignoring good pages… or in the worst case, not indexing your site at all.

So while robots.txt looks simple — a few lines of text — it has a pretty big influence on how your website performs in UK search results. And that’s exactly why we’re walking through it step by step.

Why Robots.txt Matters for SEO (Especially in the UK)

Robots.txt might look like the most boring file on your entire website — but in the world of SEO, especially here in the UK, it plays a much bigger role than most people think.

Here’s why it actually matters:

1. It helps Google focus on the pages that truly matter

Every website has pages Google doesn’t need to waste time on — search pages, filter URLs, cart, checkout, login, random parameters, and other digital clutter.
Robots.txt lets you say:
“Google, mate, skip the messy stuff and focus on my real content.”

This keeps your crawl budget clean and efficient — yes, even small UK websites have one.

2. It protects your site from duplicate content chaos

UK e-commerce sites are notorious for generating endless duplicates through filters and parameters.
Google hates that.
A solid robots.txt file helps prevent those URLs from flooding the index and hurting your rankings.

3. It speeds up how fast Google discovers your pages

A well-structured robots.txt helps Google crawl your site faster and smarter.
This means:

  • quicker indexing of new blog posts
  • faster updates when you refresh content
  • improved visibility for local UK search terms

All small wins that add up.

4. It keeps crawlers away from sensitive or useless areas

While robots.txt isn’t a security tool, it does keep bots away from:

  • admin areas
  • staging environments
  • scripts and system folders
  • archive or legacy sections

Less crawling noise = a cleaner, stronger index.

5. It prevents catastrophic SEO mistakes

One tiny line like this:

Disallow: /

…can wipe an entire website out of Google.
And yes — we’ve seen UK businesses accidentally publish this during migrations or redesigns.

A clear, controlled robots.txt reduces the risk of these disasters.

Where Should Robots.txt Be Located?

Robots.txt only works if it sits in a very specific place — the root directory of your domain.
That’s the top-level folder of your website, not a subfolder, not inside WordPress themes, and definitely not hidden somewhere deep in /wp-content/.

To put it simply, Google should be able to access your robots.txt file here:

https://yourwebsite.co.uk/robots.txt

Not here:

  • https://yourwebsite.co.uk/wp-content/robots.txt
  • https://yourwebsite.co.uk/assets/robots.txt
  • https://yourwebsite.co.uk/private/robots.txt

If it’s not in the root, Google will completely ignore it.

Why the root location matters

Search engines always request /robots.txt first before crawling anything else on your site. If the file isn’t in the right spot, bots assume your site has no crawl rules at all, which means:

  • more junk URLs indexed
  • wasted crawl budget
  • slower indexing
  • messy SEO signals

For WordPress users in the UK, you usually have two scenarios:

1. Using the built-in virtual robots.txt

WordPress generates a “virtual” robots.txt automatically when no physical file exists.
It’s simple, but limited — good for beginners, not great for SEO.

2. Uploading your own robots.txt manually

This gives you full control and is recommended for most business websites.
Just upload the file via FTP, cPanel, or your hosting file manager right into the public_html (root) directory.

Once it’s there, test it by visiting:
https://yourwebsite.co.uk/robots.txt
If it loads instantly, you’re good to go.

How Google Reads Robots.txt (Simple Breakdown)

Google reads your robots.txt file before it crawls anything on your website. It’s the very first checkpoint — a quick “what’s allowed and what’s not” scan. The file uses two basic components:

  • User-agent – which crawler the rule refers to
  • Allow / Disallow – what the crawler can or cannot access

When Googlebot lands on your site, it does something like this:

  1. Looks for yourwebsite.co.uk/robots.txt
  2. Reads the rules from top to bottom
  3. Follows the most specific rule that applies to its crawler
  4. Starts crawling the allowed areas of your site

Even though robots.txt isn’t a security tool and doesn’t force Google, the bot almost always respects the instructions. That’s why a clean robots.txt helps guide Google toward your key pages, while keeping it away from your low-value or messy URLs.
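
One nuance worth knowing: each crawler follows only the group that matches it most specifically. If your file contains both a general group and a Googlebot-specific group, Googlebot ignores the general one entirely (a simplified sketch with made-up paths):

User-agent: *
Disallow: /drafts/

User-agent: Googlebot
Disallow: /internal-reports/

# Googlebot only obeys its own group, so /drafts/ stays crawlable for Googlebot,
# while every other crawler is kept out of /drafts/ but free to visit /internal-reports/.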

A simple file, but a lot of influence.

Common Robots.txt Mistakes (And How to Avoid Them)

Robots.txt is simple, but it’s also one of the easiest places to break your SEO without even realising it. A single wrong line can block Google from crawling key pages — or even the entire website. Here are the most common mistakes UK site owners make, plus how to avoid them.

1. Blocking the entire website by accident

This is the nightmare scenario, and it happens more often than you’d think — especially during redesigns and staging migrations.

Disallow: /

This single line tells Google: “Do not crawl anything.”
If this appears on a live website, your rankings will drop to zero.

How to avoid it:
Always double-check your robots.txt before and after launching a new site. Never reuse staging rules on production.


2. Blocking important resources (CSS, JS, images)

Some businesses still block entire folders like /wp-content/ or /wp-includes/.
Google needs access to CSS and JavaScript to properly render your website.
If it can't load your layout, it can't render your pages properly, and that hurts how your site is evaluated and indexed.

How to avoid it:
Never block theme files, scripts or styles unless you fully understand the impact.
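
As a rough before-and-after (the folder names are the WordPress defaults):

# Too aggressive: Google can no longer load your theme's CSS and JS
Disallow: /wp-content/
Disallow: /wp-includes/

# Safer default: only keep crawlers out of the admin screens
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php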


3. Using robots.txt as a “security tool”

Robots.txt is not a protection method.
If you put private or sensitive folders inside robots.txt, you’re basically advertising their location to bots — and hackers.

How to avoid it:
Use password protection, server rules or private directories.
Robots.txt is only for crawler guidance, not for hiding content.


4. Leaving old or useless rules after a redesign

Many UK websites evolve over time — new themes, new structures, new pages.
Robots.txt often stays outdated, blocking or allowing things that no longer exist.

How to avoid it:
Every redesign = audit robots.txt.
Remove old sections, update URLs and simplify the file.


5. Allowing crawl traps to remain open

Sites with filters, parameters or search pages often create huge amounts of duplicate URLs.
If you forget to block these, Google wastes crawl budget on pointless, auto-generated pages.

How to avoid it:
Disallow search results, parameters and unnecessary filter paths whenever possible.
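
As a sketch, a filtered shop might add rules like these; the exact parameter names depend on your theme and plugins, so check your own URLs first (more ready-made patterns appear in the examples section later):

User-agent: *
# On-site search results
Disallow: /*?s=
# Sorting and layered-navigation filters
Disallow: /*?*orderby=
Disallow: /*?*filter=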


6. Forgetting to include your sitemap

It’s a small detail, but adding your sitemap to robots.txt makes crawling more efficient.

Sitemap: https://yourwebsite.co.uk/sitemap_index.xml

How to avoid it:
Add it once and leave it. Google loves structured crawling.


7. Mixing conflicting Allow/Disallow rules

Google doesn't apply rules in the order they're written; for each URL it follows the most specific matching rule, and Allow wins a tie.
Contradictory or duplicated rules make it easy to block something you never intended.

How to avoid it:
Group rules clearly and keep the file tidy.
Less chaos = fewer indexing errors.
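
Here's a quick illustration of how Google resolves overlapping rules (hypothetical paths):

User-agent: *
Disallow: /shop/
Allow: /shop/sale/

# /shop/winter-coats/ is blocked (only Disallow: /shop/ matches)
# /shop/sale/boots/ is crawled (Allow: /shop/sale/ is the longer, more specific match)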


8. Blocking mobile or ad bots unintentionally

Some UK businesses accidentally block AdsBot-Google or other Google-specific crawlers, which ruins:

  • mobile indexing
  • PPC Quality Score
  • page rendering in SERPs

How to avoid it:
Never block Google-specific agents unless you absolutely know why.


Bottom line

Most robots.txt disasters come from:
rushed migrations, copy–paste templates, or old rules nobody remembers adding.
Keeping it clean, simple and intentional is the best long-term SEO strategy.

How to Create a Robots.txt File That Google Actually Loves

Creating a robots.txt file isn’t complicated — but creating one that Google actually likes (and that genuinely helps your SEO) requires a bit more thought. Most UK websites either overdo it with aggressive blocking, or they leave the file empty and hope for the best. Neither approach works.

Here’s how to build a clean, reliable robots.txt that keeps Google happy and your SEO healthy:

1. Keep it simple — Google prefers clarity over creativity

Robots.txt isn’t the place to get fancy.
Google reads it line by line, top to bottom, so a simple, well-structured file will always outperform a messy one.
Stick to clear rules, grouped logically, without duplicates or contradictions.


2. Start with a universal user-agent

Unless you’re dealing with something VERY specific, this is enough:

User-agent: *

This tells all search engine crawlers that the rules below apply equally.
No need to list 20 bots unless you have an advanced setup.


3. Allow core access, block only what’s necessary

Your goal is NOT to hide your website.
It’s to stop Google from wasting time on junk URLs.

Good rules:

  • block admin areas
  • block search pages
  • block parameter spam
  • keep CSS/JS accessible (non-negotiable)

Bad rules:

  • blocking entire folders
  • blocking theme files
  • blocking important pages
  • blocking everything “just in case”

4. Always allow admin-ajax

WordPress needs this to function properly.
Google needs it to render your pages correctly.

Allow: /wp-admin/admin-ajax.php

5. Add your sitemap — Google loves it

This is the single most important line most UK websites forget:

Sitemap: https://yourwebsite.co.uk/sitemap_index.xml

It gives Google a direct map of your structure, improving indexing speed.
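
If you run more than one sitemap (say, a main index plus a news sitemap), list each on its own line:

Sitemap: https://yourwebsite.co.uk/sitemap_index.xml
Sitemap: https://yourwebsite.co.uk/news-sitemap.xml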


6. Don’t use robots.txt for security — ever

If you don’t want people to access something, robots.txt is the wrong tool.
It publicly lists every “secret” folder you try to hide.

Use proper server rules or password protection.
Robots.txt is only for crawler management.


7. Test your robots.txt after uploading

Once the file is live, check it manually:

  • Visit https://yourwebsite.co.uk/robots.txt
  • Use the robots.txt report in Google Search Console (the old robots.txt Tester has been retired)
  • Crawl your site with Screaming Frog

This makes sure nothing important is accidentally blocked.
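
If you'd rather script the check, Python's built-in urllib.robotparser can run a batch of URLs against your live file. Treat it as a rough sanity check only: the standard-library parser ignores wildcard patterns and resolves Allow/Disallow overlaps differently from Google, and the domain below is a placeholder.

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (swap in your own domain)
rp = RobotFileParser()
rp.set_url("https://yourwebsite.co.uk/robots.txt")
rp.read()

# Check a few representative paths the way a "Googlebot" user-agent would
for path in ["/", "/wp-admin/", "/blog/sample-post/"]:
    url = "https://yourwebsite.co.uk" + path
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(f"{path} -> {verdict}")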


8. Keep it updated with your website’s structure

If your site changes — new pages, new shop system, redesign, migration —
your robots.txt MUST change too.

Most SEO issues we fix in the UK come from outdated rules left over from old sites.

While every site is a bit different, most UK businesses — especially WordPress, service websites, portfolios and small e-commerce stores — can safely use one clean, reliable robots.txt setup. It keeps Google happy, avoids index bloat, and protects your crawl budget without blocking anything important.

Here’s the version we recommend for 90% of UK websites:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: /*?replytocom=
Disallow: /cgi-bin/

Sitemap: https://yourwebsite.co.uk/sitemap_index.xml

Why this setup works so well

  • Keeps admin pages out of Google — no need for them in the index.
  • Allows admin-ajax — crucial for WordPress functionality and proper rendering.
  • Blocks search result pages — they create unnecessary duplicates.
  • Blocks old WordPress junk like trackbacks and replytocom URLs.
  • Keeps crawl paths clean without restricting valuable content.
  • Includes the sitemap — giving Google a direct roadmap of your site.

When you might need a custom setup

If your website has:

  • a large e-commerce catalogue
  • advanced filters
  • multiple language versions
  • a custom CMS
  • internal apps or portals
  • complex parameter structures

…then you’ll want a slightly more advanced robots.txt.
Don’t worry — you’ll find plenty of examples in the next section.

Before we jump into the examples…

Double-check your live robots.txt after uploading it.
A single typo can block entire parts of your website — or everything.

20 Real Robots.txt Examples You Can Use Right Away

Below you’ll find the most useful robots.txt examples for real UK websites — WordPress, e-commerce, blogs, agencies and local businesses. Copy, adjust and paste directly into your site.

1. Standard WordPress robots.txt (Safe Default)

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yourwebsite.co.uk/sitemap_index.xml

2. Block Search Pages (Highly Recommended)

User-agent: *
Disallow: /?s=

3. Block WordPress Junk URLs

User-agent: *
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: /*?replytocom=

4. Block Cart, Checkout & Account Pages (E-commerce)

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/

5. Block Parameter URLs (Filters, Sorting, etc.)

User-agent: *
Disallow: /*?*orderby=
Disallow: /*?*filter=
Disallow: /*?*price=

6. Allow Everything (Minimal Setup)

User-agent: *
Allow: /

Sitemap: https://yourwebsite.co.uk/sitemap.xml

7. Block Entire Website (Staging Environments)

User-agent: *
Disallow: /

8. Block Specific Crawlers (Ahrefs, Semrush, etc.)

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

9. Allow Only Googlebot

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

10. Block Image Indexing

User-agent: Googlebot-Image
Disallow: /

11. Protect PPC — Allow Google AdsBot Only

User-agent: *
Disallow: /

User-agent: AdsBot-Google
Allow: /

12. Block a Specific Folder

User-agent: *
Disallow: /private-files/

13. News Website Setup

User-agent: *
Disallow: /tag/
Disallow: /wp-login.php

Sitemap: https://yourwebsite.co.uk/news-sitemap.xml

14. Agency/Portfolio Website

User-agent: *
Disallow: /wp-login.php
Disallow: /wp-register.php

15. Blog-Only Website (Open Access)

User-agent: *
Allow: /

16. Block Pagination (SEO Cleanup)

User-agent: *
Disallow: /page/

17. WooCommerce Advanced Filtering Block

User-agent: *
Disallow: /product-tag/
Disallow: /product-category/*?*

18. Block Feed & Comment URLs

User-agent: *
Disallow: /feed/
Disallow: /comments/

19. Block Old or Backup Versions of the Site

User-agent: *
Disallow: /old-site/
Disallow: /backup/

20. Full SEO-Optimised Robots.txt for Most UK Businesses

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /cgi-bin/
Disallow: /xmlrpc.php
Disallow: /trackback/
Disallow: /*?replytocom=
Disallow: /feed/

Sitemap: https://yourwebsite.co.uk/sitemap_index.xml

Conclusion

Robots.txt may be small, but it has a massive impact on how Google crawls, understands and ranks your website. A clean, well-structured file keeps your crawl budget focused, prevents duplicate content issues and gives Google a much clearer picture of what your site is really about. And the best part? Once you understand the basics, maintaining a great robots.txt is incredibly easy.

Whether you’re running a local business site, a growing e-commerce store or a content-heavy blog, the examples in this guide should give you everything you need to build a setup that Google actually respects — without overcomplicating things or breaking your SEO by accident.

And if you ever decide to refresh your website, improve performance or fix long-standing technical issues, remember: a strong technical foundation always pairs perfectly with a well-crafted design. That’s exactly where our web design expertise comes in — helping UK businesses build faster, cleaner and more future-proof websites.

If you want your site to look great and play perfectly with Google — you know where to find us.

Don’t be green when it comes to robots.txt

What exactly is robots.txt and what does it do?
It's a simple text file placed in the root of your domain that tells search engine crawlers which paths they can access and which ones they should avoid. It guides crawling — not indexing.

Where should robots.txt be located on my website?
Always in the root directory, accessible at: https://yourdomain.co.uk/robots.txt. Your server should return HTTP 200 and Content-Type: text/plain.

Does "Disallow" remove a page from Google search results?
No. Disallow only blocks crawling. A URL can still appear in Google's index if other pages link to it. To remove URLs, use noindex or the X-Robots-Tag header.

How do I add my sitemap to robots.txt?
Just add the full URL on its own line, like: Sitemap: https://yourdomain.co.uk/sitemap_index.xml. You can include multiple Sitemap lines.

Do subdomains need their own robots.txt file?
Yes. Each subdomain has its own robots.txt — e.g. blog.yourdomain.co.uk/robots.txt is separate from www.yourdomain.co.uk/robots.txt.

Should I block CSS and JS in robots.txt?
No. Search engines need CSS and JavaScript to properly render your pages. Blocking them can harm SEO, UX and indexing quality.

What's the best way to create robots.txt in WordPress?
Use an SEO plugin like Rank Math, Yoast or SEOPress. Avoid mixing plugin-generated robots.txt with a manual file to prevent conflicts.

How can I test my robots.txt file?
Check it directly at /robots.txt. In Google Search Console, use URL Inspection and look for "Crawl allowed: Yes". Tools like Screaming Frog also let you crawl with a custom robots.txt.

Should I block UTM, fbclid or gclid parameters?
Often yes — they create duplicate URLs. Examples: Disallow: /*?utm_, Disallow: /*?fbclid=, Disallow: /*?gclid=.

Do "Crawl-delay", "Host" or "Noindex" work in robots.txt?
Google ignores Crawl-delay and Host. Noindex in robots.txt also doesn't work — use meta or X-Robots-Tag.

What if I accidentally publish "Disallow: /" on my live site?
Remove the rule immediately, clear cache/CDN, re-fetch key URLs in Search Console and resubmit your sitemap to speed up recovery.

Should I block search pages, filters and sorting parameters?
Often yes, to reduce duplication. Examples: Disallow: /*?s=, Disallow: /*?orderby=, Disallow: /*?filter_.

How do I block PDFs from crawling or appearing in search?
In robots.txt, Disallow: /*.pdf$ blocks crawling. To keep PDFs out of search results, serve them with an X-Robots-Tag: noindex header, since you can't add a meta tag to a PDF file.

Where can I find Google's official robots.txt guidelines?
In Google's Search Central documentation, which includes the official robots.txt introduction and full syntax reference.
Chris

Web designer and SEO/UX specialist with 20 years of experience. I combine visual sense with technical SEO and performance optimization (Core Web Vitals) to make every project intuitive, fast, and ready to rank high - and coffee is my most loyal framework. ☕