What is GPTBot?

GPTBot is OpenAI’s web crawler. The content it collects is used to train OpenAI’s models and to provide real-time information in ChatGPT (and similar features). And yes, GPT-4 can already access the internet.

Its full user-agent string is:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
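
If you want to see how often GPTBot visits your site, you can look for the user-agent string above in your server logs or request headers. The following is a minimal Python sketch, not an official OpenAI method: the function name is_gptbot and the pattern GPTBOT_PATTERN are made up for illustration, and it assumes that matching the "GPTBot" token is enough. Keep in mind that any client can send this header, so a match only tells you what the visitor claims to be.

```python
import re

# Hypothetical helper for illustration; not part of any OpenAI tooling.
# Assumption: the "GPTBot" product token (optionally followed by a
# version such as "/1.0") is a sufficient signal in the User-Agent header.
GPTBOT_PATTERN = re.compile(r"\bGPTBot(?:/[\d.]+)?", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be GPTBot."""
    return bool(GPTBOT_PATTERN.search(user_agent or ""))


if __name__ == "__main__":
    ua = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
        "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
    )
    print(is_gptbot(ua))
```

You could call is_gptbot on each User-Agent value in your access log to count GPTBot requests before deciding whether to block it.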

How do you make sure GPTBot can’t crawl your website?

You can use robots.txt to block GPTBot from accessing your website, or parts of it. To block GPTBot from your entire site, add the following to your site’s robots.txt:

User-agent: GPTBot
Disallow: /
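
To check that a robots.txt rule does what you expect before you publish it, you can test it locally. The sketch below uses Python’s standard urllib.robotparser module. The helper name allowed, the constants BLOCK_ALL and BLOCK_PART, the /private/ path and the example.com URLs are all assumptions made up for illustration. It covers both cases mentioned above: blocking the whole site and blocking only part of it.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule from this article: block GPTBot from the entire site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical variant: block GPTBot only from one directory.
BLOCK_PART = """\
User-agent: GPTBot
Disallow: /private/
"""

if __name__ == "__main__":
    # example.com is a placeholder domain, not a real target.
    print(allowed(BLOCK_ALL, "GPTBot", "https://example.com/"))
    print(allowed(BLOCK_ALL, "Googlebot", "https://example.com/"))
    print(allowed(BLOCK_PART, "GPTBot", "https://example.com/private/page"))
    print(allowed(BLOCK_PART, "GPTBot", "https://example.com/blog/"))
```

Note that robots.txt is a request, not an access control: it only works for crawlers that choose to honor it.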

According to one analysis, at least 15% of the top 100 websites and 7% of the top 1,000 websites are blocking GPTBot.

Why NOT block GPTBot?

At Otterly.AI we would not recommend blocking GPTBot from accessing your website. Here’s why:

  • You might get even more traffic, because LLMs use and display your content and users discover it there
  • Your users are potentially better informed before they even land on your website
  • You avoid false information and can use the opportunity to educate the market with YOUR content, on LLMs too

Why block GPTBot from accessing your website?

The short version:

  • Protect your privacy, content and security
  • Protect your IP rights
  • Protect your own website experience

Disclaimer: Blocking GPTBot doesn’t mean that LLMs “forget” you, your brand or your content. It only protects your content from future use.

Otterly.AI for Brand Monitoring and Compliance

If you are now wondering how your brand is mentioned on LLMs (e.g. GPT-4 or Google Gemini), feel free to try Otterly.AI for free. You can monitor any prompt and see whether your brand is mentioned and how it is referenced.


