Cloudflare Launches Free Tool to Prevent AI Bots from Scraping Website Data

Image credits: Cloudflare



Written by: Shipra Sanganeria, Cybersecurity & Tech Writer

Fact-checked by: Justyn Newman, Head Content Manager

Cloudflare released a new free tool last week that lets its customers block AI companies’ web crawlers from scraping their website content to train generative AI models. The one-click solution is now available to all customers, including those on free plans.

In a blog post introducing the tool, the company said that its AI bot-combating tool, launched in September last year, has helped customers block identified crawlers that adhere to established protocols while scraping content. According to Cloudflare’s internal data, 85.2% of its customers opted to block these AI bots.

However, amid the growing demand for training data for generative AI models, Cloudflare identified instances where AI companies have circumvented these measures by using scrapers that appear to be legitimate visitors.

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the blog post stated. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”

To address this problem and provide a solution that helps clients block all AI crawlers, including those following proper scraping protocols, Cloudflare introduced this easy one-click feature.

Cloudflare states that this dynamic tool will update automatically as the company identifies new fingerprints of bots that widely scrape the web for training large language models.

Cloudflare Study Identifies Popular Active AI Bots

In the blog post, Cloudflare also shared internal study findings that highlighted popular AI bots attempting to access websites within its network. Among these, the most notable was the Bytedance-owned Bytespider bot, which accessed 40.4% of websites protected by Cloudflare, closely followed by OpenAI-managed GPTBot at 35.46%. The other two AI crawlers ranking in the top 4 were Amazon’s Amazonbot and Anthropic’s ClaudeBot.
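The crawlers named above identify themselves in the User-Agent header, which is what makes protocol-abiding bots straightforward to block. The following is a minimal sketch of that kind of check, not Cloudflare's implementation. The function name and token list are assumptions made for illustration, and a test like this cannot catch the disguised scrapers described earlier, which is why Cloudflare says it relies on bot fingerprints instead.

```python
# Illustrative only: the names below are hypothetical, not part of any Cloudflare API.
# The tokens are the crawler names mentioned in the article.
AI_CRAWLER_TOKENS = ("Bytespider", "GPTBot", "Amazonbot", "ClaudeBot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known AI crawler token.

    This only detects bots that announce themselves; a scraper that
    spoofs a browser User-Agent will pass this check unnoticed.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)
```

A site operator could call such a helper in request-handling code and return an error response when it yields True, while treating a False result as "not self-identified" rather than "not a bot".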

In conclusion, Cloudflare emphasized that as AI technology continues to evolve, some companies may employ obfuscation techniques to evade bot-detection measures. Nevertheless, with its new mechanisms, the company aims to offer effective tools for managing AI bot access. It believes that with such tools, content creators will have greater control over how their content is utilized for AI training and applications.
