Monday, September 16, 2024

Major Websites Reject Apple’s AI Scraping Practices

In Short:

A recent study by journalist Ben Welsh found that over 25% of surveyed U.S. news sites block Applebot-Extended, while 53% block OpenAI’s bot and nearly 43% block Google-Extended. Some publishers block AI bots whose owners they have no partnership with, while others allow access under commercial agreements. As AI usage grows, deciding which bots to block has become a significant focus for media executives.


In a recent analysis, data journalist Ben Welsh found that about one-quarter of the news websites he surveyed (294 of 1,167 primarily English-language, US-based publications) block Applebot-Extended. By comparison, OpenAI’s bot is blocked by 53 percent of the sites in the sample, and Google’s AI-specific bot, Google-Extended, which was introduced in September 2023, is blocked by nearly 43 percent. The gap suggests that Applebot-Extended may not yet be on the radar of many publishers, although Welsh noted that its blocking percentage has been “gradually moving” upward since he began tracking it.

Welsh is engaged in an ongoing project that examines how news organizations interact with major AI agents. He commented, “A bit of a divide has emerged among news publishers about whether or not they want to block these bots.” Welsh acknowledged that the reasons behind each organization’s decisions are not entirely clear, but he speculated that licensing arrangements—where publishers are compensated for allowing bot access—could play a significant role.

Last year, The New York Times reported that Apple was seeking to establish AI-related deals with various publishers. Since then, entities such as OpenAI and Perplexity have forged partnerships with multiple news outlets and popular platforms. Jon Gillham, founder of Originality AI, remarked, “A lot of the largest publishers in the world are clearly taking a strategic approach,” implying that some are withholding access to data until a partnership agreement is finalized.

Supporting Gillham’s assertion, some publishers have changed their blocking strategies after striking deals. For example, Condé Nast websites had previously restricted OpenAI’s web crawlers but reversed this decision following a partnership announcement last week. Juliana Clifton, a spokesperson for BuzzFeed, stated that the company currently blocks Applebot-Extended and other AI web-crawling bots unless a partnership agreement, typically one involving payment, has been established with their owners.

Maintaining an accurate block list can be challenging because robots.txt must be edited by hand and new AI agents appear constantly. Gavin King, founder of Dark Visitors, noted that many organizations struggle to work out which bots to block. Dark Visitors offers a freemium service that updates clients’ robots.txt files, and the company counts publishers as a significant segment of its clientele because of copyright concerns.
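To make the mechanics concrete, here is a minimal sketch of how a site's robots.txt rules can be checked against a list of AI crawler names using Python's standard library. The sample file contents, the example.com URL, and the helper name blocked_agents are illustrative assumptions, not taken from any publisher's site or from Welsh's project. GPTBot is used here as the user-agent token for OpenAI's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, written for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Crawler tokens to check; the list is an assumption and would need
# updating as new AI agents are introduced.
AI_AGENTS = ["Applebot-Extended", "GPTBot", "Google-Extended"]


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(SAMPLE_ROBOTS_TXT, AI_AGENTS))
# prints ['Applebot-Extended', 'GPTBot']
```

In this sample, Google-Extended has no rule of its own, so it falls through to the wildcard entry and is allowed. That is the kind of gap a manually maintained block list can leave when a new crawler appears.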

The robots.txt file, long regarded as a minor detail left to webmasters, has gained considerable significance as AI crawlers proliferate and has become a strategic concern for media executives. Executives at major media companies are reportedly now directly involved in deciding which bots to block.

Some organizations have stated explicitly that they block AI scraping tools when no partnership is in place. For instance, Lauren Starke, senior vice president of communications at Vox Media, stated, “We’re blocking Applebot-Extended across all of Vox Media’s properties, as we have done with many other AI scraping tools when we don’t have a commercial agreement with the other party.” She emphasized the company’s commitment to safeguarding the integrity and value of its published content.
