In Short:
Companies such as Google and Perplexity AI use web crawling to ingest entire articles for their AI summarization tools. WIRED senior writer Kate Knibbs discusses the controversy over Common Crawl, and Randall Lane of Forbes describes how Perplexity AI repurposed a Forbes article without permission. Web crawling has grown more controversial as its output is increasingly fed into AI products.
Web Crawling and AI Controversy
Web crawling, the long-standing practice of automatically fetching and indexing pages on the internet, has traditionally been used by search engines like Google and by organizations like the Internet Archive and Common Crawl to organize content and make it searchable. Recently, however, the practice has become controversial because companies such as Google and Perplexity AI are using crawlers to extract entire articles and feed them to their summarization tools.
Gadget Lab Discussion
This week on Gadget Lab, WIRED senior writer Kate Knibbs discusses the implications of web crawling and the controversy surrounding Common Crawl. The show also features a conversation with Forbes’ chief content officer and editor Randall Lane, focusing on how Perplexity AI repurposed a Forbes article without authorization or proper credit.
Show Notes
For more information, read Kate’s article on publishers challenging Common Crawl over AI training data. Also, read Randall’s piece on how Perplexity AI reused content from Forbes writers.
Recommendations
Randall recommends the National Thoroughbred League for horse racing enthusiasts. Kate recommends the book Victim by Andrew Boryga. Lauren recommends the show Hacks on Max.
Stay Connected
Follow Randall Lane (@RandallLane), Kate Knibbs (@Knibbs), and other podcast hosts on social media for more updates. The show is produced by Boone Ashworth (@booneashworth). Our theme music is by Solar Keys.
How to Listen
To listen to the podcast, use the audio player on this page. You can also subscribe for free on Apple Podcasts, Google Podcasts, or Spotify, or through the RSS feed.