Cloudflare Accuses Perplexity of Stealth-Crawling Blocked Sites
In a blog post, Cloudflare has accused AI search startup Perplexity of using stealth crawling methods to access websites that explicitly forbade its bots.

Cloudflare's post describes how Perplexity allegedly used undeclared crawlers that ignored robots.txt directives and rotated user agents and IP addresses to bypass blocks.
According to Cloudflare, the controversy began after numerous customers reported that Perplexity continued to fetch content even when its declared bots, PerplexityBot and Perplexity‑User, were explicitly blocked via robots.txt or Web Application Firewall rules. The company then ran controlled tests on newly created domains that were not publicly discoverable and that carried strict no‑crawl settings.
Despite these protections, Perplexity’s AI still returned detailed summaries about those domains, leading Cloudflare to conclude that hidden crawlers had bypassed the blocks.
Cloudflare notes that once the declared crawlers were blocked, Perplexity switched to generic user agent strings emulating Google Chrome on macOS. These stealth crawlers reportedly cycled through IP addresses and Autonomous System Numbers not associated with the company’s known infrastructure to evade detection.
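To illustrate why rules that name a specific bot depend on that bot identifying itself honestly, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the example.com URL, the Chrome user agent string, and the allowed() helper are all assumptions made for illustration; none are taken from Cloudflare's report. Note also that robots.txt is advisory: this code only reports what the rules say, and a crawler has to choose to check and obey them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the two declared Perplexity bots.
NAMED_RULES = """\
User-agent: PerplexityBot
User-agent: Perplexity-User
Disallow: /
"""

# Hypothetical strict no-crawl robots.txt that applies to every client.
WILDCARD_RULES = """\
User-agent: *
Disallow: /
"""

# Illustrative generic browser string; not taken from Cloudflare's report.
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

URL = "https://example.com/article"  # placeholder URL


def allowed(robots_txt: str, user_agent: str, url: str = URL) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for ua in ("PerplexityBot", "Perplexity-User", GENERIC_CHROME_UA):
        # Columns: user agent prefix, allowed by named rules, allowed by wildcard rules
        print(ua[:20], allowed(NAMED_RULES, ua), allowed(WILDCARD_RULES, ua))
```

In this sketch, the named rules deny both declared bots but do not apply to a client presenting a generic browser string, while the wildcard rule denies every client that actually consults it. Neither file can stop a crawler that never reads robots.txt, which is why network-level detection of the kind Cloudflare describes comes into play.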
The stealth activity was observed across tens of thousands of domains and involved millions of requests per day. As a consequence of these findings, Cloudflare has de‑verified Perplexity’s bots and introduced new bot management heuristics to block the stealth crawling behavior across its network of sites.
Cloudflare CEO Matthew Prince did not hold back, characterizing the behavior as similar to cybercriminal tactics and stressing the need to identify and block actors who disregard website directives.

Perplexity strongly disputed the allegations. In a blog post and statements to media, the company called Cloudflare’s report a “publicity stunt” and claimed it misrepresented how its AI assistants fetch data. Perplexity argued that it only accesses web pages on direct user demand, and that much of the cited traffic was generated by a third‑party automation service.
Perplexity’s response also emphasized that the content retrieved was used in real time to answer user questions, not stored or repurposed for model training.