The internet is no longer primarily a human space. Cloudflare data now shows that bots account for 57.5% of all HTTP traffic — and bot traffic has officially surpassed human traffic for the first time in the internet’s history. So who’s generating all that traffic? Five AI bots dominate the landscape — and they’re not all doing the same thing.

What Is An AI Bot?
A bot (short for robot) is an automated program that traverses the internet without a human directly controlling each action. Unlike a person typing a URL and clicking around, a bot executes instructions programmatically: crawling pages, extracting data, following links, and sending requests at machine speed and scale.
AI bots, specifically, are bots built to serve AI systems. They collect training data for large language models, power search and retrieval for AI assistants, or act as agentic AI: autonomous agents that complete tasks on a user’s behalf. The latter category is the fastest-growing. When an AI agent shops, researches, or browses for a user, it can hit thousands of pages to accomplish what a human would do in five clicks. That multiplier effect, on top of large-scale crawling for training data, helped flip the bot-to-human traffic ratio.
AI bots typically identify themselves to web servers via a “user-agent” string, which is one of the signals Cloudflare uses to classify and track them. The string is self-reported, so a bot can omit or spoof it. Not all bots are transparent about this, but the major ones operated by Google, Meta, Anthropic, OpenAI, and others do declare themselves.
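To make the user-agent mechanism concrete, here is a minimal Python sketch that checks a User-Agent header for the tokens declared by the five bots ranked below. The lookup table, `BotInfo`, and `classify_user_agent` are hypothetical names for illustration. Plain substring matching only reads what a client claims, so a spoofed header would pass.

```python
from typing import NamedTuple, Optional


class BotInfo(NamedTuple):
    """A declared AI bot and the company that operates it."""
    name: str
    operator: str


# Hypothetical lookup table: lowercase user-agent token -> bot info.
KNOWN_AI_BOTS = {
    "googlebot": BotInfo("Googlebot", "Google"),
    "meta-externalagent": BotInfo("Meta-ExternalAgent", "Meta"),
    "claudebot": BotInfo("ClaudeBot", "Anthropic"),
    "bytespider": BotInfo("Bytespider", "ByteDance"),
    "gptbot": BotInfo("GPTBot", "OpenAI"),
}


def classify_user_agent(user_agent: str) -> Optional[BotInfo]:
    """Return the AI bot a User-Agent header declares, or None if no known token appears.

    This trusts the header as sent; it does not verify that the request
    really came from the named operator.
    """
    ua = user_agent.lower()  # tokens are matched case-insensitively
    for token, info in KNOWN_AI_BOTS.items():
        if token in ua:
            return info
    return None


if __name__ == "__main__":
    print(classify_user_agent(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ))  # BotInfo(name='Googlebot', operator='Google')
    # An ordinary browser string contains no bot token, so it is treated as a non-bot.
    print(classify_user_agent(
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    ))  # None
```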
The Top 5 AI Bots By HTTP Traffic Share
Based on Cloudflare’s data tracking HTTP request trends for the five most active AI bots:
| Bot | Company | Share of AI Bot Traffic |
|---|---|---|
| Googlebot | Google | 27.8% |
| Meta-ExternalAgent | Meta | 12.3% |
| ClaudeBot | Anthropic | 11% |
| Bytespider | ByteDance | 10.4% |
| GPTBot | OpenAI | 9% |
Breaking Down Each Bot
1. Googlebot — 27.8%
Googlebot is the dominant force, and it isn’t close. At nearly 28% of AI bot traffic, Google’s crawler operates at a scale no competitor currently matches. Its primary purpose is search — indexing the web so Google’s search engine can surface relevant results. Googlebot has been doing this for over two decades and has the most mature crawling infrastructure in existence. Its lead here reflects Google’s incumbency in web indexing, not just AI ambition.
2. Meta-ExternalAgent — 12.3%
Meta’s crawler comes in second, a notable position for a company whose core products (Facebook, Instagram, WhatsApp) are primarily closed platforms rather than open-web search engines. Meta-ExternalAgent serves mixed purposes: collecting training data for Meta’s AI models (including Llama) and indexing content directly. Link previews are handled by a separate Meta crawler, facebookexternalhit. The 12.3% share signals the scale of Meta’s investment in AI infrastructure and its appetite for web-sourced training data.
3. ClaudeBot — 11%
Anthropic, the company behind the Claude family of AI models, runs ClaudeBot primarily for training data collection. Its 11% share places it within striking distance of Meta, which is significant given that Anthropic is a fraction of Meta’s size. ClaudeBot’s presence reflects how aggressively AI labs have had to crawl the web to fuel model development — a trend that has contributed directly to the overall surge in bot traffic.
4. Bytespider — 10.4%
ByteDance, the Chinese company behind TikTok, operates Bytespider with a mixed purpose profile. It collects data for training and recommendation systems that power TikTok and ByteDance’s broader AI stack. At 10.4%, Bytespider is nearly on par with ClaudeBot, underscoring how seriously Chinese tech companies are building out their own AI training pipelines using open web data — independent of US-based model providers.
5. GPTBot — 9%
OpenAI’s GPTBot rounds out the top five. Despite OpenAI running ChatGPT, which reaches hundreds of millions of users every week, GPTBot comes in last among the five at 9%. Its declared purpose is primarily training data collection, and OpenAI runs separate crawlers for other jobs (OAI-SearchBot for search, ChatGPT-User for fetches triggered by users), so GPTBot’s figure does not capture all of OpenAI’s crawling. The lower share relative to competitors may also reflect OpenAI’s heavier reliance on licensed data partnerships (it has deals with major publishers and Reddit) rather than open crawling alone.
What Are These Bots Actually Doing?
Cloudflare categorizes bot activity by crawl purpose. The five main categories are:
- Search — Indexing web content for search engine results (Googlebot’s primary function)
- Training — Harvesting content to train AI models (ClaudeBot, GPTBot)
- Mixed Purpose — Multiple functions, often combining training and retrieval (Meta-ExternalAgent, Bytespider)
- User Action — Bots acting on behalf of a user (the “agentic” category, and the fastest-growing)
- Undeclared — Bots that don’t specify their purpose
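The category labels above can be combined with the share table to see how much of the top five’s traffic each purpose accounts for. This is a minimal sketch, assuming the table’s percentages are shares of all AI bot traffic and counting each bot under a single purpose (Bytespider under Mixed Purpose, matching its profile earlier in this piece). `SHARES`, `PURPOSE`, and `share_by_purpose` are illustrative names.

```python
from collections import defaultdict

# Shares of AI bot traffic from the table (percent).
SHARES: dict[str, float] = {
    "Googlebot": 27.8,
    "Meta-ExternalAgent": 12.3,
    "ClaudeBot": 11.0,
    "Bytespider": 10.4,
    "GPTBot": 9.0,
}

# Primary crawl purpose per bot, following the category list.
# Assumption: each bot counts under exactly one category.
PURPOSE: dict[str, str] = {
    "Googlebot": "Search",
    "Meta-ExternalAgent": "Mixed Purpose",
    "ClaudeBot": "Training",
    "Bytespider": "Mixed Purpose",
    "GPTBot": "Training",
}


def share_by_purpose(shares: dict[str, float], purpose: dict[str, str]) -> dict[str, float]:
    """Sum each bot's traffic share into its crawl-purpose category."""
    totals: defaultdict[str, float] = defaultdict(float)
    for bot, share in shares.items():
        totals[purpose[bot]] += share
    # Round to one decimal place to strip floating-point noise.
    return {category: round(total, 1) for category, total in totals.items()}


if __name__ == "__main__":
    print(share_by_purpose(SHARES, PURPOSE))
    top_five = round(sum(SHARES.values()), 1)
    print(f"Top five combined: {top_five}%, all other AI bots: {round(100 - top_five, 1)}%")
```

Under those assumptions, Search (27.8%) leads, followed by Mixed Purpose (22.7%) and Training (20.0%). The five bots together cover 70.5% of AI bot traffic, leaving 29.5% for every other AI crawler. Counting Bytespider under Training instead would move 10.4 points from Mixed Purpose to Training.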
The “User Action” category deserves attention. It is still a small slice of total traffic, but it grew nearly 8,000% over 2025. As more AI assistants gain the ability to browse the web autonomously on a user’s behalf, this category will likely overtake training-focused crawling.
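Large percentages are easy to misread, so here is the arithmetic worked out, with $V_{\text{start}}$ and $V_{\text{end}}$ as placeholder symbols for the category’s traffic at the start and end of 2025. Growth of 8,000% means the category ended the year at roughly 81 times its starting volume, not 80:

$$
\text{growth} = \frac{V_{\text{end}} - V_{\text{start}}}{V_{\text{start}}} \times 100\% = 8000\%
\;\Longrightarrow\;
V_{\text{end}} = (1 + 80)\,V_{\text{start}} = 81\,V_{\text{start}}
$$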
Why This Matters
The composition of internet traffic is changing structurally. For businesses and publishers, bots don’t engage with ads, don’t convert, and don’t show up in standard analytics. The web’s economic model — built on human eyeballs — is under pressure.
Cloudflare has already blocked over 416 billion AI bot requests on behalf of website owners and launched a Pay Per Crawl system that lets publishers charge AI crawlers for access. The infrastructure layer is adapting, but the underlying dynamic isn’t going away: every AI model needs data, and the web is where most of it lives.
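Cloudflare applies these controls at its network edge, and this piece does not describe how Pay Per Crawl works internally. As a rough illustration of the decision involved, here is a minimal origin-side sketch built on a hypothetical policy table (`Access`, `POLICY`, and `response_status` are illustrative names, and the per-bot choices are examples only). Allowed crawlers get 200 OK, blocked ones get 403 Forbidden, and charged ones get 402 Payment Required, a status code HTTP reserves for payment use. Mapping “charge” to 402 is an assumption of the sketch, not a specification of Cloudflare’s system.

```python
from enum import Enum
from http import HTTPStatus
from typing import Optional


class Access(Enum):
    """Hypothetical per-crawler decisions a site owner might configure."""
    ALLOW = "allow"
    BLOCK = "block"
    CHARGE = "charge"


# Hypothetical example policy keyed by the bot name a request declares.
POLICY: dict[str, Access] = {
    "Googlebot": Access.ALLOW,
    "ClaudeBot": Access.CHARGE,
    "GPTBot": Access.CHARGE,
    "Bytespider": Access.BLOCK,
}


def response_status(bot_name: Optional[str], policy: dict[str, Access]) -> HTTPStatus:
    """Pick the HTTP status to send, given the declared bot name (None for non-bot traffic)."""
    if bot_name is None:
        return HTTPStatus.OK  # no bot declared: serve the page normally
    # Assumption: declared bots missing from the policy are blocked by default.
    access = policy.get(bot_name, Access.BLOCK)
    if access is Access.ALLOW:
        return HTTPStatus.OK
    if access is Access.CHARGE:
        return HTTPStatus.PAYMENT_REQUIRED  # 402: content available, but not for free
    return HTTPStatus.FORBIDDEN  # 403: crawler refused outright


if __name__ == "__main__":
    for name in (None, "Googlebot", "GPTBot", "Bytespider", "SomeNewBot"):
        print(name, int(response_status(name, POLICY)))
```

Defaulting unknown bots to “block” is a conservative choice for a sketch; a real deployment would also need to confirm that a request claiming to be a given crawler actually comes from its operator.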
The five bots above represent the companies best positioned to shape what AI knows — and therefore what AI does.