What is an AI crawler?

An AI crawler is an automated bot that downloads web pages to power AI products. Some collect training data for large language models (GPTBot, ClaudeBot, Meta-ExternalAgent), while others index content for search and AI assistants (Googlebot, Bingbot, Applebot).

Which AI crawlers are most active?

Googlebot is consistently the most active single crawler on the web. Among AI training bots, GPTBot (OpenAI), ClaudeBot (Anthropic) and Meta-ExternalAgent (Meta) generate the largest crawl volumes, and their share keeps growing.

Should I block AI crawlers like GPTBot or ClaudeBot?

It depends on your goals. Blocking AI training bots via robots.txt asks them not to crawl your site, which keeps your content out of future model training for crawlers that honor the file, but it can also reduce how accurately AI assistants describe your brand. Most brands that care about AI visibility allow them.
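To make the trade-off concrete, here is a minimal Python sketch that builds a robots.txt blocking the three AI training bots named on this page while leaving every other crawler unrestricted, then checks the result with the standard library's urllib.robotparser. The helper names build_robots_txt and is_allowed and the example.com URL are assumptions for illustration only; robots.txt is advisory, so this only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens as listed on this page for AI training crawlers.
AI_TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Meta-ExternalAgent"]


def build_robots_txt(blocked_bots):
    """Return robots.txt text that disallows the whole site for each listed bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in blocked_bots]
    # Every other crawler stays allowed: an empty Disallow means "no restriction".
    groups.append("User-agent: *\nDisallow:")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, user_agent, url="https://example.com/"):
    """Check a user agent against robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_TRAINING_BOTS)
print(robots_txt)
print("GPTBot allowed:", is_allowed(robots_txt, "GPTBot"))
print("Googlebot allowed:", is_allowed(robots_txt, "Googlebot"))
```

Each blocked bot gets its own User-agent group with Disallow: /, since a User-agent line on its own does not restrict anything.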

Where does the AI Crawler Index data come from?

The index is built from Cloudflare Radar bot traffic data, normalized across Cloudflare's global network, and refreshed daily by LLM Pulse.

Live crawler rankings, updated daily

AI Crawler Index: Which Bots Crawl the Web Most?

The AI Crawler Index ranks the most active crawlers on the web, from search bots like Googlebot and Bingbot to AI training crawlers like GPTBot (OpenAI), ClaudeBot (Anthropic) and Meta-ExternalAgent (Meta). Updated daily with Cloudflare Radar data covering billions of requests to millions of websites.

Top Crawlers

Googlebot (Google): 19.37%

Other Bots (Various): 25.29%

Meta-ExternalAgent

[Chart: Bot Activity, Last 90 Days, stacked to 100%]

Source: Cloudflare Radar (normalized request volumes across Cloudflare's global network)

Full Ranking

Other Bots (Various): 25.29%

Googlebot (Google): 19.37%

Meta-ExternalAgent
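As a rough illustration of how a ranking like this can be derived, the Python sketch below converts per-bot request counts into percentage shares and sorts them. It assumes each percentage is a bot's share of total normalized request volume, which the page implies but does not spell out. The function name request_shares and all counts are hypothetical: they were chosen only so that the two published figures (19.37% and 25.29%) come out, and they are not real Cloudflare Radar data.

```python
def request_shares(counts):
    """Convert per-bot request counts into percentage shares, largest first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {bot: round(100 * n / total, 2) for bot, n in counts.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical counts, NOT real data: picked to reproduce the two published shares.
hypothetical_counts = {
    "Googlebot": 1937,
    "Other Bots": 2529,
    "All remaining named bots": 5534,  # placeholder bucket for bots not shown here
}

shares = request_shares(hypothetical_counts)
for bot, share in shares.items():
    print(f"{bot}: {share:.2f}%")
```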

About These Bots

Googlebot

(Google)

Google's primary web crawler for indexing pages in Google Search. The most active crawler on the web.

Block via robots.txt: User-agent: Googlebot followed by Disallow: /

Meta-ExternalAgent

(Meta)

Meta's AI training crawler used for Llama models and Meta AI products.

Block via robots.txt: User-agent: Meta-ExternalAgent followed by Disallow: /

GPTBot

(OpenAI)

OpenAI's web crawler used to train and improve ChatGPT and other models. Respects robots.txt directives.

Block via robots.txt: User-agent: GPTBot followed by Disallow: /

Applebot

(Apple)

Apple's web crawler supporting Siri, Spotlight, and Apple Intelligence features.

Block via robots.txt: User-agent: Applebot followed by Disallow: /

facebookexternalhit

(Meta)

Meta's crawler that fetches page previews when URLs are shared on Facebook and Instagram.

Block via robots.txt: User-agent: facebookexternalhit followed by Disallow: /

Bingbot

(Microsoft)

Microsoft's web crawler for indexing pages in Bing Search and powering Copilot answers.

Block via robots.txt: User-agent: Bingbot followed by Disallow: /

Amazonbot

(Amazon)

Amazon's crawler, used for Alexa AI and Amazon's machine learning services.

Block via robots.txt: User-agent: Amazonbot followed by Disallow: /

YandexBot

(Yandex)

Yandex's web crawler for indexing pages for Yandex Search, Russia's largest search engine.

Block via robots.txt: User-agent: YandexBot followed by Disallow: /

GoogleOther

(Google)

Google's secondary crawler used for research and non-search indexing tasks.

Block via robots.txt: User-agent: GoogleOther followed by Disallow: /

ClaudeBot

(Anthropic)

Anthropic's web crawler used to train Claude AI models. Respects robots.txt directives.

Block via robots.txt: User-agent: ClaudeBot followed by Disallow: /
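The user-agent tokens in this section can also be used to spot these crawlers in your own server logs. The Python sketch below is a minimal example of that idea: it does a case-insensitive substring match of each token against a list of User-Agent header values and tallies the hits. The function names and the sample strings are assumptions for illustration (the sample strings are simplified, not exact copies of what each bot sends), and because a User-Agent header can be spoofed, a match is only a hint and not proof of identity.

```python
from collections import Counter

# User-agent tokens and operators as listed on this page.
KNOWN_CRAWLERS = {
    "Googlebot": "Google",
    "Meta-ExternalAgent": "Meta",
    "GPTBot": "OpenAI",
    "Applebot": "Apple",
    "facebookexternalhit": "Meta",
    "Bingbot": "Microsoft",
    "Amazonbot": "Amazon",
    "YandexBot": "Yandex",
    "GoogleOther": "Google",
    "ClaudeBot": "Anthropic",
}


def identify_crawler(user_agent):
    """Return the first known token found in the User-Agent string, else None."""
    ua = user_agent.lower()
    for token in KNOWN_CRAWLERS:
        if token.lower() in ua:
            return token
    return None


def count_crawlers(user_agents):
    """Tally how many User-Agent strings match each known crawler token."""
    hits = Counter()
    for ua in user_agents:
        token = identify_crawler(ua)
        if token is not None:
            hits[token] += 1
    return hits


# Simplified, illustrative User-Agent values (assumptions, not captured traffic).
sample_user_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
]

for token, n in count_crawlers(sample_user_agents).most_common():
    print(f"{token} ({KNOWN_CRAWLERS[token]}): {n}")
```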

Track How AI Crawlers Affect Your Brand Visibility

LLM Pulse monitors how AI models mention your brand. See which crawlers visit your site and how that translates into AI visibility.
