User agent is an umbrella term used for many purposes. In the search engine world, it refers to the automated crawling bots used by search engines like Google and Bing. These crawlers scan and index content into the search engines' databases in order to serve it on search results pages. Each bot has its own name, which helps website owners understand which one is crawling their site. If you are looking for a complete list of user agents for popular search engines, here you go.
Why Should You Know User Agent Names?
Well, knowing user agent names is not necessary for simply visiting a website on the internet. However, website owners and developers need them for the following reasons.
- To block automated bots and save server resources for real human users. For example, if you have no content in Chinese, it is better to block the Baiduspider user agent to save bandwidth.
- To block crawling of specific file types or directories on the site.
- For monitoring and troubleshooting purposes – user agent names appear in the server log, so knowing them helps you determine whether visitors are legitimate crawlers or bad bots.
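For instance, blocking Baiduspider site-wide, as in the bandwidth example above, takes just two lines in a robots.txt file:

User-agent: Baiduspider
Disallow: /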
User Agent Strings in Server Log
The web server logs every visit to the website. By analyzing these log entries, you can find out how many automated crawlers are scanning your site. For example, Google uses the Googlebot user agent to crawl websites for its desktop search results. It appears in the user agent string of your server log as below:
Mozilla/5.0 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)
User Agents List for Google, Bing, Baidu and Yandex Search Engines
Here is a list of user agents for all popular search engines. Note that Google and other search engines use different crawlers for different purposes. For example, Googlebot-News is the crawler used for news content, while Googlebot is used for desktop and mobile crawling.
User Agent Name | Purpose of Crawling | Search Engine |
---|---|---|
Googlebot | Desktop | Google |
Googlebot | Smartphone | Google |
Googlebot-News | News | Google |
Googlebot-Video | Videos | Google |
Googlebot-Image | Images | Google |
Mediapartners-Google | AdSense Mobile | Google |
Mediapartners-Google | AdSense Desktop | Google |
AdsBot-Google | Landing Page Quality Check | Google |
AdsBot-Google-Mobile-Apps | App Crawler | Google |
Bingbot | Desktop and Mobile | Bing |
MSNBot | Predecessor of Bingbot | Bing |
MSNBot-Media | Images and Videos | Bing |
AdIdxBot | Bing Ads | Bing |
BingPreview | Page Snapshots | Bing |
Baiduspider | Desktop | Baidu |
Baiduspider | Mobile | Baidu |
Baiduspider-ads | Business Search (Advertisements) | Baidu |
Baiduspider-cpro | Baidu Union | Baidu |
Baiduspider-favo | Baidu Favorites | Baidu |
Baiduspider-image | Image Search | Baidu |
Baiduspider-news | News Search | Baidu |
Baiduspider-video | Video Search | Baidu |
YandexBot | Desktop | Yandex |
YandexMobileBot | Mobile | Yandex |
Yandex | All Crawling | Yandex |
YandexDirect | Advertising | Yandex |
YandexDirectDyn | Dynamic Banners | Yandex |
YandexMedia | Media | Yandex |
YandexImages | Images | Yandex |
YaDirectFetcher | Advertising | Yandex |
YandexBlogs | Blog Posts and Comments | Yandex |
YandexNews | News | Yandex |
YandexPagechecker | Micro Markup Validator | Yandex |
YandexMetrika | Web Analytics | Yandex |
YandexCalendar | Calendar | Yandex |
YandexScreenshotBot | Screenshot | Yandex |
YandexFavicons | Favicons | Yandex |
YandexWebmaster | Webmaster Services | Yandex |
YandexImageResizer | Mobile Image Services | Yandex |
YandexSitelinks | Sitelinks | Yandex |
YandexAntivirus | Malware Checker | Yandex |
YandexVertis | Vertical Search | Yandex |
Slurp | All Search | Yahoo! |
DuckDuckBot | Search | DuckDuckGo |
ia_archiver | Crawler for Ranking | Alexa |
aolbuild | Search | AOL |
teoma | Search | Ask Jeeves |
How to Control User Agents?
User agents can be controlled with the appropriate directives in the robots.txt and .htaccess files, which must be placed in the root directory of your site. All well-behaved search engine crawlers follow the robots.txt entries and index only the allowed content. For example, if you don’t want Yandex’s main search bot to crawl your site, add the following entries to your robots.txt file.
User-agent: YandexBot
Disallow: /
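The same robots.txt syntax also handles the per-directory and per-file-type blocking mentioned earlier. The paths below are only illustrative examples:

User-agent: Googlebot-Image
Disallow: /private-photos/

User-agent: *
Disallow: /*.pdf$

Note that wildcard patterns like /*.pdf$ are supported by major crawlers such as Googlebot and Bingbot, though they are not part of the original robots.txt standard.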
If you don’t know how to use the robots.txt file, or there is no such file on your server, you can generate robots.txt entries using our free Robots.txt generator tool.
The .htaccess file configures the Apache web server and can also be used to block user agents. For example, you can instruct the server to block known bad bots by adding the entries below to your .htaccess file.
SetEnvIfNoCase User-Agent ([a-z0-9]{2000}) bad_bot
SetEnvIfNoCase User-Agent (archive.org|binlar|casper|checkpriv|choppy|clshttp|cmsworld|diavol|dotbot|extract|feedfinder|flicky|g00g1e|harvest|heritrix|httrack|kmccrew|loader|miner|nikto|nutch|planetwork|postrank|purebot|pycurl|python|seekerspider|siclab|skygrid|sqlmap|sucker|turnit|vikspider|winhttp|xxxyy|youda|zmeu|zune) bad_bot
Order Allow,Deny
Allow from All
Deny from env=bad_bot
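Note that the Order/Allow/Deny directives come from Apache 2.2 (available in 2.4 only via mod_access_compat). On Apache 2.4 and later, the equivalent access control uses Require directives instead; this is a sketch with a shortened bot list, relying on the bad_bot variable set by SetEnvIfNoCase:

SetEnvIfNoCase User-Agent (httrack|sqlmap|nikto) bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>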
As you can see, you need to know the user agent names in order to control crawler behavior.