User agent is an umbrella term used for many purposes. In the search engine world, it refers to the automated crawling bots used by search engines such as Google and Bing. These web crawlers scan pages and index the content in their databases in order to serve it on search results pages.
Why Should You Know User Agent Names?
Well, a normal user does not need to know user agent names. But for site owners and developers, knowing them is necessary for the following reasons.
- You may need to block automated bots in order to save server resources for real human users. For example, if you have no content in Chinese, it is better to block Baiduspider to save bandwidth (see the robots.txt sketch after this list).
- You may want to block crawlers from specific file types or directories on your site.
- For monitoring and troubleshooting: the user agent name appears in your server log, so knowing it helps you tell legitimate crawlers from bad bots.
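As an illustration of the first two points, here is a minimal robots.txt sketch. The `/downloads/` directory and the `.pdf` pattern are hypothetical examples; note also that wildcard rules such as `/*.pdf$` are honored by major engines like Google and Bing but are not part of the original robots.txt standard.

```
# Block Baidu's crawler entirely (sensible only if you serve no Chinese audience)
User-agent: Baiduspider
Disallow: /

# Keep all other crawlers out of one directory and one file type
User-agent: *
Disallow: /downloads/
Disallow: /*.pdf$
```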
User Agent Strings in Server Logs
The web server logs each and every visit to the website. By analyzing these log entries, you can find out how many automated crawlers are scanning your site. For example, Google uses the Googlebot user agent to crawl websites for its desktop search results, and it shows up in the user agent string of your server log as below:
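As a representative example, here is what such an entry might look like in Apache's combined log format. The IP address, date, and path are made up for illustration; the quoted user agent string is Googlebot's well-known desktop token:

```
66.249.66.1 - - [10/Jan/2024:12:34:56 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

A quick way to count such hits, assuming the default Apache log location, is a one-line grep; a reverse DNS lookup on the logged IP (it should resolve to a googlebot.com host) helps confirm that the hit is genuine:

```
# Count hits claiming to be Googlebot (log path is an assumption)
grep -c "Googlebot" /var/log/apache2/access.log

# Verify the source: the PTR record of a real Googlebot IP ends in googlebot.com
host 66.249.66.1
```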
List of User Agents for Google, Bing, Baidu, Yandex and Other Search Engines
Here is a list of user agents for the popular search engines.
User Agent Name | Purpose of Crawling | Search Engine
---|---|---
Googlebot | Desktop | Google
Googlebot | Smartphone | Google
Googlebot-News | News | Google
Googlebot-Video | Videos | Google
Googlebot-Image | Images | Google
Mediapartners-Google | AdSense Mobile | Google
Mediapartners-Google | AdSense Desktop | Google
AdsBot-Google | Landing Page Quality Check | Google
AdsBot-Google-Mobile-Apps | App Crawler | Google
Bingbot | Desktop and Mobile | Bing
MSNBot | Predecessor of Bingbot | Bing
MSNBot-Media | Images and Videos | Bing
AdIdxBot | Bing Ads | Bing
BingPreview | Page Snapshots | Bing
Baiduspider | Desktop | Baidu
Baiduspider | Mobile | Baidu
Baiduspider-ads | Business Search (Advertisements) | Baidu
Baiduspider-cpro | Baidu Union | Baidu
Baiduspider-favo | Baidu Favorites | Baidu
Baiduspider-image | Image Search | Baidu
Baiduspider-news | News Search | Baidu
Baiduspider-video | Video Search | Baidu
YandexBot | Desktop | Yandex
YandexMobileBot | Mobile | Yandex
Yandex | All Crawling | Yandex
YandexDirect | Advertising | Yandex
YandexDirectDyn | Dynamic Banners | Yandex
YandexMedia | Media | Yandex
YandexImages | Images | Yandex
YaDirectFetcher | Advertising | Yandex
YandexBlogs | Blog Posts and Comments | Yandex
YandexNews | News | Yandex
YandexPagechecker | Micro Markup Validator | Yandex
YandexMetrika | Web Analytics | Yandex
YandexCalendar | Calendar | Yandex
YandexScreenshotBot | Screenshots | Yandex
YandexFavicons | Favicons | Yandex
YandexWebmaster | Webmaster Services | Yandex
YandexImageResizer | Mobile Image Services | Yandex
YandexSitelinks | Sitelinks | Yandex
YandexAntivirus | Malware Checker | Yandex
YandexVertis | Vertical Search | Yandex
Slurp | All Search | Yahoo!
DuckDuckBot | Search | DuckDuckGo
ia_archiver | Crawler for Ranking | Alexa
aolbuild | Search | AOL
teoma | Search | Ask Jeeves
How to Control User Agents?
User agents can be controlled with appropriate directives in the robots.txt and .htaccess files. Both files must be placed in the root directory of your site.
All reputable search engines honor robots.txt entries. For example, if you do not want the Yandex search engine to crawl your site, add the following entries to your robots.txt file:
```
User-agent: Yandex
Disallow: /
```
You can generate robots.txt entries for your site using our free Robots.txt generator.
The .htaccess file is used to configure the Apache web server. For example, you can instruct the server to block known bad bots by adding the entries below.
```
SetEnvIfNoCase User-Agent ([a-z0-9]{2000}) bad_bot
SetEnvIfNoCase User-Agent (archive.org|binlar|casper|checkpriv|choppy|clshttp|cmsworld|diavol|dotbot|extract|feedfinder|flicky|g00g1e|harvest|heritrix|httrack|kmccrew|loader|miner|nikto|nutch|planetwork|postrank|purebot|pycurl|python|seekerspider|siclab|skygrid|sqlmap|sucker|turnit|vikspider|winhttp|xxxyy|youda|zmeu|zune) bad_bot
Order Allow,Deny
Allow from All
Deny from env=bad_bot
```
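Note that `Order`, `Allow`, and `Deny` are Apache 2.2-era directives. If your server runs Apache 2.4 or later, a sketch of the equivalent block using the newer `Require` syntax (relying on the same `bad_bot` environment variable set above) would be:

```
# Apache 2.4+: deny any request whose user agent matched the bad_bot rules
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```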