In the 1990s, it was not easy to know what was happening in other parts of the world. The digital growth of the past two decades has been remarkable, and the whole world is now within reach of every individual. Search engines like Google are one of the main reasons for this growth, because they bring information to your fingertips. Billions of searches are made on Google each day to find relevant information. It is both interesting and important to understand how the Google search engine works in order to display the best possible webpages in its search results.
Types of Search Engines
Basically there are three types of search engines available:
- Automatic crawler based search engines
- Manually maintained search engines
- Hybrid types
Most of the popular search engines we use on a day-to-day basis are hybrid types. They use automated bots to find information and minimal manual intervention to classify the details. Learn more about different types of search engines.
How Does the Google Search Engine Work?
Google uses automated crawlers to collect information from the web and human intervention to take action against malpractice. Below are the four basic steps Google follows to display a webpage in search results:
- Finding information by crawling the web
- Indexing the information in the search database
- Calculating the relevancy
- Retrieving the search results
Step 1 – Crawling the Web
Search engines use a piece of software to find the information available on webpages. This software is known by many names, such as crawler, bot and spider. Below are some of the crawlers used by popular search engines.
- Googlebot used by Google for web crawling
- Bingbot used by Bing search engine
- Baiduspider used by Baidu search engine
- YandexBot used by Yandex search engine
A single search engine can use multiple crawlers to find different types of information. For example, Google uses the following crawlers to find relevant webpages on the web:
| Crawler Name (User-agents) | Purpose |
|---|---|
| Googlebot | Used to index content for showing in Google web search results. This is also the same crawler used for smartphones. |
| Googlebot-Image | Used to index images for showing in Google image search results. |
| Googlebot-News | Used to collect news feed for showing in Google news search results. |
| Googlebot-Video | Used to crawl videos on the web for showing in video search results. |
| Googlebot-Mobile | Used for Google mobile search on feature phones. |
| Mediapartners-Google | Used for indexing web page content for displaying relevant Google AdSense ads. |
How Does a Crawler Work?
Search engine crawlers visit webpages and find the hyperlinks on each page. Each link is either followed or ignored (nofollow), as instructed through meta tags. Site owners can control crawlers through .htaccess, robots.txt and meta tags. You can read more on search engine optimization for crawlers in a separate article.
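To make the follow-or-ignore decision concrete, here is a minimal sketch of how a crawler might collect the links on one page. It is not Googlebot's actual code: the `LinkExtractor` class, the sample page and the `example.com` URLs are hypothetical. It assumes that a page-level `<meta name="robots" content="nofollow">` tag blocks every link on the page, and that a `rel="nofollow"` attribute blocks a single link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the links a polite crawler would follow on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.page_nofollow = False  # set by a robots meta tag
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # Page-level instruction, e.g. content="noindex, nofollow"
            directives = (attrs.get("content") or "").lower().split(",")
            if "nofollow" in [d.strip() for d in directives]:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            # Link-level instruction, e.g. rel="nofollow"
            rel = (attrs.get("rel") or "").lower().split()
            if "nofollow" not in rel:
                # Resolve relative links against the page URL
                self.links.append(urljoin(self.base_url, attrs["href"]))

    def followable_links(self):
        return [] if self.page_nofollow else self.links


PAGE = """
<html><head><title>Demo</title></head><body>
<a href="/about">About</a>
<a href="/login" rel="nofollow">Login</a>
<a href="https://other.example/post">External post</a>
</body></html>
"""

extractor = LinkExtractor("https://example.com/")
extractor.feed(PAGE)
followed = extractor.followable_links()
print(followed)
```

In this sample the login link is skipped because of its `rel="nofollow"` attribute, while the other two links are queued for crawling.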
The details collected by crawlers are sent to Google servers for classification and indexing.
Crawlers start from a list of webpages known from previous crawls and also use the XML Sitemaps submitted by site owners. An XML Sitemap is submitted to Google through Google Search Console; other search engines have their own webmaster tools. Compared with earlier generations, crawlers are now better at understanding the meaning of content, validating content changes and evaluating links.
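As an illustration of what a crawler can read from an XML Sitemap, the sketch below parses a small sitemap with Python's standard library. The sample sitemap, its URLs and the `urls_from_sitemap` helper are made up for this example; only the namespace comes from the public Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content for the demo
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def urls_from_sitemap(xml_text):
    """Return (url, last_modified) pairs; last_modified is None if absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc, lastmod))
    return entries


entries = urls_from_sitemap(SAMPLE_SITEMAP)
for loc, lastmod in entries:
    print(loc, lastmod)
```

The optional `lastmod` value is one way a site owner can hint that a page has changed since the previous crawl.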
For Website Owners on Crawlers:
- Crawlers also use the bandwidth of your site’s server, so it may be necessary to control the crawl rate of automated search engine bots. You can control the crawlers under Google Search Console and the respective webmaster tools accounts of other search engines.
- Google does not allow you to set the crawling time; you can only increase or decrease the frequency. Bing, however, lets you specify exactly when you want Bingbot to crawl your site. In such cases, set the crawl rate to its maximum for the hours when you have fewer visitors on your site.
- Google decides which pages to crawl based on its own algorithm and does not accept payment to crawl a site more frequently. If your webpage is not visible in search results, use the URL Inspection option in Google Search Console to submit your content to Google.
- There are also bad bots which may not follow the guidance from robots.txt or meta tags.
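The robots.txt guidance mentioned in this list can be checked with Python's built-in `urllib.robotparser` module. The rules and URLs below are invented for the demo, and the sketch only shows how a well-behaved bot would interpret them. Bad bots can simply skip this check, and not every crawler honors the `Crawl-delay` line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the demo
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The general rule blocks only the /private/ folder
private_ok = parser.can_fetch("Googlebot", "https://example.com/private/page.html")
blog_ok = parser.can_fetch("Googlebot", "https://example.com/blog/")

# The image crawler is blocked from the whole site
image_ok = parser.can_fetch("Googlebot-Image", "https://example.com/logo.png")

# Requested pause in seconds between requests, if the bot respects it
delay = parser.crawl_delay("Googlebot")

print(private_ok, blog_ok, image_ok, delay)
```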
Step 2 – Classifying and Indexing Crawled Information
Every day, new pages are published and old domains expire. The crawlers therefore need to collect the latest, correct information and send it to the servers. Google servers classify the received information and index it for easy reference.
Imagine a library with racks organized into sections. You can find a book easily by looking on the relevant rack. Google servers classify information in a similar way, based on the keywords on the webpages. This is why the keywords on every webpage are important: the page will be classified accordingly.
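The library analogy maps onto a data structure called an inverted index, where each keyword points to the pages that contain it. The sketch below is a toy version under simple assumptions (lowercase word matching, no stemming, made-up pages). Google's real index is far more elaborate, and the function names here are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical crawled pages
pages = {
    "https://example.com/a": "Google crawls the web",
    "https://example.com/b": "Crawlers index the web pages",
    "https://example.com/c": "Library racks and books",
}

index = build_index(pages)
print(lookup(index, "the web"))
print(lookup(index, "library books"))
```

Looking up a keyword is then like walking straight to the right rack instead of scanning every book in the library.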
Google has a sophisticated indexing system that checks multiple factors in webpage content. For example, time-sensitive content is displayed at the top of search results based on relevancy rather than keywords alone. Images and videos are also indexed for image and video search respectively.
If you are a website owner, ensure the page is written for human users with readable content. In general, search engines interpret text-based content more easily than images, videos and Flash content. You can use tools like Semrush for keyword analysis and competitor keyword research to improve your ranking.
Step 3 – Calculating the Relevancy
When you enter a query, the search engine needs to find relevant results among billions of indexed webpages. With its highly intelligent crawling and indexing system, Google can easily find the pages relevant to the searched keywords. In simple terms, the relevancy between the search query and the webpage content decides which results are retrieved.
On the other hand, Google also uses relevancy to index content in the correct context.
- When the word “Washington” appears on a webpage, Google can interpret from the context whether it is the name of a place or a person.
- Sites with a focused niche tend to perform better than sites with a broader scope.
- Google understands brand names. For example, when you search for “webnots” you will get “webnots.com” as the top result. Although webnots has no dictionary meaning, over time Google learns that it is a brand name.
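One classic way to put a number on the relevancy between a query and a page is TF-IDF scoring: a page scores higher when it uses the query words often, and rare words count for more than common ones. The sketch below uses that textbook technique purely as an illustration. It is not Google's relevancy algorithm, and the pages and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_relevancy(query, pages):
    """Return URLs ordered by a simple TF-IDF score; pages scoring zero are dropped."""
    docs = {url: Counter(tokenize(text)) for url, text in pages.items()}
    n_docs = len(docs)
    scores = {}
    for url, counts in docs.items():
        total_words = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            # Number of pages that contain the term
            doc_freq = sum(1 for c in docs.values() if term in c)
            if doc_freq == 0:
                continue
            # Smoothed inverse document frequency: rare terms weigh more
            idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
            score += (counts[term] / total_words) * idf
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical indexed pages
pages = {
    "a": "python crawler tutorial python",
    "b": "cooking recipes",
    "c": "crawler basics",
}

results = rank_by_relevancy("python crawler", pages)
print(results)
```

Page "a" ranks first because it matches both query words, page "c" follows with one match, and page "b" is not retrieved at all.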
Step 4 – Retrieving the Results
Once the list of relevant pages is fetched, the final step is to present the results in an appropriate order. Generally, the most popular pages are listed at the top, and popularity is calculated from the quality of inbound links to the page. The idea is simple: popular pages are referred to by more people and are referenced more often on external websites.
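The link popularity idea can be sketched with a simplified PageRank-style calculation, in which every page passes a share of its own score to the pages it links to. This is a teaching sketch and not Google's production ranking. The damping value of 0.85, the iteration count, the tiny link graph and the `link_popularity` function are assumptions chosen for the demo.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Score pages by inbound links using a simplified PageRank iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small base score
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: pages "a" and "b" both link to "c"
links = {"a": ["c"], "b": ["c"], "c": ["a"]}

scores = link_popularity(links)
ordered = sorted(scores, key=scores.get, reverse=True)
print(ordered)
```

Page "c" comes out on top because it has two inbound links. Page "a" is second because its single inbound link comes from the popular page "c", and page "b" is last because nobody links to it.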
Ranking by link popularity works well as long as the links are legitimate. Unfortunately, this ranking concept transformed the search engine marketing field, and site owners started building links artificially. This includes leaving the site’s URL in comment sections, forum posts and every other possible place on popular sites. Google has made many improvements to the link popularity concept, such as not considering links from comment sections. Sites that have artificial links or try to manipulate link popularity by any means can also receive a heavy penalty.
Although search results are displayed in a fraction of a second, complex mathematical algorithms calculate the position of each webpage in the results. This encourages site owners to provide more useful and user-friendly information to visitors.
Crawling, indexing and showing a webpage in search results is a large process. Google also continuously changes its ranking logic, which makes things more dynamic. Although this is a big effort, remember that Google is a profit-oriented company and shows advertisements above the organic search results to make money. For this reason, you should not rely only on Google search information when making important decisions.