After publishing an article, getting it to appear in Google search results right away is a major objective for most website owners. Some even use the URL Inspection tool in Google Search Console to submit a manual crawl request. However, Google does not guarantee that it will crawl and index every website's content. While you cannot change that fact, there are many factors you can influence that affect how Google crawls your articles. If you are struggling to get your content to show up, here are the basic things to check so that search engine bots can crawl your site properly.
Why Is Instant Crawling Important?
Instant visibility may not matter much when you publish tutorials or other evergreen how-to content. However, instant crawling matters especially when the content does not stay relevant for long. For example, a product promotion for a special occasion like New Year is valid for only a few days, and you need to make sure the content reaches your audience within that period. The same applies to news and media outlets that publish a constant stream of time-sensitive articles.
Types of Crawling Issues to Check
There are a few basic types of crawling issues you may face:
- Googlebot does not crawl your content at all
- Content takes too long to show in the search results
- Content shows up in an inappropriate format
You can do a simple Google search or check your Search Console account to find out whether these issues are present on your site. If you discover one of them, check whether the following factors are the cause.
1. Use Optimized XML Sitemap
First, make sure you have submitted an XML Sitemap in Google Search Console. Remember, you need to verify site ownership before you can use the Search Console features. The Sitemap gives Googlebot the basic information it needs to start crawling, and you can see the last read date against the submitted Sitemap. This helps you find out whether Googlebot is crawling your content or running into issues, which you can check under the same Sitemaps or Coverage sections.
Many users think an autogenerated XML Sitemap is more than sufficient for Google to crawl their content. However, it is better to submit a properly validated Sitemap that contains all the required information. Let's take Sitemaps from the Weebly and WordPress platforms as an example. Both Weebly and WordPress automatically generate an XML Sitemap for you, though on WordPress you can have a custom Sitemap Index with the help of plugins like Yoast SEO or Rank Math.
Weebly Sitemap Example – the Weebly Sitemap shows the URL and last modified date.
WordPress Sitemap Example Generated by Yoast SEO – the WordPress Sitemap shows the URL, last modified date and number of images, and, most importantly, is organized as a Sitemap Index.
The index shows the clear structure of your site, and each individual Sitemap contains the corresponding articles; for example, the post Sitemap contains only time-relevant posts without mixing in static pages.
Though neither platform uses the priority field in the Sitemap, the WordPress Sitemap clearly tells Google whether each article is a post or a page and links each URL to its content. Googlebot can also see the different post types available on your site and understand its structure better.
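As an illustration, here is a minimal sketch of what such a Sitemap Index looks like; the child Sitemap URLs and dates are placeholders, not taken from a real site:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal Sitemap Index sketch: each child Sitemap groups one content type -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2024-01-15T10:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2024-01-10T08:30:00+00:00</lastmod>
  </sitemap>
</sitemapindex>
```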
Also, check your automatically generated Sitemap for URLs that return 301 redirects or 404 errors, and fix them. After making sure your Sitemap is clean, go to Google Search Console and resubmit it.
2. Check for Blocked Content
Sometimes you or your developer may have accidentally blocked search engine crawlers. For example, you might have set up blocking rules on a development site and moved the changes to the live site without noticing. Though there are multiple ways to block content, the most common one is a Disallow directive left in the robots.txt file, which can stop Googlebot and other search engine bots from crawling certain parts of your website.
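For example, a robots.txt file carried over from a development site might contain a rule like this hypothetical one, which blocks every crawler from the entire site:

```
# Left over from the development site: blocks all crawlers from everything
User-agent: *
Disallow: /
```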
Use Google's robots.txt Tester tool to check your site's robots.txt file, remove the blocking entries and resubmit the affected URLs in Google Search Console. Be aware that it may take weeks before the webpages are crawled again and start showing in the search results.
In addition, you might have wrongly blocked Googlebot's IP addresses, which prevents crawling. Check the IP manager tool in your hosting account and your website's control panel, and delete any blocked IP addresses that belong to Googlebot. Lastly, check the robots meta tag at the page level and confirm that the page has no blocking attributes like nofollow and noindex.
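At the page level, the blocking tag usually looks like this hypothetical example in the page's head section; noindex keeps the page out of search results and nofollow stops crawlers from following its links:

```html
<!-- Remove or adjust this tag if you want the page crawled and indexed -->
<meta name="robots" content="noindex, nofollow">
```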
Note: Most content management systems and website builder tools also allow you to block search engine bots when you create a new website. Make sure this option is disabled before you submit your XML Sitemap to Google.
3. Fix Missing Structured Data
Structured data helps Googlebot understand your page's content and show relevant details in search results; for example, you may want to show a star rating instead of plain text for your review articles. Though missing structured data will not stop Googlebot the way robots.txt entries do, you may see unexpected results in how your pages appear.
You can add structured data markup using JSON-LD, as recommended by Google. Make sure to test your site's markup with the Schema Markup Validator and fix all issues. After that, use the Rich Results Test tool to see how Google interprets your structured data when showing it in search results.
This will help you include all the mandatory properties for the schema type you use and show attractive rich results in search. Though this sounds technical, content management systems like WordPress make it easy with the help of plugins.
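As an illustration of what such markup looks like, here is a minimal JSON-LD sketch for a review page; the product, rating and author values are placeholders, and the mandatory properties depend on the schema type you choose:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": { "@type": "Product", "name": "Example Product" },
  "reviewRating": { "@type": "Rating", "ratingValue": "4.5", "bestRating": "5" },
  "author": { "@type": "Person", "name": "Jane Doe" }
}
</script>
```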
4. Combine Duplicate Webpages
Smaller websites are easy to maintain and generally do not have duplicate content issues. For large websites, however, especially ecommerce sites, duplicate content is a big problem. Ecommerce sites often use a dedicated webpage for each variant of a product, which means those variant pages can be almost identical. When this happens, Google automatically picks one of them as the primary page and ignores the others as duplicates.
This can create a problem when pages for low-priced variants appear in search results instead of the pages with higher sales conversion.
- First, avoid duplicate content by combining the pages and deleting the ones with low value. Make sure to set up 301 redirects so that search engine bots understand which page to show in search results.
- If duplicate webpages are unavoidable on your site, use a canonical tag to identify the parent page (see the sketch after this list).
- Lastly, you can offer price and product variations on a single product page instead of creating multiple pages. You can do this easily with plugins like WooCommerce when using WordPress.
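A canonical tag is a single line in the head section of the duplicate page pointing to the page you want indexed, and a 301 redirect handles deleted duplicates; the redirect sketch below assumes an Apache server with .htaccess and made-up URLs, so adjust it to your own setup:

```html
<!-- On the duplicate variant page: tell search engines which page is the primary one -->
<link rel="canonical" href="https://example.com/product/blue-widget/">
```

```apache
# .htaccess (Apache): permanently redirect a removed low-value variant to the primary page
Redirect 301 /product/blue-widget-old/ https://example.com/product/blue-widget/
```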
5. Messy Site Structure
In general, the URL itself is not a ranking factor for showing content in Google. However, a well-defined, clean URL structure contributes to a better user experience, which in turn results in higher rankings in search results. From a crawling perspective, use simple page URLs and breadcrumbs to tell search engines exactly where the current page sits on your site. Each webpage could be placed directly under the main domain to keep the URL structure simple.
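If you want to spell out that location for search engines, breadcrumb structured data is one way to do it; here is a minimal BreadcrumbList sketch with hypothetical section names and URLs:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Blog", "item": "https://example.com/blog/" },
    { "@type": "ListItem", "position": 2, "name": "SEO", "item": "https://example.com/blog/seo/" }
  ]
}
</script>
```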
Here is what Google officially says about complex URLs:
Overly complex URLs, especially those containing multiple parameters, can cause problems for crawlers by creating unnecessarily high numbers of URLs that point to identical or similar content on your site. As a result, Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all the content on your site.
Google
Avoid the following in your URLs to prevent crawling-related issues:
- If your webpage URLs are still automatically generated with random characters, fix that. Also avoid dynamic parameters, which lead to duplicate crawling.
- Block internal search results and other duplicate parameterized URLs with robots.txt directives (a sketch follows this list).
- Avoid underscores in URLs and use hyphens instead.
- Add a nofollow attribute to hyperlinks you do not want crawlers to follow, and fix broken links to avoid 404 errors.
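Here is a minimal robots.txt sketch for blocking internal search result pages; it assumes your search results use the ?s= parameter and a /search/ path, as WordPress-style sites often do, so adjust the patterns for your own site:

```
# Keep crawlers out of internal search result pages
User-agent: *
Disallow: /?s=
Disallow: /search/
```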
Not following these guidelines can result in Googlebot not crawling your URLs, and you will see messages like “URL is unknown to Google” when inspecting a URL in Search Console.
6. Too Much JavaScript Code
Render-blocking JavaScript is one of the most common issues flagged by the Google PageSpeed Insights tool when measuring your speed score. When you use heavy JavaScript on a page, make sure it does not block the loading of the page's content for crawlers. Some websites rely on heavy JavaScript for sliders, portfolio filtering and dynamic charts; the problem is that the rest of the text content will not load until the JavaScript has loaded fully, which may result in Googlebot not getting the full content of your page.
Test your JavaScript-heavy pages using the URL Inspection tool in Google Search Console to see how Googlebot for smartphones crawls your site.
If you see a partial crawl or empty content, check the following:
- Check that your caching solution and CDN work properly and deliver the full content without blocking.
- Move the JavaScript files on the page to the footer section so that the other content can load first (see the sketch after this list).
- Gone are the days when you needed plenty of JavaScript, like jQuery, to create interactive webpages. Find and replace JavaScript-based elements on your page with static HTML or CSS wherever possible.
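One common way to keep a script from blocking the rest of the page is the defer attribute (in addition to moving the tag toward the footer); a hypothetical example with a made-up file name:

```html
<!-- The browser downloads the script in parallel and runs it only after the HTML is parsed -->
<script src="/assets/slider.js" defer></script>
```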
Another problem with JavaScript is code from third-party sites like Google AdSense. Unfortunately, you cannot optimize third-party content; the options are to avoid it or to delay loading it until there is a user interaction. Delayed scripts are not executed for crawlers like Googlebot, so the bot will not see the corresponding content on the page. This may be fine for advertisements, but for text-related features it is always better to use HTML or CSS instead of JavaScript.
7. Large and Non-Optimized Images
There are two kinds of crawling issues with images. One is images not appearing in Google Image search results; the other is images creating problems for a normal page with text content.
- If you run a portfolio, photography or artwork website, it is important that individual images show up in search results. The best option here is a separate image Sitemap so that Googlebot can crawl them separately (a sketch follows this list).
- A large, unoptimized image in the header section of your page can create a problem for Googlebot. You will see errors like “submitted page has no content” in Search Console because Googlebot cannot render the remaining text content on the page. The solution is to use smaller, optimized images and serve them in a lighter format like WebP.
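An image Sitemap uses the image extension of the Sitemap protocol; here is a minimal sketch with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal image Sitemap sketch: each page URL lists the images it contains -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/portfolio/sunset/</loc>
    <image:image>
      <image:loc>https://example.com/images/sunset.webp</image:loc>
    </image:image>
  </url>
</urlset>
```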
Remember, Google uses Googlebot-Image as the crawler for images. Therefore, when testing images, make sure to select Googlebot-Image as the bot to get correct results.
8. Slow Hosting Server
You may wonder how your server's speed could affect Googlebot's crawling. The problem arises when your XML Sitemap contains a large number of URLs and Googlebot cannot crawl all of them because of limited server resources. As mentioned above, plugins like Yoast SEO in WordPress create individual Sitemaps, each containing up to 1,000 URLs. Many shared hosting servers struggle or time out when you try to open such a Sitemap in the browser; if that is the case, you cannot expect Googlebot to crawl it either.
- Try to limit each Sitemap to 200 or fewer URLs (a sketch for Yoast SEO follows this list).
- Check your server's hardware and bandwidth to optimize performance. Alternatively, upgrade to VPS or dedicated hosting to improve overall performance. For WordPress, you can go with managed WordPress hosting companies like SiteGround, Kinsta or WPEngine.
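If you use Yoast SEO, the plugin provides a filter to change how many URLs go into each child Sitemap; the sketch below uses the wpseo_sitemap_entries_per_page filter, but verify the filter name against the plugin's current documentation before relying on it:

```php
<?php
// In your theme's functions.php or a small custom plugin:
// lower the number of URLs per Sitemap from the default 1,000 to 200.
add_filter( 'wpseo_sitemap_entries_per_page', function () {
    return 200;
} );
```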
Remember, mobile page loading speed is important because Google uses its smartphone crawler by default to crawl and index your pages. Slow-loading mobile pages can make it hard for crawlers to pick up the entire content. Therefore, make sure you have a responsive website optimized for mobile speed. Besides a strong hosting server, cache your content, use a CDN and aim to pass Core Web Vitals. These factors also help your pages rank higher in Google search results.
Final Words
All the points mentioned above are guidance for website owners to monitor and fix crawling-related issues on their sites. Remember, with mobile-first indexing Google uses its smartphone crawler by default to crawl and index your content. Therefore, make sure you have a mobile-optimized site whose above-the-fold text content loads fast, and avoid heavy JavaScript and images in the header. These factors, along with a correct XML Sitemap and a clean robots.txt file, will help crawlers find and index your content quickly.