Understanding Website Crawlability

If you are serious about outranking your competition on Google or any other search engine, you need to understand the technical side of SEO.

Website crawlability is one such aspect of technical SEO, yet it is quite common for people to overlook its significance in boosting a website's rankings.

Unlike building backlinks, crawlability is a form of on-site SEO, which means it can be tweaked and improved on your end.

But first, what is a crawler?

In SEO terms, a crawler refers to a program (or ‘robot’) that belongs to a search engine (such as Google) that browses the Internet and indexes web pages.

Also known as a spider or a bot, a crawler indexes web pages so that search engines can function the way they should.

For example, if you own a website for your cupcake business in New York, the crawler goes through your site, picks up key information about it and sorts that information accordingly.

So when someone runs a search term on ‘cupcakes in New York,’ your site would show up on the search results page.

Google uses a highly advanced sorting algorithm which looks at the quality of each website's content, its inbound links, and many other factors to determine the relevance of the website.

Think of Google's search results as a gigantic telephone book. For your website to be included in the book, it needs to be crawled.

Ok, so then what is crawlability?

While most websites have no problems getting crawled by Google, some factors can put a dent in the crawling procedure.

To put it plainly, the crawlability of your website refers to how easy it is for crawlers to access and move through your pages.

A website with low crawlability is as good as telling Google to ‘stay out of my website,’ which can cause your website to not show up in Google’s results.

Let’s take a look at some of the factors that can affect your crawlability:

HTTP status code

Sometimes you visit a website with a missing page and a number like 404 appears, or you are quietly sent to a different address by a 301. These numbers are known as HTTP status codes.

A crawler looks at the HTTP status code a page returns when deciding what to crawl. If a page returns an error or is stuck behind a broken redirect, Google may not crawl it.
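
You can check the status code a page returns yourself with a quick command-line request, for example using curl (the URL below is a placeholder for one of your own pages):

    # Fetch only the response headers; the first line shows the status code
    curl -I https://www.example.com/cupcakes

The -I flag asks for headers only, so the first line of the output shows the status code, such as 200 (OK), 301 (redirect) or 404 (not found).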

Robots.txt

This is an important text file that tells crawlers which parts of your website they may visit and which parts to stay out of.

The file consists of blocks of directives that give instructions to the crawler.

If your robots.txt file is not configured properly, crawlers may skip large parts of your website, or avoid it altogether. You can read more about robots.txt in this article.
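
As a minimal sketch, a robots.txt that lets every crawler in but keeps them out of a hypothetical /admin/ area, and points them to your sitemap, looks like this:

    # Allow all crawlers, but keep them out of the /admin/ area
    User-agent: *
    Disallow: /admin/

    # Tell crawlers where to find your XML sitemap
    Sitemap: https://www.example.com/sitemap.xml

The file lives at the root of your domain, for example www.example.com/robots.txt.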

Robots meta tag

Sometimes, there are pages on your website that you wish to block search engines from accessing.

For example, if you are running an eBook store, you would want to keep your Downloads section out of search engines so that people who haven't paid can't simply find the files through a search.

This is where robots meta tags come in. Placing a robots meta tag on a webpage blocks crawlers for that particular page.

If this tag is placed wrongly (e.g., on the homepage), you risk blocking the crawler from your entire website.
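
A typical robots meta tag looks like this and goes in the head section of the page you want to hide:

    <!-- In the <head> of the page you want to keep out of search results -->
    <meta name="robots" content="noindex">

Here noindex asks search engines to keep the page out of their results; the noindex and nofollow values are covered in more detail later in this article.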

JavaScript

Heavy use of JavaScript frameworks on your website can make it harder for Google's crawler to crawl your content effectively.

Fortunately, this is less of an issue than it used to be, as Google's crawler has become much better at rendering JavaScript-heavy pages. But it is still something you should be aware of.

Crawl budget

This is a term that is less commonly known among website owners. A crawl budget is roughly the amount of time a crawler will spend on your site and the number of web pages it will crawl in a day.

When the authority and quality of the website are high, the budget allocated by Google for crawling will be high as well.

What this means is that if you are running a new website with little to no authority, Google will not allocate a high crawl budget for your website.

Therefore you need to improve the efficiency of the crawlers and ensure that your web pages are being crawled properly and regularly within a limited budget.

A bigger crawl budget tends to go hand in hand with more organic traffic, because it reflects Google treating your website as more important.

While an increased crawl rate does not necessarily equate to a higher ranking position, it does have an impact on your overall search performance.

Checking your crawl rate

You can make use of Google Search Console to check out the crawl stats of Google’s crawler.

Just log in to Search Console, click on 'Crawl Stats' under the 'Crawl' menu and you will be presented with a graphical breakdown of Googlebot's activity over the past 90 days.

Monitoring your crawl rate reports regularly lets you identify any server errors that could be plaguing your website.

Improving your website for crawlability

Now that we understand crawlability and how to check the crawl rate, let's look at some of the ways to increase the crawlability of your website.

Send your XML sitemaps to Google Search Console

A sitemap is a file that acts as a roadmap for your website.

Since XML sitemaps break down the structure and hierarchy of your site, they ensure that crawler bots can find all the pages on your site.

It allows Google’s crawler to quickly locate all your essential pages, even in cases where your internal links are not functioning correctly.

This is especially useful if you are running a very large website and there are just too many internal links to maintain.

A single XML sitemap can only hold up to 50,000 URLs; if your website has more addresses than that, you'll need to split them across multiple sitemaps.
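
To give you a feel for the format, here is a minimal sketch of an XML sitemap containing a single, hypothetical page:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> block per page you want crawlers to find -->
      <url>
        <loc>https://www.example.com/cupcakes</loc>
        <lastmod>2023-06-01</lastmod>
      </url>
    </urlset>

Each url entry lists one page, and the optional lastmod date tells crawlers when it last changed. Once the file is uploaded to your site, you submit its address under 'Sitemaps' in Search Console.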

There are also speciality sitemaps for images and videos, which really help your non-text content get crawled.

Optimise videos and images for crawlers

Google loves content that has images and videos. However, crawlers cannot ‘see’ or ‘watch’ such content.

Crawlers need text to help them understand and classify your website. Use descriptive text in the filenames or alt descriptions of your videos and images.

For example, an image of one of your crew repairing a refrigerator should include an alt tag reading, “refrigeration repair in [insert your service area]”

This will help tell Google's crawler what the image is showing (and give a further indication as to what your refrigeration repair page is about).
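
In HTML, that might look something like this (the filename and service area are placeholders):

    <!-- Descriptive filename and alt text stand in for the image itself -->
    <img src="refrigerator-repair-new-york.jpg"
         alt="Technician carrying out refrigeration repair in New York">

Both the descriptive filename and the alt text give the crawler text it can read in place of the image.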

Using AMP pages

Google AMP (Accelerated Mobile Pages) pages are highly optimised for mobile browsing performance.

This means AMP pages take much less time to load, resulting in better mobile performance.

Since Google has moved to a mobile-first index, it is crucial to ensure Google's crawler can adequately index your mobile site's content.

Conduct a mobile-friendly test and use the “Fetch as Google” option from Google to check for any issues.

You can find your mobile usability report under the Search Console. This report is useful in checking for any mobile usability issues that your site may have.

Alternatively, there are tools such as Screaming Frog that can simulate mobile bot search behavior. Use these tools to ensure that the mobile pages are rendered properly.

Perform a log analysis and block spam bots; this frees up your server for legitimate search engine crawlers.
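
As a sketch of what that blocking can look like on an Apache server ('BadBot' is a placeholder for whatever user agent your logs show misbehaving), you could add something like this to your .htaccess file:

    # Return 403 Forbidden to any request whose user agent contains "BadBot"
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
    RewriteRule .* - [F,L]

This turns the unwanted bot away at the door, leaving more of your server's capacity for genuine crawlers. Your host or developer can help if you are not comfortable editing .htaccess yourself.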

Build internal links

Having plenty of internal links within your website helps get your pages crawled.

If you are running a blog, place links in each post that point to other relevant blog posts. If you are operating an online storefront, place recommendations that link to similar products.
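
An internal link with descriptive anchor text can be as simple as this (the URL and anchor text are placeholders):

    <!-- A contextual link from one blog post to a related one -->
    <a href="/blog/red-velvet-cupcake-recipe">our red velvet cupcake recipe</a>

Descriptive anchor text gives the crawler (and your readers) a clue about the page being linked to.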

When placing internal links, do also check that all links are functioning to make sure no page gets left out by the crawler.

Using ‘nofollow’ and ‘noindex’ in meta tags

Sometimes, there will be content on your website that is not ready to be published or has very thin content.

In these situations, you should deploy nofollow or noindex meta tags to tell crawlers what to do for these pages.

Meta tags are snippets of code in the head section of a web page, invisible to the user but read by the bots.

If a page's meta tags block it from indexing, the page may still be crawled by the spiders, but it will not be indexed.

The nofollow value, on the other hand, tells search engines that they may index the page but should not follow the links on it.

It is important for the webmaster to give the bots the right directives so they can crawl the website seamlessly.
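
Put together, the two variants look like this (each goes in the head section of the relevant page):

    <!-- Keep the page out of the index, but still follow its links -->
    <meta name="robots" content="noindex, follow">

    <!-- Index the page, but do not follow the links on it -->
    <meta name="robots" content="index, nofollow">

The first keeps a page out of search results while still letting crawlers follow its links; the second allows indexing but tells crawlers not to follow the links on the page.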

Increase the word count of your blog articles to 2500 words and above.

If you wish to rank for an informational query such as "how to improve Shopify SEO," make sure that you have at least 2,500 words of page content.

Snapagency ran a survey and found that blog posts of 2,500 words or more received the most social shares.

Similarly, pages with a word count between 2,200 and 2,500 words received the most organic traffic.

Hence, a blog post length of around 2,500 words is a good target to aim for.

Find the informational search queries, based on Google's micro-moments, that help move the customer towards the end of the funnel.

Then create content with actionable advice that helps the user find answers to their problems.

Set up redirects properly

If you are moving or changing the URL of individual pages, you can make use of a 301 redirect to point users to the new page. A 301 redirect not only helps users but bots as well.
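
On an Apache server, for instance, a 301 redirect can be set up with a single line in your .htaccess file (the paths here are placeholders):

    # Send visitors and bots from the old address to the new one permanently
    Redirect 301 /old-cupcake-menu https://www.example.com/cupcake-menu

If your site runs on a CMS such as WordPress, a redirect plugin can do the same job without editing server files.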

So if you are implementing redirects, ensure that they are done correctly, without slowing down your site performance.

Too many redirects waste the crawler's time, which eats into your crawl budget.

Create a HTML sitemap to complement your XML sitemap

An HTML sitemap works much like an XML sitemap: both improve your website's crawlability.

An HTML sitemap is a static web page that lists every page on your website, so when a crawler finds this page, it will also find links to all the other pages of your site.
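
At its simplest, an HTML sitemap is just a page of links (the sections below are placeholders):

    <!-- A plain list of links to the key pages of the site -->
    <ul>
      <li><a href="/cupcakes">Cupcake menu</a></li>
      <li><a href="/blog">Blog</a></li>
      <li><a href="/about">About us</a></li>
      <li><a href="/contact">Contact</a></li>
    </ul>

Unlike the XML version, this page is visible to visitors as well, so it doubles as a navigation aid.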

A great way to implement this on a blog website is to have a page that consolidates all the recent posts of your blog.

Update your sitemap regularly

Having an outdated sitemap with a ton of redundant links can create errors and slow down the crawling speed significantly.

Remove any duplicate pages or broken links from your sitemap, and keep each sitemap to 50,000 URLs or fewer.

Also, use Google’s Search Console regularly to help you check for any sitemap errors.

Improve your server

By now you should know that your website loading speed directly correlates to the crawlability of your website.

On rare occasions, your web host could be the main reason your website loading speed is getting bottlenecked.

There are usually two reasons for slow website loading speed.

The first is server-related issues: your site may be slow simply because the bandwidth of your hosting plan is no longer sufficient. You can check your bandwidth allowance in the description of your plan.

The second reason is front-end factors, where your website is running on code that isn’t optimised for your users.

If your site contains bulky scripts and plug-ins, it is at risk of slowing down. Do not forget to optimise images, videos and other content so they do not add to the page's load time.
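
If you want a quick, rough measure of how long a page takes to load, curl can report it from the command line (the URL is a placeholder):

    # Download the page silently and print only the total transfer time
    curl -o /dev/null -s -w "Total time: %{time_total}s\n" https://www.example.com/

This downloads the page without displaying it and prints only the total transfer time, which is handy for spotting a slow server at a glance.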

Use Google's PageSpeed Insights to determine whether your website is loading fast enough. If it isn't, contact your web host to see if you can remedy the situation together.

Otherwise, you should consider migrating your website to a faster host.

…And that’s all there is to it

At the end of the day, SEO boils down to these three things: creating great content, promoting that content to generate links, and making sure that Google and other search engines can find it.

Web crawlability is about making your website more accessible for Google. If you follow these tips and carry out the best practices, you will achieve better results.

About Murray Dare

Murray Dare is a Marketing Consultant, Strategist and Director at Dare Media. Murray helps UK businesses find better ways to connect with their audiences through targeted content marketing strategies.