Do you want to ensure that search engines access your content properly and rank high?
In his SEJ eSummit session, Bartosz Góralewicz presented how Google renders websites on a large scale and gave insights based on Google's patents and documentation.
Here is a summary of his presentation.
On an average domain, ten percent of the unique, indexable URLs are not indexed by Google.
This is something to be aware of, especially as these trends can change and worsen over time.
In 2015, Google claimed they were good at rendering:
– Aleyda Solis (@aleyda), May 2, 2017
Although Google's Martin Splitt has been incredibly helpful to the SEO community, there are still questions to be answered.
In November 2019, Splitt announced at the Chrome Developer Summit that the median time for Google rendering had improved from up to one week the previous year to just five seconds in 2019.
Góralewicz and his team also found that:
- Many big brands are not fully indexed by Google.
- Indexing HTML is not as easy as expected.
- Indexing trends fluctuate during Google updates.
- Pages can be dropped from the Google index.
Currently, one of the challenges in diagnosing indexing loss is that the site: command is unreliable and can return many false negatives.
Entry into the Google index: a big SEO challenge
Getting your content into the Google index is the absolute foundation of your online presence, and it remains a major SEO challenge.
This problem is compounded by Google's limited resources: Google cannot render and index the entire World Wide Web, especially given how costly many modern websites are to render.
Check out just a few of the biggest brands with significant indexing issues.
If Google doesn't index your web pages, all other SEO activities don't matter.
It's good that both SEOs and Googlers are starting the conversation about indexing problems and that we have better data sources to validate them.
And most indexing problems can actually be solved through technical SEO.
Batch-optimized rendering: how it works
Google views your website through its batch-optimized rendering and fetch architecture (BOR).
If you look at these views side by side, what Google sees is different from what users see in a browser.
How does BOR work?
Step 1: BOR skips all resources that are not essential to preview your page
The first step in the batch-optimized rendering and fetch architecture is to skip all the resources that Google doesn't need to preview or lay out your page, such as:
- Tracking scripts (Google Analytics, Hotjar, etc.)
By simply skipping these additional resources, up to 50% of the loading, scripting and rendering time can be saved. This saves a lot of resources on Google's side.
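As a rough illustration of this first step, the sketch below filters a page's resource list before anything is fetched. The host list, the function names and the example URLs are all assumptions made for illustration; the list Google actually uses is not public.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: hosts whose resources are assumed not to affect layout.
# This is an illustration only; Google's real list is not public.
NON_ESSENTIAL_HOSTS = {
    "www.google-analytics.com",
    "static.hotjar.com",
}


def is_essential(resource_url: str) -> bool:
    """Return False for resources served from a known tracking host."""
    host = urlparse(resource_url).hostname or ""
    return host not in NON_ESSENTIAL_HOSTS


def resources_to_fetch(resource_urls: list[str]) -> list[str]:
    """Keep only the resources a layout preview would need."""
    return [url for url in resource_urls if is_essential(url)]


# Made-up resource list for a single page.
page_resources = [
    "https://example.com/styles/main.css",
    "https://example.com/js/app.js",
    "https://www.google-analytics.com/analytics.js",
    "https://static.hotjar.com/c/hotjar-123.js",
]
print(resources_to_fetch(page_resources))
```

In this made-up example, two of the four requests are never made, which is where the savings described above would come from.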
Step 2: Set the value of a virtual clock
In the second step, Google sets the value of the virtual clock (which we will discuss in more detail below).
Step 3: The layout of the website is generated
As soon as the time on this virtual clock has expired, the layout of the website is generated.
There are two key concepts to consider:
- The virtual clock.
- The layout.
What is a virtual clock?
The virtual clock measures the cost of rendering a website.
It's a sort of rendering budget on Google's side, and each website is allocated a small share of it.
If rendering is paused to fetch resources (e.g., scripts, CSS files, image dimensions, etc.), the virtual clock does not advance. It only advances while rendering is actually taking place.
However, there is no guarantee of how much of this virtual time you can get.
Although we don't know what the limit is (and we may never know), we can find out how resource-hungry our website is.
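To make the idea concrete, here is a minimal simulation of such a clock. It is a sketch under stated assumptions, not Google's implementation: the `Task` type, the `simulate_virtual_clock` function, the time units and the budget value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str        # "render" or "fetch" (hypothetical task types)
    duration: float  # arbitrary time units


def simulate_virtual_clock(tasks: list[Task], budget: float) -> tuple[float, float, bool]:
    """Return (virtual_time, wall_time, finished).

    Only "render" tasks advance the virtual clock. "fetch" tasks
    (waiting on scripts, CSS, image dimensions) cost wall time only.
    """
    virtual = 0.0
    wall = 0.0
    for task in tasks:
        if task.kind == "render":
            remaining = budget - virtual
            if task.duration > remaining:
                # The budget runs out mid-task: stop here, and the layout
                # would be generated from whatever has been rendered so far.
                virtual += remaining
                wall += remaining
                return virtual, wall, False
            virtual += task.duration
        wall += task.duration
    return virtual, wall, True


# Made-up workload: a long fetch does not eat into the budget, heavy rendering does.
tasks = [Task("render", 2), Task("fetch", 10), Task("render", 3), Task("render", 4)]
print(simulate_virtual_clock(tasks, budget=6))   # (6.0, 16.0, False)
print(simulate_virtual_clock(tasks, budget=20))  # (9.0, 19.0, True)
```

In the first call the ten-unit fetch costs nothing on the virtual clock, yet the page still runs out of budget during the last render task.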
With Chrome DevTools, you can throttle your CPU and see how it affects scripting and rendering times.
Let's take the H&M website as an example.
With the CPU throttled, these times increased by up to 25 times.
We can see how H&M struggles with rendering and indexing.
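The comparison itself is simple arithmetic. The sketch below assumes you have copied scripting and rendering times from two DevTools recordings, one without and one with CPU throttling; the numbers are invented for illustration and are not H&M's measurements.

```python
def slowdown_factor(baseline_ms: float, throttled_ms: float) -> float:
    """How many times longer a phase takes once the CPU is throttled."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return throttled_ms / baseline_ms


# Invented timings in milliseconds, as read from two DevTools recordings.
baseline = {"scripting": 400.0, "rendering": 120.0}
throttled = {"scripting": 10000.0, "rendering": 1800.0}

for phase in baseline:
    factor = slowdown_factor(baseline[phase], throttled[phase])
    print(f"{phase}: {factor:.1f}x slower")
```

The larger the factor, the more sensitive the page is to limited CPU time.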
How to measure the "virtual clock load" of your website
Góralewicz recommends two options for measuring your "virtual clock load".
The layout of your page
When the virtual clock time runs out, the layout is generated regardless of whether the rendering is halfway completed or not.
This leads to many potential challenges.
The layout plays a central role in how Google renders a page.
Content location matters
We already know that text above the fold is more important than text below the fold.
It turns out that this also affects how Google crawls this content.
The "Scheduling Resource Crawls" patent granted to Google in 2011 shows how the search engine assigns different priorities to different sections of a website and to the links in those sections.
Rendering goes far beyond that.
It's a much broader topic because, beyond which content Google sees, we are now also interested in:
- The layout of the page.
- The importance of content based on text size, placement, etc.
- Internal and external link extraction.
- Entry change rates.
- Other factors related to how a website is rendered and what it looks like, including images.
Batch-optimized rendering vs. images
Google's rendering service uses fake images. Here's an example of how that works.
What about links?
The value of links depends on their position and attributes.
We've known this for a while, but it gets more interesting as we look at more patents from Google.
The position of the link within the page is important.
This affects how Google crawls this link and what type of "rating" Google assigns to this link.
In addition, some sections of your page receive more attention and link authority than others.
This cnn.com example, for instance, is a direct quote from a Google patent on ranking documents based on user behavior and/or feature data.
"(…) A link that is positioned on the cnn.com website under the heading 'More Top Stories' is highly likely to be selected."
We can see that Google uses heuristics to select more important internal links.
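A position-based heuristic of this kind could be sketched as follows. The section names, the weights, the default score and the handling of nofollow are all assumptions for illustration; the values Google uses are not public.

```python
from dataclasses import dataclass

# Hypothetical weights per page section; the real values are not public.
SECTION_WEIGHTS = {"main": 1.0, "header": 0.6, "sidebar": 0.3, "footer": 0.1}


@dataclass
class Link:
    href: str
    section: str            # where on the page the link sits
    nofollow: bool = False  # an example of a link attribute


def link_score(link: Link) -> float:
    """Score a link by its position; nofollow links get zero (an assumption)."""
    if link.nofollow:
        return 0.0
    return SECTION_WEIGHTS.get(link.section, 0.2)


def prioritize(links: list[Link]) -> list[Link]:
    """Order links from most to least likely to be crawled first."""
    return sorted(links, key=link_score, reverse=True)


links = [
    Link("/terms", "footer"),
    Link("/top-story", "main"),
    Link("/partner", "sidebar", nofollow=True),
    Link("/category", "header"),
]
print([link.href for link in prioritize(links)])
```

With these made-up weights, the link in the main content is ranked first and the nofollow link last.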
Which sections are not indexed?
Góralewicz and his team found through nine months of research that Google uses very similar heuristics to choose which parts of a website to render and which to skip.
To diagnose partial indexing, the Onely team examined popular websites to determine which parts of a particular layout are indexed and which are not.
They found that Google apparently ignored some parts of the website more eagerly than others.
For example, Google seems to have problems rendering sections such as "related items" and "you may also be interested in".
Google will most likely index your main content.
But there is a good chance that Google will skip a part of your page that is not as important as the main content once it has tried to understand the layout.
Google has mentioned that it interrupts scripts when they are too heavy, but we didn't know what that meant until now.
Partial indexing: key results
You may think that partial indexing is not that big of a problem.
- If Google indexes your main content first, we can assume that this is a smart decision on Google's part.
- This means that Google often ignores parts of your layout.
- This can cause problems with general indexing and crawling.
However, this leads to an even bigger problem: after 14 days, 10% of the URLs are not indexed.
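To track this figure for your own site, the calculation is a simple ratio. The sketch below assumes you already have, from your own data source, a mapping of each unique, indexable URL to whether it is indexed; the function name and the example URLs are made up.

```python
def unindexed_share(urls: dict[str, bool]) -> float:
    """Share of unique, indexable URLs that are not indexed.

    `urls` maps each URL to True if it is indexed, False otherwise.
    """
    if not urls:
        raise ValueError("urls must not be empty")
    missing = sum(1 for indexed in urls.values() if not indexed)
    return missing / len(urls)


# Made-up sample: ten indexable URLs, one of which is not indexed.
sample = {f"https://example.com/page-{i}": True for i in range(1, 10)}
sample["https://example.com/page-10"] = False

print(f"{unindexed_share(sample):.0%} of indexable URLs are not indexed")
```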
At the end of his presentation, Góralewicz shared the following findings:
- Rendering SEO and indexing will be one of the hottest SEO trends. Soon.
- If you are not indexed, all other SEO activities that you perform do not matter.
- Indexing is something you can see and measure. It drives sales. Directly.
- For the first time in the history of search engine optimization, we have a good understanding of how rendering and indexing work. So let's make good use of this.
Watch this presentation
You can now watch Góralewicz's full SEJ eSummit presentation from June 2nd.
Featured image: Paulo Bobita
All screenshots taken by author, July 2020