Do you want to ensure that search engines can access your content properly and rank it highly?

In his SEJ eSummit session, Bartosz Góralewicz presented how Google renders websites on a large scale and gave insights based on Google's patents and documentation.

Here is a summary of his presentation.

The problem with JavaScript

Góralewicz and his team found that 40% of the JavaScript-based content is not indexed after 14 days.

It gets worse.

Ten percent of the unique, indexable URLs within an average domain are not indexed by Google.

This is something to be aware of, especially as these trends can change and worsen over time.

In 2015, Google claimed it was good at rendering:

"[A]s long as you don't prevent Googlebot from crawling your JavaScript or CSS files, we can generally render and understand your web pages like modern browsers."

Since 2017, Góralewicz and his team have run many other experiments, including cloaking experiments with JavaScript, that have found problems with crawling and indexing JavaScript-based websites.

Which JavaScript framework is crawlable and indexable? @bart_goralewicz #SMS2017 pic.twitter.com/3beH9dCj14

– Aleyda Solis (@aleyda), May 2, 2017

In the same year, Google started talking openly about JavaScript SEO.

Although today we have Google's Martin Splitt, who has been incredibly helpful to the SEO community, there are still questions to be answered.

In November 2019, Splitt announced at the Chrome Developer Summit that the median rendering delay at Google had improved from up to one week the previous year to just five seconds in 2019.

Additional research by Góralewicz and the Onely team showed, however, that "the average render delay for new websites is practically nonexistent and the delay in indexing JavaScript content is still very large."

Many JavaScript-based websites are not indexed, and do not rank, even after two weeks.

They also discovered that:

  • Many big brands struggle to get into the Google index.
  • Indexing HTML is not as easy as expected.
  • Indexing trends fluctuate during Google updates.
  • Pages can be dropped from the Google index.

Currently, one of the challenges in diagnosing indexing loss is that the site: command is unreliable and can return many false negatives.

[Image: Site command]

Getting into the Google index: a big SEO challenge

Getting your content into the Google index is the absolute foundation of your online presence, and it remains a major SEO challenge.

The problem is compounded by Google's limited resources: it cannot render and index the entire World Wide Web, especially given how costly many modern websites are to render.

Check out just a few of the biggest brands with significant indexing issues.

[Image: Percentage of URLs not indexed]

If Google doesn't index your web pages, all other SEO activities don't matter.

It's good that both SEOs and Googlers are starting a conversation about indexing problems and that we have better data sources to validate them.

And most indexing problems can actually be solved through technical SEO.

Here's how.

Batch-optimized rendering: how it works

Google views your website through its batch-optimized rendering and fetch architecture (BOR).

If you look at these views side by side, what Google sees is different from what users see in a browser.

[Image: Batch-optimized rendering]

How does BOR work?

Step 1: BOR skips all resources that are not essential to preview your page

The first step in the batch-optimized rendering and fetch architecture is to remove all the resources that Google doesn't need to preview or lay out your page.

This includes:

  • Tracking scripts (Google Analytics, Hotjar, etc.)
  • Ads
  • Images

Simply removing these additional resources can save up to 50% of the loading, scripting and rendering time, which saves Google a lot of resources.
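
To make the idea concrete, here is a minimal Python sketch of this filtering step. It is only an illustration: the resource categories, the `Resource` type and the millisecond costs are made-up assumptions, and Google's actual classification is not public.

```python
from dataclasses import dataclass

# Hypothetical categories treated as non-essential for a page preview.
NON_ESSENTIAL = {"tracking", "ad", "image"}


@dataclass
class Resource:
    url: str
    kind: str       # e.g. "script", "css", "tracking", "ad", "image"
    cost_ms: float  # assumed time to load and process the resource


def split_resources(resources):
    """Return (kept, skipped) lists based on the resource kind."""
    kept = [r for r in resources if r.kind not in NON_ESSENTIAL]
    skipped = [r for r in resources if r.kind in NON_ESSENTIAL]
    return kept, skipped


def saved_share(resources):
    """Share of the total cost that is avoided by skipping resources."""
    total = sum(r.cost_ms for r in resources)
    _, skipped = split_resources(resources)
    return sum(r.cost_ms for r in skipped) / total if total else 0.0


# Invented example page; the costs are illustrative, not measurements.
page = [
    Resource("/app.js", "script", 300.0),
    Resource("/main.css", "css", 100.0),
    Resource("/analytics.js", "tracking", 150.0),
    Resource("/banner.js", "ad", 150.0),
    Resource("/hero.jpg", "image", 100.0),
]

kept, skipped = split_resources(page)
print([r.url for r in kept])
print(f"{saved_share(page):.0%} of the cost skipped")
```

With these invented numbers, the skipped resources account for half of the total cost, which mirrors the "up to 50%" figure from the presentation rather than proving it.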

Step 2: Set the value of a virtual clock

In the second step, Google sets the value of the virtual clock (which we will discuss in more detail below).

Step 3: The layout of the website is generated

As soon as the time on this virtual clock has expired, the layout of the website is generated.

There are two key concepts to consider:

  • The virtual clock.
  • The layout.

What is a virtual clock?

The virtual clock measures the cost of rendering a website.

It is a kind of page rendering budget at Google, and each website is allocated a limited share of that "budget".

If rendering is paused to fetch resources (e.g. scripts, CSS files, image dimensions, etc.), the virtual clock stops advancing. It only advances while rendering is actually taking place.

This means that you need more "virtual time" on the virtual clock if your website has a lot of CSS, JavaScript or other resources.
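
The behavior described above can be modeled with a toy Python class. This is a sketch under stated assumptions, not Google's implementation: the `VirtualClock` name, the 500 ms budget and the durations are all hypothetical.

```python
class VirtualClock:
    """Toy clock that advances only while rendering work is being done."""

    def __init__(self, budget_ms):
        self.budget_ms = budget_ms  # hypothetical limit; the real value is unknown
        self.elapsed_ms = 0.0       # virtual time consumed by rendering
        self.wall_ms = 0.0          # real time, including waiting on fetches

    def fetch(self, duration_ms):
        # Waiting for a resource pauses the virtual clock.
        self.wall_ms += duration_ms

    def render(self, duration_ms):
        # Rendering work advances both the real and the virtual clock.
        self.wall_ms += duration_ms
        self.elapsed_ms += duration_ms

    @property
    def expired(self):
        return self.elapsed_ms >= self.budget_ms


clock = VirtualClock(budget_ms=500)
clock.fetch(2000)   # slow network: no virtual time is spent
clock.render(200)
clock.fetch(1000)
clock.render(350)   # heavy scripting pushes the page over the budget
print(clock.wall_ms, clock.elapsed_ms, clock.expired)
```

In this model, three seconds of waiting on the network cost nothing, while 550 ms of rendering work is enough to exhaust the assumed budget.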

However, there is no guarantee of how much of this virtual time you can get.

Although we don't know what the limit is (and we may never know), we can find out how resource-hungry our website is.

With Chrome DevTools, you can slow down your CPU and see how it affects scripting and rendering.

Let's take the H&M website as an example.

Slowing down the CPU increased the time by up to 25 times.

We can see how H&M struggles with rendering and indexing.

[Image: Chrome DevTools - Slow CPU]

How to measure the "virtual clock load" of your website

Góralewicz recommends two options for measuring your "virtual clock load".

The layout of your page

When the virtual clock time runs out, the layout is generated regardless of whether the rendering is halfway completed or not.
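
A simplified Python sketch of this cut-off is shown below. It assumes, purely for illustration, that sections are rendered in document order and that each section is either fully rendered or left out; the section names, costs and budget are invented.

```python
def render_until_expired(sections, budget_ms):
    """Render sections in order and stop once the budget is used up.

    `sections` is a list of (name, render_cost_ms) pairs. Returns the names
    that were rendered and the names that were left out of the layout.
    """
    rendered, skipped = [], []
    spent = 0.0
    for name, cost in sections:
        # Once one section no longer fits, everything after it is skipped too.
        if not skipped and spent + cost <= budget_ms:
            spent += cost
            rendered.append(name)
        else:
            skipped.append(name)
    return rendered, skipped


# Invented example page with hypothetical rendering costs.
sections = [
    ("header", 50),
    ("main content", 200),
    ("related items", 300),
    ("footer", 20),
]

rendered, skipped = render_until_expired(sections, budget_ms=400)
print("rendered:", rendered)
print("skipped:", skipped)
```

Here the expensive "related items" block does not fit in the assumed budget, so the layout is generated without it and without anything that follows it.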

This leads to many potential challenges.

Most importantly, this is where JavaScript SEO ends and rendering SEO starts.

Rendering SEO puts a lot of emphasis on the layout and how it affects this whole process.

Content location matters

We already know that text above the fold is more important than text below the fold.

It turns out that this also affects how Google crawls this content.

The "Scheduling Resource Crawls" patent granted to Google in 2011 shows how the search engine assigns different priorities to different sections of a website and to the links in those sections.

This shows that JavaScript SEO is just the tip of the iceberg. It only addresses whether Google can see our content.

Rendering SEO goes far beyond that.

Rendering SEO manifesto: Why we have to go beyond JavaScript SEO

It's a much broader topic because, beyond whether Google can see the content at all, we are now also interested in:

  • The layout of the page.
  • The importance of content based on text size, placement, etc.
  • Internal and external link extraction.
  • Entry change rates.
  • Other factors related to how a website is rendered and what it looks like, including images.

Batch rendering vs. images

Google's rendering service uses mock images. Here's an example of how that works.

[Image: Example of batch rendering with mock images]

What about links?

The value of links depends on their position and attributes.

We've known this for a while, but it gets more interesting as we look at more patents from Google.

Link position

The position of the link within the page is important.

This affects how Google crawls the link and what kind of "rating" it assigns to it.

In addition, some sections of your page get more visibility and link authority than others.

[Image: More Top Stories - CNN]

This cnn.com example, for instance, is quoted directly from a Google patent on ranking documents based on user behavior and/or feature data.

"(…) A link that is positioned on the cnn.com website under the heading 'More Top Stories' is highly likely to be selected."

We can see that Google uses heuristics to select more important internal links.

Which sections are not indexed?

Góralewicz and his team found through nine months of research that Google uses very similar heuristics to choose which parts of a website to render and which to skip.

To diagnose partial indexing, the Onely team examined popular websites to determine which parts of a particular layout are indexed and which are not.

They found that Google apparently ignored some parts of the website more eagerly than others.

For example, Google seems to have problems rendering sections such as "Related items" and "You may also be interested in".

Google will most likely index your main content.

But there is a good chance that, once Google has tried to understand the layout, it will skip parts of your page that are not as important as the main content.

Google has mentioned that it interrupts scripts when they are too heavy, but until now we did not know what that meant.

Partial indexing: key results

You may think that partial indexing is not much of a problem.

  • If Google indexes your main content first, we can assume that this is a smart decision on Google's part.
  • But it also means that Google often ignores parts of your layout.
  • This can cause problems with indexing and crawling in general.
  • And it brings us back to the problem that, after 14 days, around 40% of JavaScript content is not indexed.

However, this leads to an even bigger problem: after 14 days, 10% of the URLs are not indexed.

This goes far beyond JavaScript SEO because rendering is done with and without JavaScript.

JavaScript is not the main reason for rendering.

Knowing what we know now, should we still call it JavaScript SEO?

Takeaways

At the end of his presentation, Góralewicz shared the following findings:

  • Rendering SEO and indexing will be one of the hottest SEO trends. Soon.
  • If you are not indexed, all other SEO activities that you perform do not matter.
  • Indexing is something you can see and measure. It drives sales. Directly.
  • For the first time in the history of search engine optimization, we have a good understanding of how rendering and indexing work. So let's make good use of this.

Watch this presentation

You can now watch Góralewicz's full SEJ eSummit presentation from June 2nd.

Image credits

Featured image: Paulo Bobita
All screenshots taken by the author, July 2020