"Do you want to spend money on ads or get rid of that black box?"

This (rough) question, posed more than 10 years ago, helped shape my career path toward becoming the SEO I am today.

I chose this path because I love working on challenges and looking under the hood for the causes.

Puzzling out why the answer to life, the universe, and everything given by Deep Thought is 42, and then checking whether we had the right question (spoiler: it's six times nine), is exactly what excites me about SEO.

And what got me to work on this article was a great discussion on Jeff Ferguson's post about whether we have the math to decode the Google algorithm and, if so, what the industry would need.

The two things that are needed

Those who know me won't be surprised that I disagree with the view that a basic correlation analysis using Spearman's coefficient is enough to analyze the Google algorithm.

Since my SMX East presentation in 2011, I have publicly advocated using multiple linear regressions as a minimum for analyzing key data points.

Other advanced statistical methods, be it machine learning or neural networks, also play a role.

In this article, however, I will focus on regressions.

An important limitation of statistical methods is that a tool, on its own, does not make a good study.

This is where the right data analysis skills, combined with SEO experience, come into play.

As can be seen repeatedly in COVID-19 analyses, a data analyst background alone is not enough to claim in a Medium or Twitter post to have solved challenges best left to epidemiology experts.

And while some do provide valuable ideas worth sharing, the vast majority proceed without much humility or caution, which can spread misinformation.

Do I need to remind the industry what happens when SEO misinformation gets into the news from non-search experts?


The "I'm not a statistician, but …"

OK, so what gives me the right to point these studies in the direction of advanced statistics?

A Master's in International Relations with a major in International Economics, where I studied econometrics and had the pleasure of tearing apart econometric papers on China's economy.

There's a reason I tear up SEO correlation studies on Twitter when they come out.

So why regressions?

First and foremost, it is no longer a matter of analyzing a single measure in isolation.

Instead, there are several metrics that can also interact with each other to influence the ranking.

This point alone calls for a multiple linear regression.
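
As a toy illustration of fitting several metrics at once — entirely synthetic data, and the metric names (word count, backlinks, page speed) are stand-ins, not confirmed ranking factors — a multiple linear regression might look like:

```python
# A minimal sketch of a multiple linear regression on synthetic SEO data.
# The metric names are illustrative assumptions, not real ranking factors.
import numpy as np

rng = np.random.default_rng(42)
n = 500

word_count = rng.normal(1200, 300, n)
backlinks = rng.poisson(25, n).astype(float)
page_speed = rng.normal(2.5, 0.8, n)

# Synthetic "ranking position" influenced by several metrics at once
rank = 50 - 0.01 * word_count - 0.3 * backlinks + 2.0 * page_speed + rng.normal(0, 3, n)

# Fit rank = X @ beta with an intercept via ordinary least squares
X = np.column_stack([np.ones(n), word_count, backlinks, page_speed])
beta, *_ = np.linalg.lstsq(X, rank, rcond=None)

for name, coef in zip(["intercept", "word_count", "backlinks", "page_speed"], beta):
    print(f"{name:>11}: {coef:+.3f}")
```

The point of the sketch is that all three coefficients are estimated jointly, rather than correlating each metric with rank in isolation.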

Additionally, moving away from a focus on individual metrics and instead talking about combinations of factors pushes SEOs to think more deeply about the broad set of metrics they need to work on to improve rankings.

On the flip side, this prioritizes the work, as 1,000 metrics may seem daunting. However, if 900+ of them barely move the needle 0.1%, knowing which ones to work on will speed up optimization tasks.

Additionally, using time series with regression analysis (where the factors are analyzed over a set period rather than at a single point in time) can help smooth out daily or weekly changes to focus on the core areas, while also providing insight into which major algorithm updates have moved those areas.
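
As a sketch of that smoothing idea — synthetic data, with a made-up "visibility" metric and an assumed update date — a rolling window damps daily noise before any before/after comparison:

```python
# A minimal sketch of smoothing a daily metric with a rolling window
# so day-to-day noise doesn't dominate the trend. Synthetic data;
# "visibility" and the day-90 update are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(180)

# Slow upward trend + a step change (stand-in for an algorithm update
# at day 90) + daily noise
visibility = 10 + 0.05 * days + 5 * (days >= 90) + rng.normal(0, 2, 180)

# 7-day rolling mean smooths out daily fluctuation
window = 7
smoothed = np.convolve(visibility, np.ones(window) / window, mode="valid")

# A simple before/after comparison around the suspected update
before = smoothed[: 90 - window].mean()
after = smoothed[90:].mean()
print(f"mean before update: {before:.1f}, after: {after:.1f}")
```

On the raw daily series, the step change is buried in noise; on the smoothed series, the before/after gap stands out clearly.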

And for agencies looking to gain credibility, look to the sciences for how they run regression analyses in complicated fields.

And while it rarely happens, there are even dedicated venues for submitting SEO research.

Good analytical skills are important

Logically, giving someone a tool without the proper training doesn't mean that it will automatically lead to good results.

That is why the advanced statistical tooling has to be complemented by the right curious mindset: one ready to dig deep (like a power user) and put the data through the wringer.

This mindset will work to determine:

  • What data to collect.
  • What has directional quality.
  • What to remove before even starting any analysis.

This baseline requires some SEO experience, especially to anticipate which metrics may be the underlying cause and how to avoid bias from demographics, seasonality, buyer intent, etc.

And that SEO experience also means the analysis has a better chance of incorporating worthwhile interaction effects, especially since an isolated optimization may not be considered spam unless it is done in conjunction with other tactics. (For example, white text in a large paragraph on a white background that the user never sees.)
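
To sketch what an interaction effect looks like in a regression — entirely synthetic data, and the tactic names are hypothetical — note how neither tactic matters alone, but the product column picks up the combined penalty:

```python
# A minimal sketch of an interaction term: two tactics that are harmless
# alone but matter in combination. Synthetic data; names are made up.
import numpy as np

rng = np.random.default_rng(3)
n = 400
hidden_text = rng.integers(0, 2, n).astype(float)       # tactic A present?
keyword_stuffing = rng.integers(0, 2, n).astype(float)  # tactic B present?

# Synthetic penalty: only the *combination* of tactics hurts
penalty = 5.0 * hidden_text * keyword_stuffing + rng.normal(0, 0.5, n)

# Regression with both main effects plus the interaction column
X = np.column_stack([np.ones(n), hidden_text, keyword_stuffing,
                     hidden_text * keyword_stuffing])
beta, *_ = np.linalg.lstsq(X, penalty, rcond=None)
print("interaction coefficient:", round(beta[3], 2))  # close to 5
```

A model without the interaction column would smear the effect across both main-effect coefficients and miss what actually drives the penalty.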

Knowing that Google doesn't use a single monolithic algorithm, all analyses must include categories or groups, such as:

  • Keyword intent.
  • Search volume.
  • Ranking positions.
  • Industry sectors.
  • Etc.
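
A minimal sketch of that segmentation (the keywords, intent labels, and positions are made up): group the data first, then run the analysis per segment rather than on the whole pool.

```python
# Segment rows by keyword intent before analyzing, since one model
# across all intents mixes very different populations. Data is made up.
from collections import defaultdict

rows = [
    {"keyword": "buy running shoes", "intent": "transactional", "position": 4},
    {"keyword": "how to tie laces", "intent": "informational", "position": 2},
    {"keyword": "nike store near me", "intent": "navigational", "position": 1},
    {"keyword": "best trail shoes", "intent": "transactional", "position": 7},
    {"keyword": "what is pronation", "intent": "informational", "position": 3},
]

groups = defaultdict(list)
for row in rows:
    groups[row["intent"]].append(row["position"])

# Run the analysis (here just a mean) per segment, not on the whole pool
for intent, positions in sorted(groups.items()):
    print(f"{intent:>13}: avg position {sum(positions) / len(positions):.1f}")
```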


Even more so, you should check a scatter plot of the data to make sure there are no problems like:

  • Heteroscedasticity: data that fan out due to unequal variability.
  • Simpson's Paradox: two populations that each show the same trend, which reverses when the populations are combined.

Scatter plots or box-and-whisker plots are therefore a must in these analyses to show that the study avoided common statistical problems.
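
As a numeric companion to eyeballing the scatter plot — synthetic data throughout — one rough heteroscedasticity check is to compare the residual spread across the range of the predictor:

```python
# A rough numeric stand-in for eyeballing a scatter plot: fit a line,
# then compare residual spread in the lower vs. upper half of x.
# A large ratio hints at heteroscedasticity (fanning data). Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 400)
y = 2 * x + rng.normal(0, 0.5 + 0.4 * x)  # noise grows with x

# Ordinary least squares fit
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

low = residuals[x < np.median(x)]
high = residuals[x >= np.median(x)]
ratio = high.std() / low.std()
print(f"residual spread ratio (high/low): {ratio:.2f}")  # well above 1 here
```

A ratio near 1 is what a well-behaved (homoscedastic) dataset would show; this deliberately fanned-out data lands well above that.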

When presenting results, a standard format for regression output helps those with a statistical background review the conclusions quickly and easily, without having to re-run the regression to verify the claims.
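
One such standard format is the classic coefficient / standard error / t-statistic table. A minimal sketch on synthetic data:

```python
# Reporting OLS results in a standard table (coefficient, standard
# error, t statistic) so reviewers can check conclusions without
# re-running the regression. Data is synthetic.
import numpy as np

rng = np.random.default_rng(7)
n = 300
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classic OLS standard errors: sqrt of diag(s^2 * (X'X)^-1)
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

print(f"{'term':>9} {'coef':>8} {'std err':>8} {'t':>8}")
for name, b, s in zip(["intercept", "x1", "x2"], beta, se):
    print(f"{name:>9} {b:8.3f} {s:8.3f} {b / s:8.2f}")
```

Anyone with a statistics background can read effect size and significance straight off such a table, which is exactly what makes a study reviewable.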

A crucial part of any statistical study, and a common failure of many publicly promoted SEO studies, is the interpretation, which is frequently far from reasonable.

Too often, credulous claims are used as linkbait rather than to bring clarity to the SEO community.

I often ask myself when engaging in these studies:


  • Does the dataset rule out potential outliers like Wikipedia or Amazon?
  • How does the study deal with endogeneity, where rankings affect click-through rate even as the claim is that click-through rate affects rankings?
  • Does a fantastic claim, such as direct traffic affecting rankings, come with the extraordinary evidence it requires?
  • Why are rankings displayed on the X-axis? Okay, that last one is more my pet peeve.
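
On the first point, a simple way to flag outliers like Wikipedia before a study is an interquartile-range rule; here is a sketch with made-up domain names and traffic figures:

```python
# Flagging outlier domains with a 1.5x IQR rule before analysis.
# Domain names and traffic figures are illustrative, not real data.
import numpy as np

traffic = {
    "smallshop.example": 1_200,
    "nicheblog.example": 3_400,
    "midsize.example": 8_900,
    "bigbrand.example": 15_000,
    "wikipedia.org": 4_000_000,  # dominates everything else
}

values = np.array(list(traffic.values()), dtype=float)
q1, q3 = np.percentile(values, [25, 75])
upper = q3 + 1.5 * (q3 - q1)

outliers = [domain for domain, v in traffic.items() if v > upper]
print("excluded as outliers:", outliers)
```

Documenting which rows a rule like this removed (and why) is part of what makes the resulting study reviewable.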

And this is where peer review comes in.

It is one thing to check for inaccuracies in your work.

Peer review takes it to another level by helping to find blind spots, question assumptions, improve study quality, and determine whether the work deserves the trust of the larger SEO community.

All at the same time?

In an ideal world, yes!

In reality, it will likely take a few steps (and missteps) to get there.

And neither I nor the many statistically minded SEOs expect everyone to follow a single example.

To generate model ideas, take a look at Hulya Coban's article on how to write a regression study and run a linear regression model using Python.

This is where the SEO industry has to go if we really want to understand what's going on in the Google algorithm, build a solid foundation of trust in the studies, and stop the disinformation out there.

What about this study that …

OK, it depends.

More specifically, there are acceptable exceptions, and some key counterpoints from Russ Jones worth considering on whether correlation studies and software metrics have any value.

I don't mind using correlation studies privately to create an internal business use case.

Got it.

Time is precious in the business world. So use what you can and own it when it fails.


In the public eye, the few worthwhile studies have been carefully thought through, used the right analytical framework, been written with due care, or focused on year-over-year changes in Google's SERPs.

And articles that highlight their methods with data transparency deserve well-earned praise for their openness.

Setting those aside, there are also SEO live-test studies using tools like SearchPilot.

These are more mathematically structured; I've worked with developers to build them in-house and have publicly reported on their value since 2011.

So the work of these studies, from using PPC titles for SEO to the experiments done at Pinterest, is a good stepping stone when you have the immense amount of traffic required.

Let's move up

That being said, advanced statistical methods and solid data analysis skills, combined with SEO experience, are a must for what the industry needs to achieve.

And there are enough statistically minded SEOs ready to help, review, and make suggestions so the studies can become authoritative.


Yes, these SEOs level heavy criticism in Twitter threads when a new study comes out, but it comes from a healthy concern for the industry's reputation: avoiding misinterpretations of a study that result in bad SEO, and a desire to help others learn how to better analyze a complicated system.

And while a multiple linear regression model is not perfect, given its reliance on historical data and the upkeep required over time to avoid biased results, it is still a step in the right direction toward making the SEO industry more statistically minded.

In a nutshell…

If you have the immense amount of data (as well as time and resources) it takes to do this correctly, and you want to become the first SEO agency, consultant, etc., in the industry to do it, here's what you need:

  • An advanced statistical model, such as multiple linear regression.
  • A curious mindset paired with SEO experience.
  • A large set of metrics, narrowed down to those with directional quality.
  • Interaction metrics.
  • Groups and categories of data.
  • A period of more than a week.
  • Endogeneity, heteroscedasticity, and other biases checked.
  • Any data outliers removed.
  • Methodology explained.
  • Scatter plots and standard regression output formats included.
  • Claims substantiated with sufficient evidence.
  • Data and analyses peer-reviewed.



