Too often we assume that SEO best practices will work for any site, against any competition, in any industry. However, most best practices go untested and may not be “best” in every situation.

We all know that tactics that moved the needle in 2020 won't necessarily work in 2021 as Core Web Vitals (CWV) and other signals come into the mix. We have to do better for our companies and our customers.

I'm a data nerd at heart with a lot of battle scars from 15 years of SEO. The idea of analyzing thousands of local SERPs sounded like too much fun to pass up. I found some surprising correlations, and just as importantly, I built a methodology and data set that can be updated quarterly to reflect changes over time.

I analyzed 50,000+ SERPs in retail banking to understand the massive shifts in rankings and search behavior during the lockdown period. We have a lot of historical data for banking websites, so comparing pre- and post-COVID data would be easier than starting from scratch.

I'll explain how I did it below. But first, I want to explain WHY I think sharing this type of research is so important to the SEO community.

Why Validate SEO Best Practices With Data?

It's a great time to get started with SEO. We have amazing tools and can collect more data than ever before. We have thriving communities and excellent basic training materials.

However, we often see our craft distilled into oversimplified "best practices" that are assumed to apply across the board. Yet if there is one universal truth in SEO, it's that there are no universal truths. A best practice can be misinterpreted or outdated, leading to missed opportunities or outright harm to a business.

Take the growing importance of CWV as an example: SEOs have the opportunity (and obligation) to separate fact from fiction. We need to know if, and how much, CWV affects rankings over time so that we can prioritize our efforts.

We can improve our SEO game individually and collectively by testing and validating best practices with research. It just takes a curious mind, the right tools, and a willingness to accept the results rather than forcing a narrative.

Failing to validate best practices is a liability for SEO practitioners and signals an unwillingness to question assumptions. In my experience, a lack of data can make the opinions of senior stakeholders carry more weight than the recommendations of an SEO expert.

First, ask the right questions

Real insight comes from combining data from multiple sources to answer critical questions and ensure your strategies are backed by valid data. In my analysis of the local banks, I first listed the questions I wanted the answers to:

  • What characteristics do high-ranking local bank websites share?
  • Who are banks actually competing against in the SERPs? Is it mainly other banks?
  • How do competitive SERPs change based on when/where/how users search?
  • How can smaller local companies gain an advantage over larger competitors from outside their region?
  • How does SERP composition affect a bank's ability to rank well on targeted keywords?
  • How important are core web vitals (CWV) for rankings? How does this change over time?

You can do the same analysis by replacing “banks” with other local business categories. The list of possible questions is endless so you can customize it to suit your needs.

Here's an important reminder: be ready to accept the answers, even if they are inconclusive or contradict your assumptions. Data-driven SEOs need to avoid confirmation bias if we are to stay objective.

I analyzed 50,000 search results in just a few hours

I combined three of my favorite tools to analyze SERPs at a large scale and collect the data needed to answer my questions:

  • STAT to generate ranking reports for selected keywords
  • Screaming Frog to crawl websites and collect technical SEO data
  • Power BI to analyze the large amounts of data and create simple visualizations

Step 1: Determine your data needs

I used data from the US Census Bureau to identify all cities with populations greater than 100,000 because I wanted a representation of the SERPs of local banks across the country. There were 314 different cities in my list, but you can customize your list to suit your needs.
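
If you want to script this step, here's a minimal pandas sketch of the same filtering. The file and column names are placeholders, not an actual Census file layout; adjust them to whatever population table you download.

```python
import pandas as pd

# Hypothetical Census extract; file and column names are placeholders.
cities = pd.read_csv("census_city_population.csv")

# Keep only cities with populations greater than 100,000
large_cities = cities[cities["population"] > 100_000].copy()

# Build "city, st" strings for the keywords and STAT locations used later
large_cities["location"] = large_cities["city"] + ", " + large_cities["state_abbr"]

print(len(large_cities))  # 314 cities in my list
large_cities["location"].to_csv("city_list.csv", index=False)
```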

I also wanted to collect data for desktop and mobile search to compare the SERP differences between device types.

Step 2: Identify Your Keywords

I picked "banks near me" and "banks in {city, st}" because of their strong local intent and high search volume versus more specific banking services keywords.

Step 3: Generate a STAT import file in CSV format

Once you have your keywords and market list, it's time to prepare the STAT bulk upload. Use the template STAT provides to create a CSV file with the following fields:

  • Project: The name of the new STAT project or an existing project.
  • Folder: The name of the new folder or an existing folder. (This is an optional column that you can leave blank.)
  • Site: The domain name for the site you want to track. Note that for our purposes, you can enter any URL here that you want to track. The Top 20 report includes all of the ranking URLs for the target keywords, even if they are not listed in your "Site" column.
  • Keyword: The search query you are adding.
  • Tags: Enter as many keyword tags as you want, separated by commas. I used "city" and "near me" as tags to differentiate between query types. (This is an optional column that you can leave blank.)
  • Market: Enter the market (country and language) in which you want to track the keyword. I used "US-en" for US English.
  • Location: If you want to track the keyword in a specific location, provide the city, state, province, and/or postal code. I used my city and state list in the format "city, st".
  • Device: Choose whether you want to get desktop or smartphone results. I chose both.

Each market, location, and device type multiplies the number of keywords you need to track. In the end, I had 1,256 keywords (314 markets X 2 keywords X 2 devices) in my import file.
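
If you'd rather generate the import file with a script than by hand, here's a rough pandas sketch. The project name and Site domain are placeholders, and the column headers should be matched to STAT's own template before uploading.

```python
import pandas as pd

# "city, st" strings produced in Step 1 (file name is a placeholder)
locations = pd.read_csv("city_list.csv")["location"]

rows = []
for location in locations:
    for device in ("desktop", "smartphone"):
        # "near me" variant
        rows.append({
            "Project": "Local Bank SERPs",   # placeholder project name
            "Folder": "",
            "Site": "example-bank.com",      # any domain you want to track
            "Keyword": "banks near me",
            "Tags": "near me",
            "Market": "US-en",
            "Location": location,
            "Device": device,
        })
        # "banks in {city, st}" variant
        rows.append({
            "Project": "Local Bank SERPs",
            "Folder": "",
            "Site": "example-bank.com",
            "Keyword": f"banks in {location.lower()}",
            "Tags": "city",
            "Market": "US-en",
            "Location": location,
            "Device": device,
        })

# 314 locations x 2 keywords x 2 devices = 1,256 rows
pd.DataFrame(rows).to_csv("stat_bulk_upload.csv", index=False)
```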

Once your file is complete, you can import it into STAT and start tracking.

Step 4: Run a top 20 report for all keywords in STAT

The Google SERP Top 20 Comparative Report built into STAT captures the top 20 organic results of each SERP at different intervals (daily, weekly, monthly, etc.) to examine changes over time. I didn't need daily data so I just let it run for two consecutive days and removed the data I didn't need. I run the same report quarterly to track changes over time.

My 1,256 keywords generated over 25,000 rows of data per day. Each row is a unique organic listing and contains the keyword, monthly search volume, rank (including the local pack), base rank (excluding the local pack), the https/http protocol of the ranking URL, the ranking URL, and your tags.

Here is an example of the raw output in CSV format:
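
(The header and values below are illustrative placeholders based on the fields described above, not an exact STAT export.)

```
Keyword,Search Volume,Rank,Base Rank,Protocol,Ranking URL,Tags
"banks in austin, tx",1300,1,1,https,examplebank.com/locations/austin,city
banks near me,246000,3,2,https,localdirectory.example/austin-tx-banks,near me
```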

It's easy to see how useful this data is on its own, but it gets even more powerful when we clean it up and start crawling the ranking URLs.

Step 5: Clean up and normalize your STAT URL data

At this point, you may have spent 1-2 hours collecting the original data. This step is a little more time consuming, but data cleansing allows you to do more advanced analysis and gain more useful insights in Screaming Frog.

Here are the changes I made to the STAT ranking data to prepare for the next steps in Screaming Frog and Power BI. By the end, you'll have several URL columns, each of which serves a purpose later.

  1. Duplicate the "Ranking URL" column into a new column called "Normalized URL".
  2. Remove URL parameters from the Normalized URL fields by using Excel's text-to-columns tool with "?" as the delimiter. I deleted the new columns containing the URL parameters, as they weren't helpful for my analysis.
  3. Duplicate the new, clean "Normalized URL" column into a new column called "TLD". Use the text-to-columns tool on the TLD column with "/" as the delimiter to remove everything but the domain name and subdomains, then delete the new columns. I chose to keep the subdomains in my TLD column, but you can remove them if that helps your analysis.
  4. Finally, create another column called "Full URL" that will eventually become the list of URLs you crawl in Screaming Frog. To generate the Full URL, use Excel's CONCATENATE function to combine the "Protocol" and "Normalized URL" columns, inserting "://" to form a valid URL string. Your formula will look something like this: =CONCATENATE(A1, "://", C1).

The 25,000+ rows in my data set are within the limits of Excel, so I can easily edit the data in one place. You may need to use a database (I like BigQuery) as your records grow.
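
If your data set outgrows Excel, or you'd simply rather script this step, here's a minimal pandas sketch of the same clean-up. Column names like "Ranking URL" and "Protocol" are assumptions; match them to your actual STAT export.

```python
import pandas as pd

# Placeholder file/column names; match them to your STAT Top 20 export
stat = pd.read_csv("stat_top20_export.csv")

# Mirror the Excel steps above:
# 1) strip query parameters, 2) pull out the domain, 3) rebuild a crawlable URL
stat["Normalized URL"] = stat["Ranking URL"].str.split("?").str[0]
stat["TLD"] = stat["Normalized URL"].str.split("/").str[0]
stat["Full URL"] = stat["Protocol"] + "://" + stat["Normalized URL"]

stat.to_csv("stat_cleaned.csv", index=False)
```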

Step 6: Categorize Your SERP Results by Website Type

As you scroll through the SERP results, it's easy to see that banks aren't the only type of website that ranks for keywords with local search intent. Since one of my first questions was SERP composition, I had to identify all types of websites and label each one for further analysis.

This step is by far the most time-consuming, and the most insightful. I spent 3 hours sorting the first batch of 25,000+ URLs into one of the following categories:

  • Institution (websites of banks and credit unions)
  • Directory (aggregators, local corporate directories, etc.)
  • Reviews (local and national websites like Yelp.com)
  • Education (content about banks on .edu domains)
  • Government (content about banks on .gov domains and municipal websites)
  • Jobs (career pages and job aggregators)
  • News (local and national news sites with bank content)
  • Food banks (yes, many food banks rank for "banks near me" keywords)
  • Real estate (commercial and private real estate ads)
  • Search engines (ranking content owned by a search engine)
  • Social Media (ranking content on social media websites)
  • Others (completely random results not related to any of the above results)

Your local SERPs will likely include many of these website types, plus other unrelated categories like food banks. Speed up the process by sorting and filtering your TLD and Normalized URL columns to categorize multiple rows at once. For example, all yelp.com rankings can be categorized as "Reviews" with a quick copy/paste.
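
If you prefer to script that first pass, here's a minimal pandas sketch that assigns categories from a domain lookup. The domain-to-category mappings and file names are just examples, and anything left in the "Other" bucket still needs manual review.

```python
import pandas as pd

stat = pd.read_csv("stat_cleaned.csv")  # cleaned STAT data from Step 5

# Example mappings only; extend this dictionary as you review your own SERPs
category_map = {
    "yelp.com": "Reviews",
    "indeed.com": "Jobs",
    "bankrate.com": "Directory",
}

def categorize(tld: str) -> str:
    tld = tld.lower()
    for domain, category in category_map.items():
        if tld == domain or tld.endswith("." + domain):
            return category
    if tld.endswith(".gov"):
        return "Government"
    if tld.endswith(".edu"):
        return "Education"
    return "Other"  # review these rows manually

stat["Category"] = stat["TLD"].apply(categorize)
print(stat["Category"].value_counts())
stat.to_csv("stat_categorized.csv", index=False)
```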

At this point, your ranking data set is complete and you can start crawling the top-ranking websites in your industry to see what they have in common.

Step 7: Crawl your target websites with Screaming Frog

My initial STAT data identified over 6,600 unique pages from local banking websites that ranked in the top 20 organic search results. There are far too many pages to evaluate manually. Enter Screaming Frog, a crawler that mimics Google's web crawler and extracts tons of SEO data from websites.

I configured Screaming Frog to crawl each of the 6,600 ranking pages for a more complete analysis of the characteristics shared by high-ranking banking websites. Don't just turn Screaming Frog loose right away, though. Make sure you configure it correctly to save time and avoid crawling unnecessary pages.

These settings ensure that we receive all the information we need to answer our questions in a crawl:

List mode: Paste in your de-duplicated Full URL list from your STAT data. In my case, that was more than 6,600 URLs.

Database storage mode: It may be a little slower than memory (RAM) storage, but saving your crawl results to your hard drive ensures that if you make a mistake (as I often have) and close your report before you're done analyzing the data, you won't lose your results.

Limit crawl depth: Set this value to 0 (zero) so that the spider only crawls the URLs in your list without following internal links to other pages in these domains.

APIs: I highly recommend using the Pagespeed Insights integration to pull Lighthouse speed metrics right into your crawl data. If you have a Moz account with API access, you can also use the built-in integration to get link and domain data from the Moz API.
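
If you want to sanity-check what the integration collects, you can also query the PageSpeed Insights v5 API directly for a single URL. A minimal sketch, assuming a placeholder page and API key:

```python
import requests

# Spot-check one URL against the PageSpeed Insights v5 API; this is roughly the
# same Lighthouse data the Screaming Frog integration appends to each crawled page.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {
    "url": "https://www.example.com/",  # placeholder page to test
    "strategy": "mobile",
    "key": "YOUR_API_KEY",              # the same key you give Screaming Frog
}

data = requests.get(PSI_ENDPOINT, params=params).json()
lighthouse = data["lighthouseResult"]

print("Performance score:", lighthouse["categories"]["performance"]["score"])
print("LCP:", lighthouse["audits"]["largest-contentful-paint"]["displayValue"])
print("CLS:", lighthouse["audits"]["cumulative-layout-shift"]["displayValue"])
```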

Once you've configured the spider, let it rip! The crawl can take anywhere from a few minutes to several hours, depending on the number of URLs and your computer's speed and memory. Just be patient! Consider running larger crawls overnight or on a spare machine to avoid tying up your primary computer.

Step 8: Export your Screaming Frog Crawl Data to Excel

Saving your crawl data in Excel is remarkably easy.

Step 9: Connect your datasets in Power BI

At this point, you should have two sources of data in Excel: one for your STAT ranking data and one for your Screaming Frog crawling data. Our goal is to combine the two data sources to see how organic search rank can be influenced by on-page SEO elements and website performance. To do this, we must first merge the data.

If you have access to a Windows PC, the free version of Power BI is powerful enough to get you started. First, use the Get Data Wizard to load your two data sources into a new project.

Once your data sets are loaded, it's time to work some magic by creating relationships in your data to unlock correlations between rankings and site characteristics. To combine your data in Power BI, create a many-to-many relationship between your STAT "Full URL" and Screaming Frog "Original URL" fields.
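
If you'd like to prototype or sanity-check that join outside of Power BI, the equivalent operation in pandas is a left join on those URL fields. This is only a sketch; the file and column names are assumptions to be matched to your own exports.

```python
import pandas as pd

rankings = pd.read_csv("stat_categorized.csv")          # STAT data with Full URL
crawl = pd.read_csv("screaming_frog_internal_all.csv")  # Screaming Frog export

# Join each ranking row to the crawl data for its URL (a left join keeps
# ranking rows even when a URL failed to crawl).
combined = rankings.merge(
    crawl,
    how="left",
    left_on="Full URL",
    right_on="Original URL",  # use whatever URL column your SF export contains
)

combined.to_csv("combined_rankings_crawl.csv", index=False)
```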

If you are new to BI tools and data visualization, don't worry! There are plenty of helpful tutorials and videos just a quick search away. At this point it is really difficult to break anything and there are many ways you can experiment with analyzing your data and sharing insights with many types of charts and graphs.

I should note that Power BI is my data visualization tool of choice, but you may be able to use Tableau or an equally powerful tool. Google Data Studio wasn't an option for this analysis because it only allows left outer joins when blending multiple data sources and doesn't support many-to-many relationships. That's a technical way of saying Data Studio isn't flexible enough to create the data relationships we need.

Step 10: Analyze and visualize!

Power BI's built-in visualizations let you quickly summarize and present data. Now we can start analyzing the data to answer the questions we asked earlier.

Results – what did we learn?

Here are some examples of the insights gained from merging our rankings and crawl data. Spoiler alert – CWV doesn't have a strong impact on organic rankings.

Who are banks actually competing against in the SERPs? Is it mainly other banks?

On desktop, about 67% of organic search results belong to financial institutions (banks and credit unions), with stiff competition from review sites (7%) and online directories (22%). This information helps shape our SEO strategies for banks by highlighting the need to monitor and maintain listings on relevant directories and review sites.

Okay, now let's blend our data sources to see how the website category distribution changes based on rank on desktop devices. All of a sudden, we can see that financial institutions actually hold the majority of the top 3 positions, while review sites and directories are more common in positions 4 through 10.
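
As a rough illustration, here's how that breakdown can be computed outside of Power BI from the merged data set built earlier (the column names are assumptions):

```python
import pandas as pd

combined = pd.read_csv("combined_rankings_crawl.csv")  # merged data from Step 9

# Compare the category mix in the top 3 positions vs. positions 4-10
top3 = combined[combined["Rank"] <= 3]["Category"].value_counts(normalize=True)
rest = combined[combined["Rank"].between(4, 10)]["Category"].value_counts(normalize=True)

print("Top 3 positions:\n", top3.round(3))
print("Positions 4-10:\n", rest.round(3))
```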

How important are core web vitals (CWV) for rankings? How does this change over time?

Website performance and site speed are important topics in SEO and will only become more important when CWV becomes a ranking signal this May. We can start to understand the relationship between site speed and rankings by comparing STAT rankings with the PageSpeed Insights data from our Screaming Frog reports.

As of January 2021, websites with higher Lighthouse performance scores (i.e., faster-loading sites) tend to rank better than websites with lower scores. This could help justify investing in site speed and performance.
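
One simple way to quantify that relationship is a rank correlation between the Lighthouse performance score and position. A minimal sketch, again assuming the merged data set and placeholder column names from earlier:

```python
import pandas as pd

combined = pd.read_csv("combined_rankings_crawl.csv")  # merged data from Step 9

# Spearman rank correlation between Lighthouse performance score and position.
# A negative value means higher scores tend to sit in better (lower) positions.
corr = combined["Performance Score"].corr(combined["Rank"], method="spearman")
print(f"Spearman correlation, performance score vs. rank: {corr:.2f}")
```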

Some CWV elements correlate more closely with better rankings, while others are more scattered. That doesn't mean CWV is unimportant or not useful; rather, this is a starting point for further analysis after the May rollout.

So what? What can we learn from this type of analysis?

On their own, STAT and Screaming Frog are incredibly powerful SEO tools. The data they provide is useful by itself, but the ability to blend data sets and extract relationships multiplies your value in any organization that values data and acts on insights.

In addition to validating generally accepted SEO knowledge with data ("faster websites are rewarded with better rankings"), better use of relational data can also help us avoid wasting valuable time on less impactful tactics ("improve cumulative layout shift at all costs!").

Of course, correlation does not imply causation, and aggregated data does not guarantee an outcome for any individual site. However, if you are a bank marketer responsible for customer acquisition through organic channels, this is the kind of data you need to bring to your stakeholders to justify increased investment in SEO.

By sharing these tools and methods, I hope others will take this further, build additional insights, and bring them back to the SEO community. What other data sets can we combine to deepen our understanding of SERPs at a larger scale? Let me know your thoughts in the comments!
