What is duplicate content and why is it a problem for your website? Better still, how can you find it and fix it?
In this week's episode of Whiteboard Friday, Moz Learn Team Specialist Meghan goes over some useful (and hunger-inducing) analogies to help you answer those questions!
Hey Moz fans. Welcome to another edition of Whiteboard Friday. I'm Meghan and I'm part of the Learn team here at Moz. Today we're going to talk a little bit about duplicate content.
So why are we talking about duplicate content?
Well, this is a pretty common problem, and it can often be a little confusing. What is it? How is it determined? Why are certain pages on my website being marked as duplicates of each other? And most importantly, if I discover that this is something I want to address on my website, how do I fix it?
What is duplicate content?
So, first of all, what is duplicate content?
Essentially, duplicate content is content that appears in multiple places on the Internet. However, this may not be as cut and dried as it sounds. Content that is too similar, even if it is not identical, can also be viewed as duplicate.
When thinking about duplicate content, it's important to remember that it's not just about what human visitors see when they go to your website and compare two pages. It's also about what search engines and crawlers see when they access these pages. Since they cannot see the rendered page, they usually rely on the page's source code. If this code is too similar, the crawler might think that it is looking at two versions of the same page.
Imagine walking into a bakery and seeing two cupcakes in front of you that look almost identical. There are no signs. How do you know which one you want? This is what happens when a search engine comes across two pages that are too similar.
This mix-up between pages can lead to ranking problems, as search engines may not be able to figure out which page to rank, or they may rank the wrong page. Within the Moz tools we have a 90% threshold for duplicate content. This means that any pages whose code is at least 90% the same will be marked as duplicates of each other.
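To make the threshold idea concrete, here is a minimal sketch that compares the source code of two pages and flags them when they are at least 90% similar. It uses Python's standard `difflib` as a stand-in: how Moz actually measures similarity is not described here, so the scoring method, the function names, and the sample pages are all assumptions for illustration.

```python
from difflib import SequenceMatcher

# The 90% figure comes from the talk; everything else here is illustrative.
DUPLICATE_THRESHOLD = 0.90


def source_similarity(source_a: str, source_b: str) -> float:
    """Return a 0.0-1.0 similarity score for two page sources."""
    return SequenceMatcher(None, source_a, source_b, autojunk=False).ratio()


def looks_duplicate(source_a: str, source_b: str) -> bool:
    """Flag two pages whose source code meets the similarity threshold."""
    return source_similarity(source_a, source_b) >= DUPLICATE_THRESHOLD


# Two pages that share a large template and differ only in one sentence,
# plus a third page that is clearly different.
template = (
    "<html><head><title>Cupcakes</title></head><body>"
    + "<p>Shared navigation and footer markup.</p>" * 20
)
page_a = template + "<p>Vanilla cupcake with sprinkles.</p></body></html>"
page_b = template + "<p>Vanilla cupcake with a cherry.</p></body></html>"
page_c = "<html><body><h1>Contact us</h1></body></html>"

print(looks_duplicate(page_a, page_b))  # shared template dominates the score
print(looks_duplicate(page_a, page_c))
```

Notice that the two cupcake pages say different things to a human reader, yet the shared template makes their source code nearly identical, which is exactly the situation a crawler can find confusing.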
Now that we have briefly explained what duplicate content is, what do we do? There are several ways to resolve duplicate content.
First, you can implement 301 redirects. Think of a VHS copy of a movie, which may no longer be as relevant.
You want to make sure people get the digital version that is streamed online instead. On your site, you can redirect older versions of pages to new, updated versions. This is relevant for issues with subdomain or protocol changes, as well as for content updates where you no longer want users to be able to access the older content.
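As a rough sketch of what a 301 redirect does, the snippet below maps retired URLs to their replacements and returns the status code and `Location` header a server would send. The paths and the `resolve` helper are hypothetical; in practice redirects are usually configured in your web server or CMS rather than written by hand like this.

```python
from http import HTTPStatus

# Hypothetical mapping of retired URLs to their updated versions.
REDIRECTS = {
    "/old-cupcake-recipe": "/cupcake-recipe",
    "/vhs-movie": "/streaming-movie",
}


def resolve(path: str) -> tuple[int, dict[str, str]]:
    """Return the status code and headers a server would send for `path`."""
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently: visitors and crawlers are sent to the new URL.
        return HTTPStatus.MOVED_PERMANENTLY.value, {"Location": target}
    # No redirect configured, so the page is served as usual.
    return HTTPStatus.OK.value, {}


print(resolve("/vhs-movie"))
print(resolve("/cupcake-recipe"))
```

Because the old URL never serves its own content again, there is only one version of the page left for a crawler to find.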
Rel=canonical
Next, you can implement rel=canonical tags on your site. For example, suppose you run a bakery and have two types of cookies on offer: sugar and chocolate chip. You consider your sugar cookies top notch. When people ask you which to try, you point them to the sugar cookies, though they still have the option to try the chocolate chip.
On your website, this is equivalent to selling the same item in two different colors. You want human visitors to see and access both colors, but you would use a canonical tag to let crawlers know which page is the more relevant one for ranking.
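If you want to check which page a given URL points crawlers to, a small script can read the canonical tag out of the page's source code. This is only a sketch using Python's standard `html.parser`; the `find_canonical` helper and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr_map.get("href")


def find_canonical(page_source: str) -> Optional[str]:
    """Return the canonical URL declared in the page source, if any."""
    finder = CanonicalFinder()
    finder.feed(page_source)
    return finder.canonical


# Hypothetical product page for the blue shirt that names the red shirt
# page as the version crawlers should rank.
blue_shirt_page = (
    '<html><head><link rel="canonical" href="https://example.com/shirt-red">'
    "</head><body>Blue shirt</body></html>"
)
print(find_canonical(blue_shirt_page))
```

Human visitors can still open the blue shirt page; the tag only tells crawlers which version you prefer.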
You also have the option to mark pages as meta noindex.
For example, you might have two editions of your favorite book. You will read and reference the second edition because it is the latest and most relevant. However, you still want to keep the first edition and have access to it if you need it. Meta noindex tags tell the crawler that it can still crawl this duplicate page but should not include it in its index. This can be useful for duplicate content problems caused by pagination.
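To confirm that a page really carries a noindex directive, you can look for the robots meta tag in its source code. The sketch below does that with Python's standard `html.parser`; the `is_noindexed` helper and the sample pages are assumptions for illustration, and it only checks the meta tag, not HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def is_noindexed(page_source: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(page_source)
    return "noindex" in finder.directives


# Hypothetical second page of a paginated listing, and its first page.
page_two = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
page_one = "<html><head><title>Cupcakes</title></head></html>"
print(is_noindexed(page_two))
print(is_noindexed(page_one))
```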
But what if you have two pages that really are not duplicates of each other? They deal with different topics and should be treated as separate content. In this case, you can add more unique content to each of these pages so that the crawler is less likely to confuse them.
This would allow them to stand out from each other, and it would be similar to adding sprinkles and a cherry to one cupcake and maybe a different frosting color to the other.
Use Moz Pro to identify and resolve duplicate content
If you ever need help determining which pages of your website may be considered duplicates of each other, Moz Pro Site Crawl and On-Demand Crawl can come in handy.
In these two tools we mark which pages are considered duplicates of each other, and you can even export this data to CSV for analysis outside of the tool. Just a little pro tip here: when you export this data to CSV, the duplicate content group tells you which pages are considered duplicates of each other.
Therefore, all pages with the same duplicate content group number are part of the same duplicate page group. This is by no means an exhaustive list of ways you can resolve duplicate content, but I hope it helps you head in the right direction when it comes to addressing this issue. If you are interested in learning more about SEO basics and strategies, be sure to check out the SEO Essentials certification offered by Moz Academy.
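If you would like to work with the exported CSV outside of the tool, the pro tip about duplicate content groups can be sketched in a few lines of Python. The column names `URL` and `Duplicate Content Group` and the sample rows are assumptions for illustration, so check the header row of your own export and adjust them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; match these to the header row of your export.
URL_COLUMN = "URL"
GROUP_COLUMN = "Duplicate Content Group"


def group_duplicates(csv_text: str) -> dict[str, list[str]]:
    """Map each duplicate content group number to the URLs that share it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = (row.get(GROUP_COLUMN) or "").strip()
        if group:  # pages without a group number are not flagged
            groups[group].append(row[URL_COLUMN])
    # Keep only groups that actually contain more than one page.
    return {group: urls for group, urls in groups.items() if len(urls) > 1}


# Made-up sample data standing in for a real export.
sample_export = """URL,Duplicate Content Group
https://example.com/cupcakes,1
https://example.com/cupcakes?sort=price,1
https://example.com/cookies,2
https://example.com/cookies/,2
https://example.com/about,
"""

for group, urls in group_duplicates(sample_export).items():
    print(group, urls)
```

Each group that comes back is a set of pages to review together, so you can decide whether a redirect, a canonical tag, a noindex tag, or more unique content is the right fix.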
Thank you for watching.
Video transcription from Speechpad.com