Understanding Duplicate Content
- Posted on: Dec 12 2012
Duplicate content refers to content on a website that is similar to content on another page within the same domain or on a completely different domain. Duplicate content penalties were originally intended to address spammer websites that would scrape unique content from other domains, scramble the wording (in hopes of tricking the search engines) and publish the spun content on their own domains.
Understanding how duplicate content will affect your website’s search engine visibility first requires an understanding of how search engines such as Google detect duplicate content. Search engines look at relatively small phrase segments, 4-6 words in length, across webpages. If the search engine spider finds the same phrase segments repeated on two different pages, whether on the same domain or on two different domains, it concludes that the content is duplicate.
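The phrase-segment comparison described above can be approximated with a small sketch. This is only an illustration of the idea, not how any search engine actually implements it: the function names, the 5-word segment size, and the two sample texts are assumptions made for the example.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("phrase segments") of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def shared_fraction(text_a, text_b, size=5):
    """Fraction of text_a's phrase segments that also appear in text_b."""
    a = shingles(text_a, size)
    b = shingles(text_b, size)
    if not a:
        return 0.0  # text_a is shorter than one segment
    return len(a & b) / len(a)


# Hypothetical page copy, invented for this example.
page_one = "Our handmade oak tables are built to order and shipped within two weeks."
page_two = "Sale now on. Our handmade oak tables are built to order and shipped free."

print(shared_fraction(page_one, page_two))
```

A value near 1.0 means almost every phrase segment on the first page also appears on the second, while a value near 0.0 means the two pages share little wording. Where to draw the line between the two is a judgment call.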
In some cases a domain will purposely have multiple pages displaying the same content. When this happens the search engine must select which page to display in the search results and which to exclude from the search index. If you have pages that are duplicated, add a canonical URL tag to each duplicate page that points to the version you want Google to index. Then make sure all the links you build go to that canonical URL.
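A canonical URL tag is a link element with rel="canonical" placed in the head of a page. As a rough sketch, the snippet below uses Python's standard html.parser module to report which URL a page declares as canonical. The class name, the helper function, and the example.com markup are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical duplicate page that points to the preferred version.
duplicate_page = """
<html><head>
<title>Oak tables (print view)</title>
<link rel="canonical" href="https://www.example.com/oak-tables/">
</head><body>...</body></html>
"""

print(find_canonical(duplicate_page))
```

Running a check like this over each duplicated page is one way to confirm that they all name the same preferred URL.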
Search engines use duplicate content filters to remove seemingly duplicate pages from the search results. Google in particular is often described as applying a concept called QDD (Query Deserves Diversity) to provide better search results to end users: the idea that people want diverse search results for a given search query, not several copies of the same page.
There are two tasks for webmasters when it comes to duplicate content. First, search for snippets of your unique content to make sure that other sites haven’t scraped it. Second, make sure that the content living on your own domain(s) doesn’t look like duplicate content to Google.
There are free tools to search for duplicate content. Webmasters can also periodically search for exact phrases from their unique content. If pages from other domains show up in the search results, you could have a scraping situation on your hands.
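One way to make the periodic phrase search repeatable is to pull a few phrases from a page and wrap each in quotation marks, which asks a search engine for an exact match. The sketch below does only that. The function names, the 8-word phrase length, and the count of three phrases are assumptions chosen for illustration.

```python
def sample_phrases(text, size=8, count=3):
    """Pick up to `count` runs of `size` consecutive words, spread evenly through the text."""
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)] if words else []
    last_start = len(words) - size
    # Evenly spaced start positions; the set removes repeats on short texts.
    starts = sorted({round(i * last_start / max(count - 1, 1)) for i in range(count)})
    return [" ".join(words[s:s + size]) for s in starts]


def exact_match_queries(text, size=8, count=3):
    """Wrap each sampled phrase in quotes so it can be pasted into a search box."""
    return ['"' + phrase.replace('"', "") + '"' for phrase in sample_phrases(text, size, count)]


# Hypothetical page copy, invented for this example.
page_copy = (
    "Every oak table we sell is cut, joined and finished by hand in our "
    "workshop, then inspected twice before it leaves for your home."
)

for query in exact_match_queries(page_copy):
    print(query)
```

Paste each printed query into a search engine. Results from domains other than your own are worth a closer look.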
Duplicate content penalties are rarely applied, usually only in extreme cases of web spam. However, search engines do devalue a domain if there are multiple cases of duplicate content across the website.
Consequences of Duplicate Content for Non-Web Spam Websites
Search engines spend only a limited crawl budget on each domain. When part of that budget is used up crawling duplicate content pages, fewer “good” pages within the domain get crawled, which lowers the overall domain authority.
Links to duplicate content are a waste of link juice. It would be better to have all the links go to one page on the site, making that page stronger with more available PageRank to share with the rest of the site. Dispersing the link juice across multiple duplicate pages is a missed opportunity.
Review your content regularly, both on your own domain and across other sites, to avoid unintended penalties.