Duplicate content has been a subject of much recent discussion, not least because of contradictory statements from Google itself. But let's start from the beginning: in this article, I want to clarify what duplicate content is, how to avoid it, and, of course, whether it is really harmful.
Duplicate Content (DC) refers to identical or very similar passages of text or content that can be reached via different URLs. These can occur within a single domain, or the same content can appear on different domains. Search engines aim to show only unique results to their users. When a search engine encounters duplicate content and recognizes it as such, it usually displays only one of the versions in the SERPs, choosing which one based on its own criteria. You can read about these criteria in the following infographic I found on Search Engine Land.

The problem gets worse when other domains link to the content of the site: some link to it under URL A, others to the same content under URL B. Since Google considers only one of the URLs, valuable link power is lost. With DC, you cannot exhaust your full ranking potential.
The reasons why duplicate content arises are varied. Often technical errors are to blame:
Often product descriptions are copied for simplicity, or no individual titles and descriptions are stored in the source code, resulting in duplicate content.
URL parameters provide many useful features, such as tracking traffic sources. For the search engine, however, different parameters result in different URLs and thus duplicate content.
In some content management systems, comments can be paginated across several pages. This creates www.beispiel.de as well as www.beispiel.de/kommentarseite-1/, www.beispiel.de/kommentarseite-2/, and so on.
If the article pages of a website link to their print versions, the Googlebot encounters those too and will then detect DC.
To track visitors to a website, sessions are assigned. Sessions allow the user, for example, to add products to the shopping cart, and session IDs serve as a unique identifier. Since each user is assigned a new session ID, there is a risk of DC.
In fact, the reasons for duplicate content often lie within the domain itself and are due to webmaster "errors". But duplicate content also arises when other websites publish the same content on their own pages, possibly even with the author's permission. However, if they do not link to the original source, Google cannot tell which version should be given preference in the search results.
If the page is reachable both with and without www, this constitutes duplicate content. The search engine often recognizes, despite the different spellings, that it is the same page. The same goes for http and https. However, it is better not to rely on this, because sometimes the search engine simply does not recognize it.
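Rather than relying on Google's guesswork, you can redirect all variants to one canonical host yourself. A minimal sketch for an Apache .htaccess file, assuming https://www.example.com is the preferred version (mod_rewrite must be enabled):

```apache
RewriteEngine On

# Force https and the www host in one permanent (301) redirect
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

With this in place, http://example.com/page, http://www.example.com/page and https://example.com/page all end up at the single https://www.example.com/page.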
It is best to use only lowercase in the URL. Google also treats two URLs that differ only in capitalization as two different pages: www.example.com/Examplepage and www.example.com/examplepage.
URL parameters should generally be avoided with regard to duplicate content. The parameters in a URL are interchangeable in their order, that is: www.example.com/?cat=2&id=1 is the same page as www.example.com/?id=1&cat=2. For the search engine, however, these are two completely different pages.
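One way to see why parameter order matters, and how it can be neutralized, is to normalize the query string before comparing URLs. The following Python sketch (using the hypothetical example URLs from above) sorts the parameters so that both variants map to the same canonical form:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url: str) -> str:
    """Sort query parameters so that URLs differing only in
    parameter order map to the same canonical string."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), parts.fragment))

a = "http://www.example.com/?cat=2&id=1"
b = "http://www.example.com/?id=1&cat=2"

# The raw strings differ, but the normalized forms are identical.
print(a == b)                                # False
print(normalize_url(a) == normalize_url(b))  # True
```

This is essentially what a search engine has to do internally before it can decide whether two parameterized URLs point to the same page.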
If you intentionally publish the same content on multiple pages to rank better or generate more traffic, you risk a penalty from Google. Most DC, however, is created unintentionally. Google itself says: only those who use DC as a tool of manipulation need fear a punishment from Google.
To check whether a website is affected by duplicate content, use Google search itself. Copy a sentence that appears on the website into the search field. If you get more than one hit, duplicate content exists that the search engine has not yet filtered out. Once the search engine filters out duplicate content, it no longer appears in the SERPs. The following text from Google, which may appear in the SERPs, suggests duplicate content:
If this text is displayed, you should also look at the filtered-out results to reveal DC. Certain search operators are also helpful in Google search. The query site:example.de intitle:"Beispielseite" shows all pages of the domain www.example.de that contain the keyword "Beispielseite" in the title. Another possibility: in the Webmaster Tools, under Search Appearance > HTML Improvements, you will find duplicate title tags.
Internal duplicate content can also be detected very well with Screaming Frog. Here you can sort your crawl by the "Hash" column: duplicate content is displayed in Screaming Frog with an identical hash.
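The idea behind that hash column can be sketched in a few lines: compute a fingerprint of each page body and group URLs that share a fingerprint. A minimal Python sketch, where the crawl results are made-up placeholders:

```python
import hashlib
from collections import defaultdict

def content_hash(html: str) -> str:
    """Fingerprint a page body; identical content -> identical hash."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()

# Hypothetical crawl results: URL -> page body.
pages = {
    "http://www.example.com/page":        "<html>same content</html>",
    "http://www.example.com/page?sid=42": "<html>same content</html>",
    "http://www.example.com/other":       "<html>unique content</html>",
}

# Group URLs by their content hash.
groups = defaultdict(list)
for url, body in pages.items():
    groups[content_hash(body)].append(url)

# Any hash shared by more than one URL is duplicate content.
for h, urls in groups.items():
    if len(urls) > 1:
        print("duplicate:", urls)
```

Note that this exact-hash approach only catches byte-identical pages; near-duplicates with small differences need fuzzier comparison.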
If content is moved on a site, which often happens during a relaunch, you have to pay attention to redirects. For this you can use the status codes 301 and 302 in the .htaccess file. A 301 is intended for permanent redirects, passes on the link juice, and should therefore be preferred for permanent changes. A 302 is only recommended for temporary redirects because it does not pass on link juice. For a detailed guide, I recommend this SEO Trainee article.
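In an Apache .htaccess file, the two redirect types look like this (a minimal sketch with hypothetical paths; mod_alias must be enabled):

```apache
# Permanent move: passes on link juice (301)
Redirect 301 /old-article.html http://www.example.com/new-article.html

# Temporary move: does not pass on link juice (302)
Redirect 302 /summer-sale.html http://www.example.com/holding-page.html
```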
The canonical tag points to a preferred version of the URL displayed in the SERPs. It is therefore particularly useful in online shops that display identical products on several pages. The canonical tag can also be applied across domains, if another source is cited and you want to show Google who the author of the text is. The tag is placed in the head section of the HTML document: <link rel="canonical" href="http://www.example.com/canonical-version-this-page/" />
Internal linking should be consistent. If the canonical page is defined as http://www.example.com, always link to the URL version with http:// and www (e.g. to http://www.beispiel.de/beispiel.html and not to http://beispiel.de/seite.html).
To identify country-specific content, Google recommends using country-code top-level domains such as .de, .co.uk, or .fr.
Via the Webmaster Tools, you can tell Google how the domain should be indexed. In addition, you can use the parameter handling tool to tell Google how to deal with different URL parameters.
Even for products that are similar, individual product descriptions make sense.
Pages tagged with "noindex, follow" will not be indexed by Google, but the links on them will still be followed.
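The tag goes into the head section of the page in question:

```html
<meta name="robots" content="noindex, follow">
```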
If it is not possible to set the canonical tag (e.g. no access to the head section of the HTML document), you can always link to the original source instead.
If content for a page is missing, ideally the page should not be published at all. Alternatively, the "noindex" meta tag can be used.
Some content is published automatically by the CMS under different URLs. Bloggers know this: a blog post also appears on the homepage, in the archive, and on the page that collects all articles on the same keyword.