Duplicate Content: Brief Summary
Duplicate content is a term used to describe online content that appears in more than one place on the internet, usually defined as more than one URL or web page. For a piece of web content to be considered duplicate content, it must either be an exact match for another piece of content, or similar enough to another piece of content that it is treated as the same by search engines like Google and Bing.
The term is most commonly associated with the field of search engine optimisation, where duplicate content is considered poor practice. This is primarily because search engines do not want to display multiple versions of the same content to users who enter a search query. As a result, duplicate content is usually penalised by search engine algorithms, resulting in lower placement on search engine results pages, or in the content not being ranked at all.
Duplicate Content: Detailed Summary
When two pieces of content appear on separate web pages and are either identical or “appreciably similar” to one another, search engines like Google and Bing treat them as duplicate content. This is significant because search engines make a conscious effort to avoid displaying duplicate content to their users, and so have developed algorithms that penalise it when deciding their search engine results page rankings.
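The exact way search engines measure whether two pages are “appreciably similar” is proprietary, but one common technique for near-duplicate detection is to compare overlapping word n-grams (“shingles”) using Jaccard similarity. The sketch below is illustrative only; the sample texts are invented and real systems use far more sophisticated methods:

```python
# Minimal sketch of near-duplicate detection using word shingles and
# Jaccard similarity. Search engines' actual algorithms are not public;
# this only illustrates the general idea of text-overlap comparison.

def shingles(text, n=3):
    """Return the set of overlapping n-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a, b, n=3):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = "This page describes how duplicate content affects search rankings"
copy = "This page describes how duplicate content affects search rankings"
rewrite = "Search rankings can be affected when content is duplicated"

print(jaccard_similarity(original, copy))     # identical texts -> 1.0
print(jaccard_similarity(original, rewrite))  # reworded text -> 0.0 here
```

An identical copy scores 1.0, while a genuine rewrite shares few or no shingles and scores close to zero, which is why rewording (as recommended later in this article) is effective at avoiding duplicate-content detection.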
Duplicate content can be broadly divided into two different types: malicious duplicate content, and non-malicious duplicate content. Although the ethics involved in these two types of duplicate content are different, they are generally treated the same by search engine algorithms. With that being said, malicious duplicate content can also lead to additional problems, such as copyright violation and accusations of plagiarism.
An example of malicious duplicate content is content that has been intentionally replicated or plagiarised from another site. This includes content intentionally replicated across multiple websites by the same author, a practice sometimes referred to as ‘self-plagiarism’. Also included in the malicious category are various black hat SEO techniques, including search spam and content scraping.
Nevertheless, the vast majority of duplicate content is non-malicious in nature. One of the most prominent examples is web pages that are deliberately reproduced to serve different functions, such as when a page has a main version, a mobile-optimised version and a printer-friendly version. Duplicate content can also be an issue for sites that syndicate content with permission from the original author or publishing website.
Non-malicious duplicate content may also be produced when a web page is made accessible under multiple sub-domains, or when certain blocks of content are deliberately repeated on every page of a website. Good examples of the latter are a copyright notice at the bottom of the page, or contact details placed on every page. For this reason, any such repeated blocks of text should be kept short.
Finally, within the field of e-commerce, duplicate content is a common problem within product descriptions. This is because, in many cases, there will be multiple different e-commerce sites selling the same products. Often, they will use a product description given to them by the product manufacturer, but this can mean that potentially hundreds of different websites will have the exact same product description on them.
Problems With Duplicate Content
As stated, the primary issue with duplicate content is that it is treated with great suspicion by search engine algorithms. Search engines do not want their users to enter a search term and be presented with multiple versions of the same content, but their algorithms are not always good at deciding which versions to display, which to ignore, and whether the duplication is malicious or non-malicious in nature.
To avoid presenting users with multiple versions of the same content, search engines will typically penalise duplicate content in search engine results pages, either by not ranking it at all, or by placing it lower down, meaning it may not appear until several pages into the results. A major issue for website owners and content creators is that content may be penalised even if it was the original version and has since been replicated elsewhere.
For content creators, duplicate content also poses several additional issues. For instance, the benefit of inbound links to content may be diminished, because those links will be divided among the various duplicates. Moreover, in the most serious cases, such as those where plagiarism is alleged, duplicate content can result in take-down notices being issued, or legal action being pursued by the original creator, even if the plagiarism was unintentional.
Ways to Reduce Duplicate Content
In order to avoid duplicate content, content creators should focus on producing original content as much as possible, and avoid replicating pages multiple times on the same website. Even where content is partially inspired by another source, the information should be carefully rewritten to avoid accidental duplication or plagiarism. A plagiarism detector can also help to catch instances of accidental plagiarism.
Of course, most duplicate content is non-malicious in nature, but there are still steps that can be taken to prevent it from causing problems. For example, where web content is accessible via more than one URL, search engines will treat it as duplicate content and select one URL as the main or ‘canonical’ version. However, the search engine may not choose the best option, so site owners can specify their preferred page by including a canonical tag.
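In practice, the canonical tag is a link element placed in the page head of each duplicate version. The URLs below are hypothetical placeholders used for illustration:

```html
<!-- Placed in the <head> of every duplicate version of the page,
     e.g. https://www.example.com/article?ref=newsletter -->
<link rel="canonical" href="https://www.example.com/article" />
```

With this tag present, search engines are told to consolidate ranking signals onto the specified URL rather than guessing which version is the original.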
Additionally, search engines can be told not to include specific pages in search results through a noindex tag, while a robots.txt file can be used to stop them from crawling specific pages altogether (although blocking crawling is not the same as blocking indexing). If the duplicate content exists because different versions of the website serve different parts of the world, an hreflang tag can be used to mark the pages as regional variants.
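These directives are simple to add in practice. The snippets below are illustrative sketches; the domains and paths are placeholders, not real recommendations for any particular site:

```html
<!-- In the <head> of a page that should not appear in search results,
     such as a printer-friendly duplicate: -->
<meta name="robots" content="noindex" />

<!-- hreflang annotations marking pages as regional variants of the
     same content (example.com is a placeholder domain): -->
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/" />
<link rel="alternate" hreflang="en-us" href="https://example.com/us/" />
```

```
# robots.txt at the site root: stops compliant crawlers from fetching
# a path at all (note this blocks crawling, not indexing)
User-agent: *
Disallow: /print/
```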
For users who allow their content to be syndicated on other websites, it is recommended that they insist the syndicating website links back to the original source, which can help Google to identify it as the origin point. It may also be sensible to ask that the website includes a noindex tag, which will prevent the syndicated version from appearing in search engine results pages, but this may not always be possible.
Finally, even though it can be time-consuming and potentially expensive, it is best for those operating in the field of e-commerce to write their own original product descriptions. While these descriptions can effectively be re-writes of the official descriptions supplied by manufacturers, the re-wording can provide a significant SEO advantage over the many other e-commerce sites selling the same product with the official description.
Duplicate content is a term primarily used within the field of SEO to refer to content that appears on more than one URL. It can be the result of either malicious or non-malicious practices, but will usually be penalised by search engines regardless, as they do not want to present users with multiple versions of the same content. For this reason, it is important to take steps to avoid duplicate content, or use tags to instruct search engines to ignore it.