Duplicate content
Duplicate content is identical or almost identical content that is placed on two or more different URLs and is indexed by search engines on all of them. There are many ways to create duplicate content on your website. Some of them are:
- If you copy an article from your website or another website and post it again on your website (bad duplicate content).
- If you have a printer-friendly page of each of your articles.
- If you have a query in the URL that doesn’t change the content of the page.
- If you have multiple category pages with the same article.
- If you have tag pages with the same article.
- If your search component links to articles with different URLs than the article pages use.
- If your website generates pages for multiple platforms (e.g. a mobile and a desktop version).
- If you can access your site both with and without www in your domain.
- If you can access your front page both on http://www.domain.com and http://www.domain.com/index.php.
- If you link to a page both with and without / after the URL (http://www.domain.com/page and http://www.domain.com/page/).
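Some of the causes above can be spotted by comparing the URLs on your site. As a rough sketch in Python, assuming that the parameter names below (they are only hypothetical examples) don't change the content of the page on your site, you could group URLs that differ only by such a query:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names assumed not to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "sessionid"}


def strip_ignored_params(url):
    """Return the URL without the query parameters listed in IGNORED_PARAMS."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def find_duplicates(urls):
    """Group URLs that point to the same page once ignored parameters are removed."""
    groups = {}
    for url in urls:
        groups.setdefault(strip_ignored_params(url), []).append(url)
    # Only groups with more than one URL are potential duplicate content.
    return {page: found for page, found in groups.items() if len(found) > 1}
```

Which parameters are safe to ignore depends on your own website, so the set above has to be adjusted before the result means anything.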
So there is both bad duplicate content and technical duplicate content, and you should avoid both. You avoid bad duplicate content by not copying content and by not publishing the same content on multiple sites. You avoid technical duplicate content by setting up your website to tell search engines not to index the duplicate pages.
The purpose of bad duplicate content is often to manipulate the search engine rank. Google fights this a lot, because it gives the users of Google a bad search experience, so it is not a good idea to have identical content on several URLs. If you don’t do anything about duplicate content, your site can get a bad search engine rank or disappear completely from the search result pages.
301 redirect – redirect to correct page
The solution to most causes of duplicate content is simply to permanently redirect the duplicate URL to the correct one. This is called a 301 redirect, and on an Apache server it is done in a .htaccess file located in the root folder of the domain (the FTP root).
- Redirect e.g. http://domain.com to http://www.domain.com
- Redirect http://www.domain.com/index.php and http://domain.com/index.php to http://www.domain.com
- Redirect http://www.domain.com/page/ to http://www.domain.com/page
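To see which URL each of these three redirects should end on, the rules can be sketched as a small Python function. This is only an illustration of the rules in the list, not something the web server runs, and the function name canonical_url is made up for the example:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return the URL that the three redirect rules in the list lead to."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Rule 1: add www when it is missing.
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path
    # Rule 2: /index.php is the same page as the front page.
    if path == "/index.php":
        path = "/"
    # Rule 3: remove the trailing slash, except on the front page itself.
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    if path == "":
        path = "/"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
```

The front page is returned as http://www.domain.com/ with a slash, which is the same address as http://www.domain.com.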
An example of a 301 redirection in the .htaccess file is:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
This 301 redirect sends every URL that has been entered without www to the same URL with www. The same happens when a search engine bot comes by: it is redirected to the other URL, and therefore only one URL is indexed.
Nofollow and noindex
If you have a page you don’t want indexed in Google, e.g. the printer-friendly version of a page, you can tell Google not to index the page or follow its links. If you set the page to follow but noindex, the search engine will read the page and follow its links, but not put the page in the index.
An example of this is:
<meta name="robots" content="noindex, nofollow">
You just put this meta tag in the HEAD section of your page. Then Google will neither index the page nor follow its links.
If you want the search engines to follow but not index, just write "noindex, follow". If you want them to index but not follow, simply write "index, nofollow".
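If you want to check what a page actually tells the search engines, a small script can read the robots meta tag for you. The following is a minimal sketch in Python using only the standard library. The names RobotsMetaParser and robots_policy are made up for this example, and it assumes that a page without a robots meta tag may be both indexed and followed:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_policy(html):
    """Return whether the page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }
```

For a printer-friendly page set to "noindex, follow", robots_policy should report that the page is not indexed but its links are followed.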
You can also set nofollow on a single link by adding rel="nofollow" to the <A> tag. This only tells the search engines not to follow that one link; noindex cannot be set on a link.
<a href="site.html" rel="nofollow">Link</a>
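To get an overview of which links on a page carry nofollow, you could list them with a short script. This is a minimal sketch in Python with the standard library. LinkCollector and split_links are names invented for the example, and only <a> tags with an href are looked at:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sort the links on a page into followed and nofollowed ones."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollowed) lists of link targets found in the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.followed, collector.nofollowed
```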
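The last method is to use the file robots.txt, which is placed in the root folder (if it is not there, you can create it). Note that robots.txt does not take nofollow or noindex settings; instead it uses Disallow rules to tell search engine bots which URLs they should not crawl.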
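A robots.txt file is made of User-agent and Disallow lines, and you can test your rules before uploading them. The sketch below uses Python's standard urllib.robotparser module. The folder /print/ for printer-friendly pages is only a hypothetical example, and is_crawlable is a helper name made up here:

```python
from urllib.robotparser import RobotFileParser

# Example rules: every bot is asked to stay out of the hypothetical /print/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules allow the bot to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, a URL under /print/ should be reported as blocked, while the normal article URL stays crawlable.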