The term ‘canonicalization’ scares off a lot of people because the methods of fixing it look very ‘techie’. As a result, numerous websites are affected by it and are consequently experiencing poor rankings in the SERPs.
What is Canonicalization?
Canonicalization issues arise when the same website can be accessed from different URLs, e.g. http://your-url.com and also http://www.your-url.com.
In a nutshell, this means there is a duplicate content issue, a matter very much under scrutiny as a result of Google’s 2011 algorithm changes, particularly ‘Farmer’ in April of this year and ‘Fresh’ just a few weeks ago.
This can be checked quite simply by entering your website URL as a Google search and observing whether the SERP brings up more than one version. If it does, that is a strong indication that you have a canonicalization problem.
The negative effects of this problem include a dilution of backlinks: a certain way to lose rank, which is the last thing an e-commerce SEO campaign is put in place to achieve.
The First Way
Resolving canonicalization will not only allow you to choose the URL you would like shown for a specific piece of content, it will also prevent multiple URLs showing for the same content. There are two recognized ways to overcome the problem. The first is using the rel="canonical" tag on each webpage, e.g.
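A minimal sketch of the tag, placed in the <head> of the duplicate page (here http://www.example.co.uk/preferred-page is a stand-in for your own preferred URL):

<link rel="canonical" href="http://www.example.co.uk/preferred-page" />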
Google are very amenable to the resolution of canonical issues and since 2009 have accepted a particular and simple way of using the rel="canonical" link tag to identify outdated pages. Using the HTML format shown above, add the tag to the redundant page (in this example, /outdated-page-1) so that it points at your preferred URL:

<link rel="canonical" href="http://www.example.co.uk/preferred-page" />
This informs the search engines that the content on that specific page refers to the specified canonical page, which will be your preferred URL. Canonical links can also be used across domains when required. At this point, although this is starting to sound technical, it is really quite simple, and there are numerous websites offering step-by-step advice to real technophobes.
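To sketch the cross-domain case (both domains below are stand-ins, not taken from this article), the tag in the <head> of a page on the old domain simply points at the equivalent page on the preferred domain:

<link rel="canonical" href="http://www.example.co.uk/page" />

Here the tag would sit on http://old-domain.co.uk/page, telling the search engines to treat the example.co.uk version as canonical.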
The Second Way
The second method, a 301 redirect, simply indicates to both website visitors and search engine spiders that something, whether a page or specific content, has been permanently moved to a different location. Users are redirected automatically to the updated domain or page, and the search engines’ indexes will update in due course, allowing links to be credited to the new location.
The directive for this, entered in your server configuration (an Apache .htaccess file, for example), would be in the following format, replacing with your own website information where appropriate:
Redirect 301 /oldpage.html http://www.example.co.uk/newpage.html
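For the www/non-www duplication this article opened with, a common sketch (assuming Apache with mod_rewrite enabled, and your-url.com standing in for your own domain) redirects every non-www request to the www version:

# send non-www requests to the www version with a permanent redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^your-url\.com$ [NC]
RewriteRule ^(.*)$ http://www.your-url.com/$1 [R=301,L]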
This method is suitable for static web pages, but for those driven by database applications (a blog, for example) the file name will need to be appended with an appropriate query string, shown after the URL, e.g.
http://www.example.co.uk/page.php?id=2
301 redirects involving query strings are a little more complex to create, but once again there are many tutorials and forums that can help.
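As a rough sketch (Apache mod_rewrite assumed; page.php?id=2 and newpage.html are stand-ins): the plain Redirect directive cannot match query strings, so the condition has to test the string explicitly:

RewriteEngine On
# match the query string, since Redirect alone cannot
RewriteCond %{QUERY_STRING} ^id=2$
# the trailing ? drops the old query string from the target URL
RewriteRule ^page\.php$ http://www.example.co.uk/newpage.html? [R=301,L]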
Robots Text Files
On a similar subject, but with a different result, is robots.txt, a plain text file placed in the root directory of your website. Its function is to ‘disallow’ search engine spiders from entering and crawling the disallowed content. The correct name for this is the Robots Exclusion Protocol.
This text file is generally used to prevent search engine access to non-visible items such as a personal images folder, website administration folders and null-value folders like the cgi-bin, none of which needs to be indexed.
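A minimal robots.txt along those lines might look like this (the folder names below are placeholders, not taken from this article):

# applies to all crawlers
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /personal-images/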
Any WordPress-powered site, including those running e-commerce, is covered. Canonical URL support has been available since before WordPress 3.0. You simply type the URL you want… with or without the www in front of your domain name.
I think a lot of people forget what duplicate content really is. Google IS smart enough now to recognise your site whether it is listed with or without the www, and they will list only one. Even Matt Cutts talked about this before the Panda updates. It is when that same content shows up on other websites that it becomes a big problem.
Thanks for the interesting tips!
I agree that duplicate content is a problem for many websites. Is there any other method to check if I have duplicate content?