One of the basic tenets of SEO is the handling of non-canonical URLs. Countless posts have been written on the topic, and it’s discussed at probably every Internet marketing conference. Yet when it comes to analytics, my experience has been that little to no attention goes to keeping content reports clean. As a result, many sites’ content reports are mishandled, and skewed data is passed up the food chain.
First things first. What’s a canonical URL? Simply put, canonical URLs are the URLs you want search engines and visitors to your site to actually discover. The search engines have made it increasingly easy for sites to tell them which URLs are canonical so that non-canonical URLs don’t sabotage a site’s money pages.
A typical scenario looks something like this: A client pulls its top 10 landing pages for analysis and looks at metrics like bounce rate, conversion rate, and revenue. Because of rogue query parameters, each of those pages may have multiple duplicates, yet none of those duplicates are factored in because no one knows they exist.
The Primary Culprit
When non-canonical URLs wind up in content reports (Behavior > Site Content), measuring the effectiveness of your most important pages can range from difficult to downright impossible. Query parameters (e.g., http://www.mysite.com/widgets?sort=asc&color=blue&sid=153678) can create many duplicates of a single page. On one large ecommerce site that used a gaggle of query parameters, I saw a single page divided into more than a hundred rows.
Typically, the more query parameters a site uses, the more permutations of parameters a URL can take on, and the potential for duplication increases significantly.
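To get a feel for how quickly the duplicates multiply, here is a minimal Python sketch. It assumes each parameter is optional and always appears in the same order; the path and the parameter values are made up for illustration, and a real site may produce even more variants (reordered parameters, session IDs, and so on).

```python
from itertools import product
from urllib.parse import urlencode


def url_variants(path, param_values):
    """Return every URL formed by combining optional query parameters.

    param_values maps a parameter name to the values it can take.
    Each parameter may also be absent, which adds one more choice.
    """
    names = list(param_values)
    # None stands for "this parameter is not in the URL".
    choices = [[None] + list(param_values[name]) for name in names]
    variants = set()
    for combo in product(*choices):
        pairs = [(n, v) for n, v in zip(names, combo) if v is not None]
        query = urlencode(pairs)
        variants.add(path + ("?" + query if query else ""))
    return sorted(variants)


# Hypothetical parameters: 2 sort orders and 3 colors.
variants = url_variants(
    "/widgets",
    {"sort": ["asc", "desc"], "color": ["red", "blue", "lime"]},
)
print(len(variants))  # (2 + 1) * (3 + 1) = 12 rows for one page
```

Two modest parameters already turn one page into a dozen rows; a per-visitor parameter such as a session ID makes the count effectively unbounded.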
Google Analytics has a little-known setting that allows you to dictate which parameters you want to exclude. The key litmus test to determine if a parameter should be excluded is this: Does the parameter determine unique content?
For example, if your site uses the default WordPress permalink structure, which relies on the p query parameter to identify pages, you wouldn’t want to exclude the p parameter because the site uses it to determine unique content. In other words, http://www.mysite.com/?p=123 is a different page from http://www.mysite.com/?p=179.
However, query parameters that merely rearrange, filter, or manipulate the information on a page (such as a sort option or size filter for a retailer) should be excluded from content reports. Excluding query parameters won’t filter out visits to these pages the way view (previously known as profile) filters do. Instead, it simply consolidates pages by removing the query parameters from their URLs.
So, for example, if you set your view to exclude the parameters sort and color, /jackets/?color=red&sort=asc, /jackets/?color=lime&sort=desc, and /jackets/?color=red&sort=desc would all become /jackets/, and all of the data for these pages would be consolidated into one line item.
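The consolidation can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google Analytics implements it; the pageview numbers are invented, and the function names are my own.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, excluded):
    """Remove the excluded query parameters from a URL, keeping the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def consolidate(rows, excluded):
    """Sum pageviews per page after the excluded parameters are removed."""
    totals = Counter()
    for url, pageviews in rows:
        totals[strip_params(url, excluded)] += pageviews
    return dict(totals)


# Hypothetical report rows: (page, pageviews).
rows = [
    ("/jackets/?color=red&sort=asc", 40),
    ("/jackets/?color=lime&sort=desc", 25),
    ("/jackets/?color=red&sort=desc", 35),
    ("/?p=123", 10),
]
print(consolidate(rows, {"sort", "color"}))
# {'/jackets/': 100, '/?p=123': 10}
```

Note that the p parameter survives because it isn’t in the excluded set, so pages that depend on it stay separate.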
How To Find Your Site’s Query Parameters
There are two different ways to find your site’s query parameters:
Google Webmaster Tools’ URL Parameters Report
The URL Parameters report (under Crawl) contains a list of all the query parameters Googlebot found while crawling the site. It’s a great place to start. Go through that list and apply the same litmus test you would use to determine whether a page is the canonical URL.
Google Analytics’ Line Item Filter
If you pull up the All Pages report (Behavior > Site Content) and use the filter above the report to search for an equal sign, you’ll get a list of pages that contain query parameters.
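If you export that filtered list, a short script can tally which parameter names actually appear. The sketch below assumes you have the page paths as plain strings; the sample paths are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_counts(pages):
    """Count how many pages each query parameter name appears on."""
    counts = Counter()
    for page in pages:
        query = urlsplit(page).query
        # Use a set so a parameter repeated in one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical page paths exported from the All Pages report.
pages = [
    "/jackets/?color=red&sort=asc",
    "/jackets/?color=lime",
    "/?p=123",
    "/about/",
]
for name, count in parameter_counts(pages).most_common():
    print(name, count)
```

Each name in the tally is a candidate for the litmus test: does it determine unique content or not?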
To hide the query parameters you’ve already identified so the remaining ones are easier to spot, take these steps.
Step 1. Click the Advanced link to the right of the filter box.
Step 2. Click the Add a dimension or metric button below the filter you already have. Select Page as the dimension, and set the drop-down to its left to Exclude.
Step 3. Set the drop-down to the right of the dimension drop-down to Matching RegExp, then enter the parameters you’ve already identified, separated by pipe characters (Shift plus the backslash key).
Step 4. Click the Apply button and analyze away.
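The same include-then-exclude logic can be mimicked offline. Here is a rough Python equivalent, assuming a list of exported page paths; it builds the pipe-separated pattern from the parameters you already know about and keeps only pages that contain an equal sign and don’t match it. The sample paths are made up.

```python
import re


def remaining_pages(pages, known_params):
    """Return pages with query parameters that match none of the known ones."""
    with_params = [page for page in pages if "=" in page]
    if not known_params:
        # An empty pattern would match everything, so skip the exclude step.
        return with_params
    # Same idea as the report filter: parameter names joined by pipes.
    pattern = re.compile("|".join(re.escape(p) for p in known_params))
    return [page for page in with_params if not pattern.search(page)]


# Hypothetical page paths.
pages = [
    "/jackets/?color=red&sort=asc",
    "/jackets/?sid=153678",
    "/?p=123",
    "/about/",
]
print(remaining_pages(pages, ["sort", "color"]))
# ['/jackets/?sid=153678', '/?p=123']
```

One caveat with a bare pattern like sort|color is that it matches those strings anywhere in the URL, including the path. If that causes false matches, a stricter pattern such as [?&](sort|color)= is one option.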
Set Up Exclude
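When you have your list of parameters you want to exclude, simply drop them in the Exclude URL Query Parameters box in your view’s settings, separated by commas. Using the parameters from the examples above, yours would look something like this: sort,color,sid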
You can learn how to clean up more than just your content reports in Google Analytics with my Analytics Audit Template, a self-guided, 147-page audit template that is regularly updated and will teach you how to do detailed analytics audits like a pro.