Problem: I want to see just my short-tail keywords but don’t know how to isolate them in my reports.
Solution: Create an advanced segment using regular expressions to identify these keywords.
This post is actually inspired by a question from one of my SEO friends on Twitter, Rob Woods. So I toyed and fiddled in Google Analytics until I figured it out. Truth be told, I relish any opportunity to play with regular expressions (AKA regex), so it was a fun challenge.
First, let me say, if you aren’t familiar with how to use regular expressions, I wrote a blog post that breaks down regular expressions in Google Analytics in simple terms on the BlueGlass blog. It’s not as scary as it sounds. Promise.
One-Word Keywords
Let’s say you want to see only those searches that led to visits to your site with just one keyword. You could identify all of the single-word keywords used to find your site with the following expression:
^[a-zA-Z0-9]*$
As scary as the expression may look, it’s actually quite simple. It just means the keyword is made up entirely of alphanumeric characters, however many there are, with no spaces or punctuation in between. (Strictly speaking, the * means “zero or more,” so swap it for a + if you want to require at least one character.) Using the ^ and $ regex bookends ensures that you keep the junk out of your trunk.
But it’s clunky and unrefined, like wearing high tops with a suit. So let’s replace the [a-zA-Z0-9] with a \w. The \w regex expression matches any word character: letters, digits, and underscores. So now your expression would look like this:
^\w*$
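If you’d like to sanity-check the expression before trusting it in a report, here is a minimal Python sketch that runs it against a few made-up keywords. Python’s re module is my assumption here, not something Google Analytics uses; the two regex engines may disagree on edge cases such as accented characters, so treat this as a rough test bench only.

```python
import re

# One-word pattern, written with its backslash: \w is any word character
# (letter, digit, or underscore).
ONE_WORD = re.compile(r"^\w*$")

# Made-up sample keywords, for illustration only.
keywords = ["analytics", "google analytics", "seo_tips", "long-tail", "web 2.0"]

# Keep only the keywords that consist of word characters and nothing else.
one_word = [kw for kw in keywords if ONE_WORD.search(kw)]
print(one_word)
```

Note that seo_tips gets through because \w includes underscores, while long-tail does not because of the hyphen.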
Two-Word Keywords
Now let’s try that with two keywords. We’ll need another new regex expression to match the spaces between keywords. The \s will do the trick because it matches any whitespace character, including spaces, tabs, and line breaks. So our expression becomes:
^\w*\s\w*$
Easy, right? Okay, let’s get a little more advanced.
Customize the Number of Keywords
Let’s say you want to isolate searches that came to the site with one or two keywords. There are, of course, several different regex expressions you could use, which is typical with regular expressions. But I think the simplest would be to write it this way:
^\w*(\s\w*)?$
To review, the ? means that the preceding character(s) are optional, and the parentheses group characters just like you learned in algebra. So this expression says that there’s at least one word, but there could be a whitespace and another word that follow.
If you wanted to have between two and four words, it would look like this:
^\w*\s\w*(\s\w*)?(\s\w*)?$
This says there are definitely two words with a space in between, but there may be one or two more space/word combinations as well.
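Since every range follows the same recipe (the required words first, then one optional space/word group for each extra word), you could generate the expression instead of typing it out. Here is a rough Python sketch; the word_count_pattern helper is a hypothetical name of my own, and it assumes the pattern should end with the $ bookend so longer keywords don’t sneak in.

```python
import re

def word_count_pattern(min_words, max_words):
    """Build a regex for keywords with min_words to max_words words.

    Hypothetical helper for illustration; not part of Google Analytics.
    """
    if not 1 <= min_words <= max_words:
        raise ValueError("need 1 <= min_words <= max_words")
    # Required part: the first word, then a space/word pair per extra required word.
    required = r"\w*" + r"\s\w*" * (min_words - 1)
    # Optional part: one optional space/word group per allowed extra word.
    optional = r"(\s\w*)?" * (max_words - min_words)
    return "^" + required + optional + "$"

pattern = word_count_pattern(2, 4)
print(pattern)

# Made-up keywords to try the pattern on.
for kw in ["seo", "seo tips", "how to use regular expressions"]:
    print(kw, "->", bool(re.search(pattern, kw)))
```

You can paste the printed pattern straight into your segment. Only the middle keyword matches, because it is the only one with between two and four words.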
Create an Advanced Segment
Now it’s time to apply this expression to an advanced segment. If you’re not using advanced segments in Google Analytics, you are missing out. Google’s Conversion University did this five-minute video on how to create advanced segments in Google Analytics.
So let’s say you want to isolate the organic searches that led visitors to your site that include between one and three keywords. First of all, here’s the regex you’ll use to capture that range of keywords:
^\w*(\s\w*)?(\s\w*)?$
Now we’re going to put that in an advanced segment, like this:
[Screenshot: the advanced segment settings]
This just says that this segment will only match visits that come from organic and match our regex.
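To make the logic of the segment concrete, here is a small Python sketch of the same two conditions combined with AND. The visit rows and their field names (medium, keyword) are made up for illustration and are not a real Google Analytics export format.

```python
import re

# One to three words, using the expression from this section.
SHORT_TAIL = re.compile(r"^\w*(\s\w*)?(\s\w*)?$")

# Hypothetical visit rows; the field names are illustrative only.
visits = [
    {"medium": "organic", "keyword": "regex"},
    {"medium": "organic", "keyword": "google analytics regex"},
    {"medium": "organic", "keyword": "how to use advanced segments"},
    {"medium": "cpc", "keyword": "analytics"},
]

def in_segment(visit):
    """True when the visit is organic AND its keyword matches the regex."""
    is_organic = visit["medium"] == "organic"
    is_short_tail = SHORT_TAIL.search(visit["keyword"]) is not None
    return is_organic and is_short_tail

segment = [v["keyword"] for v in visits if in_segment(v)]
print(segment)
```

The paid visit drops out on the first condition and the five-word search drops out on the second, which is the behavior you want from the segment.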
Caveat
The \w expression will not match hyphens. So if you want to capture searches that include hyphens (or any other character that’s not a letter, number, or underscore), you will need to account for them in your regex. To include search terms with hyphens, you could use this:
^(\w|-)*(\s(\w|-)*)?(\s(\w|-)*)?$
All I did was replace each instance of \w with \w|- … which just means either a word character or a hyphen. And I grouped them with parentheses.
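Here is a quick Python sketch comparing the plain expression with the hyphen-friendly one on a few made-up keywords. It also tries a character class, [\w-], which is an alternative spelling I’m adding for comparison; it should behave the same as (\w|-) in most regex engines, but I haven’t confirmed how Google Analytics treats it, so test before relying on it.

```python
import re

WORDS_ONLY = re.compile(r"^\w*(\s\w*)?(\s\w*)?$")
WITH_HYPHENS = re.compile(r"^(\w|-)*(\s(\w|-)*)?(\s(\w|-)*)?$")
# Alternative spelling (an assumption on my part): a character class.
CHAR_CLASS = re.compile(r"^[\w-]*(\s[\w-]*)?(\s[\w-]*)?$")

# Made-up sample keywords, for illustration only.
keywords = [
    "long-tail keywords",
    "short tail",
    "e-commerce seo tips",
    "what is long-tail seo",
]

def matches(pattern):
    """Return the sample keywords that the given compiled pattern matches."""
    return [kw for kw in keywords if pattern.search(kw)]

print("words only:  ", matches(WORDS_ONLY))
print("with hyphens:", matches(WITH_HYPHENS))
print("char class:  ", matches(CHAR_CLASS))
```

The plain expression only keeps the keyword without a hyphen, while both hyphen-aware versions keep everything up to three words long.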
Your Turn
So there ya go. Start segmenting your search terms to see what competitive phrases you’re getting traffic for. An interesting data dive would be to see if Google Instant has impacted the number of short-tail searches your site has gotten. It was unleashed Sep 8, 2010, so you can look at your before/after by creating this segment and see how many more (or fewer) searches you’ve gotten.
Andres says
Hi Annie, I just saw you at Search Love, your talk was great. Hey now that Blueglass is gone your post on RegEx (linked to in this post) is not available. Maybe you can repost it on this site? I would love to read it. Thanks!