Annielytics.com

I make data sexy


Feb 05 2021

Regex for Marketers: Plain English, Real World Examples [VIDEO]

regex for marketers, analysts, and coders
You will know the answer to this by the time you get to the end of this video.

I’ll admit it: I used to be a regexaphobe. When I was new to analytics, I remember someone sending me a snippet of regular expressions (AKA regex) to solve a goal setup conundrum I was working through. It looked like a foreign language to me. I was fascinated by it but repelled at the same time. #itscomplicatedk?

Sadly, my fear of regex kept me from doing more powerful analysis. I tried everything to avoid it and, whenever I had to create a custom filter, would copy and paste code from articles I’d saved. But eventually I hit a wall I couldn’t scale unless I conquered this beast. So, like Yukon Cornelius, Rudolph, and Hermey, I set out on a quest to learn it.

Regex is a beast with no teeth (unless you screw it up).

Why This Post

As nerdy as regex is, I’m writing this post because it will broaden your capacity as a marketer to do more sophisticated analysis in tools like Google Analytics, Google Docs, Google Spreadsheets, Tableau, Screaming Frog, SQL, etc. Basically any tool that uses filters. I will be creating videos to demonstrate practical tasks you might need to carry out in each of these tools with the aid of regex. It’s gonna be dope.

using regular expressions in marketing tools
Me thinking about empowering marketers to amp up their filters with regex

So I’m going to hit on the main ones you’ll need, while explaining the geek speak in simple terms. I will even subject myself to public scorn by sharing the goofy mnemonic devices I used early on to remember a few of them I just couldn’t seem to get down.

I’ll go through the regex characters you’ll use most, in the same order I cover them in my video. There are quite a few regex testers on the market; when creating the video, I used regexr.com. It was a little buggy a couple of times, so if your pattern isn’t matching and you’re sure your regex is on point, try refreshing the page.

I’ll also include the lists I used in my demos so that you can follow along, if you’re so inclined.

Video

Regex Lineup

Pipe (|)

What It Does

The pipe character is the regex equivalent to or. It’s the Swiss army knife of all the regex characters.

Follow Along

In the video (00:34 min mark), I use a list of countries, which you can download here, and use regex to filter it down to just EU countries. The regex I used:

Austria|Belgium|Bulgaria|Croatia|Republic of Cyprus|Czech Republic|Denmark|Estonia|Finland|France|Germany|Greece|Hungary|Ireland|Italy|Latvia|Lithuania|Luxembourg|Malta|Netherlands|Poland|Portugal|Romania|Slovakia|Slovenia|Spain|Sweden

Caveat

One thing you need to be careful with when using the pipe character in a long list like this is, if you tack a pipe character onto the end of your list, you will select everything. You’re basically saying, “or whatever.”
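
To see that caveat in action, here’s a minimal sketch using Python’s re module (my choice for illustration; the video uses regexr.com). The five-country list and the variable names are made up for the demo.

```python
import re

# Hypothetical mini list standing in for the downloadable country list.
countries = ["France", "Brazil", "Germany", "Japan", "Spain"]

# A shortened version of the EU pattern: each | means "or".
eu_pattern = re.compile(r"France|Germany|Spain")
eu_only = [c for c in countries if eu_pattern.search(c)]

# Same pattern with a stray pipe on the end. The empty alternative
# after the last | matches the empty string, so every row passes.
oops_pattern = re.compile(r"France|Germany|Spain|")
oops = [c for c in countries if oops_pattern.search(c)]

print(eu_only)  # ['France', 'Germany', 'Spain']
print(oops)     # all five countries
```

The trailing pipe doesn’t throw an error, which is exactly why it’s easy to miss.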

Dot (.)

What It Does

The . metacharacter is a wildcard character. It means match any one character. It can be a number, letter, or special character (even a white space). By itself, it’s not that amazing, but with the help of its frequent companions, the asterisk (*) and plus (+) characters, it’s pretty bad to the bone.

Follow Along

In the video (4:00 min mark), I use a list of kindergarten words to play with the dot character. Feel free to play along:

cat

cap

cot

bat

cut

dab

but

Caveat

If you’re a marketer, you’ll be using dots quite a bit as themselves, so you’ll need to escape them (i.e., drop a backslash in front of each one). That said, most of the time, if you’re matching a list of URLs, your regex will most likely work even if you forget to escape the dot because how many other characters are you going to see before your top level domain (e.g., .com, .edu, .gov)?
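
If you’d like to try the dot outside of regexr.com, here’s a small Python sketch using the kindergarten list. The two hostnames at the bottom are hypothetical; they’re only there to show the rare case where an unescaped dot lets something extra through.

```python
import re

words = ["cat", "cap", "cot", "bat", "cut", "dab", "but"]

# c.t : a "c", then any one character, then a "t".
c_dot_t = [w for w in words if re.fullmatch(r"c.t", w)]
print(c_dot_t)  # ['cat', 'cot', 'cut']

# Hypothetical hostnames to show why escaping the dot matters.
hosts = ["annielytics.com", "annielytics-com.example.org"]

unescaped = [h for h in hosts if re.search(r"annielytics.com", h)]
escaped = [h for h in hosts if re.search(r"annielytics\.com", h)]

print(unescaped)  # both hosts: the bare dot also matches the hyphen
print(escaped)    # ['annielytics.com']
```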

Asterisk (*)

What It Does

The asterisk says to match 0 or more of the character that comes right before it. In other words, it looks at the character before it (most often the . character) and says that character may be missing entirely, show up once, or repeat an unlimited number of times.

Follow Along

In the video (4:00 min mark), I address the asterisk and plus characters in quick succession after the dot character. See the Dot section for the list of words I used in my demo. 👆

Caveat

The .* combo meal is expensive. I treat it as an option of last resort. I demonstrate in the video how I’ll most commonly use it within some pretty tight parameters when I walk through how to capture the misspellings of Britney Spears (34:21 min mark).

Heads-Up

I wasn’t supposed to cover the * character when I did but went off script. Then I forgot I did that and introduced it [again] at the 7:07 min mark. #50firstdates

regex for marketers
At least I don’t have to worry about you forgetting what the asterisk character does. 🤦‍♀️

Plus Sign (+)

What It Does

The + means one or more of the previous character. So it’s a lot like the asterisk, except it requires that at least one character matches. IOW, the previous character is mandatory. I use this all through the video tutorial.

Follow Along

In the video (4:00 min mark), I address the plus and asterisk characters in quick succession after the dot character. See the Dot section for the list of words I used in my demo. 👆
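
Here’s a quick side-by-side of the asterisk and the plus in Python. The test strings are made up (the kindergarten list has no word that separates the two), so treat this as a sketch rather than what’s shown in the video.

```python
import re

# Hypothetical test strings (not from the video).
samples = ["ct", "cat", "caaat", "cot"]

# ca*t : zero or more "a"s between the c and the t.
star = [s for s in samples if re.fullmatch(r"ca*t", s)]

# ca+t : one or more "a"s, so "ct" no longer qualifies.
plus = [s for s in samples if re.fullmatch(r"ca+t", s)]

print(star)  # ['ct', 'cat', 'caaat']
print(plus)  # ['cat', 'caaat']
```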

Square Brackets ([ ])

What It Does

This means match any one of the characters between the brackets. So, c[aou]p would match cap, cop, and cup. But you can only pick one; that’s the key to the brackets. You can throw in a dash to indicate a range of characters to choose from. For example, [0-5] would mean you could pick any one digit between 0 and 5, and [x-z] will match x, y, z.

Follow Along

In the video (6:18 min mark), I use the list of words under the Dot section to demo square brackets. But we’ll use them several times throughout the video. When you’re choosing between single characters, they accomplish the same thing as the pipe character (c[aou]t does the same job as cat|cot|cut), but I find brackets easier to read than a bunch of pipe characters.
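
A minimal Python sketch of square brackets against the Dot section’s word list. The [b-d] range is my own example, added to show the dash at work.

```python
import re

words = ["cat", "cap", "cot", "bat", "cut", "dab", "but"]

# [aou] picks exactly one character from the set.
bracket = [w for w in words if re.fullmatch(r"c[aou]t", w)]

# The pipe version of the same filter.
piped = [w for w in words if re.fullmatch(r"cat|cot|cut", w)]

# A dash inside the brackets makes a range: any one letter from b to d.
ranged = [w for w in words if re.fullmatch(r"[b-d]at", w)]

print(bracket)  # ['cat', 'cot', 'cut']
print(piped)    # ['cat', 'cot', 'cut']
print(ranged)   # ['cat', 'bat']
```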

Caveat

You don’t need to escape regex characters when they’re inside square brackets. You won’t blow anything up if you do, but they’re not necessary. Imagine playing a high-stakes game of tag on the playground. Square brackets are base for regex characters, like *, ., +, and ?. So you’ll get no judgment from me for escaping them, but I can’t protect you from that pedantic developer on your team who’s already tired of marketers poking around in their code.

No need to escape regex characters in square brackets.
Tough crowd.

The one exception is if you’re using [^ ] to exclude string characters and want to indicate the literal ^ character, as opposed to the regex character. Then you could drop a \ in front of it. (If you’re new to regex, I promise this will all make sense by the time you get to the end of this post.) Alternatively, if you have multiple characters you’re excluding, you could position the literal caret after another character. So if you wanted to exclude the caret character along with the hyphen and asterisk in your regex, you could write [^\^*-] or [^-*^]. The first ^ in each does the excluding, and the hyphen sits at the very start or end of the list so it isn’t read as a range. (That’s why the flirty-looking [\^-*] won’t work: most regex engines read ^-* as a backwards range and throw an error.)

Backslash (\)

What It Does

This character escapes the character that follows it. In plain English, that simply means that it says treat the character that follows it as a regular ol’ character and NOT a regex character. These non-regexy characters are literally called literals. 😂😶😏

So if I write out index\.aspx\?query=funky\+boots, I’m saying treat the . , ?, and + signs as characters and don’t interpret them as regex.

Follow Along

In the video (6:18 min mark), I go through the list of most common regex characters that you’ll need to escape. You can find that list here. And here is the list I worked from:

$45.18

3892.8467

$35479.27

$39,756.18

$1284

76390

Caveat

You may play Russian roulette with your regex and not escape your regex characters. With an unescaped dot, it would probably work out. The ? and + in the URI above are less forgiving: left unescaped, x? means an optional x and y+ means one or more y’s, so the pattern would no longer match the very URI it was written for. Either way, you’re going to have 🍳 on your face if someone drills into your dashboard and finds junk. To wit, I was once building out Tableau dashboards for a client, and their Google rep had been sending them filtered data to drop in their reports. When I audited their data using a treemap, one wrong character caused two brackets of their keywords to be distorted by millions. (They used a * when they should have used a +. In this client’s case, that was an actual expensive mistake. 🤭)
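
Here’s a small Python sketch of escaping, run against the price list above. The two patterns are my own (they aren’t taken from the video); they just contrast an escaped dollar sign and dot with an unescaped dollar sign.

```python
import re

amounts = ["$45.18", "3892.8467", "$35479.27", "$39,756.18", "$1284", "76390"]

# Escaped: \$ is a literal dollar sign and \. is a literal period.
dollars_and_cents = [
    a for a in amounts if re.search(r"\$[0-9]+\.[0-9][0-9]", a)
]

# Unescaped: $ now means "end of line", and nothing can follow the end,
# so this pattern quietly matches none of the rows.
unescaped = [a for a in amounts if re.search(r"$[0-9]+", a)]

print(dollars_and_cents)  # ['$45.18', '$35479.27']
print(unescaped)          # []
```

Note that the unescaped version doesn’t complain; it just returns nothing, which is the kind of junk that’s easy to miss in a dashboard.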

Digit (\d)

What It Does

The digit metacharacter is very self-explanatory. It matches any single digit from 0 to 9.

Follow Along

In the video (9:37 min mark) I use the list under Backslash, due north, to demonstrate this handy regex character.

Caveat

Regex characters are case sensitive. If you capitalize the ‘d’ (i.e., \D) it is negated, meaning it will match any character that’s NOT on the VIP list (ergo, letters, symbols, etc.).
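
A quick Python sketch of \d versus \D on the Backslash list. The variable names are mine, and fullmatch is Python’s way of saying the whole string has to fit the pattern.

```python
import re

amounts = ["$45.18", "3892.8467", "$35479.27", "$39,756.18", "$1284", "76390"]

# \d+ with fullmatch: the entire string must be digits and nothing else.
digits_only = [a for a in amounts if re.fullmatch(r"\d+", a)]

# \D with search: the string contains at least one non-digit character.
has_non_digit = [a for a in amounts if re.search(r"\D", a)]

print(digits_only)    # ['76390']
print(has_non_digit)  # the other five rows
```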

Question Mark (?)

What It Does

Technically, this character means 0 or 1 of the character before, but I like to think of it as the previous character being optional. Maybe it’s there, maybe it’s not—who knows, really? Hence the ?. See how easy this is when you’re not learning from a textbook printed on recycled paper in Times New Roman with pics of Macs from the 80s? Or reading my post from 2013 that was technically correct but not elegant. (Like, at all.)

regex code
Me reading through my 2013 post this week.

Follow Along

In the video (10:55 min mark) you can keep rocking the list above to practice.

Caveat

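You can make multiple characters optional using the ? character; you just have to wrap them up in a little burrito made of parentheses, e.g., (sir )?paul mccartney. Note that the space lives inside the burrito. If you left it outside, it would be mandatory, and a plain paul mccartney would need a leading space to match.

I don’t want to dash anyone’s faith in the future of humanity, but IRL your regex would probably look closer to:

(sir )?paul mcc?[ck]artn(ey|y|ie)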
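
Here’s a hedged Python sketch of the optional group. The search terms are invented for the demo, and the pattern keeps the space inside the parentheses so that the bare name (no “sir”) still matches; the second pattern shows what happens when the space is left outside.

```python
import re

# Hypothetical search terms for illustration.
terms = ["paul mccartney", "sir paul mccartney", "paul mckartney",
         "sir paul mccartny", "ringo starr"]

pattern = re.compile(r"(sir )?paul mcc?[ck]artn(ey|y|ie)")
matches = [t for t in terms if pattern.fullmatch(t)]

# With the space outside the group, it becomes mandatory,
# so the bare name no longer matches.
space_outside = re.compile(r"(sir)? paul mccartney")
bare_name_matches = space_outside.search("paul mccartney") is not None

print(matches)            # everything except 'ringo starr'
print(bare_name_matches)  # False
```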

Curly Braces ({ })

What It Does

Curly braces indicate how many times you may want a character repeated. They immediately follow the character (or characters wrapped in parentheses) and either contain a single number or two numbers separated by a comma. Let’s say you want to scoop up all US zip codes out of a column where the address is in one cell. (Annoying, amirite?) Because a basic zip code in the US is five digits, you’d write it as [\d]{5}, or [0-9]{5} if you want to look like a neophyte. (Kidding. Sorta.)

You could also express a range with curly braces by using the convention {minimum,maximum}, with no space after the comma. For example, let’s say you have a list of product IDs that start with three lower case letters followed by a hyphen and then three-to-five digits. You could pattern match it with this:

[\w]{3}-[\d]{3,5}

If the \w was pulling in characters you didn’t want, you could cinch it down by only including what you need: [a-z] or [a-zA-Z].

Follow Along

In the video (14:09 min mark) I use this list below to identify phone numbers:

325-678-3892

89-2784-09

578-487-89921

(202) 893-2749

98-36489032

813-234-9569

Caveat

A mistake I sometimes see in Google Analytics accounts is someone will separate the min and max numbers with a hyphen. It’s an honest mistake; after all, we use hyphens for ranges in square brackets. But someone probably lost a bet somewhere, and it was decided that the curly braces should use a comma. And this, boys and girls, is why programming is hard.
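
Here’s a sketch of curly braces against the phone number list, written in Python. The patterns are my own guesses at a reasonable phone filter, not necessarily the ones from the video, and the last one shows how Python’s engine in particular handles the hyphen mistake (other tools may react differently).

```python
import re

numbers = ["325-678-3892", "89-2784-09", "578-487-89921",
           "(202) 893-2749", "98-36489032", "813-234-9569"]

# Exactly 3 digits, hyphen, 3 digits, hyphen, 4 digits.
strict = [n for n in numbers if re.fullmatch(r"\d{3}-\d{3}-\d{4}", n)]

# {4,5} allows four or five digits in the last group.
loose = [n for n in numbers if re.fullmatch(r"\d{3}-\d{3}-\d{4,5}", n)]

# The hyphen mistake: Python's engine reads {4-5} as literal text,
# so the pattern silently stops matching phone numbers.
hyphen_typo = [n for n in numbers if re.fullmatch(r"\d{3}-\d{3}-\d{4-5}", n)]

print(strict)       # ['325-678-3892', '813-234-9569']
print(loose)        # ['325-678-3892', '578-487-89921', '813-234-9569']
print(hyphen_typo)  # []
```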

regex in simple terms / plain English
😜😛😂🥺😕😣😩

Caret (^)

What It Does

The caret character just indicates the beginning of a line, meaning your selection has to begin with whatever you put after it. I use this all the time when pattern matching URLs and URIs (a URL that got separated from the hostname/subdomain). I’m in the process right now of building out a series of campaign-specific dashboards for a client with different universes of URLs. I’m using regex to pattern match URLs and URIs and then marrying up their Google Analytics, Search Console, Moz, and Screaming Frog data. This wouldn’t be possible without regex.

I also use the ^ regex character when making sets in Tableau. This is helpful in grouping keywords from Google Ads, Search Console, site search, names, etc.

Follow Along

In the video (15:50 min mark), I use the list below to identify social media profiles.

@AnnieCushing

This is just test

@ me!

@mashable

annie@annielytics.com

@old_skool

more random text

@annie-cushing

Caveat

If you see a caret inside square brackets, it takes on an entirely different role. I’ll cover that in the “Square Brackets + Caret” section 👇.
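
A minimal Python sketch of the caret using the list above. It compares a plain @ (found anywhere in the line) with ^@ (must be the first character); the variable names are mine.

```python
import re

lines = ["@AnnieCushing", "This is just test", "@ me!", "@mashable",
         "annie@annielytics.com", "@old_skool", "more random text",
         "@annie-cushing"]

# No anchor: any line containing an @, including the email address.
anywhere = [l for l in lines if re.search(r"@", l)]

# ^@ : the @ has to be the very first character.
at_start = [l for l in lines if re.search(r"^@", l)]

print(len(anywhere))  # 6
print(at_start)       # the five lines that begin with @ (no email)
```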

Word (\w)

What It Does

I didn’t include this regex metacharacter in my original post, but now—after almost 10 years of experience with regex—I use it all the time. It includes any one character that’s a letter (upper- or lowercase), number, or underscore. It’s a more efficient alternative to typing out [a-zA-Z0-9_]. Oddly enough, it doesn’t include a hyphen.

Similar to the \d metacharacter, if you capitalize it, you’ll throw your net out and catch anything that’s not a word character (e.g., a symbol).

Follow Along

In the video (at the 15:50 min mark) I introduce the \w character along with the caret. You can use that same list.

Caveat

If you don’t want numbers or the underscore included in your filter, you’ll need to indicate just letters. And if your pattern could include lowercase and uppercase letters, you’ll need to specify that, e.g., [a-zA-Z].
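
Here’s a Python sketch of \w on the same list from the Caret section. The three patterns are my own illustrations: plain \w, \w plus a hyphen, and letters only. I’m using fullmatch so the whole line has to fit the pattern.

```python
import re

lines = ["@AnnieCushing", "This is just test", "@ me!", "@mashable",
         "annie@annielytics.com", "@old_skool", "more random text",
         "@annie-cushing"]

# @ followed by nothing but word characters (letters, digits, underscore).
word_chars = [l for l in lines if re.fullmatch(r"@\w+", l)]

# \w doesn't include the hyphen, so add it inside square brackets.
with_hyphen = [l for l in lines if re.fullmatch(r"@[\w-]+", l)]

# Letters only: the underscore in @old_skool now disqualifies it.
letters_only = [l for l in lines if re.fullmatch(r"@[a-zA-Z]+", l)]

print(word_chars)    # ['@AnnieCushing', '@mashable', '@old_skool']
print(with_hyphen)   # adds '@annie-cushing'
print(letters_only)  # ['@AnnieCushing', '@mashable']
```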

Parentheses ( )

What They Do

Parentheses are used to form groups — just like you learned in algebra. When you write more sophisticated regex, you’ll rely pretty heavily on parentheses. For one client’s site, I wanted to create a bucket for all the URLs that were generated when someone searched for a property on their site. I save snippets like this in Evernote and tag these snippets with ‘regex’ so it’s fun sometimes to look back on my old code. We tested it thoroughly before creating the rewrite filter (where I rewrote them all to a single URL since these pages all did the same thing). And it worked. But it’s a hot mess:

(^/index.html?pclass.*)|(/index.html?action=search.*)|(/index.php?cur_page=.*)|(/index.html?searchtext.*)|(/realty/index.html?pclass.*)

Here’s how I’d write it now:

^(/realty)?/index\.(html|php)\?(pclass|action=search|cur_page|searchtext)

regex cleanup for marketers
Especially after cleaning up that regex salad.

Parentheses are especially helpful when identifying words that are frequently truncated, like months. So if you wrote Sep(tember)? it would match Sep or September. Or if you want to let go and let God, [sS]ep(tember)? would additionally match sep and september. But now I’m just showing off. Sorry.

Follow Along

In the video (at the 19:20 min mark) I introduce the parentheses. You can use the list below to follow along:

facebook.com

search.yahoo.com

huffpo.com

search.ask.com

pinterest.com

search.aol.com

search.xfinity.com

Caveat

In Google Analytics, you don’t need to tack a (.*) to the end of your patterns to catch string characters in the caboose. The report filter treats regex as a contains filter on ‘roids. But some tools explicitly require the wildcard characters to account for string characters you haven’t included in your regex. So user beware.
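
Here’s a Python sketch of grouping with the list above, plus a rough analogy for the caveat: Python’s re.search behaves like a contains filter, while re.fullmatch behaves like the stricter tools that need the trailing .* to cover the rest of the string. The grouped pattern is my own example.

```python
import re

hosts = ["facebook.com", "search.yahoo.com", "huffpo.com", "search.ask.com",
         "pinterest.com", "search.aol.com", "search.xfinity.com"]

# The parentheses limit the pipes to just the middle of the hostname.
grouped = re.compile(r"search\.(yahoo|ask|aol)\.com")
picked = [h for h in hosts if grouped.search(h)]
print(picked)  # ['search.yahoo.com', 'search.ask.com', 'search.aol.com']

# "Contains" behavior: the pattern only has to appear somewhere.
contains = [h for h in hosts if re.search(r"search", h)]

# Whole-string behavior: without .* nothing matches, with it all four do.
strict_bare = [h for h in hosts if re.fullmatch(r"search", h)]
strict_wild = [h for h in hosts if re.fullmatch(r"search.*", h)]

print(len(contains), len(strict_bare), len(strict_wild))  # 4 0 4
```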

Dollar Sign ($)

What It Does

The dollar sign character means that your string must end at that point. For example, health insurance$ matches cheap health insurance but not health insurance rates. Or you could attach a $ to the end of a URL to prevent that URL with any query strings from being included in your match. Or at the end of a directory to analyze only traffic to your category pages and not their child pages. (I demonstrate the latter in the video.)

I really look forward to demonstrating how you can use regex to search and replace. It’s tricky but very empowering once you learn the essentials because most tools that support regex support this ability. You will use the $ a lot when you power up to replacing with regex.

Follow Along

In the video (at the 22:08 min mark) I introduce the dollar sign. You can use the list below to follow along:

/blog/google-docs/how-to-import-one-spreadsheet-into-another-in-google-drive-video/

/blog/

/guides/definitive-guide-campaign-tagging-google-analytics/

/services/

/comprehensive-self-guided-site-audit-checklist/

/resources/

/blog/analytics/referral-exclusion-list-google-analytics-explained/

/blog/excel-tips/formatting-dates-in-excel/

/about/

/services/analytics-audits/

/about-me/

Caveat

Just because a $ means the end of a line, it doesn’t necessarily mean the end of your regex. For example, you could have an expression that looks like ^Los Angeles$|^New York$|^Chicago$. (This would filter a report down to just the three largest cities in the US.)
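
A short Python sketch of the dollar sign, using the URI list above. The /blog/ patterns are my own illustration of the “category page but not its child pages” idea.

```python
import re

paths = [
    "/blog/google-docs/how-to-import-one-spreadsheet-into-another-in-google-drive-video/",
    "/blog/",
    "/guides/definitive-guide-campaign-tagging-google-analytics/",
    "/services/",
    "/comprehensive-self-guided-site-audit-checklist/",
    "/resources/",
    "/blog/analytics/referral-exclusion-list-google-analytics-explained/",
    "/blog/excel-tips/formatting-dates-in-excel/",
    "/about/",
    "/services/analytics-audits/",
    "/about-me/",
]

# ^/blog/ : everything that starts with the blog directory.
blog_and_children = [p for p in paths if re.search(r"^/blog/", p)]

# ^/blog/$ : the string has to end right after the slash.
blog_only = [p for p in paths if re.search(r"^/blog/$", p)]

print(len(blog_and_children))  # 4
print(blog_only)               # ['/blog/']
```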

Utterly Ridiculous Mnemonic Device (That Works)

I came up with this when I first started learning hierogl– regex. But you have to promise not to laugh.

Promise? 🤨

Okay, I thought of how you lead someone with a carrot (I know it’s a different spelling—work with me 🙄) by putting it out in front and how at the end of the day it’s all about the money.  So the ^ goes in front in a regex expression and the $ at the end.

how to remember regex characters
I’ll just see myself out.

Yeah, yeah, go ahead and laugh (promise breaker). But I guarantee you’ll remember next time.

Square Brackets + Caret ( [^ ])

What It Does

If you toss a caret into your square brackets (as the first character), it will exclude whatever else is in the square brackets. So b[^a]t will match bit, bet, bot, and but but not bat. As with the square brackets sans the caret, you don’t separate these characters in any way. Just shove them into the elevator together.

Follow Along

In the video (24:51 min mark), I use the list below to identify phone numbers:

325-678-3892

89-2784-09

578-487-89921

(202) 893-2749

98-36489032

813-234-9569

Caveat

As I wrote above, in the Square Brackets section, you need to be careful if you want to exclude the literal caret character. You’ll either need to escape it or make sure it doesn’t directly follow the left square bracket. So if you wanted to exclude the caret character along with the hyphen and asterisk in your regex, you could write [^\^*-] or [^-*^], keeping the hyphen at the very start or end of the list so it isn’t read as a range.
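
Here’s a Python sketch of the negated brackets. The short word list and the a^b-c*d test string are made up for the demo, and the last pattern is one way to write “anything except a caret, hyphen, or asterisk” that Python’s re module accepts.

```python
import re

# Hypothetical word list.
words = ["bat", "bit", "bet", "bot", "but"]
not_a = [w for w in words if re.fullmatch(r"b[^a]t", w)]
print(not_a)  # ['bit', 'bet', 'bot', 'but']

numbers = ["325-678-3892", "89-2784-09", "578-487-89921",
           "(202) 893-2749", "98-36489032", "813-234-9569"]

# [^\d-] : any character that is not a digit or a hyphen.
has_other_chars = [n for n in numbers if re.search(r"[^\d-]", n)]
print(has_other_chars)  # ['(202) 893-2749']

# Excluding a literal caret, hyphen, and asterisk: the first ^ negates,
# the hyphen sits first, and the last ^ is just a caret.
leftovers = re.findall(r"[^-*^]", "a^b-c*d")
print(leftovers)  # ['a', 'b', 'c', 'd']
```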

Whitespace (\s)

What It Does

The whitespace metacharacter matches any single whitespace character. I use it most commonly to match an actual space, but it will also match the tab (\t), new line (\n), and carriage return (\r). (It also matches the line and form feed, but I’ve never had to use those options as an analyst.)

Follow Along

In the video (29:26 min mark), I use the same phone number list above.

Caveat

If you only need to match a space between words, you can just drop a space into your regex. Watch out for those Boomers and their double spaces between sentences though. (Oh HEY, Boomers! 😘)
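
A quick Python sketch of \s. The first pattern is my own take on the phone number with the area code in parentheses, and the keyword variations (single space, double space, tab) are invented to show where a literal space falls short.

```python
import re

# \s matches the space after the area code.
area_code = re.search(r"\(\d{3}\)\s\d{3}-\d{4}", "(202) 893-2749")
print(area_code is not None)  # True

# Hypothetical keyword variations: single space, double space, tab.
variants = ["health insurance", "health  insurance", "health\tinsurance"]

# A literal space only matches exactly one space.
literal_space = [v for v in variants if re.fullmatch(r"health insurance", v)]

# \s+ soaks up one or more whitespace characters of any kind.
any_whitespace = [v for v in variants if re.fullmatch(r"health\s+insurance", v)]

print(len(literal_space))   # 1
print(len(any_whitespace))  # 3
```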

Testing Your Regex

The best part of Google Analytics is every report comes with a line-item filter. And that filter is sensitive to regex. Previously, you would need to select Matching RegExp for it to recognize it; now you can just enter your regex into the filter, and you’re good to go.

regex regular expressions for Google Analytics
Google Analytics filters are now sensitive to regex. No need to select “Matching RegExp” from the drop-down.

So if I’m writing regex to capture a group of pages to concatenate in a segment to analyze, I’ll fire up a content report and paste my regex into the filter. If all of my pages are present and accounted for, I’m golden. It’s a real time saver.

That said, if you’re brand new to regex and want to test your code, I highly recommend using a regex helper like regexr.com (what I used in my tutorial) or regex101.com.

More Practice

The rest of the video tutorial is an opportunity to practice your regex with more lists. I’ll drop them below:

Britney Spears Practice

34:21 min mark

Britney Spears

Brittany Speers

Britanni Spers

brittany spears

Britany Spears

Britani Speres

Brittny Spears

britanni speers

brtany spears
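
If you want to check your answer, here’s one possible pattern as a Python sketch. It is my own attempt, not the one built in the video, and it leans on square brackets, ?, and + instead of the .* combo. The two non-matching names are made up as a sanity check.

```python
import re

names = ["Britney Spears", "Brittany Speers", "Britanni Spers",
         "brittany spears", "Britany Spears", "Britani Speres",
         "Brittny Spears", "britanni speers", "brtany spears"]

# One possible pattern (an assumption, not the video's answer):
#   bri?t+      optional i, one or more t's
#   [ae]?n+     optional vowel, one or more n's
#   [eiy]+      the ending: ey, y, i, ...
#   sp[ea]+r[es]+  spears, speers, spers, speres
britney = re.compile(r"bri?t+[ae]?n+[eiy]+ sp[ea]+r[es]+", re.IGNORECASE)

caught = [n for n in names if britney.fullmatch(n)]
print(len(caught) == len(names))  # True

# Hypothetical names that should NOT be caught.
others = ["Bruce Springsteen", "Brian Spears"]
false_hits = [n for n in others if britney.fullmatch(n)]
print(false_hits)  # []
```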

Identify URIs with Query Parameters

38:21 min mark

You’ll want to either drop a group of URLs with query parameters into regexr.com or open your All Pages report (Behavior > Site Content).
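
If you don’t have URLs handy, here’s a tiny Python sketch with made-up URIs. The escaped question mark is the whole trick: it finds any URI that carries a query string.

```python
import re

# Hypothetical URIs for practice.
uris = ["/blog/", "/search/?q=regex", "/index.php?cur_page=2", "/about/"]

# \? is a literal question mark, i.e., the start of the query parameters.
with_params = [u for u in uris if re.search(r"\?", u)]

print(with_params)  # ['/search/?q=regex', '/index.php?cur_page=2']
```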

Filter for Site Search Terms with Three Words

41:23 min mark

You’ll want to either drop a group of multi-word terms into regexr.com or open your Behavior > Site Search > Search Terms. (Alternatively, you could pull these from any keyword tool, like Search Console, Ahrefs, etc.)
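
Here’s a Python sketch with invented search terms. The pattern is my own reading of “exactly three words”: three runs of word characters separated by single whitespace characters, anchored at both ends.

```python
import re

# Hypothetical site search terms.
terms = ["regex", "google analytics", "regex for marketers",
         "how to use regex", "site audit checklist"]

three_words = [t for t in terms if re.search(r"^\w+\s\w+\s\w+$", t)]

print(three_words)  # ['regex for marketers', 'site audit checklist']
```

Without the ^ and $, the four-word term would sneak in, because three of its words satisfy the pattern on their own.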

Staging Subdomains

43:08 min mark

www.mydomain.com

staging.mydomain.com

blog.mydomain.com

production.mydomain.com

store.mydomain.com

login.mydomain.com

Extract Zip Codes

44:14 min mark

1367 Misty Ridge Ct Hampton, GA 30228-8456

6489 M 40 Lawton, MI 49065

3360 Woods Ln Callahan, FL 32011

378 Country Side Ln #UNT 2 Albany MN 56307
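
A possible Python sketch for pulling the zip codes out of the addresses above. The pattern is an assumption on my part (five digits, an optional hyphen plus four more, at the end of the line), not necessarily the one from the video.

```python
import re

addresses = [
    "1367 Misty Ridge Ct Hampton, GA 30228-8456",
    "6489 M 40 Lawton, MI 49065",
    "3360 Woods Ln Callahan, FL 32011",
    "378 Country Side Ln #UNT 2 Albany MN 56307",
]

# Five digits, optionally followed by -#### (ZIP+4), at the end of the string.
zip_pattern = re.compile(r"\d{5}(-\d{4})?$")

# Every address in this list ends with a zip, so search() always finds one.
zips = [zip_pattern.search(a).group() for a in addresses]

print(zips)  # ['30228-8456', '49065', '32011', '56307']
```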

Y No Regex in Excel?

A common frustration I had for a long time was that I couldn’t use regex in Excel. I could in Word but not Excel. Go figure. You can use a plugin like the SeoTools plugin or do all your regex in Google Docs and bring it back into Excel or (my personal fave) use advanced filters in Excel. They actually give you more options than regex and are easier to master.

Written by Annie Cushing · Categorized: Programming · Tagged: Google Analytics, Google Docs, Google Sheets, Regex, Regular Expressions, Screaming Frog, Tableau

Comments

  1. Sam says

    February 8, 2021 at 7:32 AM

    Fantastic, I’ve been meaning to learn this for sometime but was always put off by these mega guides not aimed at marketers. I just wanna perform some regex on Google Analytics & Screaming Frog. So this is great 🙂

    • Annie Cushing says

      February 8, 2021 at 8:23 AM

      This is exactly why I did this guide. I’ll be doing videos dedicated to each of those tools, so make sure you’ve subscribed and click the bell for notifications, if you want to follow along. I really look forward to demystifying the extract feature that regex offers in all of these tools.



© 2023 annielytics.com