Annielytics.com

Feb 05 2021

Regex for Marketers in Plain English with Real World Examples [VIDEO]

regex for marketers, analysts, and coders
You will know the answer to this by the time you get to the end of this video.

I’ll admit it: I used to be a regexaphobe. When I was new to analytics, I remember someone sending me a snippet of regular expressions (AKA regex) to solve a goal setup conundrum I was working through. It looked like a foreign language to me. I was fascinated by it but repelled at the same time. #itscomplicatedk?

Sadly, being intimidated by regex prevented me from doing more powerful analysis. I tried everything to avoid it and would copy and paste code from articles I’d saved whenever I had to create a custom filter. But eventually I hit a wall I couldn’t scale unless I conquered this beast. So, like Yukon Cornelius, Rudolph, and Hermey, I set out on a quest to learn it.

Regex is a beast with no teeth (unless you screw it up).

As nerdy as regex is, I’m writing this post because it will broaden your capacity as a marketer to do more sophisticated analysis in tools like Google Analytics, Google Docs, Google Spreadsheets, Tableau, Screaming Frog, SQL, etc. Basically any tool that uses filters. I will be creating videos to demonstrate practical tasks you might need to carry out in each of these tools with the aid of regex. It’s gonna be dope.

using regular expressions in marketing tools
Me thinking about empowering marketers to amp up their filters with regex

So I’m going to hit on the main ones you’ll need, while explaining the geek speak in simple terms. I will even subject myself to public scorn by sharing the goofy mnemonic devices I used early on to remember a few of them I just couldn’t seem to get down.

I will break down the regex characters you’ll use most, in the order I go through them in my video. When creating the video, I used regexr.com to test my regex. (There are quite a few regex testers on the market.) There were a couple of times it was a little buggy, so if it’s not matching and you’re sure your regex is on point, try refreshing the page.

I’ll also include the lists I used in my demos so that you can follow along, if you’re so inclined.

Regex Lineup

Pipe (|)

What It Does

The pipe character is the regex equivalent to or. It’s the Swiss army knife of all the regex characters.

Follow Along

In the video (00:34 min mark), I use a list of countries, which you can download here, and use regex to filter it down to just EU countries. The regex I used:

Austria|Belgium|Bulgaria|Croatia|Republic of Cyprus|Czech Republic|Denmark|Estonia|Finland|France|Germany|Greece|Hungary|Ireland|Italy|Latvia|Lithuania|Luxembourg|Malta|Netherlands|Poland|Portugal|Romania|Slovakia|Slovenia|Spain|Sweden

Caveat

One thing you need to be careful with when using the pipe character in a long list like this is, if you tack a pipe character onto the end of your list, you will select everything. You’re basically saying, “or whatever.”
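If you want to see the trailing-pipe problem for yourself, here is a minimal Python sketch. It assumes Python's built-in re module (your analytics tool may use a slightly different regex flavor), and the short country list is a made-up sample, not the downloadable list from the video.

```python
import re

# Hypothetical sample; the real demo uses a much longer country list.
countries = ["France", "Canada", "Germany", "Brazil", "Japan"]

# The pipe means "or": match France OR Germany OR Italy.
eu = re.compile(r"France|Germany|Italy")
matched = [c for c in countries if eu.search(c)]
print(matched)  # ['France', 'Germany']

# A trailing pipe adds an empty alternative ("or nothing"),
# and "nothing" can be found in every string.
oops = re.compile(r"France|Germany|Italy|")
matched_all = [c for c in countries if oops.search(c)]
print(matched_all)  # every country in the list
```

The second pattern differs by a single character, which is why a stray pipe is so easy to miss.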

Dot (.)

What It Does

The . metacharacter is a wildcard character. It means match any one character. It can be a number, letter, or special character (even a white space); in most tools the only thing it skips is a line break. By itself, it’s not that amazing, but with the help of its frequent companions, the asterisk (*) and plus (+) characters, it’s pretty bad to the bone.

Follow Along

In the video (4:00 min mark), I use a list of kindergarten words to play with the dot character. Feel free to play along:

cat
cap
cot
bat
cut
dab
but

Caveat

If you’re a marketer, you’ll be using dots quite a bit as themselves, so you’ll need to escape them (i.e., drop a backslash in front of each one). That said, most of the time, if you’re matching a list of URLs, your regex will most likely work even if you forget to escape the dot because how many other characters are you going to see before your top level domain (e.g., .com, .edu, .gov)?
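Here is a small sketch of the dot as a wildcard versus the dot as a literal, again assuming Python's re module. The word list is the one from the demo; the second hostname is a made-up string, included only to show what an unescaped dot lets through.

```python
import re

words = ["cat", "cap", "cot", "bat", "cut", "dab", "but"]

# c.t means: a "c", then any one character, then a "t".
c_dot_t = [w for w in words if re.fullmatch(r"c.t", w)]
print(c_dot_t)  # ['cat', 'cot', 'cut']

# Hypothetical hostnames: the second one has an "X" where the dot should be.
hosts = ["annielytics.com", "annielyticsXcom"]

# Unescaped, the dot matches any character, so both strings match.
unescaped = [h for h in hosts if re.search(r"annielytics.com", h)]

# Escaped, the dot only matches a literal dot.
escaped = [h for h in hosts if re.search(r"annielytics\.com", h)]

print(unescaped)  # ['annielytics.com', 'annielyticsXcom']
print(escaped)    # ['annielytics.com']
```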

Asterisk (*)

What It Does

The asterisk says to match 0 or more of the character that comes right before it. In other words, it looks at the character before it (most often the . character) and says that character may be missing entirely, may appear once, or may repeat an unlimited number of times.

Follow Along

In the video (4:00 min mark), I address the asterisk and plus characters in quick succession after the dot character. See the Dot section for the list of words I used in my demo.

Caveat

The .* combo meal is expensive. I treat it as an option of last resort. I demonstrate in the video how I’ll most commonly use it within some pretty tight parameters when I walk through how to capture the misspellings of Britney Spears (34:21 min mark).

Heads-Up

I wasn’t supposed to cover the * character when I did but went off script. Then I forgot I did that and introduced it [again] at the 7:07 min mark. #50firstdates

regex for marketers
At least I don’t have to worry about you forgetting what the asterisk character does.

Plus Sign (+)

What It Does

The + means one or more of the previous character. So it’s a lot like the asterisk, except it requires at least one occurrence. IOW, the previous character is mandatory. I use this all through the video tutorial.

Follow Along

In the video (4:00 min mark), I address the plus and asterisk characters in quick succession after the dot character. See the Dot section for the list of words I used in my demo.
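The difference between the asterisk and the plus is easiest to see side by side. This is a minimal Python sketch (assuming the re module); the three sample strings are made up for illustration, while the word list is the one from the Dot section.

```python
import re

# Hypothetical strings with zero, one, and two a's between c and t.
samples = ["ct", "cat", "caat"]

# a* allows zero or more a's, so "ct" is fine.
star = [s for s in samples if re.fullmatch(r"ca*t", s)]
print(star)  # ['ct', 'cat', 'caat']

# a+ demands at least one a, so "ct" drops out.
plus = [s for s in samples if re.fullmatch(r"ca+t", s)]
print(plus)  # ['cat', 'caat']

# Paired with the dot: "c" followed by anything at all.
words = ["cat", "cap", "cot", "bat", "cut", "dab", "but"]
starts_with_c = [w for w in words if re.fullmatch(r"c.*", w)]
print(starts_with_c)  # ['cat', 'cap', 'cot', 'cut']
```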

Square Brackets ([ ])

What It Does

This means match any one of the characters between the brackets. So, c[aou]p would match cap, cop, and cup. But you can only pick one; that’s the key to the brackets. You can throw in a dash to indicate a range of characters to choose from. For example, [0-5] would mean you could pick any one digit between 0 and 5, and [x-z] will match x, y, z.

Follow Along

In the video (6:18 min mark), I use the list of words under the Dot section to demo square brackets. But we’ll use them several times throughout the video. For single characters, they accomplish the same thing as the pipe character, but I find brackets easier to read than a bunch of pipe characters.
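To compare brackets with pipes on the demo word list, here is a short Python sketch (assuming the re module). The digit string at the end is just an illustration of a range.

```python
import re

words = ["cat", "cap", "cot", "bat", "cut", "dab", "but"]

# Brackets: pick exactly one of a, o, or u for the middle letter.
with_brackets = [w for w in words if re.fullmatch(r"c[aou]t", w)]

# The pipe version says the same thing, just with more typing.
with_pipes = [w for w in words if re.fullmatch(r"cat|cot|cut", w)]

print(with_brackets)  # ['cat', 'cot', 'cut']
print(with_pipes)     # ['cat', 'cot', 'cut']

# A dash inside brackets indicates a range: any one digit from 0 through 5.
low_digits = re.findall(r"[0-5]", "0123456789")
print(low_digits)  # ['0', '1', '2', '3', '4', '5']
```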

Caveat

For the most part, you don’t need to escape regex characters when they’re inside square brackets. You won’t blow anything up if you do, but the backslashes aren’t necessary. Imagine playing a high-stakes game of tag on the playground. Square brackets are base for regex characters, like *, ., +, and ?. So you’ll get no judgment from me for escaping them, but I can’t protect you from that pedantic developer on your team who’s already tired of marketers poking around in their code.

No need to escape regex characters in square brackets.
Tough crowd.

The main exception is if you’re using [^ ] to exclude characters and want to indicate the literal ^ character, as opposed to the regex character. Then you could drop a \ in front of it. (If you’re new to regex, I promise this will all make sense by the time you get to the end of this post.) Alternatively, if you have multiple characters you’re excluding, you could position it after another character. The hyphen needs similar care: between two characters it indicates a range, so park it at the end (or escape it). And in most tools a literal ] or \ inside brackets still needs a backslash. So if you wanted to exclude the caret character along with the hyphen and asterisk in your regex, you could write [^\^*-] or [^*^-]. (Is it just me or does that first expression look a little flirty?)

Backslash (\)

What It Does

This character escapes the character that follows it. In plain English, that simply means that it says treat the character that follows it as a regular ol’ character and NOT a regex character. These non-regexy characters are literally called literals.

So if I write out index\.aspx\?query=funky\+boots, I’m saying treat the . , ?, and + signs as characters and don’t interpret them as regex.
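Here is a quick Python sketch of that example (assuming the re module). The leading slash on the URI is an assumption added for illustration. It also shows re.escape, a helper in Python's standard library that adds the backslashes for you; other tools may not offer an equivalent.

```python
import re

uri = "/index.aspx?query=funky+boots"  # leading slash added for illustration

# Escaped: the ., ?, and + are treated as plain characters.
escaped = re.compile(r"index\.aspx\?query=funky\+boots")
print(bool(escaped.search(uri)))  # True

# Unescaped: "x?" now means "an optional x" and "y+" means "one or more y",
# so nothing in the pattern matches the literal ? or + in the URI.
unescaped = re.compile(r"index.aspx?query=funky+boots")
print(bool(unescaped.search(uri)))  # False

# re.escape() puts a backslash in front of the special characters for you.
auto = re.compile(re.escape("index.aspx?query=funky+boots"))
print(bool(auto.search(uri)))  # True
```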

Follow Along

In the video (6:18 min mark), I go through the list of most common regex characters that you’ll need to escape. You can find that list here. And here is the list I worked from:

$45.18
3892.8467
$35479.27
$39,756.18
$1284
76390

Caveat

You may play Russian roulette with your regex and not escape your regex characters. Sometimes you’ll get away with it (an unescaped dot usually slides by), but not with the URI above: left unescaped, the ? makes the x optional instead of matching a literal question mark, so the pattern would miss the very URI it was written for. And you’re going to have egg on your face if someone drills into your dashboard and finds junk. To wit, I was once building out Tableau dashboards for a client, and their Google rep had been sending them filtered data to drop in their reports. When I audited their data using a treemap, one wrong character caused two brackets of their keywords to be distorted by millions. (They used a * when they should have used a +. In this client’s case, that was an actual expensive mistake.)

Digit (\d)

What It Does

The digit metacharacter is very self-explanatory. It matches any one digit from 0 to 9.

Follow Along

In the video (9:37 min mark) I use the list under Backslash, due north, to demonstrate this handy regex character.

Caveat

Regex characters are case sensitive. If you capitalize the ‘d’ (i.e., \D) it is negated, meaning it will match any character that’s NOT on the VIP list (ergo, letters, symbols, etc.).
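To try \d and \D on the list from the Backslash section, here is a minimal Python sketch (assuming the re module). The patterns are illustrations chosen for this list, not necessarily the ones used in the video.

```python
import re

amounts = ["$45.18", "3892.8467", "$35479.27", "$39,756.18", "$1284", "76390"]

# A literal $, one or more digits, a literal dot, then exactly two digits.
with_cents = [a for a in amounts if re.fullmatch(r"\$\d+\.\d\d", a)]
print(with_cents)  # ['$45.18', '$35479.27']

# Nothing but digits from start to finish.
all_digits = [a for a in amounts if re.fullmatch(r"\d+", a)]
print(all_digits)  # ['76390']

# \D is the opposite of \d: strip out everything that is NOT a digit.
digits_only = re.sub(r"\D", "", "$39,756.18")
print(digits_only)  # 3975618
```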

Question Mark (?)

What It Does

Technically, this character means 0 or 1 of the character before, but I like to think of it as the previous character being optional. Maybe it’s there, maybe it’s not—who knows, really? Hence the ?. See how easy this is when you’re not learning from a textbook printed on recycled paper in Times New Roman with pics of Macs from the 80s? Or reading my post from 2013 that was technically correct but not elegant. (Like, at all.)

regex code
Me reading through my 2013 post this week.

Follow Along

In the video (10:55 min mark) you can keep rocking the list above to practice.

Caveat

You can make multiple characters optional using the ? character; you just have to wrap them up in a little burrito made of parentheses, e.g., (sir )?paul mccartney. (Keep the space inside the burrito so the expression still matches when there’s no sir in front.)

I don’t want to dash anyone’s faith in the future of humanity, but IRL your regex would probably look closer to:

(sir )?paul mcc?[ck]artn(ey|y|ie)
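Here is a small Python sketch of optional characters and optional groups (assuming the re module). The pattern is a simplified variant written for this example, with the space tucked inside the optional group, and the search terms are made up.

```python
import re

# Hypothetical search terms.
terms = ["paul mccartney", "sir paul mccartney", "paul mcartney", "ringo starr"]

# (sir )? makes the whole word plus its space optional;
# cc? means one c is required and the second is optional.
pattern = re.compile(r"(sir )?paul mcc?artney")
beatle = [t for t in terms if pattern.fullmatch(t)]
print(beatle)  # ['paul mccartney', 'sir paul mccartney', 'paul mcartney']
```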

Curly Braces ({ })

What It Does

Curly braces indicate how many times you want a character repeated. They immediately follow the character (or characters wrapped in parentheses) and contain either a single number or two numbers separated by a comma. Let’s say you want to scoop up all US zip codes out of a column where the address is in one cell. (Annoying, amirite?) Because a basic zip code in the US is five digits, you’d write it as [\d]{5}, or [0-9]{5} if you want to look like a neophyte. (Kidding. Sorta.) The square brackets around the \d are optional; \d{5} works too.

You could also express a range with curly braces by using the convention {minimum,maximum}, with no space after the comma. For example, let’s say you have a list of product IDs that start with three lower case letters followed by a hyphen and then three-to-five digits. You could match that pattern with this:

[\w]{3}-[\d]{3,5}

If the \w was pulling in characters you didn’t want, you could cinch it down by only including what you need: [a-z] or [a-zA-Z].
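To see how much looser \w is than [a-z] here, this Python sketch (assuming the re module) runs both versions against a handful of product IDs. The IDs are invented for illustration.

```python
import re

# Hypothetical product IDs, some valid and some not.
ids = ["abc-123", "xyz-12345", "ab-123", "abc-12", "abc-123456", "a_1-1234"]

# \w accepts letters, digits, and underscores, so "a_1-1234" sneaks in.
loose = [i for i in ids if re.fullmatch(r"\w{3}-\d{3,5}", i)]
print(loose)  # ['abc-123', 'xyz-12345', 'a_1-1234']

# [a-z] cinches it down to lowercase letters only.
strict = [i for i in ids if re.fullmatch(r"[a-z]{3}-\d{3,5}", i)]
print(strict)  # ['abc-123', 'xyz-12345']
```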

Follow Along

In the video (14:09 min mark) I use this list below to identify phone numbers:

325-678-3892
89-2784-09
578-487-89921
(202) 893-2749
98-36489032
813-234-9569
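One way to pick out the well-formed numbers in this list is sketched below in Python (assuming the re module). The pattern is an illustration, not necessarily the exact one from the video, and it only accepts the 3-3-4 format with hyphens.

```python
import re

phones = [
    "325-678-3892",
    "89-2784-09",
    "578-487-89921",
    "(202) 893-2749",
    "98-36489032",
    "813-234-9569",
]

# Three digits, hyphen, three digits, hyphen, four digits, and nothing else.
# fullmatch() keeps "578-487-89921" out; a plain "contains" search would
# happily match its first twelve characters.
well_formed = [p for p in phones if re.fullmatch(r"\d{3}-\d{3}-\d{4}", p)]
print(well_formed)  # ['325-678-3892', '813-234-9569']
```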

Caveat

A mistake I sometimes see in Google Analytics accounts is someone separating the min and max numbers with a hyphen. It’s an honest mistake. After all, we use hyphens for ranges in square brackets. But someone probably lost a bet somewhere, and it was decided that the curly braces should use a comma. And this, boys and girls, is why programming is hard.


Caret (^)

What It Does

The caret character indicates the beginning of a line, meaning your selection has to begin with whatever you put after it. I use this all the time when pattern matching URLs and URIs (a URL that got separated from the hostname/subdomain). I’m in the process right now of building out a series of campaign-specific dashboards for a client with different universes of URLs. I’m using regex to pattern match URLs and URIs and then marrying up their Google Analytics, Search Console, Moz, and Screaming Frog data. This wouldn’t be possible without regex.

I also use the ^ regex character when making sets in Tableau. This is helpful in grouping keywords from Google Ads, Search Console, site search, names, etc.

Follow Along

In the video (15:50 min mark), I use the list below to identify social media profiles.

@AnnieCushing
This is just test
@ me!
@mashable
annie@annielytics.com
@old_skool
more random text
@annie-cushing

Caveat

If you see a caret inside square brackets it takes on an entirely different role. See below to learn more.
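Here is what the caret buys you on the demo list, sketched in Python (assuming the re module). Without it, the email address tags along because it contains an @ too.

```python
import re

lines = [
    "@AnnieCushing",
    "This is just test",
    "@ me!",
    "@mashable",
    "annie@annielytics.com",
    "@old_skool",
    "more random text",
    "@annie-cushing",
]

# No caret: anything containing an @ matches, including the email address.
contains_at = [s for s in lines if re.search(r"@", s)]

# With a caret: the @ has to be the very first character.
starts_with_at = [s for s in lines if re.search(r"^@", s)]

print(starts_with_at)
# ['@AnnieCushing', '@ me!', '@mashable', '@old_skool', '@annie-cushing']
```

Note that "@ me!" still gets through, because all the pattern asks for is a leading @.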

Word (\w)

What It Does

I didn’t include this regex metacharacter in my original post, but now—after almost 10 years of experience with regex—I use it all the time. It includes any one character that’s a letter (upper- or lowercase), number, or underscore. It’s a more efficient alternative to typing out [a-zA-Z0-9_]. Oddly enough, it doesn’t include a hyphen.

Similar to the \d metacharacter, if you capitalize it, you’ll throw your net out and catch anything that’s not a word character (e.g., a space, hyphen, or other symbol).

Follow Along

In the video (at the 15:50 min mark) I introduce the \w character along with the caret. You can use that same list.

Caveat

If you don’t want numbers or the underscore included in your filter, you’ll need to indicate just letters. And if your pattern could include lowercase and uppercase letters, you’ll need to specify that, e.g., [a-zA-Z].
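To compare \w with a letters-only class on the social handle list, here is a Python sketch (assuming the re module). The three patterns are illustrations built from the characters covered so far.

```python
import re

lines = [
    "@AnnieCushing",
    "This is just test",
    "@ me!",
    "@mashable",
    "annie@annielytics.com",
    "@old_skool",
    "more random text",
    "@annie-cushing",
]

# \w covers letters, digits, and underscores, but not hyphens or spaces.
word_chars = [s for s in lines if re.search(r"^@\w+$", s)]
print(word_chars)  # ['@AnnieCushing', '@mashable', '@old_skool']

# Letters only: the underscore in @old_skool no longer qualifies.
letters_only = [s for s in lines if re.search(r"^@[a-zA-Z]+$", s)]
print(letters_only)  # ['@AnnieCushing', '@mashable']

# Adding a hyphen to the brackets lets @annie-cushing back in.
with_hyphen = [s for s in lines if re.search(r"^@[\w-]+$", s)]
print(with_hyphen)  # ['@AnnieCushing', '@mashable', '@old_skool', '@annie-cushing']
```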

Parentheses ( )

What They Do

Parentheses are used to form groups — just like you learned in algebra. When you write more sophisticated regex, you’ll rely pretty heavily on parentheses. For one client’s site, I wanted to create a bucket for all the URLs that were generated when someone searched for a property on their site. I save snippets like this in Evernote and tag these snippets with ‘regex’ so it’s fun sometimes to look back on my old code. We tested it thoroughly before creating the rewrite filter (where I rewrote them all to a single URL since these pages all did the same thing). And it worked. But it’s a hot mess:

(^/index.html?pclass.*)|(/index.html?action=search.*)|(/index.php?cur_page=.*)|(/index.html?searchtext.*)|(/realty/index.html?pclass.*)

Here’s how I’d write it now:

^(/realty)?/index\.(html|php)\?(pclass|action=search|cur_page|searchtext)
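If you'd like to sanity-check the cleaned-up expression, here is a minimal Python sketch (assuming the re module). The URIs are invented stand-ins for the client's search pages, so treat the parameter values as hypothetical.

```python
import re

# The cleaned-up pattern from above.
search_pages = re.compile(
    r"^(/realty)?/index\.(html|php)\?(pclass|action=search|cur_page|searchtext)"
)

# Hypothetical URIs for illustration only.
uris = [
    "/index.html?pclass=condo",
    "/realty/index.html?pclass=house",
    "/index.php?cur_page=2",
    "/index.html?action=search&city=tampa",
    "/index.html?searchtext=lake",
    "/about/",
    "/index.html",
]

bucket = [u for u in uris if search_pages.search(u)]
print(bucket)  # the first five URIs; "/about/" and "/index.html" are left out
```

Each set of parentheses does one job: the first makes /realty optional, the second picks a file extension, and the third picks a query parameter.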

regex cleanup for marketers
Especially after cleaning up that regex salad

Follow Along

In the video (at the 19:20 min mark) I introduce the parentheses. You can use the list below to follow along:

facebook.com
search.yahoo.com
huffpo.com
search.ask.com
pinterest.com
search.aol.com
search.xfinity.com

Caveat

In Google Analytics, you don’t need to tack a (.*) to the end of your patterns to catch string characters in the caboose. The report filter treats regex as a contains filter on ‘roids. But some tools explicitly require the wildcard characters to account for string characters you haven’t included in your regex. So user beware.

To drive this character into the ground: Sep(tember)? would match Sep or September. Or if you want to let go and let God, [sS]ep(tember)? would additionally match sep and september. But now I’m just showing off. Sorry.

Dollar Sign ($)

What It Does

The dollar sign character means that your string must end at that point. For example, health insurance$ matches cheap health insurance but not health insurance rates. Or you could attach a $ to the end of a URL to prevent that URL with any query strings from being included in your match. Or at the end of a directory to analyze only traffic to your category pages and not their child pages. (I demonstrate the latter in the video.)

Follow Along

In the video (at the 22:08 min mark) I introduce the dollar sign. You can use the list below to follow along:

/blog/google-docs/how-to-import-one-spreadsheet-into-another-in-google-drive-video/
/blog/
/guides/definitive-guide-campaign-tagging-google-analytics/
/services/
/comprehensive-self-guided-site-audit-checklist/
/resources/
/blog/analytics/referral-exclusion-list-google-analytics-explained/
/blog/excel-tips/formatting-dates-in-excel/
/about/
/services/analytics-audits/
/about-me/

Caveat

Just because a $ means the end of a line, it doesn’t necessarily mean the end of your regex. For example, you could have an expression that looks like ^Los Angeles$|^New York$|^Chicago$. (This would filter a report down to just the three largest cities in the US.)
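Here is a short Python sketch of both uses of the dollar sign (assuming the re module). The paths are a subset of the list above, and "New York Mills" and "West Chicago" are extra city names added only to show what the anchors keep out.

```python
import re

paths = [
    "/blog/",
    "/blog/excel-tips/formatting-dates-in-excel/",
    "/services/",
    "/services/analytics-audits/",
    "/about/",
]

# The $ stops the match at the end of the directory, so child pages drop out.
category_pages = [p for p in paths if re.search(r"^/(blog|services)/$", p)]
print(category_pages)  # ['/blog/', '/services/']

# Anchors on both sides of each alternative give you exact matches only.
cities = ["New York", "New York Mills", "Chicago", "West Chicago"]
exact = [c for c in cities if re.search(r"^Los Angeles$|^New York$|^Chicago$", c)]
print(exact)  # ['New York', 'Chicago']
```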

Utterly Ridiculous Mnemonic Device (That Works)

I came up with this when I first started learning hierogl– regex. But you have to promise not to laugh.

Promise?

Okay, I thought of how you lead someone with a carrot (I know it’s a different spelling; work with me) by putting it out in front and how at the end of the day it’s all about the money. So the ^ goes in front in a regex and the $ at the end.

how to remember regex characters
I’ll just see myself out.

Yeah, yeah, go ahead and laugh (promise breaker). But I guarantee you’ll remember next time.

Square Brackets + Caret ( [^ ])

What It Does

If you toss a caret into your square brackets (as the first character), it will exclude whatever else is in the square brackets. So b[^a]t will match bit, bet, bot, and but but not bat. As with the square brackets sans the caret, you don’t separate these characters in any way. Just shove them into the elevator together.

Follow Along

In the video (24:51 min mark), I use the list below to identify phone numbers:

325-678-3892
89-2784-09
578-487-89921
(202) 893-2749
98-36489032
813-234-9569

Caveat

As I wrote above, in the Square Brackets section, you need to be careful if you want to exclude the literal caret character. You’ll either need to escape it or make sure it doesn’t directly follow the left square bracket. The hyphen needs to go last (or be escaped) so it isn’t read as a range. So if you wanted to exclude the caret character along with the hyphen and asterisk in your regex, you could write [^\^*-] or [^*^-].
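Below is a minimal Python sketch of negated brackets (assuming the re module). The last two patterns are illustrations: one strips a phone number down to its digits, and one excludes a literal caret, asterisk, and hyphen, with the hyphen placed last so it is not read as a range.

```python
import re

words = ["bit", "bet", "bot", "but", "bat"]

# [^a] means any one character EXCEPT a.
not_bat = [w for w in words if re.fullmatch(r"b[^a]t", w)]
print(not_bat)  # ['bit', 'bet', 'bot', 'but']

# Replace everything that is not a digit with nothing.
digits = re.sub(r"[^\d]", "", "(202) 893-2749")
print(digits)  # 2028932749

# Exclude the literal ^, *, and - characters (the sample string is made up).
leftovers = re.findall(r"[^\^*-]", "a^b*c-d")
print(leftovers)  # ['a', 'b', 'c', 'd']
```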

Whitespace (\s)

What It Does

The whitespace metacharacter matches any one whitespace character. I use it most commonly to match an actual space, but it will also match the tab (\t), new line (\n), and carriage return (\r). (Also the vertical tab and form feed, but I’ve never had to use those as an analyst.)

Follow Along

In the video (29:26 min mark), I use the same phone number list above.

Caveat

If you only need to match a space between words, you can just drop a space into your regex. Watch out for those Boomers and their double spaces between sentences though. (Oh HEY, Boomers!)
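Pairing \s with a plus sign is one way to cope with those double spaces. This Python sketch (assuming the re module) uses made-up strings for illustration.

```python
import re

# Hypothetical text with a double space between sentences.
text = "First sentence.  Second sentence."

# \s+ matches one or more whitespace characters in a row,
# so each run gets collapsed to a single space.
single_spaced = re.sub(r"\s+", " ", text)
print(single_spaced)  # First sentence. Second sentence.

# \s treats tabs and new lines as whitespace too.
pieces = re.split(r"\s+", "325\t678\n3892")
print(pieces)  # ['325', '678', '3892']
```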

Testing Your Regex

The best part of Google Analytics is every report comes with a line-item filter. And it is sensitive to regex. Previously, you would need to select Matching RegExp for it to recognize it; now you can just enter your regex into the filter, and you’re good to go.

regex regular expressions for Google Analytics
Google Analytics filters are now sensitive to regex. No need to select “Matching RegExp” from the drop-down.

So if I’m writing regex to capture a group of pages to concatenate in a segment to analyze, I’ll go to the Top Content report and paste my regex into the filter. If all of my pages are present and accounted for, I’m golden. It’s a real time saver.

That said, if you’re brand new to regex and want to test your code, I highly recommend using a regex helper like regexr.com (what I used in my tutorial) or regex101.com.

More Practice

The rest of the video tutorial is an opportunity to practice your regex with more lists. I’ll drop them below:

Britney Spears Practice

34:21 min mark

Britney Spears
Brittany Speers
Britanni Spers
brittany spears
Britany Spears
Britani Speres
Brittny Spears
britanni speers
brtany spears
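If you want to check your own answer against this list, here is one possible pattern sketched in Python (assuming the re module). It is not the approach from the video, which uses .* within tight parameters; this version leans on ? and square brackets instead, and "Brian Sparks" is a made-up decoy.

```python
import re

names = [
    "Britney Spears", "Brittany Speers", "Britanni Spers",
    "brittany spears", "Britany Spears", "Britani Speres",
    "Brittny Spears", "britanni speers", "brtany spears",
]

# Optional letters (i?, t?, n?, e?) absorb the dropped and doubled characters;
# (ey|y|i) covers the three endings of the first name.
# re.IGNORECASE is how Python ignores upper- vs lowercase.
pattern = re.compile(r"bri?tt?[ae]?nn?(ey|y|i) spe?[ae]?re?s", re.IGNORECASE)

caught = [n for n in names if pattern.fullmatch(n)]
print(len(caught))  # 9

# A decoy that should not match.
print(bool(pattern.fullmatch("Brian Sparks")))  # False
```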

Identify URIs with Query Parameters (aka Wrecking Balls)

38:21 min mark

You’ll want to either drop a group of URLs with query parameters into regexr.com or open your All Pages report (Behavior > Site Content).

Filter for Site Search Terms with Three Words

41:23 min mark

You’ll want to either drop a group of multi-word terms into regexr.com or open your Behavior > Site Search > Search Terms. (Alternatively, you could pull these from any keyword tool, like Search Console, Ahrefs, etc.)
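For practice outside of Google Analytics, here is one way to catch three-word terms, sketched in Python (assuming the re module). The search terms are invented, and the pattern is an illustration built from \w, \s, and the two anchors rather than the exact expression from the video.

```python
import re

# Hypothetical site search terms.
terms = [
    "regex",
    "google analytics",
    "google analytics filters",
    "how to use regex",
]

# Word, space, word, space, word, with anchors so nothing else fits.
three_words = [t for t in terms if re.search(r"^\w+\s\w+\s\w+$", t)]
print(three_words)  # ['google analytics filters']
```

Because \w excludes hyphens and apostrophes, terms containing them would need a wider character class.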

Staging Subdomains

43:08 min mark

www.mydomain.com
staging.mydomain.com
blog.mydomain.com
production.mydomain.com
store.mydomain.com
login.mydomain.com

Extract Zip Codes

44:14 min mark

1367 Misty Ridge Ct Hampton, GA 30228-8456
6489 M 40 Lawton, MI 49065
3360 Woods Ln Callahan, FL 32011
378 Country Side Ln #UNT 2 Albany MN 56307
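Here is one way to pull the zip codes out of these addresses, sketched in Python (assuming the re module). The pattern is an illustration and assumes the zip code is always the last thing in the cell.

```python
import re

addresses = [
    "1367 Misty Ridge Ct Hampton, GA 30228-8456",
    "6489 M 40 Lawton, MI 49065",
    "3360 Woods Ln Callahan, FL 32011",
    "378 Country Side Ln #UNT 2 Albany MN 56307",
]

# Five digits, optionally followed by a hyphen and four more, at the end.
zip_at_end = re.compile(r"\d{5}(-\d{4})?$")

zips = [zip_at_end.search(a).group(0) for a in addresses]
print(zips)  # ['30228-8456', '49065', '32011', '56307']
```

The $ keeps street numbers out of the running, and the optional group picks up the ZIP+4 extension when it is there.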

Caveat About Regex In Excel

A common frustration I had for a long time was that I couldn’t use regex in Excel. I could in Word but not Excel. Go figure. You can use a plugin like the SeoTools plugin or do all your regex in Google Docs and bring it back into Excel or (my personal fave) use advanced filters in Excel. They actually give you more options than regex and are easier to master.

Written by Annie Cushing · Categorized: Programming
