How To Pull Domains Out Of A List Of URLs In Excel [VIDEO]

extracting domains from URLs in Excel

One thing I like to do when doing a competitive analysis for a client is rifle through their backlinks vis-a-vis their competitors’ and reverse engineer their competitors’ marketing strategies. But sometimes that means extracting text from another column of data to organize your data for maximum pivoting. Since this is a common task I do a lot, I’m going to show you how to pull domains out of a list of URLs using a combo meal of the LEFT and SEARCH functions.

First Batter Up: LEFT Function

The LEFT function follows the following structure:

LEFT( text, [number_of_characters] )

text: For this you can either put text in quotation marks inside the formula or point to the cell where the text is. Most people choose the latter. It’s much more scalable.

number_of_characters: So anytime you see something in brackets like this, it means that parameter is optional. In the case of the LEFT function, I think it’s kind of silly to make the number of characters optional. If you leave it out, Excel will just grab one character. Can’t really think of many examples that would be useful, but knock yourself out if you find one. If you have a column of data and the text has the same number of characters in each cell, you’re in luck. You can just enter the number of characters.

Most of us aren’t that lucky though, especially Internet marketers who have to do messy tasks like extract the domain from a group of links from different domains. In such occasions the SEARCH or FIND functions are really helpful.

SEARCH vs FIND Functions in Excel

I like SEARCH over find because it’s not case sensitive (and FIND is). Plus, SEARCH allows you to use wildcards and FIND doesn’t. One of my fave trainers, Mike Girvin, did a YouTube video demonstrating the differences between the two. I can only think of one time I needed to use FIND (because I needed it to be case sensitive). But in this case either will do.

Next Up: SEARCH Function

So what the SEARCH and FIND functions do is return the position of a character you search for. This works swimmingly with the LEFT function because you can find a character’s position and return that for the number of characters you want to extract. If you need to stop at the character before the one you can isolate using the SEARCH function, you can subtract 1 (or however many you need) from the value returned by the SEARCH function. Or you might need to include the character(s) just after it; in that case you would just add 1 to the value.

The SEARCH function follows the following structure:

SEARCH( substring, string, [start_position] )

substring: Pure, unbridled geek speak that means whatever you’re searching for (we’re going to use “/”)

string: Typically the cell this text string is in, though you could enter text as long as you flank it with quotation marks

start_position: This is optional because oftentimes you’ll start looking for your character in the beginning of the string. In our case, the two forward slashes at the beginning of each URL (e.g., http:// and https://) prove to be problematic. So what we’ll need to do is tell Excel to start counting at the 9th character (because http:// has 7 characters and https:// has 8).

Final Formula

When I’m finished the formula looks like this:

=LEFT(B3,SEARCH("/",B3,9))

Note: If you’re doing this in a formatted table, Excel will used structured references instead. That’s what I always do. One other little benefit is you don’t have to double-click the top cell to send the formula down. It gets applied to the column automatically.

I just didn’t use a table for this tutorial because the structured references can make the formulas look unnecessarily complicated when you’re just learning.

Watch It In Action

Still a little squeamish? Download the workbook and check out the video.

Stupid Feedback Noise

In quiet spots in the video I get feedback. The only solution I’ve had any success with is to unplug the cord from my PC, but I forgot to do that. Sorry.

How To Use This Data

In my next post I’ll show you how you can then take this data, pop it into a pivot table, and analyze it to steal glean from all of your competitors’ best ideas.

  • https://plus.google.com/100938619255397146926?rel=author Brian Jensen

    Great stuff Annie, the video was a really nice touch. I’m looking forward to the next post in the series and appreciate the heads-up on Mike Girvin!

    • http://www.annielytics.com/ Annie Cushing

      Thanks, Brian! Yeah, Mike is awesome. Love his work, and he’s a super nice guy to boot. I’m editing the video for the next post right now. :)

  • cayecaye

    Hey Annie, as always excel formula rules! :) I have a question though our team does the same url cleaning daily, however we do it using the TEXT to COLUMNS feature using / as the delimiting factor. After we delimit we concatenate the main URL.. does this process about sums up the formula above?

    • http://www.annielytics.com/ Annie Cushing

      Text to columns also works. It’s just more of a hatchet job approach. But I used it before I learned how to extract text.

  • Peter Traychev

    Great stuff, thanks :) I’ve been using =MID(A2,8,SEARCH(“/”,A2,9)-8) but this one is simpler. If there is a way to strip “http://” and “http://www.” part in a single formula that would be great.

  • Modesto Siotos

    Hi Annie,
    Thanks for the tips. One of the example URLs in the spreadsheet is
    http://useconomy.about.com/od/suppl1/tp/BP_Oil_Spill.htm but the returned domain appears as http://useconomy.about.com/ which is a subdomain rather than http://about.com/ which is the actual domain.

    Do you know what’s the quickest way to return the true domain rather than the subdomain? Thanks!

    • http://www.annielytics.com/ Annie Cushing

      Well, technically www is a subdomain too. But since I always do my research at the subdomain level, I keep them.

    • Rob James

      If you want to strip any URL down to the basic domain you end up with a lot of complicated formulae, for example once you’ve remove http://www. the following formula will remove the first subdomain.
      =IFERROR(IF(LEN(SUBSTITUTE(LEFT(C2,LEN(C2)-7),”.”,”"))= LEN(LEFT(C2,LEN(C2)-7)),C2,RIGHT(C2,LEN(C2)-SEARCH(“.”,C2,1))),”")