Regex: Match everything between two things

by John H
5 minutes

Ohh, the adventures of regular expressions! I've been beating my head against the wall with this one. I have a long string filled with all kinds of characters and variables, and I need to grab everything between two things.

Example 1: I want to grab the email address in a url.

http://www.domain.com/index.html?p=test&email=john@domain.com

The regular expression to get the email address is easy in this one. You just need to use the lookbehind operator, so the expression is (?<=email=)(.+)  This uses the lookbehind operator ?<= and what I'm looking behind for is email= . The lookbehind is just a position check - it doesn't actually consume anything - so you still have to consume the characters that come after it. That is accomplished using (.+)  The . matches any character (except a newline) because it is a wildcard. The + repeats it one or more times, which carries the match to the end of the line. (.*) would work as well.
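As a quick check of Example 1, here is a minimal sketch in Python. The choice of Python and its built-in re module is my assumption (the post doesn't name a language), and the variable names are only for illustration.

```python
import re

url = "http://www.domain.com/index.html?p=test&email=john@domain.com"

# (?<=email=) asserts that "email=" sits immediately before the match
# without consuming it; (.+) then captures everything up to the end of the line.
match = re.search(r"(?<=email=)(.+)", url)
email = match.group(1) if match else None

print(email)  # john@domain.com
```

Because the lookbehind consumes nothing, the whole match and capture group 1 are the same text here.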

Example 2:  I want to grab the email in the address but I have another variable.

http://www.domain.com/index.html?p=test&email=john@domain.com&name=john

In this situation (?<=email=)(.+) won't work because it will return everything until the end of the line, which is john@domain.com&name=john  What I want to do now is stop my "greedy" operator at the ampersand &.  To do this I need to use the lookahead operator (?=).  So my lookahead expression looks like this (?=&).  Altogether the expression looks like this (?<=email=)(.+)(?=&)   The expression looks behind for email= and grabs characters until the lookahead finds the & .  Just what I wanted.  Unfortunately I can't use this expression for the first example, because there the email address is followed by the end of the line instead of a trailing & . This expression works only if there is an & after the email address.
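To see both behaviors of the Example 2 expression, here is a small Python sketch. It assumes the greedy form (?<=email=)(.+)(?=&) and uses the re module; the variable names are hypothetical.

```python
import re

# Lookbehind for "email=", greedy capture, lookahead for "&".
pattern = re.compile(r"(?<=email=)(.+)(?=&)")

with_trailing = "http://www.domain.com/index.html?p=test&email=john@domain.com&name=john"
no_trailing = "http://www.domain.com/index.html?p=test&email=john@domain.com"

m1 = pattern.search(with_trailing)
m2 = pattern.search(no_trailing)

# An "&" follows the email, so the lookahead succeeds.
result_with = m1.group(1) if m1 else None
# Nothing follows the email, so the lookahead never succeeds and there is no match.
result_without = m2.group(1) if m2 else None

print(result_with)     # john@domain.com
print(result_without)  # None
```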

 

Example 3: I want to grab the email address but have lots of variables afterward

http://www.domain.com/index.html?p=test&email=john@domain.com&name=john&zip=90800

Our last expression is greedy, and since it is so greedy it will keep eating characters until it hits the last & , which is too far.  So how do we stop the expression at the first & instead of the last one?  We do that by using a "lazy" modifier, which is ?.  So now our expression looks like (?<=email=)(.*?)(?=&)
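The difference between greedy and lazy is easiest to see side by side. This is a sketch in Python's re module (my choice of language, not something the post specifies), with illustrative variable names.

```python
import re

url = "http://www.domain.com/index.html?p=test&email=john@domain.com&name=john&zip=90800"

# Greedy: (.+) runs to the end of the line, then backs up to the LAST "&".
greedy = re.search(r"(?<=email=)(.+)(?=&)", url).group(1)

# Lazy: (.*?) grows one character at a time and stops at the FIRST "&".
lazy = re.search(r"(?<=email=)(.*?)(?=&)", url).group(1)

print(greedy)  # john@domain.com&name=john
print(lazy)    # john@domain.com
```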

Example 4: I want to grab the email address but I don't know where the email variable is

http://domain.com/index.html?email=john@domain.com or http://domain.com/index.html?email=john@domain.com&name=john or http://domain.com/index.html?email=john@domain.com&name=john&zip=343324

In this last situation I want to grab both cases: email followed by more variables, or email at the end of the line.  To accomplish this you will need two expressions joined with the OR operator | .  The expression looks like this (?<=email=)(.*?)(?=&)|(?<=email=)(.+)  The lazy branch is tried first, so the second branch only applies when no & follows the email address.

This will return the email address if it is at the end of the line or in the middle.
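Here is a sketch of the combined expression run against all three URLs from Example 4, again assuming Python's re module. One detail to watch: each branch has its own capture group, so which group is filled depends on which branch matched. Reading the whole match with group(0) sidesteps that.

```python
import re

# Branch 1: lazy capture up to the first "&". Branch 2: capture to end of line.
pattern = re.compile(r"(?<=email=)(.*?)(?=&)|(?<=email=)(.+)")

urls = [
    "http://domain.com/index.html?email=john@domain.com",
    "http://domain.com/index.html?email=john@domain.com&name=john",
    "http://domain.com/index.html?email=john@domain.com&name=john&zip=343324",
]

emails = []
for url in urls:
    m = pattern.search(url)
    # group(0) is the whole match, whichever branch produced it.
    emails.append(m.group(0) if m else None)

print(emails)  # ['john@domain.com', 'john@domain.com', 'john@domain.com']
```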

 

 
