andyMatthews.net

Parsing search terms from CGI.http_referrer using regular expressions

Today someone on the House of Fusion mailing list asked how to parse the search terms out of an incoming referrer link. Several people suggested looping over the referrer, among other approaches. I suggested using a simple regex to extract the required string instead. Here's what I came up with; maybe it'll help you.

This searches for the literal string 'q=' immediately preceded by a ? or an &, then matches any text after the = up to the next & (or the end of the string).
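As a rough sketch of a pattern that fits that description (written in Python for illustration rather than ColdFusion; the pattern itself, the sample referrer, and the variable names are assumptions, not the original code):

```python
import re

# Hypothetical referrer URL, used only for illustration.
referrer = "http://www.google.com/search?hl=en&q=coldfusion+regex&btnG=Search"

# A literal "q=" preceded by ? or &, followed by everything up to the next &.
pattern = re.compile(r"[?&]q=[^&]*")

match = pattern.search(referrer)
first_match = match.group(0) if match else ""
print(first_match)  # &q=coldfusion+regex
```

Note that the match includes the leading separator and the "q=" itself, not just the search terms.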

That worked really well, but I didn't like that the q= is also returned in the match, since that meant I'd have to strip it from my final result. So I asked on Twitter and was referred to Ben Nadel's post on REMatchGroup, where I found out about look-behinds. That did the trick. So here's the new regex.

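This regular expression uses a feature called positive look-behind. It basically says "only match the target string if it's immediately preceded by another string", and the preceding string is left out of the match. (A negative look-behind is the opposite: it only matches when the target is not preceded by the other string.) Let's break it down.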
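Here is a minimal sketch of what a look-behind version could look like, again in Python for illustration. The pattern, the sample referrer, and the names are assumptions and not necessarily the regex from the original post; the comments walk through each piece.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical referrer URL, used only for illustration.
referrer = "http://www.google.com/search?hl=en&q=coldfusion+regex&btnG=Search"

# (?<= ... )  positive look-behind: the text inside must appear immediately
#             before the match, but it is not included in the match
# [?&]q=      a ? or an &, followed by the literal "q="
# [^&]*       the search terms: every character up to the next & (or the end)
lookbehind = re.compile(r"(?<=[?&]q=)[^&]*")

match = lookbehind.search(referrer)
terms = match.group(0) if match else ""
print(terms)                # coldfusion+regex
print(unquote_plus(terms))  # coldfusion regex
```

Because the look-behind only asserts that the "q=" prefix is there, the match contains just the search terms and nothing needs to be stripped afterwards. The final `unquote_plus` call is an optional extra that decodes `+` and percent-escapes back into readable text.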

Hope this helped you out. It was a great challenge, and I learned something new about regex.