andyMatthews.net

A beginner's guide to regular expressions

Over the past few months I've noticed quite a few developers with little to no knowledge of regular expressions (regex from here on out). For whatever reason they haven't taken the time, or had the chance, to learn what I consider to be one of the most powerful and useful tools available in programming. Even knowing a few basics can really streamline your workflow and improve your code. Not only are regex useful IN code, they can even help you write code. In this post I'm going to cover some regex basics, then show you some real examples of how they can solve problems for you.

The basics are always a great place to start, so let's look at some syntax. At its heart, a regular expression is simply a way to match (and replace) one string with another. So we can replace the literal string cat with mouse, or the number 8675309 with 42. That would be useful if we wanted to replace many occurrences of 8675309, but that's not all that common. Wouldn't it be nice if, in addition to replacing 8675309 with 42, we could also replace 5318008 with 42? Regular expressions let you do that with special characters called meta characters. Meta characters are what give regex their incredible power. Here's a list of some of the most common meta characters and how they're used.
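To make the difference between a literal match and a meta character match concrete, here's a small sketch. I'm assuming Python and its built-in re module purely for illustration (the syntax shown is common to most regex engines), and the sample sentence is made up.

```python
import re

# Made-up sample text containing both numbers from the paragraph above.
text = "Call 8675309 or 5318008 today"

# Literal match: only the exact string 8675309 is replaced.
literal = re.sub(r"8675309", "42", text)

# \d matches any single digit and + means "one or more of the previous item",
# so one pattern covers both numbers.
general = re.sub(r"\d+", "42", text)

# . matches any single character (except a newline by default).
dot_matches = re.findall(r"c.t", "cat cot cut ct")

# ? makes the previous item optional.
optional_matches = re.findall(r"colou?r", "color colour")

print(literal)           # Call 42 or 5318008 today
print(general)           # Call 42 or 42 today
print(dot_matches)       # ['cat', 'cot', 'cut']
print(optional_matches)  # ['color', 'colour']
```

Notice that the pattern with meta characters doesn't care which number it finds, only that it's a run of digits.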

You can probably start to tell that meta characters are powerful, but it's not that common to want to match just one specific character. Character sets allow developers to create groupings of characters to be used in a match. Character classes are predefined sets which offer the same functionality. To create a character set, simply wrap any number of characters in square brackets []. Let's take a look at character sets and character classes.
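Here's a rough sketch of sets and classes in action, again assuming Python's re module. One caveat: Python spells its predefined classes as backslash shorthands such as \d, \w, and \s, while some other engines use POSIX names like [[:digit:]] instead, so check what your own engine supports. The sample strings are invented.

```python
import re

# A character set matches any ONE of the characters inside the brackets.
rhymes = re.findall(r"[bcr]at", "bat cat rat mat hat")

# A leading ^ inside the brackets negates the set: "anything that is not a digit".
digits_only = re.sub(r"[^0-9]", "", "(615) 555-0100")

# A hyphen inside the brackets defines a range of characters.
lowercase_runs = re.findall(r"[a-z]+", "Hello World")

# Predefined classes: \w is a "word" character, \s is whitespace.
words = re.findall(r"\w+", "one two, three")
pieces = re.split(r"\s+", "a  b\tc")

print(rhymes)          # ['bat', 'cat', 'rat']
print(digits_only)     # 6155550100
print(lowercase_runs)  # ['ello', 'orld']
print(words)           # ['one', 'two', 'three']
print(pieces)          # ['a', 'b', 'c']
```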

It's nice to have the ?, +, and * meta characters available to us, but wouldn't it be nice if we could specify an exact number of characters to match? Character ranges (often called quantifiers or intervals elsewhere) allow you to do this by specifying optional start and end numeric values within curly braces { }.
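A quick sketch of the curly brace syntax, assuming Python's re module and a made-up string of digit runs. The \b on each side is a word boundary, which I've added so that a pattern can't match just part of a longer number.

```python
import re

text = "1 22 333 4444 55555"

# {3} means exactly three of the previous item.
exactly_three = re.findall(r"\b\d{3}\b", text)

# {2,4} means at least two and at most four.
two_to_four = re.findall(r"\b\d{2,4}\b", text)

# {3,} leaves off the end value: three or more.
three_or_more = re.findall(r"\b\d{3,}\b", text)

print(exactly_three)  # ['333']
print(two_to_four)    # ['22', '333', '4444']
print(three_or_more)  # ['333', '4444', '55555']
```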

Finally, regular expressions allow you to store one or more of your matches in temporary variables, called back references, which can be used at other points in your expression. Here's how they work.
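As a minimal sketch of back references, assuming Python's re module: parentheses capture part of the match, and \1, \2, and so on refer to those captures by position, both inside the pattern and in the replacement string. Other engines may write the replacement side as $1 instead. The sample strings are made up.

```python
import re

# (\w+) captures a word; \1 inside the pattern means "the same text again",
# so this finds doubled words and keeps a single copy.
deduped = re.sub(r"\b(\w+) \1\b", r"\1", "this is is a test test")

# Two captures can be reordered in the replacement string.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Matthews, Andy")

print(deduped)  # this is a test
print(swapped)  # Andy Matthews
```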

Whew...fingers are tired. Now that we have a reference for the important aspects of regex, let's look at how we'd use them in real-world examples.

I hope this will help you get started with regular expressions. Feel free to post your questions in the comments.