The always enjoyable Jeff Atwood wrote an excellent discussion about regular expressions that everyone should read. Having seen many cases where a developer wrote a full-blown parser when a regular expression would have done the job, I think it’s critical that every developer spend a little time learning regular expressions. It goes without saying that the more code you write, the more bugs you have. Regular expressions aren’t too hard, but like anything else we developers deal with, you do need to take the time to learn the basics so you can scale that learning curve. I wanted to chime in with a few additional thoughts on regular expressions that have helped me, and to mention something extremely important I thought Jeff left out.
While there are references and documentation on the web about learning regular expressions, absolutely nothing tops Jeffrey E. F. Friedl’s book Mastering Regular Expressions. This is a topic you need to spend some time thinking about to ensure you really have that “aha!” moment of understanding. While it’s sad in today’s Internet world no one seems to read books anymore, if you don’t read Jeffrey’s book, you’ll be doing the equivalent of sky diving without a parachute. Playing with regular expressions will be fun, but eventually you will be in extreme pain.
Like learning any technology, it helps to see examples. Jeffrey’s book is full of examples, but to see how others have solved many problems using regular expressions, I’ve always liked RegExLib.com. It’s a great list of all sorts of regular expressions, many of which you can use directly. More importantly, you can see all sorts of tips and tricks. My one complaint with RegExLib.com is that they don’t follow Jeff Atwood’s advice and apply good white spacing and comments in the actual regular expressions.
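To make the white spacing and comments advice concrete, here is a minimal sketch in Python (my choice of language for illustration). The pattern and the names URL_PATTERN and split_url are made up for this example; they are not taken from RegExLib.com or from Jeff's post, and the pattern only handles simple URLs. Python's re.VERBOSE flag tells the engine to ignore whitespace in the pattern and to treat # as the start of a comment.

```python
import re

# Hypothetical pattern for illustration only: splits a simple URL into parts.
# re.VERBOSE lets the pattern span lines and carry its own comments.
URL_PATTERN = re.compile(
    r"""
    ^(?P<scheme>[a-z][a-z0-9+.-]*)   # scheme, such as http or https
    ://                              # separator between scheme and host
    (?P<host>[^/:?\#]+)              # host name, up to the first : / ? or #
    (?::(?P<port>\d+))?              # optional port number
    (?P<path>/[^?\#]*)?              # optional path, up to ? or #
    """,
    re.VERBOSE | re.IGNORECASE,
)


def split_url(url):
    """Return a dict of named parts, or None if the URL does not match."""
    match = URL_PATTERN.match(url)
    return match.groupdict() if match else None
```

Calling split_url("https://training.atmosera.com/path/page") returns a dictionary with the scheme, host, port, and path, and each piece of the pattern explains itself to whoever has to maintain it later.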
When I got to the end of Jeff Atwood’s blog post, I realized he forgot one of the most important pieces of advice about regular expressions. While you can use a web site to develop and test a regular expression, I highly recommend a standalone tool that runs on your machine. The key reason is that those tools allow you to save the regular expression to a file so you can check that file into version control. I can’t stress enough that if you’re using regular expressions, the file you used to build a regular expression absolutely has to be part of the source code. With the regular expression tool file in version control, future maintenance developers, which will include you, can easily tweak and fix any problems in the expression in a controlled environment. The standalone tools also let you include the test data in the saved file, which makes it easy for others to see what you were testing with.
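The same principle can be approximated in plain code: keep the expression and its test subjects together in one file that lives in version control. The sketch below is my own illustration in Python, not the file format of any tool mentioned here. ZIP_PATTERN, TEST_SUBJECTS, and failing_subjects are hypothetical names, and the ZIP code pattern is just a stand-in for whatever expression you are maintaining.

```python
import re

# Hypothetical example: a US ZIP code pattern kept next to its test data.
ZIP_PATTERN = re.compile(r"^\d{5}(?:-\d{4})?$")

# Each entry pairs a test subject with whether the pattern should match it.
TEST_SUBJECTS = [
    ("03031", True),
    ("03031-1234", True),
    ("0303", False),
    ("03031-12", False),
    ("abcde", False),
]


def failing_subjects(pattern, subjects):
    """Return the test subjects whose match result differs from the expectation."""
    return [
        text
        for text, expected in subjects
        if bool(pattern.search(text)) != expected
    ]
```

When someone tweaks the expression later, running failing_subjects(ZIP_PATTERN, TEST_SUBJECTS) shows right away which of the original test subjects no longer behave as intended, and the diff in version control shows both the pattern change and any new test data.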
There are numerous tools available, but the two I find best are the free Expresso and the commercial RegexBuddy (which Jeff highly recommends as well). If you can pay the $40.00 USD for RegexBuddy, you’ll have the best tool on the market. I especially like the Debug option, which makes it easy to see exactly what is matching, and when, in an expression. In the screenshot below, I’ve executed one of the example regular expressions to look for the different parts of a URL. In the Debug tab, I’ve selected training.atmosera.com (highlighted in yellow) and the regular expression window is showing me exactly what matched in the upper window (highlighted in blue).
One thing I found a little confusing about RegexBuddy when I started using it was creating a file that contains just the regular expression and the test data, as it does not use a document metaphor like nearly every other application in the world. What you’ll do is click on the Library tab, and click the far left button that looks like a blank document. This creates a blank library file. Next, highlight the expression you want to add in the History window and click the Add button, making sure to select “Add Regex with Test Subject” from the popup menu. The following screenshot shows you what to click. One thing I especially like about RegexBuddy’s files is that they are straight XML, so once a file is in your version control of choice, you can compare and diff it all you want.
RegexBuddy is very good, but I still test my regular expressions with Expresso as well. That’s because Expresso has a timer in it, so you can get an idea of how long an expression takes to run. This is especially nice if you’re working with greedy expressions with lots of possible matches.
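If you want a rough feel for timing outside of a GUI tool, here is a minimal sketch using Python's standard timeit module. This is my own illustration and not how Expresso measures anything. SUBJECT, GREEDY, NEGATED, and time_pattern are hypothetical names, and the actual numbers will vary by machine and input, so treat them only as a relative comparison.

```python
import re
import timeit

# Hypothetical test subject: 200 small tagged fragments on one line.
SUBJECT = "<b>bold</b> " * 200

# Greedy dot-star grabs as much as it can, then backtracks to the last '>'.
# Note that it matches the whole run of tags as a single match.
GREEDY = re.compile(r"<.*>")

# A negated character class stops at the first '>' and matches each tag.
NEGATED = re.compile(r"<[^>]*>")


def time_pattern(pattern, subject, runs=200):
    """Return the total seconds taken to run findall `runs` times."""
    return timeit.timeit(lambda: pattern.findall(subject), number=runs)


if __name__ == "__main__":
    print("greedy :", time_pattern(GREEDY, SUBJECT))
    print("negated:", time_pattern(NEGATED, SUBJECT))
```

The two patterns do not return the same matches, which is part of the lesson: a timer tells you how long an expression takes, but you still need the test data to confirm it matches what you intended.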
Don’t worry, regular expressions may look hard, but with a little effort, they really are not that bad. Consider them one of the most important tools in your toolbox!