Easy data validation in Perl with Regexp::Common

Take some of the trickiness out of building regular expressions in Perl with the Regexp::Common module.

Building regular expressions in Perl can be a little bit tricky, particularly for the newcomer. It's a powerful technique, but even experienced Perl developers can sometimes find themselves checking the documentation to make sure they've got it right.

Another frequent issue is that we write the same everyday expressions over and over; it seems like we're forever reinventing the wheel! For this problem, at least, there is a useful answer.

I asked a fellow developer a while back for a list of the most useful modules, and this one really stood out, as I've had the problem that it aims to solve. Damian Conway's module Regexp::Common sets up a framework for having repeatable, useful regular expressions. Helpfully, it comes with a raft of patterns already defined, and it provides tools for rolling your own patterns for anything your application might need. Let's take a quick look.

Usage

Once you've loaded the module with use Regexp::Common, you can substitute the included patterns right where you would put your own expression, like so:

if ( $input =~ /$RE{num}{int}/ ){
    print 'yes, it is an integer!';
}
elsif ( $input =~ /$RE{quoted}/ ){
    print 'it is a quoted string!';
}
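One thing to keep in mind for validation: the patterns are not anchored, so the tests above succeed whenever the input merely contains a match. Here is a minimal sketch of a stricter check, assuming you want the whole string to be an integer. The helper name is_integer() is made up for this example and is not part of the module.

```perl
use strict;
use warnings;
use Regexp::Common;

# Unanchored: succeeds because the string contains digits somewhere.
my $loose  = 'abc123' =~ /$RE{num}{int}/     ? 1 : 0;

# Anchored with \A and \z: the entire string must be an integer.
my $strict = 'abc123' =~ /\A$RE{num}{int}\z/ ? 1 : 0;

# is_integer() is a hypothetical helper, not something the module provides.
sub is_integer {
    my ($value) = @_;
    return $value =~ /\A$RE{num}{int}\z/ ? 1 : 0;
}

for my $candidate ('42', '-7', 'abc123', '3.14') {
    printf "%-8s %s\n", $candidate,
        is_integer($candidate) ? 'integer' : 'not an integer';
}
```

I used \A and \z here rather than ^ and $ because $ also tolerates a trailing newline, which may or may not be what you want.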

If you prefer, you can use the subroutine-based interface instead, which you enable by importing the subroutines with use Regexp::Common 'RE_ALL'. The same logic from above would look like this:

if ( $input =~ RE_num_int() ){
    print 'yes, it is an integer!';
}
elsif ( $input =~ RE_quoted() ){
    print 'it is a quoted string!';
}
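As a complete script, the subroutine interface might look like the sketch below. The 'RE_ALL' import comes from the module's documentation; the classify() helper and the anchoring with \A and \z are my own additions for illustration.

```perl
use strict;
use warnings;
use Regexp::Common 'RE_ALL';    # import every RE_* subroutine

# classify() is a hypothetical helper written for this sketch.
sub classify {
    my ($input) = @_;

    # Fetch each pattern once, then anchor it to the whole input.
    my $int    = RE_num_int();
    my $quoted = RE_quoted();

    return 'integer'       if $input =~ /\A$int\z/;
    return 'quoted string' if $input =~ /\A$quoted\z/;
    return 'something else';
}

print classify($_), "\n" for '42', '"hi there"', 'hello';
```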

Some of the built-in expressions have parameter settings to let you configure their behaviors, like searching for delimiters, formats of strings, and many other things. To use them, just include them in the call:

# Check for balanced parentheses
if ( $input =~ /$RE{balanced}{-parens=>'()'}/ )  {...}
# or using the subroutine interface:
if ( $input =~ RE_balanced(-parens=>'()') ) {...}
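To give a feel for other flags, here is a small sketch using the -sep and -group flags of the integer pattern and the -delim flag of the delimited pattern, all of which are described in the module's documentation. The describe() helper and the sample values are assumptions made up for this example.

```perl
use strict;
use warnings;
use Regexp::Common;

# Integers with a comma between each group of three digits.
my $grouped_int = $RE{num}{int}{-sep => ','}{-group => 3};

# Strings delimited by a character of your choosing, here a slash.
my $slashed = $RE{delimited}{-delim => '/'};

# describe() is a hypothetical helper for this sketch.
sub describe {
    my ($value) = @_;
    return 'grouped integer' if $value =~ /\A$grouped_int\z/;
    return 'slash-delimited' if $value =~ /\A$slashed\z/;
    return 'no match';
}

print "$_: ", describe($_), "\n" for '1,234,567', '12,34', '/etc/';
```

Storing the pattern in a variable first keeps the flag syntax out of the regular expression itself, which I find easier to read once more than one flag is involved.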

One of the really nice patterns I spotted is the one that removes leading and/or trailing whitespace. In 15 or so years of writing Perl, I've seen a whole lot of messy ways to do this, but this, to me, is beautifully clean and elegant:

$input =~ s/$RE{ws}{crop}//g;
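Here is that substitution in a tiny runnable sketch, with a made-up input string. Only whitespace at the very start and end is removed; whitespace between words is left alone.

```perl
use strict;
use warnings;
use Regexp::Common;

my $input = "   padded value \t\n";

# Copy first, then crop the copy, so the original stays untouched.
(my $trimmed = $input) =~ s/$RE{ws}{crop}//g;

print "[$trimmed]\n";    # prints [padded value]
```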

Numerous patterns already ship with Regexp::Common, covering many sorts of URLs, common string formatting issues, credit card numbers, numbers, whitespace, zip codes, U.S. social security numbers, palindromes, and even profanity! I looked at that last one's source code, and I'm stumped; Damian's regular expression-fu is much stronger than mine, and this isn't just a simple list-matching tool. You can see the full list of included modules on the Regexp::Common release page on MetaCPAN.
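As a quick taste of the bundled patterns, the sketch below checks a URL, a U.S. zip code, and an IPv4 address. The sample values and the validate() helper are invented for this example, and the -scheme flag is there because, per the documentation, the HTTP pattern only accepts the http scheme by default.

```perl
use strict;
use warnings;
use Regexp::Common;

# A few of the bundled patterns, stored under labels of my own choosing.
my %pattern_for = (
    url  => $RE{URI}{HTTP}{-scheme => 'https?'},
    zip  => $RE{zip}{US},
    ipv4 => $RE{net}{IPv4},
);

# validate() is a hypothetical helper: true if $value matches entirely.
sub validate {
    my ($kind, $value) = @_;
    my $re = $pattern_for{$kind};
    return $value =~ /\A$re\z/ ? 1 : 0;
}

print validate(url  => 'https://example.com/docs') ? "ok\n" : "bad\n";
print validate(zip  => '90210')                    ? "ok\n" : "bad\n";
print validate(ipv4 => '192.168.0.1')              ? "ok\n" : "bad\n";
```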

Creating your own

If you'd like to create your own elements in the %RE hash, import the pattern subroutine in your use statement. Here's an example adapted from the documentation:

use Regexp::Common 'pattern';

pattern name   => ['name', 'mine'],
        create => '(?i:Ruthie)',
        #the 'i' makes it case-insensitive!
        ;

my $input = 'Ruthie, I really need you to finish this article!';
if ($input =~ /$RE{name}{mine}/) {
    print "You got mentioned!\n";
}
$input = 'I can even, ruthie, include it mid-sentence.';
if ($input =~ /$RE{name}{mine}/) {
    print "You got mentioned en passant!\n";
}
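Your own patterns can also take part in the module's -keep mechanism, which turns on capturing. In the sketch below, the order ID format and the names myapp and order_id are entirely hypothetical; the (?k:...) notation, which captures only when -keep is given, is taken from the Regexp::Common documentation.

```perl
use strict;
use warnings;
use Regexp::Common 'pattern';

# Hypothetical pattern: order IDs of the form ORD-123456.
# (?k:...) groups capture only when the -keep flag is used.
pattern name   => ['myapp', 'order_id'],
        create => '(?k:ORD-(?k:[0-9]{6}))',
        ;

my $line = 'Shipped ORD-004217 this morning';
my ($order_id, $digits);

if ($line =~ /$RE{myapp}{order_id}{-keep}/) {
    # $1 is the whole ID, $2 is just the six digits.
    ($order_id, $digits) = ($1, $2);
    print "Order $order_id has number $digits\n";
}
```

Without -keep, the same pattern matches but captures nothing, so it can be dropped into a larger expression without disturbing that expression's own capture groups.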

If your application uses regular expressions for data validation, be sure to give Regexp::Common a look, and see if you can save yourself some time and suffering. By adding new patterns as needed to Regexp::Common's array of tools, you can have consistent validation throughout a large application. If you write something useful, why not submit it to the maintainers to add? You can find contact information in the Regexp::Common documentation.