Using Scalar Variables in Regular Expressions

in Perl Tips, Regular Expressions
by William Ward on August 4, 2004 3:20 pm

If you know how to use regular expressions you will know about special codes like ^, $, and \w which can be used to indicate position or certain classes of characters in the string. But in fact, anything that works in a double-quoted string can be used in a regular expression!

In a double-quoted string in Perl you can have backslash codes such as "\n" which represent special characters, or scalar or array variable names which cause the contents of those variables to be inserted into the string.

In a regular expression, you can do the same thing. And if the interpolated string contains regular expression metacharacters, they take effect just as if the contents of the string were written directly in the regular expression.

For example:

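  my $regex = '\.txt$';    # backslash makes the dot literal
  opendir(DIR, ".") or die "opendir: $!\n";
  while(defined($_ = readdir(DIR)))
  {
      # $regex is interpolated into the pattern before matching
      print "$_\n" if /$regex/;
  }
  closedir(DIR);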

Here, each filename is compared to the pattern /\.txt$/ as if it were written that way inside the while loop. Single-quoted strings are used when setting $regex to avoid needing an extra backslash.
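The same idea can be tried without depending on the contents of a directory. The following is only a sketch: the file names and the @names and @matches arrays are made up for illustration and stand in for the readdir() results above.

```perl
use strict;
use warnings;

# Single quotes keep the backslash and the dollar sign literal.
my $regex = '\.txt$';

# Hypothetical file names standing in for readdir() results.
my @names = ('notes.txt', 'txt.pl', 'readme.txt.bak', 'atxt');

# $regex is interpolated first, then the result is compiled as a pattern,
# so this behaves like grep { /\.txt$/ } @names.
my @matches = grep { /$regex/ } @names;

print "$_\n" for @matches;
```

Only names that end in a literal ".txt" are kept; a name such as "atxt" is rejected because the backslash makes the dot literal.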

Warning: watch out for the empty string! In Perl, the empty regular expression // does not mean "match anything"; instead it reuses the last successfully matched pattern. If $regex is the empty string instead of something like '\.txt$', then you will effectively be matching // in the while loop. So if another regular expression matched successfully earlier in the program, you’ll see that pattern compared to the filename instead of $regex! The lesson here is to make sure that $regex is not empty.
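One possible guard, shown here only as a sketch, is to wrap the interpolated variable in a non-capturing group so that the pattern Perl compiles is never the empty pattern. The helper name matches_pattern and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the (?:...) wrapper means the compiled pattern is
# "(?:)" rather than the empty pattern when $regex is '', so the
# "reuse the last successful pattern" rule does not apply.
sub matches_pattern {
    my ($string, $regex) = @_;
    return $string =~ /(?:$regex)/ ? 1 : 0;
}

# A successful match earlier in the program.
'shell' =~ /ell/ or die "unexpected: /ell/ should match 'shell'";

my $regex = '';    # e.g. a value that was never set
for my $name ('notes.txt', 'shell.pl') {
    # With the guard, an empty $regex simply matches every string.
    print "$name\n" if matches_pattern($name, $regex);
}
```

Whether an empty $regex should match everything or be rejected outright depends on the program; checking length($regex) before the loop and dying with a clear message is an equally reasonable choice.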

9 Comments »

  1. There is a more interesting case: using a constant in the regexp. For example, I have these constants:

    use constant NORMAL_REQ = 1;
    use constant ADD_REQ = 2;
    use constant FAILED_REQ = 3;

    And then I’d like to use them in one form of a SWITCH block:

    my $test = 2;
    SWITCH: for($test) {
    /^${NORMAL_REQ}$/x && do { do something for case 1 … ; last SWITCH;};
    /^${ADD_REQ}$/x && do { do something for case 2 … ; last SWITCH;};
    /^${FAILED_REQ}$/x && do { do something for case 3 … ; last SWITCH;}
    }

    Comment by Michael — January 30, 2006 @ 1:01 am

  2. Excellent point. However there are a couple of minor problems with your code:

    You used = instead of => in the “use constant” lines
    You need to add a backslash (\) before the constant names in order to make a scalar reference that can be dereferenced.

    Here’s the corrected code:

    use constant NORMAL_REQ => 1;
    use constant ADD_REQ => 2;
    use constant FAILED_REQ => 3;

    my $test = 2;
    SWITCH: for($test) {
    /^${\NORMAL_REQ}$/x && do { print "case 1\n"; last SWITCH;};
    /^${\ADD_REQ}$/x && do { print "case 2\n"; last SWITCH;};
    /^${\FAILED_REQ}$/x && do { print "case 3\n"; last SWITCH;}
    }

    Comment by William Ward — January 30, 2006 @ 2:11 am

  3. this should say “Single-quoted strings are used when setting $regex to avoid needing an extra backslash.” not “Double-quoted strings are used when setting $regex to avoid needing an extra backslash.”

    Comment by bla — August 27, 2006 @ 10:28 am

  4. Yes, of course you’re right. I wonder how I missed that! Thanks. I edited the post with a line through the mistake.

    Comment by William Ward — September 15, 2006 @ 9:59 pm

  5. I realise this is an old posting, but I came here via a Google search regarding scalar vars and regexps. Coming from a Tcl environment, I have been looking for a way to apply a regexp substitution to an array as a whole instead of looping through each item in Perl. In terms of performance Tcl blows it away in this context; I’m certain, though, that if I could apply the regexp to the entire array in one iteration the result would be the opposite. The question is, how is it possible?

    Because it won’t let you do

    @array =~ s/\r//g;

    versus

    foreach $item (@array) { $item =~ s/\r//g; }

    The latter takes substantially longer with 100,000 lines of text than its Tcl equivalent.

    Comment by Joe — February 27, 2008 @ 2:34 am

  6. I don’t have any background in Tcl, so I’m not spoiled the way you are :)

    I’m pretty sure Perl doesn’t have a way to do what you’re asking for - only various forms of loops (foreach, grep, or map). In certain situations you could join the array elements into a string and apply your regex on that, I suppose.

    You might want to ask your question on perlmonks.org.

    Comment by William Ward — February 27, 2008 @ 6:41 pm

  7. Joe,
    You can actually do this quite simply with the map function. However, I really doubt this will be any faster.

    my @array = (…);
    # return the modified element; s/// alone returns a count, not the string
    @array = map { s/\r//g; $_ } @array;

    You’ve probably figured this out by now, but it may be useful for other people stopping by.

    Comment by James — June 20, 2008 @ 8:49 am

  8. Hi, I have this portion of code:

    my $sepchar = "|"; # pipe symbol
    @row = split(/$sepchar/, $buffer);

    However, this doesn’t work, but

    @row = split(/\|/, $buffer)

    DOES work. How can I mask special chars in the variable to determine the sepchar dynamically?
    Thanks for any help.

    Comment by george — June 25, 2008 @ 12:48 pm

  9. The contents of your variable $sepchar contain a regular expression metacharacter, so that’s like saying split(/|/, $buffer) without the backslash. To add the backslash use the \Q escape code, like this:

    @row = split(/\Q$sepchar\E/, $buffer);

    You can think of \Q as similar to \U, which converts letters to upper case; what \Q does instead is insert a \ before every non-word character (anything other than letters, digits, and underscore). The \E indicates the end of the quoted portion; in this case it is optional since it’s at the end of the pattern.

    Comment by William Ward — June 25, 2008 @ 4:16 pm
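As a runnable sketch of the \Q approach from the last comment, the following splits a sample string on a separator held in a variable. The contents of $buffer are made up for illustration; quotemeta() is the built-in function form of \Q.

```perl
use strict;
use warnings;

my $sepchar = '|';                   # separator chosen at run time
my $buffer  = 'alpha|beta|gamma';    # hypothetical sample data

# \Q...\E escapes the metacharacter, so the pattern is /\|/.
my @row = split /\Q$sepchar\E/, $buffer;
print join(',', @row), "\n";

# quotemeta() does the same escaping ahead of time.
my $escaped = quotemeta $sepchar;    # backslash followed by a pipe
my @same    = split /$escaped/, $buffer;
print join(',', @same), "\n";
```

Both calls split the sample string into the same three fields.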
