Ignorance is Bliss - non-memorizing parentheses

in Perl Tips, Regular Expressions
by William Ward on April 20, 2006 2:07 pm

One of regular expressions’ most useful features is memorization. To do this, just put parentheses around part of your expression and the result will be memorized:

my ($name) = /hello, (\w+)/;

In this example, we look in $_ for the word “hello” followed by a comma, space, and a word. Since the word, \w+, has parentheses around it, the part of the string that it matches gets memorized. In this example, we are assigning the return value of the regular expression match to $name. So if $_ contains “hello, world” then $name gets “world” - very convenient.
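As a minimal sketch of the capture described above, the snippet below wraps the same pattern in a small helper. The name `greeted_name` and the sample strings are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured word, or undef when nothing matches.
sub greeted_name {
    local $_ = shift;                 # match against $_, as in the example above
    my ($name) = /hello, (\w+)/;      # list context: the match returns its captures
    return $name;
}

for my $text ("hello, world", "goodbye, world") {
    my $name = greeted_name($text);
    if (defined $name) {
        print "matched: $name\n";
    } else {
        print "no match in '$text'\n";
    }
}
```

Note the parentheses around `$name` on the left-hand side: they put the match in list context, which is what makes it return the captured text rather than a true/false result.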

But parentheses do more than memorize their contents (they also group parts of a pattern), and when you only want the grouping, the memorization can become annoying. Here’s an example. (more…)
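The full example is not included in this excerpt. As a hedged sketch of the general idea, the snippet below compares ordinary parentheses with Perl's non-capturing form `(?:...)`. The alternation `hello|goodbye` and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $text = "goodbye, world";

# Plain parentheses group the alternation but also memorize it,
# so the greeting arrives first and the name second.
my ($greeting, $name_with_capture) = $text =~ /(hello|goodbye), (\w+)/;

# (?:...) groups without memorizing, so the name is the only capture.
my ($name_only) = $text =~ /(?:hello|goodbye), (\w+)/;

print "capturing:     greeting=$greeting name=$name_with_capture\n";
print "non-capturing: name=$name_only\n";
```

With the non-capturing form, adding or removing a grouping elsewhere in the pattern does not shift which variable receives which value.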

Using Scalar Variables in Regular Expressions

in Perl Tips, Regular Expressions
by William Ward on August 4, 2004 3:20 pm

If you know how to use regular expressions, you will know about special codes like ^, $, and \w, which indicate a position or a class of characters in the string. But in fact, anything that works in a double-quoted string can be used in a regular expression!
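A minimal sketch of that idea: a scalar variable is interpolated into the pattern just as it would be in a double-quoted string. The variable names and sample strings are assumptions, and the `\Q...\E` quoting shown at the end is an addition not mentioned in this excerpt.

```perl
use strict;
use warnings;

my $word = "world";
my $line = "hello, world";

# $word is interpolated into the pattern before matching.
my $found = ($line =~ /hello, $word/) ? 1 : 0;

# Interpolated text is still treated as a pattern, so the "." in
# $version matches any character unless it is quoted with \Q...\E.
my $version = "1.2";
my $loose   = ("1x2" =~ /^$version\z/)     ? 1 : 0;
my $literal = ("1x2" =~ /^\Q$version\E\z/) ? 1 : 0;

print "found=$found loose=$loose literal=$literal\n";
```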

(more…)

Regular Expressions to Parse Data Files

in Perl Tips, Regular Expressions
by William Ward on March 25, 2004 3:52 pm

Regular expressions are the best way to parse text in Perl. Combined with the hash data structure, they make it easy to build an in-memory structure from data read in from a file.
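The post's own data format is not shown in this excerpt, so the sketch below assumes a simple `key = value` layout with `#` comments. The sample lines and the `%config` name are hypothetical, and a real script would read the lines from a file handle instead of an array.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of a file.
my @lines = (
    "# settings",
    "host = example",
    "port=8080",
    "",
    "title = regular expressions",
);

my %config;
for my $line (@lines) {
    next if $line =~ /^\s*#/;      # skip comment lines
    next if $line =~ /^\s*$/;      # skip blank lines

    # Capture the key and the value, trimming surrounding whitespace.
    if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
        $config{$1} = $2;
    }
}

print "$_ => $config{$_}\n" for sort keys %config;
```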

(more…)

Validating Data Types

in Perl Tips, Regular Expressions
by William Ward on July 17, 2003 1:00 am

Perl is not a strongly typed language. A scalar can hold a string, a number, or a reference. But sometimes you need to know what it contains, for example, if you are communicating with a strongly typed system like a relational database, or if you just want to make sure the user entered a number for their age rather than "old enough" or something of that kind.
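One way to make such a check is with an anchored pattern. The helpers below are a sketch under stated assumptions, not the post's own code: the names `is_integer` and `is_decimal` are hypothetical, and the patterns accept only plain ASCII digits with an optional sign.

```perl
use strict;
use warnings;

# Hypothetical validators: each returns 1 or 0.
sub is_integer {
    my ($value) = @_;
    return (defined $value && $value =~ /^[+-]?[0-9]+\z/) ? 1 : 0;
}

sub is_decimal {
    my ($value) = @_;
    return (defined $value
        && $value =~ /^[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/) ? 1 : 0;
}

for my $input ("42", "-7", "3.14", "old enough", "") {
    printf "%-12s integer=%d decimal=%d\n",
        "'$input'", is_integer($input), is_decimal($input);
}
```

The `^` and `\z` anchors matter here: without them, a value such as "42 or so" would pass because part of it looks like a number.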

(more…)

Non-Greedy Regular Expressions

in Perl Tips, Regular Expressions
by William Ward on February 12, 2003 6:31 pm

Regular expressions in Perl are "greedy." That means that if you use a * or + operator in a regular expression, it grabs as much of the string as it can. This can be frustrating at times, but it’s useful in other respects. Consider this:

  my $ip_addr = "192.168.1.2";
  my ($network, $host) = ($ip_addr =~ /(.+)\.(.+)/);
  print "network=$network host=$host\n";

You need a way to know for sure whether $network gets "192.168.1" and $host gets "2" or whether the split is "192" vs. "168.1.2". The decision was made to have it be "greedy," which means that the first + grabs the lion’s share, and the second one gets the leftovers. Put another way, the first one gets as much as possible short of making it impossible to match the string.
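The two possible splits can be compared directly. The sketch below assumes the dot between the groups is meant as a literal `\.`, and uses Perl's non-greedy `+?` quantifier for the second case; the variable names `$first` and `$rest` are chosen here for illustration.

```perl
use strict;
use warnings;

my $ip_addr = "192.168.1.2";

# Greedy: the first .+ takes as much as it can while still
# leaving a dot and at least one character for the second group.
my ($network, $host) = $ip_addr =~ /(.+)\.(.+)/;

# Non-greedy: .+? takes as little as it can, so the split
# happens at the first dot instead of the last.
my ($first, $rest) = $ip_addr =~ /(.+?)\.(.+)/;

print "greedy:     network=$network host=$host\n";
print "non-greedy: first=$first rest=$rest\n";
```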

(more…)