<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Perl Tips Blog from Bay View Training</title>
	<atom:link href="http://www.bayview.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bayview.com/blog</link>
	<description></description>
	<lastBuildDate>Fri, 19 Dec 2008 16:33:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Perl in xkcd again</title>
		<link>http://www.bayview.com/blog/2008/12/19/perl-in-xkcd-again/</link>
		<comments>http://www.bayview.com/blog/2008/12/19/perl-in-xkcd-again/#comments</comments>
		<pubDate>Fri, 19 Dec 2008 16:30:36 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/?p=40</guid>
		<description><![CDATA[The web comic xkcd once again points out the importance of Perl:

]]></description>
			<content:encoded><![CDATA[<p>The web comic <a href="http://xkcd.com/">xkcd</a> once again points out the importance of Perl:</p>
<p><a href="http://xkcd.com/519/"><img src="http://imgs.xkcd.com/comics/11th_grade.png" width="356" height="222" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2008/12/19/perl-in-xkcd-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A nice article from a recent student</title>
		<link>http://www.bayview.com/blog/2008/10/30/a-nice-article-from-a-recent-student/</link>
		<comments>http://www.bayview.com/blog/2008/10/30/a-nice-article-from-a-recent-student/#comments</comments>
		<pubDate>Fri, 31 Oct 2008 02:38:15 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/?p=39</guid>
		<description><![CDATA[At the end of each class, we always ask our students to fill out an evaluation form.  There are two reasons for this: to find areas where we need to improve, and with the hope that they&#8217;ll put down some kind words that we can quote on the site in the testimonials page.  [...]]]></description>
			<content:encoded><![CDATA[<p>At the end of each class, we always ask our students to fill out an evaluation form.  There are two reasons for this: to find areas where we need to improve, and in the hope that they&#8217;ll put down some kind words that we can quote on the testimonials page of the site.  A recent student, David &#8220;Zonker&#8221; Harris, took it one step further and wrote a <a href="http://consoleteam.blogspot.com/2008/10/little-perl-of-wisdom.html">glowing blog entry</a> about his experiences.  Thanks, Zonker!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2008/10/30/a-nice-article-from-a-recent-student/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Searching files with multi-line entries</title>
		<link>http://www.bayview.com/blog/2008/10/20/searching-files-with-multi-line-entries/</link>
		<comments>http://www.bayview.com/blog/2008/10/20/searching-files-with-multi-line-entries/#comments</comments>
		<pubDate>Tue, 21 Oct 2008 00:20:50 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Files & Directories]]></category>
		<category><![CDATA[Perl Tips]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/?p=38</guid>
		<description><![CDATA[Say that you have a file that looks something like this:
2008-01-02: first entry
2008-02-03: second entry on two lines
    here is the additional line
2008-03-04: third entry
   has
   three
   extra lines
2008-04-05: fourth entry has just one line again
If you need to search for all entries that have [...]]]></description>
			<content:encoded><![CDATA[<p>Say that you have a file that looks something like this:</p>
<pre>2008-01-02: first entry
2008-02-03: second entry on two lines
    here is the additional line
2008-03-04: third entry
   has
   three
   extra lines
2008-04-05: fourth entry has just one line again</pre>
<p>If you need to search for all entries that have &#8220;line&#8221; in the text, and display the entire entry when found, you can&#8217;t just search line-by-line &#8212; that would work for the first and fourth entries, but the second entry would miss the additional line, and in the third entry the word &#8220;line&#8221; is on the fourth line so you&#8217;d miss the first three.</p>
<p>What you need to do in a case like this is read line-by-line, but only process an entry once you&#8217;ve found the end of the entry.  There are two ways to solve this, depending on your data and what your needs are:</p>
<ol>
<li>If the file is not very large (and never will be), and you need to do the search multiple times, then you could load the entire file into memory as an array of entries, and then search that array using grep or foreach.</li>
<li>If the file is very large, or you only need to scan through it once to find one result, then just load each entry into a string, and display that string if it matches.</li>
</ol>
<p>First I&#8217;ll show how to load the entire file since I think it&#8217;s easier to understand:</p>
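<pre># IN is assumed to be a filehandle that is already open for reading
my @stuff;
while (&lt;IN&gt;) {
    # continuation line (only if there is an entry to append to)
    if (/^\s/ &#038;&#038; @stuff) { $stuff[-1] .= $_; }
    # otherwise this line starts a new entry
    else { push @stuff, $_;  }
}
print grep { /line/ } @stuff;</pre>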
<p>If the line begins with space, then it&#8217;s a continuation line, so modify the previous entry found (the last item of the array, using index -1) to add the text to it.  If the line doesn&#8217;t begin with space, it&#8217;s a new entry so push it onto the end of the array.  Once the entire file is read, each element in @stuff would correspond to one record, including the multiple extra lines, so it&#8217;s easy to scan using grep to find what you need.</p>
<p>The second approach involves using a scalar, rather than an array, to build up each record.  When the next new record starts, or end of file is reached, we check to see if the record we&#8217;ve just read matches the pattern:</p>
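<pre># IN is assumed to be a filehandle that is already open for reading
my $last_entry = '';    # start empty so the first check does not warn
while (&lt;IN&gt;) {
    if (/^\s/) {
        $last_entry .= $_;
    }
    else {
        print $last_entry if $last_entry =~ /line/;
        $last_entry = $_;
    }
    # at end of file no new entry follows, so check the final one here
    print $last_entry if $last_entry =~ /line/ &#038;&#038; eof(IN);
}</pre>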
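<p>As a self-contained sketch of the second approach, the subroutine below collects the matching entries from any filehandle and checks the final entry after the loop rather than testing for end of file on every line.  The name matching_entries, the lexical filehandle, and the in-memory sample data are assumptions made for this illustration:</p>
<pre>use strict;
use warnings;

# Return every entry read from $fh whose text matches $pattern.
# A line that starts with whitespace continues the previous entry.
sub matching_entries {
    my ($fh, $pattern) = @_;
    my @found;
    my $entry = '';
    while (my $line = &lt;$fh&gt;) {
        if ($line =~ /^\s/) {
            $entry .= $line;                          # continuation line
        }
        else {
            push @found, $entry if $entry =~ $pattern;
            $entry = $line;                           # start of a new entry
        }
    }
    # The final entry has no following line to trigger the check.
    push @found, $entry if $entry =~ $pattern;
    return @found;
}

my $sample = &lt;&lt;'END';
2008-01-02: first entry
2008-02-03: second entry on two lines
    here is the additional line
2008-03-04: third entry
   has
   three
   extra lines
END

# Read the sample through an in-memory filehandle.
open my $fh, '&lt;', \$sample or die "Cannot open in-memory file: $!";
my @hits = matching_entries($fh, qr/line/);
close $fh;
print @hits;</pre>
<p>With this sample the second and third entries are printed in full, continuation lines included, while the first entry is skipped.</p>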
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2008/10/20/searching-files-with-multi-line-entries/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Sorting in Reverse Order</title>
		<link>http://www.bayview.com/blog/2008/08/08/sorting-in-reverse-order/</link>
		<comments>http://www.bayview.com/blog/2008/08/08/sorting-in-reverse-order/#comments</comments>
		<pubDate>Fri, 08 Aug 2008 20:16:26 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Data Structures]]></category>
		<category><![CDATA[Perl Tips]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/2008/08/08/sorting-in-reverse-order/</guid>
		<description><![CDATA[Say you have an array of names: @names=qw(Tom Dick Harry);
If you wanted to sort these, you could just use a simple sort() command: @sorted=sort(@names);  That uses alphabetical order for sorting by default.  The sort criteria is not given, but you could get the same results by giving a longer version of the sort [...]]]></description>
			<content:encoded><![CDATA[<p>Say you have an array of names: <tt>@names=qw(Tom Dick Harry);</tt></p>
<p>If you wanted to sort these, you could just use a simple <tt>sort()</tt> command: <tt>@sorted=sort(@names);</tt>  By default that sorts in string order (roughly alphabetical, although uppercase letters sort before lowercase ones).  No sort criterion is given, but you could get the same results by giving a longer version of the sort function call, like so: <tt>@sorted=sort {$a cmp $b} @names;</tt></p>
<p>Here, <tt>$a</tt> and <tt>$b</tt> are special variables which are used to compare two of the values in <tt>@names</tt> to see which should come first in the sort order.  The <tt>cmp</tt> operator returns a positive value if <tt>$a</tt> sorts after <tt>$b</tt>, a negative value if <tt>$a</tt> sorts before <tt>$b</tt>, or zero if they are equal.  By changing the comparison block that follows the <tt>sort</tt> keyword you can change the order in which the sort function returns its results.</p>
<p>If you wanted to sort in reverse order, you could just use Perl's <tt>reverse()</tt> function: <tt>@revsort=reverse(@sorted);</tt> or in one statement, <tt>@revsort=reverse sort @names;</tt>.  This can be inefficient, however, because (at least on older versions of Perl) it must make a temporary copy of the list of names, which could get expensive if the array is large.  A more direct way is to change the <em>sort criterion</em> itself to produce the reverse result: <tt>@revsort=sort { $b cmp $a } @names;</tt>.  Now the values returned by <tt>cmp</tt> are the opposite of what they were above, and so the sort order is the opposite.</p>
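<p>Here is a short sketch that puts the forward and reverse comparisons side by side.  The numeric list <tt>@sizes</tt> is an addition for illustration only; it shows that the same swap of <tt>$a</tt> and <tt>$b</tt> works with the numeric comparison operator as well:</p>
<pre>use strict;
use warnings;

my @names = qw(Tom Dick Harry);

my @sorted  = sort { $a cmp $b } @names;    # Dick Harry Tom
my @revsort = sort { $b cmp $a } @names;    # Tom Harry Dick

# For numbers, compare with &lt;=&gt; instead of cmp.
my @sizes      = (10, 9, 100);
my @descending = sort { $b &lt;=&gt; $a } @sizes;  # 100 10 9

print "@sorted\n@revsort\n@descending\n";</pre>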
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2008/08/08/sorting-in-reverse-order/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding the Biggest File</title>
		<link>http://www.bayview.com/blog/2008/07/29/finding-the-biggest-file/</link>
		<comments>http://www.bayview.com/blog/2008/07/29/finding-the-biggest-file/#comments</comments>
		<pubDate>Wed, 30 Jul 2008 00:28:01 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Files & Directories]]></category>
		<category><![CDATA[Perl Tips]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/2008/07/29/finding-the-biggest-file/</guid>
		<description><![CDATA[How do I find the biggest files under a directory?  There are many ways to do this, but it isn&#8217;t always as easy as it sounds.
First of all if the directory has no sub-directories, it&#8217;s easy.  Just list the files sorted by size, which any operating system can do.  But if there [...]]]></description>
			<content:encoded><![CDATA[<p>How do I find the biggest files under a directory?  There are many ways to do this, but it isn&#8217;t always as easy as it sounds.</p>
<p>First of all if the directory has no sub-directories, it&#8217;s easy.  Just list the files sorted by size, which any operating system can do.  But if there are sub-directories, or if you&#8217;re talking about the entire filesystem, it&#8217;s not so easy.  Here&#8217;s a way to do it using Perl:</p>
<p>The Unix command &#8220;find&#8221; can recursively scan all the directories under a given point, and perform some action on each file.   It has a lot of options, and combined with commands such as &#8220;ls&#8221; and &#8220;sort&#8221; it can find the biggest files, but it is not trivial to get it right.  It would be nice if we could do this within a program so we could have the full power of Perl to work with as we scan these files.  For this reason, the &#8220;File::Find&#8221; module was created.  It ships with Perl so you already have it on your system.  And to make it easier to convert a Unix &#8220;find&#8221; command to a &#8220;File::Find&#8221; program, the script &#8220;find2perl&#8221; is included with Perl as well (through Perl 5.20; since Perl 5.22 it has been distributed separately on CPAN as App::find2perl).</p>
<p>To get started, use the &#8220;find2perl&#8221; command to create the Perl script that scans the files:</p>
<blockquote><p><code>% find2perl . -type f -print > find_biggest.pl</code></p></blockquote>
<p>The file &#8220;find_biggest.pl&#8221; is created with a Perl script that just displays the files found.  If we&#8217;re going to find the biggest files, we need to store the file sizes in some kind of data structure so that we can sort them by size.  What I suggest is putting them into a hash with the filenames as keys and sizes as values.  The &#8220;find_biggest.pl&#8221; script looks something like this so far:</p>
<blockquote><p><code>#! /usr/bin/perl -w<br />
    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'<br />
        if 0; #$running_under_some_shell</p>
<p>use strict;<br />
use File::Find ();</p>
<p># Set the variable $File::Find::dont_use_nlink if you're using AFS,<br />
# since AFS cheats.</p>
<p># for the convenience of &#038;wanted calls, including -eval statements:<br />
use vars qw/*name *dir *prune/;<br />
*name   = *File::Find::name;<br />
*dir    = *File::Find::dir;<br />
*prune  = *File::Find::prune;</p>
<p>sub wanted;</p>
<p># Traverse desired filesystems<br />
File::Find::find({wanted => \&#038;wanted}, '.');<br />
exit;</p>
<p>sub wanted {<br />
    my ($dev,$ino,$mode,$nlink,$uid,$gid);</p>
<p>    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &#038;&#038;<br />
    -f _ &#038;&#038;<br />
    print("$name\n");<br />
}</code></p></blockquote>
<p>The important parts to look at are the call to <code>File::Find::find()</code> and the subroutine <code>wanted</code>.  File::Find will execute the subroutine once for each file that it finds.  So what we need to do is modify the subroutine to record the file names, and then after <code>File::Find::find</code> exits, sort the files.</p>
<p>First, we change <code>wanted</code> to store the filenames in a hash rather than print them.  Change it to this:</p>
<blockquote><p><code>sub wanted {<br />
    my ($dev,$ino,$mode,$nlink,$uid,$gid);</p>
<p>    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &#038;&#038;<br />
    -f _ &#038;&#038;<br />
    ($size{$name} = -s _);<br />
}</code></p></blockquote>
<p>Since we&#8217;re introducing a new variable <code>%size</code> we need to declare it: add &#8220;<code>my %size;</code>&#8221; just before calling <code>File::Find::find</code>.  Then we need to sort the keys of that hash according to the values.  So, after calling <code>File::Find::find</code> but before <code>exit</code>, we want to do the following:</p>
<blockquote><p><code>my @files = sort { $size{$a} &lt;=&gt; $size{$b} } keys %size;<br />
print "Biggest file: $files[-1] (Size: $size{$files[-1]} bytes)\n";</code></p></blockquote>
<p>You can download the final script <a href="http://www.bayview.com/blog/wp-content/uploads/2008/07/find_biggestpl.txt">here</a>.</p>
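<p>For comparison, here is a compact sketch of the same idea written directly against <code>File::Find</code>, without the boilerplate that find2perl generates.  The subroutine name <code>biggest_file</code> and its return convention are assumptions made for this illustration, not part of the downloadable script:</p>
<pre>use strict;
use warnings;
use File::Find ();

# Return (name, size) of the biggest plain file under $top,
# or an empty list if no plain files are found.
sub biggest_file {
    my ($top) = @_;
    my %size;
    File::Find::find(sub {
        # lstat so symbolic links are not followed; -f _ reuses that result
        return unless lstat($_) and -f _;
        $size{$File::Find::name} = (-s _) || 0;
    }, $top);
    return unless %size;
    # Sort descending by size and keep only the first name
    my ($biggest) = sort { $size{$b} &lt;=&gt; $size{$a} } keys %size;
    return ($biggest, $size{$biggest});
}

# Scan the directory given on the command line, or the current one.
my $top = @ARGV ? $ARGV[0] : '.';
my ($name, $bytes) = biggest_file($top);
print "Biggest file: $name (Size: $bytes bytes)\n" if defined $name;</pre>
<p>Keeping the hash inside the subroutine means nothing has to be declared at file scope, at the cost of building the whole name-to-size table in memory before sorting.</p>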
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2008/07/29/finding-the-biggest-file/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Better Way to Slurp</title>
		<link>http://www.bayview.com/blog/2008/07/22/a-better-way-to-slurp/</link>
		<comments>http://www.bayview.com/blog/2008/07/22/a-better-way-to-slurp/#comments</comments>
		<pubDate>Tue, 22 Jul 2008 15:52:32 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Files & Directories]]></category>
		<category><![CDATA[Perl Tips]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/2008/07/22/a-better-way-to-slurp/</guid>
		<description><![CDATA[In an earlier entry (was it really six years ago?) I talked about the usage of $/ and the -0 command-line option to Perl to change the input delimiter.  But there&#8217;s another way to read in &#8220;slurp&#8221; mode that isn&#8217;t described there, the File::Slurp Perl module.
File::Slurp provides a function read_file, which given a filename, [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.bayview.com/blog/2002/07/29/input-delimiter/">an earlier entry</a> (was it really six years ago?) I talked about the usage of $/ and the -0 command-line option to Perl to change the input delimiter.  But there&#8217;s another way to read in &#8220;slurp&#8221; mode that isn&#8217;t described there, the <a href="http://search.cpan.org/dist/File-Slurp/">File::Slurp</a> Perl module.</p>
<p>File::Slurp provides a function read_file, which given a filename, returns its contents as a single string if called in scalar context (in list context, it returns a list of lines, as defined by whatever delimiter $/ is set to).  In scalar context it&#8217;s basically the same thing as setting $/ to undef and reading, but contained in a subroutine.</p>
<p>There is also a subroutine write_file, which lets you &#8220;spew&#8221; the contents of a string into a file.  It saves you a few lines of code: open, print, and close.  The name overwrite_file is a synonym for it, and append_file adds to a file rather than overwriting it.</p>
<p>Finally, read_dir lets you get the contents of a directory in one go, which is a lot more convenient than using opendir/readdir.</p>
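<p>To show roughly what these functions save you, here is a minimal sketch of the same reading and writing done with core Perl only.  The subroutine names slurp and spew are hypothetical stand-ins for this illustration; they are not part of File::Slurp, and the real module handles more options and error cases:</p>
<pre>use strict;
use warnings;

# Read a whole file into one string, roughly what read_file
# does when it is called in scalar context.
sub slurp {
    my ($filename) = @_;
    open my $fh, '&lt;', $filename
        or die "Cannot read $filename: $!";
    local $/;               # undef: no record separator, so read to end of file
    my $contents = &lt;$fh&gt;;
    close $fh;
    return $contents;
}

# Write a string to a file, replacing any existing contents,
# roughly what write_file does.
sub spew {
    my ($filename, $contents) = @_;
    open my $fh, '&gt;', $filename
        or die "Cannot write $filename: $!";
    print {$fh} $contents;
    close $fh or die "Cannot close $filename: $!";
    return;
}</pre>
<p>Because $/ is changed with local, the normal line-by-line behaviour is restored as soon as slurp returns.</p>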
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2008/07/22/a-better-way-to-slurp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Platform-Specific Perl</title>
		<link>http://www.bayview.com/blog/2006/10/02/platform-specific-perl/</link>
		<comments>http://www.bayview.com/blog/2006/10/02/platform-specific-perl/#comments</comments>
		<pubDate>Tue, 03 Oct 2006 04:29:36 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Files & Directories]]></category>
		<category><![CDATA[Perl Tips]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/?p=26</guid>
		<description><![CDATA[As an interpreted language, Perl scripts can generally be run unmodified on any platform.  But there are situations where the differences between platforms make it necessary to test what platform you are running on and act accordingly.
Say, for example, that you need to change permissions on a file.  On Unix and related operating [...]]]></description>
			<content:encoded><![CDATA[<p>Because Perl is an interpreted language, Perl scripts can generally be run unmodified on any platform.  But there are situations where the differences between platforms make it necessary to test what platform you are running on and act accordingly.<span id="more-26"></span></p>
<p>Say, for example, that you need to change permissions on a file.  On Unix and related operating systems (including Mac OS X and Linux) you would use the <tt>chmod</tt> function.  On Windows <tt>chmod</tt> will execute, but it doesn&#8217;t do much.  Traditional FAT filesystems have only four attributes per file: archived (A), read-only (R), hidden (H), and system (S). These can be checked and set with the <a href="http://search.cpan.org/perldoc?Win32%3A%3AFile">Win32::File</a> module.  For NTFS, use <a href="http://search.cpan.org/perldoc?Win32%3A%3AFileSecurity">Win32::FileSecurity</a>.<br />
(For more information about Perl on Windows, see the <a href="http://www.perl.com/doc/FAQs/nt/perlwin32faq4.html">Perl Win32 FAQ</a>.)</p>
<p>So, in your program, if you want to be able to do one thing on Windows and another on Unix-like systems, you need to test what platform you are on.  The way to do that in Perl is to look at the special variable <tt>$^O</tt>.  (That&#8217;s the letter <tt>O</tt>, not the number <tt>0</tt>.)  If you&#8217;re on Windows, it will have the value &#8220;MSWin32&#8221; (sadly, it doesn&#8217;t give you any clue what version of Windows you&#8217;re on).  So you would do something like this:</p>
<pre>if ($^O eq &quot;MSWin32&quot;) {
    require Win32::FileSecurity;
    Win32::FileSecurity::Set($file, { ... })
        or die &quot;Error in FileSecurity for $file: $^E&quot;;
}
else {
    chmod 0644, $file
        or die &quot;Error in chmod for $file: $!&quot;;
}</pre>
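<p>To find out what the value of <tt>$^O</tt> is on your platform, run this little command from your shell (in Unix-like shells, single quotes are the safer choice, since they stop the shell from interpreting the <tt>$</tt>):</p>
<pre>perl -e &quot;print $^O&quot;</pre>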
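<p>If several parts of a program need the same test, it can help to wrap it in one place.  The sketch below is only an illustration: the subroutine name <tt>platform_family</tt> and its three return values are assumptions, and treating everything that is not Windows or Cygwin as Unix-like is a simplification that will not hold for every platform Perl runs on:</p>
<pre>use strict;
use warnings;

# Map a $^O value to a coarse platform family.
sub platform_family {
    my ($os) = @_;
    return 'windows' if $os eq 'MSWin32';
    return 'cygwin'  if $os eq 'cygwin';
    return 'unix';    # linux, darwin (Mac OS X), and other Unix-like systems
}

print "Running on $^O (", platform_family($^O), ")\n";</pre>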
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2006/10/02/platform-specific-perl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ignorance is Bliss &#8211; non-memorizing parentheses</title>
		<link>http://www.bayview.com/blog/2006/04/20/ignorance-is-bliss-non-memorizing-parentheses/</link>
		<comments>http://www.bayview.com/blog/2006/04/20/ignorance-is-bliss-non-memorizing-parentheses/#comments</comments>
		<pubDate>Thu, 20 Apr 2006 22:07:39 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Perl Tips]]></category>
		<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/?p=24</guid>
		<description><![CDATA[One of regular expressions&#8217; most useful features is memorization.  To do this, just put parentheses around part of your expression and the result will be memorized:
my($name) = /hello, (\w+)/
In this example, we look in $_ for the word &#8220;hello&#8221; followed by a comma, space, and a word.  Since the word, \w+, has parentheses [...]]]></description>
			<content:encoded><![CDATA[<p>One of regular expressions&#8217; most useful features is memorization (more commonly called capturing).  To do this, just put parentheses around part of your expression and the result will be memorized:</p>
<blockquote><p><code>my($name) = /hello, (\w+)/</code></p></blockquote>
<p>In this example, we look in <code>$_</code> for the word &#8220;hello&#8221; followed by a comma, space, and a word.  Since the word, <code>\w+</code>, has parentheses around it, the part of the string that it matches gets memorized.  In this example, we are assigning the return value of the regular expression match to <code>$name</code>.  So if <code>$_</code> contains &#8220;hello, world&#8221; then $name gets &#8220;world&#8221; &#8211; very convenient.</p>
<p>But parentheses also do other things besides memorize their contents, and this feature can become annoying.  Here&#8217;s an example.<span id="more-24"></span></p>
<p>In a regular expression the <code>|</code> symbol indicates &#8220;or&#8221; &#8211; either the stuff to the left of it <em>or</em> the stuff to the right of it will match.  For example, <code>/hello|hi/</code> will match either &#8220;hello&#8221; or &#8220;hi&#8221; in the string.  You can even have more than one of these: <code>/hello|hi|howdy|greetings/</code> will match any of those four words.</p>
<p>The trouble is, what if you want the &#8220;or&#8221; to apply to only <em>part</em> of the string?  That&#8217;s where parentheses come in.  Let&#8217;s combine the previous two examples to show what I mean:</p>
<blockquote><p><code>my($name) = /(hello|hi|howdy|greetings), (\w+)/</code></p></blockquote>
<p>In this example, we want any of &#8220;hello,&#8221; &#8220;hi,&#8221; &#8220;howdy,&#8221; or &#8220;greetings&#8221;, followed by a comma, space, and a word which is memorized.  The problem is, the greeting word is also memorized, and so <code>$name</code> gets that word instead of the name that we want it to get!</p>
<p>The easy solution is to allocate a variable for that word:</p>
<blockquote><p><code>my($x, $name) = /<strong>(hello|hi|howdy|greetings)</strong>, (\w+)/</code></p></blockquote>
<p>But here, we don&#8217;t care about the value in <code>$x</code> so why bother allocating a variable for it?  Can&#8217;t we get just the grouping benefit of parentheses without having them memorize anything?  For years, the answer was no.  But then a few years back the Perl regex guys came up with a syntax to do it &#8211; just add <code>?:</code> to the beginning of the parenthesized block, making it:</p>
<blockquote><p><code>my($name) = /(<strong>?:</strong>hello|hi|howdy|greetings), (\w+)/</code></p></blockquote>
<p>Gee, that was awfully obvious, wasn&#8217;t it?  <strong>NOT!</strong>  <em>Why do they have to make these things so unintelligible?</em> I hear you cry.</p>
<p>The answer is backward compatibility.  Think about it &#8211; all the obvious characters already mean something, or if they don&#8217;t, chances are someone&#8217;s used them in a regular expression already to search for that character.  So the <em>only</em> way to introduce a new feature into regular expressions is to use something that previously was a syntax error.  Since the &#8220;<code>?</code>&#8221; character in a regex means &#8220;the previous thing zero or one times&#8221; and the thing before the &#8220;<code>?</code>&#8221; in this syntax is &#8220;<code>(</code>&#8221; (which if you recall means &#8220;start memorizing here&#8221;), it didn&#8217;t make sense to say &#8220;start memorizing here, zero or one times&#8221; so it was a syntax error.  Since it was an error, nobody would have used it in an existing Perl script.  So by giving <code>(?</code> a meaning that wasn&#8217;t a syntax error, backward compatibility is preserved.</p>
<p>But why &#8220;<code>(?:</code>&#8221; and not just &#8220;<code>(?</code>&#8221;?  I wasn&#8217;t there, but I would assume they wanted to add more features to the parenthesized syntax and were running out of previously-bad syntax that they could give meaning to.  For example, you may know that you can make a regex case-insensitive by adding <code>/i</code> to the end.  Well, you can also insert the &#8220;<code>i</code>&#8221; between the &#8220;<code>?</code>&#8221; and &#8220;<code>:</code>&#8221; to make only part of the regex be case-insensitive: <code>/(?i:hello|hi), world/</code> would allow &#8220;hello&#8221; or &#8220;HELLO&#8221; but &#8220;world&#8221; would have to be all lowercase.</p>
<p>So, the bottom line: if you find yourself wanting to use parentheses in your regex for reasons other than memorizing, and memorizing gets in your way (or you want to save a little on performance, since memorizing can slow things down a little), then just remember to insert <code>?:</code> at the start of the parenthesized part of your pattern.</p>
<blockquote><p><code>my($name) = /(?:hello|hi|howdy|greetings), (\w+)/</code></p></blockquote>
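<p>The three behaviours discussed above can be seen side by side in this small sketch.  The variable names and the sample strings are chosen just for the illustration:</p>
<pre>use strict;
use warnings;

my $text = "howdy, world";

# Memorizing parentheses: the greeting lands in the first variable.
my ($first) = $text =~ /(hello|hi|howdy|greetings), (\w+)/;

# Non-memorizing group: only the name is memorized.
my ($name) = $text =~ /(?:hello|hi|howdy|greetings), (\w+)/;

# (?i:...) makes only the greeting case-insensitive.
my ($shouted) = "HELLO, world" =~ /(?i:hello|hi), (\w+)/;

print "$first\n$name\n$shouted\n";    # howdy, world, world</pre>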
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2006/04/20/ignorance-is-bliss-non-memorizing-parentheses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dates in Perl: Hawaiian Vacation Planning</title>
		<link>http://www.bayview.com/blog/2006/01/05/dates-in-perl-hawaiian-vacation-planning/</link>
		<comments>http://www.bayview.com/blog/2006/01/05/dates-in-perl-hawaiian-vacation-planning/#comments</comments>
		<pubDate>Fri, 06 Jan 2006 01:27:00 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Perl Tips]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/?p=3</guid>
		<description><![CDATA[Since we&#8217;re starting a new year, let&#8217;s look at handling dates in Perl.  Let&#8217;s say the user enters a date and you want to check if it&#8217;s between a particular range of start/end dates.
In particular, let&#8217;s say you want to go to Hawaii and your kids are in school for the spring semester from [...]]]></description>
			<content:encoded><![CDATA[<p>Since we&#8217;re starting a new year, let&#8217;s look at handling dates in Perl.  Let&#8217;s say the user enters a date and you want to check if it&#8217;s between a particular range of start/end dates.</p>
<p>In particular, let&#8217;s say you want to go to Hawaii and your kids are in school for the spring semester from January 9 through June 2.  Your travel agent gives you a list of possible dates when you can go to Hawaii really cheaply, and you want to know which ones conflict with your kids&#8217; school schedule so you can include the budget for a babysitter in the cost of the trip.<br />
<span id="more-3"></span><br />
The easiest way to compare dates in a date range is to get them into a format that you can compare using regular comparison operators.  There are two typical approaches for doing this:</p>
<p>The first is by converting the dates to numbers.  This is the way C programmers do it.  You convert the dates into the number of seconds since January 1 1970 at midnight, GMT.  (And you thought Perl programmers were arbitrary?)  Perl supports this by providing interfaces to C library functions such as localtime() and timelocal().  You parse the dates into a set of numbers for year, month, day, hours, minutes, and seconds and pass it to the timelocal() function (which is found in the Time::Local module, natch), and out comes a big number.  Do that for the start and end dates of the school calendar, and dates of your Hawaii trip, and you can compare them as numbers.</p>
<p>The trouble with this solution is that it&#8217;s difficult to parse dates.  You have to deal with all the different formats they can be written in (or use a fussy user interface that requires them in a very particular format).  You have to worry about weird things like subtracting 1 from the month so it&#8217;s in the range 0-11 (which timelocal needs) and more importantly, every time you need to write a script or module to deal with dates you have to do it all over again!</p>
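<p>For dates that are already split into numbers, the numeric approach might look like the sketch below.  The helper name epoch_for is an assumption made for this illustration, and the dates are the semester and one of the trips from this example:</p>
<pre>use strict;
use warnings;
use Time::Local qw(timelocal);

# Convert a year, month (1-12) and day to seconds since the epoch,
# at midnight local time.
sub epoch_for {
    my ($year, $month, $day) = @_;
    # timelocal wants the month in the range 0-11
    return timelocal(0, 0, 0, $day, $month - 1, $year);
}

my $start = epoch_for(2006, 1, 9);    # first day of the semester
my $end   = epoch_for(2006, 6, 2);    # last day of the semester
my $trip  = epoch_for(2006, 4, 1);

if ($trip &gt;= $start and $trip &lt;= $end) {
    print "The trip conflicts with school\n";
}</pre>
<p>Note that this only covers the comparison; turning free-form input such as a month name into those three numbers is the hard part, and it is left out here.</p>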
<p>So a much better approach is to use a Perl module that has built-in support for dates.  Perl doesn&#8217;t come with any, but there are a couple you can download from www.cpan.org (or, if you&#8217;re using ActivePerl on Windows, install using PPM).  They handle all the date parsing and can even do calculations to determine not just where the date of your trip to Hawaii lies within the semester at your kids&#8217; school, but how many days (or minutes, or seconds&#8230;) away it is from summer vacation.</p>
<p>Two modules that I can suggest for doing this are Date::Manip and Date::Calc.  I don&#8217;t love the API for Date::Manip, and it can be a little slow, but I love what it does.  It can parse just about any date format you can think of (plus a few you probably can&#8217;t think of) and generate dates in just about any format.  It can do calculations between dates and even consider business days vs. calendar days.  It&#8217;s written all in Perl so you can install it easily on any platform.  The other, Date::Calc, is faster and better designed, and does almost all the things Date::Manip does, but it has a C-compiled back-end which means you have to do a little more work to install it.</p>
<p>For now we&#8217;ll look at Date::Manip, but the same approach can be taken with the other modules such as Date::Calc.  The strategy we&#8217;ll take is to use string comparison rather than numeric, and convert the dates to a common string format that can be compared easily.  The format used by Date::Manip is YYYYMMDDHH:MM:SS.  So in our example, your school semester runs from 2006010900:00:00 to 2006060200:00:00 (we don&#8217;t care about the time of day, so just use zeros).  You get the list of dates from your travel agent and maybe each tour company returns the information in a different date format, so you get a list such as &#8220;April 1st&#8221; and &#8220;July 11&#8221; and &#8220;2/26/2006&#8221; and you need to parse them.</p>
<p>Date::Manip&#8217;s ParseDate subroutine can convert each of those to the right string format, and then getting the answer is easy using Perl&#8217;s gt and lt operators.  Remember, don&#8217;t use &gt; and &lt; for comparing strings!  (Although in this case the results would be the same, they wouldn&#8217;t if the minutes &#038; seconds mattered.  And besides if you have warnings enabled &#8211; and you should &#8211; it&#8217;ll complain when it hits the colons in the strings.)</p>
<p>So there you have it &#8211; run the dates through the ParseDate subroutine and then just compare them as strings.  The answer? 2006040100:00:00 and 2006022600:00:00 fall within the school year, but 2006071100:00:00 doesn&#8217;t.  Don&#8217;t forget your sunscreen!</p>
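<p>The comparison step on its own can be sketched without any module.  The strings below are written out by hand in the YYYYMMDDHH:MM:SS format, standing in for what ParseDate would return, and the subroutine name in_semester is an assumption made for this illustration:</p>
<pre>use strict;
use warnings;

# Dates already normalized to YYYYMMDDHH:MM:SS.
my $start = '2006010900:00:00';    # January 9
my $end   = '2006060200:00:00';    # June 2
my @trips = ('2006040100:00:00', '2006071100:00:00', '2006022600:00:00');

# Fixed-width strings with the most significant field first sort
# chronologically, so plain string comparison is enough.
sub in_semester {
    my ($date) = @_;
    return ($date ge $start and $date le $end) ? 1 : 0;
}

for my $trip (@trips) {
    printf "%s %s\n", $trip,
        in_semester($trip) ? 'conflicts with school' : 'is free';
}</pre>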
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2006/01/05/dates-in-perl-hawaiian-vacation-planning/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Finding the Largest File in a Directory</title>
		<link>http://www.bayview.com/blog/2005/02/22/finding-the-largest-file-in-a-directory/</link>
		<comments>http://www.bayview.com/blog/2005/02/22/finding-the-largest-file-in-a-directory/#comments</comments>
		<pubDate>Wed, 23 Feb 2005 01:31:07 +0000</pubDate>
		<dc:creator>William Ward</dc:creator>
				<category><![CDATA[Files & Directories]]></category>
		<category><![CDATA[Perl Tips]]></category>

		<guid isPermaLink="false">http://www.bayview.com/blog/?p=11</guid>
		<description><![CDATA[Here&#8217;s an easy way to find the largest file in a directory.

First, open the directory to read the list of file names in it.
  opendir DIR, $directory
      or die &#34;Error reading $directory: $!\n&#34;;
Then, read the file names and sort according to size.
  my @sorted =
    [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s an easy way to find the largest file in a directory.</p>
<p><span id="more-11"></span></p>
<p>First, open the directory to read the list of file names in it.</p>
<pre>  opendir DIR, $directory
      or die &quot;Error reading $directory: $!\n&quot;;</pre>
<p>Then, read the file names and sort according to size.</p>
<pre>  my @sorted =
      sort {-s &quot;$directory/$a&quot; &lt;=&gt; -s &quot;$directory/$b&quot;}
      grep {-f &quot;$directory/$_&quot;}    # plain files only; skips . and ..
	 readdir(DIR);</pre>
<p>Finally, close the directory.</p>
<pre>  closedir DIR;</pre>
<p>Now you can print the name of the largest file.</p>
<pre>  print &quot;Largest file in $directory is $sorted[-1].\n&quot;;</pre>
<p>Variations:</p>
<ul>
<li>To get the oldest file, use <tt>-M</tt> instead of <tt>-s</tt>.</li>
<li>To get the smallest file, or the newest file, swap <tt>$a</tt> and <tt>$b</tt>.</li>
</ul>
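<p>Put together as one subroutine, the steps above might look like the following sketch.  The name <tt>largest_file</tt>, the lexical directory handle, and the decision to look up each size only once are choices made for this illustration rather than part of the steps above:</p>
<pre>use strict;
use warnings;

# Return the name of the largest plain file directly inside $directory,
# or undef if it contains no plain files.
sub largest_file {
    my ($directory) = @_;
    opendir my $dh, $directory
        or die "Error reading $directory: $!\n";
    # Keep plain files only; this also drops "." and ".."
    my @files = grep { -f "$directory/$_" } readdir $dh;
    closedir $dh;

    # Look up each size once instead of on every comparison
    my %size;
    $size{$_} = (-s "$directory/$_") || 0 for @files;

    my @sorted = sort { $size{$a} &lt;=&gt; $size{$b} } @files;
    return $sorted[-1];
}

my $largest = largest_file('.');
print "Largest file in . is $largest.\n" if defined $largest;</pre>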
]]></content:encoded>
			<wfw:commentRss>http://www.bayview.com/blog/2005/02/22/finding-the-largest-file-in-a-directory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
