Perl Tips Blog from Bay View Training

Searching files with multi-line entries

by William Ward on October 20, 2008 4:20 pm

Say that you have a file that looks something like this:

2008-01-02: first entry
2008-02-03: second entry on two lines
    here is the additional line
2008-03-04: third entry
   has
   three
   extra lines
2008-04-05: fourth entry has just one on line again

If you need to find all entries that have “line” in the text and display the entire entry when one matches, you can’t just search line by line. That works for single-line entries like the first and fourth, but a multi-line entry is only partly displayed: in the third entry the word “line” appears only on the fourth line, so you’d miss the first three. (The second entry comes out whole only because both of its lines happen to contain “line”.)
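To see the problem concretely, here is a minimal sketch of the naive line-by-line search run against the sample data. The sample is embedded in a string and read through an in-memory filehandle purely for illustration; the names $sample, $fh and @naive are my own choices, not part of the original tip.

```perl
use strict;
use warnings;

# The sample data from above, held in a string for illustration only.
my $sample = <<'END';
2008-01-02: first entry
2008-02-03: second entry on two lines
    here is the additional line
2008-03-04: third entry
   has
   three
   extra lines
2008-04-05: fourth entry has just one on line again
END

# Open the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

# Naive approach: keep only the individual lines that match.
my @naive = grep { /line/ } <$fh>;
close $fh;

print @naive;
```

The third entry shows up only as its last line ("   extra lines"); its date and the three lines before it are lost, which is exactly the problem the rest of this tip addresses.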

What you need to do in a case like this is read line-by-line, but only process an entry once you’ve found the end of the entry. There are two ways to solve this, depending on your data and what your needs are:

  1. If the file is not very large (and never will be), and you need to do the search multiple times, then you could load the entire file into memory as an array of entries, and then search that array using grep or foreach.
  2. If the file is very large, or you only need to scan through it once to find one result, then just load each entry into a string, and display that string if it matches.

First I’ll show how to load the entire file since I think it’s easier to understand:

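use strict; use warnings;
# 'entries.txt' is a placeholder file name
open my $in, '<', 'entries.txt' or die "Cannot open entries.txt: $!";
my @stuff;
while (<$in>) {
    # Continuation line: append to the previous entry (if there is one)
    if (/^\s/ && @stuff) { $stuff[-1] .= $_; }
    # Anything else starts a new entry
    else { push @stuff, $_; }
}
close $in;
print grep { /line/ } @stuff;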

If the line begins with whitespace, it’s a continuation line, so append it to the previous entry (the last item of the array, index -1). If it doesn’t, it’s a new entry, so push it onto the end of the array. Once the entire file is read, each element of @stuff corresponds to one record, including any extra lines, so it’s easy to scan with grep to find what you need.
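Since the point of the array approach is to search more than once, the loading step can be wrapped in a small subroutine and the result reused. The following is only a sketch: read_entries is a hypothetical helper name, and the sample data is fed through an in-memory filehandle so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read every entry from an open filehandle and
# return them as a list, one element per (possibly multi-line) entry.
sub read_entries {
    my ($fh) = @_;
    my @entries;
    while (my $line = <$fh>) {
        if ($line =~ /^\s/ && @entries) {
            $entries[-1] .= $line;    # continuation of the previous entry
        }
        else {
            push @entries, $line;     # first line of a new entry
        }
    }
    return @entries;
}

# Sample data held in a string for illustration only.
my $sample = <<'END';
2008-01-02: first entry
2008-02-03: second entry on two lines
    here is the additional line
2008-03-04: third entry
   has
   three
   extra lines
2008-04-05: fourth entry has just one on line again
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @entries = read_entries($fh);
close $fh;

# Search the same in-memory array as often as needed.
my @with_line  = grep { /line/ } @entries;
my @from_march = grep { /^2008-03/ } @entries;

print "Entries mentioning 'line': ", scalar(@with_line), "\n";
print @from_march;
```

Each search after the first costs only a pass over the array, not another read of the file, which is why this approach suits small files that are queried repeatedly.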

The second approach involves using a scalar, rather than an array, to build up each record. When the next new record starts, or end of file is reached, we check to see if the record we’ve just read matches the pattern:

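use strict; use warnings;
# 'entries.txt' is a placeholder file name
open my $in, '<', 'entries.txt' or die "Cannot open entries.txt: $!";
my $last_entry = '';    # start empty so the first match test does not warn
while (<$in>) {
    if (/^\s/) {
        # Continuation line: extend the entry being built
        $last_entry .= $_;
    }
    else {
        # A new entry starts, so the previous one is complete: test it
        print $last_entry if $last_entry =~ /line/;
        $last_entry = $_;
    }
}
close $in;
# End of file reached: the final entry has not been tested yet
print $last_entry if $last_entry =~ /line/;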
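When you only need one result, the scalar approach can also stop reading as soon as a match is found. Here is a rough sketch of that idea; first_matching_entry is a hypothetical helper name, it takes the pattern as a qr// object, and the sample data is again read from a string so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first complete entry that matches
# $pattern, or undef if none does. Reading stops at the first match.
sub first_matching_entry {
    my ($fh, $pattern) = @_;
    my $entry = '';
    while (my $line = <$fh>) {
        if ($line =~ /^\s/) {
            $entry .= $line;          # continuation of the current entry
        }
        else {
            # The previous entry is complete; return it if it matches.
            return $entry if length($entry) && $entry =~ $pattern;
            $entry = $line;           # start building the next entry
        }
    }
    # End of file: the last entry still needs to be checked.
    return $entry if length($entry) && $entry =~ $pattern;
    return;
}

# Sample data held in a string for illustration only.
my $sample = <<'END';
2008-01-02: first entry
2008-02-03: second entry on two lines
    here is the additional line
2008-03-04: third entry
   has
   three
   extra lines
2008-04-05: fourth entry has just one on line again
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $hit = first_matching_entry($fh, qr/three/);
close $fh;

print defined $hit ? $hit : "no match\n";
```

Only one entry is held in memory at a time, so this works the same way on a very large file, and the early return avoids reading past the entry you were looking for.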

1 Comment

  1. I think this is the first time someone translated one of my Perl Tips into French! How about that?

    Comment by William Ward — October 23, 2008 @ 8:35 am
