package VCP::Filter::map;

=head1 NAME

VCP::Filter::map - rewrite name, branch_id or skip revisions

=head1 SYNOPSIS

  ## In a .vcp file:

    Map:
            name_glob_1<branch_1> name_out_1<branch_result_1>
            name_glob_2<branch_2> name_out_2<branch_result_2>
            # ... etc ...

  ## From the command line:
   vcp <source> map: name_glob_1<branch_1> name_out_1<branch_result_1> -- <dest>

  ## you may have one or more ( pattern, result ) pairs on the command
  ## line, ending with --

  ## the <branch> part of the maps is optional.

=head1 DESCRIPTION

Maps source files, revisions, and branches to destination files and
branches while copying a repository.  This is done by rewriting the
C<name> and C<branch_id> of revisions according to a list of rules.

=head2 Rules

A rule is a pair of expressions specifying a pattern to match against
each incoming revision's name and branch_id and a replacement expression
specifying the revision's new name and branch_id.

The list of rules is evaluated top down; the first rule in the list
that matches is used to generate the new name and branch_id.  If no
rule matches, the implicit default rule passes the revision through as is.

=head2 Patterns and Replacement Expressions

Patterns and replacements are each composed of up to three
subexpressions, the C<name_pat>, the C<branch_id_pat> and the
C<labels_pat>, like so (note the lack of whitespace):

    name_pat<branch_id_pat><@labels_pat>

The C<< <branch_id_pat> >> (including angle brackets) is optional and may
be forbidden by some sources or destinations that embed the concept of a
branch in the name_pat.  (See L<VCP::Dest::p4|VCP::Dest::p4> for an
example, though this may be changed in the future).

For now, the symbols C<#> and C<@> are reserved for future use in all
expressions and must be escaped using C<\>.

Various shell-like wildcards are implemented in pattern expressions,
as shown next.

=head2 Pattern Expressions

The C<name_pat>, C<branch_id_pat> and C<labels_pat> specify patterns
using shell-like wildcard syntax, with the extension that parentheses
extract portions of the match into numbered variables that may be used
to construct the result, as in Perl regular expressions:

   ?      Matches one character other than "/"
   *      Matches zero or more characters other than "/"
   ...    Matches zero or more characters, including "/"
   (foo)  Matches "foo" and stores it in $1, $2, etc.

Some example pattern C<name_pat>s are:

   name_pat  Matches
   ========  =======
   foo       the top level file "foo"
   foo/bar   the file "foo/bar"
   ...       all files (like a missing name_pat)
   foo/...   all files under "foo/"
   .../bar   all files named "bar" anywhere
   */bar     all files named "bar" one dir down
   ....pm    all files ending in ".pm"
   ?.pm      all top level files with 4 char names ending in ".pm"
   \?.pm     the top level file "?.pm"
   (*)/...   all files in subdirs, puts the top level dirname in $1
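
As a rough illustration of the wildcard semantics in the table above,
here is a minimal sketch that translates a pattern into a Perl regular
expression.  It is not the code this module uses (the module hands
patterns to C<compile_shellish> from L<Regexp::Shellish>); the helper
name C<wildcard_to_regex> is hypothetical and only the simplest form of
C<\> escaping is handled:

  ```perl
  use strict;
  use warnings;

  ## Hypothetical helper, for illustration only: translate a name_pat
  ## into an anchored Perl regular expression.
  sub wildcard_to_regex {
     my ( $pat ) = @_;
     my $re = "";
     while ( length $pat ) {
        if    ( $pat =~ s{\A\\(.)}{}s ) { $re .= quotemeta $1 } ## escaped char
        elsif ( $pat =~ s{\A\.\.\.}{} ) { $re .= ".*"         } ## crosses "/"
        elsif ( $pat =~ s{\A\*}{}     ) { $re .= "[^/]*"      } ## within one dir
        elsif ( $pat =~ s{\A\?}{}     ) { $re .= "[^/]"       } ## one char, not "/"
        elsif ( $pat =~ s{\A([()])}{} ) { $re .= $1           } ## capture group
        else  { $pat =~ s{\A(.)}{}s;      $re .= quotemeta $1 } ## literal char
     }
     return qr{\A$re\z}s;
  }

  my $re = wildcard_to_regex( "(*)/..." );
  print "top level dir: $1\n" if "lib/VCP/Filter.pm" =~ $re;
  ```

In this sketch C<.../bar> does not match a top level file named C<bar>,
because the C</> is taken literally; the real implementation may differ
on edge cases like this one.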

Unix-style slashes are used, even on operating systems where that may
not be the preferred local custom.  A pattern consisting of the empty
string is legal and matches everything.  (NOTE: currently there is no
way to take advantage of this, as quoting is not yet implemented in the
forms parser.  Use C<...> instead.)

Relative paths are taken relative to the rev_root indicated in the
source specification for pattern C<name_pat>s (or in the destination
specification for result C<name_pat>s).  For now, a relative path is a
path that does not begin with the character C</>, so be aware that the
pattern C<(/)> is relative.  This is a limitation of the implementation
and may change.  Until it does, don't rely on a leading "(" making a
path relative; use multiple rules to match multiple absolute paths.

If no C<name_pat> is provided, C<...> is assumed and the pattern will
match on all filenames.

Some example pattern C<branch_id_pat>s are:

    branch_id_pat  Matches files on
    =============  ================
    <>             no branch label
    <...>          all branches (like a missing <branch_id_pat>)
    <foo>          branch "foo"
    <R...>         branches beginning with "R"
    <R(...)>       branches beginning with "R", the other chars in $1

If no C<branch_id_pat> is provided, files on all branches are matched.
C<*> and C<...> still match differently in C<branch_id_pat>s, as they do
in C<name_pat>s, but this is likely to make no difference, as
I've not yet seen a branch label with a "/" in it.  Still, it is wise
to avoid C<*> in C<branch_id_pat>s.

Some example pattern C<labels_pat>s are:

    labels_pat  Matches files on
    ==========  ================
    <@>         no labels
    <@...>      all labels (like a missing <@labels_pat>)
    <@(...)>    all labels, captured
    <@foo>      label "foo"
    <@R...>     labels beginning with "R"
    <@R(...)>   labels beginning with "R", the other chars in $1

If no C<labels_pat> is provided, revs with any (or no) labels all match.
If the result action is C<<< <<branch_to(...)>> >>>, a branch will be
created for each label that matches the C<labels_pat>.  It's not wise to
use capture parentheses in a C<labels_pat> with any other result
action; the behavior is undefined and may change.  This is because a
revision can have zero, one, or more labels, so what gets captured
is unclear.

The difference between C<*> and C<...> applies to C<branch_id_pat>s and
C<labels_pat>s, but this is likely to make no difference, as I've not
yet seen a branch label with a "/" in it.  Still, it is wise to favor
C<...> in this case.

Some example composite patterns are (any $ variables set
are given in parentheses):

    Pattern            Matches
    =======            =======
    foo<>              top level files named "foo" not on a branch
    (...)<>            all files not on a branch ($1)
    (...)/(...)<>      all files not on a branch ($1,$2)
    ...<R1>            all files on branch "R1"
    .../foo<R...>      all files "foo" on branches beginning with "R"
    (...)/foo<R(...)>  all files "foo" on branches beginning with "R" ($1, $2)

=head2 Escaping

Null characters and newlines are forbidden in all expressions.

The characters C<#>, C<@>, C<[>, C<]>, C<{>, C<}>, C<E<gt>>, C<E<lt>>
and C<$> must be escaped using a C<\>, as must any wildcard characters
meant to be taken literally.

In result expressions, the wildcard characters C<*>, C<?>, the wildcard
trigraph C<...> and parentheses must each be escaped with a single C<\>
as well.

No other characters are to be escaped.

=head2 Case sensitivity

By default, all patterns are case sensitive.  There is no way to
override this at present; one will be added.

=head2 Result Expressions

Result expressions look a lot like pattern expressions except that
wildcards are not allowed and C<$1> and C<${1}> style variable
interpolation is.
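
The module compiles each result expression into a double-quoted Perl
string inside generated code, so Perl itself performs the
interpolation.  As a rough stand-alone illustration of the effect, the
sketch below expands C<$1> and C<${1}> style variables by hand; the
helper name C<expand_result> is hypothetical:

  ```perl
  use strict;
  use warnings;

  ## Hypothetical helper: expand $N and ${N} in a result expression
  ## using a list of captured strings ($1 is the first list element).
  sub expand_result {
     my ( $result, @captures ) = @_;
     $result =~ s/\$(?:\{(\d+)\}|(\d+))/
        my $n = defined $1 ? $1 : $2;
        ( $n >= 1 && defined $captures[ $n - 1 ] ) ? $captures[ $n - 1 ] : ""
     /ge;
     return $result;
  }

  print expand_result( 'main/$1', "foo/bar" ), "\n";              ## main/foo/bar
  print expand_result( '${2}_x/$1', "foo/bar", "beta_1" ), "\n";  ## beta_1_x/foo/bar
  ```

The C<${1}> form is what lets a variable be followed directly by text
such as C<_x> without the two running together.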

To explore result expressions, let's look at converting a set of example
files between cvs and p4 repositories.  The difficulty here is that cvs
and p4 have differing branching implementations.

Let's assume our CVS repository has a module named C<flibble> with a
file named C<foo/bar> in it.  Here is a branch diagram, with the main
development trunk shown down the left (C<1.1> through C<1.6>, etc.) and
a single branch, tagged in CVS with a branch tag of C<beta_1>,
shown forking off version C<1.5>:

     flibble/foo/bar:

         1.1
          |
         ...
          |
         1.5
          | \
          |  \ beta_1
          |   \
         1.6   \
          |    1.5.2.1
         ...    |
                |
               1.5.2.2
                |
               ...

    NOTE 1: You can use vcp to extract graphical branch diagrams by
    installing AT&T's GraphViz package and the Perl CPAN module
    GraphViz.pm.  Then you can use a command like:

        $ vcp cvs:/var/cvsroot:flibble/foo/bar \
            branch_diagram:foo_bar.png

    to generate a .png file showing something like the above diagram.

On the other hand, p4 users typically branch files using directory
names.  Here's file C<foo/bar> again, with the main trunk held in the main
depot's C<//depot/main> directory, again with a branch after the 5th
version of the file, but this time, the branch is represented by taking
a copy of the file in a different directory:

    //depot/main/foo/bar

         #1
          |
         ...
          |
         #5
          |\
          | \ //depot/beta_1/foo/bar
          |  \
         #6   \
          |   #1
         ...   |
               |
              #2
               |
              ...
          
    NOTE 2: the p4 command allows users to branch in very crafty and
    creative ways; it does not enforce the semantic of 1 branch per
    directory, and this gives p4 users a lot of power and flexibility.
    It also means that you might need some pretty crafty and creative
    branch maps when converting from p4 to other repositories.

    NOTE 3: that branch looks like a copy, but is actually just a
    metadata entry in the perforce repository, so it's very low
    overhead in terms of server effort and disk space, usually
    even more so than CVS branches.

    NOTE 4: Using GraphViz (as described in NOTE 1 above), you can
    build a diagram like this using vcp:

        $ vcp p4:perforce.our.com:1666://depot/flibble/foo/bar \
            branch_diagram:foo_bar.png

A user may or may not choose to label a branch in p4 with something
called a "branch specification" (see "p4 help branch" for details).  For
this discussion, we'll assume they didn't.

First, let's look at cvs -> p4 conversion.  To do this, we need to
match the branch tags in the CVS repository and use them to map branched
files into a p4 subdirectory.  Here's a .vcp file for this:

   ## cvs2p4.vcp

   Source:
   # get all files in the flibble module from cvs
       cvs:/var/cvsroot:flibble/...

   Destination:
   # Put the files in the flibble directory in the main depot of p4
       p4:perforce.our.com:1666://depot/flibble/...

   Map:
   #   Pattern       Result
   #   ============  =======
       (...)<>       main/$1   # main trunk => //depot/flibble/main/...
       (...)<(...)>  $2/$1     # branches   => //depot/flibble/$branch/...

The C<Source:> and C<Destination:> fields are just pieces of a normal
C<vcp> command line moved in to C<cvs2p4.vcp>.  The C<Map:> field is a
list of rules composed of pattern, result expression pairs.

In this example, all of the map expressions are relative paths.  The
patterns are relative to the "C<flibble>" module of the C<Source:> cvs
repository.  The results are relative to the "C<//depot/flibble/>"
directory of the C<Destination:> p4 repository.

The first rule maps all files that have no branch tag in to the p4
directory C<//depot/flibble/main/>.  The C<< (...)<> >> pattern has two
parts: a C<name> part and a C<branch_id> part.  The C<name> part,
C<(...)>, matches all path names and copies them to the C<$1> variable.
The C<branch_id> part, C< <> >, matches empty / missing C<branch_id>s
(C<vcp>'s name for the CVS branch tag associated with a file on a
branch).  The C< main/$1 > result retrieves the C<name> part stored in
C<$1> and prefixes it with "C<main/>" to build the final C<name> value.

The second rule maps all files on branches to an appropriately named
subdirectory in the p4 destination.  The pattern is a lot like the first
rule's, but has a C<branch_id> part that matches all C<branch_id>s and
copies them in to C<$2>.  The rule merely uses this C<branch_id> from
C<$2> instead of the hardcoded "C<main/>" string to place the branches
in appropriate subdirectories.

Here's how the versions of our flibble/foo/bar file fare when passed
through this mapping:

    CVS flibble/...              p4 //depot/flibble/...
    ========================     ======================

    foo/bar#1.1                  main/foo/bar#1
    foo/bar#1.2                  main/foo/bar#2
    ...                          ...
    foo/bar#1.5.2.1              beta_1/foo/bar#1
    foo/bar#1.5.2.2              beta_1/foo/bar#2
    ...                          ...

It's up to you to be sure there are no branches tagged "C<main>" in the
CVS repository.  Also, no branch specification will be created in the
target p4 repository (this is a limitation that should be fixed).
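
To make the evaluation of these two rules concrete, here is a minimal
stand-alone sketch; it is not the code generated by this module.  The
regular expressions are hand-written equivalents of the two patterns,
and the names C<@rules> and C<map_name> are hypothetical:

  ```perl
  use strict;
  use warnings;

  ## Each rule: [ name regex, branch_id regex, result builder ].
  my @rules = (
     ## (...)<>       main/$1
     [ qr{\A(.*)\z}s, qr{\A\z},      sub { "main/$_[0]"  } ],
     ## (...)<(...)>  $2/$1
     [ qr{\A(.*)\z}s, qr{\A(.*)\z}s, sub { "$_[1]/$_[0]" } ],
  );

  sub map_name {
     my ( $name, $branch_id ) = @_;
     $branch_id = "" unless defined $branch_id;
     for my $rule ( @rules ) {
        my ( $name_re, $branch_re, $build ) = @$rule;
        next unless $name =~ $name_re;
        my $name_part = $1;
        next unless $branch_id =~ $branch_re;
        my $branch_part = $1;  ## undef if the pattern captures nothing
        return $build->( $name_part, $branch_part );
     }
     return $name;  ## no rule matched: pass the name through as is
  }

  print map_name( "foo/bar", ""       ), "\n";  ## main/foo/bar
  print map_name( "foo/bar", "beta_1" ), "\n";  ## beta_1/foo/bar
  ```

Because the trunk rule comes first, a revision with an empty branch_id
never reaches the second rule.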

=head2 Result Actions: <<...>>

The result expression C<< <<skip>> >> causes the revision to be skipped,
while the result expression C<< <<keep>> >> passes it through
unchanged:

    Map:
    #   Pattern            Result
    #   =================  ==========
        old_stuff/.../*.c  <<keep>>  # Keep these files
        old_stuff/...      <<skip>>  # but skip all others in old_stuff/

        tags/(*)/...       <<label_parent($1)>>
        tags/(*)/(*)/...   <<label_parent("Release-$1",$2)>>
                                     # Delete matched revs and
                                     # apply the label to their
                                     # parent (i.e. previous) revs

        (...)<@(...)>      <<branch_to("tags/$2/$1")>>
                                     # Create branches from tags
                                     # (note lack of leading "/").

All actions are standalone tokens; no other terms (name, branch_id,
labels) are allowed.

<<branch_to>> and <<label_parent()>> are used when converting to/from
repositories like Subversion that use branches to implement labels.
Don't forget to use double quotes on the arguments if you're
doing more than just passing in match variables like C<< $1 >>; the
error message you get when the arguments are not valid Perl is
an error in the compiled code and is well nigh inscrutable.  TODO:
work on that.

=head3 <<keep>>

<<keep>> is used to keep the indicated revision; it may be used
to include revisions that a later rule would <<skip>> or rewrite.

=head3 <<skip>>

<<skip>> is used to not pass on the indicated revision.  It doesn't
delete it in the target repository, it alters the data stream to look as
though the matched revision(s) didn't exist in the source repository.
NOTE: at present, it does not "route around" skipped revisions, so
if you skip a revision, you might also need to skip all later revisions
on that filebranch.  But this is the expected use case anyway (removing
entire file branches).  This behavior can be changed if need be.

=head3 <<label_parent()>>

<<label_parent(...)>> applies the label or labels passed to the matched
revision's parent revision (as indicated by the revision's previous_id
field) and then skips the matched revision (like <<skip>>).  If no
labels are passed, then any and all parenthesized captures are
concatenated and used as a label.  [Implementor's note: we can easily
add a new result action if anyone ever needs to not skip such
revisions.]

NOTE: <<label_parent>> causes the map filter to accumulate all
non-<<skip>>ed revisions until the last revision is received, then apply
the labels, then emit all revisions downstream.  Otherwise the map
filter does not buffer revisions, but alters revisions and passes them
on as they are received.  At some point, we may add a "label" action to
allow the destinations to apply labels to existing revisions, rather
than requiring the VCP::Rev to contain the list of labels, at which time
this buffering may be removed.
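
The buffering described above can be sketched roughly as follows, using
plain hashes in place of C<VCP::Rev> objects and an in-memory array in
place of the on-disk store the module uses.  All sub names here are
hypothetical:

  ```perl
  use strict;
  use warnings;

  my @buffered;       ## kept revs, in arrival order
  my %labels_by_id;   ## parent id => labels queued for that rev

  sub keep_rev { push @buffered, $_[0] }

  ## Queue labels for the parent; the matched rev itself is not kept.
  sub label_parent {
     my ( $r, @labels ) = @_;
     die "no previous_id set\n" unless defined $r->{previous_id};
     push @{ $labels_by_id{ $r->{previous_id} } }, @labels;
  }

  ## Called once the last rev has arrived: apply labels, then emit.
  sub flush_revs {
     for my $r ( @buffered ) {
        my $queued = delete $labels_by_id{ $r->{id} };
        push @{ $r->{labels} }, @$queued if $queued;
     }
     return splice @buffered;
  }

  keep_rev(     { id => "r1", labels => [] } );
  label_parent( { id => "r2", previous_id => "r1" }, "Release-1" );
  my @out = flush_revs();
  print "$out[0]{id}: @{ $out[0]{labels} }\n";   ## r1: Release-1
  ```

Any entries left in C<%labels_by_id> after the flush name parent
revisions that never arrived, which is the situation the module warns
about.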

=head3 <<branch_to()>>

<<branch_to()>> creates one or more branches from the matched revision.
If the pattern expression had a label match (C<< <@...> >>) term, then
the <<branch_to()>> is applied once to each matching label.

The matched revision itself is treated as though a <<keep>> rule
preceded the <<branch_to()>> rule (meaning that branches will be
inserted in the revision stream after the matched revision).

The branches will be created with:

    Field        Value ($r is the matched rev)
    ===========  =============================
    name         $path (an argument passed to branch_to())
    id           $r->id . ".0"
    previous_id  $r->id
    rev_id       $r->rev_id . ".0"

No labels are removed from the matched revision.
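
A minimal sketch of how these fields are derived, using plain hashes
instead of C<VCP::Rev> objects.  The sub name C<new_branch_revs> and the
id format used in the example are assumptions made for illustration:

  ```perl
  use strict;
  use warnings;

  ## Plain-hash stand-in for VCP::Rev, for illustration only.
  sub new_branch_revs {
     my ( $r, @paths ) = @_;
     return map {
        +{
           name        => $_,
           id          => "$r->{id}.0",
           previous_id => $r->{id},
           rev_id      => "$r->{rev_id}.0",
        }
     } @paths;
  }

  ## The id format below is made up for this example.
  my $rev = { id => "foo/bar#1.5", name => "foo/bar", rev_id => "1.5" };
  my ( $branch ) = new_branch_revs( $rev, "tags/beta_1/foo/bar" );
  print "$branch->{name} $branch->{rev_id} (from $branch->{previous_id})\n";
  ```

One new revision is produced per path, each pointing back at the
matched revision through its previous_id.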

=head2 The default rule

There is a default rule that is always added to the end of any 
explicitly supplied rules:

    ...  <<keep>>  ## Default rule: passes everything through as-is

This is evaluated after all the other rules.  Thus, if no other rule
matches a revision, it is passed through unchanged.
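
First-match-wins evaluation with a trailing default rule can be sketched
as follows.  The regular expressions are hand-written approximations of
the patterns C<old_stuff/.../*.c> and C<old_stuff/...>, and the names
C<@rules> and C<action_for> are hypothetical:

  ```perl
  use strict;
  use warnings;

  my @rules = (
     [ qr{\Aold_stuff/.*/[^/]*\.c\z}s, "keep" ],
     [ qr{\Aold_stuff/}s,              "skip" ],
     [ qr{\A}s,                        "keep" ],  ## default rule, always last
  );

  ## Return the action of the first rule whose pattern matches.
  sub action_for {
     my ( $name ) = @_;
     for my $rule ( @rules ) {
        return $rule->[1] if $name =~ $rule->[0];
     }
     return;
  }

  print "$_ => ", action_for( $_ ), "\n"
     for qw( old_stuff/src/main.c old_stuff/README lib/Foo.pm );
  ```

Swapping the first two rules would cause every file under C<old_stuff/>
to be skipped, since the broader pattern would match first.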

=head2 Command Line Parsing

For large maps or repeated use, the map is best specified in a .vcp
file.  For quick one-offs or scripted situations, however, the map:
scheme may be used on the command line.  In this case, each parameter
is a "word" (separated by whitespace) and every pair of words is a 
( pattern, result ) pair.

Because L<vcp|vcp> command line parsing is performed incrementally and
the next filter or destination specifications can look exactly like
a pattern or result, the special token "--" is used to terminate the
list of patterns provided on the command line.  This may also
be the last word in the C<Map:> section of a .vcp file, but that is
superfluous.  It is an error to use "--" before the last word in a .vcp
file.
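
The pairing of words and the role of "--" can be sketched like this.  It
is not the parser vcp actually uses; the sub name C<take_rules>, the
placeholder word C<dest_spec> and the choice to die on an unpaired
pattern are assumptions made for illustration:

  ```perl
  use strict;
  use warnings;

  ## Hypothetical helper: consume ( pattern, result ) pairs from a word
  ## list up to and including "--"; return the rules and leftover words.
  sub take_rules {
     my @words = @_;
     my @rules;
     while ( @words ) {
        my $pattern = shift @words;
        last if $pattern eq "--";
        die "pattern '$pattern' has no result\n"
           if ! @words || $words[0] eq "--";
        push @rules, [ $pattern, shift @words ];
     }
     return ( \@rules, \@words );
  }

  my ( $rules, $rest ) = take_rules(
     "(...)<>", 'main/$1', "(...)<(...)>", '$2/$1', "--", "dest_spec"
  );
  print scalar( @$rules ), " rules; remaining: @$rest\n";
  ```

Without the "--", the word C<dest_spec> would have been read as the
pattern of a third rule.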

=for test_script t/61map.t

=cut

$VERSION = 1 ;

@ISA = qw( VCP::Filter );

use strict ;
use VCP::Logger qw( lg );
use VCP::Debug qw( :debug );
use VCP::Utils qw( empty shell_quote );
use VCP::Filter;
use Regexp::Shellish qw( compile_shellish );
#use base qw( VCP::Filter );

#use fields (
#   'MAP_SUB',   ## The rules to apply, compiled in to an anon sub
#);

my @expr_order = qw( name branch_id labels );

sub _parse_expr {
   my ( $type, $v ) = @_;

   my %exprs;

   return () unless defined $v;

   if ( $type eq "result" && $v =~ m{\A<<(.*?)(?:\((.*)\))?>>\z} ) {
      my $action = $1;
      my $args = $2;
      if ( $action eq "label_parent" ) {
         die "$action requires arguments\n" if empty $args;
         $exprs{label_parent} = 1;
         $exprs{args} = $args;
      }
      elsif ( $action eq "branch_to" ) {
         die "$action requires arguments\n" if empty $args;
         $exprs{branch_to} = 1;
         $exprs{args} = $args;
      }
      elsif ( $action eq "delete" ) {
         ## back compat.  TODO: issue a warning.
         die "$action does not accept arguments\n" if defined $args;
         $exprs{skip} = 1;
      }
      elsif ( 0 <= index "|keep|skip|", "|$action|" ) {
         die "$action does not accept arguments\n" if defined $args;
         $exprs{$action} = 1;
      }
      else {
         die "unknown action in map rule result: $v\n"
      }
      return \%exprs;
   }

   my $s = $v;
   $exprs{name} = $1
      if $s =~ s{\A((?:\\.|[^<>\\])+)}{};

   ## Note the subtle difference of "+" vs "*" in the above and below
   ## regexen; names must be non-empty, but branches and labels may
   ## be empty.

   while ( length $s ) {
      last unless $s =~ s{\A<((?:\\.|[^<>\\])*)>}{};
      if ( 0 == index $1, "@" ) {
         die "only one <\@label> term allowed in \"$v\"\n"
             if defined $exprs{labels};
         $exprs{labels} = substr $1, 1;
      }
      else {
         die "only one <branch_id> term allowed in \"$v\"\n"
             if defined $exprs{branch_id};
         $exprs{branch_id} = $1;
      }
   }

   die "unable to parse map $type \"$v\" (broke at \"$s\")\n"
      if length $s;

   for ( sort keys %exprs ) {
      next unless defined $exprs{$_};

      die "newline in the $_ expression '$exprs{$_}' of map $type '$v'\n"
         if $exprs{$_} =~ tr/\n//;

      die "unescaped '$1' in the $_ expression '$exprs{$_}' of map $type '$v'\n"
         if $exprs{$_} =~ 
            ( $type eq "pattern"
                ? qr{(?<!\\)(?:\\\\)*([\@#<>\[\]{}\$])}
                : qr{(?<!\\)(?:\\\\)*([\@#<>\[\]*?()]|\.\.\.)|(?<!\$)\{}
            );

      die "illegal escape sequence '$1' in the $_ expression '$exprs{$_}' of map $type '$v'\n"
         if $exprs{$_} =~ qr{(?<!\\)(?:\\\\)*(\\(?!\.\.\.)[^\@#<>\[\]{}*?()])};
   }

   return \%exprs;
}


sub _parse_rule {
   my $self = shift;
   my ( $name, $pattern, $result ) = @_;

   my $pattern_exprs = _parse_expr pattern => $pattern;
   my $result_exprs  = _parse_expr result  => $result;

   $self->{BATCH_MODE} ||= $result_exprs->{label_parent};

   return ( $name, $pattern, $pattern_exprs, $result_exprs );
}


sub _keep_rev_expr {
   my $self = shift;
   return join "",
      $self->{BATCH_MODE}
         ? "\$self->store_rev"
         : "\$self->SUPER::handle_rev",
      "( ",
      shift,
      " )";
}


sub _keep_rev_return_statement {
   my $self = shift;
   join "",
      "return ",
      $self->_keep_rev_expr( "\$rev" ),
      ";\n"
}


sub _compile_rule {
   my $self = shift;
   my ( $name, $pattern, $pattern_exprs, $result_exprs ) = @_;

   ## The test expression is a single regexp that matches a string
   ## built up from some pieces of the rev metadata.  Right now, only
   ## the name, branch_id, and labels are tested, but someday the
   ## change_id, rev_id, and comment could be tested.  If so, the
   ## comment field would need to come last due to newline issues.

   my $test_expr = 
      ! keys %$pattern_exprs
         ? 1  ## This happens iff the pattern was undef (which
              ## should only happen for the default rule).
         : join(
            "",
            "m'\\A",   ## Note the single-quotish context
            (map {
               my $field = $_;
               my $term = $pattern_exprs->{$field};
               defined $term
                  ? do {
                     my $re = compile_shellish( $term, { anchors => 0 } );
                     $re =~ s{(')}{\\`}g;
                     $re =~ s{\A\(\?[\w-]*: (.*) \)}{$1}gx;
                        # for readability of dumped code
                     $field eq "labels"
                        ? $re = "(?:.*\\n)*label:$re\\n(?:.*\\n)*"
                        : "$re\\n";
                  }
                  : $field eq "labels"
                     ? "(?:.*\\n)*"
                     : ".*\\n",
            } @expr_order ),
            "\\z'",
         );

   $pattern = defined $pattern ? qq{$pattern} : "<<match all>>";

   my $result_statement = join(
      "",
      debugging()
         ?  qq{lg( '    matched $name ($pattern)' );\n}
         : (),
      $result_exprs->{keep}
         ? (
            debugging()
               ?  qq{lg( "    <<keep>>ing" );\n}
               : (),
            $self->_keep_rev_return_statement
         )
      : $result_exprs->{skip}
         ? (
            debugging()
               ?  qq{lg( "    <<skip>>ing" );\n}
               : (),
            "return \$self->_skip_rev;\n"
         )
      : $result_exprs->{label_parent}
         ? (
            debugging()
               ?  qq{lg( "    <<label_parent>>ing (and <<skip>>ing)" );\n}
               : (),
            "\$self->_label_parent( \$rev, $result_exprs->{args} );\n",
            "return \$self->_skip_rev;\n"
         )
      : $result_exprs->{branch_to}
         ? do {
            my $args = $result_exprs->{args};
            (
               $self->_keep_rev_expr( "\$rev" ),
               ";\n",
               qq{for my \$label (\@labels) \{\n},
               qq{   local \$_ = join "", \$fields, map "\$_\\n", \$label;\n}, 
               qq{   next unless $test_expr;\n},
               debugging()
                  ?  qq{   lg( "    <<branch_to>>ing: ", join ", ", $args );\n}
                  : (),
               "   ",
               $self->_keep_rev_expr( "\$_" ),
               "\n",
               "       for \$self->_new_branch_to_revs( \$rev, $args );\n",
               "}\n",
               "return;\n"
            );
         }
      : (
         map(
            defined $result_exprs->{$_}
               ? do {
                  my $expr = $result_exprs->{$_};
                  $expr =~ s{([\\"])}{\\$1}g;
                  $expr =~ s{\n}{\\n}g;
                  (
                     debugging()
                        ?  qq{lg( "    rewriting $_ to '$expr'" );\n}
                        : (),
                     qq{\$rev->$_( "$expr" );\n}
                  )
               }
               : (
                     debugging()
                        ?  qq{lg( "    not rewriting $_" );\n}
                        : (),
               ),
            @expr_order
         ),
         $self->_keep_rev_return_statement,
      )
   );

   $result_statement =~ s/^/   /gm;

   "if ( $test_expr ) {\n$result_statement}\n";
}

sub _compile_rules {
   my $self = shift;
   my ( $rules ) = @_;

   my @field_get_exprs;
   my @list_get_exprs;

   my $preamble = <<END_PREAMBLE;
my ( \$self, \$rev ) = \@_;

END_PREAMBLE

   for ( @expr_order ) {
      if ( $_ eq "labels" ) {
         $preamble .= <<CODE;
my \@labels = map( "label:\$_", \$rev->labels);

CODE
         push @list_get_exprs, qq{\@labels};
      }
      else {
         my $expr = qq{\$rev->$_ || ""};
         push @field_get_exprs, $expr;
      }
   }

   my $field_get_exprs = join ",\n   ", @field_get_exprs;
   my $list_get_exprs  = join ",\n   ", @list_get_exprs;

   ## NOTE: making this a closure causes spurious warnings at exit so
   ## we pass $self explicitly.
   $preamble .= <<CODE;
my \$fields = join "", map "\$_\\n",
   $field_get_exprs;

local \$_ = join "", \$fields, map "\$_\\n", $list_get_exprs;

CODE

   $preamble .= <<CODE if debugging;
my \$s = \$_; \$s =~ s/\\n/\\\\n/g; lg( "map testing '\$s' (", \$rev->as_string, ")" );

CODE

   my $rule_number;
   my @parsed_rules =
      map [ $self->_parse_rule(  @$_ ) ],
         map( [ "Rule " . ++$rule_number, @$_               ], @$rules ),
              [ "Default Rule",           undef, "<<keep>>" ];

   my $code = join "",
      $preamble,
      map $self->_compile_rule(  @$_ ), @parsed_rules;

   $code =~ s/^/   /mg;
   $code = "sub {\n$code}";
   debug "map code:\n$code" if debugging;

   return( eval "#line 1 VCP::Filter::map::map_function\n$code"
      or die "$@ compiling\n",
         do {
            my $w = length( $code =~ tr/\n// + 1 ) ;
            my $ln;
            1 while chomp $code;
            $code =~ s{^}[sprintf "%${w}d|",++$ln]gme;
            "$code\n";
         },
   );
}


sub new {
   my $self = shift->SUPER::new;

   my ( $spec, $options ) = @_ ;

   $self->{MAP_SUB} = $self->_compile_rules(
      $self->parse_rules_list( $options, "Pattern", "Replacement" )
   );

   return $self ;
}


sub revs_db {
   my $self = shift;
   $self->{REVS_DB};
}


sub store_rev {
   my $self = shift;
   my ( $r ) = @_;
   $self->revs_db->set( [ $r->id ], $r->serialize );
}


sub handle_header {
   my $self = shift;

   if ( $self->{BATCH_MODE} ) {

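      ## Load the revision store lazily; it is only needed in batch mode.
      require VCP::DB_File::big_records;

      my $store_loc = $self->tmp_dir;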

      $self->{REVS_DB} = VCP::DB_File::big_records->new(
         StoreLoc  => $store_loc,
         TableName => "revs",
      );

      $self->revs_db->delete_db;
      $self->revs_db->open_db;
   }

   $self->{LABELS_BY_ID} = {};

   $self->SUPER::handle_header( @_ );
}


sub _label_parent {
   my $self = shift;
   my ( $r, @labels ) = @_;

   my $parent_id = $r->previous_id;

   die "<<label_parent()>> failed: no previous_id set for: ",
      $r->as_string,
      "\n"
       if empty $parent_id;

   my $id = $r->id;

   push @{$self->{LABELS_BY_ID}->{$r->previous_id}->{$_}}, $id for @labels;
}


sub _new_branch_to_revs {
   my $self = shift;
   my ( $r, @paths ) = @_;
   my $previous_id = $r->id;
   my $previous_rev_id = $r->rev_id;
   map
      VCP::Rev->new(
         id          => "$previous_id.0",
         name        => $_,
         rev_id      => "$previous_rev_id.0",
         previous_id => $previous_id,
         comment     => "[vcp] create branch for <<branch_to>>",
         author      => undef,
         date        => undef,
      ),
      @paths
}


sub handle_rev {
   my $self = shift;

   $self->{MAP_SUB}->( $self, @_ );
}


sub handle_footer {
   my $self = shift;

   if ( $self->revs_db ) {

      my $l = delete $self->{LABELS_BY_ID};
      for my $rev_id ( sort keys %$l ) {
         for my $label ( sort keys %{$l->{$rev_id}} ) {
            if ( @{$l->{$rev_id}->{$label}} > 1 ) {
               warn "vcp: map: $rev_id labelled with \"$label\" by multiple revs:\n",
                   map "    - $_\n", @{$l->{$rev_id}->{$label}};
            }
         }
      }

      $self->revs_db->foreach_record_do( sub {
         my $r = VCP::Rev->deserialize( @_ );

         my $id = $r->id;
         $r->add_labels( sort keys %{delete $l->{$id}} )
            if exists $l->{$id};

         $self->SUPER::handle_rev( $r );
      } );

      warn "vcp: map: $_ not found, had labels queued: ",
         join ", ", map "\"$_\"", sort keys %{$l->{$_}}
            for sort keys %$l;

      $self->revs_db->close_db;
      $self->revs_db->delete_db;
      delete $self->{REVS_DB};
   }

   $self->SUPER::handle_footer( @_ );
}


sub DESTROY {
   my $self = shift;
   if ( $self->revs_db ) {
      $self->revs_db->close_db;
      $self->revs_db->delete_db;
   }
}


=head1 <<delete>>

NOTE: the <<delete>> result action has been deprecated and replaced
with <<skip>> in order to head off possible confusion with the
"delete" action used to delete a revision from the destination 
repository.


=head1 LIMITATIONS

There is no way (yet) of telling the mapper to continue processing the
rules list.  We could implement goto-labels like C<< label: >> to be
allowed before pattern expressions (but not between pattern and result),
and we could then implement C<< <goto label> >>.  And a C<< <next> >>
could be used to fall through to the next label.  All of which is
wonderful, but I want to gain some real world experience with the
current system and find a use case for gotos and fallthroughs before I
implement them.  This comment is here to solicit feedback :).

=head1 AUTHOR

Barrie Slaymaker <barries@slaysys.com>

=head1 COPYRIGHT

Copyright (c) 2000, 2001, 2002 Perforce Software, Inc.
All rights reserved.

See L<VCP::License|VCP::License> (C<vcp help license>) for the terms of use.

=cut

1