package VCP::Filter::map;

=head1 NAME

VCP::Filter::map - rewrite file names and branch_ids

=head1 SYNOPSIS

   ## From the command line:
    vcp <source> map: p1 r1 p2 r2 -- <dest>

   ## In a .vcp file:

     Map:
        name_glob_1<branch_1> name_out_1<branch_result_1>
        name_glob_2<branch_2> name_out_2<branch_result_2>
        # ... etc ...

=head1 DESCRIPTION

Maps source files, revisions, and branches to destination files and
branches while copying a repository.  This is done by rewriting the
name and branch_id of revisions according to a list of rules.

=head2 Rules

A rule is a pair of expressions: a pattern to match against each
incoming revision's name and branch_id, and a result to use to replace
the revision's name and branch_id.

The rules are listed top down, and the last rule in the list that
matches a revision is the one used to generate its new name and
branch_id.  There is also a default rule that matches all files; see
L</The default rule>.

Note that sorting is performed in the destination, so the map will
affect the sort order and the original file name and branch_id are
lost.

=head2 Patterns and Rule Expressions

Both patterns and results are composed of two subexpressions, the
C<name_expr> and the C<branch_id_expr>, like so:

   name_expr<branch_id_expr>

The C<< <branch_id_expr> >> (including angle brackets) is optional and
may be forbidden by some sources or destinations that embed the concept
of a branch in the name_expr.  (See L<VCP::Dest::p4|VCP::Dest::p4> for
an example, though this may be changed in the future.)

For now, the symbols C<#> and C<@> are reserved for future use in all
expressions and must be escaped using C<\>, and various shell-like
wildcards are implemented in pattern expressions.

=head2 Pattern Expressions

Both the C<name_expr> and C<branch_id_expr> specify patterns using
shell wildcard syntax, extended so that parentheses extract portions of
the match into numbered variables that may be used when constructing
the result, as in Perl regular expressions:

   ?      Matches one character other than "/"
   *      Matches zero or more characters other than "/"
   ...    Matches zero or more characters, including "/"
   (foo)  Matches "foo" and stores it in $1, $2, etc.

Some example pattern C<name_expr>s are:

   Pattern
   name_expr  Matches
   =========  =======
   foo        the top level file "foo"
   foo/bar    the file "foo/bar"
   ...        all files (like a missing name_expr)
   foo/...    all files under "foo/"
   .../bar    all files named "bar" anywhere
   */bar      all files named "bar" one dir down
   ....pm     all files ending in ".pm"
   ?.pm       all top level 4 char files ending in ".pm"
   \?.pm      the top level file "?.pm"
   (*)/...    all files in subdirs, puts the top level dirname in $1

Unix-style slashes are used, even on operating systems where that may
not be the preferred local custom.

A pattern consisting of the empty string is legal and matches
everything (NOTE: currently there is no way to take advantage of this;
quoting is not implemented in the forms parser yet, so use "..."
instead).

Relative paths are taken relative to the rev_root indicated in the
source specification for pattern C<name_expr>s (or in the destination
specification for result C<name_expr>s).  For now, a relative path is a
path that does not begin with the character C</>, so be aware that the
pattern C<(/)> is relative.  This is a limitation of the implementation
and may change; until it does, don't rely on a leading "(" making a
path relative, and use multiple rules to match multiple absolute paths.

If no C<name_expr> is provided, C<...> is assumed and the pattern will
match all filenames.

Some example pattern C<branch_id_expr>s are:

   Pattern
   branch_id_expr  Matches files on
   ==============  ================
   <>              no branch
   <...>           all branches (like a missing <branch_id_expr>)
   <foo>           branch "foo"
   <R...>          branches beginning with "R"
   <R(...)>        branches beginning with "R", the other chars in $1

If no C<branch_id_expr> is provided, files on all branches are matched.
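The wildcard table can be made concrete with a short Perl sketch.  It is
only an illustration under stated assumptions, not this module's code:
the module hands wildcard translation to C<compile_shellish> from
Regexp::Shellish, whose output and edge cases (for instance, whether
C<.../bar> also matches a top level C<bar>) may differ.  The helper names
C<shellish_to_regex> and C<match_rev> are hypothetical.  Like
C<_compile_rule>, the sketch joins the name and branch_id with a newline
so that a single regex can test both and collect C<$1>, C<$2>, etc.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VCP): translate one shell-like
# expression into a Perl regex fragment using the wildcard table above.
# Handles \x escapes, "...", "*", "?", and "(" / ")"; every other
# character is matched literally.
sub shellish_to_regex {
    my ( $glob ) = @_;
    my $re = "";
    while ( length $glob ) {
        if    ( $glob =~ s/\A\\(.)//s ) { $re .= quotemeta $1 }  # \x: literal x
        elsif ( $glob =~ s/\A\.\.\.// ) { $re .= '.*'         }  # ... may cross "/"
        elsif ( $glob =~ s/\A\*//     ) { $re .= '[^/]*'      }  # * stays in one dir
        elsif ( $glob =~ s/\A\?//     ) { $re .= '[^/]'       }  # ? is one non-"/" char
        elsif ( $glob =~ s/\A([()])// ) { $re .= $1           }  # capture group
        else  { $glob =~ s/\A(.)//s;      $re .= quotemeta $1   }  # literal char
    }
    return $re;
}

# Hypothetical matcher: test a revision's name and branch_id against a
# composite pattern "name_expr<branch_id_expr>".  A missing name_expr or
# branch_id_expr matches anything; "<>" matches only revisions that are
# not on a branch.  Returns an array ref of the captured values
# ($1, $2, ...) on a match, or undef if the pattern does not match.
sub match_rev {
    my ( $pattern, $name, $branch_id ) = @_;

    my ( $name_expr, $branch_expr ) =
        $pattern =~ m{\A((?:\\.|[^<\\])*)(?:<(.*)>)?\z}s
        or return;

    my $name_re   = length $name_expr    ? shellish_to_regex( $name_expr )   : '.*';
    my $branch_re = defined $branch_expr ? shellish_to_regex( $branch_expr ) : '.*';

    # "." does not match "\n", so no wildcard can leak from the name
    # into the branch_id or vice versa.
    my $subject = join "\n", $name, defined $branch_id ? $branch_id : "";
    my @captures = $subject =~ /\A$name_re\n$branch_re\z/
        or return;
    @captures = () unless $#+;    # no groups: drop the dummy 1
    return \@captures;
}

my $caps = match_rev( "(...)/foo<R(...)>", "src/foo", "R12" );
print "@$caps\n" if $caps;    # prints "src 12"
```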
C<*> and C<...> match differently in pattern C<branch_id_expr>s, just as
they do in C<name_expr> patterns.  This is unlikely to make a
difference, as I've not yet seen a branch label with a "/" in it;
still, it is wise to avoid C<*> in C<branch_id_expr> patterns.

Some example composite patterns are (any $ variables set are given in
parentheses):

   Pattern            Matches
   =======            =======
   foo<>              the top level file "foo" when not on a branch
   (...)<>            all files not on a branch ($1)
   (...)/(...)<>      all files below the top level not on a branch ($1, $2)
   ...<R1>            all files on branch "R1"
   .../foo<R...>      all files named "foo" on branches beginning with "R"
   (...)/foo<R(...)>  all files named "foo" on branches beginning with "R" ($1, $2)

=head2 Escaping

Null characters and newlines are forbidden in all expressions.

The characters C<#>, C<@>, C<[>, C<]>, C<{>, C<}>, C<E<gt>>, C<E<lt>>
and C<$> must be escaped using a C<\> (except for a C<$> that introduces
a variable in a result expression), as must any wildcard characters
meant to be taken literally.

In result expressions, the wildcard characters C<*>, C<?>, the wildcard
trigraph C<...> and parentheses must each be escaped with a single C<\>
as well.

No other characters are to be escaped.

=head2 Case sensitivity

By default, all patterns are case sensitive.  There is no way to
override this at present; one will be added.

=head2 Result Expressions

Result expressions look a lot like pattern expressions except that
wildcards are not allowed and C<$1> and C<${1}> style variable
interpolation is.

=head2 Result Actions: <<delete>> and <<keep>>

The result expression C<< <<delete>> >> discards the revision so that it
is not passed on to the destination, while the result expression
C<< <<keep>> >> passes it through unchanged:

    Map:
       #   Pattern            Result
       #   =================  ==========
           old_stuff/...      <<delete>>  # Delete all files in old_stuff/
           old_stuff/.../*.c  <<keep>>    # except these

C<< <<delete>> >> and C<< <<keep>> >> may not appear as part of a larger
result expression; each is a standalone token that must make up the
entire result.

=head2 The default rule

There is a default rule

    ...   <<keep>>  ## Default rule: passes everything through as-is

that is treated as if it came before all the other rules.  Because the
last matching rule wins, it only applies when no other rule matches a
revision, in which case the revision is passed through unchanged.

=head2 Command Line Parsing

For large maps or repeated use, the map is best specified in a .vcp
file.  For quick one-offs or scripted situations, however, the map:
scheme may be used on the command line.  In this case, each parameter
is a "word" and every pair of words is a ( pattern, result ) pair.

Because L<vcp|vcp> command line parsing is performed incrementally and
the next filter or destination specification can look exactly like a
pattern or result, the special token "--" is used to terminate the list
of patterns when map: is used on the command line.  This may also be
the last word in the C<Map:> section of a .vcp file, but that is
superfluous.  It is an error to use "--" before the last word in a .vcp
file.

=for test_script t/61map.t

=cut

$VERSION = 1 ;

use strict ;

use VCP::Debug qw( :debug );
use VCP::Utils qw( shell_quote );
use VCP::Filter;
use Regexp::Shellish qw( compile_shellish );

use base qw( VCP::Filter );

use fields (
   'MAP_RULES',   ## The rules to apply
);

my @expr_order = qw( name branch_id );

sub _parse_expr {
   my ( $type, $v ) = @_;

   if ( $type eq "result" ) {
      return ( delete => 1 ) if $v eq "<<delete>>";
      return ( keep   => 1 ) if $v eq "<<keep>>";
   }

   return () unless length $v;

   my %exprs;

   @exprs{@expr_order} = $v =~ m{
      \A
      (?:(
         (?: \\. | [^<\\] )+    ## name
      ))?
      (?:
         <(
            .*                  ## branch_id
         )>
      )?
      \z
   }x;

   die "vcp: unable to parse map $type '$v'\n"
      unless grep defined, values %exprs;

   for ( @expr_order ) {
      next unless defined $exprs{$_};

      die "newline in the $_ expression '$exprs{$_}' of map $type '$v'\n"
         if $exprs{$_} =~ tr/\n//;

      die "unescaped '$1' in the $_ expression '$exprs{$_}' of map $type '$v'\n"
         if $exprs{$_} =~ ( $type eq "pattern" ?
            qr{(?<!\\)(?:\\\\)*([\@#<>\[\]{}\$])} :
            qr{(?<!\\)(?:\\\\)*([\@#<>\[\]{}*?()]|\.\.\.)}
         );

      ## Reject a backslash that escapes anything other than a reserved
      ## character, a wildcard character, or the "..." trigraph.
      die "illegal escape sequence '$1' in the $_ expression '$exprs{$_}' of map $type '$v'\n"
         if $exprs{$_} =~ qr{(?<!\\)(?:\\\\)*(\\(?!\.\.\.)[^\@#<>\[\]{}\$*?()])};
   }

   return %exprs;
}

sub _compile_rule {
   my ( $pattern, $result ) = @_;

   my %pattern_exprs = _parse_expr pattern => $pattern;
   my %result_exprs  = _parse_expr result  => $result;

   ## The test expression is a single regexp that matches a string
   ## built up from some pieces of the rev metadata.  Right now, only
   ## the name and the branch_id are tested, but someday the labels,
   ## change_id, rev_id, and comment could be tested.  If so, the
   ## comment field would need to come last due to newline issues.
   my $test_expr = ! keys %pattern_exprs
      ? 1
      : join( "",
         "/\\A",
         join( "\\n",   ## Newlines are forbidden in all expressions.
            map defined $_
               ? do {
                  my $re = compile_shellish( $_, { anchors => 0 } );
                  $re =~ s{/}{\\/}g;
                  ## Strip the (?-xism:...) or (?^:...) wrapper for
                  ## readability of dumped code.
                  $re =~ s{\A\(\?[\^\w-]*:(.*)\)}{$1}g;
                  $re;
               }
               : ".*",
               @pattern_exprs{qw( name branch_id )}
         ),
         "\\z/",
      );

   my $result_statement = $result_exprs{keep}
      ? "return \$self->dest->handle_rev( \$rev );\n"
      : $result_exprs{delete}
         ? "return; ## Deleted!\n"
         : join( "",
            map(
               defined $result_exprs{$_}
                  ? do {
                     my $expr = $result_exprs{$_};
                     $expr =~ s{([\\"])}{\\$1}g;
                     $expr =~ s{\n}{\\n}g;
                     qq{\$rev->$_( "$expr" );\n};
                  }
                  : (),
               @expr_order
            )
         ) . "return \$self->dest->handle_rev( \$rev );\n";

   $result_statement =~ s/^/   /gm;

   "if ( $test_expr ) {\n$result_statement}\n";
}

sub _compile_rules {
   my VCP::Filter::map $self = shift;

   my $fields = join ", ", map qq{\$rev->$_ || ""}, @expr_order;
   my $preamble = <<END_PREAMBLE;
my ( \$rev ) = \@_;
local \$_ = join "\\n", $fields;
END_PREAMBLE

   $preamble .= qq{my \$s = \$_; \$s =~ s/\\n/\\\\n/g; VCP::Debug::debug( "vcp: map testing '\$s'" );\n\n}
      if explicitly_debugging $self;

   # Rules get evaluated in reverse order so that the last matching rule
   # in the list wins; the default rule is placed first in the list so
   # that it is tested last.
   my $code = join "", $preamble,
      map _compile_rule( @$_ ), reverse( [ "", "<<keep>>" ], @_ );
   $code =~ s/^/   /mg;

   # NOTE: the sub is a closure and encloses our $self
   $code = "sub {\n$code}";

   debug "vcp: map code:\n$code" if explicitly_debugging $self;

   ## On a compile error, die with the message and a line-numbered
   ## listing of the generated code.
   return( eval $code
      or die "vcp: $@ compiling\n", do {
         my $w = length( $code =~ tr/\n// + 1 );
         my $ln;
         1 while chomp $code;
         $code =~ s{^}[sprintf "%${w}d|", ++$ln]gme;
         "$code\n";
      },
   );
}

sub new {
   my $class = shift ;
   $class = ref $class || $class ;

   my $self = $class->SUPER::new( @_ ) ;

   ## Parse the options
   my ( $spec, $options ) = @_ ;

   # Add the default rule.
   unshift @$options, ( "(...)", "\$1" );

   my $pattern;
   while ( @$options ) {
      my $v = shift @$options;
      last if $v eq "--";

      if ( ! defined $pattern ) {
         ## Validate the pattern (_parse_expr() dies on bad input) and
         ## keep it raw; _compile_rule() parses and compiles it later.
         _parse_expr pattern => $v;
         $pattern = $v;
      }
      else {
         ## Validate the result before any de-escaping.
         _parse_expr result => $v;

         ## De-escape some escaped chars, keeping any escaped backslash
         ## pairs in front of them.  Leave @ and $ escaped because the
         ## result is interpreted in doublequotish context.
         $v =~ s{(?<!\\)((?:\\\\)*)\\(\.\.\.|[#<>\[\]*?()])}{$1$2}g;
         my $result = $v;
         push @{$self->{MAP_RULES}}, [ $pattern, $result ];
         $pattern = undef;
      }
   }

   if ( debugging $self ) {
      require Data::Dumper;
      debug( Data::Dumper->Dump( [ $self->{MAP_RULES} ], [ "MAP_RULES" ] ) );
   }

   if ( defined $pattern ) {
      my @out = map [ map shell_quote( $_ ), @$_ ], @{$self->{MAP_RULES}};
      shift @out; # hide the default rule.

      my $pw = length "Pattern";
      $pw = $_ > $pw ? $_ : $pw for map length $_->[0], @out;
      my $rw = length "Result";
      $rw = $_ > $rw ? $_ : $rw for map length $_->[1], @out;

      die "Odd number of values in map:\n\n",
         sprintf( "# %-${pw}s %s\n", "Pattern", "Result" ),
         sprintf( "# %-${pw}s %s\n", "=" x $pw, "=" x $rw ),
         map( sprintf( "  %-${pw}s %s\n", @$_ ), @out ),
         sprintf( "  %-${pw}s %s\n", shell_quote( $pattern ), "" ),
         "\n";
   }

   return $self ;
}

sub handle_rev {
   my VCP::Filter::map $self = shift;

   ## NOTE: MAP_RULES is not consulted here yet; every revision is
   ## currently passed through to the destination unchanged.
   $self->dest->handle_rev( @_ );
}

=head1 LIMITATIONS

There is no way (yet) of telling the mapper to continue processing the
rules list.  We could implement labels like C<<< <<I<label>>> >>> to be
allowed before pattern expressions (but not between pattern and
result), and we could then implement C<<< <<goto I<label>>> >>>.  And a
C<< <<next>> >> could be used to fall through to the next label.  All of
which is wonderful, but I want to gain some real world experience with
the current system and find a use case for gotos and fallthroughs
before I implement them.  This comment is here to solicit feedback :).

=head1 AUTHOR

Barrie Slaymaker <barries@slaysys.com>

=head1 COPYRIGHT

Copyright (c) 2000, 2001, 2002 Perforce Software, Inc.
All rights reserved.

See L<VCP::License|VCP::License> (C<vcp help license>) for the terms of use.

=cut

1
| # | Change | User | Description |
|---|---|---|---|
| #20 | 5404 | Barrie Slaymaker | SVN support added. Makefile gives clearer notices about missing optional prereqs. VCP::Filter::labelmap and VCP::Filter::map: <<skip>> replaces deprecated <<delete>> to be clearer that no revisions are deleted from either repository but some just are skipped and not inserted. VCP::Filter::map: support added for SVN-like branch labels. VCP::Source: support added for ISO8601 timestamps emitted by SVN. |
| #19 | 4487 | Barrie Slaymaker | dead code removal (thanks to clkao's coverage report) |
| #18 | 4483 | Barrie Slaymaker | calls to skip_rev() are summarized to STDOUT |
| #17 | 4481 | Barrie Slaymaker | VCP::Filter::map calls skip_rev when deleting a rev (spotted by clkao) |
| #16 | 4021 | Barrie Slaymaker | Remove all phashes and all base & fields pragmas. Work around SWASHGET error. |
| #15 | 4012 | Barrie Slaymaker | Remove dependance on pseudohashes (deprecated Perl feature) |
| #14 | 3491 | Barrie Slaymaker | All sections are now documented in generated config files |
| #13 | 3415 | John Fetkovich | documentation cleaned up |
| #12 | 3155 | Barrie Slaymaker | Convert to logging using VCP::Logger to reduce stdout/err spew. Simplify & speed up debugging quite a bit. Provide more verbose information in logs. Print to STDERR progress reports to keep users from wondering what's going on. Breaks test; halfway through upgrading run3() to an inline function for speed and for VCP specific features. |
| #11 | 3091 | Barrie Slaymaker | Factor out rules list parsing; it's useful elsewhere and should not have been copy & edited in to two files in the first place. |
| #10 | 3089 | Barrie Slaymaker | Fix minor bug in error reporting. |
| #9 | 3028 | Barrie Slaymaker | Fix quoting issue in debugging statement |
| #8 | 3026 | Barrie Slaymaker | Code and docs cleanup |
| #7 | 3015 | Barrie Slaymaker | Finally find & fix a stray "not an ARRAY reference" in perl's cleanup code (it's probably mishandling a closure) |
| #6 | 2867 | John Fetkovich | debugging output change |
| #5 | 2317 | Barrie Slaymaker | Get map working on revml->revml |
| #4 | 2316 | Barrie Slaymaker | intermediate checkin |
| #3 | 2315 | Barrie Slaymaker | update docs, implement a bit more mapping code |
| #2 | 2307 | Barrie Slaymaker | get VCP::Filter::map working, update docs |
| #1 | 2305 | Barrie Slaymaker | This too |