Source.pm #31 (//guest/perforce_software/revml/lib/VCP/Source.pm)
package VCP::Source ;

=head1 NAME

VCP::Source - A base class for repository sources

=head1 SYNOPSIS

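=head1 DESCRIPTION

A source scans a repository and sends the revisions it finds to a
destination (see L</dest>) by calling the destination's handle_header(),
handle_rev(), and handle_footer() methods.  Repository-specific sources
subclass VCP::Source; see L</SUBCLASSING>.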

=head1 OPTIONS

=over

=item  --bootstrap

  --bootstrap=pattern

Forces all files matching the given pattern (which may use wildcards
like "*", "?", and "...") to have their first revisions transferred as
complete copies instead of deltas.  Several patterns may be given,
separated by commas.  This is useful when you want to transfer a
revision other than the first revision as the first revision in the
target repository.  It is also useful when you want to skip some
revisions in the target repository (although the L<Map
filter|VCP::Filter::map> has superseded this use).

=item --continue

Tells VCP to continue where it left off last time.  This will detect
updates to already transferred revisions, but not new branches of them
(this limitation should be lifted, but doing so requires an expensive
rescan of metadata).

=back
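
The wildcard matching described for --bootstrap can be pictured with the
following standalone sketch.  It is not VCP's implementation: patterns are
really compiled by compile_path_re(), which is not shown here, and
wildcard_to_re() is a hypothetical helper.  It assumes p4-style semantics,
where "..." matches across directories while "*" and "?" match only within
a single path segment.

    use strict;
    use warnings;

    # Hypothetical stand-in for compile_path_re(); assumes "..." matches
    # across directories while "*" and "?" stay within one path segment.
    sub wildcard_to_re {
        my ( $pattern ) = @_;
        my $re = join '', map {
              $_ eq '...' ? '.*'
            : $_ eq '*'   ? '[^/]*'
            : $_ eq '?'   ? '[^/]'
            :               quotemeta $_
        } split /(\.\.\.|\*|\?)/, $pattern;
        return qr/\A$re\z/;    # anchored: the whole path must match
    }

    my $mode = "//depot/foo/a/b.c" =~ wildcard_to_re( "//depot/foo/..." )
        ? "bootstrap" : "delta";
    print "$mode\n";    # prints "bootstrap"

Because this sketch anchors each pattern at both ends, "//depot/foo/..."
does not match "//depot/foox", and "*.h" does not match "a/b.h".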

=cut

$VERSION = 0.1 ;

use strict ;

use VCP::Debug qw( :debug ) ;
use VCP::Logger qw( lg BUG );

use base 'VCP::Driver' ;

use fields (
   'BOOTSTRAP',         ## The raw option so we can regurgitate it
   'BOOTSTRAP_REGEXPS', ## Determines what files are in bootstrap mode.
   'DEST',
   'CONTINUE',          ## Set if we're resuming from the prior
                        ## copy operation, if there is one.  This causes
                        ## us to determine a minimum rev by asking the
                        ## destination what it's seen on a given filebranch
   'SENT_REV_COUNT',    ## Number of revs sent

   ## Turns out that most real repositories (ie not RevML, at least)
   ## are most easily scanned in reverse chronological order.  Keeping
   ## the last revision or the last revision by filebranch is handy in
   ## these cases.
   'LAST_REV',               ## The rev that was last sent or queued
   'LAST_REV_BY_FILEBRANCH', ## The last rev sent or queued on each filebranch
   'SEEN_IDS',               ## IDs of all revs sent or queued so far
   'SEND_REV_WHEN_PREVIOUS_ID_SET', ## Revs to send once their previous_id is set
) ;


sub init {
   my VCP::Source $self = shift;
   $self->bootstrap( $self->{BOOTSTRAP} );
   $self->{SENT_REV_COUNT} = 0;
   $self->{SEND_REV_WHEN_PREVIOUS_ID_SET} = {};
}


###############################################################################

=head1 SUBCLASSING

This class uses the fields pragma, so you'll need to use base and 
possibly fields in any subclasses.  See L<VCP::Plugin> for methods
often needed in subclasses.

=head2 Subclass utility API

=over

=item options_spec

Adds the common VCP::Source options (--bootstrap or -b, --continue, and
--rev-root) to whatever options VCP::Plugin parses.
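
The spec strings returned here follow Getopt::Long syntax.  Assuming
VCP::Plugin hands these pairs to Getopt::Long (its parsing code is not
shown in this file), the following standalone sketch shows where each
option ends up; the %self hash is a hypothetical stand-in for the
object's fields.

    use strict;
    use warnings;
    use Getopt::Long qw( GetOptionsFromArray );

    my %self;    # hypothetical stand-in for the VCP::Source object's fields
    my @args = ( "-b", "//depot/...", "--continue" );

    GetOptionsFromArray(
        \@args,
        "bootstrap|b=s" => \$self{BOOTSTRAP},
        "continue"      => \$self{CONTINUE},
        "rev-root=s"    => \$self{REV_ROOT},
    ) or die "invalid options\n";

    # BOOTSTRAP is now "//depot/...", CONTINUE is 1, REV_ROOT stays undef,
    # and the parsed options have been removed from @args.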

=cut

sub options_spec {
   my VCP::Source $self = shift;
   return (
      $self->SUPER::options_spec,
      "bootstrap|b=s"    => \$self->{BOOTSTRAP},
      "continue"         => \$self->{CONTINUE},
      "rev-root=s"       => \$self->{REV_ROOT},
   );
}

=item dest

Sets/Gets a reference to the VCP::Dest object.  The source uses this to
call the dest's handle_header(), handle_rev(), and handle_footer() methods.
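
The order of those calls can be sketched with two hypothetical stand-in
classes, Mock::Source and Mock::Dest, which are not VCP classes.  The
run() method is a simplification of what a real source does while
scanning.

    use strict;
    use warnings;

    package Mock::Dest;    # hypothetical: records the calls it receives
    sub new           { bless { calls => [] }, shift }
    sub handle_header { push @{ $_[0]{calls} }, "header" }
    sub handle_rev    { push @{ $_[0]{calls} }, "rev " . $_[1]{id} }
    sub handle_footer { push @{ $_[0]{calls} }, "footer" }

    package Mock::Source;  # hypothetical: a source reduced to its calls
    sub new  { bless {}, shift }
    sub dest { my $self = shift; $self->{DEST} = shift if @_; $self->{DEST} }
    sub run {
        my ( $self, @revs ) = @_;
        my $dest = $self->dest or return;
        $dest->handle_header( {} );
        $dest->handle_rev( $_ ) for @revs;
        $dest->handle_footer( {} );
    }

    package main;
    my $source = Mock::Source->new;
    $source->dest( Mock::Dest->new );
    $source->run( { id => "foo#1" }, { id => "foo#2" } );
    print join( ", ", @{ $source->dest->{calls} } ), "\n";
    # prints "header, rev foo#1, rev foo#2, footer"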

=cut

sub dest {
   my VCP::Source $self = shift ;

   $self->{DEST} = shift if @_ ;
   return $self->{DEST} ;
}


=item continue

Sets/Gets the CONTINUE field (which the user sets via the --continue flag).

=cut

sub continue {
   my VCP::Source $self = shift ;

   $self->{CONTINUE} = shift if @_ ;
   return $self->{CONTINUE} ;
}


=item real_source

Returns the reference to be used when sending revisions to the destination.

Each revision has a pointer to the source that sends it so that filters
and destinations can call get_source_file().

Most sources return $self.  Sources that spool data, such as
VCP::Source::metadb, need to specify a real source, which they do by
overriding this method.  VCP::Source::revml does not do this, as it
supplies its own get_source_file().
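
A minimal sketch of such an override, using hypothetical classes: the
spooling subclass keeps the source it spooled from in an assumed ORIGINAL
field and returns it, so downstream code would call get_source_file() on
an object that can actually fetch the files.

    use strict;
    use warnings;

    package Mock::Source;             # hypothetical base class
    sub new         { my ( $class, %args ) = @_; bless { %args }, $class }
    sub real_source { return shift }  # default: the source itself

    package Mock::Source::spool;      # hypothetical spooling source
    our @ISA = ( 'Mock::Source' );
    # ORIGINAL (an assumed field) holds the source the data was spooled from.
    sub real_source { return shift->{ORIGINAL} }

    package main;
    my $orig  = Mock::Source->new( name => "p4" );
    my $spool = Mock::Source::spool->new( ORIGINAL => $orig );
    my $name  = $spool->real_source->{name};
    print "$name\n";    # prints "p4"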

=cut

sub real_source {
    return shift;
}

=item send_rev

    $self->send_rev( $r );

As revisions are scanned, the source sends them downstream to the dest
using this method.  Sources should not retain references to revisions;
they should copy them if needed or, better yet, copy *just* the metadata
they need as they need it.  This is required so that filters may alter
the revisions without affecting the source's logic.

This updates last_rev and last_rev_for_filebranch.
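
The difference between retaining a reference and copying just the needed
metadata can be sketched with plain hashes standing in for revisions (an
assumption; real revisions are VCP::Rev objects with accessors) and a
hypothetical send_and_remember() in place of send_rev().

    use strict;
    use warnings;

    my %last_id_by_branch;    # source-side state: copied scalars, no refs

    # Hypothetical stand-in for send_rev() followed by a downstream filter.
    sub send_and_remember {
        my ( $r, $filter ) = @_;
        $last_id_by_branch{ $r->{branch} } = $r->{id};    # copy the id only
        $filter->( $r );                                  # filter may edit $r
    }

    my $rename = sub { $_[0]{id} = "renamed" };
    send_and_remember( { id => "foo#1", branch => "main" }, $rename );
    print "$last_id_by_branch{main}\n";    # prints "foo#1"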

=cut

sub _send_rev {
   my VCP::Source $self = shift ;
   my ( $r ) = @_;

   debug "_send_rev: ", $r->id
      if debugging;

   $r->set_source( $self->real_source );
   ++$self->{SENT_REV_COUNT};
   delete $self->{SEND_REV_WHEN_PREVIOUS_ID_SET}->{int $r};
   $self->dest->handle_rev( $r ) if $self->dest;
}

sub send_rev {
   my VCP::Source $self = shift ;
   my ( $r ) = @_;

   debug "send_rev: ", $r->id
      if debugging;

   $self->{LAST_REV} = $r;
   $self->{LAST_REV_BY_FILEBRANCH}->{$r->source_filebranch_id} = $r;
   $self->{SEEN_IDS}->{$r->id} = undef;
   $self->_send_rev( $r );
}


=item queue_rev

Some revs can't be sent immediately, so they get queued; send_revs()
sends them later.

This updates last_rev and last_rev_for_filebranch.

=cut

sub queue_rev {
   my VCP::Source $self = shift ;
   my ( $r ) = @_;

   debug "queue_rev: ", $r->id
      if debugging;

   $self->{LAST_REV} = $r;
   $self->{LAST_REV_BY_FILEBRANCH}->{$r->source_filebranch_id} = $r;
   $self->{SEEN_IDS}->{$r->id} = undef;

   $self->revs->add( $r );
}


=item queue_rev_until_previous_id_set

Some revs can't be sent immediately.  They get queued until their
previous_id is set by set_last_rev_in_filebranch_previous_id().

This updates last_rev and last_rev_for_filebranch.

=cut

sub queue_rev_until_previous_id_set {
   my VCP::Source $self = shift ;
   my ( $r ) = @_;

   debug "queue_rev_until_previous_id_set: ", $r->id
      if debugging;
   $self->{LAST_REV} = $r;
   $self->{LAST_REV_BY_FILEBRANCH}->{$r->source_filebranch_id} = $r;
   $self->{SEEN_IDS}->{$r->id} = undef;
   $self->{SEND_REV_WHEN_PREVIOUS_ID_SET}->{int $r} = $r;
}


=item queued_rev

    $self->queued_rev( $id );

Returns a queued rev by id.

Sources where revs can arrive in any order, like VCP::Source::revml,
queue up all revs and need random access to them.

=cut

sub queued_rev {
   my VCP::Source $self = shift ;
   return $self->revs->get( @_ );
}

=item last_rev

Returns the last revision sent or queued.

=cut

sub last_rev {
   my VCP::Source $self = shift ;
   my ( $r ) = @_;

   return $self->{LAST_REV};
}


=item queued_revs

Returns a list of all queued revs.  Does not remove them from the queue.

=cut

sub queued_revs {
   my VCP::Source $self = shift;
   return $self->revs->get;
}


=item last_rev_for_filebranch

    $self->last_rev_for_filebranch( $filebranch_id );

Returns the last revision sent or queued on the indicated filebranch.

=cut

sub last_rev_for_filebranch {
   my VCP::Source $self = shift ;
   my ( $filebranch_id ) = @_;

   return $self->{LAST_REV_BY_FILEBRANCH}->{$filebranch_id};
}


=item set_last_rev_in_filebranch_previous_id

    $self->set_last_rev_in_filebranch_previous_id( $r );

If there is a last_rev_for_filebranch() for $r->source_filebranch_id,
sets its previous_id to $r->id and, if that rev was queued with
queue_rev_until_previous_id_set(), sends it.  This is useful for sources
which scan in most-recent-first order.
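
For a source that scans newest-first, queue_rev_until_previous_id_set()
and this method cooperate roughly as in the following model.  It is a
simplified, hypothetical sketch: revisions are plain hashes, "sending"
just records an id, the waiting set is keyed by id rather than by int $r,
and the scan order (link the newer rev, then queue the current one) is an
assumption about how a subclass might call these methods.

    use strict;
    use warnings;

    # Hypothetical model: revs are plain hashes and "sending" records an id.
    my ( %last_by_fb, %waiting, @sent );

    # Models queue_rev_until_previous_id_set(): remember $r and hold it.
    sub queue_until_previous_id_set {
        my ( $r ) = @_;
        $last_by_fb{ $r->{fb} } = $r;
        $waiting{ $r->{id} }    = $r;
    }

    # Models set_last_rev_in_filebranch_previous_id(): point the newer rev
    # at $r and send it if it was waiting.
    sub link_newer_rev_to {
        my ( $r ) = @_;
        my $child = $last_by_fb{ $r->{fb} } or return;
        $child->{previous_id} = $r->{id};
        push @sent, $child->{id} if delete $waiting{ $child->{id} };
    }

    # Newest-first scan of one filebranch (assumed calling order).
    for my $id ( "foo#3", "foo#2", "foo#1" ) {
        my $r = { id => $id, fb => "foo" };
        link_newer_rev_to( $r );
        queue_until_previous_id_set( $r );
    }
    push @sent, sort keys %waiting;    # flush leftovers, as send_revs() does
    print "@sent\n";                   # prints "foo#3 foo#2 foo#1"

In this model each rev is sent as soon as the next older rev on its
filebranch has been scanned, so at most one rev per filebranch waits at a
time; whatever still waits at the end is flushed.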

=cut


sub set_last_rev_in_filebranch_previous_id {
   my VCP::Source $self = shift ;
   my ( $r ) = @_;

   my $child_rev = $self->last_rev_for_filebranch( $r->source_filebranch_id );
   if ( $child_rev ) {
      debug "setting ", $child_rev->id, "->previous_id to ", $r->id
         if debugging;

      $child_rev->previous_id( $r->id );
      $self->_send_rev( $child_rev )
         if $self->{SEND_REV_WHEN_PREVIOUS_ID_SET}->{int $child_rev};
   }
}


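=item send_rev_because_previous_id_was_set

    $self->send_rev_because_previous_id_was_set( $r );

Sends $r downstream like send_rev(), but without updating last_rev,
last_rev_for_filebranch, or the list of seen ids.
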
=cut

sub send_rev_because_previous_id_was_set {
   my $self = shift;
   my ( $r ) = @_;
   $self->_send_rev( $r );
}


=item last_revs_for_all_filebranches

    $self->last_revs_for_all_filebranches;

Returns the last revision sent or queued on every filebranch.

=cut

sub last_revs_for_all_filebranches {
   my VCP::Source $self = shift ;

   return values %{$self->{LAST_REV_BY_FILEBRANCH}};
}


=item id_seen

    $self->id_seen( $id );

Returns true if the indicated id was sent or queued.
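
SEEN_IDS stores undef as each value, so the membership test has to use
exists rather than the value's truth.  A standalone illustration:

    use strict;
    use warnings;

    my %seen_ids;
    $seen_ids{"foo#1"} = undef;    # what send_rev() and queue_rev() store

    my $by_value  = $seen_ids{"foo#1"}        ? 1 : 0;    # 0: value is undef
    my $by_exists = exists $seen_ids{"foo#1"} ? 1 : 0;    # 1: key is present
    print "$by_value $by_exists\n";                        # prints "0 1"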

=cut

sub id_seen {
   my VCP::Source $self = shift;
   my ( $id ) = @_;
   return exists $self->{SEEN_IDS}->{$id};
}




=item sent_rev_count

Returns (does not set) the number of revs sent so far.

=cut

sub sent_rev_count {
    my VCP::Source $self = shift;
    return $self->{SENT_REV_COUNT};
}


=item send_revs

    $self->send_revs;

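Removes and sends all revs queued so far (or, if an array ref of revs is
passed in, those revs instead), then sends any revs still waiting for
their previous_id to be set.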
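
The loop below sets each array slot to undef right after sending it,
presumably so that a revision can be freed as soon as nothing downstream
still holds it, rather than when the whole batch is done.  The following
standalone sketch, with a hypothetical Mock::Rev class that counts live
objects, shows the effect.

    use strict;
    use warnings;

    package Mock::Rev;    # hypothetical: counts live instances
    my $alive = 0;
    sub new     { my ( $class, $id ) = @_; $alive++; bless { id => $id }, $class }
    sub DESTROY { $alive-- }

    package main;
    my $revs = [ map { Mock::Rev->new( "foo#$_" ) } 1 .. 3 ];
    my @alive_after;
    for my $i ( 0 .. $#$revs ) {
        # ... send $revs->[$i] downstream here ...
        $revs->[$i] = undef;    # drop the batch's reference right away
        push @alive_after, $alive;
    }
    print "@alive_after\n";    # prints "2 1 0"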

=cut

sub send_revs {
   my VCP::Source $self = shift ;
   my ( $revs ) = @_;

   debug "sending revs" if debugging;

   $revs ||= $self->revs->remove_all;
   ## Oddly, we can't show the progress bar here because filters in the
   ## chain may accumulate revisions and sort them, so this is not a good
   ## metric.
   for my $i ( 0 .. $#$revs ) {
      $self->send_rev( $revs->[$i] );
      $revs->[$i] = undef;
   }
   for my $r ( values %{$self->{SEND_REV_WHEN_PREVIOUS_ID_SET}} ) {
      $self->send_rev( $r );
   }
   $self->{SEND_REV_WHEN_PREVIOUS_ID_SET} = {};
}

=back

=head1 SUBCLASS OVERLOADS

These methods should be overridden in any subclasses.

=over

=cut

sub copy_revs {  ## TODO: delete this (DEPRECATED)
   my VCP::Source $self = shift ;
   my ( $revs ) = @_;
   $self->send_revs;
}


=item get_source_file

All sources must provide a way for the destination to fetch a revision.

=cut

sub get_source_file {
    my VCP::Source $self = shift;
    die $self, " does not overload get_source_file()\n";
}


=item handle_header

REQUIRED OVERLOAD.

Subclasses must add all repository-specific info to the $header,
including at least rep_type and rep_desc:

   $header->{rep_type} = 'p4';
   $self->p4( ['info'], \$header->{rep_desc} ) ;

The subclass must pass the $header on to the dest:

   $self->dest->handle_header( $header )
      if $self->dest;

This may be called when dest is undefined, to let the source initialize
itself when it won't be scanning the real source, so the
"if $self->dest" test is important.

That's not the case for copy_revs().
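
A self-contained sketch of a subclass's handle_header().  Mock::Dest,
Mock::Source::foo, and the rep_type and rep_desc values are hypothetical;
a real p4 source would fill rep_desc from p4 info as shown above.

    use strict;
    use warnings;

    package Mock::Dest;          # hypothetical: just keeps the header
    sub new           { bless {}, shift }
    sub handle_header { $_[0]{header} = $_[1] }

    package Mock::Source::foo;   # hypothetical repository source
    sub new  { my ( $class, $dest ) = @_; bless { DEST => $dest }, $class }
    sub dest { $_[0]{DEST} }
    sub handle_header {
        my ( $self, $header ) = @_;
        $header->{rep_type} = 'foo';
        $header->{rep_desc} = 'description of the foo repository';
        $self->dest->handle_header( $header )
            if $self->dest;    # dest may be undef when only initializing
    }

    package main;
    my $dest = Mock::Dest->new;
    Mock::Source::foo->new( $dest )->handle_header( {} );
    Mock::Source::foo->new( undef )->handle_header( {} );    # no dest, no error
    my $type = $dest->{header}{rep_type};
    print "$type\n";    # prints "foo"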

=cut

sub handle_header {
   my VCP::Source $self = shift ;

#   my ( $header ) = @_ ;

   BUG "ERROR: handle_header not overloaded by class '", ref $self, "'.  Oops.\n";
#      if $self->can( 'handle_header' ) eq \&handle_header ;

#   $self->dest->handle_header( $header ) ;
}


=item handle_footer

Not a required overload, as the footer carries no useful information at
this time.  Overriding methods must call this method to pass the
$footer on:

   $self->SUPER::handle_footer( $footer ) ;

=cut

sub handle_footer {
   my VCP::Source $self = shift ;

   my ( $footer ) = @_ ;

   $self->dest->handle_footer( $footer )
      if $self->dest;
}


=item parse_time

   $time = $self->parse_time( $timestr ) ;

Parses "[cc]YY/MM/DD[ HH[:MM[:SS]]]", and also recognizes "MM/DD/YY" and
"MM/DD/YYYY" dates.  Any non-digit may separate the fields.  HH, MM,
and SS are assumed to be 0 if not present.  (Support for format strings
may be added in the future.)

Returns a time suitable for feeding to localtime or gmtime.

Assumes local system time, so it is no good for parsing times in revml;
since that is rarely needed, it is handled in VCP::Source::revml instead.
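
The parsing steps can be reproduced in a standalone sketch;
parse_local_time() is a hypothetical name, and unlike parse_time() it
reports errors with die instead of BUG.  The do block ends with the
computed value rather than a return statement, because return would leave
the subroutine before ||= stores the result in the cache.

    use strict;
    use warnings;
    use Time::Local qw( timelocal );

    my %cache;

    sub parse_local_time {    # hypothetical standalone version
        my ( $timestr ) = @_;
        return $cache{$timestr} ||= do {
            die "Malformed time value $timestr\n"
                unless $timestr =~ /^(\d\d)?\d?\d(\D\d?\d){2,5}/;
            my @f = split /\D/, $timestr;
            if (    length $f[0] <= 2
                 && $f[0] <= 12
                 && (    length $f[2] == 4
                      || $f[2] > 12
                      || substr( $f[2], 0, 1 ) eq "0" ) )
            {
                @f[ 0 .. 2 ] = @f[ 2, 0, 1 ];    # MM/DD/YY(YY) -> YY(YY)/MM/DD
            }
            --$f[1];                        # months run 0..11
            push @f, ( 0 ) x ( 6 - @f );    # missing HH, MM, SS become 0
            timelocal( reverse @f );        # value of the do block
        };
    }

    my $t = parse_local_time( "06/15/2003" );    # same value as "2003/06/15"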

=cut

{
    ## This routine is slow and gets called a *lot* with duplicate
    ## inputs, at least by VCP::Source::cvs, so we memoize it.
    my %cache;

    sub parse_time {
       my VCP::Source $self = shift ;
       my ( $timestr ) = @_ ;

       return $cache{$timestr} ||= do {
           ## TODO: Get parser context here & give file, line, and column.
           ## filename and rev too, while we're scheduling more work for
           ## the future.
           BUG "Malformed time value $timestr\n"
              unless $timestr =~ /^(\d\d)?\d?\d(\D\d?\d){2,5}/ ;
           my @f = split( /\D/, $timestr ) ;
           if (
              length $f[0] <= 2
              && $f[0] <= 12
              && ( length $f[2] == 4
                 || $f[2] > 12
                 || "0" eq substr( $f[2], 0, 1 )
              )
           ) {
              ## Must be MM/DD/YY, or MM/DD/YYYY.  timelocal() needs
              ## YY(YY)?/MM/DD
              splice @f, 0, 3, ( $f[2], $f[0], $f[1] );
           }

           --$f[1] ; # Month of year needs to be 0..11
           push @f, ( 0 ) x ( 6 - @f ) ;
           require Time::Local;
           my $t = eval { Time::Local::timelocal( reverse @f ) };
           BUG $@ unless defined $t;
           $t;  ## Value of the do block; "return" here would skip the cache
       };
    }
}


=item bootstrap

Sets (and parses) or gets the bootstrap spec.

Call it with a spec to set it:

   $self->bootstrap( $bootstrap_spec ) ;

The spec is a comma-separated list of patterns; see the command line
documentation (and the --bootstrap option above) for their format.

=cut

sub bootstrap {
   my VCP::Source $self = shift ;
   if ( @_ ) {
      my ( $val ) = @_ ;
      $self->{BOOTSTRAP} = $val;
      $self->{BOOTSTRAP_REGEXPS} = [
         defined $val
            ? map $self->compile_path_re( $_ ), split /,+/, $val
            : ()
      ];
   }

   return $self->{BOOTSTRAP};
}


=item is_bootstrap_mode

   ... if $self->is_bootstrap_mode( $file ) ;

Compares the filename passed in against the list of bootstrap regular
expressions set by L</bootstrap>.

The file should be in a format similar to the command line spec for
whatever repository is passed in, and not relative to rev_root, so
"//depot/foo/bar" for p4, or "module/foo/bar" for cvs.

This is typically called in the subclass only after looking at the
revision number to see if it is a first revision (in which case the
subclass should automatically put it in bootstrap mode).
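
bootstrap() and is_bootstrap_mode() amount to splitting the spec on
commas, compiling each piece, and counting how many compiled patterns
match.  In this sketch a hash stands in for the object, and each piece is
compiled as a literal path prefix, a hypothetical simplification of
compile_path_re() that ignores wildcards.

    use strict;
    use warnings;

    my %self;    # stands in for the object's BOOTSTRAP fields

    # Like bootstrap(): keep the raw spec and compile each comma-separated
    # piece.  Hypothetical simplification: each piece is a literal prefix.
    sub set_bootstrap {
        my ( $spec ) = @_;
        $self{BOOTSTRAP}         = $spec;
        $self{BOOTSTRAP_REGEXPS} = [
            defined $spec ? ( map { qr/\A\Q$_\E/ } split /,+/, $spec ) : ()
        ];
    }

    # Like is_bootstrap_mode(): the number of matching patterns (true if any).
    sub in_bootstrap_mode {
        my ( $file ) = @_;
        return scalar grep { $file =~ $_ } @{ $self{BOOTSTRAP_REGEXPS} };
    }

    set_bootstrap( "//depot/foo/,,//depot/bar/" );    # repeated commas are fine
    my $mode = in_bootstrap_mode( "//depot/bar/x.c" ) ? "bootstrap" : "delta";
    print "$mode\n";    # prints "bootstrap"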

=cut

sub is_bootstrap_mode {
   my VCP::Source $self = shift ;
   my ( $file ) = @_ ;

   my $result = grep $file =~ $_, @{$self->{BOOTSTRAP_REGEXPS}} ;

   lg(
      "$file ",
      ( $result ? "=~ " : "!~ " ),
      "[ ", join( ', ', map "qr/$_/", @{$self->{BOOTSTRAP_REGEXPS}} ), " ] (",
      ( $result ? "in " : "not in " ),
      "bootstrap mode)"
   ) if debugging;

   return $result ;
}

=back

=head1 COPYRIGHT

Copyright 2000, Perforce Software, Inc.  All Rights Reserved.

This module and the VCP package are licensed according to the terms given in
the file LICENSE accompanying this distribution, a copy of which is included in
L<vcp>.

=head1 AUTHOR

Barrie Slaymaker <barries@slaysys.com>

=cut

1
# Change User Description Committed
#46 5404 Barrie Slaymaker - SVN support added
- Makefile gives clearer notices about missing optional
  prereqs.
- VCP::Filter::labelmap and VCP::Filter::map: <<skip>> replaces
  deprecated <<delete>> to be clearer that no revisions
  are deleted from either repository but some just are
  skipped and not inserted.
- VCP::Filter::map: support added for SVN-like branch labels
- VCP::Source: support added for ISO8601 timestamps
  emitted by SVN.
#45 5082 Barrie Slaymaker - VCP::Source tells VCP::Rev to uncache the source to allow
  the source instance to be DESTROYed and thus clean up its
  working files.
#44 5078 Barrie Slaymaker - VCP::Source::parse_time() 0s out undefined/missing fields
#43 4500 Barrie Slaymaker - Minor POD cleanup
#42 4497 Barrie Slaymaker - --rev-root documented
       - All destinations handle rev_root defaulting now
#41 4487 Barrie Slaymaker - dead code removal (thanks to clkao's coverage report)
#40 4135 Barrie Slaymaker - Time fields may have trailing AM/PM or A/P without leading whitespace
#39 4134 Barrie Slaymaker - "AM", "PM", "A", and "P" (case insensitive) are now parsed
  properly when parsing time values
#38 4039 Barrie Slaymaker - VCP::Source::scan_metadata() API now in place,
- VCP::Source::copy_revs() is fully deprecated.
#37 4021 Barrie Slaymaker - Remove all phashes and all base & fields pragmas
- Work around SWASHGET error
#36 3982 Barrie Slaymaker - VCP::Source no longer leaks memory by delete()ing from a phash
- VCP::Source::cvs now flushes to disk more often to conserve RAM
#35 3970 Barrie Slaymaker - VCP::Source handles rev queuing, uses disk to reduce RAM
- Lots of other fixes
#34 3922 Barrie Slaymaker - More paranoid parameter checking
#33 3916 Barrie Slaymaker - Reduce memory consumption
#32 3907 Barrie Slaymaker - Debugging cleanups
#31 3898 Barrie Slaymaker - VCP::Source::* --rev-root reinstated
#30 3855 Barrie Slaymaker - vcp scan, filter, transfer basically functional
    - Need more work in re: storage format, etc, but functional
#29 3835 Barrie Slaymaker - VCP::Source supports queuing of revs and facilities for
  sending revs ASAP to conserve memory
#28 3820 Barrie Slaymaker - VCP::Source::revml now uses VCP::Source's queueing methods
    - For maintainability only, does not decrease memory util.
#27 3819 Barrie Slaymaker - Factor send & queueing of revs up into VCP::Source
#26 3811 Barrie Slaymaker - fetch_*() and get_rev() renamed get_source_file()
#25 3806 Barrie Slaymaker - VCP::Source no longer tries to send to a missing dest
#24 3804 Barrie Slaymaker - Refactored to prepare way for reducing memory footprint
#23 3706 Barrie Slaymaker - VCP gives some indication of output progress (need more)
#22 3687 Barrie Slaymaker - Destinations may now use compile_path_re()
#21 3681 Barrie Slaymaker - VCP now scans much more of real_vss_1 and converts it to revml
#20 3679 Barrie Slaymaker - VCP::Source::vss respects --case-sensitive in more places
#19 3677 Barrie Slaymaker - rev_root sanity check is now case insensitive on Win32
- Parens in source filespecs are now treated as regular
  characters, not capture groups
- ** is not treated as '...'
#18 3477 Barrie Slaymaker - Make --rev-root only available in VCP::Source::p4
#17 3460 Barrie Slaymaker - Revamp Plugin/Source/Dest hierarchy to allow for
  regurgitating options into .vcp files
#16 3445 Barrie Slaymaker - Don't misparse YYYY/MM/DD dates as MMMM/DD/YY.
- t/61sort.t no longer blows up due to VCP::Rev's new
  BUG checks.
#15 3443 Barrie Slaymaker - Use BUG instead of Carp::confess
- Recognize MM/DD/YY format dates
#14 3157 Barrie Slaymaker debug conversion to VCP::Logger
#13 3155 Barrie Slaymaker Convert to logging using VCP::Logger to reduce stdout/err spew.
       Simplify & speed up debugging quite a bit.
       Provide more verbose information in logs.
       Print to STDERR progress reports to keep users from wondering
       what's going on.
       Breaks test; halfway through upgrading run3() to an inline
       function for speed and for VCP specific features.
#12 3133 Barrie Slaymaker Make destinations call back to sources to check out files to
       simplify the architecture (is_metadata_only() no longer needed)
       and make it more optimizable (checkouts can be batched).
#11 3131 Barrie Slaymaker Double the speed of the RCS file parser.
       Deprecate VCP::Revs::shift() in favor of remove_all().
#10 2824 John Fetkovich removed CVS_CONTINUE field from Source/cvs.pm, and added
       CONTINUE field and continue accessor to Source.pm.  Moved parsing
       of the --continue option also.
#9 2809 Barrie Slaymaker Implement --repo-id in Plugin.pm, refactor source & dest
       options parsing starting in VCP::Source::cvs (need to
       roll out to other sources and dests), get t/91cvs2revml.t
       passing again (first time in months! branching and
       --continue support works in cvs->foo!).
#8 2453 John Fetkovich removed compilation of revml.
 will be making that a separate executable.
#7 2293 Barrie Slaymaker Update CHANGES, TODO, improve .vcp files, add --init-cvs
#6 2015 Barrie Slaymaker submit changes
#5 1998 Barrie Slaymaker Initial, revml and core VCP support for branches
#4 1809 Barrie Slaymaker VCP::Patch should ignore lineends
#3 628 Barrie Slaymaker Cleaned up POD in bin/vcp, added BSD-style license.
#2 468 Barrie Slaymaker - VCP::Dest::p4 now does change number aggregation based on the
  comment field changing or whenever a new revision of a file with
  unsubmitted changes shows up on the input stream.  Since revisions of
  files are normally sorted in time order, this should work in a number
  of cases.  I'm sure we'll need to generalize it, perhaps with a time
  thresholding function.
- t/90cvs.t now tests cvs->p4 replication.
- VCP::Dest::p4 now doesn't try to `p4 submit` when no changes are
  pending.
- VCP::Rev now prevents the same label from being applied twice to
  a revision.  This was occurring because the "r_1"-style label that
  gets added to a target revision by VCP::Dest::p4 could duplicate
  a label "r_1" that happened to already be on a revision.
- Added t/00rev.t, the beginnings of a test suite for VCP::Rev.
- Tweaked bin/gentrevml to comment revisions with their change number
  instead of using a unique comment for every revision for non-p4
  t/test-*-in-0.revml files.  This was necessary to test cvs->p4
  functionality.
#1 467 Barrie Slaymaker Version 0.01, initial checkin in perforce public depot.