VCP::Process - How vcp works
vcp is designed to be a general purpose repository import/export tool. This document describes some of the techniques used to keep vcp general purpose.
vcp works in several phases:
1. Metadata Scanning
vcp must take the source repository spec, something like cvs:module/dir/... and use the appropriate repository interface (cvs log in this case) to extract the metadata.
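As a rough illustration of the first step, the sketch below splits a spec such as cvs:module/dir/... into a scheme and a file specification, then looks up the metadata command for that scheme. The names RepoSpec, parse_spec and METADATA_COMMANDS are hypothetical and are not part of vcp. It handles only the simple form shown above.

```python
from typing import NamedTuple


class RepoSpec(NamedTuple):
    scheme: str    # e.g. "cvs"; selects the repository interface
    filespec: str  # e.g. "module/dir/..."


# Hypothetical lookup table: only the cvs entry comes from the text above.
METADATA_COMMANDS = {"cvs": ["cvs", "log"]}


def parse_spec(spec: str) -> RepoSpec:
    """Split a spec at the first colon (assumed simple 'scheme:filespec' form)."""
    scheme, sep, filespec = spec.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme in repository spec: {spec!r}")
    return RepoSpec(scheme, filespec)


def metadata_command(spec: RepoSpec) -> list:
    """Return the command assumed to extract metadata for this scheme."""
    try:
        return list(METADATA_COMMANDS[spec.scheme])
    except KeyError:
        raise ValueError(f"no repository interface for {spec.scheme!r}")
```

For example, parse_spec("cvs:module/dir/...") yields the scheme "cvs" and the file specification "module/dir/...".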
The metadata is currently kept all in memory; if you run into a repository so big that this is troublesome, do the transfer in phases or pester us to provide a swap file capability for this data.
In the case of a RevML source, it is not practical to scan the input for metadata alone (the RevML may be coming from the standard input, for instance), so all of the files in a RevML source file are extracted during the scanning phase, as mentioned in VCP::Source::revml.
1a. Base revisions and backfilling
vcp therefore needs to be able to recreate the first revision of a text file in an incremental transfer when RevML is in use. This is addressed by a process called "backfilling the base revision".
The "base" revision of a file is the revision that immediately precedes the first revision being transferred. It is also the last revision in the previous transfer and must be the most recent revision (on the appropriate branch) in the destination repository.
vcp "backfills" the base revision by checking it out of the destination repository, then reconstitutes the first revision by applying the (base revision => first revision) delta to the base revision. Each revision in a RevML file contains an MD5 checksum to make sure that all backfilling and patching is implemented accurately.
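The reconstitution step can be sketched roughly as follows. This is an illustration only: the keep/drop/add delta representation and the use of a hexadecimal digest are assumptions made for the example, not RevML's actual delta or digest encoding, and the function names are hypothetical.

```python
import hashlib


def apply_delta(base_lines, delta):
    """Apply a toy delta: a list of ("keep", n), ("drop", n) or ("add", [lines])."""
    out = []
    pos = 0  # current position in the base revision
    for op, arg in delta:
        if op == "keep":
            out.extend(base_lines[pos:pos + arg])
            pos += arg
        elif op == "drop":
            pos += arg
        elif op == "add":
            out.extend(arg)
        else:
            raise ValueError(f"unknown delta operation: {op!r}")
    if pos != len(base_lines):
        raise ValueError("delta does not match the length of the base revision")
    return out


def reconstitute(base_lines, delta, expected_md5):
    """Rebuild the first revision from the backfilled base, then verify it."""
    lines = apply_delta(base_lines, delta)
    actual = hashlib.md5("".join(lines).encode("utf-8")).hexdigest()
    if actual != expected_md5:
        raise ValueError("MD5 mismatch: backfilling or patching went wrong")
    return lines
```

The checksum test is what catches a wrong base revision: if the destination's most recent revision is not the one the delta was computed against, the patched result will not match the recorded digest.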
1b. Selecting
2. Sorting and Change Aggregation
This is primarily used to do change number aggregation when converting from a repository that does not provide change set metadata (like CVS) to one that does (like p4).
This is also important when generating RevML files because the order in which files appear in a log file may depend on exactly when the files were added as well as on their names, at least in the case of CVS. Sorting the revisions yields consistent RevML files, which is important in testing situations.
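A minimal sketch of sorting and change aggregation is shown below. The sort key (time, then name) and the grouping rule (same user, same comment, within a time window, no file repeated in a change) are assumptions chosen for illustration. This document does not specify vcp's actual rules, and the names Rev, sort_revs and aggregate_changes are hypothetical.

```python
from typing import NamedTuple


class Rev(NamedTuple):
    name: str     # file name
    time: int     # commit time in seconds
    user: str
    comment: str


def sort_revs(revs):
    """Order revisions deterministically, regardless of log file order."""
    return sorted(revs, key=lambda r: (r.time, r.name))


def aggregate_changes(revs, window=60):
    """Group sorted revisions into changes (assumed heuristic, see above)."""
    changes = []
    for rev in sort_revs(revs):
        last = changes[-1] if changes else None
        if (
            last is not None
            and last[-1].user == rev.user
            and last[-1].comment == rev.comment
            and rev.time - last[-1].time <= window
            and all(r.name != rev.name for r in last)
        ):
            last.append(rev)
        else:
            changes.append([rev])
    return changes
```

Because the input is sorted before grouping, the same set of revisions produces the same changes in the same order no matter how the source log listed them.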
3. File transfer.
For incremental transfers, an extra step is taken to ensure that no gaps are left between transfers. The base revision is backfilled from the destination repository (using the backfilling process described in phase 1a above) and compared to the base revision from the source repository.
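The gap check amounts to a comparison of two copies of the base revision. The sketch below compares MD5 digests of the two contents. Whether vcp compares digests or full contents is not stated here, so treat the approach and the name check_no_gap as assumptions.

```python
import hashlib


def check_no_gap(name, source_base: bytes, dest_base: bytes) -> None:
    """Raise if the destination's newest revision is not the source's base."""
    source_md5 = hashlib.md5(source_base).hexdigest()
    dest_md5 = hashlib.md5(dest_base).hexdigest()
    if source_md5 != dest_md5:
        raise ValueError(
            f"{name}: base revision differs between source and destination; "
            "a revision may have been skipped or changed"
        )
```

A mismatch means the incremental transfer would be applied on top of the wrong starting point, so it is safer to stop than to continue.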
Currently, vcp shells out to command line tools like cvs and p4. This is a "least common denominator" approach that allows VCP to operate at a safe distance from the underlying implementations. It is also the primary bottleneck in transferring files. We will gladly accept donations of drivers that use direct library interfaces or remote procedure call (SOAP, RMI, etc., etc.) techniques to speed this process up.
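To make the "shelling out" approach concrete, here is a small sketch of running an external tool and capturing its output. The helper name run_tool is hypothetical, and the example is not vcp's own code; it only shows why each call costs a process launch.

```python
import subprocess


def run_tool(argv, cwd=None):
    """Run a command line tool and return its standard output as text.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        argv,
        cwd=cwd,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output to str
        check=True,           # turn a non-zero exit status into an exception
    )
    return result.stdout


# Example (requires cvs to be installed and a working directory set up):
# log_text = run_tool(["cvs", "log", "module/dir"])
```

Each call starts a new process and parses text output, which keeps the caller independent of the tool's internals but is slow when repeated for every file.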
Barrie Slaymaker <barries@slaysys.com>
Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights reserved.
See VCP::License (vcp help license) for the terms of use.