cvs2p4: A toolset for importing CVS into Perforce

Richard Geiger
rmg@perfortify.com
July 24, 2006

==== INTRODUCTION

This small, free set of tools provides a means for importing CVS
modules into Perforce.

It was originally developed for use at Network Appliance in the spring
of 1997, to convert our product source code revision history from CVS
into Perforce. At the time, as an afterthought, I put together a public
distribution, hoping that the work might benefit others.

Since that time there has been a steady flow of users, and an unsteady
flow of improvements aimed at ease of use, performance, accuracy,
capacity, and flexibility. I have since used the tool to perform two
other "real" migrations at other Perforce customers, and am gearing up
for a third, which will be the largest and most intricate I have yet
attempted. I have also tried to support anybody interested in using
cvs2p4, and to provide prompt bug fixes when bugs have been reported.

cvs2p4 was inspired by, and is patterned at a high level after, the
PVCS to Perforce converter available on the Perforce web site.

A conversion consists of the following phases, each of which is
performed by a separate perl script:

  - bin/genmetadata

    Scans the CVS repository, parsing every RCS archive file to
    generate a single metadata file which holds all of the information
    needed by the subsequent phases. Several other files produced by
    this phase provide further information, and are used by later
    steps to convert CVS release tags into Perforce labels.

  - bin/genchanges

    Scans the metadata file produced by the previous phase, to identify
    groups of RCS revisions that comprise Perforce atomic changes, and
    writes a file describing them for use by the next phase.

  - bin/dochanges

    Based on the data produced by the previous phases, generates
    Perforce metadata in Perforce journal format. This metadata refers
    directly to revisions in the original RCS archives in CVS.

    *** No new ,v files are generated by the conversion!
    *** Rather, you run the Perforce server against a copy of (or through a link to) the original CVS repository.

  - bin/dolabels

    Using the data created by previous phases, creates the Perforce
    metadata required to represent CVS tags as Perforce labels.

Essentially, cvs2p4 tries to make the resultant Perforce depot look (as
much as possible) as if the work in CVS had been going on in Perforce.
In particular, it attempts to model changes corresponding to branch
creations as if they had been done with:

    p4 integrate //depot/branchA/... //depot/branchB/...

This is in contrast to the rcstoperf.sh script, which scatters the
"integrates" corresponding to the creation of files on new branches
into many changes (basically, according to when the file was actually
first changed in the new branch).

cvs2p4 also allows you to import only selected branches, and/or to map
some branch other than the CVS trunk to become the new "main" branch in
Perforce. See the notes in the template config file ("test/config") for
more information on these features.

Note: A CVS tagged revision will make it into a Perforce label ONLY
when the revision is in fact present in the converted depot, subject to
the branches selected for import. (See the notes for the "$WANTLINES"
variable in the config file.)

==== MANIFEST

*** Note: You should unpack the archive on the OS you intend to run the
conversion under. I.e., do not expect to be able to unpack the archive
on a Windows machine, and have it run properly on a *nix host.

After unpacking the distribution archive, use the MANIFEST script to
verify that you have all of the pieces.
The output should look something like this:

    $ MANIFEST
    MANIFEST
    README
    NEWS
    config.tmpl
    bin/cvs2p4
    bin/genmetadata
    bin/genchanges
    bin/dochanges
    bin/dolabels
    bin/revmap
    bin/srcdiff
    bin/cvs2p4
    lib/util.pl
    src/rcs-5.7/src/rlog.c.patch
    test/file,v
    test/phone.gif,v
    test/dollar$file,v
    test/space file,v
    test/pound#file,v
    test/percent_%file,v
    test/at@file,v
    test/star*file,v
    test/datefile_readd,v
    test/Attic/datefile,v
    test/config.test
    test/runtest
    test/norm
    test/metadata.good
    test/lines.good
    test/changes.good
    test/p4_changes_-l.good
    test/p4_describe.good
    test/p4_describe-new.good
    test/p4_filesat.good
    test/p4_labels.good
    All ok
    $

==== REQUIREMENTS

This stuff should work on any Unix host that supports:

  - Perl 5.x, with working dbm support (i.e., dbmopen()/dbmclose()
    work). The scripts assume that perl will be found via $PATH. It
    must be a perl5! Some people have reported problems that seem to be
    related to dbm limitations with some perls when converting very
    large repositories. I like implementations based on Berkeley-DB.
    As supplied, you'll need to have the perl DB_File package
    installed.

  - Compiler suite capable of building RCS 5.7 (to make a special
    slightly tweaked version of rlog from a patch supplied with this
    package).

  - Perforce server (p4d) release 2002.1 or later. cvs2p4 has been used
    successfully with Perforce releases up to 2006.1. Later Perforce
    releases may work, but since this script generates journal-format
    metadata directly, it may need to be changed in order to work
    correctly with other Perforce releases. Please see the "PERFORCE
    METADATA DEPENDENCIES" section (below) for further details.

==== WHAT IT DOES

This converter will import a CVS module (or a group of them at once)
into Perforce, preserving the branching structure seen in the RCS ,v
files in the CVS repository, and translating them into Perforce
branches within the depot.
It will only import RCS branches up through the highest numbered
revisions on branches that have branch tags referring to them; thus, it
will not necessarily bring *every* revision in the CVS module into
Perforce, but *will* bring in every revision leading up to the current
revision for every branch it imports. I think this is what most people
will want; if not, hack away.

Like the "rcstoperf.sh" converter available on the Perforce web site,
it applies heuristics to try and identify multiple changes in CVS that
are highly likely to comprise what would be seen as a single change in
Perforce, and makes them appear as a single Perforce change. (The
heuristics are: checked in by the same user, proximal in time, and
bearing an identical log message.)

It deals correctly with files that are dead on the CVS trunk (i.e.,
where the RCS ,v files are in the "Attic/" directory).

The converter attempts to leave converted files in Perforce with a
sensible Perforce file type (see `p4 help filetypes` for a description
of file types in Perforce) after the conversion. However, due to
limitations in RCS's notion of "file type" (the -k options, controlling
keyword expansion), cvs2p4 must currently decide to import all "text"
files as Perforce type "text" (text with no keyword expansion) or
"ktext" (text with keyword expansion). This is controlled by the
"$KTEXT" configuration option, which is on by default.

Also note that binary files will be converted to Perforce type
"binary+D"; the (unusual) "+D" is there because the converter works by
using the existing RCS archive files directly; normally in Perforce,
filetype "binary" implies storage of complete revisions, rather than as
RCS archives. Rest assured that "binary+D" is correct.

The "UI" for the converter is not very slick, but for most people it's
a one-time kind of tool anyway. Feel free to improve it if you are so
inclined.

Please understand that this tool is *not* officially supported by
Perforce.
It is supplied in hopes that somebody will find it useful (Or perhaps
only entertaining :-).

==== ***** Caveat!: *****

As of release 3.0, cvs2p4 employs a modified rlog (see below) in order
to gain better performance. However, in the current implementation, if
any CVS log messages contain lines satisfying either:

    $_ eq "=============================================================================\n"

or

    $_ =~ /^revision\s+([^\s]+)\s*next\s*([^\s]+)?$/

the conversion will fail in unpredictable ways. This limitation will be
lifted in a future release.

==== src/rcs-5.7/src/rlog.c.patch

This release of cvs2p4 relies on a patched version of the RCS "rlog"
command. To use this release, you'll need to

  - Get RCS 5.7 sources (from ftp://ftp.cs.purdue.edu/pub/RCS/);

  - Unpack them, and apply the patch in src/rcs-5.7/src/rlog.c.patch to
    the rlog.c supplied in the RCS distribution;

  - Build them to produce an "rlog" command;

  - Copy the built rlog binary into the .../cvs2p4/bin directory, where
    "bin/genmetadata" will look for it.

I hope to provide pre-built binaries for at least a few popular
platforms in upcoming releases.

==== TESTING

I have included a *very* rudimentary automated test "suite", in the
test/ directory. You can use this to verify that it seems to work in
your environment. To run it:

  1. Edit test/config, and change the lines

       # p4 command location (If other than "/usr/local/bin/p4")
       #
       $P4 = "/usr/local/bin/p4";

       # p4 command location (If other than "/usr/local/bin/p4d")
       #
       $P4D = "/usr/local/bin/p4d";

     to reflect the actual location of your "p4" and "p4d" commands,
     and the server port that you want to be used during the
     conversion. (It must be localhost: with an unused port number;
     this is used only while running the conversion - you can, of
     course, run your production server with the result of the
     conversion using any port you desire).
     Also, verify that the port number in $P4PORT is currently unused
     on the host where you will run the conversion:

       # Perforce server to use during the conversion. Must be
       # "localhost:" and some unused port number. THIS SHOULD NEVER
       # BE POINTED AT A PRODUCTION PERFORCE SERVER INSTANCE!
       #
       $P4PORT = "localhost:1680";

  2. Run the tests with

       test/runtest

     This should run all of the conversion scripts on a test CVS
     module, and then verify a few things by querying the Perforce
     server after the conversion is complete. If everything goes well,
     the end of the output should be

       runtest: ok

In these tests, the converted CVS "module" consists of a very few
files, but it does have a carefully constructed branching structure,
intended to verify that the converter does the right stuff with respect
to branching.

==== USAGE

Once you have got the test running properly, you can turn your
attention to your conversion.

  1. Make a directory to hold all the working files for the conversion,
     and create a config file, starting with test/config as a template:

       $ mkdir convdir; cp test/config convdir

     In general, all of the configuration settings and options for a
     given conversion are specified in the config file. Edit the
     convdir/config file to reflect your locale and intent. (See the
     comments in the config file for descriptions of the settings and
     options.)

  2. Run bin/cvs2p4:

     The script takes a single argument - the name of the directory
     where the "config" file resides. (It will create all intermediate,
     temp, and working files under this directory, which we will refer
     to as the "conversion directory".)

     -OR-:

     bin/cvs2p4 executes each of the four stages of a full conversion,
     in turn. If any stage fails, the conversion will terminate without
     attempting to run remaining stages.

     If you desire, you can run each of the four stages yourself (i.e.,
     without using bin/cvs2p4). The commands used to run them are shown
     below:

  2a.
     Run bin/genmetadata:

     As for each of the four phases, the script takes a single argument
     - the name of the directory where the "config" file resides. (It
     will create all intermediate, temp, and working files under this
     directory, which we will refer to as the "conversion directory".)

       $ bin/genmetadata convdir
       genmetadata: rm -rf convdir/logmsgs.dir convdir/logmsgs.pag ...
       .
       . (filenames of each file in the CVS module, as they are scanned)
       .
       ===== Lines referenced:
       chupa
       curly
       ha       <- a list of branch tags encountered in the scan;
       larry       also saved to convdir/lines.
       shemp
       xxx

     This reads convdir/config to get its marching orders, then scans
     the CVS module(s) for all ,v and Attic/,v files, creating:

       convdir/metadata     <- the extracted RCS/CVS metadata
       convdir/logmsgs.pag  <- An ndbm database
       convdir/logmsgs.dir  <-   of the log messages
       convdir/lines        <- A list of "codelines" (== branch tags)

     At this point, you may want to look at the list of branch tags
     encountered, (which was written to convdir/lines), edit the config
     file, setting $WANTLINES to 1, and filling in the "<&1 | tee OUT

       dochanges> /bin/rm -f convdir/revmap.db ...
       dochanges> /bin/rm -f convdir/depotmap.db ...
       dochanges> /bin/rm -rf p4root && mkdir -p p4root
       dochanges> /bin/mkdir -p /home/rmg/web/richard_geiger/...
       dochanges> /bin/ln -s /home/rmg/web/richard_geiger/...
       ========== change group 1
       ========== change group 2
       ========== change group 3
       .
       .
       .
       ========== change group 17
       ========== change group 18
       dochanges> cd /home/rmg/web/richard_geiger/...
       Recovering from dbmeta...
       dochanges> cd /home/rmg/web/richard_geiger/...
       Dumping to checkpoint...

     When this command finishes, your CVS module has been imported to
     Perforce, in the Perforce server database identified by the
     $P4ROOT configuration variable. The state of the resultant
     database is saved in a checkpoint file named $P4ROOT/checkpoint.
     *** NOTE ***: cvs2p4 does not create new RCS-format archives (,v
     files) under $P4ROOT; rather, it uses the existing RCS archives in
     the CVS tree directly. By default, it does this by making a
     symbolic link named $P4ROOT/depot/IMPORT pointing to the
     $CVS_MODULE tree. If you'd rather have dochanges copy in the CVS
     module for you, set COPYIMPORT in the config file.

  2d. If you want to import CVS tags as Perforce labels, there is an
     additional phase (once again, the single argument is the name of
     the conversion directory where the config file lives):

       $ bin/dolabels convdir
       make label: testlabel
       dolabels> cd p4root && /usr/local/bin/p4d -jr dblbls
       /home/rmg/web/richard_geiger/guest/richard_geiger/utils/cvs2p4_meta/p4root
       Recovering from dblbls...
       dolabels> cd p4root && rm -f checkpoint; /usr/local/bin/p4d -jd checkpoint
       /home/rmg/web/richard_geiger/guest/richard_geiger/utils/cvs2p4_meta/p4root
       Dumping to checkpoint...

     This step adds the symbolic tag information from the CVS archive
     (for "plain", non-branch tags) to the Perforce database identified
     by the $P4ROOT configuration variable. The state of the resultant
     database is saved in a checkpoint file named $P4ROOT/checkpoint.

  3. If you want the RCS revision-to-Perforce change map, run:

       $ bin/revmap convdir

     Or, for the reverse mapping:

       $ bin/revmap -map rrevmap convdir

==== PRESCAN MODE

During the course of a conversion, the bin/genmetadata phase can detect
and report unusual conditions which may indicate "corruption" in the
CVS repository. Typically, you'll want to deal with these prior to
performing the live conversion.

To provide a quicker way of finding these conditions, bin/genmetadata
now supports a "-prescan" flag.
When run this way:

    $ bin/genmetadata -prescan convdir

genmetadata will (as usual) parse each RCS archive file in the CVS
repository, and report any conflicts it finds, but it will NOT bother
to parse some additional information needed for an actual conversion,
nor produce all of the output metadata needed by an actual conversion.
This allows you to perform one or more "-prescan"s on your CVS
repository to more quickly resolve any such problems.

bin/cvs2p4 can also be used to run a "prescan", e.g., "bin/cvs2p4
-prescan" will run bin/genmetadata (only) with the -prescan option.
None of the remaining conversion stages are performed.

==== INCREMENTAL CONVERSIONS

At this time, the recommended procedure for doing "incremental"
conversions - i.e., combining multiple CVS repositories, or doing
subsets of the CVS modules in a repository one at a time - is to do
each as a new conversion (starting with change 1), and then to combine
them as desired using the "perfmerge2.pl" tool. This is also a useful
pattern when you want to combine some new chunk of CVS (or RCS)
repository into an existing Perforce depot.

perfmerge2.pl can be obtained by sending email to support@perforce.com.

In order for this to work, you'll need to ensure that there is no
overlap in the namespaces of files, between your existing Perforce
repository and the newly converted files. See the notes at the top of
the perfmerge2.pl script.

perfmerge2.pl can operate in different modes, with respect to the
ordering of change numbers in the merged repositories. You can elect
either

  - to have it renumber all of the merged changesets, so that the
    time-ordered property of all change numbers (both existing and
    newly-merged) is preserved; or,

  - to let your existing changes remain numbered as they are, with the
    newly imported changes numbered from the next available change
    number, even though some of them may have taken place (in CVS)
    interleaved in time with your existing Perforce changes.
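
To make the difference between the two numbering modes concrete, here
is a toy sketch. It is NOT perfmerge2.pl's code and does not touch any
Perforce metadata; the three-column input format and the function names
renumber_all and append_imported are made up for this illustration.

```shell
#!/bin/sh
# Toy illustration of the two change-numbering policies.
# Hypothetical input, one change per line:
#     ORIGIN EPOCH OLDNUM
# where ORIGIN is "existing" (already in Perforce) or "imported"
# (fresh from a cvs2p4 conversion). Output lines are:
#     ORIGIN OLDNUM NEWNUM

# Policy 1: renumber everything so that change numbers follow time order.
renumber_all() {
    sort -k2,2n | awk '{ print $1, $3, NR }'
}

# Policy 2: existing changes keep their numbers; imported changes are
# numbered from the next available number, in their own time order.
append_imported() {
    sort -k2,2n | awk '
        $1 == "existing" {
            if ($3 + 0 > max) max = $3 + 0
            keep[++n] = $0
            next
        }
        { imp[++m] = $0 }
        END {
            for (i = 1; i <= n; i++) {
                split(keep[i], f, " ")
                print f[1], f[3], f[3]
            }
            for (i = 1; i <= m; i++) {
                split(imp[i], f, " ")
                print f[1], f[3], max + i
            }
        }'
}
```

Feeding the same list of changes to both functions shows the trade-off:
renumber_all changes the numbers of existing changes whenever an
imported change is older, while append_imported leaves them alone at
the cost of change numbers that no longer follow time order.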

Note that perfmerge2.pl only merges server metadata; you'll also need
to manually copy the tree of RCS archive files from your newly
converted $P4ROOT into your existing server's $P4ROOT.

==== PERFORCE METADATA DEPENDENCIES

Since cvs2p4 works by directly generating Perforce metadata in the
Perforce checkpoint/journal format, it is dependent on "knowing" the
right definitions for certain tables within the Perforce database.

As of this writing, cvs2p4 writes metadata for the following Perforce
tables, at the version number shown for each table:

    table name    ver
    ------------  ---
    db.change      0
    db.desc        0
    db.rev         3
    db.revcx       0
    db.integed     0
    db.depot       0
    db.domain      2
    db.counters    0

The tests provided with this package are known to work correctly using
any Perforce server from version 2001.1 to 2005.2. It should work
correctly with any new p4d version that can still read (and upgrade
from) 2002.2 metadata.

==== IMPORTING TO MULTIPLE DEPOTS

If you wish to divide the body of CVS being imported into multiple
depots in Perforce, you can establish mappings in the config file by
adding lines of the form:

    $Depotmap{""} = "";

where the key (the string inside the braces) is the name of a directory
in $CVSROOT, and the value is the name of the Perforce depot to create.
For example, if $CVSROOT points to "/cvsroot", and you want the files
from $CVSROOT/somedir/ to be placed into the Perforce depot
"//somedepot", you would add

    $Depotmap{"somedir"} = "//somedepot";

Note that the slashes _must_ be present in the value as shown above!

Perforce Depot Specifications are created for each depot used.

==== IMPORTING CVS TAGS AS PERFORCE LABELS

Some very basic architectural differences between CVS and Perforce
create challenges when attempting to represent CVS tags as Perforce
labels. This is a side effect of a difference in the way CVS and
Perforce handle branching and the way CVS tags work.

The essential problem is as follows:

In CVS, when a file has been branched, but not yet changed in the child
branch, there is no way, based only on information in that individual
CVS archive, to determine _which_ branch(es) a label was intended to
apply to. This is due to the fact that, in CVS, creating a new branch
does not actually create a new branch revision within the RCS archive;
it merely marks the branch point with a symbolic name. This is not a
problem for CVS, since the act of checking out a CVS tree using a CVS
tag will always retrieve a single CVS revision for every file bearing
the tag.

In Perforce, however, Perforce's "lazy copy" mechanism creates (if only
virtually!) a distinct #1 revision in the child branch at branching
time. Thus, for a given file, the #1 revisions for multiple Perforce
branches may share the same parent branch and revision. In such cases,
there's no way for the "dolabels" command to know specifically _which_
child branch a given CVS tag was meant to apply to.

Previously, cvs2p4 "punted" the problem, by including the #1 revisions
in all of the child branches that "share" a tagged revision (i.e.,
those that share the common branch point in CVS) in the imported
Perforce label. Unfortunately, for repositories with large numbers of
tags, this approach suffered badly from poor performance and
increasingly large numbers of "noise" revisions being created in the
resulting Perforce labels.

More recently, provisions have been added to bin/dolabels for a
user-supplied mapping function, in order to infuse the converter with
outside knowledge about which tags go with which branches. (And a way
of selecting which tags to actually import as labels!)

Finally, as of cvs2p4 3.0, a heuristic based on looking at the state of
a tag across *all* files bearing the tag has been implemented, and is
enabled by default.
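
To give a feel for why looking across all files can help, here is a
toy sketch of one way such a cross-file rule could work. This is an
assumption made purely for illustration; it is not taken from
bin/dolabels and may differ from what the real heuristic does. The
input format and the function name map_tags are hypothetical. The idea:
a tag is mapped to a branch only if exactly one branch is a candidate
for that tag in every file bearing it.

```shell
#!/bin/sh
# Toy cross-file tag-to-branch rule (an illustration, not bin/dolabels).
# Hypothetical input, one line per candidate:
#     TAG FILE BRANCH
# meaning: in FILE, the revision carrying TAG could belong to BRANCH.
# Output, sorted by tag:
#     TAG BRANCH      when exactly one branch is a candidate in every
#                     file bearing the tag
#     TAG UNMAPPED    otherwise

map_tags() {
    awk '
        {
            tag = $1; file = $2; branch = $3
            # Count distinct files per tag.
            if (!((tag, file) in seenfile)) {
                seenfile[tag, file] = 1
                nfiles[tag]++
            }
            # Count, per tag and branch, the files where it is a candidate.
            if (!((tag, file, branch) in seen)) {
                seen[tag, file, branch] = 1
                count[tag, branch]++
            }
            tags[tag] = 1
            branches[tag, branch] = 1
        }
        END {
            for (t in tags) {
                hits = 0; winner = ""
                for (k in branches) {
                    split(k, p, SUBSEP)
                    if (p[1] == t && count[k] == nfiles[t]) {
                        hits++
                        winner = p[2]
                    }
                }
                print t, (hits == 1 ? winner : "UNMAPPED")
            }
        }' | sort
}
```

With a single file, a tag sitting on a shared branch point stays
ambiguous; a second file that has actually changed on one of the
branches is what lets the rule pick a branch.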

*** As of this release, by default, only those labels for which the
*** label -> mapping has been established by the heuristic will be
*** represented as labels in the resultant Perforce depot! This behavior
*** may be changed, using the "$DISCARD_UNMAPPED_TAGS" configuration
*** variable.

Since the heuristic may not work in all circumstances or for all
labels, a user-supplied mapping function, to be applied when the
heuristic fails, can be supplied. Presently, it is coded as the
subroutine "branch_for_tag()" in bin/dolabels, and you should edit the
script if you need to implement such a function.

==== CVS "import"-ED FILES

CVS files which were created by running "cvs import" once or more to
import "vendor branches" have some interesting properties. In
particular, when a cvs import-ed file has not yet had any changes
committed to the trunk (i.e., any local changes to what was originally
imported from the vendor branch), a "cvs checkout" or "cvs update" with
no other arguments will use the latest vendor branch revision present
for the file.

For example, consider an RCS file with the following revisions:

    1.3 1.2 1.1 1.1.1.1 1.1.1.2 1.1.1.3 1.1.1.4

The 1.1, 1.1.1.1, 1.1.1.2, and 1.1.1.3 revisions were created by three
"cvs import" commands before revision 1.2 was committed onto the trunk.
(The 1.1.1.4 revision was created -after- 1.2.)

In Perforce, the sequence of revisions on the "main" branch would be:

    1.1      main/#1
    1.1.1.2  main/#2
    1.1.1.3  main/#3
    1.2      main/#4
    1.3      main/#5

In such cases, revision tags selecting revisions from the 1.1.1 branch
"spliced" into main (1.1.1.2 and 1.1.1.3, in our example) are placed
into the Perforce labels in *both* the main and import branches.

==== HOW TO PACKAGE MODIFICATIONS

  1. In a //guest workspace, make your modifications to files in the
     package.

     Before submitting:

  2. Edit the NEWS file to document the change(s). (Please follow the
     established format. Of course.)

  3. Do the submit.

  4.
     Update the checksums in the MANIFEST file:

       $ p4 edit MANIFEST
       $ MANIFEST -gen

  5. As root, generate the release tarball:

       # MANIFEST -tar

     (This creates cvs2p4-.tar)

  6. And add it to the default changelist:

       $ p4 add cvs2p4-

  7. Finally, update the "cvs2p4-latest" symbolic link:

       $ p4 edit cvs2p4-latest.tar
       $ rm cvs2p4-latest.tar
       $ ln -s cvs2p4-.tar cvs2p4-latest.tar

  8. And then submit. In this scenario, the change will include the
     following files:

       MANIFEST
       cvs2p4-.tar (added)
       cvs2p4-latest.tar

To complete the act of "publishing" the new release, you must have
Perforce write access to //public/perforce/utils/cvs2p4/...

"Publishing" the new release is simply a matter of integrating your
change(s) into the //public/..., and submitting.

==== SUPPORT

I try to maintain this tool as a contribution to the community at
large. If you have questions or problems, please feel free to email me
( rmg at perfortify dot com ).

I originally wrote and contributed this tool while working for Network
Appliance in 1997. I worked on it as a Perforce employee from August
2002 through December 2003. I have since also worked at Data Domain, of
Palo Alto, California, where further improvements were made. Presently,
I work at IronPort Systems in San Bruno, California, and, as you might
guess, cvs2p4 is evolving again. I would like to gratefully acknowledge
the support of all of these employers, who have allowed me to maintain
and improve cvs2p4 both for their ends, as well as for others who might
find it useful.

I'd enjoy hearing from anyone who uses this (or tries to!), whether you
have problems and questions, or not. Drop me a line!

Thanks,

- Richard Geiger
  rmg at perfortify dot com
  revised July 2006, release 3.0