cvs2p4 2.3.4  January 7, 2003
README revised March 24, 2003

Release 2.3.1 of cvs2p4 introduced a radically different approach to
importing CVS history into Perforce, in order to provide a much faster
conversion process. cvs2p4 1.x releases put data into Perforce by
"replaying" all of the CVS changes into a live Perforce server. This
newer version works by directly generating Perforce metadata, and
linking (or copying) the RCS archives from the CVS repository
directory directly into the Perforce file archive.

If you have problems with this version, you can still get a copy of
the older (and slower, but time-tested) version at

    ftp://public.perforce.com/public/perforce/utils/cvs2p4/cvs2p4-1.3.3.tar


==== INTRODUCTION

This small set of tools provides a means for importing a CVS module
into Perforce. It was originally developed for use at Network
Appliance, to convert our product source code revision history from
CVS into Perforce. As such it sprouted some NetApp-specific features
suited to our special needs, but I have made an attempt to make these
unobtrusive to the general user.

At a high level, it is patterned after the PVCS to Perforce converter
available on the Perforce web site, doing the following steps during a
conversion:

  - Scans the CVS repository to generate a metadata file;

  - Scans the metadata file to identify groups of RCS revisions that
    comprise Perforce changes;

  - Imports the revisions/log history into a Perforce depot, by
    directly generating Perforce metadata in "journal" format (driven
    by the output of the previous phase);

  - Finally (and optionally), generates a map of RCS revisions and
    the Perforce changes they belong to.

cvs2p4 tries to make the resultant Perforce depot look as if the work
in CVS had been going on in Perforce. In particular, it attempts to
create changes corresponding to the whole creation of new branches,
a la

    p4 integrate //depot/branchA/... //depot/branchB/...
This is in contrast to rcstoperf.sh, which scattered the "integrates"
corresponding to the creation of files on new branches into many
changes (basically, according to when the file was actually first
changed in the new branch).

cvs2p4 also allows you to import only selected branches, and/or to
map some branch other than the CVS trunk to become the new "main"
branch in Perforce. See the notes in the template config file
("test/config") for more information on these features.

As of version 1.3, cvs2p4 will also import CVS symbolic version tags.

Note: A CVS tagged revision will make it into Perforce labels ONLY
when the revision is in fact present in the converted depot, subject
to the branches selected for import. (See the notes for the
"WANTLINES" variable in the config file).


==== MANIFEST

After unpacking the distribution archive, use the MANIFEST script to
verify that you have all of the pieces. The output should go something
like this:

    $ MANIFEST
    MANIFEST
    Artistic
    README
    NEWS
    bin/genmetadata
    bin/genchanges
    bin/dochanges
    bin/dolabels
    bin/revmap
    lib/util.pl
    test/file,v
    test/dollar$file,v
    test/space file,v
    test/config
    test/runtest
    test/norm
    test/metadata.good
    test/lines.good
    test/changes.good
    test/p4_changes_-l.good
    test/p4_describe.good
    test/p4_filesat.good
    test/p4_labels.good
    All ok


==== REQUIREMENTS

This stuff should work on any Unix host that supports:

  - Perl 5.x, with working dbm support (i.e., dbmopen()/dbmclose()
    work). The scripts assume that perl will be found via $PATH. It
    must be a perl5!

    Some people have reported problems that seem to be related to dbm
    limitations with some perls when converting very large
    repositories. I like implementations based on Berkeley-DB.

  - Perforce release 2002.1, 2002.2, or 2003.1

    Later Perforce releases may work, but since this script generates
    journal-format metadata directly, it may need to be changed in
    order to work correctly with other Perforce releases.
    Please see the "PERFORCE METADATA DEPENDENCIES" section (below)
    for further details.


==== WHAT IT DOES

This converter will import a CVS module into Perforce, preserving the
branching structure seen in the RCS ,v files in the CVS repository,
and translating it into Perforce branches within the depot.

As it stands, it will only import RCS branches up through the highest
numbered revisions on branches that have branch tags referring to
them; thus, it will not necessarily bring *every* revision in the CVS
module into Perforce, but *will* bring in every revision leading up to
the current revision for every branch it imports. I think this is what
most people will want; if not, hack away.

Like the "rcstoperf.sh" converter available on the Perforce web site,
it applies heuristics to try to identify multiple changes in CVS that
are highly likely to comprise what would be seen as a single change in
Perforce, and makes them appear as a single Perforce change. (The
heuristics are: checked in by the same user, proximal in time, and
bearing an identical log message).

It deals correctly with files that are dead on the CVS trunk (i.e.,
where the RCS ,v files are in the "Attic/" directory).

The converter attempts to leave converted files in Perforce with a
sensible Perforce file type (see `p4 help filetypes` for a description
of file types in Perforce) after the conversion. However, due to
limitations in RCS's notion of "file type" (the -k options,
controlling keyword expansion), cvs2p4 must currently decide to import
all "text" files as Perforce type "text" (text with no keyword
expansion) or "ktext" (text with keyword expansion). This is
controlled by the "$KTEXT" configuration option, which is on by
default.
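
As a rough illustration of the change-grouping heuristics described
earlier in this section (same user, proximal in time, identical log
message), here is a minimal Python sketch. It is not the code used by
genchanges. The Revision record, the group_changes() function, the
300-second window, and the rule that one change holds at most one
revision of a given file are all assumptions made for the example; the
README does not say what "proximal in time" means in practice.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Revision:
    """One RCS revision, reduced to the fields the heuristics look at."""
    path: str
    rev: str
    author: str
    timestamp: int  # seconds since the epoch
    log: str


def group_changes(revisions: List[Revision],
                  window_seconds: int = 300) -> List[List[Revision]]:
    """Group revisions into candidate changes.

    A revision joins an existing group when it has the same author and
    log message, falls within window_seconds (a hypothetical value) of
    the group's latest revision, and the group does not already contain
    a revision of the same file.
    """
    groups: List[List[Revision]] = []
    latest: Dict[Tuple[str, str], List[Revision]] = {}
    for r in sorted(revisions, key=lambda r: r.timestamp):
        key = (r.author, r.log)
        g = latest.get(key)
        if (g is not None
                and r.timestamp - g[-1].timestamp <= window_seconds
                and all(x.path != r.path for x in g)):
            g.append(r)
        else:
            g = [r]
            groups.append(g)
            latest[key] = g
    return groups
```

Each returned group would correspond to one candidate Perforce change,
in time order of its first revision.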

Note that binary files will be converted to Perforce type "binary+D";
the (unusual) "+D" is there because the converter works by using the
existing RCS archive files directly; normally in Perforce, filetype
"binary" implies storage of complete revisions, rather than as RCS
archives. Rest assured that "binary+D" is correct.

The "UI" for the converter is not very slick, but for most people it's
a one-time kind of tool anyway. Feel free to improve it if you are so
inclined.

While I am currently a Perforce employee, please understand that this
is *not* presently officially supported by Perforce. It is supplied in
hopes that somebody will find it useful (or perhaps only
entertaining :-).


==== TESTING

I have included a *very* rudimentary automated test "suite", in the
test/ directory. You can use this to verify that it seems to work in
your environment. To run it:

1. Edit test/config, and change the lines

    # p4 command location (If other than "/usr/local/bin/p4")
    # $P4 = "/usr/local/bin/p4";

    # p4 command location (If other than "/usr/local/bin/p4d")
    # $P4D = "/usr/local/bin/p4d";

    # Perforce server we're using.
    # $P4PORT = "localhost:1680";

   to reflect the actual location of your "p4" and "p4d" commands, and
   the server port that you are using.

   *** Note: Previous versions of this tool allowed you to run the
   Perforce server on a different host than the one where the
   conversion tools were run. This is no longer the case; thus you
   should probably never change the "localhost" part of the P4PORT
   configuration setting, above.

2. Run the tests with

    test/runtest

   This should run all of the conversion scripts on the test CVS
   module (well, file - it's a one-file module!), and then verify a
   few things by querying the Perforce server after the conversion is
   complete.

   If everything goes well, the end of the output should be

    runtest: ok

In this version, the converted CVS "module" consists of a very few
files, but it does have a carefully constructed branching structure,
intended to verify that the converter does the right stuff with
respect to branching.


==== USAGE

1. Make a directory to hold all the working files for the conversion,
   and create a config file, starting with test/config as a template:

    $ mkdir convdir; cp test/config convdir

   Edit the convdir/config file to reflect your locale and intent.
   (See the comments in the config file).

2. Run bin/genmetadata:

   It takes a single argument - the name of the directory where the
   "config" file resides. (It will create all intermediate, temp, and
   working files under this directory.)

    $ bin/genmetadata convdir
    genmetadata: rm -rf convdir/logmsgs.dir convdir/logmsgs.pag ...
    .
    .   (filenames of each file in the CVS module, as they are scanned)
    .
    ===== Lines referenced:
    chupa
    curly
    ha       <- a list of branch tags encountered in the scan;
    larry       also saved to convdir/lines.
    shemp
    xxx

   This reads convdir/config to get its marching orders, then scans
   the CVS module for all ,v and Attic/,v files, creating:

    convdir/metadata     <- the extracted RCS/CVS metadata
    convdir/logmsgs.pag  <- An ndbm database
    convdir/logmsgs.dir  <- of the log messages
    convdir/lines        <- A list of "codelines" (== branch tags)

   At this point, you may want to look at the list of branch tags
   encountered (which was written to convdir/lines), edit the config
   file, setting WANTLINES to 1, and filling in the "<&1 | tee OUT

    dochanges> /bin/rm -f convdir/revmap.db ...
    dochanges> /bin/rm -f convdir/depotmap.db ...
    dochanges> /bin/rm -rf p4root && mkdir -p p4root
    dochanges> /bin/mkdir -p /home/rmg/web/richard_geiger/...
    dochanges> /bin/ln -s /home/rmg/web/richard_geiger/...
    ========== change group 1
    ========== change group 2
    ========== change group 3
    .
    .
    .
    ========== change group 17
    ========== change group 18
    dochanges> cd /home/rmg/web/richard_geiger/...
    Recovering from dbmeta...
    dochanges> cd /home/rmg/web/richard_geiger/...
    Dumping to checkpoint...

   Basically, that's it. When this command finishes, your CVS module
   has been imported to Perforce, in the Perforce server database
   identified by the $P4ROOT configuration variable. The state of the
   resultant database is saved in a checkpoint file named
   $P4ROOT/checkpoint.

   NOTE: cvs2p4 does not create new RCS-format archives (,v files)
   under $P4ROOT; rather, it uses the existing RCS archives in the CVS
   tree directly. By default, it does this by making a symbolic link
   named $P4ROOT/depot/IMPORT pointing to the $CVS_MODULE tree. If
   you'd rather have dochanges copy in the CVS module for you, set
   COPYIMPORT in the config file.

6. If you want to import labels from CVS tags, run

    $ bin/dolabels convdir
    make label: testlabel
    dolabels> cd p4root && /usr/local/bin/p4d -jr dblbls
    /home/rmg/web/richard_geiger/guest/richard_geiger/utils/cvs2p4_meta/p4root
    Recovering from dblbls...
    dolabels> cd p4root && rm -f checkpoint; /usr/local/bin/p4d -jd checkpoint
    /home/rmg/web/richard_geiger/guest/richard_geiger/utils/cvs2p4_meta/p4root
    Dumping to checkpoint...

   This step adds the symbolic tag information from the CVS archive
   (for "plain", non-branch tags) to the Perforce database identified
   by the $P4ROOT configuration variable. The state of the resultant
   database is saved in a checkpoint file named $P4ROOT/checkpoint.

** NOTE: This version of cvs2p4 does *not* create new RCS archives in
** $P4ROOT/depot/...; rather, it creates a symbolic link
** "$P4ROOT/depot/IMPORT -> $CVS_MODULE"; i.e., the existing RCS
** archives from the CVS repository are used by the Perforce server,
** in place. If you'd rather have it make a _copy_ of the RCS archive
** files from your CVS repository, set "$COPYIMPORT = 1" in your
** config file.

7. If you want the RCS revision-to-Perforce change map, run:

    $ bin/revmap convdir

   Or, for the reverse mapping:

    $ bin/revmap -map rrevmap convdir


==== INCREMENTAL CONVERSIONS

At this time, the recommended procedure for doing "incremental"
conversions - i.e., combining multiple CVS repositories, or doing
subsets of the CVS modules in a repository one at a time - is to do
each as a new conversion (starting with change 1), and then to combine
them as desired using the "perfmerge2.pl" tool. This is also a useful
pattern when you want to combine some new chunk of CVS (or RCS)
repository into an existing Perforce depot.

perfmerge2.pl can be obtained by sending email to support@perforce.com

In order for this to work, you'll need to ensure that there is no
overlap in the namespaces of files between your existing Perforce
repository and the newly converted files. See the notes at the top of
the perfmerge2.pl script.

perfmerge2.pl can operate in different modes, with respect to the
ordering of change numbers in the merged repositories. You can elect
either

  - to have it renumber all of the merged changesets, so that the
    time-ordered property of all change numbers (both existing and
    newly-merged) is preserved; or,

  - to leave your existing changes numbered as they are, with the
    newly imported changes numbered from the next available change
    number, even though some of them may have taken place (in CVS)
    interleaved in time with your existing Perforce changes.

Note that perfmerge2.pl only merges server metadata; you'll also need
to manually copy the tree of RCS archive files from your newly
converted $P4ROOT into your existing server's $P4ROOT.


==== PERFORCE METADATA DEPENDENCIES

Since cvs2p4 works by directly generating Perforce metadata in the
Perforce checkpoint/journal format, it is dependent on "knowing" the
right definitions for certain tables within the Perforce database.
As of this writing, cvs2p4 writes metadata for the following Perforce
tables, at the version number shown for each table:

    table name    ver
    ------------  ---
    db.change       0
    db.desc         0
    db.rev          3
    db.revcx        0
    db.integed      0
    db.depot        0
    db.domain       2
    db.counters     0

p4d version 2002.2 writes version 3 of the db.domain table, but can
correctly read version 2.

The tests provided with this package are known to work correctly using
Perforce 2002.1, 2002.2, and 2003.1.


==== WHY DOES dolabels LABEL FILES IN MANY BRANCHES?

== What:

When converting CVS tags into Perforce labels, users are often
surprised to observe that, in addition to the files in the expected
branch, many files in _other_ branches are also labeled.

== Why:

This is a side effect of a difference in the way CVS and Perforce
handle branching, in combination with the way CVS tags work. The
essential problem is as follows...

When a file has been branched, but not yet changed in the child
branch, there is no way, based only on information in the CVS archive,
to determine _which_ branch(es) a label was intended to apply to. This
is due to the fact that, in CVS, creating a new branch does not
actually create a new branch revision within the RCS archive; it
merely marks the branch point with a symbolic name. This is not a
problem for CVS, since the act of checking out a CVS tree using a CVS
tag will always retrieve a single CVS revision for every file bearing
the tag.

In Perforce, however, the "lazy copy" mechanism creates (if only
virtually!) a distinct #1 revision in the child branch at branching
time. Thus, for a given file, the #1 revisions for multiple Perforce
branches may share the same parent branch and revision. In such cases,
there's no way for the "dolabels" command to know specifically _which_
child branch a given CVS tag was meant to apply to. Therefore,
dolabels includes the #1 revisions in all of the child branches that
share the common branch point in CVS.
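
The ambiguity described above can be sketched in a few lines of
Python. This is only an illustrative model, not the logic of the
dolabels script. It assumes the symbols of one RCS file are available
as a name-to-number mapping, that branch tags use CVS "magic" branch
numbers (a 0 in the second-to-last position, e.g. 1.3.0.2 for a branch
sprouting from 1.3), and that the tagged revision lives on the trunk,
which is called "main" here. The function names and the sample tag
names other than "testlabel" are hypothetical.

```python
from typing import Dict, List, Optional


def branch_point(number: str) -> Optional[str]:
    """Return the revision a CVS magic branch number sprouts from.

    '1.3.0.2' -> '1.3'; a plain revision number such as '1.3' or
    '1.3.2.1' is not a branch tag, so None is returned.
    """
    parts = number.split(".")
    if len(parts) >= 4 and parts[-2] == "0":
        return ".".join(parts[:-2])
    return None


def labeled_branches(symbols: Dict[str, str], tag: str,
                     home_branch: str = "main") -> List[str]:
    """List the branches in which a converted label would select a revision.

    The tagged revision itself is on home_branch (assumed to be the
    trunk). Every branch whose branch point is that same revision has a
    #1 revision that is a lazy copy of it, so nothing in the RCS file
    says which of them the tag was meant for; all are included.
    """
    rev = symbols[tag]
    children = sorted(name for name, num in symbols.items()
                      if branch_point(num) == rev)
    return [home_branch] + children
```

For example, with symbols {"rel1_0": "1.3.0.2", "rel1_1": "1.3.0.4",
"rel2_0": "1.5.0.2", "testlabel": "1.3"}, the model reports main,
rel1_0 and rel1_1 for "testlabel", because both rel1_0 and rel1_1
sprout from revision 1.3.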

== What to do:

In order to set up a Perforce client for the "correct" branch
corresponding to a particular CVS tag (now Perforce label), the user
must, in effect, tell Perforce what branch the label is really
supposed to be used with. This can be done by either

  1 - Restricting the view map in the client workspace to files with
      the "correct" branch for the label used, e.g.,

        View:
            //depot/prod/rel1_0/... //client/...

      and

        % p4 sync ...@label

  - OR -

  2 - Selectively syncing only the "correct" client branch by
      providing the branch in the sync filespec, e.g.,

        View:
            //depot/prod/... //client/...

      and

        % p4 sync //depot/prod/rel1_0/...@label

Another approach is for the user to remove the label from all branches
except for the "correct" one(s).

== Future

In the future, it could be possible, using heuristics that consider
global information (as opposed to converting each file based solely on
information from the corresponding CVS (RCS) file archive), to have
dolabels automatically determine the correct branch for each label,
and ONLY include files from that branch in a label.

That's (potentially) in the future, though!


==== HOW TO PACKAGE MODIFICATIONS

Until now (October 7, 2003), I have made all changes to the package.
But this may change in the future, if/when the project gets additional
curators. So, I think it a good idea to document the steps used to
package and publish a new release:

1. In a //guest workspace, make your modifications to files in the
   package. Before submitting:

2. Edit the NEWS file to document the change(s). (Please follow the
   established format. Of course.)

3. Do the submit.

4. Update the checksums in the MANIFEST file:

    $ p4 edit MANIFEST
    $ MANIFEST -gen

5. As root, generate the release tarball:

    # MANIFEST -tar

   (This creates cvs2p4-.tar)

6. And add it to the default changelist:

    $ p4 add cvs2p4-

7. Finally, update the "cvs2p4-latest" symbolic link:

    $ p4 edit cvs2p4-latest.tar
    $ rm cvs2p4-latest.tar
    $ ln -s cvs2p4-.tar cvs2p4-latest.tar

8. And then submit.

In this scenario, the change will include the following files:

    MANIFEST
    cvs2p4-.tar (added)
    cvs2p4-latest.tar

To complete the act of "publishing" the new release, you must have
Perforce write access to

    //public/perforce/utils/cvs2p4/...

"Publishing" the new release is simply a matter of integrating your
change(s) into //public/..., and submitting.


==== SUPPORT

I originally wrote and contributed this tool while working for Network
Appliance in 1997. I worked as a Perforce employee from August 2002
through December 2003. I presently work at Data Domain, Inc., where I
will continue to try to offer help to those who encounter problems
using these tools, but cannot guarantee any level of support.

- Richard Geiger
  rmg@datadomain.com

(revised October 7, 2003, release 2.3.7)