cvs2p4 2.3.4	January 7, 2003
README revised  October 26, 2004

Release 2.3.1 of cvs2p4 introduced a radically different approach to
importing CVS history into Perforce, in order to provide a much faster
conversion process. cvs2p4 1.x releases put data into Perforce by
"replaying" all of the CVS changes into a live Perforce server.  This
newer version works by directly generating Perforce metadata, and
linking (or copying) the RCS archives from the CVS repository
directory directly into the Perforce file archive.

==== INTRODUCTION    

This small set of tools provides a means for importing a CVS module
into Perforce.

It was originally developed for use at Network Appliance, to convert
our product source code revision history from CVS into Perforce.

As such it sprouted some NetApp-specific features suited to our
special needs, but I have made an attempt to make these unobtrusive to
the general user.

Basically, it is patterned at a high level after the PVCS to
Perforce converter available on the Perforce web site, doing the
following steps during a conversion:

  - Scans the CVS repository to generate a metadata file;

  - Scans the metadata file to identify groups of RCS revisions
    that comprise Perforce changes;

  - Imports the revisions/log history into a Perforce depot, by
    directly generating Perforce metadata in "journal" format
    (driven by the output of the previous phase);

  - Finally (and optionally), generates a map of RCS revisions and
    the Perforce changes they belong to.
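
The phases above correspond to the genmetadata, genchanges, dochanges,
and revmap scripts described under USAGE, each of which takes the
conversion directory as its argument. A minimal sketch of a wrapper
that runs them in order might look like the following. The function
name run_conversion and its arguments are hypothetical, and it assumes
(this is not guaranteed by the scripts) that each phase exits non-zero
on failure.

```shell
# Hypothetical wrapper: run the cvs2p4 phases in order, stopping at the
# first one that fails. Assumes each script exits non-zero on error.
#   $1 = directory holding the cvs2p4 scripts (the distribution's bin/)
#   $2 = conversion directory containing "config"
#   $3 = "revmap" to also run the optional revmap phase
run_conversion() {
    bindir=$1
    convdir=$2
    phases="genmetadata genchanges dochanges"
    if [ "${3:-}" = "revmap" ]; then
        phases="$phases revmap"
    fi
    for phase in $phases; do
        echo "=== $phase $convdir"
        "$bindir/$phase" "$convdir" || {
            echo "run_conversion: $phase failed" >&2
            return 1
        }
    done
}
```

For example, after sourcing the function:

  $ run_conversion ./bin convdir revmap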

cvs2p4 tries to make the resultant Perforce depot look as if the work in
CVS had been going on in Perforce. In particular, it attempts to
create changes corresponding to the whole creation of new branches a
la

  p4 integrate //depot/branchA/... //depot/branchB/...

This is in contrast to rcstoperf.sh, which scattered the "integrates"
corresponding to the creation of files on new branches into many
changes (basically, according to when the file was actually first
changed in the new branch).

cvs2p4 also allows you to import only selected branches, and/or to map
some branch other than the CVS trunk to become the new "main"
branch in Perforce. See the notes in the template config file
("test/config") for more information on these features.

As of version 1.3, cvs2p4 will also import CVS symbolic version tags.

Note: A CVS tagged revision will make it into Perforce labels ONLY
when the revision is in fact present in the converted depot, subject
to the branches selected for import. (See the notes for the
"WANTLINES" variable in the config file).


==== MANIFEST

*** Note: You should unpack the archive on the OS you intend to run
the conversion under. I.e., do not expect to be able to unpack the
archive on a Windows machine, and have it run properly on a *nix host.

After unpacking the distribution archive, use the MANIFEST script to
verify that you have all of the pieces.  The output should go
something like this:

$ MANIFEST  
  MANIFEST
  Artistic
  README
  NEWS
  bin/genmetadata
  bin/genchanges
  bin/dochanges
  bin/dolabels
  bin/revmap
  lib/util.pl
  test/file,v
  test/dollar$file,v
  test/space file,v
  test/config
  test/runtest
  test/norm
  test/metadata.good
  test/lines.good
  test/changes.good
  test/p4_changes_-l.good
  test/p4_describe.good
  test/p4_filesat.good
  test/p4_labels.good

All ok


==== REQUIREMENTS    

This stuff should work on any Unix host that supports:

  - Perl 5.x, with working dbm support (i.e., dbmopen()/dbmclose()
    work). The scripts assume that perl will be found via $PATH. It
    must be a perl5! Some people have reported problems that seem to
    be related to dbm limitations with some perls when converting very
    large repositories. I like implementations based on Berkeley-DB.

  - Perforce release 2002.1, 2002.2, or 2003.1

    Later Perforce releases may work, but since this script generates
    journal-format metadata directly, it may need to be changed in
    order to work correctly with other Perforce releases. Please see
    the "PERFORCE METADATA DEPENDENCIES" section (below) for further
    details.
    
==== WHAT IT DOES

This converter will import a CVS module into Perforce, preserving the
branching structure seen in the RCS ,v files in the CVS repository, and
translating it into Perforce branches within the depot. As it
stands, it will only import RCS branches up through the highest
numbered revisions on branches that have branch tags referring to
them; thus, it will not necessarily bring *every* revision in the CVS
module into Perforce, but *will* bring in every revision leading up to
the current revision for every branch it imports. I think this is what
most people will want; if not, hack away.

Like the "rcstoperf.sh" converter available on the Perforce web site,
it applies heuristics to try and identify multiple changes in CVS that
are highly likely to comprise what would be seen as a single change in
Perforce, and makes them appear as a single Perforce change. (The
heuristics are: checked in by the same user, proximal in time, and
bearing an identical log message).
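
To illustrate the idea, here is a rough sketch of that kind of
grouping. It is NOT the actual genchanges logic; the tab-separated
input format, the function name group_revisions, and the 300-second
window are all assumptions made up for this example.

```shell
# Sketch only: group revisions that share a user and log message and
# follow one another closely in time.
# Input (stdin): one revision per line, tab-separated:
#     epoch_seconds <TAB> user <TAB> log message <TAB> file
# Output: the same lines, each prefixed with a group number and a tab.
#   $1 = maximum gap in seconds between successive revisions of one
#        group (hypothetical default: 300)
group_revisions() {
    window=${1:-300}
    sort -n | awk -F '\t' -v window="$window" '
        {
            key = $2 FS $3              # same user + same log message
            # Start a new group if this user/message pair is new, or
            # if too much time has passed since its last revision.
            if (!(key in last) || $1 - last[key] > window) {
                group[key] = ++ngroups
            }
            last[key] = $1
            print group[key] "\t" $0
        }'
}
```

Feeding it two check-ins by one user ten seconds apart with the same
message puts both in one group; the same message from another user, or
from the same user much later, starts a new group.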

It deals correctly with files that are dead on the CVS trunk (i.e.,
where the RCS ,v files are in the "Attic/" directory).

The converter attempts to leave converted files in Perforce with a
sensible Perforce file type (see `p4 help filetypes` for a description
of file types in Perforce) after the conversion. However, due to
limitations in RCS's notion of "file type" (the -k options,
controlling keyword expansion), cvs2p4 must currently decide to import
all "text" files as Perforce type "text" (text with no keyword
expansion) or "ktext" (text with keyword expansion). This is
controlled by the "$KTEXT" configuration option, which is on by
default.

Also note that binary files will be converted to Perforce type
"binary+D"; the (unusual) "+D" is there because the converter works by
using the existing RCS archive files directly; normally in Perforce,
filetype "binary" implies storage of complete revisions, rather than
as RCS archives. Rest assured that "binary+D" is correct.
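
The resulting type choice can be summarized with a small sketch. The
function p4_filetype is hypothetical, and it assumes that "binary"
means an RCS file whose keyword mode is "b" (the usual CVS -kb
convention); how cvs2p4 itself recognizes binary files is not spelled
out here, so treat this as an illustration of the rule, not of the
code.

```shell
# Sketch only: pick the Perforce file type described above.
#   $1 = RCS keyword-expansion mode of the file ("kv", "b", "o", ...)
#        ASSUMPTION: mode "b" is what marks a file as binary.
#   $2 = 1 if the $KTEXT config option is on, 0 if it is off
p4_filetype() {
    kopt=$1
    ktext=$2
    case $kopt in
        b)
            # Binary, but stored in the existing RCS archive.
            echo "binary+D"
            ;;
        *)
            # All text files get the same type, chosen by $KTEXT.
            if [ "$ktext" = 1 ]; then
                echo "ktext"
            else
                echo "text"
            fi
            ;;
    esac
}
```

For example, "p4_filetype kv 1" prints ktext, and "p4_filetype b 1"
prints binary+D.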

The "UI" for the converter is not very slick, but for most people it's
a one-time kind of tool anyway. Feel free to improve it if you are so
inclined.

Please understand that this tool is *not* presently officially
supported by Perforce. It is supplied in hopes that somebody will find
it useful (Or perhaps only entertaining :-).


==== TESTING

I have included a *very* rudimentary automated test "suite", in the
test/ directory. You can use this to verify that it seems to work in
your environment.

To run it:

  1. Edit test/config, and change the lines

       # p4 command location (If other than "/usr/local/bin/p4")
       #
       $P4             = "/usr/local/bin/p4";
       
       # p4 command location (If other than "/usr/local/bin/p4d")
       #
       $P4D            = "/usr/local/bin/p4d";
       
       # Perforce server we're using.
       #
       $P4PORT = "localhost:1680";
       
     to reflect the actual location of your "p4" and "p4d" commands,
     and the server port that you are using.

     *** Note: Previous versions of this tool allowed you to run the
     Perforce server on a different host than the one where the
     conversion tools were run. This is no longer the case; thus you
     should probably never change the "localhost" part of the P4PORT
     configuration setting, above. 
     
  2. Run the tests with

       test/runtest

     This should run all of the conversion scripts on the test CVS
     module (just a few small files), and then verify a few things by
     querying the Perforce server after the conversion is complete.

     If everything goes well, the end of the output should be

       runtest: ok

In this version, the converted CVS "module" consists of only a few files,
but it does have a carefully constructed branching structure, intended
to verify that the converter does the right stuff with respect to
branching.


==== USAGE

1. Make a directory to hold all the working files for the conversion,
   and create a config file, starting with test/config as a template:

     $ mkdir convdir; cp test/config convdir

   Edit the convdir/config file to reflect your locale and
   intent. (See the comments in the config file).

2. Run bin/genmetadata:

   It takes a single argument - the name of the directory where the
   "config" file resides. (It will create all intermediate, temp, and
   working files under this directory.)

     $ bin/genmetadata convdir
     genmetadata: rm -rf convdir/logmsgs.dir convdir/logmsgs.pag ...
     .
     . (filenames of each file in the CVS module, as they are scanned)
     .
     ===== Lines referenced:
     chupa
     curly
     ha         <- a list of branch tags encountered in the scan;
     larry         also saved to convdir/lines.
     shemp
     xxx

   This reads convdir/config to get its marching orders, then scans the
   CVS module for all ,v and Attic/,v files, creating:

     convdir/metadata      <- the extracted RCS/CVS metadata
     convdir/logmsgs.pag   <- An ndbm database
     convdir/logmsgs.dir   <-   of the log messages
     convdir/lines         <- A list of "codelines" (== branch tags)

   At this point, you may want to look at the list of branch tags
   encountered (which was written to convdir/lines), edit the config
   file, setting WANTLINES to 1, and filling in the "<<LINES" here
   document with the names of the branches you want to import to
   Perforce; then, rerun bin/genmetadata to rescan and pick up only
   those revisions you care about.

   
4. Run bin/genchanges:

   Again, this takes a single argument - the name of the "conversion
   directory":

     rmg $ bin/genchanges convdir
     16354                    <- This counter spins as it's running.
                                 It will count up to the number of
                                 lines in the metadata file.

   This reads convdir/config and convdir/metadata, and writes
   convdir/changes.


5. Run bin/dochanges:

   You might want to save a copy of the output with "tee".
   The output will look something like:

     rmg $ bin/dochanges convdir 2>&1 | tee OUT
     dochanges> /bin/rm -f convdir/revmap.db ...
     dochanges> /bin/rm -f convdir/depotmap.db ...
     dochanges> /bin/rm -rf p4root && mkdir -p p4root
     dochanges> /bin/mkdir -p /home/rmg/web/richard_geiger/...
     dochanges> /bin/ln -s /home/rmg/web/richard_geiger/...
     ========== change group 1
     ========== change group 2
     ========== change group 3
      .
      .
      .
     ========== change group 17
     ========== change group 18
     dochanges> cd /home/rmg/web/richard_geiger/...
     Recovering from dbmeta...
     dochanges> cd /home/rmg/web/richard_geiger/...
     Dumping to checkpoint...
     
   Basically, that's it. When this command finishes, your CVS module
   has been imported to Perforce, in the Perforce server database
   identified by the $P4ROOT configuration variable. The state of the
   resultant database is saved in a checkpoint file named
   $P4ROOT/checkpoint.

   NOTE: cvs2p4 does not create new RCS-format archives (,v files)
   under $P4ROOT; rather, it uses the existing RCS archives in the CVS
   tree directly. By default, it does this by making a symbolic link
   named $P4ROOT/depot/IMPORT pointing to the $CVS_MODULE tree. If
   you'd rather have dochanges copy in the CVS module for you, set
   COPYIMPORT in the config file.


6. If you want to import labels from CVS tags, run

     $ bin/dolabels convdir
     make label: testlabel
     dolabels> cd p4root && /usr/local/bin/p4d -jr dblbls
     /home/rmg/web/richard_geiger/guest/richard_geiger/utils/cvs2p4_meta/p4root
     Recovering from dblbls...
     dolabels> cd p4root && rm -f checkpoint; /usr/local/bin/p4d -jd checkpoint
     /home/rmg/web/richard_geiger/guest/richard_geiger/utils/cvs2p4_meta/p4root
     Dumping to checkpoint...

   This step adds the symbolic tag information from the CVS archive
   (for "plain", non-branch tags) to the Perforce database identified
   by the $P4ROOT configuration variable.  The state of the resultant
   database is saved in a checkpoint file named $P4ROOT/checkpoint.
 
** NOTE: This version of cvs2p4 does *not* create new RCS archives in
** $P4ROOT/depot/...; Rather, it creates a symbolic link
** "$P4ROOT/depot/IMPORT -> $CVS_MODULE"; i.e., the existing RCS
** archives from the CVS repository are used by the Perforce server,
** in place. If you'd rather have it make a _copy_ of the RCS archive
** files from your CVS repository, set "$COPYIMPORT = 1" in your
** config file.
   
7. If you want the RCS revision-to-Perforce change map, run:

     $ bin/revmap convdir

   Or, for the reverse mapping:

     $ bin/revmap -map rrevmap convdir
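
As noted above, dochanges makes the RCS archives visible to the
Perforce server either through the $P4ROOT/depot/IMPORT symbolic link
or, with COPYIMPORT set, through a copy. If you ever need to recreate
that arrangement by hand (say, after moving the CVS tree), the
following sketch shows the idea. The function place_archives is
hypothetical and is not part of cvs2p4.

```shell
# Sketch only: make the CVS module's RCS archives available under
# $P4ROOT/depot/IMPORT, by symlink (default) or by copy.
#   $1 = P4ROOT directory
#   $2 = absolute path of the CVS module directory (a relative path
#        would leave a dangling symlink)
#   $3 = 1 to copy, as with COPYIMPORT; anything else to symlink
place_archives() {
    p4root=$1
    cvs_module=$2
    copyimport=${3:-0}
    mkdir -p "$p4root/depot" || return 1
    if [ "$copyimport" = 1 ]; then
        cp -R "$cvs_module" "$p4root/depot/IMPORT"
    else
        ln -s "$cvs_module" "$p4root/depot/IMPORT"
    fi
}
```

Keep in mind that with the symlink arrangement the Perforce server
depends on the CVS tree staying where it is.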


==== INCREMENTAL CONVERSIONS

At this time, the recommended procedure for doing "incremental"
conversions - i.e., combining multiple CVS repositories, or doing
subsets of the CVS modules in a repository one at a time - is to do
each as a new conversion (starting with change 1), and then to combine
them as desired using the "perfmerge2.pl" tool.

This is also a useful pattern when you want to combine some new chunk
of CVS (or RCS) repository into an existing Perforce depot.

perfmerge2.pl can be obtained by sending email to support@perforce.com.

In order for this to work, you'll need to ensure that there is no
overlap in the namespaces of files, between your existing Perforce
repository and the newly converted files. See the notes at the top of
the perfmerge2.pl script.

perfmerge2.pl can operate in different modes, with respect to the
ordering of change numbers in the merged repositories. You can elect
either

  - to have it renumber all of the merged changesets, so that the
    time-ordered property of all change numbers (both existing and
    newly-merged) is preserved; or,

  - to leave your existing changes numbered as they are, with
    the newly imported changes numbered from the next available change
    number, even though some of them may have taken place (in CVS)
    interleaved in time with your existing Perforce changes.

Note that perfmerge2.pl only merges server metadata; you'll also need
to manually copy the tree of RCS archive files from your newly
converted $P4ROOT into your existing server's $P4ROOT.
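
Before copying the archive tree across, it is worth checking that no
archive file would land on top of an existing one. The sketch below
lists relative paths present in both trees. The function
archive_collisions is hypothetical; it only compares archive files on
disk, and is no substitute for the namespace checks described in the
perfmerge2.pl notes.

```shell
# Sketch only: print archive files that exist (by relative path) in
# both depot trees. No output means the copy will not overwrite
# anything.
#   $1 = depot directory under the newly converted P4ROOT
#   $2 = depot directory under the existing server's P4ROOT
archive_collisions() {
    new_depot=$1
    old_depot=$2
    new_list=$(mktemp) || return 1
    old_list=$(mktemp) || { rm -f "$new_list"; return 1; }
    # -L follows symlinks, so a symlinked IMPORT tree is scanned too.
    (cd "$new_depot" && find -L . -type f | LC_ALL=C sort) > "$new_list"
    (cd "$old_depot" && find -L . -type f | LC_ALL=C sort) > "$old_list"
    # comm -12 keeps only the lines common to both sorted lists.
    LC_ALL=C comm -12 "$new_list" "$old_list"
    rm -f "$new_list" "$old_list"
}
```

For example (the paths are placeholders for your own two server
roots):

  $ archive_collisions /new/p4root/depot /old/p4root/depot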


==== PERFORCE METADATA DEPENDENCIES

Since cvs2p4 works by directly generating Perforce metadata in the
Perforce checkpoint/journal format, it is dependent on "knowing" the
right definitions for certain tables within the Perforce database.

As of this writing, cvs2p4 writes metadata for the following Perforce
tables, at the version number shown for each table:

  table name    ver
  ------------  ---
  db.change     0
  db.desc       0
  db.rev        3
  db.revcx      0
  db.integed    0
  db.depot      0
  db.domain     2
  db.counters   0

p4d version 2002.2 writes version 3 of the db.domain table, but can
correctly read version 2.

The tests provided with this package are known to work correctly using
Perforce 2002.1, 2002.2, 2003.1, and 2004.2.


==== WHY DOES dolabels LABEL FILES IN MANY BRANCHES?

== What:

When converting CVS tags into Perforce labels, users are often
surprised to observe that, in addition to the files in the
expected branch, many files in _other_ branches are also labeled.

== Why:

This is a side effect of a difference in the way CVS and Perforce
handle branching, in combination with the way CVS tags work.

The essential problem is as follows... When a file has been
branched, but not yet changed in the child branch, there is no way,
based only on information in the CVS archive, to determine _which_
branch(es) a label was intended to apply to. This is due to the fact
that, in CVS, creating a new branch does not actually create a new
branch revision within the RCS archive; it merely marks the branch
point with a symbolic name. This is not a problem for CVS, since the
act of checking out a CVS tree using a CVS tag will always retrieve a
single CVS revision for every file bearing the tag. In Perforce,
however, Perforce's "lazy copy" mechanism creates (if only virtually!)
a distinct #1 revision in the child branch at branching time. Thus,
for a given file, the #1 revisions for multiple Perforce branches may
share the same parent branch and revision. In such cases, there's no
way for the "dolabels" command to know specifically _which_ child
branch a given CVS tag was meant to apply to.

Therefore, dolabels includes the #1 revisions in all of the child
branches that share the common branch point in CVS.

== What to do:

In order to set up a Perforce client for the "correct" branch
corresponding to a particular CVS tag (now Perforce label), the user
must, in effect, tell Perforce what branch the label is really
supposed to be used with. This can be done by either

  1 - Restricting the view map in the client workspace to files with
      the "correct" branch for the label used, e.g.,

	View:
		//depot/prod/rel1_0/...	//client/...

      and

	% p4 sync ...@label

 - OR -

  2 - Selectively syncing only the "correct" client branch by
      providing the branch in the sync filespec, e.g.,

	View:
		//depot/prod/...	//client/...

      and

        % p4 sync //depot/prod/rel1_0/...@label

Another approach is for the user to remove the label from all branches
except for the "correct" one(s).

== Future

In the future, it could be possible, using heuristics that consider
global information (as opposed to converting each file based solely on
information from the corresponding CVS (RCS) file archive), to have
dolabels automatically determine the correct branch for each label,
and ONLY include files from that branch in a label. That's
(potentially) in the future, though!


==== HOW TO PACKAGE MODIFICATIONS

Until now (October 7, 2003), I have made all changes to the package.
But this may change in the future, if/when the project gets additional
curators. So, I think it a good idea to document the steps used to
package and publish a new release:

  1. In a //guest workspace, make your modifications to files in the
     package. Before submitting:

  2. Edit the NEWS file to document the change(s). (Please follow the
     established format. Of course.)

  3. Do the submit.

  4. Update the checksums in the MANIFEST file:

       $ p4 edit MANIFEST
       <if your mods add files, edit MANIFEST to reflect this>
       $ MANIFEST -gen
       
  5. As root, generate the release tarball:

       # MANIFEST -tar <vers>

     (This creates cvs2p4-<vers>.tar)

  6.  And add it to the default changelist:

       $ p4 add cvs2p4-<vers>.tar

  7.  Finally, update the "cvs2p4-latest" symbolic link:

       $ p4 edit cvs2p4-latest.tar
       $ rm cvs2p4-latest.tar
       $ ln -s cvs2p4-<vers>.tar cvs2p4-latest.tar

  8. And then submit. In this scenario, the change will include
     the following files:

        MANIFEST
        cvs2p4-<vers>.tar 	(added)
        cvs2p4-latest.tar

To complete the act of "publishing" the new release, you must have
Perforce write access to //public/perforce/utils/cvs2p4/...
"Publishing" the new release is simply a matter of integrating your
change(s) into //public/..., and submitting.


==== SUPPORT

I originally wrote and contributed this tool while working for Network
Appliance in 1997. 

I worked as a Perforce employee from August 2002 through December
2003.

I presently work at Data Domain, Inc., where I will continue to try
and offer help to those who encounter problems using these tools, but
cannot guarantee any level of support.

  - Richard Geiger  rmg@datadomain.com

  (revised October 7, 2003, release 2.3.7)