This is a loose collection of useful information culled from various
email messages. It's meant to save some time in reading the archives
for prospective hackers.
Adding Backends
===============
Getting Started
---------------
I'd copy from a "real" back end instead of the null backends. Choose
based on which one is most like ab:
- CVS branches in revision number space (foo#1.1.2.1 is a branch of
foo#1.1), has no atomic changes, and VCP can read RCS files
directly.
- p4 is the nicest to work with. It branches in name space
(foo/file can be a branch of bar/file) and captures enough
metadata to allow us to reproduce a source repository fairly
accurately. There is experimental support for P4::Client, an
interface to the experimental p4 api library (the library is
solid, mind you, but has some minor issues that keep us from using
it reliably all the time).
- There's an in-development svn back end out on the web, but it's
still working through svn-specific and VCP internals issues (I
refactored a lot and broke it :/).
- VSS also branches in name space, but is awkward to work with for
many reasons: missing metadata, poor command line tools, an
inconsistent data model, and unstable operation.
Backend Modules Footprint
-------------------------
The back ends have three parts, usually:
- VCP::Dest::foo: handle_header(), handle_rev(), handle_footer() and
several other functions. If the dest handles changes, then
handle_rev() must accumulate changes until a rev for a different
change arrives or handle_footer() is called.
- VCP::Source::foo: handle_header(), copy_revs(), and
handle_footer() scan the repository and emit metadata. get_file()
retrieves files as needed by the downstream filters or dest.
- VCP::Utils::foo: common infrastructure shared by the dest and
source.
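
As a rough sketch of the change-accumulating behaviour described for
VCP::Dest::foo, the standalone package below buffers revs in
handle_rev() and flushes them when a rev for a different change arrives
or when handle_footer() is called. It does not inherit from the real
VCP::Dest base class; the package name My::Dest::Sketch, the plain-hash
revs, the change_id and name keys, and the submit_change() helper are
all assumptions made for illustration.

```perl
use strict;
use warnings;

package My::Dest::Sketch;    # hypothetical; a real dest would subclass VCP::Dest

sub new {
    my ($class) = @_;
    return bless { pending => [], submitted => [] }, $class;
}

sub handle_header {
    my ( $self, $header ) = @_;
    $self->{pending} = [];    # start each transfer with an empty buffer
}

sub handle_rev {
    my ( $self, $rev ) = @_;    # $rev is a plain hash here, not a VCP::Rev
    my $pending = $self->{pending};

    # A rev for a different change closes the change being accumulated.
    $self->submit_change
        if @$pending && $pending->[-1]{change_id} ne $rev->{change_id};

    push @{ $self->{pending} }, $rev;
}

sub handle_footer {
    my ( $self, $footer ) = @_;
    $self->submit_change if @{ $self->{pending} };    # flush the last change
}

# Hypothetical helper: a real dest would commit to the repository here.
sub submit_change {
    my ($self) = @_;
    push @{ $self->{submitted} }, [ map { $_->{name} } @{ $self->{pending} } ];
    $self->{pending} = [];
}

package main;

my $dest = My::Dest::Sketch->new;
$dest->handle_header( {} );
$dest->handle_rev( { name => "a", change_id => 1 } );
$dest->handle_rev( { name => "b", change_id => 1 } );
$dest->handle_rev( { name => "c", change_id => 2 } );
$dest->handle_footer( {} );

# One line per submitted change, listing the revs it contained.
print join( ",", @$_ ), "\n" for @{ $dest->{submitted} };
```

The point of the buffer is that a change-oriented destination never sees
half a change: nothing is submitted until the boundary is known.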
Implementation Order and Test Suites
------------------------------------
We try to build matched pairs of VCP::Source::foo and VCP::Dest::foo,
both to provide for reasonable testing and to keep VCP balanced. I
don't want VCP to become a mechanism for one-way migration, merely a
means of fleeing to a given repository type; it needs to be balanced
where possible.
We would need to do the least-common-denominator testing in
t/90revml2foo_*.t (which needs VCP::Dest::foo) and t/91svn2revml.t
(which needs VCP::Source::foo).
t/95*.t is good for ensuring that a particular conversion works as
expected given one of the repositories generated in the t/90*.t tests.
t/99*.t is good for testing that a conversion works with a special case
repository. The other t/9*.t tests use just enough test data to make
sure there is no fundamental breakage.
Repository Interaction Patterns
-------------------------------
The command line is often a good place to start: it lets you use
relatively debugged and well defined access points (in general; I'm not
sure how stable/usable ab's CLI is). Other access methods are usually
much faster:
- Going direct to disk (if the repository has a supported /
published external on-disk representation) is pretty speedy and,
for CVS at least, takes about as much work as parsing the log file
emitted by the cvs CLI.
- Implement a direct-to-backend protocol like P4::Client. This has
all the advantages of the CLI without the pain and suffering
involved in repeatedly spawning child processes. It's more
flexible than direct file reads because the repository can be
located over the net.
- Check to see if there's a backend available. Perl modules
should be drop-in useful; you can also wrap C interfaces
easily with the Inline:: modules, at least for prototyping and
quite likely "for real".
- More and more servers offer a web or WebDAV frontend; Perl's
WWW and HTTP modules may be helpful here.
- Read / generate a data dump for the backend. p4d can dump all
metadata as a "checkpoint" and import checkpoints and "journal"
files. We've not implemented this because the format is proprietary
and we don't want to have to track it through every change.
Interactive User Interfaces
---------------------------
> When I run 'perl -Ilib bin/vcp' I don't see my new backend in the list of
> source and dest I get prompted for.
You won't yet. Those lists are for production-ready backends and are
hand-crafted in the ui_machines directory, where state machines written
in XML specify the flow of the UI. Leave that until last; the command
line and config files are far more appropriate for rapidly evolving
backends.
Internals Notes
===============
TODO: most of this should move into code comments or POD.
The VCP::Rev::*_info fields
---------------------------
The intent of these fields in VCP::Rev is to capture source
repository information that does not survive least common
denominator processing, like p4 or CVS file modes.
- Destinations and filters could then use this to convert from
source flags like CVS's keyword expansion controls or p4's
stored-compressed-or-not flags to the destination's.
- These are not intended for internal processing, though you had
no way of knowing that.
- VCP::Revs may be serialized between filters, so storing refs
in them is no longer a Good Idea.
This should really be replaced by a more general mechanism like a
source_info member that is a HASH keyed by plugin ID (Perl package name
plus position in the plugins chain? Dunno), which then contains a HASH
of data members in plugin-specific format. Or something...
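
One possible shape for such a source_info member is sketched below. The
plugin ID format (package name, "#", position in the chain), the
per-plugin keys, and the cvs_keyword_expansion() helper are all
hypothetical; nothing here is existing VCP API. Only plain scalars and
hashes are stored, so the structure would survive serialization between
filters.

```perl
use strict;
use warnings;

# Hypothetical plugin IDs: Perl package name plus position in the plugin chain.
my %rev = (
    name        => "foo/file",
    source_info => {
        "VCP::Source::cvs#0" => { keyword_expansion => "kv" },
        "VCP::Filter::map#1" => { original_name     => "bar/file" },
    },
);

# A destination looks up only the plugins it understands and ignores the rest.
sub cvs_keyword_expansion {
    my ($rev) = @_;
    for my $plugin_id ( sort keys %{ $rev->{source_info} } ) {
        next unless $plugin_id =~ /^VCP::Source::cvs#\d+$/;
        return $rev->{source_info}{$plugin_id}{keyword_expansion};
    }
    return;    # no CVS-specific data travelled with this rev
}

print cvs_keyword_expansion( \%rev ), "\n";
```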
Filter Working Set Data
-----------------------
Filters should store internal-use-only data off to the side in a data
member or, for data sets that can get big, perhaps in a VCP::DB_File
structure.
Filter sets that need to share data should also store them off to the
side and coordinate with each other.
- If this is a need, VCP will need to provide some sort of rendezvous
mechanism, probably using the feature negotiation mechanism that
should also replace sort_filter() mentioned elsewhere.
Cloned Revs and placeholders
----------------------------
One type of placeholder has the action "clone". This is used (so far)
for CVS branches that are given multiple branch tags: a master branch
is cloned on to several "clone" branches using placeholder revs that
have a previous_id of the rev on the branch master. It is likely to be
useful for VSS shares as well, but that's not implemented as of VCP
v0.9.
This diagram illustrates what happens when a rev 1.10 is branched on to
physical branch 1.10.2 and that branch is given three branch tags
"bt_1", "bt_2", and "bt_3". The arcs in the graph represent VCP::Rev
instances. "B" revs have $r->is_branch_rev TRUE (action eq "branch"),
while the "C" revs have $r->is_clone_rev TRUE (action eq "clone"). Both
have $r->is_placeholder_rev TRUE.
||||||||||||||||||||||||||||||||||||||||
   1.10
    | \B
    .  1.10.2.1<bt_1>----------------+
    .   |    \C                       \C
    .   |     1.10.2.1<bt_2>           1.10.2.1<bt_3>
        |
       1.10.2.2<bt_1>----------------+
        |    \C                       \C
        |     1.10.2.2<bt_2>           1.10.2.2<bt_3>
        |
       1.10.2.3<bt_1>----------------+
        |    \C                       \C
        |     1.10.2.3<bt_2>           1.10.2.3<bt_3>
        |
||||||||||||||||||||||||||||||||||||||||
Kind of strange, but it seems to capture the semantic of "hey, I created
this branch, then I also labelled it this way and that" so that
VCP::Dest::p4 users can tell the master branch from the cloned branches.
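
The placeholder relationships in the diagram can be modelled roughly as
follows. This is a simplified sketch: plain hashes stand in for VCP::Rev
instances, the id strings and the "edit" action are made up, and the
three predicates only mimic the real is_branch_rev, is_clone_rev and
is_placeholder_rev methods for the two placeholder types discussed here.

```perl
use strict;
use warnings;

# Plain hashes standing in for VCP::Rev instances; the id format is made up.
my @revs = (
    { id => "1.10<>",         action => "edit" },
    { id => "1.10.2.1<bt_1>", action => "branch", previous_id => "1.10<>" },
    { id => "1.10.2.1<bt_2>", action => "clone",  previous_id => "1.10.2.1<bt_1>" },
    { id => "1.10.2.1<bt_3>", action => "clone",  previous_id => "1.10.2.1<bt_1>" },
);

sub is_branch_rev      { $_[0]{action} eq "branch" }
sub is_clone_rev       { $_[0]{action} eq "clone" }
sub is_placeholder_rev { is_branch_rev( $_[0] ) || is_clone_rev( $_[0] ) }

# Group clone placeholders under the master-branch rev they point back to.
my %clones_of;
for my $r ( grep { is_clone_rev($_) } @revs ) {
    push @{ $clones_of{ $r->{previous_id} } }, $r->{id};
}

for my $master ( sort keys %clones_of ) {
    print "$master is cloned as: @{ $clones_of{$master} }\n";
}
```

Following previous_id from a clone always lands on the master branch,
which is how a destination could tell the two apart.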
Per-Backend Notes
=================
Some things to be aware of when seeing data from particular backends
(see also the LIMITATIONS sections for each of the modules you might be
dealing with).
CVS Oddities
------------
See the discussion of Cloned Revs elsewhere.
CVS does not guarantee that 1.1 is the first rev on the trunk (I've seen
3.0, etc). In general, people get very funky with RCS files. You can
only count on the branches, next and symbols fields to give you
structural information, and you need to actually check which rev is the
base rev on the trunk.
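
A minimal sketch of checking for the base trunk rev rather than
assuming 1.1 is shown below. It assumes the revision numbers have
already been extracted from the RCS file and simply picks the lowest
two-component number; it does not follow the next fields, and the
function names cmp_rev() and base_trunk_rev() are hypothetical.

```perl
use strict;
use warnings;

# Compare dotted revision numbers component by component, numerically.
sub cmp_rev {
    my @x = split /\./, $_[0];
    my @y = split /\./, $_[1];
    while ( @x && @y ) {
        my $c = shift(@x) <=> shift(@y);
        return $c if $c;
    }
    return @x <=> @y;    # the shorter number sorts first on a tie
}

# Trunk revs have exactly two components; branch revs have four or more.
sub base_trunk_rev {
    my @trunk = sort { cmp_rev( $a, $b ) } grep { /^\d+\.\d+$/ } @_;
    return $trunk[0];
}

print base_trunk_rev(qw( 3.2 3.0.2.1 3.10 3.0 3.1 )), "\n";
```

A plain string sort would put 3.10 before 3.2, which is why the
comparison is done numerically per component.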
VCP::Source::cvs can issue two revs for a deleted revision with changes,
so a dead 1.1 would cause two VCP::Rev instances: one to create the file
and another to delete it. This is necessary for dead 1.1 revs and for
multiple consecutive dead but edited revs.
CVS does not supply complete metadata for deletes and branches. user_id
and time are missing for many deletes and for all branch creations.
VCP::Dest::sort_filter()
------------------------
The destinations insert filters using this. This is so that users don't
need to add must-have filters to every .vcp file:
- ChangeSets
  - See VCP::Dest::p4
- StringEdit
  - This is to clean up filenames, labels, usernames, etc.,
    that would cause svn to choke.
Background:
- sort_filter() is a first cut at a generalized negotiation
mechanism.
- The VCP::Filter::svn* filters are the first filters that are
required.
- I'd like to generalize the sort_filter() implementation into a
contract negotiation mechanism where each filter places guarantees
into a HASH ref and the downstream filters can:
  - ignore what they don't care about
  - adapt if need be
  - die for illegal input (missing guarantees)
  - warn for odd or dangerous input
  - insert their own prefilters
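
To make the proposal a little more concrete, here is a hedged sketch of
what a downstream filter's side of such a negotiation might look like.
Nothing like this exists in VCP yet; the guarantee names and the
negotiate() function are invented for illustration, and only the
StringEdit filter name comes from the list above.

```perl
use strict;
use warnings;

# Hypothetical guarantee names; upstream filters would set these as they run.
my $guarantees = {
    sorted_by_change => 1,
    names_cleaned    => 0,
    has_user_ids     => 1,
    has_labels       => 1,    # ignored below: this consumer doesn't care
};

# A downstream filter or dest inspects the guarantees and decides what to do.
sub negotiate {
    my ($g) = @_;
    my @prefilters;

    # die for illegal input (a missing guarantee we cannot work without)
    die "upstream must guarantee sorted_by_change\n"
        unless $g->{sorted_by_change};

    # insert our own prefilter when a guarantee can be established locally
    push @prefilters, "StringEdit" unless $g->{names_cleaned};

    # warn for odd or dangerous input, but carry on
    warn "revs may lack user ids\n" unless $g->{has_user_ids};

    return @prefilters;
}

my @needed = negotiate($guarantees);
print "prefilters to insert: @needed\n";
```

Keeping the guarantees in a plain HASH ref means a filter can ignore
keys it has never heard of, which is what lets new guarantees be added
without touching every existing filter.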