This is a loose collection of useful information culled from various
email messages.  It's meant to save prospective hackers some time in
reading the archives.

Adding Backends
===============

Getting Started
---------------

I'd copy from a "real" back end instead of the null backends.  Choose
based on which one is most like ab:

- CVS branches in revision number space (foo#1.1.2.1 is a branch of
  foo#1.1), has no atomic changes, and VCP can read RCS files directly.

- p4 is the nicest to work with.  It branches in name space (foo/file
  can be a branch of bar/file) and captures enough metadata to allow us
  to reproduce a source repository fairly accurately.  There is
  experimental support for P4::Client, an interface to the experimental
  p4 api library (the library is solid, mind you, but has some minor
  issues that keep us from using it reliably all the time).

- There's an in-development svn back end out on the web, but it's still
  working through svn-specific and VCP internals issues (I refactored a
  lot and broke it :/).

- VSS also branches in name space, but is awkward to work with for many
  reasons: missing metadata, poor command line tools, an inconsistent
  data model, and unstable operation.

Backend Modules Footprint
-------------------------

The back ends usually have three parts:

- VCP::Dest::foo: handle_header(), handle_rev(), handle_footer() and
  several other functions.  If the dest handles changes, then
  handle_rev() must accumulate revs until a rev for a different change
  arrives or handle_footer() is called.

- VCP::Source::foo: handle_header(), copy_revs(), and handle_footer()
  scan the repository and emit metadata.  get_file() retrieves files as
  needed by the downstream filters or dest.

- VCP::Utils::foo: common infrastructure shared by the dest and source.
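
As a rough illustration of the "accumulate until the change ends" rule for
a change-oriented dest, here is a minimal, self-contained sketch.  It does
not use the real VCP::Dest base class or VCP::Rev objects: the package name
Sketch::Dest::foo, the plain-hash revs, and the name and change_id keys are
all assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a change-oriented dest.  It does NOT inherit
# from VCP::Dest; revs are plain hashes with assumed keys.
package Sketch::Dest::foo;

sub new {
    my ($class) = @_;
    return bless { pending => [], committed => [] }, $class;
}

sub handle_header {
    my ($self, $header) = @_;
    # Start each transfer with empty state.
    $self->{pending}   = [];
    $self->{committed} = [];
    return;
}

sub handle_rev {
    my ($self, $rev) = @_;
    my $pending = $self->{pending};

    # A rev for a different change has arrived: flush what we have.
    $self->_commit_pending
        if @$pending && $pending->[-1]{change_id} ne $rev->{change_id};

    push @{ $self->{pending} }, $rev;
    return;
}

sub handle_footer {
    my ($self, $footer) = @_;
    # End of stream: flush the last accumulated change.
    $self->_commit_pending;
    return;
}

sub _commit_pending {
    my ($self) = @_;
    my $pending = $self->{pending};
    return unless @$pending;

    # A real dest would submit the change here; we only record file names.
    push @{ $self->{committed} }, [ map { $_->{name} } @$pending ];
    $self->{pending} = [];
    return;
}

package main;

my $dest = Sketch::Dest::foo->new;
$dest->handle_header( {} );
$dest->handle_rev( { name => 'a', change_id => 1 } );
$dest->handle_rev( { name => 'b', change_id => 1 } );
$dest->handle_rev( { name => 'c', change_id => 2 } );
$dest->handle_footer( {} );
```

The point of the sketch is only the flush logic: nothing is committed until
either the change id switches or the footer arrives, so a change is never
submitted half-finished.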

Implementation Order and Test Suites
------------------------------------

We try to build matched pairs of VCP::Source::foo and VCP::Dest::foo in
order to provide for reasonable testing and to encourage VCP to be
balanced and not merely a mechanism for fleeing to a given repository
type.  I don't want VCP to become a mechanism for one-way migration; it
needs to be balanced where possible.

We would need to do the least-common-denominator testing in
t/90revml2foo_*.t (which needs VCP::Dest::foo) and t/91svn2revml.t
(which needs VCP::Source::foo).

t/95*.t is good for ensuring that a particular conversion works as
expected given one of the repositories generated in the t/90*.t tests.

t/99*.t is good for testing that a conversion works with a special case
repository.

The other t/9*.t tests use just enough test data to make sure there is
no fundamental breakage.

Repository Interaction Patterns
-------------------------------

The command line is often a good place to start; it lets you use
relatively debugged and well defined access points (in general; I'm not
sure how stable/usable ab's CLI is).

Other access methods are usually much faster:

- Going direct to disk (if the repository has a supported / published
  external on-disk representation) is pretty speedy and, for CVS at
  least, takes about as much work as parsing the log file emitted by
  the cvs CLI.

- Implement a direct-to-backend protocol like P4::Client.  This has all
  the advantages of the CLI without the pain and suffering involved in
  repeatedly spawning child processes.  It's more flexible than direct
  file reads because the repository can be located over the net.

- Check to see if there's a backend available.  Perl modules should be
  drop-in useful; you can also wrap C interfaces easily with the
  Inline:: modules, at least for prototyping and quite likely "for
  real".

- More and more servers offer a web or WebDAV frontend; Perl's WWW and
  HTTP modules may be helpful here.

- Read / generate a data dump for the backend.
  p4d can dump all metadata as a "checkpoint" and import checkpoints
  and "journal" files.  We've not implemented this because it's in a
  proprietary format and we don't want to have to track every change
  to it.

Interactive User Interfaces
---------------------------

> When I run 'perl -Ilib bin/vcp' I don't see my new backend in the list of
> source and dest I get prompted for.

You won't yet; those are for production-ready backends, and they're
hand-crafted in the ui_machines directory using state machines
specified in XML that define the flow of the UI.  Leave that until
last; the command line and config files are far more appropriate for
rapidly evolving backends.

Internals Notes
===============

TODO: most of this should move in to code comments or POD.

The VCP::Rev::*_info fields
---------------------------

The intent of these fields in VCP::Rev is to capture source repository
information that does not survive least common denominator processing,
like p4 or CVS file modes.

- Destinations and filters could then use this to convert from source
  flags like CVS's keyword expansion controls or p4's
  stored-compressed-or-not flags to the destination's.

- These are not intended for internal processing, though you had no way
  of knowing that.

- VCP::Revs may be serialized between filters, so storing refs in them
  is no longer a Good Idea.

This should really be replaced by a more general mechanism like a
source_info member that is a HASH keyed by plugin ID (Perl package name
plus position in the plugins chain?  Dunno), which then contains a HASH
of data members in plugin-specific format.  Or something...

Filter Working Set Data
-----------------------

Filters should store internal-use-only data off to the side in a data
member or, for data sets that can get big, perhaps in a VCP::DB_File
structure.

Filter sets that need to share data should also store them off to the
side and coordinate with each other.

- If this is a need, VCP will need to provide some sort of rendezvous
  mechanism, probably using the feature negotiation mechanism that
  should also replace sort_filter() mentioned elsewhere.

Cloned Revs and placeholders
----------------------------

One type of placeholder has the action "clone".  This is used (so far)
for CVS branches that are given multiple branch tags, so a master
branch is cloned on to several "clone" branches using placeholder revs
that have a previous_id of the rev on the branch master.  It is likely
to be useful for VSS shares as well, but that's not implemented as of
VCP v0.9.

This diagram illustrates what happens when a rev 1.10 is branched on to
physical branch 1.10.2 and that branch is given three branch tags
"bt_1", "bt_2", and "bt_3".  The arcs in the graph represent VCP::Rev
instances.  "B" revs have $r->is_branch_rev TRUE (action eq "branch"),
while the "C" revs have $r->is_clone_rev TRUE (action eq "clone").
Both have $r->is_placeholder_rev TRUE.

    1.10
     | \B
     .  1.10.2.1----------------+
     .   | \C                    \C
     .   |  1.10.2.1              1.10.2.1
         |
        1.10.2.2----------------+
         | \C                    \C
         |  1.10.2.2              1.10.2.2
         |
        1.10.2.3----------------+
         | \C                    \C
         |  1.10.2.3              1.10.2.3
         |

Kind of strange, but it seems to capture the semantic of "hey, I
created this branch, then I also labelled it this way and that" so that
VCP::Dest::p4 users can tell the master branch from the cloned
branches.

Per-Backend Notes
=================

Some things to be aware of when seeing data from particular backends
(see also the LIMITATIONS sections for each of the modules you might be
dealing with).

CVS Oddities
------------

See the discussion of Cloned Revs elsewhere.

CVS does not guarantee that 1.1 is the first rev on the trunk (I've
seen 3.0, etc).
In general, people get very funky with RCS files.  You can only count
on the branches, next and symbols fields to give you structural
information, and you need to actually check to find the base rev on
the trunk.

VCP::Source::cvs can issue two revs for a deleted revision with
changes, so a dead 1.1 would cause two VCP::Rev instances: one to
create the file and another to delete it.  This is necessary for dead
1.1 revs and for multiple consecutive dead but edited revs.

CVS does not supply complete metadata for deletes and branches.
user_id and time are missing for many deletes and for all branch
creations.

VCP::Dest::sort_filter()
------------------------

The destinations insert filters using this.  This is so that users
don't need to add must-have filters to every .vcp file:

- ChangeSets - See VCP::Dest::p4

- StringEdit - This is to clean up filenames, labels, usernames, etc.,
  that would cause svn to choke.

Background:

- sort_filter() is a first cut at a generalized negotiation mechanism.

- The VCP::Filter::svn* are the first filters that are required.

- I'd like to generalize the sort_filter() implementation in to a
  generalized contract negotiation mechanism where each filter places
  guarantees in to a HASH ref and the downstream filters can:

    - ignore what they don't care about
    - adapt if need be
    - die for illegal input (missing guarantees)
    - warn for odd or dangerous input
    - insert their own prefilters
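
To make the idea a bit more concrete, here is a small, self-contained
sketch of what such a guarantee negotiation could look like.  None of
this exists in VCP: the negotiate() function, the stage HASH layout
(name, requires, prefilters, warns_on, provides) and the guarantee and
filter names are all hypothetical, chosen only to show the
ignore / adapt / die / warn / insert-prefilter options listed above.

```perl
use strict;
use warnings;

# Hypothetical negotiation step: $guarantees is the shared HASH ref that
# upstream filters have filled in; $stage describes one downstream filter.
# Returns a list of action strings instead of acting, to keep it testable.
sub negotiate {
    my ($guarantees, $stage) = @_;
    my @actions;

    for my $need ( @{ $stage->{requires} || [] } ) {
        next if $guarantees->{$need};    # already guaranteed upstream

        if ( my $prefilter = $stage->{prefilters}{$need} ) {
            # Adapt: insert our own prefilter to establish the guarantee.
            push @actions, "insert $prefilter";
            $guarantees->{$need} = 1;
        }
        else {
            # Illegal input: a required guarantee is missing.
            die "$stage->{name}: missing guarantee '$need'\n";
        }
    }

    # Odd or dangerous, but not fatal.
    for my $odd ( @{ $stage->{warns_on} || [] } ) {
        push @actions, "warn $odd" if $guarantees->{$odd};
    }

    # Anything else in %$guarantees is simply ignored.  Publish what this
    # stage guarantees to the stages downstream of it.
    $guarantees->{$_} = 1 for @{ $stage->{provides} || [] };

    return @actions;
}

# Made-up guarantee names, for illustration only.
my %guarantees = ( sorted_by_time => 1, lossy_names => 1 );

my @actions = negotiate(
    \%guarantees,
    {
        name       => 'dest',
        requires   => [ 'sorted_by_time', 'grouped_by_change' ],
        prefilters => { grouped_by_change => 'changesets' },
        warns_on   => [ 'lossy_names' ],
    },
);

# A stage with an unmet requirement and no prefilter to offer must die.
my $ok = eval {
    negotiate( {}, { name => 'strict', requires => [ 'clean_names' ] } );
    1;
};
my $error = $ok ? '' : $@;
```

Returning the actions as data rather than performing them is just a
convenience for the sketch; a real mechanism would presumably splice the
prefilters in to the plugin chain directly.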