Richard Gruet - Created September 15, 2005 - Last revised September 19, 2005

p4ftpsync - Synchronizes a remote site with a P4 depot via FTP

Overview

What's that?

p4ftpsync.py is a Python script which synchronizes a remote site with a P4 repository via FTP, and vice versa. It is especially useful when you don't have enough control over the remote host to install a Perforce client there, which would make synchronization easy (at least the P4 to remote site sync).

What for?

I am a member of a team which designs web sites. We want to keep the pages and code for our sites under P4 control, and synchronize the remote live sites from time to time to reflect the changes. Occasionally we also have to modify the pages directly on the live sites (in emergency situations!), and some of our clients can even edit the pages directly using Macromedia Contribute. In these cases we want the P4 repository to be synced back.
If you have enough control over the web host where your live site resides, i.e. if you can install a P4 client, the first sync (P4 to the live site) is easily accomplished via a P4 sync command, but not the second: P4 doesn't automatically detect changes made in workspaces, so you still have to manually open the files for add, edit or delete. And if you don't have enough privileges to install a P4 client, which is the case for many small to medium sites using cheap web hosting, you simply can't use the P4 client install/P4 sync strategy [this strategy and others are developed in an interesting article on the P4 web site: Web Content management in Perforce].
I searched for alternative solutions but could not find any (I could have missed something though, please let me know!), so I eventually wrote a program to perform the P4 to remote site synchronization, which I later extended to perform the reverse sync as well.
The problem is to emulate a P4 remote client accessible only via FTP. The basic idea is to use a dedicated local client/workspace that reflects (mirrors) the state of the remote site. If we P4 sync this workspace to a certain revision, we get the exact list of changes that occurred, which we can then propagate (push) to the remote site using FTP operations. Conversely, with the restriction that the remote site be synced to the head revision, we can detect changes made directly to the site by comparing the remote site and the local workspace (synced to the head revision beforehand).
Basically that's what p4ftpsync does. It's plain Python, one single loooooong file (>3000 lines!), uses no external library (not even the Perforce Python API) and requires only the basic P4 client install plus the Python language (Python is not required with the (almost) standalone executable version). It works with Perforce server version 2005.1 and should work with older versions as well: it doesn't use any fancy features, but it relies on parsing P4 output, so it could be sensitive to changes in the output format. The code uses some tricks to optimize speed (file caching, multi-threaded FTP operations), and so far it seems to be rather reliable, at least for common simple operations (e.g. sync to head revision).
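To make the mirror-workspace idea concrete, here is a minimal Python sketch of the normal sync's core step: turning the output of p4 sync on the mirror workspace into FTP actions. It is not p4ftpsync's actual code; the function ftp_actions and its tuple format are hypothetical, and it assumes the usual p4 sync messages ("added as", "updating", "refreshing", "replacing", "deleted as"). Creating missing remote directories is left out.

```python
import re

# Matches the usual `p4 sync` messages, e.g.
#   //depot/mySite/www/index.html#3 - updating c:\mirror\www\index.html
# Other output (errors, "file(s) up-to-date.") simply does not match.
SYNC_LINE = re.compile(
    r"^(?P<depot>//.+?)#(?P<rev>\d+) - "
    r"(?P<verb>added as|updating|refreshing|replacing|deleted as) "
    r"(?P<local>.+)$"
)


def ftp_actions(sync_output, depot_root, ftp_root):
    """Translate `p4 sync` output into (action, remote_path, local_path) tuples.

    Hypothetical helper: depot_root is the depot prefix mirrored by the
    workspace (e.g. "//depot/mySite/www/") and ftp_root the matching remote
    directory (e.g. "/mySite/www/").
    """
    actions = []
    for line in sync_output.splitlines():
        m = SYNC_LINE.match(line.strip())
        if not m or not m.group("depot").startswith(depot_root):
            continue
        remote = ftp_root + m.group("depot")[len(depot_root):]
        if m.group("verb") == "deleted as":
            # File removed from the workspace: delete it on the site too.
            actions.append(("delete", remote, None))
        else:
            # New or changed file: upload the freshly synced local copy.
            actions.append(("put", remote, m.group("local")))
    return actions
```

For example, feeding it the captured output of p4 -c syncClient sync //depot/mySite/www/... with depot_root="//depot/mySite/www/" and ftp_root="/mySite/www/" yields one "put" or "delete" entry per changed file.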

Requirements

Installation

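You have the choice between two distributions: the Python script p4ftpsync.py, which requires Python, and the (almost) standalone executable p4ftpsync.exe, which doesn't.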
In both cases you must have installed at least the Perforce client, and added the P4 install directory to your PATH environment variable, before you can run p4ftpsync. Check it by opening a terminal and typing:
p4
You should get a P4 help message. If not, check your P4 install and your PATH environment variable.
p4ftpsync also requires that you create a dedicated P4 client (workspace). This workspace will be used to mirror the remote site in order to perform the synchronization. Use the Perforce command p4 client to create it, and define the client view (mapping) so that it includes the part of the depot you want to synchronize.
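If you prefer to script the creation of the mirror client, the sketch below builds a minimal client spec and feeds it to p4 client -i, which reads a spec from standard input. The helper names are hypothetical, and whether your server accepts such a minimal spec (Client, Root and View only) is an assumption; editing the template printed by p4 client -o syncClient is the safer route.

```python
import subprocess


def mirror_client_spec(client, root, depot_path):
    """Build a minimal client spec mirroring depot_path (hypothetical helper).

    depot_path should end with "/...", e.g. "//depot/mySite/www/...".
    Assumption: the server fills in the other fields with defaults.
    """
    return (
        f"Client: {client}\n\n"
        f"Root: {root}\n\n"
        "View:\n"
        f"\t{depot_path} //{client}/...\n"
    )


def create_client(spec):
    # `p4 client -i` reads the client spec from standard input.
    subprocess.run(["p4", "client", "-i"], input=spec, text=True, check=True)
```

Usage: create_client(mirror_client_spec("syncClient", r"C:\p4mirror\mySite", "//depot/mySite/www/...")).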

Start-up

We assume that you have installed everything as described above and created a dedicated P4 client (called here syncClient). The examples described use the python script p4ftpsync.py, but apply to the executable version as well.
To get help, type :
[python] p4ftpsync[.py] -h  (or --help)
You can skip the "python" if p4ftpsync.py is in your PATH and files with extension .py are associated with the Python interpreter (which is generally the case). On Windows, you can even omit the .py extension, provided you have added .py to the list of extensions in the PATHEXT environment variable.
Let's try a normal (P4 to remote site) synchronization. Say you have your site on ftp server ftpServerAddress accessible by user userName, password passwd, rooted at /mySite/www/. Say the corresponding location in the P4 depot is at //depot/mySite/www/.... You want to synchronize the remote site with the head revision of the depot. Type:
p4ftpsync.py -v -t --p4Passwd password --p4Client syncClient --ftpHost ftpServerAddress --ftpUser userName --ftpPasswd passwd --ftpRoot /mySite/www/ //depot/mySite/www/...
If you perform this sync for the first time, p4ftpsync will try to upload all the files in //depot/mySite/www/.... Actually, the actions are the same as p4 sync //depot/mySite/www/... would perform, except that they are converted to FTP actions to update the remote site (viewed as a remote P4 workspace). Subsequent runs will only transfer the changes since the last run, as P4 sync would (option -f, however, forces a complete refresh of the workspace and will therefore upload all the files again).
As you can see, there are a lot of parameters to provide to p4ftpsync. You will probably find it more convenient to define a small script for each site to synchronize, which calls p4ftpsync with the appropriate arguments. Personally I find it easier to define two scripts, for the normal and the reverse sync respectively (since they have different options), which I call (on Windows) syncMySitelive.bat and rsyncMySitelive.bat. Here are examples of such scripts:
@echo off
REM syncMySitelive.bat:
REM Synchronizes (depot head revision -> live site) the MySite live site via FTP.
REM
REM For reverse synchronization, see rsyncMySitelive.bat.
REM Uses P4 workspace ("client") syncClient as a mirror.
REM You can pass additional args like :
REM   -v, --verbose  to get a more detailed trace,
REM   -f, --force to force P4 to resync to the given revision
REM   ... and many more! use option -h for details.
REM 
p4ftpsync.py %* --p4Passwd password --p4Client syncClient --ftpHost ftpServerAddress ^
 --ftpUser userName --ftpPasswd passwd --ftpRoot /mySite/www/ ^
 --exclude @p4ftpSyncDir\syncMySitelive.excludes.txt //depot/mySite/www/...
(Replace p4ftpsync.py with p4ftpsync.exe if needed)
@echo off
REM rsyncMySitelive.bat:
REM Reverse synchronization (live site -> P4) of live site MySite via FTP.
REM Creates a P4 changelist for the changes and submits it.
REM
REM For normal synchronization (p4 -> live site), see syncMySitelive.bat.
REM Uses P4 workspace ("client") syncClient as a mirror.
REM You can pass additional args like :
REM   -v, --verbose  to get a more detailed trace,
REM   -f, --force to disable the file desc cache and force re-reading of remote files
REM   ... and many more! use option -h for details.
REM 
p4ftpsync.py %* --reverse --submit --p4Passwd password --p4Client syncClient --ftpHost ftpServerAddress ^
 --ftpUser userName --ftpPasswd passwd --ftpRoot /mySite/www/ ^
 --comment "GG3:rsync: Integrated changes made on the mySite live site." ^
 --mailto "john@doe.org,jane@jungle.com" --exclude @p4ftpSyncDir\syncMySitelive.excludes.txt ^
 //depot/mySite/www/...
(The options specific to reverse sync are --reverse, --submit, --comment and --mailto.)
Both scripts share the same list of exclusions (which makes sense). This is why the list is contained in an external file (p4ftpSyncDir\syncMySitelive.excludes.txt) rather than passed directly as a command line argument (this is indicated by the @ in option --exclude). Files to exclude from the sync are specified as patterns, one per line. Patterns are actually limited regular expressions, implicitly preceded by ^.* and terminated by $, which in practice means that matched files are expected to reside in leaf directories (i.e. directories with no sub-directories). This is a bit restrictive and should be improved in the near future. For examples of patterns, see option --exclude in the Command Reference section below.
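The implicit wrapping can be reproduced with Python's re module, as in the sketch below. It assumes each pattern is matched against a file's path as a whole, and it does not model the leaf-directory restriction, which depends on how p4ftpsync walks the site. Note that the leading dot in a pattern such as .pdf is a regular-expression wildcard, so \.pdf would be stricter.

```python
import re


def compile_exclusions(patterns):
    # Each pattern is implicitly preceded by ^.* and terminated by $.
    return [re.compile("^.*" + p + "$") for p in patterns]


def is_excluded(path, compiled):
    """True if the end of `path` matches one of the exclusion patterns."""
    return any(rx.match(path) for rx in compiled)
```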
Once you have perfectly tuned the parameters in your 2 scripts, you may consider scheduling their execution, for example on a daily basis, using Control Panel/Scheduled Tasks on Windows, cron on Unix, etc...
Synchronizing and reverse synchronizing are like having two different P4 clients accessing and modifying the same files. If a file can be modified directly on the live site, then a conflict (e.g. simultaneous edits of the same file) is possible and must be resolved. By scheduling the reverse sync first, then the "normal" sync, rather than the other way round, you guarantee that any conflict will be detected by Perforce and scheduled for resolve. This is not true if the scripts are scheduled the other way (normal, then reverse): in that case any change on the live site will be silently overwritten by a change (made elsewhere) already submitted in P4.
The strategy above is somewhat "blind", in that it synchronizes the live site at fixed intervals without any consideration for the actual changes made. You don't always want to propagate changes submitted in P4 to the live site ASAP. Maybe you want to accumulate changes and, once you are ready, transfer them to the live site. One possible strategy is to create a branch for the sync: you make the changes in the main/development branch, and when you feel ready you integrate them into the sync branch, which is of course the one you specify as the whatToSync parameter to p4ftpsync.
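If you schedule a single job instead of two, a small Python driver can enforce that order. This sketch assumes, without confirmation from this document, that p4ftpsync (or your wrapper script) exits with a non-zero status when it fails; in that case the normal sync is skipped so that live-site changes not yet in P4 are not overwritten. The function name is hypothetical, and the run parameter only exists to make it testable.

```python
import subprocess


def run_sync_pair(reverse_cmd, normal_cmd, run=subprocess.run):
    """Run the reverse sync first, then the normal sync.

    Assumption: a non-zero exit status means the sync failed. In that case
    the normal sync is skipped. Returns True if both runs succeeded.
    """
    if run(reverse_cmd).returncode != 0:
        return False
    return run(normal_cmd).returncode == 0
```

On Windows, for example: run_sync_pair(["cmd", "/c", "rsyncMySitelive.bat"], ["cmd", "/c", "syncMySitelive.bat"]).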

Command Reference

(this is basically a formatted copy of the output of p4ftpsync --help)
[python] p4ftpsync[.py] [options] whatToSync
whatToSync specifies what to sync, and must be a valid P4 file spec such as the ones used in the P4 sync command (e.g. //depot/MyProject/MyDir/..., //depot/Proj2/main.c#2, @label, etc.). For the reverse sync, any revision range information is not significant, since the comparison is always done with the HEAD revision.

Reverse sync

p4ftpsync may also be run in reverse sync mode (option -r). In this mode, the files on the remote site are compared with the latest revision of the files in P4, and any change on the remote site is reported in a new P4 changelist (optionally submitted at the end of the process). This mode is handy if some changes are made directly on the remote site and you want to easily keep your P4 repository up to date. Symbolic links on the remote site are supported, but they are mapped locally and in P4 to regular files named like the link.
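The comparison behind reverse sync boils down to three set operations. In this sketch both inputs are hypothetical in-memory maps from relative path to file content (p4ftpsync obtains the remote side via FTP and the other side from the mirror workspace synced to head); the resulting lists correspond to the files to open with p4 add, p4 edit and p4 delete.

```python
def detect_changes(remote, workspace):
    """Compare the remote site with the head-revision mirror workspace.

    Both arguments map a relative path to the file's content (bytes).
    Returns the files to open for add, edit and delete in a new changelist.
    """
    remote_paths, local_paths = set(remote), set(workspace)
    return {
        "add": sorted(remote_paths - local_paths),     # only on the site
        "edit": sorted(p for p in remote_paths & local_paths
                       if remote[p] != workspace[p]),  # content differs
        "delete": sorted(local_paths - remote_paths),  # removed from the site
    }
```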

Valid options

Perforce options
--p4Port
P4 host:port to use [default: current config, see Default values]
--p4User
P4 user name to use [default: see Default values].
--p4Passwd
P4 user password [default: see Default values].
--p4Client
P4 client to use as the mirror of the remote site [default: see Default values].
FTP options (for the remote site)
--ftpHost
FTP host IP address (port 21 is assumed) [mandatory]
--ftpUser
FTP user name [mandatory]
--ftpPasswd
FTP user password [default: will be prompted]
--ftpRoot
FTP root directory [mandatory]
Normal sync (p4 to remote site) options
-o, --scriptDir DIR
Directory in which to generate the Python update script in normal sync [default: <p4ftpsync directory>/p4FtpSyncScripts]
Reverse sync (remote site to P4) options
-r, --reverse
Reverse synchronization: changes that occurred on the remote site are detected, and a P4 changelist is created but not submitted, unless option -s is specified.
-s, --submit
Submit the P4 changelist created for the changes detected. The default is not to submit, so that the changes can be reviewed before a manual submit.
-c, --comment "COMMENT"
Optional description for the changelist. A default will be generated if none is specified.
-m, --mailto ADDR1,ADDR2,...
A list of email addresses to send a "file change report" to [default: don't send an email]. For now the mail server settings are global variables in p4ftpsync.py.
Common sync. options
-f, --force
For normal sync: If specified, the target will be resynced even if supposedly up to date (same as P4 sync option -f).
For reverse sync: If specified, the file description cache will not be used and the file download & comparison will always be done.
-t, --test
Test/preview mode. For normal sync: do not really P4 sync the mirror client workspace; generate the script but do not actually FTP.
For reverse sync: do not create a P4 changelist or copy files to the client workspace.
-x, --exclude FILE1,FILE2,... | @FILE
Exclude files from the list of updates to do (handy if some files must remain different locally and remotely, e.g. config files, system files). Patterns (regular expressions) are given either as a comma-separated list (no spaces) or in a file (@FILE), one per line. If the (end of the) path of a file to sync matches one of the patterns, the file is excluded from the sync.
Examples of patterns:
/.htaccess (exclude the .htaccess file from any leaf (i.e. without subdirectories) directory)
/log/.* (exclude any file in any leaf directory named log)
/~.* (exclude any file whose name starts with ~, in any leaf directory)
.pdf (exclude all pdf files from leaf directories)
.PDF (same as above but uppercase; for now there is no way to be case insensitive)
/A/aFile (exclude any file named aFile present in any leaf directory named A)
Note: the current exclusion pattern syntax is somewhat limited. It is difficult to write a pattern to exclude a certain file type from a non-leaf directory, or to be case insensitive. This will probably be improved soon.
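The two accepted forms of the --exclude value can be parsed as in this sketch. The function name is hypothetical, and skipping blank lines and surrounding whitespace in @FILE is an assumption about p4ftpsync's behavior.

```python
def parse_exclude_arg(arg):
    """Turn an --exclude value into a list of patterns.

    "@FILE" reads one pattern per line from FILE (blank lines skipped, an
    assumption); anything else is a comma-separated list with no spaces.
    """
    if arg.startswith("@"):
        with open(arg[1:], encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    return [p for p in arg.split(",") if p]
```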
Misc options
-v, --verbose
Verbose: More trace/error messages on stdout.
-V, --version
Print p4ftpsync version on stdout and exit.
-h, --help
Print help message on stdout and exit.

Default values

p4ftpsync stores (in file p4ftpsync.opt) the last values used for options p4Client, ftpHost, ftpUser and ftpRoot, and uses them if no value is provided on the command line. Options are stored twice: once associated with the specific target sync spec, and once as generic "last session defaults" (ftpRoot is not stored in the latter case).
When reloading the values for a new session, p4ftpsync first tries the target-specific values, then the generic ones.
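The lookup order can be modelled as follows. The on-disk format of p4ftpsync.opt is not documented here, so the nested dictionary ({"targets": {...}, "last": {...}}) and both function names are hypothetical.

```python
# Options remembered between sessions, as listed above.
REMEMBERED = ("p4Client", "ftpHost", "ftpUser", "ftpRoot")


def save_options(store, target, values):
    """Record the values used for `target`, twice (hypothetical store layout)."""
    kept = {k: v for k, v in values.items() if k in REMEMBERED and v is not None}
    store.setdefault("targets", {})[target] = dict(kept)
    # The generic "last session defaults" never include ftpRoot.
    store["last"] = {k: v for k, v in kept.items() if k != "ftpRoot"}


def resolve_option(name, cli_value, store, target):
    """Command line first, then target-specific values, then generic ones."""
    if cli_value is not None:
        return cli_value
    per_target = store.get("targets", {}).get(target, {})
    if name in per_target:
        return per_target[name]
    return store.get("last", {}).get(name)
```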

File description cache (reverse sync only)

Remote file descriptions are saved on disk after a successful synchronization and used on subsequent syncs to determine if a file has changed (different file descs), without having to download the file, which can save a tremendous amount of time.
On the first run of p4ftpsync for a given site, all files from the remote site are downloaded and compared to the files in the depot, quite a lengthy and bandwidth-consuming operation! However, subsequent runs use the cache and should be considerably faster (the only relatively long operation that can't be shortened is determining the current file structure on the remote site). When the cache is missing or corrupted, the entire site (minus the exclusions) is downloaded again.
The cache can be disabled with option -f, --force (it is then not used for the current iteration but will still be saved for future use).
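Here is a sketch of how such a cache can decide which files need downloading. What p4ftpsync actually puts in a "file description" is not specified here; the sketch assumes a string built from the FTP listing (e.g. size and modification date) and stores the cache as JSON, both of which are assumptions. A new file is absent from the cache, so it is always downloaded.

```python
import json


def load_cache(path):
    """Return the saved descriptions, or None if the cache is missing/corrupted."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return None


def save_cache(path, remote_descs):
    # Called only after a successful synchronization.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(remote_descs, f)


def files_to_download(remote_descs, cache, force=False):
    """Paths whose content must be downloaded and compared with the depot."""
    if force or cache is None:
        # -f/--force, or no usable cache: download everything.
        return sorted(remote_descs)
    # Unchanged description means unchanged file; new files are not in the cache.
    return sorted(p for p, desc in remote_descs.items() if cache.get(p) != desc)
```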

Generated FTP Script

In normal sync, a Python script is created, in the directory indicated by option -o, for the specified target (whatToSync). The script is named what_date.py (with / and spaces replaced by _). It contains the set of FTP actions required to synchronize the remote client (site) as requested, so executing it performs the actual update. This is very useful should the FTP session fail before completion: executing the script again will re-send the changes via FTP.
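The sketch below shows how such a replayable script might be generated with Python's standard ftplib module, together with the naming rule described above. The contents of p4ftpsync's real script are not documented here: the action format, the remote paths relative to the FTP root and the password prompt (rather than a stored password) are assumptions, and creating missing remote directories is left out.

```python
def script_name(what, date):
    """what_date.py, with "/" and spaces replaced by "_"."""
    return (what + "_" + date).replace("/", "_").replace(" ", "_") + ".py"


def generate_script(host, user, root, actions):
    """Return the source of a standalone script replaying `actions`.

    actions: ("put", remote_path, local_path) or ("delete", remote_path, None),
    with remote paths relative to `root` (an assumption of this sketch).
    """
    lines = [
        "import ftplib",
        "import getpass",
        "",
        f"ftp = ftplib.FTP({host!r})",
        f"ftp.login({user!r}, getpass.getpass('FTP password: '))",
        f"ftp.cwd({root!r})",
    ]
    for action, remote, local in actions:
        if action == "put":
            lines.append(f"with open({local!r}, 'rb') as f:")
            lines.append(f"    ftp.storbinary('STOR ' + {remote!r}, f)")
        elif action == "delete":
            lines.append(f"ftp.delete({remote!r})")
    lines.append("ftp.quit()")
    return "\n".join(lines) + "\n"
```

Using repr() for every embedded value keeps paths containing backslashes or quotes valid in the generated source.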

Log

A file p4ftpsync.log is generated in p4ftpsync's folder (always verbose, unlike the on-screen trace which depends on option -v). Logs are rotated to p4ftpsync.log.n.zip when they exceed a certain size (4MB by default).
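A size-triggered rotation into numbered zip archives can be written with the standard zipfile module. In this sketch the archive index n is assumed to be the first unused number and the log restarts empty after rotation; p4ftpsync's own numbering scheme may differ. (Python's logging.handlers.RotatingFileHandler also rotates by size, but produces plain .1, .2 backups rather than zip files.)

```python
import os
import zipfile

MAX_LOG_BYTES = 4 * 1024 * 1024  # 4MB, the default size mentioned above


def rotate_log(log_path, max_bytes=MAX_LOG_BYTES):
    """Archive log_path to log_path.N.zip once it exceeds max_bytes.

    Assumption: N is the first unused index; the log is then restarted empty.
    Returns the archive path, or None if no rotation was needed.
    """
    if not os.path.exists(log_path) or os.path.getsize(log_path) <= max_bytes:
        return None
    n = 1
    while os.path.exists(f"{log_path}.{n}.zip"):
        n += 1
    archive = f"{log_path}.{n}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(log_path, arcname=os.path.basename(log_path))
    open(log_path, "w").close()  # start a fresh, empty log
    return archive
```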

Limitations, Bugs, ToDo

The most obvious limitations and todos I can think of right now are:
Please report bugs or suggestions to Richard Gruet. I'd be happy to hear from you. I'll try to fix bugs and implement suggestions if I think they can improve p4ftpsync (and are not too complex!). Don't hold your breath, though: I'm quite busy.

License

Copyright © 2004-2005 Richard Gruet
Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee or royalty is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation or portions thereof, including modifications, that you make.
THE AUTHOR RICHARD GRUET DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE !
In short: I publish this program in the hope that it can be useful to others. Use it as you wish but keep the copyright intact, and don't hold me responsible for any problem you run into ;)