Richard Gruet - Created September 15, 2005 - Last revised
September 20, 2005
p4ftpsync - Synchronizes a remote site with a P4 depot via FTP
Overview
What's that?
p4ftpsync.py is a Python script which synchronizes a remote site with a P4 repository via FTP, and vice versa. It is especially useful when you do not have enough control over the remote host to install a Perforce client there, which would make synchronization easy (at least the P4 to remote site sync).
What for?
I am a member of a team which designs web sites. We want to keep the pages and code for our sites under P4 control, and synchronize the remote live sites from time to time to reflect the changes. Occasionally we also have to modify the pages directly on the live sites (in emergency situations!), and some of our clients can even edit the pages directly using Macromedia Contribute. In these cases we want to have the P4 repository synced back.
If you have enough control over the web host where your live site resides, i.e. if you can install a P4 client, the first sync (P4 to the live site) is easily accomplished via a P4 sync command. The second is not, since P4 does not automatically detect changes in workspaces: you still have to manually open the files for add or delete.
And if you do not have enough privileges to install a P4 client, which is the case for a lot of small to medium sites using cheap web hosting, you just can't use the P4 client install/P4 sync strategy [this strategy and others are developed in an interesting article on the P4 web site: Web Content management in Perforce].
I searched for alternative solutions but could not find any (I could have missed something though, please let me know!), so I eventually wrote a program to perform the P4 to remote site synchronization, which I later extended to perform the reverse sync as well.
The problem is to emulate a P4 remote client accessible only via FTP. The basic idea is to use a dedicated local
client/workspace that reflects (mirrors) the state of the remote site. If we P4 sync this workspace to a certain
revision, we get the exact list of changes that occurred, which we can then propagate (push) to the remote
site using FTP operations. Conversely, with the restriction that the remote site be synced to the head revision,
we can detect changes made directly to the site by comparing the remote site and the local workspace (synced to
the head revision beforehand).
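The idea of turning the result of a P4 sync into FTP operations can be sketched as follows. This is only an illustration, not the actual p4ftpsync code: the function name plan_ftp_actions, the regular expression and the assumed shape of the p4 sync output lines ("depotPath#rev - action localPath") are assumptions made for the example.

```python
import re

# Assumed shape of a `p4 sync` output line, e.g.
#   //depot/mySite/www/index.html#3 - updating /ws/www/index.html
_SYNC_LINE = re.compile(
    r"^(?P<depot>//.+?)#(?P<rev>\d+) - "
    r"(?P<action>added as|updating|refreshing|deleted as) (?P<local>.+)$"
)


def plan_ftp_actions(sync_output, depot_root, ftp_root):
    """Turn `p4 sync` output into (ftp_action, local_path, remote_path) tuples."""
    actions = []
    for line in sync_output.splitlines():
        m = _SYNC_LINE.match(line.strip())
        if m is None:
            continue  # ignore lines this sketch does not understand
        depot_path = m.group("depot")
        if not depot_path.startswith(depot_root):
            continue  # outside the part of the depot being mirrored
        # Map the depot path onto the remote site root.
        remote = ftp_root.rstrip("/") + "/" + depot_path[len(depot_root):]
        ftp_action = "delete" if m.group("action") == "deleted as" else "upload"
        actions.append((ftp_action, m.group("local"), remote))
    return actions
```

Each resulting tuple says whether the local mirror file must be uploaded to, or deleted from, the corresponding path on the remote site.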
Basically that's what p4ftpsync does. It's plain Python, one single loooooong file (>3000 lines!), uses no external library (not even the Perforce Python API) and requires only the basic P4 client install plus the Python language (not required with the (almost) standalone executable version). It works with Perforce server version 2005.1 but should work with older versions as well (it doesn't use any fancy features, but it relies on parsing P4 output, so it could be sensitive to changes in that format). The code uses some tricks to optimize speed (file caching, multi-threaded FTP operations) and so far it seems to be rather reliable, at least for common simple operations (e.g. sync to head revision).
Requirements
- Python 2.2+ (not required if the Windows standalone executable version is used).
- Basic (text mode) Perforce client installed locally. No Python API required.
- Your remote site accessible via FTP. The more possible simultaneous connections the faster transfers will be.
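To illustrate why more simultaneous FTP connections speed up transfers, here is a minimal sketch of parallel uploads. It is written for a modern Python 3 (not the Python 2.2 that p4ftpsync itself targets) and is not the program's actual implementation; ftp_upload, upload_all and max_connections are names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP


def ftp_upload(host, user, passwd, local_path, remote_path):
    """Upload one file over its own FTP connection (standard ftplib calls)."""
    with FTP(host) as ftp:
        ftp.login(user, passwd)
        with open(local_path, "rb") as fh:
            ftp.storbinary("STOR " + remote_path, fh)
    return remote_path


def upload_all(jobs, upload_one, max_connections=4):
    """Run upload_one(local, remote) for every job on a pool of worker threads.

    max_connections bounds the number of simultaneous FTP connections.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        futures = [pool.submit(upload_one, local, remote) for local, remote in jobs]
        # result() re-raises any exception that occurred in a worker thread.
        return [f.result() for f in futures]
```

A caller would typically bind the connection settings first, for instance with functools.partial(ftp_upload, host, user, passwd), and pass the result as upload_one.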
Installation
You have the choice between two distributions:
- A Python script p4ftpsync.py (you will need Python 2.2+ to execute it).
- A Win32 standalone executable p4ftpsync.exe. You may also need to download MSVCR71.dll if it is not already present on your system (this is because Python 2.4 is compiled with Microsoft VC7, which creates executables that depend on that DLL). Put it in the same directory as the .exe.
In both cases you must have installed at least the Perforce client, and added the P4 install directory to your PATH environment variable, before you can run p4ftpsync.
Check it by opening a terminal and typing:
p4
You should get a P4 help message. If not, check your P4 install and your PATH environment variable.
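The same check can be done from Python. This small sketch (the helper name find_p4 is made up for the example) uses the standard shutil.which to look for the p4 executable on the PATH:

```python
import shutil


def find_p4(executable="p4"):
    """Return the full path of the p4 client found on PATH, or None."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_p4()
    print(location or "p4 not found: check your P4 install and PATH")
```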
p4ftpsync also requires that you create a dedicated P4 client (workspace). This workspace will be used to mirror the remote site in order to perform the synchronization. Use the Perforce command p4 client to create a client. Define the client map spec so as to include the part of the depot that you want to synchronize.
Start-up
We assume that you have installed everything as described above and created a dedicated P4 client (called here syncClient). The examples use the Python script p4ftpsync.py, but apply to the executable version as well.
To get help, type:
[python] p4ftpsync[.py] -h (or --help)
You can skip the "python" if p4ftpsync.py is in your PATH and files with extension .py are associated with the Python interpreter (which is generally the case). On Windows, you can even omit the extension .py, provided you have added .py to the list of extensions in the PATHEXT environment variable.
Let's try a normal (P4 to remote site) synchronization. Say you have your site on ftp server ftpServerAddress
accessible by user userName, password passwd, rooted at /mySite/www/. Say the corresponding
location in the P4 depot is at //depot/mySite/www/.... You want to synchronize the remote site with the
head revision of the depot. Type:
p4ftpsync.py -v -t --p4Passwd password --p4Client syncClient --ftpHost ftpServerAddress
--ftpUser userName --ftpPasswd passwd --ftpRoot /mySite/www/ //depot/mySite/www/...
- If you omit a mandatory option, p4ftpsync will ask you to enter it interactively. Some P4 options are
guessed from the environment, and p4ftpsync saves certain option values from one session to the
other and uses them as defaults (more details about this later).
- Option -v (or --verbose) displays more detailed information messages on the screen (there is also an always-verbose log, see file sameDirAsP4ftpsync/p4ftpsync.log).
- Option -t (or --test) performs a dry run: synchronization actions are listed but not actually executed. The two options can be grouped together as -vt since they take no parameters.
If you perform this sync for the first time, p4ftpsync will try to upload all the files in //depot/mySite/www/... to the remote site. Actually, the actions are the same as a p4 sync //depot/mySite/www/... would do, except that they are converted to FTP actions to update the remote site (viewed as a remote P4 workspace). Subsequent runs will only transfer changes since the last run, as P4 sync would do (however, option -f forces a complete refresh of the workspace and will therefore upload all the files again).
As you can see, there are a lot of parameters to provide to p4ftpsync! Fortunately the program tries to help you:
- Missing P4 parameters p4Host, p4User, p4Passwd default to their current values in the environment (P4PORT, P4USER, P4PASSWORD; actually the program gets this info by parsing the output of a p4 info command).
- The values of several options are also saved on disk during a session and used as defaults in the next session (passwords are never stored).
- Finally, the user is prompted to enter any required parameters that are still missing.
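As an illustration of the first point, the output of p4 info is a series of "Key: value" lines, which can be parsed along these lines. This is a hedged sketch, not the program's own parser; parse_p4_info is a hypothetical name and the sample keys are assumed from typical p4 info output.

```python
def parse_p4_info(text):
    """Parse the 'Key: value' lines printed by `p4 info` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values such as 'host:1666'
        # or Windows paths keep their own colons.
        key, sep, value = line.partition(": ")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

The text itself could be obtained with subprocess.run(["p4", "info"], capture_output=True, text=True).stdout; the entries of interest would then be looked up by key (for example "User name" or "Server address").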
In spite of these facilities, if you have to routinely perform a sync task, it is more convenient to define a small script for each site to synchronize, which will call p4ftpsync with the appropriate arguments. I personally find it easier to define two scripts, for the normal and reverse sync respectively (since they have different options), which I call (on Windows) syncMySitelive.bat and rsyncMySitelive.bat. Here is an example of such scripts:
@echo off
REM syncMySitelive.bat:
REM Synchronizes (depot head revision -> live site) the MySite live site via FTP.
REM
REM For reverse synchronization, see rsyncMySitelive.bat.
REM Uses P4 workspace ("client") syncClient as a mirror.
REM You can pass additional args like :
REM -v, --verbose to get a more detailed trace,
REM -f, --force to force P4 to resync to the given revision
REM ... and many more! use option -h for details.
REM
p4ftpsync.py %* --p4Passwd password --p4Client syncClient --ftpHost ftpServerAddress ^
  --ftpUser userName --ftpPasswd passwd --ftpRoot /mySite/www/ ^
  --exclude @p4ftpSyncDir\syncMySitelive.excludes.txt //depot/mySite/www/...
(Replace p4ftpsync.py with p4ftpsync.exe if needed)
@echo off
REM rsyncMySitelive.bat:
REM Reverse synchronization (live site -> P4) of live site MySite via FTP.
REM Creates a P4 changelist for the changes and submits it.
REM
REM For normal synchronization (p4 -> live site), see syncMySitelive.bat.
REM Uses P4 workspace ("client") syncClient as a mirror.
REM You can pass additional args like :
REM -v, --verbose to get a more detailed trace,
REM -f, --force to disable the file desc cache and force re-reading of remote files
REM ... and many more! use option -h for details.
REM
p4ftpsync.py %* --reverse --submit --p4Passwd password --p4Client syncClient ^
  --ftpHost ftpServerAddress --ftpUser userName --ftpPasswd passwd --ftpRoot /mySite/www/ ^
  --comment "GG3:rsync: Integrated changes made on the mySite live site." ^
  --mailto "john@doe.org,jane@jungle.com" ^
  --exclude @p4ftpSyncDir\syncMySitelive.excludes.txt ^
  --smtpServer smtp.myIsp.com //depot/mySite/www/...
(The options specific to reverse sync are --reverse, --submit, --comment, --mailto and --smtpServer.)
- Option --reverse tells p4ftpsync to perform a reverse synchronization, instead of the default "normal" one.
- Option --submit tells p4ftpsync to automatically submit the P4 changelist created for the changes detected (if any). By default the changelist is not submitted, to let you have a look at the changes and possibly revert some of them (useful during your first trials).
- Option --comment overrides the default auto-generated changelist comment with a customized one.
- Option --mailto tells p4ftpsync to send a report e-mail to each address listed. The --smtpServer option specifies the address of the SMTP server to use (the default is localhost:25; use options --smtpUser and --smtpPasswd if the server requires authentication). By default no e-mail report is sent.
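The mail-related options map naturally onto Python's standard smtplib and email modules. The sketch below shows one possible way to do it in a modern Python 3; it is an assumption-laden illustration, not p4ftpsync's code, and the helper names (parse_host_port, build_report, send_report) are invented.

```python
import smtplib
from email.message import EmailMessage


def parse_host_port(spec, default_port):
    """Split 'host[:port]' into (host, port), using default_port if absent."""
    host, sep, port = spec.partition(":")
    return host, (int(port) if sep else default_port)


def build_report(from_addr, mailto, subject, body):
    """Build the report message; mailto is a comma-separated address list."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(addr.strip() for addr in mailto.split(","))
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_report(msg, smtp_server="localhost:25", user=None, passwd=None):
    """Send msg through smtp_server, logging in only if a user is given."""
    host, port = parse_host_port(smtp_server, 25)
    with smtplib.SMTP(host, port) as smtp:
        if user:
            smtp.login(user, passwd)
        smtp.send_message(msg)
```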
Both scripts share the same list of exclusions (which makes sense). This is why the list is contained in an external file (p4ftpSyncDir\syncMySitelive.excludes.txt), rather than passed directly as a command line argument (this is indicated by the use of @ in option --exclude). Files to exclude from the sync are specified as patterns, one per line. Patterns are actually Python regular expressions, implicitly preceded by ^.* and terminated by $, which in practice means that each pattern is matched against the end of the path of the files to check. For examples of patterns, see option --exclude in the Command Reference section below.
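Reading the value of --exclude, which is either an inline comma-separated list or an @FILE reference, could look like the following sketch. The function name read_exclude_option is hypothetical, and skipping blank lines in the file is an assumption of the example rather than documented behaviour.

```python
def read_exclude_option(value):
    """Return the list of patterns given to --exclude (inline list or @FILE)."""
    if value.startswith("@"):
        # @FILE form: one pattern per line.
        with open(value[1:]) as fh:
            lines = [line.strip() for line in fh]
        return [line for line in lines if line]
    # Inline form: comma-separated patterns, no spaces.
    return [pattern for pattern in value.split(",") if pattern]
```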
Once you have perfectly tuned the parameters in your 2 scripts, you may consider scheduling their
execution, for example on a daily basis, using Control Panel/Scheduled Tasks on Windows, cron on Unix, etc...
Synchronizing and reverse synchronizing is like having two different p4 clients accessing and modifying the
same files. If a file can be modified directly on the live site, then a situation of
conflict (e.g. simultaneous edits of the same file) is possible, and must be resolved.
By scheduling the reverse sync first, then the "normal" sync, rather
than the contrary, one guarantees that any conflict will be detected by Perforce and scheduled for resolve.
This is not true if the scripts are scheduled the other way (normal, then reverse): in this case any change
on the live site will be silently overwritten by a change (done elsewhere) already submitted in P4.
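If the two scripts are driven from a single scheduled job, the ordering can be enforced in a few lines. This is only a sketch under the assumption that each script signals failure with a non-zero exit code; run_sync_cycle is a made-up name.

```python
import subprocess


def run_sync_cycle(reverse_cmd, normal_cmd, run=subprocess.call):
    """Run the reverse sync first, then the normal sync only if it succeeded.

    Both commands are argument lists; `run` returns the exit code.
    """
    rc = run(reverse_cmd)
    if rc != 0:
        return rc  # do not push to the live site if the reverse sync failed
    return run(normal_cmd)
```

For instance, run_sync_cycle(["rsyncMySitelive.bat"], ["syncMySitelive.bat"]) would run the two example scripts in the safe order.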
The strategy above is somewhat "blind" in that it synchronizes the live site at fixed intervals without any
consideration for actual changes made. You don't always want to propagate changes submitted in P4 to
the live site ASAP. Maybe you want to accumulate changes somewhere and once you are ready, transfer
them to the live site. A possible strategy to achieve this is to create a branch for the sync:
you make the changes into the main/development branch, and when you feel ready you integrate
them into the sync branch, which of course will be the one specified for the whatToSync parameter
to p4ftpsync.
Command Reference
(this is basically a formatted copy of the output of p4ftpsync --help)
[python] p4ftpsync[.py] [options] whatToSync
whatToSync specifies what to sync, and must be a valid P4 file spec such as the ones used in the P4 sync command (e.g. //depot/MyProject/MyDir/..., //depot/Proj2/main.c#2, @label, etc.). For the reverse sync, the revision range info is not significant, since the comparison is always done with the HEAD revision.
Reverse sync
p4ftpsync may also be run in reverse sync mode (option -r). In this mode, the files on the remote site are compared with the latest revision of the files in P4, and any change on the remote site is recorded in a new P4 changelist (optionally submitted at the end of the process). This mode is handy if some changes are made directly on the remote site and you want to easily keep your P4 repository up to date. Symbolic links are supported on the remote site, but are mapped locally and in P4 to real files named like the link.
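The comparison at the heart of the reverse sync can be pictured as a diff between two file maps, each difference becoming a p4 add, p4 edit or p4 delete in the changelist. The sketch below assumes each side is summarized as a {path: digest} dictionary; that representation and the name diff_remote_vs_mirror are assumptions of the example, not a description of the program's internals.

```python
def diff_remote_vs_mirror(remote, mirror):
    """Compare {path: digest} maps; return the p4 operation needed per path."""
    ops = {}
    for path, digest in remote.items():
        if path not in mirror:
            ops[path] = "add"      # new on the remote site
        elif mirror[path] != digest:
            ops[path] = "edit"     # content changed on the remote site
    for path in mirror:
        if path not in remote:
            ops[path] = "delete"   # removed from the remote site
    return ops
```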
Valid options
Perforce options
--p4Port host[:port]
P4 host:port to use, default port 1666 [default: current config, see Default values]
--p4User userName
--p4Passwd password
--p4Client client
P4 client to use as the mirror of the remote site [default: see Default values].
FTP options (for the remote site)
--ftpHost host[:port]
FTP host IP address, default port 21 [mandatory]
--ftpUser userName
FTP user name [mandatory]
--ftpPasswd password
FTP user password [default: will be prompted]
--ftpRoot rootRelativePath
Path of the root folder of the site [mandatory]
Normal sync (p4 to remote site) options
-o, --scriptDir localPath
Directory in which to generate the Python update script in normal sync [default: thisProgDir/p4FtpSyncScripts]
Reverse sync (remote site to P4) options
-r, --reverse
Reverse synchronization: changes that occurred on the remote site are detected, and
a P4 changelist is created but not submitted, unless option -s is specified.
-s, --submit
Submit the P4 changelist created for the changes detected. The default is to not
submit, so the changes can be reviewed before (manual) submit.
-c, --comment "COMMENT"
Optional description for the changelist. A default will be generated if none is specified.
-m, --mailto addr1,addr2,...
A list of email addresses to send a "file change report" to [default: don't send an email].
For now the mail server settings are globals in p4ftpsync.py!
--smtpServer host[:port]
SMTP server IP address to use for sending the above mail [default localhost:25]
--smtpUser userName
SMTP user name if authentication is required [default: None]
--smtpPasswd password
SMTP user password [default: None]
--fromAddr addr
Email address to put in the 'From' field of the report emails [default: p4ftpsync@p4ftpsync.net -dummy!]
Common sync. options
-f, --force
For normal sync: if specified, the target will be resynced even if supposedly up to date (same as P4 sync option -f).
For reverse sync: if specified, the file description cache will not be used and the file download & comparison will always be done.
-t, --test
Test/preview mode: For normal sync: do not really P4 sync the mirror client workspace,
generate the script but do not actually FTP.
For reverse sync: do not create P4 changelist or copy files to client space.
-x, --exclude FILE1,FILE2,... | @FILE
Exclude files from the list of updates to do (handy if some files must remain different locally from remotely, e.g. config files, system files). FILEs are specified as a comma-separated list (no spaces) of patterns, or alternatively listed in a file (@FILE), one per line. Patterns are Python-compatible regular expressions. They are implicitly completed with a ^.* on the left and a $ on the right (unless they already include them), meaning that if the end of the path of a file to sync matches one of the patterns, then it will be excluded from the sync. The (?i) flag (IGNORECASE) can be added to a pattern to make the match case-independent.
Examples of patterns
.htaccess (exclude the .htaccess file wherever it appears)
/A/B/f.txt (exclude file f.txt located in folder /A/B/)
/log/.* (exclude all files in directory log/)
/~.* (exclude all files whose name starts with ~)
.pdf (exclude all pdf files)
.pdf(?i) (ditto, but (?i) specifies to ignore case, so .PDF matches too)
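The implicit ^.* and $ completion can be reproduced with the standard re module, as in the sketch below (compile_exclude and is_excluded are hypothetical names). One deliberate difference from the description above: recent Python versions reject an inline (?i) that is not at the start of the expression, so this sketch strips it from the pattern and passes re.IGNORECASE instead.

```python
import re


def compile_exclude(pattern):
    """Compile an exclusion pattern, adding the implicit ^.* and $ anchors."""
    flags = 0
    if "(?i)" in pattern:
        # Modern Python only accepts global inline flags at the very start,
        # so turn the marker into a compile flag.
        pattern = pattern.replace("(?i)", "")
        flags = re.IGNORECASE
    if not pattern.startswith("^"):
        pattern = "^.*" + pattern
    if not pattern.endswith("$"):
        pattern = pattern + "$"
    return re.compile(pattern, flags)


def is_excluded(path, compiled_patterns):
    """True if the end of path matches at least one exclusion pattern."""
    return any(p.match(path) for p in compiled_patterns)
```

Used with the example patterns listed above, is_excluded("/www/log/access.log", patterns) is true while an ordinary page such as /www/index.html is kept.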
Misc options
-v, --verbose
Verbose: More trace/error messages on stdout.
-V, --version
Print p4ftpsync version on stdout and exit.
-h, --help
Print help message on stdout and exit.
Default values
p4ftpsync stores (in file p4ftpsync.opt) the last values used for options p4Client, ftpHost, ftpUser and ftpRoot and uses them if no value is provided on the command line.
Options are stored twice: as associated to the specific target sync spec, and as generic
"last session defaults" (ftpRoot is not stored in the latter case).
When trying to reload the values for a new session, p4ftpsync first attempts to reload the
target specific values, then the non specific ones.
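The lookup order described above (command line, then target-specific values, then generic last-session values) can be sketched as follows. The layout of p4ftpsync.opt is not documented here, so the saved dictionary with its "targets" and "last" keys is purely hypothetical, as is the name resolve_option.

```python
def resolve_option(name, cli_value, target, saved):
    """Pick an option value: command line, then per-target, then last session.

    saved is a hypothetical structure:
      {"targets": {whatToSync: {option: value}}, "last": {option: value}}
    """
    if cli_value is not None:
        return cli_value
    per_target = saved.get("targets", {}).get(target, {})
    if name in per_target:
        return per_target[name]
    return saved.get("last", {}).get(name)  # None means: prompt the user
```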
File description cache (reverse sync only)
Remote file descriptions are saved on disk after a successful synchronization and used on subsequent syncs to determine whether a file has changed (different file descriptions) without having to download it, which can save a tremendous amount of time. There is a separate cache for every FTP host + user (file user@ftpHost.fdc).
On the first run of p4ftpsync for a given site, all files from the remote site are downloaded and compared to the files in the depot, quite a lengthy and bandwidth-consuming operation! However, subsequent runs should use the cache and be considerably faster (the only relatively long operation that can't be shortened is the determination of the current file structure on the remote site). When the cache is missing or corrupted, the download of the entire site (minus the exclusions) occurs again.
The cache can be disabled with option -f, --force (it is then not used for the current
iteration but will still be saved for future use).
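The role of the cache can be illustrated with a small sketch. What exactly a "file description" contains is not specified here; the example assumes a (size, modification time) pair taken from the FTP listing, and files_to_download is an invented name.

```python
def files_to_download(remote_listing, cache, force=False):
    """Return the remote paths that must be downloaded and compared.

    remote_listing and cache map path -> file description, assumed here to be
    a (size, mtime) tuple. With force=True the cache is ignored.
    """
    if force:
        return sorted(remote_listing)
    return sorted(
        path for path, desc in remote_listing.items()
        if cache.get(path) != desc  # new file, or description changed
    )
```

With an empty cache every file is returned, which matches the lengthy first run described above.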
Generated FTP Script
In normal sync, a Python script is created in the directory indicated by option -o for the specified target (whatToSync). The script is named what_date.py (with / and spaces replaced by _). It contains the set of FTP actions required to synchronize the remote client (site) as requested, so executing it performs the actual update. This can prove very useful should the FTP session fail before completion: executing the script will FTP the changes again.
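The naming rule can be expressed in a couple of lines. In this sketch the date format (YYYYMMDD_HHMMSS) is an assumption, since the exact format used in the file name is not stated, and script_name is a hypothetical helper.

```python
import time


def script_name(what_to_sync, when=None):
    """Build 'what_date.py' with '/' and spaces replaced by '_'.

    The timestamp format is an assumption made for this example.
    """
    stamp = time.strftime("%Y%m%d_%H%M%S", when or time.localtime())
    safe = what_to_sync.replace("/", "_").replace(" ", "_")
    return "%s_%s.py" % (safe, stamp)
```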
Log
A file p4ftpsync.log is generated in p4ftpsync's folder (always verbose, unlike the on-screen trace which depends on option -v). Logs are rotated to p4ftpsync.log.n.zip when they exceed a certain size (4MB by default).
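A size-based rotation to zipped archives could be implemented roughly as below. This is a sketch of the general technique rather than the program's own code: the name rotate_log and the choice of the first unused number for n are assumptions.

```python
import os
import zipfile


def rotate_log(log_path, max_bytes=4 * 1024 * 1024):
    """Zip log_path to log_path.N.zip if it exceeds max_bytes.

    Returns the archive path, or None if no rotation was needed.
    N is the first unused number (an assumption of this sketch).
    """
    if not os.path.exists(log_path) or os.path.getsize(log_path) <= max_bytes:
        return None
    n = 1
    while os.path.exists("%s.%d.zip" % (log_path, n)):
        n += 1
    archive = "%s.%d.zip" % (log_path, n)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(log_path, os.path.basename(log_path))
    open(log_path, "w").close()  # start a fresh, empty log
    return archive
```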
Limitations, Bugs, ToDo
The most obvious limitations and todos I can think of right now are:
- Limited support for symbolic links. Currently links are recognized on the remote site but translated into normal files in P4 during reverse sync. That is, we tolerate that links exist on the remote site but not locally and in P4. Since P4 supports symbolic links, this could potentially be changed, though. (Due to the comparison strategy used during reverse sync, I'm not sure that the case where 2 links within the site point to the same file is correctly handled: it will create as many real files in P4.)
- An uncertain support of older versions of the P4 server (<2005.1). This program probably works
with not-too-old versions of the server, but I wouldn't bet on it.
- Support of all FTP servers is not guaranteed either. I tried at least 3 different servers, but they are sometimes surprising in their use of error codes.
- ... and probably many others I haven't thought of ...
Please report bugs or suggestions to Richard Gruet. I'd be happy to hear from you. I'll try to fix bugs and implement suggestions if I think they can improve p4ftpsync (and are not too complex!). Don't hold your breath, though. I'm quite busy.
License
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee or royalty is hereby
granted, provided that the above copyright notice appear in all copies
and that both that copyright notice and this permission notice appear
in supporting documentation or portions thereof, including modifications,
that you make.
THE AUTHOR RICHARD GRUET DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL,
INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING
FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
WITH THE USE OR PERFORMANCE OF THIS SOFTWARE !
In short: I publish this program in hope it can be useful to others. Use it as you wish but keep the copyright
intact, and don't hold me responsible for any problem you run into ;)