p4fs - p4 fast sync project readme

p4fs is a demonstration project built on top of the Perforce p4api to show the effect of several different ways a Perforce client can write files to disk under NTFS.

The current Perforce clients (p4/p4v/p4win) all write data to disk as soon as it is received (in chunks of about 4-8 KB), which, under NTFS, can very easily fragment the files. This can be a serious performance bottleneck when doing large syncs. It is not an issue for files smaller than about 100 KB, but if you have many large (likely binary) files, low client-side disk performance caused by the fragmentation can easily become the limiting factor. Additionally, working with these fragmented files will be slower than it needs to be. As this project demonstrates, this need not be so; it is relatively simple to remove this bottleneck.

The project contains two tools that can be used by anybody who wants to improve their NTFS sync performance and reduce client-side fragmentation. See below for an idea of the performance increase you might expect. The main goal of the project, however, is to try to persuade Perforce to implement something similar in the normal client tools (in which case I would also really appreciate a sync progress bar in P4V ;-).

This directory contains the full source of the project (in the src dir), Visual Studio 2005 project files (in the prj dir) and precompiled binaries (in the bin dir) of the following tools:

- p4fs is almost a drop-in replacement for the p4 command-line tool. A sync command is processed by the tool itself so as to reduce fragmentation and increase download performance. Any other command is piped through to the normal p4 client (which needs to be locatable through the PATH). Output and return codes should be identical to normal p4 operation. The only noticeable difference is that it currently only supports the global options -p (port), -c (client), -H (host), -u (user), -P (pass) and -h (help); see the example invocations below. A couple of custom switches were added to the sync command, but those are mainly relevant for testing (see below for details). Also note that it ignores the value of P4CONFIG; only settings given on the command line, environment variables and values set through "p4 set" are honoured.

- P4fsV is a version of the client intended to be installed as a custom tool in P4V/P4Win. It is a Windows application with a nice big Cancel button and a progress bar (the progress bar behaves a bit oddly if you pass multiple paths). It supports exactly the same options as p4fs, but it does not pipe commands other than sync through to p4. To integrate it in P4V, create a custom tool, give it a name (e.g. GetLatestFast), check the "Add to applicable context menus" box, put "P4fsV.exe" in the "Application" field and "sync %D" in the "Arguments" field. Leave the rest blank, except check the "Refresh P4V upon completion" box. Now you can easily use P4fsV from the P4V right-click menu. Note that the same remarks regarding global options and settings as for p4fs apply.
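For example, to sync with explicit connection settings (the port, client and user values below are just placeholders):

    p4fs -p perforce:1666 -c my_client -u my_user sync //depot/...

Any other command is handed off unchanged to the regular p4 client, so

    p4fs info

behaves exactly as "p4 info" would.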
Giving exact figures for the performance increase you will get is hard, as it is very dependent on the current fragmentation state of the hard disk. But even with a fairly lightly fragmented disk (such as you might have when running a defragmentation tool nightly), the performance gain is still noticeable. To get a rough idea of the average improvement, I have run many tests on hard disks in various stages of fragmentation and averaged the results below.

The syncs were done on a representative sample of the binary files from our depot. Total size was about 900 MB, with an average of about 600 KB per file. I measured the average throughput, the average number of file fragments, and the time it takes to do a p4 diff -se on the whole set of files after the sync. This last number should be a rough indication of the cost of working with the files, as the diff performance is mostly limited by the speed at which the files can be read in from disk to calculate the MD5 checksum.

            throughput      average fragmentation    p4 diff -se time
    p4      126 Mbit/sec    4.86 fragments/file      81.7 seconds
    p4fs    202 Mbit/sec    1.04 fragments/file      35.1 seconds

While there is a fair amount of variation in these numbers, the results are definitely reproducible. The little bit of fragmentation you still see with p4fs comes from directory fragmentation, which is still present. I used my Perforce profiling script (available at http://public.perforce.com:8080/guest/frank_compagner/) to verify that the client disk was indeed the bottleneck during all tests. Note that the throughput is not network throughput, but rather the number of bits written to disk, possibly after zlib decompression; the actual network throughput is smaller.

This is all you really need to know if you just want to use the tools to improve your sync performance and reduce fragmentation. It has been tested pretty extensively at Guerrilla Games, and no bugs have been found in quite a while. However, I offer no guarantees that it will do the right thing in every circumstance; for example, I do not know if it correctly handles all the FileSysType types, some of which I have never seen (we have no unicode files, for instance, and I'm not sure how end-of-line conversion should be handled for them). A good effort has been made to ensure correct cancellation behaviour; cancelling should always succeed and never leave any temporary files on disk.

If you're interested in the details of the project, read on.

I began the project by experimenting with increasing the size of the data that is written to disk in one go. Extending the buffer size indeed helps to reduce fragmentation, up to a point (at about 1 MB, NTFS appears to start fragmenting the file anyway). But if we do all file I/O synchronously, we spend lots of time waiting for each write to complete. So I tried a number of approaches:

- FileWriterImmediate: essentially copies the behaviour of the normal Perforce clients, writing each buffer to disk as soon as it arrives.

- FileWriterSync: allocates a large buffer and writes it to disk in one go once it is filled (or the file is complete).

- FileWriterASync: identical to FileWriterSync, but does all file I/O asynchronously, so we can continue reading data from the network while the data is written to disk.

- FileWriterMT: the multi-threaded filewriter uses separate read and write threads, with the buffers being passed from the read thread to the write thread once they are complete.

Unsurprisingly, the latter two filewriters gave the best performance, while fragmentation was drastically reduced for all but the Immediate filewriter. But after a while I discovered the new OutputStat options that were added to the sync command with the 2006.2 release. With these, we can find out the size of a file before it is downloaded (see the first sketch below). This makes it possible to pre-allocate the file on disk before we start writing to it, enabling NTFS to make a better choice of where to put the file on disk.

This alters the picture considerably: now all filewriters cause very little fragmentation, and the best performance is actually achieved by the Immediate filewriter (probably due to its simple implementation with very little overhead). So pre-allocating the file is all that is needed to reduce fragmentation and increase performance. If anybody from Perforce is reading this, the essential code is in the FileWriter::CreateFile() function ;-).

A slight complication is that, on NTFS, the pre-allocation will wipe (zero) the entire file first for security reasons, possibly degrading performance. From XP onwards, the SetFileValidData() Windows API makes it possible to avoid this penalty (provided the user has local admin rights). In testing, however, this did not appear to be really necessary: even the cost of zeroing the entire file before the write appears to be marginal compared to the cost of the seeks caused by the normal fragmentation. Still, when available, the SetFileValidData() function is used.
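To illustrate the first half of this, here is a minimal sketch (not the project's actual code) of how the tagged sync output can be used to learn a file's size before its content arrives. It assumes tagged output has been enabled on the ClientApi (e.g. with client.SetProtocol( "tag", "" ) before Init()); the map, and how the filewriter consumes it, are illustrative assumptions:

    #include <cstdlib>
    #include <map>
    #include <string>
    #include "clientapi.h"   // Perforce C/C++ API

    class FastSyncUser : public ClientUser
    {
    public:
        // With tagged output enabled, the server delivers one OutputStat
        // callback per file, including a "fileSize" field, before the
        // file content itself is transferred.
        virtual void OutputStat( StrDict *varList )
        {
            StrPtr *clientFile = varList->GetVar( "clientFile" );
            StrPtr *fileSize   = varList->GetVar( "fileSize" );
            if( clientFile && fileSize )
            {
                // Remember the size so the filewriter opened for this
                // path can pre-allocate it (see the next sketch).
                sizes[ clientFile->Text() ] =
                    strtoull( fileSize->Text(), 0, 10 );
            }
        }

        std::map<std::string, unsigned long long> sizes;
    };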
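And here is a minimal sketch, for Windows/NTFS, of the pre-allocation idea itself; the function name and the missing error handling are simplifications for illustration, not the project's actual FileWriter::CreateFile() code:

    #include <windows.h>

    HANDLE CreatePreallocated( const wchar_t *path, LONGLONG size )
    {
        HANDLE h = CreateFileW( path, GENERIC_WRITE, 0, NULL,
                                CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
        if( h == INVALID_HANDLE_VALUE )
            return h;

        // Reserve the final size up front, so NTFS can pick a single
        // contiguous run of clusters instead of growing the file a few
        // KB at a time.
        LARGE_INTEGER li;
        li.QuadPart = size;
        if( SetFilePointerEx( h, li, NULL, FILE_BEGIN ) )
            SetEndOfFile( h );

        // Optionally skip the zero-fill NTFS performs (for security)
        // when data is written beyond the valid-data length. This needs
        // the SE_MANAGE_VOLUME_NAME privilege (local admin), assumed to
        // have been enabled via AdjustTokenPrivileges at startup; if the
        // call fails we simply pay the (small) zeroing cost.
        SetFileValidData( h, size );

        // Rewind so the incoming data is written from the start.
        li.QuadPart = 0;
        SetFilePointerEx( h, li, NULL, FILE_BEGIN );
        return h;
    }

If the user cancels mid-transfer, a file pre-allocated like this must of course be cleaned up, which is part of the cancellation handling mentioned above.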
I've kept all the filewriter classes in the project so it is easy to compare the relative performance of each. The default choice is now always the Immediate filewriter, but you can override this behaviour with the following custom command-line switches (these need to be added after the sync command):

- -I, -S, -A or -M selects the Immediate, Sync, ASync or MT filewriter. Only one of these can be present or you will get an error.

- -a means do not pre-allocate the file, just use the selected filewriter.

- -v means do not use the SetFileValidData() function.

Here's an overview of the relative performance of each combination:

                no pre-allocation      pre-allocation         pre-allocation
                                       no validation          validation
                Mbit/sec   #frags      Mbit/sec   #frags      Mbit/sec   #frags
    p4 sync
    p4fs -I
    p4fs -S
    p4fs -A
    p4fs -M

If you want to build p4fs yourself, you need to add the p4api headers and libs to the project. It has been extensively tested with version 2007.2, and earlier versions are probably OK too, though you will need at least version 2006.2, as that is the first version that supports the required OutputStat fields in the sync response. You will also need to add the zlib project (http://www.zlib.net/); I used version 1.2.3, but newer versions should be fine too. If you use the standard zlib vcproj file (which I would recommend), use the Configuration Manager to set things up so you do not use the optimized ASM configurations of zlib. These will not compile under Visual Studio 2005 without some small tweaks, but more importantly, they sometimes seem to give bogus errors when decompressing. I have no idea why this is, but as zlib performance is not on the critical path for this application, it is much better to just use the standard "Debug" and "Release" configurations.

The code is lightly documented, and the design might be a bit anemic, but the whole thing is still fairly straightforward. In implementing this I discovered quite a few intricacies of reimplementing FileSys, so looking over the code might be useful to anybody trying to do the same (on any platform).

If you have any comments or questions, or find any bugs, I would love to hear from you at frank@compagner.com

Frank Compagner