p4fs - p4 fast sync project readme

p4fs is a demonstration project built on top of the Perforce p4api to show the effect of several different ways a Perforce client can write files to disk under NTFS.

The current Perforce clients (p4/p4v/p4win) all write data to disk as soon as it is received (in chunks of about 4-8 KB), which, under NTFS, can very easily fragment the files. This can be a serious performance bottleneck when doing large syncs. It is not an issue for files smaller than about 100 KB, but if you have many large (likely binary) files, low client-side disk performance caused by the fragmentation can easily become the limiting factor. Additionally, working with these fragmented files will be slower than it needs to be. As this project demonstrates, this need not be so; it is relatively simple to remove this bottleneck.

The project contains two tools that can be used by anybody who wants to improve their NTFS sync performance and reduce client-side fragmentation. See below for an idea of the performance increase you might expect. The main goal of the project, however, is to try to persuade Perforce to implement something similar in the normal client tools (in which case I would also really appreciate a sync progress bar in P4V ;-).

This directory contains the full source of the project (in the src dir), Visual Studio 2005 project files (in the prj dir) and precompiled binaries (in the bin dir) of the following tools:

- p4fs is almost a drop-in replacement for the p4 command-line tool. A sync command is processed by the tool itself so as to reduce fragmentation and increase download performance. Any other command is piped through to the normal p4 client (which needs to be locatable through the PATH). Output and return codes should be identical to normal p4 operation. The only noticeable difference is that it currently only supports the global options -p (port), -c (client), -H (host), -u (user), -P (pass) and -h (help); see the example invocations below. A couple of custom switches were added to the sync command, but those are mainly relevant for testing (see below for details). Also note that it ignores the value of P4CONFIG; only settings given on the command line, environment variables and values set through "p4 set" are honoured.

- P4fsV is a version of the client intended to be installed as a custom tool in P4V/P4Win. It is a Windows application with a nice big Cancel button and a progress bar (the progress bar behaves a bit oddly if you pass multiple paths). It supports exactly the same options as p4fs, but it does not pipe commands other than sync through to p4. To integrate it in P4V, create a custom tool, give it a name (e.g. GetLatestFast), check the "Add to applicable context menus" box, put "P4fsV.exe" in the "Application" field and "sync %D" in the "Arguments" field. Leave the rest blank, except check the "Refresh P4V upon completion" box. Now you can easily use P4fsV from the P4V right-click menu. Note that the same remarks regarding global options and settings as for p4fs apply.
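For example, to sync with explicit connection settings (the port, client and user values below are just placeholders):

    p4fs -p perforce:1666 -c my_client -u my_user sync //depot/...

Any other command is handed off unchanged to the regular p4 client, so

    p4fs info

behaves exactly as "p4 info" would.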
Giving exact figures for the performance increase you will get is hard, as it is very dependent on the current fragmentation state of the hard disk. But even with a fairly lightly fragmented disk (such as you might have when running a defragmentation tool nightly), the performance gain is still noticeable. To get a rough idea of the average improvement, I have run many tests on hard disks in various stages of fragmentation and averaged the results below.

The syncs were done on a representative sample of the binary files from our depot. Total size was about 900 MB, with an average of about 600 KB per file. I measured the average throughput, the average number of file fragments, and the time it takes to do a p4 diff -se on the whole set of files after the sync. This last number should be a rough indication of the cost of working with the files, as the diff performance is mostly limited by the speed at which the files can be read in from disk to calculate the MD5 checksum.

            throughput      average fragmentation    p4 diff -se time
    p4      126 Mbit/sec    4.86 fragments/file      81.7 seconds
    p4fs    202 Mbit/sec    1.04 fragments/file      35.1 seconds

While there is a fair amount of variation in these numbers, the results are definitely reproducible. The little bit of fragmentation you still see with p4fs comes from directory fragmentation, which is still present. I used my Perforce profiling script (available at http://public.perforce.com:8080/guest/frank_compagner/) to verify that the client disk was indeed the bottleneck during all tests. Note that the throughput is not network throughput, but rather the number of bits written to disk, possibly after zlib decompression; the actual network throughput is smaller.

This is all you really need to know if you just want to use the tools to improve your sync performance and reduce fragmentation. It has been tested pretty extensively at Guerrilla Games, and no bugs have been found in quite a while. However, I offer no guarantees that it will do the right thing in every circumstance; for example, I do not know if it correctly handles all the FileSysType types, some of which I have never seen (we have no unicode files, for instance, and I'm not sure how end-of-line conversion should be handled for them). A good effort has been made to ensure correct cancellation behaviour; cancelling should always succeed and never leave any temporary files on disk.

If you're interested in the details of the project, read on.

I began the project by experimenting with increasing the size of the data that is written to disk in one go. Extending the buffer size indeed helps to reduce fragmentation, up to a point (at about 1 MB, NTFS appears to start fragmenting the file anyway). But if we do all file I/O synchronously, we spend lots of time waiting for each write to complete. So I tried a number of approaches:

- FileWriterImmediate: essentially copies the behaviour of the normal Perforce clients, writing each buffer to disk as soon as it arrives.

- FileWriterSync: allocates a large buffer and writes it to disk in one go once it is filled (or the file is complete).

- FileWriterASync: identical to FileWriterSync, but does all file I/O asynchronously, so we can continue reading data from the network while the data is written to disk.

- FileWriterMT: the multi-threaded filewriter uses separate read and write threads, with the buffers being passed from the read thread to the write thread once they are complete.

Unsurprisingly, the latter two filewriters gave the best performance, while fragmentation was drastically reduced for all but the Immediate filewriter. But after a while I discovered the new OutputStat options that were added to the sync command with the 2006.2 release. With these, we can find out the size of a file before it is downloaded (see the first sketch below). This makes it possible to pre-allocate the file on disk before we start writing to it, enabling NTFS to make a better choice of where to put the file on disk.

This alters the picture considerably: now all filewriters cause very little fragmentation, and the best performance is actually achieved by the Immediate filewriter (probably due to its simple implementation with very little overhead). So pre-allocating the file is all that is needed to reduce fragmentation and increase performance. If anybody from Perforce is reading this, the essential code is in the FileWriter::CreateFile() function ;-).

A slight complication is that, on NTFS, the pre-allocation will wipe (zero) the entire file first for security reasons, possibly degrading performance. From XP onwards, the SetFileValidData() Windows API makes it possible to avoid this penalty (provided the user has local admin rights). In testing, however, this did not appear to be really necessary: even the cost of zeroing the entire file before the write appears to be marginal compared to the cost of the seeks caused by the normal fragmentation. Still, when available, the SetFileValidData() function is used.
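To illustrate the first half of this, here is a minimal sketch (not the project's actual code) of how the tagged sync output can be used to learn a file's size before its content arrives. It assumes tagged output has been enabled on the ClientApi (e.g. with client.SetProtocol( "tag", "" ) before Init()); the map, and how the filewriter consumes it, are illustrative assumptions:

    #include <cstdlib>
    #include <map>
    #include <string>
    #include "clientapi.h"   // Perforce C/C++ API

    class FastSyncUser : public ClientUser
    {
    public:
        // With tagged output enabled, the server delivers one OutputStat
        // callback per file, including a "fileSize" field, before the
        // file content itself is transferred.
        virtual void OutputStat( StrDict *varList )
        {
            StrPtr *clientFile = varList->GetVar( "clientFile" );
            StrPtr *fileSize   = varList->GetVar( "fileSize" );
            if( clientFile && fileSize )
            {
                // Remember the size so the filewriter opened for this
                // path can pre-allocate it (see the next sketch).
                sizes[ clientFile->Text() ] =
                    strtoull( fileSize->Text(), 0, 10 );
            }
        }

        std::map<std::string, unsigned long long> sizes;
    };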
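And here is a minimal sketch, for Windows/NTFS, of the pre-allocation idea itself; the function name and the missing error handling are simplifications for illustration, not the project's actual FileWriter::CreateFile() code:

    #include <windows.h>

    HANDLE CreatePreallocated( const wchar_t *path, LONGLONG size )
    {
        HANDLE h = CreateFileW( path, GENERIC_WRITE, 0, NULL,
                                CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
        if( h == INVALID_HANDLE_VALUE )
            return h;

        // Reserve the final size up front, so NTFS can pick a single
        // contiguous run of clusters instead of growing the file a few
        // KB at a time.
        LARGE_INTEGER li;
        li.QuadPart = size;
        if( SetFilePointerEx( h, li, NULL, FILE_BEGIN ) )
            SetEndOfFile( h );

        // Optionally skip the zero-fill NTFS performs (for security)
        // when data is written beyond the valid-data length. This needs
        // the SE_MANAGE_VOLUME_NAME privilege (local admin), assumed to
        // have been enabled via AdjustTokenPrivileges at startup; if the
        // call fails we simply pay the (small) zeroing cost.
        SetFileValidData( h, size );

        // Rewind so the incoming data is written from the start.
        li.QuadPart = 0;
        SetFilePointerEx( h, li, NULL, FILE_BEGIN );
        return h;
    }

If the user cancels mid-transfer, a file pre-allocated like this must of course be cleaned up, which is part of the cancellation handling mentioned above.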
I've kept all the filewriter classes in the project so it is easy to compare the relative performance of each. The default choice is now always the Immediate filewriter, but you can override this behaviour with the following custom command-line switches (these need to be added after the sync command):

- -I, -S, -A or -M selects the Immediate, Sync, ASync or MT filewriter. Only one of these can be present or you will get an error.

- -a means do not pre-allocate the file, just use the selected filewriter.

- -v means do not use the SetFileValidData() function.

Here's an overview of the relative performance of each combination:

                no pre-allocation      pre-allocation         pre-allocation
                                       no validation          validation
                Mbit/sec   #frags      Mbit/sec   #frags      Mbit/sec   #frags
    p4 sync
    p4fs -I
    p4fs -S
    p4fs -A
    p4fs -M

If you want to build p4fs yourself, you need to add the p4api headers and libs to the project. It has been extensively tested with version 2007.2, and earlier versions are probably OK too, though you will need at least version 2006.2, as that is the first version that supports the required OutputStat fields in the sync response. You will also need to add the zlib project (http://www.zlib.net/); I used version 1.2.3, but newer versions should be fine too. If you use the standard zlib vcproj file (which I would recommend), use the Configuration Manager to set things up so you do not use the optimized ASM configurations of zlib. These will not compile under Visual Studio 2005 without some small tweaks, but more importantly, they sometimes seem to give bogus errors when decompressing. I have no idea why this is, but as zlib performance is not on the critical path for this application, it is much better to just use the standard "Debug" and "Release" configurations.

The code is lightly documented, and the design might be a bit anemic, but the whole thing is still fairly straightforward. In implementing this I discovered quite a few intricacies of reimplementing FileSys, so looking over the code might be useful to anybody trying to do the same (on any platform).

If you have any comments or questions, or find any bugs, I would love to hear from you at frank@compagner.com

Frank Compagner