*** These notes are a sad excuse for "documentation", but are aimed at *** getting this to a potential user quickly. As time permits, I'll *** try to improve "p4watch", add some genuine documentaiton, and *** better packaging. - rmg 9/19/2001 === p4watch Notes === "p4watch" is a tool for tracking down Perforce performance problems, when you suspect that something is periodically causing Perforce commands which _should_ be quick to execute to get "the slows". It works by periodically running a (nominally) short-running "p4" command, ("p4 describe 1", by default), and watching for the command to complete. If the command has not completed given timeout duration, then a "p4wd" is done, in order to log system state. After "p4wd" has been run, it will also be run again after the original "p4" probe command finally completes. (This makes it easy to look for other p4d activity which was active at the time the probe command timed out, but which is gone after the probe command finally completed; these are sometimes good suspects for what caused the slowness. Optionally, a "vmstat" on the server host and a "sysstat" on the filer (each of which produce periodic output) can be requested. When these are used, p4watch captures periodic vmstst and sysstat output for the entire duration of p4watch. All output from all of this goes to the standard output, and the ttt logfile (see below), with labeling and timestamping that help to see what happened when (or to extract particular information of interest, usging grep). "p4watch" uses a library called "ttt". In particular, you'll need: tttLib.pl - the main ttt library tttAgents.pl - the ttt agent library ttt_generic_opts.pl - generic options for the ttt library These should reside in the same directory a the "p4watch" script. (For now, at least) "p4watch" also requires the Time::HiRes perl module (available from CPAN). Finally, you'll need a working "p4wd" on your p4d host. If you choose to enable the vmstat logging, the "vmstat" command issued may need to be adjusted for the operating system you are running your Perforce server on. "p4watch" can be run from any host or as any user, provided that: - you'll need to run on a host (and as a user) from which you can run "rsh p4wd". (I.e., you can run p4watch itself directly on your server host, or some other host) - if you enable the "sysstatint" option (filer sysstat output), you'll need to run "p4watch" from a host (and as a user) that can run "rsh sysstat". - if you enable the "vmstatint" option (p4d host sysstat logging), you'll need to run "p4watch" from a host (and as a user) that can run "rsh vmstat". All "p4watch" arguments can be set either on the command line, using, e.g., p4watch p4wdthresh=10 srv_host=charley user=rmg p4cmdint=120 or in the environment, e.g., srv_host=charley; export srv_host p4cmdint=120; export p4cmdint p4watch I like to use a shell script to start p4watch, e.g.: | #!/bin/sh | | p4watch \ | filer_user=rmg:rmg1pwd \ | logfile=log \ | p4cmd=files \ | p4cmdint=10 \ | p4wdthresh=5 \ | sysstatint=3 \ | vmstatint=3 \ | srv_port=1678 \ | srv_host=localhost \ | p4wd="p4wd -p4d_pid 5068 -p4d_log /home/p4/root-cvs2p4/log 1000; top b n 1" To get a brief description of the arguments (and their defaults), use p4watch =options Here's a bit more about the options: p4cmd the short running p4 command to run as a probe p4wd the p4wd command to execute. (You may need to supply p4wd arguments to fit your server, as shown in the example above; also not that you can tack on other commands to be run after p4wd - a "top b n 1" in the example above). p4wdthresh threshold above which to run p4wd i.e., if the "$p4dcmd" doesn't finish or yeild output for this many seconds, run a p4wd. vmstatint vmstat interval argument setting this causes "vmstat " to be run on the p4d host sysstatint filer sysstat interval argument setting this causes "sysstat to be run on the filer p4cmdint p4 describe interval this is the frequency at which "p4 describe" probes should be run srv_host host p4d is running on The host where "p4d" is running srv_port port p4d is listening on The port on srv_host where p4d is listening srv_user user to run as on srv_host all command on srv_host will be run with "rsh -l ..." cli_host host to run "p4 describe" probes on The host where the "p4 describes" should be run cli_user user to run as on cli_host all command on cli_host will be run with "rsh -l ..." filer the filer hostname filer_user user to run as on the filer all command on the filer will be run with "rsh -l ..." I suggest that, before attempting to run "p4watch", try running "testAgents"; the output should look like | rmg $ testAgents | : logging to: /home/rmg/web/richard_geiger/guest/richard_geiger/p4bench/ttt_log | execute : 14498 on peggyo: date | execute : 14499 on peggyo: date | execute : 14500 on peggyo: date | 14500 : STDOUT: Thu Sep 13 17:58:01 PDT 2001 | 14498 : STDOUT: Thu Sep 13 17:58:01 PDT 2001 | 14499 : STDOUT: Thu Sep 13 17:58:01 PDT 2001 | 14500 : STDERR: EOF | 14500 : STDOUT: EOF | 14498 : STDERR: EOF | 14498 : STDOUT: EOF | 14499 : STDERR: EOF | 14499 : STDOUT: EOF | 14498 : exit status = 0 | 14499 : exit status = 0 | 14500 : exit status = 0 | exit : pass And then, perhaps another one, asking to run "uname" on the p4dhost ("computer", in this example): | rmg $ testAgents onhost1=computer oncmd1=hostname | : logging to: /home/rmg/web/richard_geiger/guest/richard_geiger/p4bench/ttt_log | execute : 5871 on computer: hostname | execute : 5872 on peggyo: date | execute : 5873 on peggyo: date | 5872 : STDOUT: Wed Sep 19 15:09:49 PDT 2001 | 5873 : STDOUT: Wed Sep 19 15:09:49 PDT 2001 | 5872 : STDERR: EOF | 5872 : STDOUT: EOF | 5873 : STDERR: EOF | 5873 : STDOUT: EOF | 5871 : STDOUT: computer.perforce.com | 5871 : STDERR: EOF | 5871 : STDOUT: EOF | 5871 : exit status = 0 | 5872 : exit status = 0 | 5873 : exit status = 0 | exit : pass (This will verify that the "ttt" library is working right; I think it should work fine on most Unixes, but if not, please let me know.) For an example p4watch session, see EXAMPLE_p4watch_log. *** Note: to make sense of the following, I suggest using a really wide window!: Entries in the log file look like 2001-09-19 14:26:26: execute : 5646 on localhost: uname -a ^^^^^^^^^^^^^^^^^^^^^^^^^^^ message ^^^^^^^^^ tag ^^^^^^^^^^^^^^^^^^^ timestamp The "tag"s serve to identiy line, since they are logged in time order, and sometimes this creates interleaved stuff you want to pull apart. Try, e.g., grep ": vmstat :" EXAMPLE_p4watch_log Additionally, lines that log output from child processes have, 2001-09-19 14:26:26: 5646 : STDOUT: Linux peggyo 2.4.2-2 #1 Sun Apr 8 20:41:30 EDT 2001 i686 unknown ^^^^^^^ stream id Note that the "tag" for these lines can either be the pid of the process that produced the output (as above), or a symbolic name, assigned when when the process was started, e.g., 2001-09-19 14:26:26: vmstat : STDOUT: procs memory swap io system cpu ^^^^^^ p4watch uses the tag ": trip :" when slowness is detected: 2001-09-19 14:26:50: sysstat : STDOUT: 2% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 2001-09-19 14:26:53: vmstat : STDOUT: 0 0 0 2564 4996 28796 68904 0 0 0 0 111 56 0 0 100 2001-09-19 14:26:53: sysstat : STDOUT: 2% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 2001-09-19 14:26:54: trip : do p4wd after waiting 5.993265 seconds 2001-09-19 14:26:54: execute : 5656 on localhost: p4wd -p4d_pid 5068 -p4d_log /home/p4/root-cvs2p4/log 1000; top b n 1 2001-09-19 14:26:54: p4wd : STDOUT: 5068*5068 3981 p4d Wed Sep 19 13:35:18 2001 768 00:00:00 p4d -p 1678 ^^p4wd output^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Note: in this example, the server is very unbusy, so you only see the "master" p4d process in the p4wd output above. (In fact I caused the timeout in this example to trip by stopping the server with job-control!) 2001-09-19 14:26:55: p4wd : STDOUT: 2001-09-19 14:26:55: p4wd : STDOUT: 2001-09-19 14:26:55: p4wd : STDOUT: 2:26pm up 42 days, 5:20, 2 users, load average: 0.00, 0.00, 0.00 2001-09-19 14:26:55: p4wd : STDOUT: 84 processes: 78 sleeping, 1 running, 1 zombie, 4 stopped