*** These notes are a sad excuse for "documentation", but are aimed at
*** getting this to a potential user quickly. As time permits, I'll
*** try to improve "p4watch", add some genuine documentation, and
*** provide better packaging.
- rmg 9/19/2001
=== p4watch Notes ===
"p4watch" is a tool for tracking down Perforce performance problems,
when you suspect that something is periodically causing Perforce
commands which _should_ be quick to execute to get "the slows".
It works by periodically running a (nominally) short-running "p4"
command, ("p4 describe 1", by default), and watching for the command
to complete. If the command has not completed within the given timeout
duration, then a "p4wd" is run, in order to log system state.
then a "p4wd" is done, in order to log system state.
After "p4wd" has been run, it will also be run again after the
original "p4" probe command finally completes. (This makes it easy to
look for other p4d activity which was active at the time the probe
command timed out, but which is gone after the probe command finally
completed; these are sometimes good suspects for what caused the
slowness.
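In outline, one probe cycle amounts to something like the following (a
minimal sketch in shell, only to illustrate the idea; the real p4watch
does this continuously via the Perl "ttt" library and logs everything,
and the host name, port, threshold, and marker file below are made-up
values):
| #!/bin/sh
| # One probe cycle, sketched: run a quick p4 command in the background,
| # and if it hasn't finished within the threshold, capture server state.
| rm -f /tmp/probe.done
| ( p4 -p charley:1678 describe 1 >/dev/null 2>&1; touch /tmp/probe.done ) &
| sleep 5                          # the "p4wdthresh" timeout
| if [ ! -f /tmp/probe.done ]      # probe still not done after 5 seconds?
| then
|   rsh charley p4wd               # log server state while things are slow
|   wait                           # let the probe finish...
|   rsh charley p4wd               # ...then log server state again
| fi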
Optionally, a "vmstat" on the server host and a "sysstat" on the filer
(each of which produce periodic output) can be requested. When these
are used, p4watch captures periodic vmstst and sysstat output for the
entire duration of p4watch.
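For example, to collect both every 5 seconds (the host name and
credentials here are made up, and the other required options are
omitted):
  p4watch vmstatint=5 sysstatint=5 filer=filer1 filer_user=root:pwd ...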
All output from all of this goes to the standard output, and to the ttt
logfile (see below), with labeling and timestamping that help you see
what happened when (or extract particular information of interest,
using grep).
"p4watch" uses a library called "ttt". In particular, you'll need:
tttLib.pl - the main ttt library
tttAgents.pl - the ttt agent library
ttt_generic_opts.pl - generic options for the ttt library
These should reside in the same directory as the "p4watch" script.
(For now, at least.)
"p4watch" also requires the Time::HiRes perl module (available from
CPAN).
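If Time::HiRes isn't already installed, the standard CPAN shell can
usually fetch it (exactly how depends on your perl installation), e.g.:
  perl -MCPAN -e 'install Time::HiRes'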
Finally, you'll need a working "p4wd" on your p4d host.
If you choose to enable the vmstat logging, the "vmstat" command
issued may need to be adjusted for the operating system you are
running your Perforce server on.
"p4watch" can be run from any host or as any user, provided that:
- you'll need to run on a host (and as a user) from which you can
run "rsh <p4dhost> p4wd". (I.e., you can run p4watch itself
directly on your server host, or some other host)
- if you enable the "sysstatint" option (filer sysstat output), you'll
need to run "p4watch" from a host (and as a user) that
can run "rsh <filer> sysstat".
- if you enable the "vmstatint" option (p4d host sysstat logging),
you'll need to run "p4watch" from a host (and as a user) that
can run "rsh <p4dhost> vmstat".
All "p4watch" arguments can be set either on the command line,
using, e.g.,
p4watch p4wdthresh=10 srv_host=charley user=rmg p4cmdint=120
or in the environment, e.g.,
srv_host=charley; export srv_host
p4cmdint=120; export p4cmdint
p4watch
I like to use a shell script to start p4watch, e.g.:
| #!/bin/sh
|
| p4watch \
| filer_user=rmg:rmg1pwd \
| logfile=log \
| p4cmd=files \
| p4cmdint=10 \
| p4wdthresh=5 \
| sysstatint=3 \
| vmstatint=3 \
| srv_port=1678 \
| srv_host=localhost \
| p4wd="p4wd -p4d_pid 5068 -p4d_log /home/p4/root-cvs2p4/log 1000; top b n 1"
To get a brief description of the arguments (and their defaults),
use
p4watch =options
Here's a bit more about the options:
p4cmd the short running p4 command to run as a probe
p4wd the p4wd command to execute.
(You may need to supply p4wd arguments to fit your server,
as shown in the example above; also note that you can tack on other
commands to be run after p4wd - the "top b n 1" in the example above.)
p4wdthresh threshold above which to run p4wd
i.e., if the "$p4cmd" probe doesn't finish or yield output for
this many seconds, run a p4wd.
vmstatint vmstat interval argument
setting this causes "vmstat <vmstatint>" to be run on the p4d host
sysstatint filer sysstat interval argument
setting this causes "sysstat <sysstatint> to be run on the filer
p4cmdint p4 describe interval
this is the frequency at which the "p4 $p4cmd" probes should be run
srv_host host p4d is running on
The host where "p4d" is running
srv_port port p4d is listening on
The port on srv_host where p4d is listening
srv_user user to run as on srv_host
all commands on srv_host will be run with "rsh -l <srv_user> ..."
cli_host host to run "p4 describe" probes on
The host where the "p4 describes" should be run
cli_user user to run as on cli_host
all commands on cli_host will be run with "rsh -l <cli_user> ..."
filer the filer hostname
filer_user user to run as on the filer
all commands on the filer will be run with "rsh -l <filer_user> ..."
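So, for example, with srv_host=charley and srv_user=p4 (made-up values
here), the p4wd capture would be issued roughly as:
  rsh -l p4 charley p4wd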
I suggest that, before attempting to run "p4watch", you first try
running "testAgents"; the output should look like:
| rmg $ testAgents
| : logging to: /home/rmg/web/richard_geiger/guest/richard_geiger/p4bench/ttt_log
| execute : 14498 on peggyo: date
| execute : 14499 on peggyo: date
| execute : 14500 on peggyo: date
| 14500 : STDOUT: Thu Sep 13 17:58:01 PDT 2001
| 14498 : STDOUT: Thu Sep 13 17:58:01 PDT 2001
| 14499 : STDOUT: Thu Sep 13 17:58:01 PDT 2001
| 14500 : STDERR: EOF
| 14500 : STDOUT: EOF
| 14498 : STDERR: EOF
| 14498 : STDOUT: EOF
| 14499 : STDERR: EOF
| 14499 : STDOUT: EOF
| 14498 : exit status = 0
| 14499 : exit status = 0
| 14500 : exit status = 0
| exit : pass
And then, perhaps, try another one, asking to run "hostname" on the p4d
host ("computer", in this example):
| rmg $ testAgents onhost1=computer oncmd1=hostname
| : logging to: /home/rmg/web/richard_geiger/guest/richard_geiger/p4bench/ttt_log
| execute : 5871 on computer: hostname
| execute : 5872 on peggyo: date
| execute : 5873 on peggyo: date
| 5872 : STDOUT: Wed Sep 19 15:09:49 PDT 2001
| 5873 : STDOUT: Wed Sep 19 15:09:49 PDT 2001
| 5872 : STDERR: EOF
| 5872 : STDOUT: EOF
| 5873 : STDERR: EOF
| 5873 : STDOUT: EOF
| 5871 : STDOUT: computer.perforce.com
| 5871 : STDERR: EOF
| 5871 : STDOUT: EOF
| 5871 : exit status = 0
| 5872 : exit status = 0
| 5873 : exit status = 0
| exit : pass
(This will verify that the "ttt" library is working right; I think it
should work fine on most Unixes, but if not, please let me know.)
For an example p4watch session, see EXAMPLE_p4watch_log.
*** Note: to make sense of the following, I suggest using a really wide window!
Entries in the log file look like
2001-09-19 14:26:26: execute : 5646 on localhost: uname -a
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^ message
                     ^^^^^^^^^ tag
^^^^^^^^^^^^^^^^^^^ timestamp
The "tag"s serve to identiy line, since they are logged in time order,
and sometimes this creates interleaved stuff you want to pull apart.
Try, e.g.,
grep ": vmstat :" EXAMPLE_p4watch_log
Additionally, lines that log output from child processes carry a
"stream id", e.g.,
2001-09-19 14:26:26: 5646 : STDOUT: Linux peggyo 2.4.2-2 #1 Sun Apr 8 20:41:30 EDT 2001 i686 unknown
                            ^^^^^^^ stream id
Note that the "tag" for these lines can either be the pid of the
process that produced the output (as above), or a symbolic name,
assigned when the process was started, e.g.,
2001-09-19 14:26:26: vmstat : STDOUT: procs memory swap io system cpu
                     ^^^^^^
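Since everything is interleaved in time order, grepping on a tag is a
handy way to pull out just one process's output; for instance, to see
only what p4wd reported:
  grep ": p4wd : STDOUT:" EXAMPLE_p4watch_log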
p4watch uses the tag ": trip :" when slowness is detected:
2001-09-19 14:26:50: sysstat : STDOUT: 2% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0%
2001-09-19 14:26:53: vmstat : STDOUT: 0 0 0 2564 4996 28796 68904 0 0 0 0 111 56 0 0 100
2001-09-19 14:26:53: sysstat : STDOUT: 2% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0%
2001-09-19 14:26:54: trip : do p4wd after waiting 5.993265 seconds
2001-09-19 14:26:54: execute : 5656 on localhost: p4wd -p4d_pid 5068 -p4d_log /home/p4/root-cvs2p4/log 1000; top b n 1
2001-09-19 14:26:54: p4wd : STDOUT: 5068*5068 3981 p4d Wed Sep 19 13:35:18 2001 768 00:00:00 p4d -p 1678
^^p4wd output^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note: in this example, the server is essentially idle, so you only see
the "master" p4d process in the p4wd output above. (In fact I caused
the timeout in this example to trip by stopping the server with
job-control!)
2001-09-19 14:26:55: p4wd : STDOUT:
2001-09-19 14:26:55: p4wd : STDOUT:
2001-09-19 14:26:55: p4wd : STDOUT: 2:26pm up 42 days, 5:20, 2 users, load average: 0.00, 0.00, 0.00
2001-09-19 14:26:55: p4wd : STDOUT: 84 processes: 78 sleeping, 1 running, 1 zombie, 4 stopped