# Example Helix Nagios Plugins
* * *
![Logo](SmallMagnifier.png "Logo")

This project contains example bash scripts that show how Helix components could be monitored from a Nagios server.

The return states are configurable, so they could be customised for other monitoring solutions as needed:

    STATE_OK=0
    STATE_WARNING=1
    STATE_CRITICAL=2

### Scripts

**check_helix_p4d_health**

Description:

Helix P4D health checker - Example Nagios monitoring script that checks the status of the Helix P4D server.

Usage:

    check_helix_p4d_health -p [--all] [-u]
    check_helix_p4d_health -p [--licensecheck [-l ] ]
    check_helix_p4d_health -p [--pidcheck [-c ] ]
    check_helix_p4d_health -p [--p4monitorcheck [-m ] ]
    check_helix_p4d_health -p [--p4diskcheck [-d ] ]
    check_helix_p4d_health -p [--p4repcheck [-r ] ]
    check_helix_p4d_health -p [--version] [--help]

The '--all' flag runs all tests.

The '-u' flag specifies the P4D user name. This user must be an 'operator' user or must have 'super' access to the P4D server.

The '-p' flag specifies the P4D hostname and port.

The '--licensecheck' flag tests if the license file is nearing its expiry date. By default it checks for expiry within '30' days, but this can be overridden with the '-l' flag.

The '--p4diskcheck' flag checks for free disk space on the P4D drives using the Perforce command 'p4 diskspace' and warns if the disks are over 95% used. This value can be overridden by specifying a value between 0 and 99 with the '-d' flag.

The '--pidcheck' flag counts the number of connected p4d processes using 'netstat' and warns if there are over 500 processes running. This value can be overridden with the '-c' flag.

The '--p4monitorcheck' flag counts the number of commands in the 'p4 monitor' table and warns if there are over 500 running. This value can be overridden with the '-m' flag.

The '--p4repcheck' flag (REPLICA ONLY) checks the current replication status for this replica and warns if there is a difference of over 100,000 bytes between master and replica.
This value can be overridden with the '-r' flag.

Example Output:

    CRITICAL: P4D server not responding!
    Perforce client error:
        Connect to server failed; check $P4PORT.
        TCP connect to localhost:1666 failed.
        connect: 127.0.0.1:1666: Connection refused
    Tip: Check if the 'p4d' process is running on the box. Check the P4D log file for errors if it unexpectedly stopped.

Examples:

Run all checks against server on localhost:1666

    check_helix_p4d_health -p 1666 --all

Check if license will expire in next 45 days

    check_helix_p4d_health -p 1666 --licensecheck -l 45

### Example Nagios Installation

The script can be run using SSH or NRPE. In the example below I use the NRPE plugin. Note that this requires the NRPE service to accept arguments, which can be a security risk. If you have any doubts, use SSH or hard code the parameters on the monitored server.

Below are the command definitions for the NRPE service and scripts on the Nagios server.

    define command{
        command_name check_nrpe
        command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
    }

    define command{
        command_name check_helix_p4d_health
        command_line $USER1$/check_helix_p4d_health $ARG1$
    }

Below is a service definition that runs the script 'check_helix_p4d_health' using 'check_nrpe' and provides the parameters '-p localhost:1666 --licensecheck'. This can be used to check the Helix P4D license status on port 1666 on server 'master-helix-server'.

    define service {
        host_name               master-helix-server
        service_description     Helix P4D 1666 License Check
        check_command           check_nrpe!check_helix_p4d_health!'-p localhost:1666 --licensecheck'
        max_check_attempts      2
        check_interval          2
        retry_interval          2
        check_period            24x7
        check_freshness         1
        contact_groups          admins
        notification_interval   2
        notification_period     24x7
        notifications_enabled   1
        register                1
    }

On the monitored server the command is configured within NRPE as:

    command[check_helix_p4d_health]=/usr/lib/nagios/plugins/check_helix_p4d_health $ARG1$
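
As a rough illustration of how the return states listed at the top of this README relate to a threshold check such as '--p4diskcheck', here is a minimal bash sketch. It is not taken from check_helix_p4d_health itself: the function name `check_threshold`, its arguments and its message wording are hypothetical, and treating an unreadable value as CRITICAL is an assumption made for the example.

```shell
#!/bin/bash
# Nagios return states, as listed in this README.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2

# Hypothetical helper (not part of check_helix_p4d_health).
# Compares a measured value against a limit, prints a Nagios-style
# status line and returns the matching state.
#   $1 = label for the message, $2 = measured value, $3 = limit
check_threshold() {
    local label="$1"
    local value="$2"
    local limit="$3"

    # Assumption: an empty or non-numeric reading means the probe failed.
    case "$value" in
        ''|*[!0-9]*)
            echo "CRITICAL: ${label} could not be read"
            return "$STATE_CRITICAL"
            ;;
    esac

    # Over the limit: warn, mirroring the "warns if over N" wording above.
    if [ "$value" -gt "$limit" ]; then
        echo "WARNING: ${label} is ${value} (limit ${limit})"
        return "$STATE_WARNING"
    fi

    echo "OK: ${label} is ${value} (limit ${limit})"
    return "$STATE_OK"
}
```

A plugin built this way would end with something like `check_threshold "disk used %" "$used" 95; exit $?`, so that Nagios (or another monitoring tool with remapped state values) receives the state as the process exit code.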