= Perforce Helix Core Server Deployment Package (for UNIX/Linux)
Perforce Professional Services
:revnumber: v2023.2
:revdate: 2023-12-22
:doctype: book
:icons: font
:toc:
:toclevels: 5
:sectnumlevels: 4
:xrefstyle: full

// Attribute for ifdef usage
:unix_doc: true

== Preface

The Server Deployment Package (SDP) is the implementation of Perforce's recommendations for operating and managing a production Perforce Helix Core Version Control System. It is intended to provide the Helix Core administration team with tools to help:

* Simplify Management
* Simplify Upgrades
* High Availability (HA)
* Disaster Recovery (DR)
* Fast and Safe Upgrades
* Production Focus
* Best Practice Configurables
* Optimal Performance, Data Safety, and Simplified Backup

This guide is intended to provide instructions for setting up the SDP to help provide users of Helix Core with the above benefits.

This guide assumes some familiarity with Perforce and does not duplicate the basic information in the Perforce user documentation. This document only relates to the Server Deployment Package (SDP). All other Helix Core documentation can be found here: https://www.perforce.com/support/self-service-resources/documentation[Perforce Support Documentation].

*Please Give Us Feedback*

Perforce welcomes feedback from our users. Please send any suggestions for improving this document or the SDP to consulting@perforce.com.

:sectnums:
== Overview

The SDP has four main components:

* Hardware and storage layout recommendations for Perforce.
* Scripts to automate critical maintenance activities.
* Scripts to aid the setup and management of replication (including failover for DR/HA).
* Scripts to assist with routine administration tasks.

Each of these components is covered, in detail, in this guide.
=== Using this Guide

<<_setting_up_the_sdp>> describes concepts, terminology and pre-requisites.

<<_maintaining_the_sdp_on_unix_linux>> covers administrative duties associated with keeping an installation of the SDP in good shape.

<<_installing_the_sdp_on_unix_linux>> covers what you need to know to set up a Helix Core server on a Unix platform.

<<_backup_replication_and_recovery>> covers the backup, restoration and replication of Helix Core, including some guidance on planning for HA (High Availability) and DR (Disaster Recovery).

<<_upgrades>> covers upgrades of `p4d` and related Helix Core executables.

<<_upgrading_the_sdp>> covers upgrading the SDP itself.

<<_maximizing_server_performance>> covers optimizations and proactive actions.

<<_tools_and_scripts>> covers all the scripts used within the SDP in detail.

<<_sdp_package_contents_and_planning>> describes the details of the SDP package.

<> describes the standard for setting the `journalPrefix` configurable.

<<_server_spec_naming_standard>> describes the standard for naming 'server' specs created with the `p4 server` command.

<<_frequently_asked_questions>> and <<_troubleshooting_guide>> are useful for other questions.

<<_starting_and_stopping_services>> gives an overview of starting and stopping services with common init mechanisms, `systemd` and SysV.

=== Getting the SDP

The SDP is downloaded as a single zipped tar file. The latest version can be found at: https://swarm.workshop.perforce.com/projects/perforce-software-sdp/files/downloads

The file to download containing the latest SDP is consistently named `sdp.Unix.tgz`. A copy of this file also exists with a version-identifying name, e.g. `sdp.Unix.2021.2.28649.tgz`.
The direct download link to use with `curl` or `wget` is illustrated with this command:

    curl -L -O https://swarm.workshop.perforce.com/projects/perforce-software-sdp/download/downloads/sdp.Unix.tgz

=== Checking the SDP Version

Once installed, the SDP `Version` file exists as `/p4/sdp/Version`. This is a simple text file that contains the SDP version string. The version can be checked using a command like `cat`, as in this sample command:

    $ cat /p4/sdp/Version
    Rev. SDP/MultiArch/2020.1/27955 (2021/08/13)

That string can be found in the Change History section of the link:ReleaseNotes.html[SDP Release Notes]. This can be useful in determining if your SDP is the latest available, and to see what features are included.

When an SDP tarball is extracted, the `Version` file appears in the top-level `sdp` directory.

== Setting up the SDP

This section tells you how to configure the SDP to set up a new Helix Core server.

The SDP can be installed on multiple server machines, and each server machine can host one or more Helix Core server instances. See <<_terminology_definitions>> for detailed definitions of terms.

The SDP implements a standard logical directory structure which can be implemented flexibly on one or many physical server machines.

Additional relevant information is available in the https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/Home-p4sag.html[System Administrator Guide].

=== Terminology Definitions

* *process* - a running process with a process identifier (PID). It should normally be qualified as to what type of process it is:
** *p4d process* - a running p4d process with its own copy of db.* files. P4D processes may be of any one of the standard types, e.g. standard or commit-server, and any of the valid replica types: standby, forwarding-replica, edge-server etc.
** *p4p process* – proxy instance talking to a single upstream p4d instance
** *p4broker process* – p4broker talking to a single upstream p4d instance
* *Instance* - a logically independent set of Helix Core data and metadata, represented by entities such as changelist numbers and depot paths, and existing on a storage device in the form of db.* files (metadata) and versioned files (archive files). Thus, the instance is a reference to the logical data set, with its set of users, files, and changelists.
** The default SDP instance name is simply `1` (the digit 'one').
** Any alphanumeric name can be used. It is mainly of interest to administrators, not regular users.
** Instance names are best kept short, as they are typed often in various admin operational tasks.
** An *instance* has a well defined name, embedded in its P4ROOT value. If the P4ROOT is `/p4/ace/root`, for example, `ace` is the instance name.
** An *instance* must operate with at least one p4d process on a master server machine. The instance may also extend to many machines running additional p4d, p4broker, and p4p processes. The additional p4d processes can be replicas of various types, including standby, edge, and filtered forwarding replicas (to name a few).
** On all machines on which an instance is physically extended, including proxy, broker, and replica machines, the instance exists as `/p4/N`, where `N` is the instance name.
** There can be more than one instance on a machine.
* *Server machine* - this is a host machine (virtual or physical) with an operating system, on which any number of p4d or other processes may be running.
* *Server spec* or *server specification* - the entity managed using the `p4 server` command (and the companion `p4 servers` to list all of them).
* *Server* - this is a vague term. It needs to be fully qualified, and use on its own (unadorned) depends on context.
It may mean any one of:
** Server machine
** P4d process (this is usually the most common usage; assume this unless otherwise defined.)
** Any other type of instance!

IMPORTANT: Thus "p4d server" is unclear as to whether you are talking about a p4d process or a server machine or a combination of both (since there may be a single instance on a single machine, or many instances on a machine, etc). Make sure you understand what is being referred to!

== Pre-Requisites

[arabic]
. The Helix Core binaries (p4d, p4, p4broker, p4p) have been downloaded (see <<_installing_the_sdp_on_unix_linux>>)
. _sudo_ access is required
. System administrator available for configuration of drives / volumes (especially if on network or SAN or similar)
. Supported Linux version. Currently these versions are fully supported; for other versions please speak with Perforce Support.
* Ubuntu 18.04 LTS (bionic)
* Ubuntu 20.04 LTS (focal)
* Red Hat Enterprise Linux (RHEL) 7.x
* Red Hat Enterprise Linux (RHEL) 8.x
* CentOS 7
* CentOS 8 (not recommended for production; Rocky Linux replaces CentOS 8)
* Rocky Linux 8.x
* SUSE Linux Enterprise Server 12

=== Volume Layout and Hardware

As can be expected from a version control system, good disk (storage) management is key to maximizing data integrity and performance. Perforce recommends using multiple physical volumes for *each* p4d server instance. Using three or four volumes per instance reduces the chance of hardware failure affecting more than one instance. When naming volumes and directories the SDP assumes the "hx" prefix is used to indicate Helix volumes. Your own naming conventions/standards can be used instead, though this is discouraged as it will create inconsistency with documentation. For optimal performance on UNIX machines, the XFS file system is recommended, but not mandated. The EXT4 filesystem is also considered proven and widely used.
* {blank}
+
*Depot data, archive files, scripts, and checkpoints*: Use a large volume, with RAID 6 on its own controller with a standard amount of cache or a SAN or NAS volume (NFS access is fine). This volume is the only volume that *must* be backed up. The SDP backup scripts place the metadata snapshots on this volume.
+
This volume is normally called `/hxdepots`.

* {blank}
+
*Perforce metadata (database files), 1 or 2 volumes:* Use the fastest volume possible, ideally SSD or RAID 1+0 on a dedicated controller with the maximum cache available on it. Typically a single volume is used, `/hxmetadata`. In some sites with exceptionally large metadata, 2 volumes are used for metadata, `/hxmetadata` and `/hxmetadata2`. Exceptionally large in this case means the metadata size on disk is such that (2x(size of db.* files)+room for growth) approaches or exceeds the storage capacity of the storage device used for metadata. That depends on how big the `/hxmetadata` volume is. For example, with a 16T storage volume, a total db.* size of about 7T (about 14T when doubled) is probably a reasonable cutoff for the definition of "exceptionally large" in this context.

IMPORTANT: Do not run anti-virus tools or backup tools against the `hxmetadata` volume(s) or `hxlogs` volume(s), because they can interfere with the operation of the Perforce server executable.

* {blank}
+
*Journals and logs:* a fast volume, ideally SSD or RAID 1+0 on its own controller with the standard amount of cache on it. This volume is normally called `/hxlogs` and can optionally be backed up.
+
If a separate logs volume is not available, put the logs on the `/hxmetadata` or `/hxmetadata1` volume, as metadata and logs have similar performance needs that differ from `/hxdepots`.
WARNING: Storing metadata and logs on the same volume is discouraged, since the redundancy benefit of the P4JOURNAL (stored on `/hxlogs`) is greatly reduced if P4JOURNAL is on the same volume as the metadata in the P4ROOT directory.

NOTE: If multiple controllers are not available, put the `/hxlogs` and `/hxdepots` volumes on the same controller.

On all SDP machines, a `/p4` directory will exist containing a subdirectory for each instance, and each instance named `/p4`. The volume layout is shown in <<_sdp_package_contents_and_planning>>. This `/p4` directory enables easy access to the different parts of the file system for each instance.

For example:

* `/p4/1/root` contains the database files for instance `1`
* `/p4/1/logs` contains the log files for instance `1`
* `/p4/1/bin` contains the binaries and scripts for instance `1`
* `/p4/common/bin` contains the binaries and scripts common to all instances

== Maintaining the SDP on Unix / Linux

=== Backup procedures

Helix Core's purpose is to maintain the long-running history of all your development. As such, it is important to take reliable backups to preserve your dataset integrity.

==== Metadata checkpoints

The SDP contains scripts and a default crontab which will create daily checkpoints with no downtime. The script <<_daily_checkpoint_sh>> accomplishes this by rotating the journal, replaying it into the `offline_db` directory, and checkpointing the `offline_db` directory. The resulting checkpoints, rotated journals, and checkpoint checksum files can be found in `/p4//checkpoints`.

It is difficult to overstate the importance of regular checkpoints. Perforce metadata (the `db.*` files) is in a constant state of flux, and a checkpoint is the most reliable point of recovery for a commit server. Attempts to back up the `root` directory with `cp` or `rsync` will result in a metadata set that is probably inconsistent and corrupt. Simple backups of the root directory are insufficient.
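
Because a missed checkpoint is easy to overlook, a simple freshness check on the checkpoints directory can be useful. The following is a minimal sketch and is not part of the SDP: the function name `checkpoint_is_fresh`, the `*.ckp.*` file name pattern, and the 26-hour default window are assumptions for illustration. Pass your own checkpoints directory as the first argument.

```shell
#!/bin/bash
# Hypothetical helper, not an SDP script.
# Usage: checkpoint_is_fresh CHECKPOINTS_DIR [MAX_AGE_HOURS]
# Returns 0 if a checkpoint file newer than MAX_AGE_HOURS exists,
# 1 if none is found, and 2 if the directory does not exist.
checkpoint_is_fresh() {
    local dir="$1"
    local max_hours="${2:-26}"   # assumed default: one day plus some slack
    local recent

    [ -d "$dir" ] || return 2

    # Assumed naming pattern: checkpoint files contain ".ckp." in their name.
    recent=$(find "$dir" -maxdepth 1 -type f -name '*.ckp.*' \
        -mmin "-$((max_hours * 60))" | head -n 1)

    # Succeed only if at least one recent file was found.
    [ -n "$recent" ]
}
```

A check like `checkpoint_is_fresh /path/to/checkpoints || echo "No recent checkpoint found"` could then be run from cron some time after the daily checkpoint job.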
==== Backup of the partition containing depots, checkpoints, and the SDP configuration

There are three important parts to an SDP installation of Perforce: metadata, archive storage (back-end version file storage), and configuration. A standard SDP installation will have all three of these on the `/hxdepots` partition or equivalent. Whatever your server backup strategy is, ensure that you are taking regular snapshots of `/hxdepots`.

=== Notifications

The SDP contains the framework to allow your server to communicate its automated maintenance activities, both successes and failures. It is important to ensure that the SDP is properly configured to send emails to the right people, and that the right people are monitoring their emails.

==== Configuration

Setting up mailx, postfix, or mailutils will allow your server to send out emails to your administrative team. Details can be found in <>.

To tell the SDP whom to mail, you will need to set that in the file `/p4/common/config/p4_` on a per-instance basis. The relevant lines are:

`export MAILTO=P4AdminList@p4demo.com`

`export MAILFROM=P4Admin@p4demo.com`

The `MAILTO` value can be a distribution group like `administrators@company.net`, a single recipient like `bruno@company.net`, or a comma delimited list like `bruno@company.net,mary@company.net,pat@company.net`.

The `MAILFROM` value can be a valid email address, or a placeholder like `do-not-reply@company.net`.

==== Notifications to monitor

Your administrator should be aware of the emails that the SDP will be sending on a regular basis. Be careful to not simply redirect them into an unmonitored folder.

===== Daily Checkpoint

Probably the most important notification to follow, the daily checkpoint job lets you know that your metadata is backed up. Any error messages should be investigated.

===== Verify

By default, the SDP will run a verify on all your back-end versioned file storage on a weekly basis.
It is possible that errors or warnings will creep into an instance as time goes on. These should be investigated, but they are often not mission-critical.

===== Sync Replica

If you are in a Helix topology that contains replicas or edges, those machines will have their own automated jobs that synchronize checkpoints from the commit server, and keep the metadata in sync. To maintain a healthy topology, these emails should also be investigated if they contain errors.

=== Disk usage

Running out of disk is never fun. You should keep an eye on your disk usage, expanding when needed. A default SDP instance has the following configurables set:

`filesys.P4JOURNAL.min = 5G`

`filesys.P4ROOT.min = 5G`

`filesys.depot.min = 5G`

These settings will cause Perforce to halt when it discovers that free disk space is under 5G on the specified partition. This will spare you from corruption if Perforce tries to write to a database and isn't able to finish.

_However_, there are some edge cases where disk usage can still be disruptive. If your total partition size is 5G or lower, Perforce will halt automatically even if 5G was your intended partition size. Monitoring and expanding your storage space is an important part of maintenance.

== Installing the SDP on Unix / Linux

=== Manual Install

The following documentation covers internal details of how the SDP can be deployed manually.

To install Perforce Helix Core server and the SDP, perform the steps laid out below:

* Set up a user account, file system, and configuration scripts.
* Run the configuration script.
* Start the p4d process and configure the required file structure for the SDP.

[.arabic]
. If it doesn't already exist, create a group called `perforce`:

    sudo groupadd perforce

. Create a user called `perforce` and set the user's home directory to `/home/perforce` on a local disk. We recommend using a local rather than automounted home directory for the `perforce` OS user.
Using an automounted home directory introduces new failure modes for p4d, as well as potential performance issues. A local directory on the local storage is recommended for the home directory. (If the `/home` directory is always automounted, consider using something else, like `/usr/local/home/perforce`, in the example below):

    sudo useradd -d /home/perforce -s /bin/bash -m perforce -g perforce

. Allow the perforce user sudo access - Option 1 (full sudo)

    sudo touch /etc/sudoers.d/perforce
    sudo chmod 0600 /etc/sudoers.d/perforce
    echo "perforce ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/perforce > /dev/null
    sudo chmod 0400 /etc/sudoers.d/perforce

. Allow the perforce user sudo access - Option 2 (limited sudo)

    sudo touch /etc/sudoers.d/perforce
    sudo chmod 0600 /etc/sudoers.d/perforce
    sudo vi /etc/sudoers.d/perforce

. In the text editor, make the file look like this to give limited sudo, replacing `EDITME_HOSTNAME` with the hostname of the current machine:

    Cmnd_Alias P4_SVC = /usr/bin/systemctl start p4d_*, \
        /usr/bin/systemctl stop p4d_*, \
        /usr/bin/systemctl restart p4d_*, \
        /usr/bin/systemctl status p4d_*, \
        /usr/bin/systemctl cat p4d_*, \
        /usr/bin/systemctl start p4dtg_*, \
        /usr/bin/systemctl stop p4dtg_*, \
        /usr/bin/systemctl restart p4dtg_*, \
        /usr/bin/systemctl status p4dtg_*, \
        /usr/bin/systemctl cat p4dtg_*, \
        /usr/bin/systemctl start p4broker_*, \
        /usr/bin/systemctl stop p4broker_*, \
        /usr/bin/systemctl restart p4broker_*, \
        /usr/bin/systemctl status p4broker_*, \
        /usr/bin/systemctl cat p4broker_*, \
        /usr/bin/systemctl start p4p_*, \
        /usr/bin/systemctl stop p4p_*, \
        /usr/bin/systemctl restart p4p_*, \
        /usr/bin/systemctl status p4p_*, \
        /usr/bin/systemctl cat p4p_*, \
        /usr/bin/systemctl start p4prometheus*, \
        /usr/bin/systemctl stop p4prometheus*, \
        /usr/bin/systemctl restart p4prometheus*, \
        /usr/bin/systemctl status p4prometheus*, \
        /usr/bin/systemctl cat p4prometheus*, \
        /usr/bin/setcap, \
        /usr/bin/getcap
    perforce EDITME_HOSTNAME = (root) NOPASSWD: P4_SVC

. Then lock down the file:

    sudo chmod 0400 /etc/sudoers.d/perforce

. Create or mount the OS server file system volumes (per layout in previous section)
* `/hxdepots`
* `/hxlogs`
+
and either:
* `/hxmetadata`
+
or
* `/hxmetadata1`
* `/hxmetadata2`

. These directories should be owned by: `perforce:perforce`

    sudo chown -R perforce:perforce /hx*

. (Optional) If you have different root directories, or are putting all files into one mounted filesystem (only recommended for small repositories), then do something like the following:
+
Option 1, all under a single directory `/data`:

    cd /data
    sudo mkdir hxmetadata hxlogs hxdepots
    sudo chown -R perforce:perforce /data/hx*
    cd /
    sudo ln -s /data/hx* .
    sudo chown -h perforce:perforce /hx*
+
Option 2, different mounted root folders, e.g. `/P4metadata`, `/P4logs`, `/P4depots`:

    sudo chown -R perforce:perforce /P4metadata /P4logs /P4depots
    sudo ln -s /P4metadata /hxmetadata
    sudo ln -s /P4logs /hxlogs
    sudo ln -s /P4depots /hxdepots
    sudo chown -h perforce:perforce /hx*

. Extract the SDP tarball.

    cd /hxdepots
    tar -xzf /WhereYouDownloaded/sdp.Unix.tgz

. Set environment variable SDP.

    export SDP=/hxdepots/sdp

. Make the entire $SDP (`/hxdepots/sdp`) directory writable by `perforce:perforce` with this command:

    chmod -R +w $SDP

. Download the appropriate p4, p4d and p4broker binaries for your release and platform:

    cd /hxdepots/sdp/helix_binaries
    ./get_helix_binaries.sh
+
If you want to specify a particular release, use the `-r` option as in this example specifying the r20.2 release:

    cd /hxdepots/sdp/helix_binaries
    ./get_helix_binaries.sh -r r20.2

==== Manual Install Initial setup

The next steps highlight the setup and configuration of a new Helix Core instance using the `mkdirs.sh` script included in the SDP.

[source]
.Usage
----
include::gen/mkdirs.sh.man.txt[]
----

IMPORTANT: If you use a "name" for the instance (not an integer) you MUST modify the P4PORT variable in the `mkdirs._instance_.cfg` file.
NOTE: The instance name must map to the name of the cfg file or the default file will be used with potentially unexpected results.

Examples:

* `mkdirs.sh 1` requires `mkdirs.1.cfg`
* `mkdirs.sh ion` requires `mkdirs.ion.cfg`

[start=3]
. Put the Perforce license file for the p4d server instance into `/p4/1/root`

NOTE: If you have multiple instances and have been provided with port-specific licenses by Perforce, the appropriate license file must be stored in the appropriate `/p4//root` folder.

IMPORTANT: The license file must be renamed to simply the name `license`.

Your Helix Core instance is now set up, but not running. The next steps detail how to make the Helix Core p4d instance a system service.

You are then free to start up the `p4d` instance as documented in <<_starting_and_stopping_services>>.

If you have configured SSL, refer to <<_use_of_ssl>>.

===== Use of SSL

As documented in the comments in mkdirs.cfg, if you are planning to use SSL you need to set the value of:

    SSL_PREFIX=ssl:

Then you need to put certificates in `/p4/ssl` after the SDP install, or you can generate a self-signed certificate as follows:

Edit `/p4/ssl/config.txt` to put in the info for your company. Then run:

    /p4/common/bin/p4master_run /p4//bin/p4d_ -Gc

For example using instance 1:

    /p4/common/bin/p4master_run 1 /p4/1/bin/p4d_1 -Gc

In order to validate that SSL is working correctly:

    source /p4/common/bin/p4_vars 1

Check that P4TRUST is appropriately set in the output of:

    p4 set

Update the P4TRUST values:

    p4 trust -y
    p4 -p ssl:$HOSTNAME:1666 trust -y # Assuming correct port
    p4 -p $P4MASTERPORT trust -y

Check the stored P4TRUST values:

    p4 trust -l

You need to have entries for both the loopback address (`127.0.0.1`) and the IP address of the current machine.

Check you are not prompted for trust:

    p4 login
    p4 info

===== Configuration script mkdirs.cfg

The `mkdirs.sh` script executed above resides in `$SDP/Server/Unix/setup`.
It sets up the basic directory structure used by the SDP. Carefully review the config file `mkdirs.**_instance_**.cfg` for this script before running it, and adjust the values of the variables as required. The important parameters are: [cols=",",options="header",] |=== |Parameter |Description |DB1 |Name of the hxmetadata1 volume (can be same as DB2) |DB2 |Name of the hxmetadata2 volume (can be same as DB1) |DD |Name of the hxdepots volume |LG |Name of the hxlogs volume |CN |Volume for /p4/common |SDP |Path to SDP distribution file tree |SHAREDDATA |TRUE or FALSE - whether sharing the /hxdepots volume with a replica - normally this is FALSE |ADMINUSER |P4USER value of a Perforce super user that operates SDP scripts, typically `perforce`. |OSUSER |Operating system user that will run the Perforce instance, typically perforce. |OSGROUP |Operating system group that OSUSER belongs to, typically perforce. |CASE_SENSITIVE |Indicates if p4d server instance has special case sensitivity settings |SSL_PREFIX |Set if SSL is required so either "ssl:" or blank for no SSL |P4ADMINPASS a| Password to use for Perforce superuser account - can be edited later in /p4/common/config/.p4password.p4_1.admin |P4SERVICEPASS a| This value is not used by any SDP scripts or standard procedures. It is left in place for backward compatibility. |P4MASTERHOST |Fully qualified DNS name of the Perforce master server machine for this instance. should refer to the DNS of the edge server machine. Otherwise replicas should refer to the commit-server machine. 
|=== For a detailed description of this config file it is fully documented with in-file comments, or see ==== SDP Init Scripts The SDP includes templates for initialization scripts ("init scripts") that provide basic service `start`/`stop`/`status` functionality for a variety of Perforce server products, including: * p4d * p4broker * p4p * p4dtg During initialization for an SDP instance, the SDP `mkdirs.sh` script creates a set of initialization scripts based on the templates, and writes them in the instance-specific bin folder (the "Instance Bin" directory), `/p4/_N_/bin`. For example, the `/p4/1/bin` folder for instance `1` might contain any of the following: p4d_1_init p4broker_1_init p4p_1_init p4dtg_1_init The set of `*_init` files in the Instance Bin directory defines which services (p4d, p4broker, p4p, and/or p4dtg) are active for the given instance on the current machine. A common configuration is to run both p4d and p4broker together, or only run a p4p on a machine. Unused init scripts must be removed from the Instance Bin dir. For example, if a p4p is not needed for instance 1 on the current machine, then `/p4/1/bin/p4p_1_init` should be removed. For example, the init script for starting p4d for instance 1 is `/p4/1/bin/p4d_1_init`. All init scripts accept at least `start`, `stop`, and `status` arguments. How the init scripts are called depends on whether your operating system uses the systemd or older SysV init mechanism. This is detailed in sections specific to each init mechanism below. Templates for the init scripts are stored in: /p4/common/etc/init.d ===== Configuring systemd ====== Configuring systemd for p4d RHEL/CentOS 7 or 8, SuSE 12, Ubuntu (>= v16.04), Amazon Linux 2, and other Linux distributions utilize *systemd / systemctl* as the mechanism for controlling services, replacing the earlier SysV init process. Templates for systemd *.service files are included in the SDP distribution in `$SDP/Server/Unix/p4/common/etc/systemd/system`. 
Note that using `systemd` is strongly recommended on systems that support it, for safety reasons. However, enabling services to start automatically on boot is optional. To configure p4d for systemd, run these commands as the root user: I=1 Replace the `1` on the right side of the `=` with your SDP instance name, e.g. xyz if your P4ROOT is /p4/xyz/root. Then: cd /etc/systemd/system sed -e "s:__INSTANCE__:$I:g" -e "s:__OSUSER__:perforce:g" $SDP/Server/Unix/p4/common/etc/systemd/system/p4d_N.service.t > p4d_${I}.service chmod 644 p4d_${I}.service systemctl daemon-reload If you are configuring p4d for more than one instance, repeat the `I=` command with each instance name on the right side of the `=`, and then repeat the block of commands above. Once configured, the following are sample management commands to start, stop, and status the service. These following commands are typically run as the `perforce` OSUSER using `sudo` where needed: systemctl cat p4d_1 systemctl status p4d_1 sudo systemctl start p4d_1 sudo systemctl stop p4d_1 IMPORTANT: if running with SELinux in enforcing mode, see <<_enabling_systemd_under_selinux>> .Systemd Required if Configured **** If you are using `systemd` and you have configured services as above, then you can no longer run the `\*_init` scripts directly for normal service `start`/`stop`, though they can still be used for `status`. The `sudo systemctl` commands **must** be used for `start`/`stop`. Attempting to run the underlying scripts directly will result in an error message if systemd is configured. This is for safety: systemd's concept of service status (up or down) is only reliable when systemd starts and stops the service itself. The SDP init scripts require the systemd mechanism (using the `systemctl` command) to be used if it is configured. This ensures that services will gracefully stop the service on reboot (which would otherwise present a risk of data corruption for p4d on reboot). 
The SDP requires systemd to be used if it is configured, and we strongly recommend using systemd on systems that use it. We recommend this to eliminate the risk of corruption on reboot, and also for consistency of operations. However, the SDP does not require systemd to be configured. The SDP uses `systemctl cat` of the service name (e.g. `p4d_1`) to determine if systemd is configured for any given service.
****

====== Configuring systemd for p4p

Configuring p4p for systemd is identical to the configuration for p4d, except that you would replace `p4d` with `p4p` in the sample commands above for configuring p4d.

TIP: Note SELinux fix (<<_enabling_systemd_under_selinux>>) may be similarly required.

====== Configuring systemd for p4dtg

Configuring p4dtg for systemd is identical to the configuration for p4d, except that you would replace `p4d` with `p4dtg` in the sample commands above for configuring p4d.

TIP: Note SELinux fix (<<_enabling_systemd_under_selinux>>) may be similarly required.

====== Configuring systemd p4broker - multiple configs

Configuring p4broker for systemd can be similar to the configuration for p4d, but there are extra options as you may choose to run multiple broker configurations. For example, you may have:

* a default p4broker configuration that runs when the service is live,
* a "Down for Maintenance" (DFM) broker used in place of the default broker during maintenance to help lock out users while broadcasting a friendly message like "Perforce is offline for scheduled maintenance."
* an SSL broker config enabling an SSL-encrypted connection to a server that might not yet require SSL encryption for all users.

The service name for the default broker configuration is always `p4broker_N`, where `N` is the instance name, e.g. `p4broker_1` for instance `1`. This uses the default broker config file, `/p4/common/config/p4_1.broker.cfg`.
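
The naming convention can be expressed as a small shell helper. This is an illustrative sketch only: the function name `broker_names` is hypothetical and not part of the SDP. The optional second argument is the configuration tag (such as `dfm` or `ssl`) used for the alternate broker configurations described later in this section. Host-specific configs are not handled.

```shell
#!/bin/bash
# Hypothetical helper, not an SDP script.
# Usage: broker_names INSTANCE [TAG]
# Prints the systemd service name and the broker config file path,
# separated by a single space.
broker_names() {
    local instance="$1"
    local tag="${2:-}"

    if [ -z "$tag" ]; then
        # Default broker: p4broker_N and p4_N.broker.cfg
        echo "p4broker_${instance} /p4/common/config/p4_${instance}.broker.cfg"
    else
        # Alternate broker: the tag appears in both the service and config names
        echo "p4broker_${instance}_${tag} /p4/common/config/p4_${instance}.broker.${tag}.cfg"
    fi
}
```

For instance `1` with the `dfm` tag, `broker_names 1 dfm` prints `p4broker_1_dfm /p4/common/config/p4_1.broker.dfm.cfg`.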
.Host Specific Broker Config **** For circumstances where host-specific broker configuration is required, the default broker will use a `/p4/common/config/p4_N.broker..cfg` if it exists, where `` is whatever is returned by the command `hostname -s`. The logic in the broker init script will favor the host-specific config if found, otherwise it will use the standard broker config. **** When alternate broker configurations are used, each alternate configuration file must have a separate systemd unit file associated with managing that configuration. The service file must specify a configuration tag name, such as 'dfm' or 'ssl'. That tag name is used to identify both the broker config file and the systemd unit file for that broker. If the broker config is intended to run concurrently with the default broker config, it must listen on a different port number than the one specified in the default broker config. If it is only intended to run in place of the standard config, as with a 'dfm' config, then it should listen on the same port number as the default broker if a default broker is used, or else the same port as the p4d server if brokers are used only for dfm. The systemd service for a broker intended to run only during maintenance should not be enabled, and thus only manually started/stopped as part of maintenance procedures. TIP: If maintenance procedures involve a reboot of a server machine, you may also want to disable all services during maintenance and re-enable them afterward. For example, say you want a default broker, a DFM broker, and an SSL broker for instance 1. The default and SSL brokers will run continuously, and the DFM broker only during scheduled maintenance. 
The following broker config files would be needed in `/p4/common/config`: * `p4_1.broker.cfg` - default broker, targets p4d on port 1999, listens on port 1666 * `p4_1.broker.ssl.cfg` - SSL broker, targets p4d on port 1999, listens on port 1667 * `p4_1.broker.dfm.cfg` - DFM broker, targets p4d on port 1999 , listens on port 1666. Then, create a systemd *.service file that references each config. For the default broker, use the template just as with p4d above. Do the following as the `root` user: I=1 Replace the `1` on the right side of the `=` with your SDP instance name, e.g. xyz if your P4ROOT is /p4/xyz/root. Then: cd /etc/systemd/system sed -e "s:__INSTANCE__:$I:g" -e "s:__OSUSER__:perforce:g" $SDP/Server/Unix/p4/common/etc/systemd/system/p4broker_N.service.t > p4broker_$I.service chmod 644 p4broker_$I.service systemctl daemon-reload Once configured, the following are sample management commands to start, stop, and status the service. These following commands are typically run as the `perforce` OSUSER using `sudo` where needed: systemctl cat p4broker_1 systemctl status p4broker_1 sudo systemctl start p4broker_1 sudo systemctl stop p4broker_1 For the non-default broker configs for the SSL and DFM brokers, start by copying the default broker config to a new *.service file with `_ssl` or `_dfm` inserted into the name, like so: cd /etc/systemd/system cp p4broker_1.service p4broker_1_dfm.service cp p4broker_1.service p4broker_1_ssl.service Next, modify the p4broker_1_dfm.service file and p4broker_1_ssl.service files with a text editor, making the following edits: * Find the string that says `using default broker config`, and change the word `default` to `dfm` or `ssl` as appropriate, so it reads something like `using dfm broker config`. * Change the ExecStart and ExecStop definitions by appending the `dfm` or `ssl` tag. 
For example, change these two lines:

    ExecStart=/p4/1/bin/p4broker_1_init start
    ExecStop=/p4/1/bin/p4broker_1_init stop

to look like this for the `dfm` broker:

    ExecStart=/p4/1/bin/p4broker_1_init start dfm
    ExecStop=/p4/1/bin/p4broker_1_init stop dfm

After any modifications to systemd *.service files are made, reload them into systemd with:

    systemctl daemon-reload

At this point, the services `p4broker_1`, `p4broker_1_dfm`, and `p4broker_1_ssl` can be started and stopped normally.

Finally, enable those services you want to start on boot. In our example here, we will enable the default and ssl broker services to start on boot, but not the DFM broker:

    systemctl enable p4broker_1
    systemctl enable p4broker_1_ssl

You must be aware of which configurations listen on the same port, and not try to run those configurations concurrently. In this case, ensure the default and dfm brokers don't run at the same time. So, for example, you might start a maintenance window with:

    sudo systemctl stop p4broker_1 p4d_1
    sudo systemctl start p4broker_1_dfm

and end maintenance in the opposite order:

    sudo systemctl stop p4broker_1_dfm
    sudo systemctl start p4broker_1 p4d_1

Details may vary depending on what is occurring during maintenance.

TIP: The SELinux fix (<<_enabling_systemd_under_selinux>>) may be similarly required.

===== Enabling systemd under SELinux

If you have `SELinux` in `Enforcing` mode, then you may get an error message when you try to start the service:

```
$ systemctl start p4d_1
$ systemctl status p4d_1
:
   Active: failed
  Process: 1234 ExecStart=/p4/1/bin/p4d_1_init start (code=exited, status=203/EXEC)
:
$ journalctl -u p4d_1 --no-pager | tail
:
... p4d_1.service: Failed to execute command: Permission denied
... p4d_1.service: Failed at step EXEC spawning p4d_1_init: Permission denied
```

This can be easily fixed (as `root`):

    semanage fcontext -a -t bin_t /p4/1/bin/p4d_1_init
    restorecon -vF /p4/1/bin/p4d_1_init

TIP: If not already installed, then `yum install policycoreutils-python-utils` gets you the basic commands mentioned above - you don't need the full `setools` which comes with a GUI!

Then try again:

    systemctl start p4d_1
    systemctl status p4d_1

The status command should show `Active: active`.

For troubleshooting SELinux, we recommend link:https://www.serverlab.ca/tutorials/linux/administration-linux/troubleshooting-selinux-centos-red-hat/[the setroubleshoot utility].

TIP: Look for `denied` in `/var/log/audit/audit.log` and then `ls -alZ <file>` for any file that triggered the denied message and go from there.

===== Configuring SysV Init Scripts

To configure services for an instance on systems using the SysV init mechanism, run these commands as the `root` user. Repeat this step for all instance init scripts you wish to configure as system services.

    cd /etc/init.d
    ln -s /p4/1/bin/p4d_1_init
    chkconfig --add p4d_1_init

With that done, you can `start`/`stop`/`status` the service as `root` by running commands like:

    service p4d_1_init status
    service p4d_1_init start
    service p4d_1_init stop

On SysV systems, you can also run the underlying init scripts directly as either the `root` or `perforce` user. If run as `root`, the script becomes `perforce` immediately, so that no processing occurs as root.

==== Configuring Automatic Service Start on Boot

You may want to configure your server machine such that the Helix Core Server for any given instance (and/or Proxy and/or Broker) will start automatically when the machine boots.

This is done using systemd or init scripts as covered below.
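
If you are unsure which of the two mechanisms a given machine uses, a quick check can help before choosing a procedure. The following is a minimal sketch and not part of the SDP: the function name `detect_init_system` and its optional root-directory argument are hypothetical, and it assumes that the presence of the `/run/systemd/system` directory indicates a host booted with systemd.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SDP): report which init mechanism is in use.
# Assumption: /run/systemd/system exists only on hosts booted with systemd.
# Optional argument: an alternate filesystem root, useful only for testing.
detect_init_system() {
    local root="${1:-}"
    if [ -d "${root}/run/systemd/system" ]; then
        echo "systemd"
    elif [ -d "${root}/etc/init.d" ]; then
        echo "sysv"
    else
        echo "unknown"
    fi
}

# Report the mechanism for the current host.
detect_init_system
```

If the result is `unknown`, check with your system administrator rather than guessing which of the sections that follow applies.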
===== Automatic Start for Systems using systemd

Once systemd services are configured, you can enable the service to start on boot with a command like this, run as `root`:

    systemctl enable p4d_1

The `enable` command configures the services to start automatically when the machine reboots, but does not immediately start the service. _Enabling services is optional_; you can start and stop the services manually regardless of whether they are enabled for automatic start on boot.

===== For systems using the SysV init mechanism

Once SysV services are configured, you can enable the service to start on boot with a command like this, run as `root`:

    chkconfig p4d_1_init on

==== SDP Crontab Templates

The SDP includes basic crontab templates for master, replica, and edge servers in:

    /p4/common/etc/cron.d

These define schedules for routine checkpoint operations, replica status checks, and email reviews.

==== Completing Your Server Configuration

. Ensure that the admin user configured above has the correct password defined in `/p4/common/config/.p4passwd.p4_1.admin`, and then run the `p4login1` script (which calls the `p4 login` command using the `.p4passwd.p4_1.admin` file).
. For new server instances, run this script, which sets several recommended configurables:

    cd /p4/sdp/Server/setup
    ./configure_new_server.sh 1

For existing servers, examine this file, and manually apply the `p4 configure` command to set configurables on your Perforce server instance.

Initialize the perforce user's crontab with a command like this:

    crontab /p4/p4.crontab

and customize execution times for the commands within the crontab files to suit the specific installation.

The SDP uses wrapper scripts in the crontab: `run_if_master.sh`, `run_if_edge.sh`, `run_if_replica.sh`. We suggest you ensure these are working as desired, e.g.
/p4/common/bin/run_if_master.sh 1 echo yes /p4/common/bin/run_if_replica.sh 1 echo yes /p4/common/bin/run_if_edge.sh 1 echo yes The above should output `yes` if you are on the master (commit) machine (or replica/edge as appropriate), but otherwise nothing. Any issues with the above indicate incorrect values for `$MASTER_ID`, or for other values within `/p4/common/config/p4_1.vars` (assuming instance `1`). You can debug this with: bash -xv /p4/common/bin/run_if_master.sh 1 echo yes If in doubt contact support. ==== Validating your SDP installation Source your SDP environment variables and check that they look appropriate - for `1`: source /p4/common/bin/p4_vars 1 The output of `p4 set` should be something like: P4CONFIG=/p4/1/.p4config (config 'noconfig') P4ENVIRO=/dev/null/.p4enviro P4JOURNAL=/p4/1/logs/journal P4LOG=/p4/1/logs/log P4PCACHE=/p4/1/cache P4PORT=ssl:1666 P4ROOT=/p4/1/root P4SSLDIR=/p4/ssl P4TICKETS=/p4/1/.p4tickets P4TRUST=/p4/1/.p4trust P4USER=perforce There is a script `/p4/common/bin/verify_sdp.sh`. Run this specifying the id, e.g. /p4/common/bin/verify_sdp.sh 1 The output should be something like: verify_sdp.sh v5.6.1 Starting SDP verification on host helixcorevm1 at Fri 2020-08-14 17:02:45 UTC with this command line: /p4/common/bin/verify_sdp.sh 1 If you have any questions about the output from this script, contact support-helix-core@perforce.com. ------------------------------------------------------------------------------ Doing preflight sanity checks. Preflight Check: Ensuring these utils are in PATH: date ls grep awk id head tail Verified: Essential tools are in the PATH. Preflight Check: cd /p4/common/bin Verified: cd works to: /p4/common/bin Preflight Check: Checking current user owns /p4/common/bin Verified: Current user [perforce] owns /p4/common/bin Preflight Check: Checking /p4 and /p4/ are local dirs. Verified: P4HOME has expected value: /p4/1 Verified: This P4HOME path is not a symlink: /p4/1 Verified: cd to /p4 OK. 
Verified: Dir /p4 is a local dir. Verified: cd to /p4/1 OK. Verified: P4HOME dir /p4/1 is a local dir. Finishing with: Verifications completed, with 0 errors and 0 warnings detected in 57 checks. If it mentions something like: Verifications completed, with 2 errors and 1 warnings detected in 57 checks. then review the details. If in doubt contact Perforce Support: support-helix-core@perforce.com === Local SDP Configuration There are many scenarios where you may need to override a default value that the SDP provides. These changes must be done in specific locations so that your changes persist across SDP upgrades. There are two different scopes of configuration to be aware of and two locations you can place your configuration in: [options="header"] |====================================================================================== | Location | Scope | Description | /p4/common/site/config/$P4SERVER.vars.local | SDP Instance Specific | Single configuration file that is scoped to a single SDP Instance | /p4/common/site/config/$P4SERVER.vars.local.d/* | SDP Instance Specific | Directory of configuration files that are scoped to a single SDP Instance | /p4/common/site/config/p4_vars.local | SDP Wide | Single configuration file that is scoped to all SDP Instances | /p4/common/site/config/p4_vars.local.d/* | SDP Wide | Directory of configuration files that are scoped to all SDP Instances |====================================================================================== ==== Load Order [arabic] . `/p4/common/bin/p4_vars` . `/p4/common/site/config/p4_vars.local` . `/p4/common/site/config/p4_vars.local.d/*` . `/p4/common/config/$P4SERVER.vars` . 
`/p4/common/site/config/$P4SERVER.vars.local.d/*`

=== Setting your login environment for convenience

Consider adding this to your `.bashrc` for the perforce user as a convenience for when you log in:

    echo "source /p4/common/bin/p4_vars 1" >> ~/.bashrc

If you have multiple instances on the same machine, you might want to set up an alias or two to quickly switch between them.

=== Configuring protections, file types, monitoring and security

After the server instance is installed and configured, either with the Helix Installer or a manual installation, most sites will want to modify server permissions ("Protections") and security settings. Other common configuration steps include modifying the file type map and enabling process monitoring. To configure permissions, perform the following steps:

[arabic]
. To set up protections, issue the `p4 protect` command. The protections table is displayed.
. Delete the following line:

    write user * * //depot/...

. Define protections for your repository using groups. Perforce uses an inclusionary model. No access is given by default; you must specifically grant access to users/groups in the protections table. It is best for performance to grant users specific access to the areas of the depot that they need, rather than granting everyone open access and then trying to remove access via exclusionary mappings in the protect table, even if that means you end up generating a larger protect table.
. To set the default file types, run the `p4 typemap` command and define typemap entries to override Perforce's default behavior.
. Add any file type entries that are specific to your site. Suggestions:
* For already-compressed file types (such as `.zip`, `.gz`, `.avi`, `.gif`), assign a file type of `binary+Fl` to prevent p4d from attempting to compress them again before storing them.
* For regular binary files, add `binary+l` to make it so that only one person at a time can check them out.
+
A sample file is provided in `$SDP/Server/config/typemap`.

If you are doing things like games development with `Unreal Engine` or `Unity`, then there are specific recommended typemap entries in KB articles: https://portal.perforce.com/s/[Search the Knowledge Base]

. To make your changelists default to restricted (for high security environments):

    p4 configure set defaultChangeType=restricted

=== Operating system configuration

Check <<_maximizing_server_performance>> for detailed recommendations.

==== Configuring email for notifications

Use Postfix, which integrates easily with Gmail, Office365, etc. Just search for postfix and the email provider. Examples:

* https://www.howtoforge.com/tutorial/configure-postfix-to-use-gmail-as-a-mail-relay/
* https://support.google.com/accounts/answer/185833?hl=en#zippy=%2Cwhy-you-may-need-an-app-password
* https://www.middlewareinventory.com/blog/postfix-relay-office-365/#3_Office_365_SMTP_relay_Discussed_in_this_Post

Please note that for Gmail:

* You must turn on 2FA for the account which is trying to create an app password.
* The organization must allow 2FA (2-Step Verification) - this is normally turned off in Google Workspace (formerly known as G Suite).

Testing of email once configured:

    echo "Test email" | mail -s "Test email subject" user@example.com

If there are problems sending email, then this may find the problem:

    grep postfix /var/log/*
    cat /var/log/maillog

==== Swarm Email Configuration

The advantage of installing Postfix is that it is easily testable from the command line as above.
The Swarm configuration then becomes editing `config.php` as below (optional sender address) and restarting Swarm in the normal way (resetting its cache first): [source,php] ---- // this block should be a peer of 'p4' 'mail' => array( // 'sender' => 'swarm@my.domain', // defaults to 'notifications@hostname' 'transport' => array( 'name' => 'localhost', // name of SMTP host 'host' => 'localhost', // host/IP of SMTP host ), ), ), ---- Restarting Swarm (on CentOS): cd /opt/perforce/swarm/data rm cache/*cache.php systemctl restart httpd ==== Configuring PagerDuty for notifications The default behavior of the SDP is to use email for delivering alerts and log files. This section details replacing email with https://www.pagerduty.com/[PagerDuty]. ===== Prerequisites * https://www.pagerduty.com/[PagerDuty Account] * https://support.pagerduty.com/docs/service-directory[PagerDuty Service] where SDP/Helix Core incidents will be created * Events API V2 Integration added to PagerDuty Service, this will produce an Integration Key which will be used later * https://github.com/martindstone/pagerduty-cli/wiki/PagerDuty-CLI-User-Guide#installation-and-getting-started[Install PagerDuty CLI] ===== SDP Configuration The following can be added to `/p4/common/site/config/p4_vars.local` to configure the SDP to use PagerDuty: # set this environment variable to the Integration Key that was created when adding the # Events API V2 Integration to your PagerDuty Service export PAGERDUTY_ROUTING_KEY="2ac2....e5c3" ===== Optional variables The SDP will automatically set the Title of the PagerDuty Incident based on the exception that occurred. The SDP will also include the log file from the exception (example: checkpoint log, p4verify log, etc). If you have multiple Helix Core servers it will be helpful to include some additional context with the incident so you know which server the alert is coming from. 
The following environment variable can optionally be used to add additional context to the PagerDuty Incident:

    # export PAGERDUTY_CUSTOM_FIELD=""

====== Example Additional Context Configuration

The following snippet will create environment variables in `p4_vars.local` that will provide additional context in each PagerDuty Incident:

    curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" > /tmp/azure_metadata
    cat <<-EOF >> /p4/common/site/config/p4_vars.local
    export PAGERDUTY_ROUTING_KEY="2ac2....e5c3"
    export VM_ID="$(jq -r '.compute.vmId' /tmp/azure_metadata)"
    export REGION="$(jq -r '.compute.location' /tmp/azure_metadata)"
    export AZURE_SUBSCRIPTION_ID="$(jq -r '.compute.subscriptionId' /tmp/azure_metadata)"
    export PAGERDUTY_CUSTOM_FIELD=\$(cat <<-END
    #############################################
    Azure Subscription: \$AZURE_SUBSCRIPTION_ID
    Region: \$REGION
    Azure VM ID: \$VM_ID
    #############################################
    END
    )
    EOF

The following context will be added as a field on the PagerDuty Incident:

    #############################################
    Azure Subscription: f306878d-d321-4731-4cd3-f3afafbbd3ac
    Region: eastus
    Azure VM ID: 5ee13bfe-8a0c-486f-ae08-c43e44255d15
    #############################################

==== Configuring AWS Simple Notification Service (SNS) for notifications

The default behavior of the SDP is to use email for delivering alerts and log files. This section details replacing email with AWS SNS.
===== Prerequisites * AWS CLI installed * Authorization for `publish` to a AWS SNS topic ===== SDP Configuration The following can be added to `/p4/common/config/p4_1.vars` to configure the SDP to use SNS: # SNS Alert Configurations # Two methods of authentication are supported: key pair (on prem, azure, etc) and IAM role (AWS deployment) # In the case of IAM role the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables must not be set, not even empty strings # To test SNS delivery use the following command: aws sns publish --topic-arn $SNS_ALERT_TOPIC_ARN --subject test --message "this is a test" # export AWS_ACCESS_KEY_ID="" # export AWS_SECRET_ACCESS_KEY="" export AWS_DEFAULT_REGION="us-east-1" export SNS_ALERT_TOPIC_ARN="arn:aws:sns:us-east-1:541621974560:Perforce-Notifications-SnsTopic-1FIRH0KEAXTU" ===== Example IAM Policy The following is an example policy that could be used for either an IAM Role or an IAM user with key/secret: { "Version": "2012-10-17", "Statement": [ { "Action": "sns:Publish", "Resource": "arn:aws:sns:us-east-1:541621974560:Perforce-Notifications-*", "Effect": "Allow" } ] } === Other server configurables There are various configurables that you should consider setting for your server instance. Some suggestions are in the file: `$SDP/Server/setup/configure_new_server.sh` Review the contents and either apply individual settings manually, or edit the file and apply the newly edited version. If you have any questions, please see the https://www.perforce.com/manuals/cmdref/Content/CmdRef/configurables.configurables.html[configurables section in Command Reference Guide appendix] (get the right version for your server!). You can also contact support regarding questions. 
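
When reviewing the suggested configurables, it can be convenient to see them as a plain list before deciding which to apply. The following is a minimal sketch and not part of the SDP: the function name `list_configure_settings` is hypothetical, and it assumes the setup script applies its settings with lines containing `p4 configure set name=value`. The default path `/p4/sdp` is taken from the examples elsewhere in this guide.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SDP): list the name=value pairs that a
# setup script would apply, assuming they appear on lines containing
# "configure set name=value".
list_configure_settings() {
    local script="$1"
    # Skip comment lines, then print only the token that follows "configure set".
    grep -v '^[[:space:]]*#' "$script" |
        sed -n 's/.*configure set[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*/\1/p'
}

# Example usage; falls back to /p4/sdp if $SDP is not set.
script="${SDP:-/p4/sdp}/Server/setup/configure_new_server.sh"
if [ -r "$script" ]; then
    list_configure_settings "$script"
fi
```

The resulting list can be compared by eye with the output of `p4 configure show` on an existing server. Values that reference shell variables are printed unexpanded.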
=== Archiving configuration files

Now that the server instance is running properly, copy the following configuration files to the hxdepots volume for backup:

* Any init scripts used in `/etc/init.d` or any systemd unit files in `/etc/systemd/system`.
* A copy of the crontab file, obtained using `crontab -l`.
* Any other relevant configuration scripts, such as cluster configuration scripts, failover scripts, or disk failover configuration files.

=== Installing Swarm Triggers

On the commit server (*NOT* the Swarm machine), set it up to connect to the Perforce package repo (if not already done). See: https://www.perforce.com/perforce-packages

Install the trigger package, e.g.:

* `yum install helix-swarm-triggers` (if Red Hat family, i.e. RHEL, Rocky Linux, CentOS, Amazon Linux).
* `apt install helix-swarm-triggers` (for Ubuntu)

Then (for SDP environments for ease):

    sudo chown -R perforce:perforce /opt/perforce/etc

Then install the triggers on the p4d server. Something like:

    vi /opt/perforce/etc/swarm-triggers.conf

Make it look something like (in SDP env):

    SWARM_HOST='https://swarm.p4.p4bsw.com'
    SWARM_TOKEN='MY-UUID-STYLE-TOKEN'
    ADMIN_USER='swarm'
    ADMIN_TICKET_FILE='/p4/1/.p4tickets'
    P4_PORT='ssl:1666'
    P4='/p4/1/bin/p4_1'
    EXEMPT_FILE_COUNT=0
    EXEMPT_EXTENSIONS=''
    VERIFY_SSL=1
    TIMEOUT=30
    IGNORE_TIMEOUT=1
    IGNORE_NOSERVER=1

Then test that config file:

    chmod +x /p4/sdp/Unsupported/setup/swarm_triggers_test.sh
    /p4/sdp/Unsupported/setup/swarm_triggers_test.sh

Get that to be happy. This may require iteration of the conf file, trigger install, etc.

Then install triggers on the server.

```
cd /p4/1/tmp
p4 triggers -o > temp_file.txt
/opt/perforce/swarm-triggers/bin/swarm-trigger.pl -o >> temp_file.txt
vi temp_file.txt   # Clean up formatting, make it syntactically correct.
p4 triggers -i < temp_file.txt
p4 triggers -o   # Make sure it's there.
```

Then test!
```
mkdir /p4/1/tmp/swarm_test
cd /p4/1/tmp/swarm_test
export P4CONFIG=.p4config
echo P4CLIENT=swarm_test.$(hostname -s)>>.p4config
# Make a workspace, map View to some location where we can edit harmlessly,
# or use a stream like //sandbox/main
p4 client
p4 add chg.txt
# The important thing is '#review' which trigger will process
p4 change -o | sed 's:<enter description here>:#review:' > chg.txt
p4 change -i < chg.txt
p4 shelve -c CL   # Use CL listed in output from prior command
p4 describe -s CL # If #review gets replaced by something like #review-12345, you're done!
```

== Backup, Replication, and Recovery

Perforce server instances maintain _metadata_ and _versioned files_. The metadata contains all the information about the files in the depots. Metadata resides in database (db.*) files in the server instance's root directory (P4ROOT). The versioned files contain the file changes that have been submitted to the repository. Versioned files reside on the hxdepots volume.

This section assumes that you understand the basics of Perforce backup and recovery. For more information, consult the Perforce https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/chapter.backup.html[System Administrator's Guide] and https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/failover.html#Failover[failover].

=== Typical Backup Procedure

The SDP's maintenance scripts, run as `cron` tasks, periodically back up the metadata. The weekly sequence is described below.

*Seven nights a week, perform the following tasks:*

[arabic]
. Truncate the active journal.
. Replay the journal to the offline database. (Refer to Figure 2: SDP Runtime Structure and Volume Layout for more information on the location of the live and offline databases.)
. Create a checkpoint from the offline database.
. Recreate the offline database from the last checkpoint.

*Once a week, perform the following tasks:*

[arabic]
. Verify all depot files.
*Once every few months, perform the following tasks:* [arabic] . Stop the live server instance. . Truncate the active journal. . Replay the journal to the offline database. (Refer to Figure 2: SDP Runtime Structure and Volume Layout for more information on the location of the live and offline databases.) . Archive the live database. . Move the offline database to the live database directory. . Start the live server instance. . Create a new checkpoint from the archive of the live database. . Recreate the offline database from the last checkpoint. . Verify all depots. This normal maintenance procedure puts the checkpoints (metadata snapshots) on the hxdepots volume, which contains the versioned files. Backing up the hxdepots volume with a normal backup utility like _rsync_ preserves the critical assets necessary for recovery. To ensure that the backup does not interfere with the metadata backups (checkpoints), coordinate backup of the hxdepots volume using the SDP maintenance scripts. The preceding maintenance procedure minimizes service outage, because checkpoints are created from offline or saved databases while the live p4d server process is running on the live databases in P4ROOT. NOTE: With no additional configuration, the normal maintenance prevents loss of more than one day's metadata changes. To provide an optimal http://en.wikipedia.org/wiki/Recovery_point_objective[Recovery Point Objective] (RPO), the SDP provides additional tools for replication. === Planning for HA and DR // tag::HA_and_DR[] The concepts for HA (High Availability) and DR (Disaster Recovery) are fairly similar - they are both types of Helix Core replica. When you have server specs with `Services` field set to `commit-server`, `standard`, or `edge-server` - see https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/deployment-architecture.html[deployment architectures] you should consider your requirements for how to recover from a failure to any such servers. 
See also https://portal.perforce.com/s/article/5434[Replica types and use cases]

The key issues are around ensuring that you have appropriate values for the following measures for your Helix Core installation:

* RTO - Recovery Time Objective - how long will it take you to recover to a backup?
* RPO - Recovery Point Objective - how much data are you prepared to risk losing if you have to failover to a backup server?

We need to consider planned vs unplanned failover. Planned may be due to upgrading the core Operating System or some other dependency in your infrastructure, or a similar activity.

Unplanned covers risks you are seeking to mitigate with failover:

* loss of a machine, or some machine related hardware failure (e.g. network)
* loss of a VM cluster
* failure of storage
* loss of a data center or machine room
* etc...

So, if your main `commit-server` fails, how fast should you be able to be up and running again, and how much data might you be prepared to lose? What is the potential disruption to your organization if the Helix Core repository is down? How many people would be impacted in some way?

You also need to consider the costs of your mitigation strategies. For example, this can range from:

* taking a backup once per 24 hours and requiring maybe an hour or two to restore it. Thus you might lose up to 24 hours of work for an unplanned failure, and require several hours to restore.
* having a high availability replica which is a mirror of the server hardware and ready to take over within minutes if required

Having a replica for HA or DR is likely to reduce your RPO and RTO to well under an hour (<10 minutes if properly prepared for) - at the cost of the resources to run such a replica, and the management overhead to monitor it appropriately.

Typically we would define:

* An HA replica is close to its upstream server, e.g.
in the same Data Center - this minimizes the latency for replication, and reduces RPO * A DR replica is in a more remote location, so maybe risks being further behind in replication (thus higher RPO), but mitigates against catastrophic loss of a data center or similar. Note that "further behind" is still typically seconds for metadata, but can be minutes for submits with many GB of files. ==== Further Resources * https://portal.perforce.com/s/article/3166[High Reliability Solutions] ==== Creating a Failover Replica for Commit or Edge Server A commit server instance is the ultimate store for submitted data, and also for any workspace state (WIP - work in progress) for users directly working with the commit server (part of the same "data set") An edge server instance maintains its own copy of workspace state (WIP). If you have people connecting to an edge server, then any workspaces they create (and files they open for some action) will be only stored on the edge server. Thus it is normally recommended to have an HA backup server, so that users don't lose their state in case of failover. There is a concept of a "build edge" which is an edge server which only supports build farm users. In this scenario it may be deemed acceptable to not have an HA backup server, since in the case of failure of the edge, it can be re-seeded from the commit server. All build farm clients would be recreated from scratch so there would be no problems. ==== What is a Failover Replica? A Failover is the hand off of the role of a master/primary/commit server from a primary server machine to a standby replica (typically on a different server machine). As part of failover processing the secondary/backup is promoted to become the new master/primary/commit server. As of 2018.2 release, p4d supports a `p4 failover` command that performs a failover to a `standby` replica (i.e. a replica with `Services:` field value set to `standby` or `forwarding-standby`). 
Such a replica performs a `journalcopy` replication of metadata, with a local pull thread to update its `db.*` files. After the failover is complete, traffic must be redirected to the server machine where newly promoted standby server operates, e.g. with a DNS change (possibly automated with a post-failover trigger). See also: https://portal.perforce.com/s/article/16462[Configuring a Helix Core Standby]. ifdef::unix_doc[] On Linux the SDP script `mkrep.sh` greatly simplifies the process of setting up a replica suitable for use with the `p4 failover` command. See: <<_using_mkrep_sh>>. endif::[] ==== Mandatory vs Non-mandatory Standbys You can modify the `Options:` field of the server spec of a `standby` or `forwarding-standby` replica to make it `mandatory`. This setting affects the mechanics of how failover works. When a `standby` server instance is configured as mandatory, the master/commit server will wait until this server confirms it has processed journal data before allowing that journal data to be released to other replicas. This can simplify failover if the master server is unavailable to participate in the failover, since it provides a guarantee that no downstream servers are *ahead* of the replica. This guarantee is important, as it ensures downstream servers can simply be re-directed to point to the standby after the master server has failed over to its standby, and will carry on working without problems or need for human intervention on the servers. Failovers in which the master does not participate are generally referred to as _unscheduled_ or _reactive_, and are generally done in response to an unexpected situation. Failovers in which the master server is alive and well at the start of processing, and in which the master server participates in the failover, are referred to as _scheduled_ or _planned_. IMPORTANT: If a server which is marked as `mandatory` goes offline for any reason, the replication to other replicas will stop replicating. 
In this scenario, the server spec of the replica can be changed to `nomandatory`, and then replication will immediately resume, so long as the replication has not been offline for so long that the master server has removed numbered journals that would be needed to catch up (typically several days or weeks depending on the KEEPJNLS setting). If this happens, the p4d server logs of all impacted servers will clearly indicate the root cause, so long as p4d versions are 2019.2 or later.

If set to `nomandatory` then there is no risk of delaying downstream replicas; however, there is no guarantee that they will be able to switch seamlessly over to the new server in the event of an unscheduled failover.

TIP: We recommend creating `mandatory` standby replica(s) if the server is local to its commit server. We also recommend having active monitoring in place to quickly detect replication lag or other issues.

To change a server spec to be `mandatory` or `nomandatory`, modify the server spec with a command like `p4 server p4d_ha_bos` to edit the form, change the value in the `Options:` field to be as desired, `mandatory` or `nomandatory`, and then save and exit the editor.

==== Server host naming conventions

This is recommended, but not a requirement for SDP scripts to implement failover.

* Use a name that does not indicate switchable roles, e.g. don't indicate in the name whether a host is a master/primary or backup, or edge server and its backup. This might otherwise lead to confusion once you have performed a failover and the host name is no longer appropriate.
* Use names ending in numeric designators, e.g. -01 or -05. The goal is to avoid being in a post-failover situation where a machine with `master` or `primary` in its name is actually the backup. Also, the assumption is that host names will never need to change.
* While you don't want switchable roles baked into the hostname, you can have static roles, e.g. use p4d vs. p4p in the host name (as those generally don't change).
The p4d could be primary, standby, edge, edge's standby (switchable roles).
* Using a short geographic site tag is sometimes helpful/desirable. If used, use the same site tag used in the ServerID, e.g. aus.
ifdef::unix_doc[]
+
Valid site tags should be listed in: `/p4/common/config/SiteTags.cfg` - see <<_sitetags_cfg>>
endif::[]
* Using a short tag to indicate the major OS version is *sometimes* helpful/desirable, e.g. c7 for CentOS 7, or r8 for RHEL 8. This is based on the idea that when the major OS is upgraded, you either move to new hardware, or change the host name (an exception to the rule above about never changing the hostname). This option may be overkill for many sites.
* End users should reference a DNS name that may include the site tag, but would exclude the number, OS indicator, and server type (`p4d`/`p4p`/`p4broker`), replacing all that with just `perforce` or optionally just `p4`. The general idea is that users needn't be bothered by under-the-covers tech of whether something is a proxy or replica.
* For edge servers, it is advisable to include `edge` in both the host and DNS name, as users and admins need to be aware of the functional differences due to a server being an edge server.

Examples:

* `p4d-aus-r7-03`, a master in Austin on RHEL 7, pointed to by a DNS name like `p4-aus`.
* `p4d-aus-03`, a master in Austin (no indication of server OS), pointed to by a DNS name like `p4-aus`.
* `p4d-aus-r7-04`, a standby replica in Austin on RHEL 7, not pointed to by a DNS until failover, at which point it gets pointed to by `p4-aus`.
* `p4p-syd-r8-05`, a proxy in Sydney on RHEL 8, pointed to by a DNS name like `p4-syd`.
* `p4d-syd-r8-04`, a replica that replaced the proxy in Sydney, on RHEL 8, pointed to by a DNS name like `p4-syd` (same as the proxy it replaced).
* `p4d-edge-tok-s12-03`, an edge in Tokyo running SuSE12, pointed to by a DNS name like `p4edge-tok`.
* `p4d-edge-tok-s12-04`, a replica of an edge in Tokyo running SuSE12, not pointed to by a DNS name until failover, at which point it gets pointed to by `p4edge-tok`. FQDNs (fully qualified DNS names) of short DNS names used in these examples would also exist, and would be based on the same short names. // end::HA_and_DR[] === Full One-Way Replication Perforce supports a full one-way https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/replication.html[replication] of data from a master server to a replica, including versioned files. The https://www.perforce.com/manuals/cmdref/Content/CmdRef/p4_pull.html#p4_pull[p4 pull] command is the replication mechanism, and a replica server can be configured to know it is a replica and use the replication command. The p4 pull mechanism requires very little configuration and no additional scripting. As this replication mechanism is simple and effective, we recommend it as the preferred replication technique. Replica servers can also be configured to only contain metadata, which can be useful for reporting or offline checkpointing purposes. See the Distributing Perforce Guide for details on setting up replica servers. If you wish to use the replica as a read-only server, you can use the https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/chapter.broker.html[P4Broker] to direct read-only commands to the replica or you can use a forwarding replica. The broker can do load balancing to a pool of replicas if you need more than one replica to handle your load. ==== Replication Setup To configure a replica server, first configure a machine identically to the master server (at least as regards the link structure such as `/p4`, `/p4/common/bin` and `/p4/**_instance_**/*`), then install the SDP on it to match the master server installation. Once the machine and SDP install is in place, you need to configure the master server for replication. 
Perforce supports many types of replicas suited to a variety of purposes, such as: * Real-time backup, * Providing a disaster recovery solution, * Load distribution to enhance performance, * Distributed development, * Dedicated resources for automated systems, such as build servers, and more. We always recommend first setting up the replica as a read-only replica and ensuring that everything is working. Once that is the case you can easily modify server specs and configurables to change it to a forwarding replica, or an edge server etc. ==== Replication Setup for Failover This is just a special case of replication, but implementing <<_what_is_a_failover_replica>> Please note the section below <<_using_mkrep_sh>> which implements many details. ==== Pre-requisites for Failover // tag::prerequisites_for_failover[] These are vital as part of your planning. * Obtain and install a license for your replica(s) + Your commit or standard server has a license file (tied to IP address), while your replicas do not require one to function as replicas. + However, in order for a replica to function as a replacement for a commit or standard server, it must have a suitable license installed. + This should be requested when the replica is first created. See the form: https://www.perforce.com/support/duplicate-server-request * Review your authentication mechanism (LDAP etc) - is the LDAP server contactable from the replica machine (firewalls etc configured appropriately). * Review all your triggers and how they are deployed - will they work on the failover host? + Is the right version of Perl/Python etc correctly installed and configured on the failover host with all imported libraries? IMPORTANT: TEST, TEST, TEST!!! It is important to test the above issues as part of your planning. For peace of mind you don't want to be finding problems at the time of trying to failover for real, which may be in the middle of the night! 
// end::prerequisites_for_failover[] On Linux: * Review the configuration of options such as <<_ensure_transparent_huge_pages_thp_is_turned_off>> and also <<_putting_server_locks_directory_into_ram>> are correctly configured for your HA server machine - otherwise you *risk reduced performance* after failover. ==== Using mkrep.sh The SDP `mkrep.sh` script should be used to expand your Helix Topology, e.g. adding replicas and edge servers. NOTE: When creating server machines to be used as Helix servers, the server machines should be named following a well-designed host naming convention. The SDP has no dependency on the convention used, and so any existing local naming convention can be applied. The SDP includes a suggested naming convention in <<_server_host_naming_conventions>> [source] .Usage ---- include::gen/mkrep.sh.man.txt[] ---- ===== SiteTags.cfg The `mkrep.sh` documentation references a SiteTags.cfg file used to register short tag names for geographic sites. Location is: `/p4/common/config/SiteTags.cfg` Your tags should use abbreviations that are meaningful to your organization. .Example/Format ---- include::../Server/Unix/p4/common/config/SiteTags.cfg.sample[] ---- A sample file exists `/p4/common/config/SiteTags.cfg.sample`. ===== Output of `mkrep.sh` The output of `mkrep.sh` (which is also written to a log file in `/p4//logs/mkrep.*`) describes a number of steps required to continue setting up the replica after the metadata configuration performed by the script is done. ==== Addition Replication Setup In addition to steps recommended by `mkrep.sh`, there are other steps to be aware of to prepare a replica server machine. ==== SDP Installation The SDP must first be installed on the replica server machine. If SDP already exists on the machine but not for the current instance, then `mkdirs.sh` must be used to add a new instance to the machine. 
===== SSH Key Setup

SSH keys for the `perforce` operating system user should be set up to allow the `perforce` user to `ssh` and `rsync` among the Helix server machines in the topology. If no `~perforce/.ssh` directory exists on a machine, it can be created with this command:

 ssh-keygen -t rsa -b 4096

=== Recovery Procedures

There are three scenarios that require you to recover server data:

[cols=",,",options="header",]
|===
|Metadata |Depotdata |Action required
|lost or corrupt |Intact |Recover metadata as described below
|Intact |lost or corrupt |Call Perforce Support
|lost or corrupt |lost or corrupt a|
Recover metadata as described below.

Recover the hxdepots volume using your normal backup utilities.
|===

Restoring the metadata from a backup also optimizes the database files.

==== Recovering a master server from a checkpoint and journal(s)

The checkpoint files are stored in the `/p4/**_instance_**/checkpoints` directory, and the most recent checkpoint is named `p4_**_instance_**.ckp.**_number_**.gz`. Recreating up-to-date database files requires the most recent checkpoint from `/p4/**_instance_**/checkpoints` and the journal file from `/p4/**_instance_**/logs`.

To recover the server database manually, perform the following steps from the root directory of the server (`/p4/**_instance_**/root`).

Assuming instance 1:

[arabic]
. Stop the Perforce Server by issuing the following command:

 /p4/1/bin/p4_1 admin stop

. Delete the old database files in the `/p4/1/root/save` directory.
. Move the live database files (db.*) to the save directory.
. Use the following command to restore from the most recent checkpoint.

 /p4/1/bin/p4d_1 -r /p4/1/root -jr -z /p4/1/checkpoints/p4_1.ckp.####.gz

. To replay the transactions that occurred after the checkpoint was created, issue the following command:

 /p4/1/bin/p4d_1 -r /p4/1/root -jr /p4/1/logs/journal

[arabic, start=6]
. Restart your Perforce server.
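
The manual steps above can be sketched as a small dry-run helper. This is an illustration only, under stated assumptions: the `recover_plan` function is hypothetical and not part of the SDP, the `rm` and `mv` lines are one possible reading of the "delete" and "move" steps, and the restart line borrows the `p4d_1_init` script used elsewhere in this guide. It prints the commands in order rather than executing them, so they can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SDP): print, in order, the manual
# recovery commands for a given instance and checkpoint number.
# Nothing is executed; review the output before running anything by hand.
recover_plan() {
    local instance="$1"   # SDP instance name, e.g. 1
    local ckp_num="$2"    # number of the most recent checkpoint (assumed known)
    local p4home="/p4/${instance}"

    echo "${p4home}/bin/p4_${instance} admin stop"
    # Assumption: "delete the old database files" means the db.* files in save.
    echo "rm -f ${p4home}/root/save/db.*"
    echo "mv ${p4home}/root/db.* ${p4home}/root/save/"
    echo "${p4home}/bin/p4d_${instance} -r ${p4home}/root -jr -z ${p4home}/checkpoints/p4_${instance}.ckp.${ckp_num}.gz"
    echo "${p4home}/bin/p4d_${instance} -r ${p4home}/root -jr ${p4home}/logs/journal"
    # Assumption: the service is restarted with the instance init script.
    echo "${p4home}/bin/p4d_${instance}_init start"
}

# Example: instance 1, checkpoint number 42 (an illustrative value).
recover_plan 1 42
```

Printing the plan instead of running it keeps the sketch safe to try on any machine; the checkpoint number still has to be looked up in `/p4/1/checkpoints` by the administrator.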
If the Perforce service starts without errors, delete the old database files from `/p4/instance/root/save`. If problems are reported when you attempt to recover from the most recent checkpoint, try recovering from the preceding checkpoint and journal. If you are successful, replay the subsequent journal. If the journals are corrupted, contact mailto:support-helix-core@perforce.com[Perforce Technical Support]. For full details about backup and recovery, refer to the https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/chapter.backup.html[Perforce System Administrator's Guide]. ==== Recovering a replica from a checkpoint This is very similar to creating a replica in the first place as described above. If you have been running the replica crontab commands as suggested, then you will have the latest checkpoints from the master already copied across to the replica through the use of <<_sync_replica_sh>>. See the steps in the script <<_sync_replica_sh>> for details (note that it deletes the state and rdb.lbr files from the replica root directory so that the replica starts replicating from the start of a journal). Remember to ensure you have logged the service user in to the master server (and that the ticket is stored in the correct location as described when setting up the replica). ==== Recovering from a tape backup This section describes how to recover from a tape or other offline backup to a new server machine if the server machine fails. The tape backup for the server is made from the hxdepots volume. The new server machine must have the same volume layout and user/group settings as the original server. In other words, the new server must be as identical as possible to the server that failed. To recover from a tape backup, perform the following steps (assuming instance `1`): [arabic] . Recover the hxdepots volume from your backup tape. . Create the `/p4` convenience directory on the OS volume. . 
Create the directories `/hxmetadata/p4/1/db1/save` and `/hxmetadata/p4/1/offline_db`.
. Create the directories `/hxmetadata/p4/1/db2/save` and `/hxmetadata/p4/2/offline_db`.
. Change ownership of these directories to the OS account that runs the Perforce processes.
. Switch to the Perforce OS account, and create a link in the `/p4` directory to `/hxdepots/p4/1`.
. Create a link in the `/p4` directory to `/hxdepots/p4/common`.
. As a super-user, reinstall and enable the Systemd service files or SysV init scripts.
. Find the last available checkpoint, under `/p4/1/checkpoints`
. Recover the latest checkpoint by running:

 /p4/1/bin/p4d_1 -r /p4/1/root -jr -z

. Recover the checkpoint to the offline_db directory (assuming instance 1):

 /p4/1/bin/p4d_1 -r /p4/1/offline_db -jr -z

. Reinstall the Perforce server license to the server root directory.
. Start the perforce service by running `/p4/1/bin/p4d_1_init start`
. Verify that the server instance is running.
. Reinstall the server crontab or scheduled tasks.
. Perform any other initial server machine configuration.
. Verify the database and versioned files by running the `p4verify.sh` script. Note that files using the https://www.perforce.com/manuals/cmdref/Content/CmdRef/file.types.synopsis.modifiers.html[+k] file type modifier might be reported as BAD! after being moved. Contact Perforce Technical Support for assistance in determining if these files are actually corrupt.

==== Failover to a replicated standby machine

See link:SDP_Failover_Guide.pdf[SDP Failover Guide (PDF)] or link:SDP_Failover_Guide.html[SDP Failover Guide (HTML)] for detailed steps.

== Upgrades

This section describes both upgrades of the SDP itself, as well as upgrades of Helix software such as p4d, p4broker, p4p, and the p4 command line client in the SDP structure.

=== Upgrade Order: SDP first, then Helix P4D

The SDP should normally be upgraded prior to the upgrade of Helix Core (P4D).
If you are upgrading P4D to or beyond P4D 2019.1 from a prior version of P4D, you __must__ upgrade the SDP first. If you run multiple instances of P4D on a given machine (potentially each running different versions of P4D), upgrade the SDP first before upgrading any of the instances. The SDP should also be upgraded before upgrading other Helix software on machines using the SDP, including p4d, p4p, p4broker, and p4 (the command line client). Upgrading a Helix Core server instance in the SDP framework is a simple process involving a few steps. === SDP and P4D Version Compatibility Starting with the SDP 2020.1 release, the released versions of SDP match the released versions of P4D. So SDP r20.1 is guaranteed to work with P4D r20.1. In addition, the xref:ReleaseNotes.adoc[SDP Release Notes] clarify all the specific versions of P4D supported. The SDP is often forward- and backward-compatible with P4D versions, but for best results they should be kept in sync by upgrading SDP before P4D. This is partly because the SDP contains logic that helps upgrade P4D, which can change as P4D evolves (most recently for 2019.1). The SDP is aware of the P4D version, and has backward-compatibility logic to support older versions of P4D. This is guaranteed for supported versions of P4D. Backward compatibility of SDP with older versions of P4D may extend farther back, though without the "officially supported" guarantee. === Upgrading the SDP Starting with this SDP 2021.1 release, upgrades of the SDP from 2020.1 and later use a new mechanism. The SDP upgrade procedure starting from 2020.1 and later uses the `sdp_upgrade.sh` script. Some highlights of the new upgrade mechanism: * *Automated*: Upgrades from SDP 2020.1 are automated with `sdp_upgrade.sh` provided with each new version of the SDP. * *Continuous*: Each new SDP version, starting from SDP 2021.1, will maintain the capability to upgrade from all prior versions, so long as the starting version is SDP 2020.1 or later. 
* *Independent*: SDP upgrades will enable upgrades to new Helix Core versions, but will not directly cause Helix Core upgrades to occur immediately. Each Helix Core instance can be upgraded independently of the SDP on its own schedule.

==== Sample SDP Upgrade Procedure

For complete information, see: <<_sdp_upgrade_sh>>. A basic set of commands is:

 cd /hxdepots
 [[ -d downloads ]] || mkdir downloads
 cd downloads
 [[ -d new ]] && mv new old.$(date +'%Y%m%d-%H%M%S')
 [[ -e sdp.Unix.tgz ]] && mv sdp.Unix.tgz sdp.Unix.old.$(date +'%Y%m%d-%H%M%S')
 curl -L -s -O https://swarm.workshop.perforce.com/projects/perforce-software-sdp/download/downloads/sdp.Unix.tgz
 ls -l sdp.Unix.tgz
 mkdir new
 cd new
 tar -xzf ../sdp.Unix.tgz

After extracting the SDP tarball, cd to the directory where the `sdp_upgrade.sh` script resides, and execute it from there:

 cd /hxdepots/downloads/new/sdp/Server/Unix/p4/common/sdp_upgrade
 ./sdp_upgrade.sh -man

TIP: If the `curl` command cannot be used (perhaps due to lack of outbound internet access), replace that step with some other means of acquiring the SDP tarball such that it lands as `/hxdepots/downloads/sdp.Unix.tgz`, and then proceed from that point forward.

.What if there is no `/hxdepots` ?
****
If the existing SDP does not have a `/hxdepots` directory, find the correct value with this command:

 bash -c 'cd /p4/common; d=$(pwd -P); echo ${d%/p4/common}'

This can be run from any shell (bash, tcsh, zsh, etc.)
****

==== SDP Legacy Upgrade Procedure

If your current SDP is older than the 2020.1 release, see the xref:SDP_Legacy_Upgrades.Unix.adoc[SDP Legacy Upgrade Guide (for Unix)] for information on upgrading SDP to SDP 2020.1 from any prior version (dating back to 2007).

=== Upgrading Helix Software with the SDP

The following outlines the procedure for upgrading Helix binaries using the SDP scripts.

==== Get Latest Helix Binaries

Acquire the latest Perforce Helix binaries to stage them for upgrade using the <<_get_helix_binaries_sh>> script.
If you have multiple server machines with SDP, staging can be done with this script on one machine first, and then the `/hxdepots/sdp/helix_binaries` folder can be rsync'd to other machines. Alternately, this script can be run on each machine, but as patches can be released at any time, running it once and then distributing the helix_binaries directory internally via rsync is preferred to ensure all machines at your site deploy with the same binary versions. See <<_get_helix_binaries_sh>> ==== Upgrade Each Instance Use the SDP `upgrade.sh` script to upgrade each instance of Helix on the current machine, using the staged binaries. The upgrade process handles all aspects of upgrading, including adjusting the database structure, executing commands to upgrade the p4d database schema, and managing the SDP symlinks in `/p4/common/bin`. Instances can be upgraded independently of each other. See <<_upgrade_sh>>. ==== Global Topology Upgrades - Outer to Inner For any given instance, be aware of the Helix topology when performing upgrades, specifically whether that instance has replicas and/or edge servers. When replicas and edge servers exist (and are active), the order in which `upgrade.sh` must be run on different server machines matters. Perform upgrades following an "outer to inner" strategy. For example, say for SDP instance 1, your site has the following server machines: * bos-helix-01 - The master (in Boston, USA) * bos-helix-02 - Replica of master (in Boston, USA) * nyc-helix-03 - Replica of master (in New York, USA) * syd-helix-04 - Edge Server (in Sydney, AU) * syd-helix-05 - Replica of Sydney edge (in Sydney) Envision the above topology with the master server in the center, and two concentric circles. The Replica of the Sydney edge would be done first, as it is by itself in the outermost circle. The Edge server and two Replicas of the master are all at the next inner circle. 
So bos-helix-02, nyc-helix-03, and syd-helix-04 could be upgraded in any order with respect to each other, or even simultaneously, as they are in the same circle. The master is the innermost, and would be upgraded last.

A few standards need to be in place to make this super easy:

* The `perforce` operating system user has properly configured SSH keys to allow passwordless ssh from the master to all other servers.
* The `perforce` user shell environment (`~/.bash_profile` and `~/.bashrc`) ensures that the SDP shell environment is automatically sourced.

The Helix global topology upgrade could be done something like this, starting as perforce@bos-helix-01:

 cd /p4/sdp/helix_binaries
 ./get_helix_binaries.sh
 rsync -a /p4/sdp/helix_binaries/ syd-helix-05:/p4/sdp/helix_binaries
 rsync -a /p4/sdp/helix_binaries/ syd-helix-04:/p4/sdp/helix_binaries
 rsync -a /p4/sdp/helix_binaries/ nyc-helix-03:/p4/sdp/helix_binaries
 rsync -a /p4/sdp/helix_binaries/ bos-helix-02:/p4/sdp/helix_binaries

Then do a preview of the upgrade on all machines, in outer-to-inner order:

 ssh syd-helix-05 upgrade.sh
 ssh syd-helix-04 upgrade.sh
 ssh nyc-helix-03 upgrade.sh
 ssh bos-helix-02 upgrade.sh
 ssh bos-helix-01 upgrade.sh

On each machine, check for a message in the output that contains `Success: Finished`. If that looks good, then proceed to execute the actual upgrades:

 ssh syd-helix-05 upgrade.sh -y
 ssh syd-helix-04 upgrade.sh -y
 ssh nyc-helix-03 upgrade.sh -y
 ssh bos-helix-02 upgrade.sh -y
 ssh bos-helix-01 upgrade.sh -y

As with the preview, check for a message in the output that contains `Success: Finished`.

=== Database Modifications

Occasionally modifications are made to the Perforce database from one release to another. For example, server upgrades and some recovery procedures modify the database.
When upgrading the server, replaying a journal patch, or performing any activity that modifies the db.* files, you must ensure that the offline checkpoint process is functioning correctly so that the files in the offline_db directory match the ones in the live server directory.

Normally upgrades to the offline_db after a P4D upgrade will be applied by rotating the journal in the normal way, and applying it to the offline_db.

In some cases it is necessary to restart the offline checkpoint process, and the easiest way is to run the live_checkpoint script after modifying the db.* files, as follows:

 /p4/common/bin/live_checkpoint.sh 1

This script makes a new checkpoint of the modified database files in the live `root` directory, then recovers that checkpoint to the `offline_db` directory so that both directories are in sync. This script can also be used anytime to create a checkpoint of the live database.

IMPORTANT: Please note the warnings about how long this process may take at <<_live_checkpoint_sh>>

This command should be run when an error occurs during offline checkpointing. It restarts the offline checkpoint process from the live database files to bring the offline copy back in sync. If the live checkpoint script fails, contact Perforce Consulting at consulting@perforce.com.

== Maximizing Server Performance

The following sections provide some guidelines for maximizing the performance of the Perforce Helix Core Server, using tools provided by the SDP. More information on this topic can be found in the https://portal.perforce.com/s/article/2529[Knowledge Base].

=== Ensure Transparent Huge Pages (THP) is turned off

For reference, see the https://portal.perforce.com/s/article/3005[KB Article on Platform Notes].

There is a (now deprecated) script in the SDP which will do this:

 /p4/sdp/Server/Unix/setup/os_tweaks.sh

It needs to be run as `root` or using `sudo`. Its effect will not persist after the system is rebooted, and it is thus no longer the recommended option.
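
Whichever method is used, it can be useful to check the current THP mode before and after the change. The kernel reports it in `/sys/kernel/mm/transparent_hugepage/enabled`, marking the active mode with square brackets. The following is a minimal sketch; the `thp_active_mode` function name is hypothetical and not part of the SDP.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SDP): extract the active THP mode
# (the bracketed word) from a line such as "always madvise [never]".
thp_active_mode() {
    local line="$1"
    local mode="${line#*\[}"   # drop everything up to and including '['
    mode="${mode%%\]*}"        # drop ']' and everything after it
    echo "$mode"
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [[ -r "$thp_file" ]]; then
    mode="$(thp_active_mode "$(cat "$thp_file")")"
    if [[ "$mode" == "never" ]]; then
        echo "THP is disabled."
    else
        echo "THP mode is '$mode'; 'never' is the recommended setting."
    fi
fi
```

The check is read-only, so it can be run as any user without affecting the system.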
TIP: We recommend the usage of `tuned` instead of the above, since it will persist after reboots.

Install as appropriate for your Linux distribution (so as `root`):

 yum install tuned

or

 apt-get install tuned

. Create a customized `tuned` profile with disabled THP. Create a new directory in the `/etc/tuned` directory with the desired profile name:

 mkdir /etc/tuned/nothp_profile

. Then create a new `tuned.conf` file for `nothp_profile`, and insert the new tuning info:
+
```
cat <<EOF > /etc/tuned/nothp_profile/tuned.conf
[main]
include= throughput-performance

[vm]
transparent_hugepages=never
EOF
```

. Make the script executable

 chmod +x /etc/tuned/nothp_profile/tuned.conf

. Enable `nothp_profile` using the `tuned-adm` command.

 tuned-adm profile nothp_profile

. This change will immediately take effect and persist after reboots. To verify whether THP is disabled, run the command below:

 cat /sys/kernel/mm/transparent_hugepage/enabled
 always madvise [never]

=== Putting server.locks directory into RAM

The `server.locks` directory is maintained in the $P4ROOT (so `/p4/1/root`) for a running server instance. This directory contains a tree of 0-length files (or 17 byte files in earlier p4d versions) used for lock coordination amongst p4d processes.

This directory can be removed every time the p4d instance is restarted, so it is safe to put it into a tmpfs filesystem (which by its nature does not survive a reboot).

Even on a large installation with many hundreds or thousands of users, this directory will be unlikely to exceed 64M. The files in this directory are 17 or 0 bytes depending on the p4d version; space is needed for inodes.

To do this, first determine if the setting will be global for all p4d servers at your site, or will be determined on a per-server machine basis. If set globally, the per-machine configuration described below MUST be done on all p4d server machines.

This should be done in a scheduled maintenance window.
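
Before committing to a size for the tmpfs, it may help to look at how large the existing `server.locks` tree actually is on a running server. The following is a sketch only: the `locks_usage` function name is hypothetical (not an SDP script), and the path assumes the default location for instance 1.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SDP): report how many files a
# server.locks tree holds and how much disk space it currently uses.
locks_usage() {
    local dir="$1"
    local files kb
    files="$(find "$dir" -type f | wc -l)"
    # 'du -sk' prints the usage in KB followed by the path; keep the number.
    kb="$(du -sk "$dir" | awk '{print $1}')"
    # Arithmetic expansion strips any padding that 'wc' may add.
    echo "files=$((files)) kb=$((kb))"
}

# Assumed default location for instance 1; adjust for your instance.
locks_dir=/p4/1/root/server.locks
if [[ -d "$locks_dir" ]]; then
    locks_usage "$locks_dir"
fi
```

Because the lock files are tiny or empty, the file count (which drives inode use) is usually the more informative of the two numbers.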
For each p4d server machine (**all** server machines if you intend to make this a global setting), do the following as user `root`:

. Create a local directory mount point, and change owner/group to `perforce:perforce` (or `$OSUSER` if SDP config specifies a different OS user, and whatever group is used):

 mkdir /hxserverlocks
 chown perforce:perforce /hxserverlocks

. Add a line to `/etc/fstab` (adjusting appropriately if `$OSUSER` and group are set to something other than `perforce:perforce`):

 HxServerLocks /hxserverlocks tmpfs uid=perforce,gid=perforce,size=64M,mode=0700 0 0
+
Note: The `64M` in the above example is suitable for many sites, including large ones. For servers with less available RAM, a smaller value is recommended, but no less than 128K. If multiple SDP instances are operated on the machine, the value must be large enough for all instances.

. Mount the storage volume:

 mount -a

. Check that it looks correct and has the correct ownership (`perforce` or `$OSUSER`):

 df -h
 ls -la /hxserverlocks

As user `perforce` (or `$OSUSER`), set the configurable `server.locks.dir`. This will be set in one of two ways, depending on whether it is to be set globally or on a per-server machine basis.

First, set the shell environment for your instance:

 source /p4/common/bin/p4_vars N

Replace `N` with your instance name; `1` by default.

To set `server.locks.dir` globally, do:

 p4 configure set server.locks.dir="/hxserverlocks${P4HOME}/server.locks"

To set it for a single server only, prefix the configurable with the ServerID, e.g.

 p4 configure set ${SERVERID}#server.locks.dir=/hxserverlocks${P4HOME}/server.locks

IMPORTANT: If you set this globally (without `serverid#` prefix), then you must ensure that all server machines running p4d, including replicas and edge servers, have a similarly named directory available (or bad things will happen!)

IMPORTANT: Consider failover options. A failover will, by nature, change the ServerID on a given machine.
If `server.locks.dir` is set globally, and all machines have the HxServerLocks configuration done as noted above, then the `server.locks.dir` setting is fully accounted for, and will not cause a problem in a failover situation. If `server.locks.dir` is set on a per-machine basis, then you should ensure that every standby server has the same configuration with respect to `server.locks.dir` and the HxServerLocks filesystem as its target server. So any standby servers replicating from a commit server should have the same configuration as the commit server, and any standby servers replicating from an edge server should have the same configuration as the target edge server. For simplicity, using a global setting should be considered.

If you are defining server machine templates (such as an AMI in AWS or with Terraform or similar), the HxServerLocks configuration can and should be accounted for in the system template.

=== Installing monitoring packages

The `sysstat` and `sos` packages are recommended for helping investigate any performance issues on a server.

 yum install sysstat sos

or

 apt install sysstat sos

Then enable it:

 systemctl enable --now sysstat

The reports are text based, but you can use kSar (https://github.com/vlsi/ksar) to visualize the data. If installed before `sosreport` is run, `sosreport` will include the `sysstat` data.

We also recommend `P4prometheus` - https://github.com/perforce/p4prometheus. See link:https://github.com/perforce/p4prometheus/blob/master/INSTALL.md#automated-script-installation[Automated script installer for SDP instances] which makes it easy to install `node_exporter`, `p4prometheus` and monitoring scripts in the `crontab`.

See an example of link:https://brian-candler.medium.com/interpreting-prometheus-metrics-for-linux-disk-i-o-utilization-4db53dfedcfc[interpreting prometheus metrics].

=== Optimizing the database files

The Perforce Server's database is composed of b-tree files.
The server does not fully rebalance and compress them during normal operation. To optimize the files, you must checkpoint and restore the server. This normally only needs to be done every few months.

To minimize the size of backup files and maximize server performance, minimize the size of the db.have and db.label files.

=== P4V Performance Settings

These are covered in: https://portal.perforce.com/s/article/2878

=== Proactive Performance Maintenance

This section describes some things that can be done proactively to enhance scalability and maintain performance.

==== Limiting large requests

To prevent large requests from overwhelming the server, you can limit the amount of data and time allowed per query by setting the MaxResults, MaxScanRows and MaxLockTime parameters to the lowest setting that does not interfere with normal daily activities. As a good starting point, set MaxScanRows to MaxResults * 3; set MaxResults to slightly larger than the maximum number of files the users need to be able to sync to do their work; and set MaxLockTime to 30000 milliseconds. These values must be adjusted up as the size of your server and the number of revisions of the files grow. To simplify administration, assign limits to groups rather than individual users.

To prevent users from inadvertently accessing large numbers of files, define their client view to be as narrow as possible, considering the requirements of their work. Similarly, limit users' access in the protections table to the smallest number of directories that are required for them to do their job.

Finally, keep triggers simple. Complex triggers increase load on the server.

==== Offloading remote syncs

For remote users who need to sync large numbers of files, Perforce offers a https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/chapter.proxy.html[proxy server]. P4P, the Perforce Proxy, is run on a machine that is on the remote users' local network.
The Perforce Proxy caches file revisions, serving them to the remote users and diverting that load from the main server.

P4P is included in the Windows installer. To launch P4P on Unix machines, copy the `/p4/common/etc/init.d/p4p_1_init` script to `/p4/1/bin/p4p_1_init`. Then review and customize the script to specify your server volume names and directories.

P4P does not require special hardware, but it can be quite CPU intensive if it is working with binary files, which are CPU-intensive to attempt to compress. It doesn't need to be backed up. If the P4P instance isn't working, users can switch their port back to the main server and continue working until the instance of P4P is fixed.

== Tools and Scripts

This section describes the various scripts and files provided as part of the SDP package.

=== General SDP Usage

This section presents an overview of the SDP scripts and tools, with details covered in subsequent sections.

==== Linux

Most scripts and tools reside in `/p4/common/bin`. The `/p4//bin` directory (e.g. `/p4/1/bin`) contains scripts or links that are specific to that instance such as wrappers for the p4d executable.

Older versions of the SDP required you to always run important administrative commands using the `p4master_run` script, and specify fully qualified paths. This script loads environment information from `/p4/common/bin/p4_vars`, the central environment file of the SDP, ensuring a controlled environment. The `p4_vars` file includes instance specific environment data from `/p4/common/config/p4_**_instance_**.vars` e.g. `/p4/common/config/p4_1.vars`. The `p4master_run` script is still used when running p4 commands against the server unless you set up your environment first by sourcing p4_vars with the instance as a parameter (for bash shell: `source /p4/common/bin/p4_vars 1`).
Administrative scripts, such as `daily_checkpoint.sh`, no longer need to be called with `p4master_run`; they just need the instance number passed to them as a parameter.

When invoking a Perforce command directly on the server machine, use the p4_**__instance__** wrapper that is located in `/p4/**_instance_**/bin`. This wrapper invokes the correct version of the p4 client for the instance. The use of these wrappers enables easy upgrades, because the wrapper is a link to the correct version of the p4 client. There is a similar wrapper for the p4d executable, called p4d_**__instance__**.

NOTE: This wrapper is important to handle case sensitivity in a consistent manner, e.g. when running a Unix server in case-insensitive mode. If you just execute `p4d` directly when it should be case-insensitive, then you may cause problems, or commands will fail.

Below are some usage examples for instance 1.

[cols=",",options="header",]
|===
|_Example_ |_Remarks_
|`/p4/common/bin/p4master_run 1 /p4/1/bin/p4_1 admin stop` |Run `p4 admin stop` on instance 1
|`/p4/common/bin/live_checkpoint.sh 1` |Take a checkpoint of the live database on instance 1
|`/p4/common/bin/p4login 1` |Log in as the perforce user (superuser) on instance 1.
|===

Some maintenance scripts can be run from any client workspace, if the user has administrative access to Perforce.

==== Monitoring SDP activities

The important SDP maintenance and backup scripts generate email notifications when they complete.

For further monitoring, you can consider options such as:

* Making the SDP log files available via a password protected HTTP server.
* Directing the SDP notification emails to an automated system that interprets the logs.

=== Upgrade Scripts

==== get_helix_binaries.sh

[source]
.Usage
----
include::gen/get_helix_binaries.sh.man.txt[]
----

==== upgrade.sh

The `upgrade.sh` script is used to upgrade `p4d` and other Perforce Helix binaries on a given server machine.
The links for different versions of `p4d` are described in <<_p4d_versions_and_links>>

[source]
.Usage
----
include::gen/upgrade.sh.man.txt[]
----

==== sdp_upgrade.sh

This script will perform an upgrade of the SDP itself - see <<_upgrading_the_sdp>>

[source]
.Usage
----
include::gen/sdp_upgrade.sh.man.txt[]
----

=== Legacy Upgrade Scripts

==== clear_depot_Map_fields.sh

The `clear_depot_Map_fields.sh` script is used when upgrading to SDP from versions earlier than SDP 2020.1. Its usage is discussed in link:SDP_Legacy_Upgrades.Unix.html[SDP Legacy Upgrade Guide (for Unix)].

[source]
.Usage
----
include::gen/clear_depot_Map_fields.sh.man.txt[]
----

=== Core Scripts

The core SDP scripts are those related to checkpoints and other scheduled operations, and all run from `/p4/common/bin`.

If you `source /p4/common/bin/p4_vars` then the `/p4/common/bin` directory will be added to your $PATH.

==== p4_vars

The `/p4/common/bin/p4_vars` script defines the SDP shell environment, as required by the Perforce Helix server process. This script uses a specified instance number as a basis for setting environment variables. It will look for and open the respective p4_.vars file (see next section).

This script also sets server logging options and configurables.

It is intended to be used by other scripts for common environment settings, and also by users for setting the environment of their Bash shell.

.Usage

    source /p4/common/bin/p4_vars 1

See also: <<_setting_your_login_environment_for_convenience>>

==== p4_.vars

Defines the environment variables for a specific instance, including P4PORT etc.

This script is called by <<_p4_vars>> - it is not intended to be called directly by a user.

For instance `1`:

    p4_1.vars

For instance `art`:

    p4_art.vars

Occasionally you may need to edit this script to update variables such as `P4MASTERHOST` or similar.

*Location*: /p4/common/config

==== p4master_run

The `/p4/common/bin/p4master_run` script is a wrapper for other SDP scripts.
This ensures that the shell environment is loaded from `p4_vars` before executing the script. It provides a '-c' flag for silent operation, used in many crontabs so that email is sent from the scripts themselves.

This is especially useful for calling scripts that do not set their own shell environment, such as Python or Perl scripts. Historically it was used as a wrapper for all SDP scripts.

TIP: Many of the bash shell scripts in the SDP set their own environment (by doing `source /p4/common/bin/p4_vars N` for their instance); those bash shell scripts do *not* need to be called with the `p4master_run` wrapper.

==== daily_checkpoint.sh

The `/p4/common/bin/daily_checkpoint.sh` script is configured by default to run six days a week using crontab. The script:

* truncates the journal
* replays it into the `offline_db` directory
* creates a new checkpoint from the resulting database files
* recreates the `offline_db` database from the new checkpoint.

This procedure rebalances and compresses the database files in the `offline_db` directory. These can be rotated into the live (`root`) database by the script <<_refresh_p4root_from_offline_db_sh>>.

.Usage

    /p4/common/bin/daily_checkpoint.sh
    /p4/common/bin/daily_checkpoint.sh 1

==== keep_offline_db_current.sh

The `/p4/common/bin/keep_offline_db_current.sh` script is for use only on a standby replica. It will not run on any other type of replica.

This script ensures the offline_db has the most current journals replayed. It is intended for use on standby replicas as an alternative to sync_replica.sh or replica_cleanup.sh. It is ideal for use in an environment where the checkpoints folder of the P4TARGET server is shared (e.g. via NFS) with this server.

This script does NOT do full checkpoint operations, and requires that the offline_db be in a good state before starting -- this is verified with a call to verify_sdp.sh.

This uses checkpoint.log as its primary log.
It is only intended for use on a machine where other scripts that update checkpoint.log don't run, e.g. daily_checkpoint.sh, live_checkpoint.sh, or rotate_journal.sh.

.Usage

    /p4/common/bin/keep_offline_db_current.sh
    /p4/common/bin/keep_offline_db_current.sh 1

==== recreate_offline_db.sh

The `/p4/common/bin/recreate_offline_db.sh` script recovers the offline_db database from the latest checkpoint and replays any journals since then. If you have a problem with the offline database then it is worth running this script first before running <<_live_checkpoint_sh>>, as the latter will stop the server while it is running, which can take hours for a large installation.

Run this script if an error occurs while replaying a journal during the daily checkpoint process.

This script recreates offline_db files from the latest checkpoint. If it fails, then check to see if the most recent checkpoint in the `/p4//checkpoints` directory is bad (i.e. doesn't look like the right size compared to the others), and if so, delete it and rerun this script. If the error you are getting is that the journal replay failed, then the only option may be to run the <<_live_checkpoint_sh>> script.

.Usage

    /p4/common/bin/recreate_offline_db.sh
    /p4/common/bin/recreate_offline_db.sh 1

==== live_checkpoint.sh

The `/p4/common/bin/live_checkpoint.sh` script is used to initialize the SDP `offline_db`. It must be run once, typically manually during initial installation, before any other scripts that rely on the `offline_db` can be used, such as `daily_checkpoint.sh` and `rotate_journal.sh`.

This script can also be used in some cases to repair the `offline_db` if it has become corrupt, e.g. due to a sudden power loss while checkpoint processing was running.

IMPORTANT: Be aware this script locks the live database for the duration of the checkpoint which can take hours for a large installation (please check the `/p4/1/logs/checkpoint.log` for the most recent output of `daily_checkpoint.sh` to see how long checkpoints take to create/restore).

Note that when a `live_checkpoint.sh` runs, the server will be unresponsive to users for a time. On a new installation this "hang time" will be imperceptible, but over time it can grow to minutes and eventually hours. The idea is that `live_checkpoint.sh` should be used only very sparingly, and is not scheduled as a routine operation.

This performs the following actions:

* Does a journal rotation, so the active P4JOURNAL file becomes numbered.
* Creates a checkpoint from the live database db.* files in the P4ROOT.
* Recovers the `offline_db` database from that checkpoint to rebalance and compress the files

Run this script when creating the server instance and if an error occurs while replaying a journal during the off-line checkpoint process.

.Usage

    /p4/common/bin/live_checkpoint.sh
    /p4/common/bin/live_checkpoint.sh 1

==== p4verify.sh

The `/p4/common/bin/p4verify.sh` script verifies the integrity of the 'archive' files, all versioned files in your repository. This script is run by crontab on a regular basis, typically weekly.

It verifies https://www.perforce.com/manuals/cmdref/Content/CmdRef/p4_verify.html[both shelves and submitted archive files]

Any errors in the log file (e.g. `/p4/1/logs/p4verify.log`) should be handled according to KB articles:

* https://portal.perforce.com/s/article/3186[MISSING! errors from p4 verify]
* https://portal.perforce.com/s/article/2404[BAD! error from p4 verify]

If in doubt contact support-helix-core@perforce.com

Our recommendation is that you should expect this to be without error, and you should address errors sooner rather than later. This may involve obliterating unrecoverable errors.
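
As a quick triage step before consulting the KB articles, it can help to count how many revisions the log flags. The following is a minimal sketch, not part of the SDP: `summarize_verify_log` is a hypothetical helper, and it assumes that flagged revisions appear in the log as lines ending in `MISSING!` or `BAD!`, as they do in plain `p4 verify` output.

```shell
#!/bin/bash
# Sketch only: count flagged revisions in a p4verify log.
# Assumption: each flagged revision is a line ending in "MISSING!" or "BAD!".
# summarize_verify_log is a hypothetical helper, not an SDP script.
summarize_verify_log() {
    local log="$1"
    local missing bad

    if [ ! -r "$log" ]; then
        echo "Cannot read log file: $log" >&2
        return 1
    fi

    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored here.
    missing=$(grep -c 'MISSING![[:space:]]*$' "$log" || true)
    bad=$(grep -c 'BAD![[:space:]]*$' "$log" || true)

    echo "MISSING=$missing BAD=$bad"
}

# Example call for instance 1 (path as used in this guide):
#   summarize_verify_log /p4/1/logs/p4verify.log
```

A non-zero count for either value indicates that the log needs review; the helper only counts lines and does not attempt any repair.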

NOTE: when run on replicas, this will also append the `-t` flag to the `p4 verify` command to ensure that MISSING files are scheduled for transfer. This is useful to keep replicas (including edge servers) up-to-date.

.Usage

    /p4/common/bin/p4verify.sh
    /p4/common/bin/p4verify.sh 1

[source]
----
include::gen/p4verify.sh.man.txt[]
----

==== p4login

The `/p4/common/bin/p4login` script is a convenience wrapper to execute a series of `p4 login` commands, using the administration password configured in `mkdirs.cfg` and subsequently stored in a text file: `/p4/common/config/.p4passwd .p4_.admin`.

[source]
.Usage
----
include::gen/p4login.man.txt[]
----

==== p4d__init

Starts the Perforce server instance. Can be called directly or as described in <<_configuring_automatic_service_start_on_boot>> - it is created by `mkdirs.sh` when SDP is installed.

IMPORTANT: Do not use directly if you have configured systemctl for systemd Linux distributions such as CentOS 7.x. This risks database corruption if `systemd` does not think the service is running when it actually is running (for example on shutdown systemd will just kill processes without waiting for them).

This script sources `/p4/common/bin/p4_vars`, then runs `/p4/common/bin/p4d_base` (<<_p4d_base>>).

.Usage

    /p4//bin/p4d__init [ start | stop | status | restart ]
    /p4/1/bin/p4d_1_init start

==== refresh_P4ROOT_from_offline_db.sh

The `/p4/common/bin/refresh_P4ROOT_from_offline_db.sh` script is intended to be used occasionally, perhaps monthly, quarterly, or on-demand, to help ensure that your live (`root`) database files are defragmented.

It will:

* stop p4d
* truncate/rotate live journal
* replay journals to offline_db
* switch the links between `root` and `offline_db`
* restart p4d

It also knows how to do similar processes on edge servers and standby servers or other replicas.

.Usage

    /p4/common/bin/refresh_P4ROOT_from_offline_db.sh
    /p4/common/bin/refresh_P4ROOT_from_offline_db.sh 1

==== run_if_master.sh

The `/p4/common/bin/run_if_master.sh` script is explained in <<_run_if_masteredgereplica_sh>>

==== run_if_edge.sh

The `/p4/common/bin/run_if_edge.sh` script is explained in <<_run_if_masteredgereplica_sh>>

==== run_if_replica.sh

The `/p4/common/bin/run_if_replica.sh` script is explained in <<_run_if_masteredgereplica_sh>>

==== run_if_master/edge/replica.sh

The SDP uses wrapper scripts in the crontab: `run_if_master.sh`, `run_if_edge.sh`, `run_if_replica.sh`. We suggest you ensure these are working as desired, e.g.

.Usage

    /p4/common/bin/run_if_master.sh 1 echo yes
    /p4/common/bin/run_if_replica.sh 1 echo yes
    /p4/common/bin/run_if_edge.sh 1 echo yes

It is important to ensure these are returning valid results for the server machine you are on.

Any issues with these scripts are likely configuration issues with `/p4/common/config/p4_1.vars` (for instance `1`)

==== sdp_health_check.sh

This script is described in the appendix <<_sdp_health_checks>>.

[source]
----
include::gen/sdp_health_check.sh.man.txt[]
----

=== More Server Scripts

These scripts are helpful components of the SDP that run on the server machine, but are not included in the default crontab schedules.

==== p4.crontab

Contains crontab entries to run the server maintenance scripts.

*Location*: /p4/sdp/Server/Unix/p4/common/etc/cron.d

==== verify_sdp.sh

The `/p4/common/bin/verify_sdp.sh` script does basic verification of SDP setup.

[source]
.Usage
----
include::gen/verify_sdp.sh.man.txt[]
----

=== Other Scripts and Files

The following sections describe other files in the SDP distribution. These files are usually not invoked directly by you; rather, they are invoked by higher-level scripts.

==== backup_functions.sh

The `/p4/common/bin/backup_functions.sh` script contains Bash functions used in other SDP scripts.
It is *sourced* (`source /p4/common/bin/backup_functions.sh`) by other scripts that use the common shared functions.

It is not intended to be called directly by the user.

==== broker_rotate.sh

The `/p4/common/bin/broker_rotate.sh` script rotates the broker log file. It is intended for use on a server machine that has only a broker running. When a broker is run on a p4d server machine, the `daily_checkpoint.sh` script takes care of rotating the broker log.

It can be added to a crontab for e.g. daily log rotation.

.Usage

    /p4/common/bin/broker_rotate.sh
    /p4/common/bin/broker_rotate.sh 1

==== ccheck.sh

The `/p4/common/bin/ccheck.sh` script compares configurables against a set of defined best practices.

[source]
.Usage
----
include::gen/ccheck.sh.man.txt[]
----

==== edge_dump.sh

The `/p4/common/bin/edge_dump.sh` script is designed to create a seed checkpoint for an Edge server.

An edge server is naturally filtered, with certain database tables (e.g. db.have) excluded. In addition to implicit filtering, the server spec may specify additional tables to be excluded, e.g. by using the ArchiveDataFilter field of the server spec.

The script requires the SDP instance and the edge ServerID.

.Usage

    /p4/common/bin/edge_dump.sh
    /p4/common/bin/edge_dump.sh 1 p4d_edge_syd

It will output the full path of the checkpoint to be copied to the edge server and used with <<_recover_edge_sh>>.

==== edge_vars

The `/p4/common/bin/edge_vars` file is sourced by scripts that work on edge servers.

It sets the correct list of db.* files that are edge-specific in the federated architecture. This list depends on the version of p4d in use; this script accounts for the P4D version.

It is not intended for users to call directly.

==== edge_shelf_replicate.sh

The `/p4/common/bin/edge_shelf_replicate.sh` script is intended to be run on an edge server and will ensure that all shelves are replicated to that edge server (by running `p4 print` on them).

Only use if directed to by Perforce Support or Perforce Consulting.

==== load_checkpoint.sh

The `/p4/common/bin/load_checkpoint.sh` script loads a checkpoint into `root` and `offline_db` for a commit/edge/replica instance.

IMPORTANT: This script will replace your `/p4//root` database files! *Be careful!* If you want to create db files in `offline_db` then use <<_recreate_offline_db_sh>>.

[source]
.Usage
----
include::gen/load_checkpoint.sh.man.txt[]
----

==== gen_default_broker_cfg.sh

The `/p4/common/bin/gen_default_broker_cfg.sh` script generates an SDP instance-specific variant of the generic P4Broker config file. It writes the result to standard output.

Usage:

    cd /p4/common/bin
    gen_default_broker_cfg.sh 1 > /tmp/p4broker.cfg.ToBeReviewed

The final p4broker.cfg should end up here:

    /p4/common/config/p4_${SDP_INSTANCE}.${SERVERID}.broker.cfg

==== journal_watch.sh

The `/p4/common/bin/journal_watch.sh` script checks the disk space available to P4JOURNAL and triggers a journal rotation based on specified thresholds. This is useful in case you are in danger of running out of disk space and your rotated journal files are stored on a different partition from the active journal.

This script uses the following external variables:

* SDP_INSTANCE - The instance of Perforce that is being backed up. If not set in environment, pass in as argument to script.
* P4JOURNALWARN - Amount of space left (K,M,G,%) before min journal space where an email alert is sent
* P4JOURNALWARNALERT - Send an alert if warn threshold is reached (true/false, default: false)
* P4JOURNALROTATE - Amount of space left (K,M,G,%) before min journal space to trigger a journal rotation
* P4OVERRIDEKEEPJNL - Allow script to temporarily override KEEPJNL to retain enough journals to replay against oldest checkpoint (true/false, default: false)

.Usage

    /p4/common/bin/journal_watch.sh

.Examples

Run from CLI that will warn via email if less than 20% is available and rotate journal when less than 10% is available

    ./journal_watch.sh 20% TRUE 10% TRUE

Cron job that will warn via email if less than 20% is available and rotate journal when less than 10% is available

    30 * * * * [ -e /p4/common/bin ] && /p4/common/bin/run_if_master.sh ${INSTANCE} /p4/common/bin/journal_watch.sh ${INSTANCE} 20\% TRUE 10\% TRUE

==== kill_idle.sh

The `/p4/common/bin/kill_idle.sh` script runs `p4 monitor terminate` on all processes showing in the output of `p4 monitor show` that are in the IDLE state.

.Usage

    /p4/common/bin/kill_idle.sh
    /p4/common/bin/kill_idle.sh 1

==== p4d_base

The `/p4/common/bin/p4d_base` script is the script to start/stop/restart the `p4d` instance.

It is called by `p4d__init` script (and thus also `systemctl` on systemd Linux distributions).

It is not intended to be called by users directly.

==== p4broker_base

The `/p4/common/bin/p4broker_base` script is very similar to <<_p4d_base>> but for the `p4broker` service instance.

See https://www.perforce.com/manuals/p4dist/Content/P4Dist/chapter.broker.html[p4broker in SysAdmin Guide]

==== p4ftpd_base

The `/p4/common/bin/p4ftpd_base` script is very similar to <<_p4d_base>> but for the `p4ftp` service instance.

The p4ftp has been deprecated; this may be removed in a future SDP release.

This product is very seldom used these days!

See https://www.perforce.com/manuals/p4ftp/index.html[P4FTP Installation Guide.]

==== p4p_base

The `/p4/common/bin/p4p_base` script is very similar to <<_p4d_base>> but for the `p4p` (P4 Proxy) service instance.

See https://www.perforce.com/manuals/p4dist/Content/P4Dist/chapter.proxy.html[p4proxy in SysAdmin Guide]

==== p4pcm.pl

The `/p4/common/bin/p4pcm.pl` script is a utility to remove files in the proxy cache if the amount of free disk space falls below the low threshold.

[source]
.Usage
----
include::gen/p4pcm.pl.man.txt[]
----

==== p4review.py

The `/p4/common/bin/p4review.py` script sends out email containing the change descriptions to users who are configured as reviewers for affected files (done by setting the Reviews: field in the user specification). This script is a version of the `p4review.py` script that is available on the Perforce Web site, but has been modified to use the server instance number. It relies on a configuration file in `/p4/common/config`, called `p4_.p4review.cfg`.

This is not required if you have installed Swarm, which also performs notification functions and is easier for users to configure.

.Usage

    /p4/common/bin/p4review.py # Uses config file as above

==== p4review2.py

The `/p4/common/bin/p4review2.py` script is an enhanced version of <<_p4review_py>>

. Run p4review2.py --sample-config > p4review.conf
. Edit the file p4review.conf
. Add a crontab similar to this:

    * * * * * python2.7 /path/to/p4review2.py -c /path/to/p4review.conf

Features:

* Prevent multiple copies running concurrently with a simple lock file.
* Logging support built-in.
* Takes command-line options.
* Configurable subject and email templates.
* Use P4Python when available and use P4 (the CLI) as a fallback.
* Option to send a __single__ email per user per invocation instead of multiple ones.
* Reads config from an INI-like file using ConfigParser
* Has command-line options that override environment variables.
* Handles unicode-enabled server **and** non-ASCII characters on a non-unicode-enabled server.
* Option to opt-in (--opt-in-path) reviews globally (for migration from old review daemon).
* Configurable URLs for changes/jobs/users (for swarm).
* Able to limit the maximum email message size with a configurable.
* SMTP auth and TLS (not SSL) support.
* Handles P4AUTH (optional; use of P4AUTH is no longer recommended).

==== proxy_rotate.sh

The `/p4/common/bin/proxy_rotate.sh` script rotates the proxy log file. It is intended for use on a server machine that has only a proxy running. When a proxy is run on a p4d server machine, the `daily_checkpoint.sh` script takes care of rotating the proxy log.

It can be added to a crontab for e.g. daily log rotation.

.Usage

    /p4/common/bin/proxy_rotate.sh
    /p4/common/bin/proxy_rotate.sh 1

==== p4sanity_check.sh

The `/p4/common/bin/p4sanity_check.sh` script is a simple script to run:

* p4 set
* p4 info
* p4 changes -m 10

.Usage

    /p4/common/bin/p4sanity_check.sh
    /p4/common/bin/p4sanity_check.sh 1

==== p4dstate.sh

The `/p4/common/bin/p4dstate.sh` script is a trouble-shooting script for use when directed by support, e.g. in situations such as server hanging, major locking problems etc.

It is an "SDP-aware" version of the https://portal.perforce.com/s/article/15261[standard p4dstate.sh] so that it only requires the SDP instance to be specified as a parameter (since the location of logs etc are defined by SDP).

.Usage

    sudo /p4/common/bin/p4dstate.sh
    sudo /p4/common/bin/p4dstate.sh 1

==== ps_functions.sh

The `/p4/common/bin/ps_functions.sh` library file contains common functions for using 'ps' to check on process ids. It is not intended to be called by users.

    get_pids ($exe)

.Usage

Call with an exe name, e.g.
/p4/1/bin/p4web_1

.Examples

    p4web_pids=$(get_pids $P4WEBBIN)
    p4broker_pids=$(get_pids $P4BROKERBIN)

==== pull.sh

The `/p4/common/bin/pull.sh` script is a reference pull trigger implementation for https://portal.perforce.com/s/article/15337[External Archive Transfer using pull-archive and edge-content triggers]

It is a fast content transfer mechanism using Aspera (and can be adapted to other similar UDP-based products). An Edge server uses this trigger to pull files from its upstream Commit server. It replaces or augments the built-in replication archive pull and is useful in scenarios where there are lots of large (binary) files and commit/edge are geographically distributed with high latency and/or low bandwidth between them.

See also companion trigger <<_submit_sh>>.

It is based around getting a list of files to copy from commit to edge, then doing the file transfer using `ascp` (Aspera file copy).

The configurable `pull.trigger.dir` should be set to a temp folder like `/p4/1/tmp`.

Startup commands look like:

    startup.2=pull -i 1 -u --trigger --batch=1000

The trigger entry for the pull commands looks like this:

    pull_archive pull-archive pull "/p4/common/bin/triggers/pull.sh %archiveList%"

There are some pull trigger options, but they are not necessary with Aspera. Aspera works best if you give it the max batch size of 1000 and set up 1 or more threads. Note that each thread will use the max bandwidth you specify, so a single pull-trigger thread is probably all you will want.

The `ascp` user needs to have SSH public keys set up or export `ASPERA_SCP_PASS`. The `ascp` user should be set up with the target as / with full write access to the volume where the depot files are located. The easiest way to do that is to use the same user that is running the p4d service.

TIP: ensure ascp is correctly configured and working in your environment: https://www-01.ibm.com/support/docview.wss?uid=ibm10747281 (search for "ascp connectivity testing")

Standard SDP environment is assumed, e.g. P4USER, P4PORT, OSUSER, P4BIN, etc. are set, PATH is appropriate, and a super user is logged in with a non-expiring ticket.

IMPORTANT: Read the trigger comments for any customization required for your environment.

See also the test version of the script: <<_pull_test_sh>>

See the `/p4/common/bin/triggers/pull.sh` script for details and to customize for your environment.

==== pull_test.sh

The `/p4/common/bin/pull_test.sh` script is a test script.

IMPORTANT: THIS IS A TEST SCRIPT - it substitutes for <<_pull_sh>> which uses Aspera's `ascp` and replaces that with Linux standard `scp` utility. **IT IS NOT INTENDED FOR PRODUCTION USE!!!!**

If you don't have an Aspera license, then you can test with this script to understand the process.

See the `/p4/common/bin/triggers/pull_test.sh` script for details.

There is a demonstrator project showing usage: https://github.com/rcowham/p4d-edge-pull-demo

==== purge_revisions.sh

The `/p4/common/bin/purge_revisions.sh` script will allow you to archive files and optionally purge files based on a configurable number of days and minimum revisions that you want to keep. This is useful if you want to keep a certain number of days worth of files instead of a specific number of revisions.

Note: If you run this script with purge mode disabled, and then enable it after the fact, all previously archived files specified in the configuration file will be purged if the configured criteria are met.

Prior to running this script, you may want to disable server locks for archive to reduce impact to end users.

See: https://www.perforce.com/perforce/doc.current/manuals/cmdref/Content/CmdRef/configurables.configurables.html#server.locks.archive

Parameters:

* SDP_INSTANCE - The instance of Perforce that is being backed up.
If not set in environment, pass in as argument to script.
* P4_ARCHIVE_CONFIG - The location of the config file used to determine retention. If not set in environment, pass in as argument to script. This can be stored on a physical disk or somewhere in perforce.
* P4_ARCHIVE_DEPOT - Depot to archive the files in (string)
* P4_ARCHIVE_REPORT_MODE - Do not archive revisions; report on which revisions would have been archived (bool - default: true)
* P4_ARCHIVE_TEXT - Archive text files (or other revisions stored in delta format, such as files of type binary+D) (bool - default: false)
* P4_PURGE_MODE - Enables purging of files after they are archived (bool - default: false)

.Config File Format

The config file should contain a list of file paths, number of days and minimum number of revisions to keep in a tab-delimited format.

Example:

    //test/1.txt 10 1
    //test/2.txt 1 3
    //test/3.txt 10 10
    //test/4.txt 30 3
    //test/5.txt 30 8

.Usage

    /p4/common/bin/purge_revisions.sh 4_ARCHIVE_TEXT (Optional)>

.Examples

Run from CLI that will archive files as defined in the config file

    ./purge_revisions.sh 1 /p4/common/config/p4_1.p4purge.cfg archive FALSE

Cron job that will archive files as defined in the config file, including text files

    30 0 * * * [ -e /p4/common/bin ] && /p4/common/bin/run_if_master.sh ${INSTANCE} /p4/common/bin/purge_revisions.sh ${INSTANCE} /p4/common/config/p4_1.p4purge.cfg archive FALSE FALSE

==== recover_edge.sh

The `/p4/common/bin/recover_edge.sh` script is designed to rebuild an Edge server from a seed checkpoint from the master while keeping the existing edge-specific data.

You have to first copy the seed checkpoint from the master, created with <<_edge_dump_sh>>, to the edge server before running this script. (Alternately, a full checkpoint from the master can be used so long as the edge server spec does not specify any filtering, e.g. does not use ArchiveDataFilter.)
Then run this script on the Edge server host with the instance number and full path of the master seed checkpoint as parameters.

.Usage

    /p4/common/bin/recover_edge.sh
    /p4/common/bin/recover_edge.sh 1 /p4/1/checkpoints/p4_1.edge_syd.seed.ckp.9188.gz

==== replica_cleanup.sh

The `/p4/common/bin/replica_cleanup.sh` script performs the following actions for a replica:

* rotate logs
* remove old checkpoints and journals
* remove old logs

This should be used on replicas for which the `sync_replica.sh` is not used.

.Usage

    /p4/common/bin/replica_cleanup.sh
    /p4/common/bin/replica_cleanup.sh 1

==== replica_status.sh

The `/p4/common/bin/replica_status.sh` script is regularly run by crontab on a replica or edge (using <<_run_if_replica_sh>>).

    0 8 * * * [ -e /p4/common/bin ] && /p4/common/bin/run_if_replica.sh ${INSTANCE} /p4/common/bin/replica_status.sh ${INSTANCE} > /dev/null
    0 8 * * * [ -e /p4/common/bin ] && /p4/common/bin/run_if_edge.sh ${INSTANCE} /p4/common/bin/replica_status.sh ${INSTANCE} > /dev/null

It performs a `p4 pull -lj` command on the replica to report current replication status, and emails this to the standard SDP administrator email on a daily basis. This is useful for monitoring purposes to detect replica lag or similar problems.

If you are using enhanced monitoring such as https://github.com/perforce/p4prometheus[p4prometheus] then this script may not be required.

.Usage

    /p4/common/bin/replica_status.sh
    /p4/common/bin/replica_status.sh 1

==== request_replica_checkpoint.sh

The `/p4/common/bin/request_replica_checkpoint.sh` script is intended to be run on a standby replica. It essentially just calls 'p4 admin checkpoint -Z' to request a checkpoint and exits. The actual checkpoint is created on the next journal rotation on the master.

.Usage

    /p4/common/bin/request_replica_checkpoint.sh
    /p4/common/bin/request_replica_checkpoint.sh 1

==== rotate_journal.sh

The `/p4/common/bin/rotate_journal.sh` script is a convenience script to perform the following actions for the specified instance (single parameter):

* rotate live journal
* replay it to the `offline_db`
* rotate log files according to the settings in `p4_vars` for things like `KEEP_LOGS`

It has several use cases:

* For sites with large, long-running checkpoints, it can be used to schedule journal rotations to occur more frequently than `daily_checkpoint.sh` is run.
* It can be used to trigger checkpoints to run on edge servers.

.Usage

    /p4/common/bin/rotate_journal.sh
    /p4/common/bin/rotate_journal.sh 1

==== submit.sh

The `/p4/common/bin/submit.sh` script is an example submit trigger for https://portal.perforce.com/s/article/15337[External Archive Transfer using pull-archive and edge-content triggers]

This is a reference edge-content trigger for use with an Edge/Commit server topology - the Edge server uses this trigger to transmit files which are being submitted to the Commit instead of using its normal file transfer mechanism. This trigger uses Aspera for fast file transfer over UDP rather than TCP, and is typically much faster, especially with high latency connections.

Companion trigger/script to <<_pull_sh>>

Uses `fstat -Ob` with some filtering to generate a list of files to be copied. Creates a temp file with the filename pairs expected by ascp, and then performs the copy.

This configurable must be set:

    rpl.submit.nocopy=1

The edge-content trigger looks like this:

    EdgeSubmit edge-content //... "/p4/common/bin/triggers/ascpSubmit.sh %changelist%"

The `ascp` user needs to have SSH public keys set up or export `ASPERA_SCP_PASS`. The `ascp` user should be set up with the target as / with full write access to the volume where the depot files are located. The easiest way to do that is to use the same user that is running the p4d service.

TIP: ensure `ascp` is correctly configured and working in your environment: https://www-01.ibm.com/support/docview.wss?uid=ibm10747281 (search for "ascp connectivity testing")

Standard SDP environment is assumed, e.g. P4USER, P4PORT, OSUSER, P4BIN, etc. are set, PATH is appropriate, and a super user is logged in with a non-expiring ticket.

See the test version of this script below: <<_submit_test_sh>>

See the `/p4/common/bin/triggers/submit.sh` script for details and to customize for your environment.

==== submit_test.sh

The `/p4/common/bin/submit_test.sh` script is a test script.

IMPORTANT: THIS IS A TEST SCRIPT - it substitutes for <<_submit_sh>> (which uses Aspera) - and replaces `ascp` with Linux standard `scp`. IT IS NOT INTENDED FOR PRODUCTION USE!!!!

If you don't have an Aspera license, then you can test with this script to understand the process.

See the `/p4/common/bin/triggers/submit_test.sh` script for details.

There is a demonstrator project showing usage: https://github.com/rcowham/p4d-edge-pull-demo

==== sync_replica.sh

The `/p4/common/bin/sync_replica.sh` script is included in the standard crontab for a replica.

It runs `rsync` to mirror the `/p4/1/checkpoints` (assuming instance `1`) directory to the replica machine.

It then uses the latest checkpoint in that directory to update the local `offline_db` directory for the replica.

This ensures that the replica can be quickly and easily reseeded if required without having to first copy checkpoints locally (which can take hours over slow WAN links).

.Usage

    /p4/common/bin/sync_replica.sh
    /p4/common/bin/sync_replica.sh 1

==== templates directory

This sub-directory of `/p4/common/bin` contains some files which can be used as templates for new commands if you wish:

* template.pl - Perl
* template.py - Python
* template.py.cfg - config file for python
* template.sh - Bash

They are not intended to be run directly.

==== update_limits.py

The `/p4/common/bin/update_limits.py` script is a Python script which is intended to be called from a crontab entry once per hour.

It must be wrapped with the `p4master_run` script.

It ensures that all current users are added to the `limits` group. This makes it easy for an administrator to configure global limits on values such as MaxScanRows, MaxSearchResults etc. This can reduce load on a heavily loaded instance.

For more information:

* https://portal.perforce.com/s/article/2529[Maximizing Perforce Helix Core Performance]
* https://portal.perforce.com/s/article/2521[Multiple MaxScanRows and similar values]

.Usage

    /p4/common/bin/update_limits.py
    /p4/common/bin/update_limits.py 1

== Sample Procedures

This section describes sample procedures using the SDP tools described above, given certain scenarios.

=== Installing Python3 and P4Python

Python3 and P4Python are useful for custom automation, including triggers.

Installing Python3 and P4Python is best done using packages. First, set up the machine to download packages from Perforce Software, following the guidance appropriate for your platform on the link:https://package.perforce.com[Perforce Packages] page.

Then install Python3 and P4Python Packages with the command appropriate for your operating system. For RHEL/Rocky Linux family, use:

    sudo yum install perforce-p4python3

For the Debian/Ubuntu family, use:

    sudo apt update
    sudo apt install perforce-p4python3

It is possible to have multiple versions of Python installed, possibly Python 2.7 (the end of the Python 2 line) and various Python 3.x versions, and possibly multiple versions of either or both of Python 2 and Python 3. Whether having multiple versions is desirable or necessary depends on what software on the machine uses Python; that discussion is outside the scope of this document. However, being aware of this possibility is important for installing in various existing environments.

The behaviors of the `perforce-p4python3` package install vary slightly depending on what is already installed, and are optimized to avoid disrupting existing software.

* If no prior version of Python 3 exists on the machine when the `perforce-p4python3` package is installed, then the newly installed Python 3 will be established as the default, such that calling `python3` (a symlink) will implicitly refer to the just-installed Python 3 version. **The P4Python module will be available by calling python3**.
* If Python 3.8 exists on the machine when the `perforce-p4python3` package is installed, P4Python will be added to the existing Python 3.8 install. **The P4Python module will be available by calling python3**.
* If there is already some other version of Python 3.x installed but not 3.8, such as Python 3.6, installing the `perforce-p4python3` package will add a new Python 3.8 installation, callable by its versioned name (e.g. `python3.8`), but it will *not* adjust the existing `python3` symlink. **The P4Python module will *not* be available by calling python3.** You can at that point decide to manually adjust the `python3` symlink to point to `python3.8`, though this has some risk of breaking other things (such as custom triggers) that require the other version of Python3 if it was actively used. Alternately, you can adjust the shebang lines of specific scripts that use P4Python to refer to `python3.8` specifically rather than just `python3`.

In any case, avoid using `python2` or just `python`, both of which by convention refer to Python 2.

=== Installing CheckCaseTrigger.py

This trigger is very useful to avoid people accidentally checking in files on a case-sensitive server which only differ in case from an existing file (or directory).

IMPORTANT: This trigger requires `python3`, and must also have P4Python installed. See: <<_installing_python3_and_p4python>>.
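
Before installing the trigger, it can be useful to confirm which Python 3 interpreter on the machine can actually import P4Python, since `python3` and `python3.8` may refer to different installations. The following is a minimal sketch, not part of the SDP: the helper name `find_python_with_module` and the candidate list are assumptions for illustration. It relies only on the fact that P4Python is imported with `import P4`.

```shell
#!/bin/bash
# Print the first interpreter from the candidate list that can import the given module.
# The helper name and the candidate list are illustrative assumptions, not SDP features.
find_python_with_module() {
    local module="$1"
    shift
    local candidate
    for candidate in "$@"; do
        # Skip names that are not found on PATH.
        command -v "$candidate" >/dev/null 2>&1 || continue
        if "$candidate" -c "import $module" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: look for P4Python, preferring the default python3.
find_python_with_module P4 python3 python3.8 ||
    echo "No listed interpreter can import P4" >&2
```

The interpreter name printed is the one to use in the shebang line of triggers that depend on P4Python.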

The trigger to install is part of the SDP but by default is in `/p4/sdp/Unsupported/Samples/triggers`.

To install:

. Install P4Python. See: <<_installing_python3_and_p4python>>.
. Copy the trigger and its dependencies to the appropriate directory:

    mkdir -p /p4/common/site/bin/triggers
    cp /p4/sdp/Unsupported/Samples/triggers/CheckCaseTrigger.py /p4/common/site/bin/triggers/
    cp /p4/sdp/Unsupported/Samples/triggers/P4Trigger.py /p4/common/site/bin/triggers/

. Edit the `shebang` line (first line) at the start of the trigger if necessary, e.g. change to:

    #!/bin/env python3
+
Usually `python3` is appropriate.
. Test on an existing (small) changelist:

    p4 changes -s submitted -m 9
+
Pick a suitable changelist number, e.g. 1234:

    /p4/common/site/bin/triggers/CheckCaseTrigger.py 1234

. Test that it works:
.. Add an appropriate line to the triggers table:

    CheckCaseTrigger change-submit //test/... "/p4/common/site/bin/triggers/CheckCaseTrigger.py %changelist%"

.. Create a test workspace.
.. Submit a simple `Test.txt`.
.. Attempt to submit `test.txt` and check for an error.
. Change the triggers table entry to the production version/path:

    CheckCaseTrigger change-submit //... "/p4/common/site/bin/triggers/CheckCaseTrigger.py %changelist%"

=== Swarm JIRA Link

Here is an example of linking to cloud JIRA in `config.php`:

    'jira' => array(
        'host' => 'https://example.atlassian.net/',
        'user' => 'p4jira@example.com',
        'password' => '',
        'link_to_jobs' => 'true',
    ),

TIP: No need to get complicated with .pem files or 'http_client_options' section. Just specify `https://` prefix as above.

Log in to the user account at the Atlassian URL above, and then create an API token by going to this URL:

https://id.atlassian.com/manage-profile/security/api-tokens

This curl request tested the API:

    curl https://example.atlassian.net/rest/api/latest/project --user p4jira@example.com:

The above should list all active projects:

.Example JSON response
[source,json]
----
{"expand":"description,lead,issueTypes,url,projectKeys,permissions,insight","self":"https://example.atlassian.net/rest/api/2/project/11904","id":"11904","key":"ULG","name":"Ultimate Game"}
----

IMPORTANT: Check that the provided JIRA account has access to all required projects to be linked (and that it isn't missing some)! See below.

.Example list of projects accessible to JIRA account
[source,shell]
----
$ curl --user 'p4jira@example.com:' https://example.atlassian.net/rest/api/latest/project | jq > projects.txt
$ egrep "name|key" projects.txt
"key": "PRJA",
"name": "Project A",
"key": "PRJB",
"name": "Project B",
----

=== Reseeding an Edge Server

Perforce Helix Edge Servers are a form of replica that replicates "persistent history" data such as submitted changelists from the master server, while maintaining local databases for "work-in-progress" data, to include user workspaces, lists of files checked out in user workspaces, etc. This separation of persistent and work-in-progress data has significant benefits that make edge servers perform optimally for certain use cases.

When a new edge server is deployed for the first time, it is "seeded" with a special seed checkpoint from the master server. This is done using the SDP `edge_dump.sh` script.

Edge servers need to be reseeded in certain circumstances. When an edge server is reseeded, the latest persistent history from the master server is combined with the latest work-in-progress data from the edge server. Some occasions that require reseeding include:

* When changing the scope of replication filtering, i.e.
if the `*DataFilter` fields of the server spec are changed.
* In some recovery situations involving hardware or other infrastructure failure.
* When advised by Perforce Support.

The article link:https://portal.perforce.com/s/article/12127[Edge Server Metadata Recovery] discusses the manual process in detail. The process outlined in this article is implemented in the SDP with two scripts, `edge_dump.sh` and `recover_edge.sh`.

Key aspects of this implementation:

* No downtime is required for the master server process.
* Downtime for the edge to be reseeded is required. This is kept to a minimum.

=== Edge Reseed Scenario

In this sample scenario, an edge server needs to be reseeded.

Sample details about this scenario:

* The SDP instance is `1`.
* The `perforce` operating system user runs the p4d process on all machines.
* The `perforce` user's `~/.bashrc` ensures that the shell environment is set automatically on login, by doing: `source /p4/common/bin/p4_vars 1`
* The master server has a ServerID of `master.1` and runs on the machine `bos-helix-01`.
* The edge server has a ServerID of `p4d_edge_syd` and runs on the machine `syd-helix-04`.
* Both the master and edge server are online and actively in use at the start of processing.
* Users of the edge server to be reseeded have been notified about a planned outage.
* No outage is planned or necessary for the master server.
* SSH keys are set up for the `perforce` user.

==== Step 0: Preflight Checks

Make sure the start state is healthy.

As `perforce@bos-helix-01` (the master):

    verify_sdp.sh 1 -online

As `perforce@syd-helix-04` (the edge):

    verify_sdp.sh 1

==== Step 1: Create New Edge Seed Checkpoint

On the master server, create a new edge seed checkpoint using `edge_dump.sh`. This will contain recent persistent history from the master. This process uses the `offline_db` rather than P4ROOT, so no downtime is needed.

TIP: Creating an edge seed requires that the `offline_db` directory not be interfered with.
The `daily_checkpoint.sh` script runs in the crontab of the `perforce` user on the master, and that script must not be run when `edge_dump.sh` runs. Ensure that `edge_dump.sh` is run at a time when it won't conflict with the operation of `daily_checkpoint.sh`. If checkpoints take many hours, consider disabling the crontab for `daily_checkpoint.sh` by commenting it out of the crontab until `edge_dump.sh` completes -- but don't forget to re-enable it afterward!

Create the edge seed like so, as `perforce@bos-helix-01` (the master):

    nohup /p4/common/bin/p4master_run 1 edge_dump.sh 1 p4d_edge_syd < /dev/null > /p4/1/logs/dump.log 2>&1 &

Then monitor until completion with:

    tail -f $(ls -t $LOGS/edge_dump.*.log | head -1)

The edge seed will appear as a file looking something like:

    /p4/1/checkpoints/p4_1.edge_syd.seed.2035.gz
    /p4/1/checkpoints/p4_1.edge_syd.seed.2035.gz.md5

When the `.md5` file appears, the edge seed checkpoint is complete.

Notes:

* The `nohup` at the beginning of the command and the `&` at the end ensure this process will continue to run even if the terminal window in which the command was executed disconnects.

==== Step 2: Transfer Edge Seed

Transfer the edge seed from the master to the edge like so, as `perforce@bos-helix-01` (the master):

    scp -p /p4/1/checkpoints/p4_1.edge_syd.seed.2035.gz syd-helix-04:/p4/1/checkpoints/.
    scp -p /p4/1/checkpoints/p4_1.edge_syd.seed.2035.gz.md5 syd-helix-04:/p4/1/checkpoints/.

==== Step 3: Reseed the Edge

Reseed the edge. As `perforce@syd-helix-04` (the edge):

    nohup /p4/common/bin/run_if_edge.sh 1 recover_edge.sh 1 /p4/1/checkpoints/p4_1.edge_syd.seed.2035.gz < /dev/null > /p4/1/logs/rec.log 2>&1 &

Notes:

* The `offline_db` of the edge server is removed at the start of processing, but is replaced at the end.
* It is safe for the p4d process of the edge server to be up and running when this process starts. If it is up at the start of processing, it will be shut down by `recover_edge.sh`, but not immediately.
The script allows the p4d service to remain in use while the edge seed checkpoint from the master is replayed into the `offline_db`.
* After the edge seed checkpoint has been replayed, the p4d service is shut down, and then the process of combining persistent and work-in-progress data commences. This is the essence of the reseed operation.
* After the edge reseed is complete, the p4d process is started. It will then start replicating new data from the master since the time of the edge seed checkpoint creation. The p4d service may hang and be unresponsive for several minutes after it is started. If you choose to monitor closely, when a `p4 pull -lj` on the edge indicates it has caught up to the master, the service is safe to use again.
* The `recover_edge.sh` script continues to run after the service is back online, as it rebuilds the `offline_db` of the edge server.
* On the edge server, the edge server's regular checkpoints land in `/p4/1/checkpoints.edge_syd`. The `/p4/1/checkpoints` folder is used only for holding edge seed checkpoints transferred from the master.
* Typically, all steps described in the process are done on the same day. However, it is OK if the `edge_dump.sh`, seed checkpoint transfer, and `recover_edge.sh` steps run with some time lag between them, typically measured in journal rotations or simply days. The lag has an incremental impact on the duration of the recovery step, and is acceptable so long as the edge seed is not so far behind that the master no longer has numbered journals to feed the edge once it starts.

TIP: Reseeding requires that the `offline_db` directory not be interfered with. The `daily_checkpoint.sh` script runs in the crontab of the `perforce` user on the edge server, and that script must not be run when `recover_edge.sh` runs. Ensure that `recover_edge.sh` is run at a time when it won't conflict with the operation of `daily_checkpoint.sh`.
If checkpoints take many hours, consider disabling the crontab for `daily_checkpoint.sh` by commenting it out of the crontab until `recover_edge.sh` completes -- but don't forget to re-enable it afterward!

TIP: This sample procedure does not illustrate using a p4broker service to broadcast a "Down for maintenance" message on the edge server. If your SDP installation uses p4brokers on p4d server machines, they can be used to prevent regular users from attempting to access the edge server during the processing of `recover_edge.sh`. This can help prevent users from experiencing a hang, for example, in the time after the edge p4d process starts but before it catches up to the master.

[appendix]
== SDP Package Contents and Planning

The directory structure of the SDP is shown below in Figure 1 - SDP Package Directory Structure. This includes all SDP files, including documentation and sample scripts. A subset of these files is deployed to server machines during the installation process.

    sdp
        doc
        Server (Core SDP Files)
            Unix
                setup (Unix-specific setup)
                p4
                    common
                        bin (Backup scripts, etc)
                            triggers (Example triggers)
                        config
                        etc
                            cron.d
                            init.d
                            systemd
                        lib
                        test
            setup (cross platform setup - typemap, configure, etc)
            test (automated test scripts)

Figure 1 - SDP Package Directory Structure

=== Volume Layout and Server Planning

Figure 2: SDP Runtime Structure and Volume Layout, viewed from the top down, displays a Perforce _application_ administrator's view of the system, which shows how to navigate the directory structure to find databases, log files, and versioned files in the depots. Viewed from the bottom up, it displays a Perforce _system_ administrator's view, emphasizing the physical volume where Perforce data is stored.

==== Memory and CPU

Make sure the server has enough memory to cache the *db.rev* database file and to prevent the server from paging during user queries.
Maximum performance is obtained if the server has enough memory to keep all of the database files in memory. While the p4d process itself is frugal with system resources such as RAM, it benefits from an excess of RAM due to modern operating systems using excess RAM as file I/O cache. This is to the great benefit of p4d, even though the p4d process itself may not be seen as consuming much RAM directly.

Below are some approximate guidelines for allocating memory.

* 1.5 kilobytes of RAM per file revision stored in the server.
* 32 MB of RAM per user.

INFO: When doing detailed history imports from legacy SCM systems into Perforce, there may be many revisions of files. You want to account for `(total files) x (average number of revisions per file)` rather than simply the total number of files.

Use the fastest processors available with the fastest available bus speed. Faster processors are typically more desirable than a greater number of cores and provide better performance since quick bursts of computational speed are more important to Perforce's performance than the number of processors. Have a minimum of two processors so that the offline checkpoint and backup processes do not interfere with your Perforce server. There are log analysis options to diagnose underperforming servers and improve things. Contact Perforce Support/Perforce Consulting for details.

==== Directory Structure Configuration Script for Linux/Unix

This section describes the steps performed by the mkdirs.sh script on Linux/Unix platforms. Please review this appendix carefully before running these steps manually. Assuming the three-volume configuration described in the Volume Layout and Hardware section is used, the following directories are created. The following examples are illustrated with "1" as the server instance number.

[cols=",",options="header",]
|===
|_Directory_ |_Remarks_
|`/p4` |Must be under root (`/`) on the OS volume
|`/hxdepots/p4/1/bin` |Files in here are generated by the mkdirs.sh script.
|`/hxdepots/p4/1/depots` |
|`/hxdepots/p4/1/tmp` |
|`/hxdepots/p4/common/config` |Contains p4_.vars file, e.g. `p4_1.vars`
|`/hxdepots/p4/common/bin` |Files from `$SDP/Server/Unix/p4/common/bin`.
|`/hxdepots/p4/common/etc` |Contains `init.d` and `cron.d`.
|`/hxlogs/p4/1/logs/old` |
|`/hxmetadata2/p4/1/db2` |Contains offline copy of main server databases (linked by `/p4/1/offline_db`).
|`/hxmetadata1/p4/1/db1/save` |Used only during running of `refresh_P4ROOT_from_offline_db.sh` for extra redundancy.
|===

Next, `mkdirs.sh` creates the following symlinks in the `/hxdepots/p4/1` directory:

[cols=",,",options="header",]
|===
|*_Link source_* |*_Link target_* |*_Command_*
|`/hxmetadata1/p4/1/db1` |`/p4/1/root` |`ln -s /hxmetadata1/p4/1/db1 root`
|`/hxmetadata2/p4/1/db2` |`/p4/1/offline_db` |`ln -s /hxmetadata2/p4/1/db2 offline_db`
|`/hxlogs/p4/1/logs` |`/p4/1/logs` |`ln -s /hxlogs/p4/1/logs`
|===

Then these symlinks are created in the `/p4` directory:

[cols=",,",options="header",]
|===
|*_Link source_* |*_Link target_* |*_Command_*
|`/hxdepots/p4/1` |`/p4/1` |`ln -s /hxdepots/p4/1 /p4/1`
|`/hxdepots/p4/common` |`/p4/common` |`ln -s /hxdepots/p4/common /p4/common`
|===

Next, `mkdirs.sh` renames the Perforce binaries to include version and build number, and then creates appropriate symlinks.

==== P4D versions and links

The versioned binary links in `/p4/common/bin` are as below.

For the example of instance `1` we have:

    ls -l /p4/1/bin
    p4d_1 -> /p4/common/bin/p4d_1_bin

The structure is shown in this example, illustrating values for two instances, with instance #1 using p4d release 2018.1 and instance #2 using release 2018.2.

In `/p4/1/bin`:

    p4_1 -> /p4/common/bin/p4_1_bin
    p4d_1 -> /p4/common/bin/p4d_1_bin

In `/p4/2/bin`:

    p4_2 -> /p4/common/bin/p4_2_bin
    p4d_2 -> /p4/common/bin/p4d_2_bin

In `/p4/common/bin`:

    p4_1_bin -> p4_2018.1_bin
    p4_2018.1_bin -> p4_2018.1.685046
    p4_2018.1.685046
    p4_2_bin -> p4_2018.2_bin
    p4_2018.2_bin -> p4_2018.2.700949
    p4_2018.2.700949
    p4d_1_bin -> p4d_2018.1_bin
    p4d_2018.1_bin -> p4d_2018.1.685046
    p4d_2018.1.685046
    p4d_2_bin -> p4d_2018.2_bin
    p4d_2018.2_bin -> p4d_2018.2.700949
    p4d_2018.2.700949

The naming of the last comes from:

    ./p4d_2018.2.700949 -V
    Rev. P4D/LINUX26X86_64/2018.2/700949 (2019/07/31).

So we see the release and build number (`2018.2.700949`) being included in the name of the p4d executable.

TIP: Although this link structure may appear quite complex, it is easy to understand, and it allows different instances on the same server host to be running with different patch levels, or indeed different releases. You can also upgrade those instances independently of each other, which can be very useful.

==== Case Insensitive P4D on Unix

By default `p4d` is case sensitive on Unix for filenames and directory names etc.

It is possible and quite common to run your server in case insensitive mode. This is often done when Windows is the main operating system in use on the client host machines.

IMPORTANT: In "case insensitive" mode, you should ALWAYS execute `p4d` with the flag `-C1` (or you risk possible table corruption in some circumstances).

The SDP achieves this by executing a simple Bash script which (for instance `1`) is `/p4/1/bin/p4d_1` with contents:

    #!/bin/bash
    P4D="/p4/common/bin/p4d_1_bin"
    exec $P4D -C1 "$@"

So the above will ensure that `/p4/common/bin/p4d_1_bin` (for instance `1`) is executed with the `-C1` flag.
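
To check which of the two arrangements an existing instance uses, the launcher file can be inspected directly. The following is a minimal sketch based only on the conventions described in this section (a symlink for case sensitive instances, a wrapper script passing `-C1` for case insensitive ones). The function name `p4d_case_mode` is a hypothetical helper, not an SDP script.

```shell
#!/bin/bash
# Report how an instance's p4d launcher is set up.
# p4d_case_mode is a hypothetical helper name used for illustration only.
p4d_case_mode() {
    local launcher="$1"
    if [ -L "$launcher" ]; then
        # A symlink to the versioned binary: no -C1 flag is added.
        echo "case-sensitive (symlink)"
    elif [ -f "$launcher" ] && grep -q -- '-C1' "$launcher"; then
        # A regular file that mentions -C1: treated as the wrapper script.
        echo "case-insensitive (wrapper with -C1)"
    else
        echo "unknown"
        return 1
    fi
}

# Example for instance 1:
p4d_case_mode /p4/1/bin/p4d_1 || true
```

This is only a file-level check; it does not query a running server.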

As noted above, for case sensitive servers, `p4d_1` is normally just a link:

    /p4/1/bin/p4d_1 -> /p4/common/bin/p4d_1_bin

Note for an instance `alpha` (not `1`), the file would be `/p4/alpha/bin/p4d_alpha` with contents:

    #!/bin/bash
    P4D="/p4/common/bin/p4d_alpha_bin"
    exec $P4D -C1 "$@"

[appendix]
== The journalPrefix Standard

The Perforce Helix configurable https://www.perforce.com/manuals/cmdref/Content/CmdRef/configurables.configurables.html#journalPrefix[`journalPrefix`] determines where the active journal is rotated to when it becomes a numbered journal file during the journal rotation process. It also defines where checkpoints are created.

In the SDP structure, the `journalPrefix` is set so that numbered journals and checkpoints land on the `/hxdepots` volume. This volume contains critical digital assets that should be reliably backed up and should have sufficient storage for large digital assets such as checkpoints.

=== SDP Scripts that set `journalPrefix`

The SDP `configure_new_server.sh`, which applies SDP standards to fresh new `p4d` servers, sets the `journalPrefix` for the master server according to this standard.

The SDP `mkrep.sh` script, which creates new replicas, sets `journalPrefix` for replicas according to this standard.

The SDP `mkdirs.sh` script, which initializes the SDP structure, creates a directory structure for checkpoints based on the `journalPrefix`.

=== First Form of `journalPrefix` Value

The first form of the `journalPrefix` value applies to the master server's metadata set. This value is of this form, where `N` is replaced with the SDP instance name:

    /p4/N/checkpoints/p4_N

If the SDP instance name is the default `1`, then files with a `p4_1` prefix would be stored in the `/p4/1/checkpoints` directory on the filesystem. Journal files in that directory would have names like `p4_1.jnl.320` and checkpoints would have names like `p4_1.ckp.320.gz`.
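
The naming pattern just described can be sketched in a few lines of shell. This is an illustration of the first form only, assuming the `.jnl.NUMBER` and `.ckp.NUMBER.gz` suffixes shown in the examples above. The function names are hypothetical and are not part of the SDP.

```shell
#!/bin/bash
# Derive the first-form journalPrefix and example file names for an SDP instance.
# All function names here are illustrative, not SDP commands.
journal_prefix() {      # usage: journal_prefix INSTANCE
    echo "/p4/$1/checkpoints/p4_$1"
}

numbered_journal() {    # usage: numbered_journal INSTANCE NUMBER
    echo "$(journal_prefix "$1").jnl.$2"
}

checkpoint_file() {     # usage: checkpoint_file INSTANCE NUMBER
    echo "$(journal_prefix "$1").ckp.$2.gz"
}

# Example for the default instance and journal number 320:
journal_prefix 1
numbered_journal 1 320
checkpoint_file 1 320
```

For instance `1`, the three calls print the prefix `/p4/1/checkpoints/p4_1` followed by the full paths of the example journal and checkpoint files named above.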

This `journalPrefix` value and the corresponding `/p4/1/checkpoints` directory should be used for the master server. It should also be used for any replica that is a valid failover target for the master server. This includes all _completely unfiltered_ replicas of the master, such as `standby` and `forwarding-standby` replicas with a `P4TARGET` value referencing the master server.

NOTE: A `standby` replica, also referred to as a `journalcopy` replica due to the underlying replication mechanisms, cannot be filtered. Standby replicas are commonly deployed for High Availability (HA) and Disaster Recovery (DR) purposes.

==== Detail on "Completely Unfiltered"

A "completely unfiltered" replica is one in which:

* None of the `*DataFilter` fields in the replica's server spec are used.
* The `p4 pull` command configured to pull metadata from the replica's `P4TARGET` server, as defined in the replica's `startup._N_` configurable, does not use filtering options such as `-T`.
* The replica is not an Edge server (i.e. one with a `Services` value in the server spec of `edge-server`). Edge servers are filtered by their very nature, as they exclude various database tables from being replicated.
* The replica's seed checkpoint was created without the `-P _ServerID_` flag to `p4d`. The `-P` flag is used when creating seed checkpoints for filtered replicas and edge servers.
* The replica's `P4TARGET` server is the master server, rather than something other than the master server, such as an edge server.

=== Second Form of `journalPrefix` Value

A second form of the `journalPrefix` is used when the replica is filtered, including edge servers. The second form of the `journalPrefix` value incorporates a shortened form of the _ServerID_ to indicate that the data set is specific to that _ServerID_.
Because the metadata differs from the master, checkpoints for edge servers and filtered replicas are stored in a different directory, and use a prefix that identifies them as separate and divergent from the master's data set. This second form allows checkpoints from multiple edge servers or filtered replicas to be stored on a shared (e.g. NFS-mounted) `/hxdepots` volume. The second form of `journalPrefix` is also used if the `/hxdepots` volume, on which checkpoints are stored, is shared (as indicated when the replica's `lbr.replication` value is set to a value of `shared`).

NOTE: Filtered replicas are a strict subset of the master server's metadata. Edge servers filter some database tables from the master, but also have their own independent metadata (mainly workspace metadata) that varies from the master server and is potentially larger than the master's data set for some tables.

The "shortened form" of the _ServerID_ removes the `p4d_` prefix (per <<_server_spec_naming_standard>>). So, for example, an edge server with a _ServerID_ of `p4d_edge_uk` would use just the `edge_uk` portion of the _ServerID_ in the `journalPrefix`, which would look like:

    /p4/N/checkpoints.edge_uk/p4_N.edge_uk

If the SDP instance name is the default `1`, then files with a `p4_1.edge_uk` prefix would be stored in the `/p4/1/checkpoints.edge_uk` directory on the filesystem. Journal files in that directory would have names like `p4_1.edge_uk.jnl.320` and checkpoints would have names like `p4_1.edge_uk.ckp.320.gz`.

=== Scripts for Maintaining the `offline_db`

The following SDP scripts help maintain the `offline_db`:

* `daily_checkpoint.sh`: The `daily_checkpoint.sh` is used on the master server. When run on the master server, this script rotates the active journal to a numbered journal file, and then maintains the master's `offline_db` using the numbered journal file immediately after it is rotated. The `daily_checkpoint.sh` is also used on edge servers and filtered replicas.
When run on edge servers and filtered replicas, this script maintains the replica's `offline_db` in a manner similar to the master, except that the journal rotation is skipped (as that can be done only on the master).
* `sync_replica.sh`: The SDP `sync_replica.sh` script is intended to be deployed on unfiltered replicas of the master. It maintains the `offline_db` by copying (via rsync) the checkpoints from the master, and then replays those checkpoints to the local `offline_db`. This keeps the `offline_db` of the replica current, which is good to have should the replica ever need to take over for the master.

INFO: For HA/DR and any purpose where replicas are not filtered, replicas of type `standby` and `forwarding-standby` should displace replicas of type `replica` and `forwarding-replica`.

=== SDP Structure and `journalPrefix`

On every server machine with the SDP structure where a `p4d` service runs (excluding broker-only and proxy-only hosts), a structure like the following should exist for each instance:

* A `/hxdepots/p4/N/checkpoints` directory
* In `/p4/N`, a symlink `checkpoints` that links to `/hxdepots/p4/N/checkpoints`, such that it can be referred to as `/p4/N/checkpoints`.

In addition, edge servers and filtered replicas will also have a structure like the following for each instance that runs an edge server or filtered replica:

* A `/hxdepots/p4/N/checkpoints.ShortServerID` directory
* In `/p4/N`, a symlink `checkpoints.ShortServerID` that links to `/hxdepots/p4/N/checkpoints.ShortServerID`, such that it can be referred to as `/p4/N/checkpoints.ShortServerID`.

The SDP `mkdirs.sh` script, which sets up the initial SDP structure, initializes this structure on initial install.

=== Replicas of Edge Servers

As edge servers have unique data, they are commonly deployed with their own `standby` replica with a `P4TARGET` value referencing a given edge server rather than the master. This enables a faster recovery option for the edge server.

As a special case, a `standby` replica of an edge server should have the same `journalPrefix` value as the edge server it targets. Thus, the _ServerID_ baked into the `journalPrefix` of a replica of an edge is the ServerID of the target edge server, not the replica.

So for example, an edge server with a _ServerID_ of `p4d_edge_uk` has a `standby` replica with a _ServerID_ of `p4d_ha_edge_uk`. The `journalPrefix` of that replica should be the same as the edge server it targets, e.g.

    /p4/1/checkpoints.edge_uk/p4_1.edge_uk

=== Goals of the `journalPrefix` Standard

Some design goals of this standard:

* Make it so the `/p4/N/checkpoints` folder is reserved to mean checkpoints created from the master server's full metadata set.
* Make the `/p4/N/checkpoints` folder be safe to rsync from the master to any machine in the topology (as may be needed in certain recovery situations for replicas and edge servers).
* Make it so the SDP `/hxdepots` volume can be NFS-mounted across multiple SDP machines safely, such that two or more edge servers (or filtered replicas) could share versioned files, while writing to separate checkpoints directories on a per-ServerID basis.
* Support all replication use cases, including support for 'Workspace Servers', a name referring to a set of edge servers deployed in the same location, typically sharing `/hxdepots` via NFS. Workspace Servers can be used to scale Helix Core horizontally for massive user bases (typically several thousand users).

[appendix]
== Server Spec Naming Standard

Perforce Helix server specs identify various Helix servers in a topology. Servers can be p4d servers (master, replicas, edges), p4broker, p4p, etc. This appendix defines the standard for server spec names.

=== General Form

The general form of a server spec name is:

```
_[]_
```

==== Helix Server Tags

The _HelixServerTag_ is one of:

* `p4d`: for a Helix Core server (including all https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/deployment-architecture.html[distributed architecture] usages such as master/replica/edge).
* `p4broker`: A https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/chapter.broker.html[Helix Broker]
* `p4p`: A https://www.perforce.com/perforce/doc.current/manuals/p4sag/Content/P4SAG/chapter.proxy.html[Helix Proxy]
* `gconn`: Helix4Git (H4G) Connector
* `swarm`: Helix Swarm

As a special case, the _HelixServerTag_ is omitted for the ServerID of the master server spec.

==== Replica Type Tags

The _ReplicaType_ is one of:

* `master.`: The single master-commit server for a given SDP instance. SDP instance names are included in the ServerID for the master, as they are intended to be unique within an enterprise. They must be unique to enable certain cross-instance sharing workflows, e.g. using remote depots and Helix native DVCS features.
* `ha`: High Availability. This indicates a replica that was specifically intended for HA purposes and for use with the `p4 failover` command. It further implies the following:
- The Services field value is `standby`.
- The `rpl.journalcopy.location=1` configurable is set, optimized for SDP deployment.
- The replica is not filtered in any way: No usage of the `-T` flag to `p4 pull` in the replica's startup._N_ configurables, and no usage of `*DataFilter` fields in the server spec.
- Versioned files are replicated (with an `lbr.replication` value of `readonly`).
- An HA replica is assumed to be geographically near its P4TARGET server, which can be a master server or an edge server.
- It may or may not use the `mandatory` option in the server spec.
The `ha` tag does not indicate whether the `mandatory` option is used (as this is a more transient thing not suitable for baking into a server spec naming standard).
* `ham`: A `ham` replica is the same as an `ha` replica except it does not replicate versioned files. Thus it is a _metadata-only_ replica that shares versioned files with its P4TARGET server (master or edge) with an `lbr.replication` value of `shared`.
* `fr`: Forwarding Replica (unfiltered) that replicates versioned files.
* `frm`: Forwarding replica (unfiltered) that shares versioned files with its target server rather than replicating them.
* `fs`: Forwarding Standby (unfiltered) that replicates versioned files. This is the same as an `ha` server, except that it is not necessarily expected to be physically near its P4TARGET server. This could be suited for Disaster Recovery (DR) purposes.
* `fsm`: Forwarding standby (unfiltered) that shares versioned files with its target server rather than replicating them. This is the same as a `ham`, except that it is not necessarily expected to be physically near its P4TARGET server.
* `ffr`: Filtered Forwarding Replica. This replica uses some form of filtering, such as usage of `*DataFilter` fields of the server spec or the `-T` flag to `p4 pull` in the replica's `startup.` configurables. Filtered replicas are not viable failover targets, as the filtered data would be lost.
* `ro`: Read Only replica (unfiltered, replicating versioned files).
* `rom`: Read Only metadata-only replica (unfiltered, sharing versioned files).
* `edge`: Edge servers. (As edge servers are filtered by their nature, they are not valid failover targets.)

===== Replication Notes

If a replica does not need to be filtered, we recommend using `journalcopy` replication, i.e. using a replica with a `Services:` field value of `standby` or `forwarding-standby`. Only use non-journalcopy replication when using filtered replicas (and edge servers where there is no choice).

Some general tips:

* The `ha` and `ham` replicas are preferred for High Availability (HA) usage.
* The `fs` and `ro` replicas are preferred for Disaster Recovery (DR) usage.
* Since DR implies the replica is far from its master, sharing archives (e.g. via NFS) rather than replicating them may not be practical, and so `rom` replicas don't have common use cases.
* The `fr` type replica is obsolete, and should be replaced with `fs` (using `journalcopy` replication).

==== Site Tags

The site tag needs to distinguish the data centers used by a single enterprise, and so generally short tag names are appropriate. See <<_sitetags_cfg>>

Each site tag may be understood to be a true data center (Tier 1, Tier 2, etc.), a computer room, computer closet, or reserved space under a developer's desk.

In some cases organizations will already have their own familiar site tags to refer to different sites or data centers; these can be used.

In public cloud deployments, the public cloud provider's region names can be used (e.g. `us-east-1`), or an internal short form (e.g. `awsnva1` for the AWS us-east-1 data center in Northern Virginia, USA).

As a special case, the site tag is omitted for the master server spec.

=== Example Server Specs

Here are some sample server spec names based on this convention:

* `master.1`: A master server for SDP instance 1.
* `p4d_ha_chi`: A High Availability (HA) server, suitable for use with `p4 failover`, located in Chicago, IL.
* `p4d_ha2_chi`: A second High Availability server, suitable for use with `p4 failover`, located in Chicago, IL.
* `p4d_ffr_pune`: A filtered forwarding replica in Pune, India.
* `p4d_edge_blr`: An edge server located in Bangalore, India.
* `p4d_ha_edge_blr`: An HA server with P4TARGET pointing to the edge server in Bangalore, India.
* `p4d_edge3_awsnva`: A 3rd edge server in AWS data center in the us-east-1 (Northern Virginia) region.
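
The sample names above can be split into their tags with plain shell parameter expansion. The following is a rough sketch, not an SDP tool. It assumes the layout visible in the examples: a server tag, then a replica type with an optional trailing number, then the remainder, with `master.` names handled separately. The function name `parse_server_id` and the output format are illustrative, and treating a missing number as `1` is an assumption. For a name such as `p4d_ha_edge_blr`, the remainder is reported whole (`edge_blr`) rather than interpreted further.

```shell
#!/bin/bash
# Split a ServerID into its tags, following the pattern seen in the sample names.
# parse_server_id and its output format are hypothetical, for illustration only.
parse_server_id() {
    local id="$1" server remainder type_and_number type number rest
    case "$id" in
        master.*)
            # The master spec omits the server tag and the site tag.
            echo "type=master instance=${id#master.}"
            return 0 ;;
        *_*_*) ;;   # at least three underscore-separated fields: keep going
        *)
            echo "unrecognized: $id" >&2
            return 1 ;;
    esac
    server="${id%%_*}"                    # text before the first underscore
    remainder="${id#*_}"                  # everything after the first underscore
    type_and_number="${remainder%%_*}"    # e.g. ha, ha2, edge3
    rest="${remainder#*_}"                # site tag, or target plus site tag
    type="${type_and_number%%[0-9]*}"     # strip the trailing number, if any
    number="${type_and_number#"$type"}"
    echo "server=$server type=$type number=${number:-1} rest=$rest"
}

# Examples using names from the list above:
parse_server_id p4d_ha2_chi
parse_server_id p4d_ha_edge_blr
parse_server_id master.1
```

A check like this could be used to flag ServerIDs that do not follow the standard before they are passed to other tooling.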

=== Implications of Replication Filtering

Replicas that are filtered in any way are not viable candidate servers to failover to, because any filtered data would be lost.

=== Other Replica Types

The naming convention intentionally does not account for all possible server specs available with p4d. The standard accounts only for the distilled list of server spec types supported by the SDP `mkrep.sh` script, which are the most useful and commonly used ones.

=== The SDP `mkrep.sh` script

The SDP script `mkrep.sh` adheres to this standard. For more information on creating replicas with this script, see <<_using_mkrep_sh>>.

[appendix]
== Frequently Asked Questions

This FAQ lists common questions about the SDP with answers.

=== How do I tell what version of the SDP I have?

First, try the standard check. See: <<_checking_the_sdp_version>>.

If that does not display the SDP version, as may happen with older SDP installations, run the SDP Health Check, which will report the correct version reliably. See: <<_sdp_health_checks>>.

=== How do I change super user password?

There are two critical accounts to be aware of:

* The UNIX/Linux operating system user account with a password managed by the operating system of the machine, referred to as the OSUSER.
* The Perforce application super user with a password in the Perforce database. The SDP standard shell environment sets P4USER to refer to the super user.

The user account name `perforce` is the default for both OSUSER and P4USER, but they can have different values. The OSUSER applies to the server machine, while the P4USER can vary on a per-instance basis.

TIP: Some admins choose to use the same password for the `perforce` OSUSER and P4USER (for convenience and to reduce confusion), and then do routine rotations of both passwords (for enhanced security).

TIP: The Perforce application super user should always use Perforce password management, even if other accounts are configured to use LDAP, SSO, or some other authentication method.

To change the OSUSER password, use your standard operating system commands. This may be the `passwd` command, but may be different depending on your operating system and other factors.

The following describes how to change the Perforce application super user password.

Step 1. Get a maintenance window.

Plan to do this work in a maintenance window. The procedure can cause disruption if any triggers or extensions rely on a valid ticket for your application super user. Also, much automation, such as the SDP `daily_checkpoint.sh` script, relies on having a valid ticket.

TIP: If you are fully aware of all the ways the password is used and thus the potential impacts, you can do the work outside of a maintenance window. Changing the password can disrupt triggers, extensions, and various automation, but will not have any impact on the Helix Core service itself.

Step 2. Pick a password.

Select your new password. Depending on your local policy, you may manually create a password, generate one, and possibly store it in a vault of some kind.

Step 3. Login as the OSUSER.

Login as the OSUSER (e.g. `perforce`), and ensure the standard SDP shell environment is set.

TIP: If the OSUSER shell environment files `~/.bash_profile` and `~/.bashrc` are set correctly, this step is done just by logging into the `perforce` OSUSER account.

Step 4. Get the current password from the admin password file.

The shell variable $SDP_ADMIN_PASSWORD_FILE contains the path to the password file for the current instance, something like `/p4/common/config/.p4passwd.p4_N.admin`. Do:

  cat $SDP_ADMIN_PASSWORD_FILE

Take note of the current/old password.

Step 5. Put the new password in the admin password file.

Step 6. Do:

  p4 passwd

Provide the old and new password as prompted.

Step 7. Call the `p4login` script to exercise the new password file:

  p4login -v

Confirm you have a valid ticket afterward with:

  p4 login -s

Step 8. Copy the password file to any and all replica and edge server machines.

Step 9. On each replica and edge, login as `perforce` and also do `p4login -v` and `p4 login -s`.

=== Can I remove the perforce user?

No. This account is required for critical operations like checkpoints for backup.

TIP: This account need not occupy a licensed seat. Once a Helix Core server becomes licensed, you can fill out the link:https://www.perforce.com/support/vcs/helix-core-request-background-user[Helix Core Request for Background User] form to request up to 3 "background users" to support background automation tasks. This accounts for the `perforce` super user, a `swarm` user, and typically one named something like `builder` for automated builds.

=== Can I clone a VM to create a standby replica?

Yes, cloning a virtual machine (VM) of a Helix Core commit server is a great way to simplify the process of creating a standby replica of the commit server. Similarly, cloning an edge server is useful in creating a standby replica of the edge.

Cloning can be done with various technologies and in cloud and on-prem environments. For example, in AWS, creating an AMI of an EC2 instance (i.e. a virtual machine) is just different terminology for creating a clone of the virtual machine. Azure, GCP, and other clouds have similar concepts and capabilities, as do on-prem virtual infrastructure such as VMware ESX servers. Tools even exist for cloning non-virtual, bare metal server machines.

Nothing needs to change other than the `server.id` file whether the machine you're cloning is a commit server (to make a standby of the commit) or an edge (to make a standby of the edge).
There is a slight SDP structure difference between a commit and an edge -- an edge will have a `/hxdepots/p4/N/checkpoints.edge_SITE` directory and a `/p4/N/checkpoints.edge_SITE` symlink to it. As long as you clone the machine that you're making a standby of, be it commit or edge, you'll have the correct structure on the standby.

While nothing should need to change, there are a few things to double check before initiating the cloning process:

* Check that the SDP Instance Vars file, `/p4/common/config/p4_N.vars`, has correct values for **P4MASTERHOST** and **P4MASTER_ID**.
* The **P4MASTER_ID** must be the `server.id` of the commit server, always, and that will be the same regardless of what machine you're on. The **P4MASTERHOST** should be a DNS name for the commit server that works -- i.e. that is valid to reference from the standby server after cloning. Using the same DNS name used by regular users is preferred -- it can be an FQDN or a short name depending on how DNS is set up locally. If DNS isn't available in the server environment (as is sometimes the case), Plan B for setting **P4MASTERHOST** is to still use the same DNS name that users know, but to add an `/etc/hosts` entry ("hack?") on the standby server machine after cloning so that the DNS name works on the standby to reference the commit server. Plan C, which we strongly advise against but do support, is to use an IP address for the **P4MASTERHOST** value. Plan A (a DNS name that simply works) is preferred because Plans B and C require the admin who executes failover to be aware of the "hacks" -- the `/etc/hosts` entry or the IP address -- and to account for them in the failover procedure.

The general idea is that the `/p4/common` structure in the SDP should be _common_ across all Helix Core server machines in your fleet. Even on the standby replica, the **P4MASTER_ID** and **P4MASTERHOST** values should be exactly the same as on the commit.

Cloning the machine is the best way to do it.
It's also nice to have a reasonably current set of archives, and nice to ensure all those little SDP config bits are correct.

Here is a sample procedure for cloning a machine to create a standby replica.

Step 1. Verify **P4MASTER_ID** and **P4MASTERHOST** settings are correct.

Step 2. Use `mkrep.sh` to create your standby server. See: <<_using_mkrep_sh>>.

Step 3. Run `p4 admin journal`. (Digression: Use the `p4 admin journal` command if you're creating a standby or unfiltered edge or replica, but use the `rotate_journal.sh` script instead if you're creating a filtered edge or filtered forwarding replica, where _filtered_ here means using the `*DataFilter` fields in the server spec and/or using the `-T` option to the configured `startup.N` thread that does the metadata pull for the ServerID of the new server.)

Step 4. Clone the VM.

Step 5. Start the new VM after the cloning operation. For example, if in AWS, launch an EC2 instance from the AMI.

Step 6. Stop the p4d_N (and p4broker_N) services if running.

Step 7. Use `hostname -I` to get the local/private IP, and request a new license file for that IP -- but don't wait for it.

Step 8. Remove the `$P4ROOT/license` file.

Step 9. Remove the `$P4ROOT/server.id` file.

Step 10. Load the latest checkpoint and numbered journal, and then pull recent archives, e.g. with a command like this sample:

  nohup load_checkpoint.sh /p4/1/checkpoints/p4_1.ckp.50.gz /p4/1/checkpoints/p4_1.jnl.50 -s p4d_ha_bos -l -r -b -y -verify default < /dev/null > /p4/1/logs/load.log 2>&1 &

That `load_checkpoint.sh` command does the rest.
It stops p4d and p4broker services (just in case you forgot), clears P4ROOT, moves P4LOG and P4JOURNAL aside if they exist (which they would after a cloning situation), puts the new correct `server.id` file in place, reloads from the latest checkpoint and numbered journal (that are sure to have the very latest data due to the `p4 admin journal` done above just before the cloning), does a `p4d -xu` (just in case it's needed, but shouldn't be in this situation), starts the service, and then kicks off a `p4 verify -t` command on all depots to pull over any missing files from the commit.

TIP: The above procedure is merely a sample. Certain details, such as the handling of license files, may vary from one site to another.

[appendix]
== Troubleshooting Guide

This appendix lists problems sometimes encountered by SDP users, with guidance on how to analyze and resolve each issue. Do not hesitate to contact consulting@perforce.com if additional assistance is required.

=== Daily_checkpoint.sh fails

. Check the output of the log file and look for errors:

  less /p4/1/logs/checkpoint.log

Possibilities include:

* Errors from `verify_sdp.sh` - should be self explanatory.
** Note that it is possible to edit `/p4/common/config/p4_1.vars` and set the value of `VERIFY_SDP_SKIP_TEST_LIST` to include any tests you consider should be skipped - don't overdo this!
* See next section

==== Last checkpoint not complete. Check the backup process or contact support.

If this error occurs, it means the script has found a "semaphore" file which is used to prevent multiple checkpoints running at the same time. This file is (for instance 1) `/p4/1/logs/ckp_running.txt`.

Check if there is a current process running:

  ps aux | grep daily_checkpoint

IMPORTANT: If you are CERTAIN that there is no checkpoint process running, then you can delete this file and re-run `daily_checkpoint.sh` (or allow it to be run via nightly crontab). If in doubt, contact support!
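
The semaphore check above can be wrapped in a small script so the file is never removed by reflex. This is a minimal sketch, not part of the SDP: the function name `semaphore_status` is hypothetical, it assumes `pgrep` is available, and it only reports what it finds, leaving the decision to delete the file to the administrator.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SDP): classify the checkpoint
# semaphore file as absent, in use, or apparently stale.

# Usage: semaphore_status <semaphore_file> <process_pattern>
# Prints one of: absent, running, stale.
semaphore_status() {
    local sem_file="$1" pattern="$2"
    if [[ ! -e "$sem_file" ]]; then
        echo "absent"
    elif pgrep -f "$pattern" > /dev/null 2>&1; then
        # Some process command line matches the pattern. Caution: this also
        # matches unrelated commands whose command line contains the pattern.
        echo "running"
    else
        echo "stale"
    fi
}

# Instance 1 path as used in this guide.
status="$(semaphore_status /p4/1/logs/ckp_running.txt daily_checkpoint)"
case "$status" in
    absent)  echo "No semaphore file; nothing to clean up." ;;
    running) echo "A checkpoint process appears to be running; leave the file alone." ;;
    stale)   echo "Semaphore file exists but no checkpoint process was found; review before deleting." ;;
esac
```

A `stale` result is a prompt to look closer, not proof: checkpoint work may be running under a command name the pattern does not match.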

=== Replication appears to be stalled

This can happen for a variety of reasons, most commonly:

* Service user is not logged in to the parent
** Or there is a problem with ticket or ticket location
* Configurables are incorrect (`p4 configure show allservers`)
* Network connectivity to upstream parent
* A problem with state file

. Check the output of `p4 pull -lj`, e.g. this shows all is working well:

  $ p4 pull -lj
  Current replica journal state is: Journal 1237, Sequence 2680510310.
  Current master journal state is: Journal 1237, Sequence 2680510310.
  The statefile was last modified at: 2022/03/29 14:15:16.
  The replica server time is currently: 2022/03/29 14:15:18 +0000 GMT

==== Resolution

. This example shows a password error for the service user:

  $ p4 pull -lj
  Perforce password (P4PASSWD) invalid or unset.
  Perforce password (P4PASSWD) invalid or unset.
  Current replica journal state is: Journal 1237, Sequence 2568249374.
  Current master journal state is: Journal 1237, Sequence -1.
  Current master journal state is: Journal 0, Sequence -1.
  The statefile was last modified at: 2022/03/29 13:05:46.
  The replica server time is currently: 2022/03/29 14:13:21 +0000 GMT

.. In case of a password error, try logging in again:

  p4login -v 1 -service
  p4 pull -lj

.. If the above reports an error, then copy and paste the command it shows as executing and try it manually, for example (adjust the server/user ids):

  /p4/1/bin/p4_1 -p p4master:1664 -u p4admin -s login svc_p4d_edge_ldn

If the above is not successful:

[start=3]
. Review output of `verify_sdp.sh`:

  /p4/common/bin/verify_sdp.sh 1

.. Check for errors in the resulting log file:

  grep Error /p4/1/logs/verify_sdp.log

. Check for errors in the p4d log file:

  grep -A4 error: /p4/1/logs/log | less

. Check permissions on the tickets file (env var `$P4TICKETS`):

  ls -al $P4TICKETS
+
e.g.

  ls -al /p4/1/.p4tickets

==== Make Errors Visible

If the above doesn't help, then make errors visible/easy to find, assuming instance *1* - run this *on the replica (not commit!)*:

  sudo systemctl stop p4d_1
  cd /p4/1/logs
  mv log log.old
  sudo systemctl start p4d_1
  grep -A4 error: log | less

Because the log file is now short, any errors should be easy to find. Ask for help (email `support-helix-core@perforce.com`) if the cause is not obvious.

==== Remove state file

Files `state` and `statejcopy` can usually be removed - let the server work out its current state.

If you want to know the current journal counter for the replica:

  p4d -r /p4/1/root -k db.counters -jd - 2>/dev/null | grep @journal@ | cut -d '@' -f 8

If there is a problem with being able to pull over an old journal which no longer exists on the master, you may need to reseed the replica!

  sudo systemctl stop p4d_1
  cd /p4/1/root
  mv state* save/
  cd /p4/1/logs
  [[ -d save ]] || mkdir save # Create if doesn't exist
  mv journal* save/
  sudo systemctl start p4d_1

=== Archive pull queue appears to be stalled

This manifests as the output of `p4 pull -ls` showing an unchanging number of files in the queue - no progress is being made.

  $ p4 pull -ls
  File transfers: 3 active/29 total, bytes: 2338 active/25579 total.
  Oldest change with at least one pending file transfer: 1234.

This can happen for a variety of reasons, most commonly:

* Non-existent (purged) files (where filetype includes +Sn - where n is number of revisions to keep contents for)
* Non-existent (shelved) files
* Non-existent files with verify problem on master server
* Temporary file transfer problems which exceeded thresholds for auto-retry

==== Resolutions

. Retry pull errors
+
[source]
----
p4 pull -R
p4 pull -ls
----

. If the above doesn't fix things then we can check for errors:

  p4 pull -l | grep -c failed

. If the above is > 0 then we need to investigate in more detail.
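
The "unchanging number of files in the queue" symptom can be checked by comparing two samples of `p4 pull -ls` taken some minutes apart. The following is a minimal sketch under the assumption that the output has the format shown in the sample above; the function names `pull_queue_total` and `queue_looks_stalled` are hypothetical and not part of the SDP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the SDP) for spotting a stalled
# archive pull queue from `p4 pull -ls` output.

# Read `p4 pull -ls` output on stdin and print the total number of queued
# file transfers (the number before "total" on the "File transfers:" line).
pull_queue_total() {
    sed -n 's|^File transfers: [0-9]* active/\([0-9]*\) total,.*|\1|p'
}

# Succeed (exit 0) if two samples show the same non-zero total, which
# suggests that no progress is being made.
queue_looks_stalled() {
    local first="$1" second="$2"
    [[ "$first" -gt 0 && "$first" -eq "$second" ]]
}

# Demonstration using the sample output shown in this section.
sample='File transfers: 3 active/29 total, bytes: 2338 active/25579 total.
Oldest change with at least one pending file transfer: 1234.'

total="$(printf '%s\n' "$sample" | pull_queue_total)"
echo "Pending file transfers: $total"

# In real use, take two samples some minutes apart, for example:
#   first="$(p4 pull -ls | pull_queue_total)"; sleep 300
#   second="$(p4 pull -ls | pull_queue_total)"
#   queue_looks_stalled "$first" "$second" && echo "Queue may be stalled"
```

Identical totals are only a hint. A busy server can queue new transfers as fast as old ones complete, so confirm with `p4 pull -l` before acting.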

===== Remove and re-queue

Save the list of files with errors to a file, in a form that allows for spaces in filenames, and then remove those entries from the queue:

  p4 -F "%rev% %file%" pull -l > pull.errs
  cat pull.errs | while read -e r f; do p4 pull -d -r $r -f "$f"; done

Finally we can “re-queue” any for re-transfer (note this can take a while for files with many revs):

  cut -d' ' -f 2- pull.errs | sort | uniq | while read -e f; do echo "$f" && p4 verify -qt --only MISSING "$f"; done

TIP: The `--only MISSING` option requires `p4d` version >= 2021.1 and is much faster - just remove that option with older versions of `p4d`.

Then have another look:

  p4 pull -l

===== Check for verify errors on the parent server

On the parent server, check the most recent `p4verify.log` file (typically runs Saturday morning via crontab).

Cross-check any entries in `pull.errs` above - if they are also verify errors on the parent server then you need to resolve that. Consider contacting support-helix-core@perforce.com if you need help. Resolutions may include obliterating lost revisions, or attempting to restore from backup.

=== Can't login to edge server

This can happen if the edge server replication has stalled as above.

==== Resolution

* Try the resolution steps for <<_replication_appears_to_be_stalled>>
* Restart edge server
* Monitor replication and check for any errors

=== Updating offline_db for an edge server

If your `daily_checkpoint.sh` jobs on the edge server are failing due to a problem with the `offline_db` or missing edge journals, AND the edge server is otherwise running fine, then consider this option.

IMPORTANT: Checkpointing the edge will take some time during which the edge will be locked! Schedule this for a convenient time!

==== Resolution

Assuming instance 1:

* ON EDGE SERVER:

  source /p4/common/bin/p4_vars 1
  p4 admin checkpoint -Z

* ON COMMIT SERVER (and at a convenient time to lock edge):

  source /p4/common/bin/p4_vars 1
  p4 admin journal

* Monitor edge server checkpoint being created (on EDGE SERVER):

  p4 configure show journalPrefix
+
Using the output shown by the above command:

  ls -lhtr /p4/1/checkpoints./*.ckp.*
+
Also you can check for edge being locked (the following may hang):

  p4 monitor show -al

* Then replay the journal on the edge server to the `offline_db`:

  cd /p4/1/offline_db
  mv db.* save/
  nohup /p4/1/bin/p4d_1 -r . -jr /p4/1/checkpoints./p4_1.ckp.NNNN.gz > rec.out &
+
When the above has completed, mark as usable by creating semaphore file:

  touch /p4/1/offline_db/offline_db_usable.txt

=== Journal out of sequence in checkpoint.log file

This error is encountered when the offline and live databases are no longer in sync, and will cause the offline checkpoint process to fail. Because the scripts will replay all outstanding journals, this error is much less likely to occur.

This error can be fixed by:

* recreating the offline_db: <<_recreate_offline_db_sh>>
* alternatively if that doesn't work - run the <<_live_checkpoint_sh>> script (note the warnings about locking live database)

=== Unexpected end of file in replica daily sync

Check the start time and duration of the <<_daily_checkpoint_sh>> cron job on the master. If this overlaps with the start time of the <<_sync_replica_sh>> cron job on a replica, a truncated checkpoint may be rsync'd to the replica and replaying this will result in an error.

Adjust the replica's cronjob to start later to resolve this.

Default cron job times, as installed by the SDP, are initial estimates, and should be adjusted to suit your production environment.

[appendix]
== Starting and Stopping Services

There are a variety of _init mechanisms_ on various Linux flavors.
The following describes how to start and stop services using different init mechanisms.

=== SDP Service Management with the systemd init mechanism

On modern OSes, like RHEL 7 and 8, Rocky Linux 8, Ubuntu >=18.04, and SuSE >=12, the `systemd` init mechanism is used. The underlying SDP init scripts are used, but they are wrapped with "unit" files in the `/etc/systemd/system` directory, and called using the `systemctl` interface as `root` (typically using `sudo` while running as the `perforce` user).

On systems where systemd is used, *the service can only be started using the `sudo systemctl` command*, as in this example:

  sudo systemctl status p4d_N
  sudo systemctl start p4d_N
  sudo systemctl status p4d_N

Note that there is no immediate indication from running the start command that it was actually successful, hence the status command is run after. For best results, wait a few seconds after running the start command before running the status command. (If the start was unsuccessful, a good start to diagnostics would include running `tail /p4/N/logs/log` and `cat /p4/N/logs/p4d_init.log`).

The service should also be stopped in the same manner:

  sudo systemctl stop p4d_N

Checking for status can be done using either the `systemctl` command or by calling the underlying SDP init script directly. However, there are cases where the status indication may be different. Calling the underlying SDP init script for status will always report status accurately, as in this example:

  /p4/N/bin/p4d_N_init status

That works reliably even if the service was started with `systemctl start p4d_N`.

Checking status using the systemctl mechanism is done like so:

  sudo systemctl status p4d_N

If this reports that the service is *`active (running)`*, such indication is reliable. However, the status indication may falsely indicate that the service is down when it is actually running.
This could occur with older init scripts if the underlying init script was used to start the server rather than using `sudo systemctl start p4d_N` as prescribed. The status indication would only indicate that the service is running if it was started using the systemctl mechanism. As of SDP 2020.1, a safety feature now assures that systemd is always used if configured.

==== Brokers and Proxies

In the above examples for starting, stopping, and status-checking of services using either the SysV or `systemd` init mechanisms, `p4d` is the sample service managed. This can be replaced with `p4p` or `p4broker` to manage proxy and broker services, respectively. For example, on a `systemd` system, the broker service, if configured, can be started like so:

  sudo systemctl status p4broker_1
  sudo systemctl start p4broker_1
  sudo systemctl status p4broker_1

==== Root or sudo required with systemd

For SysV, having sudo is optional, as the underlying SDP init scripts can be called safely as `root` or `perforce`; the service runs as `perforce`. If `systemd` is used, by default `root` access (often granted via `sudo`) is needed to start and stop the p4d service, effectively making sudo access required for the `perforce` user. The systemd "unit" files provided with the SDP handle making sure the underlying SDP init scripts start running under the correct operating system account user (typically `perforce`).

=== SDP Service Management with SysV init mechanism

On older OSes, like RHEL/CentOS 6, the SysV init mechanism is used. For those, you can use the following example commands, replacing _N_ with the actual SDP instance name:

  sudo service p4d_N_init status

The service can be checked for status, started and stopped by calling the underlying SDP init scripts as either `root` or `perforce` directly:

  /p4/N/bin/p4d_N_init status

Replace `status` with `start` or `stop` as needed. It is common to do a `status` check immediately before and after a `start` or `stop`.
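
The status-before-and-after habit can be captured in a small wrapper. This is a minimal sketch for SysV systems, not part of the SDP: the function name `service_cycle` and the `RUN` variable are hypothetical, and the init script path is assumed to follow the `/p4/N/bin/p4d_N_init` pattern shown above. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the SDP): do a status check before and
# after a start or stop, using the underlying SDP init script.

# Usage: service_cycle <instance> <start|stop> [service]
# Prints the three commands by default; set RUN=1 to execute them instead.
service_cycle() {
    local instance="$1" action="$2" service="${3:-p4d}"
    local init="/p4/${instance}/bin/${service}_${instance}_init"
    local step
    case "$action" in
        start|stop) ;;
        *) echo "Usage: service_cycle <instance> <start|stop> [service]" >&2
           return 1 ;;
    esac
    for step in status "$action" status; do
        if [[ "${RUN:-0}" == "1" ]]; then
            "$init" "$step"
        else
            echo "$init $step"
        fi
    done
}

# Dry run for SDP instance 1: prints the status, start, status commands.
service_cycle 1 start
```

On `systemd` systems this wrapper is not appropriate, since the service must be started and stopped with `sudo systemctl` as described earlier.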

During installation, a symlink is set up such that `/etc/init.d/p4d_N_init` is a symlink to `/p4/N/bin/p4d_N_init`, and the proper `chkconfig` commands are run to register the application as a service that will be started on boot and gracefully shut down on reboot.

On systems using SysV, calling the underlying SDP init scripts is safe and completely interchangeable with using the `service` command being run as `root`. That is, you can start a service with the underlying SDP init script, and the SysV init mechanism will still safely detect whether the service is running during a system shutdown, and thus will perform a graceful stop if p4d is up and running when you go to reboot. The status indication of the underlying SDP init script is absolutely 100% reliable, regardless of how the service was started (i.e. calling the init script directly as `root` or `perforce`, or using the `service` call as `root`).

[appendix]
== Brokers in Stack Topology

A preferred methodology is to deploy p4broker processes to control access to p4d servers. In a typical configuration, 100% of user activity gets to p4d through a p4broker deployed in "stack topology", i.e. a p4broker exists on every machine where p4d is, and access to p4d on any given machine is only via the broker, with a typical setup using firewalls to enforce that concept. There are typically only 3 exceptions:

1. p4d-to-p4d communication (`p4 pull`, `p4 journalcopy`) bypasses the broker.
2. Triggers called from p4d run 'p4' commands against the p4d port directly.
3. Admins running 'p4' commands while on the server machine can bypass the broker if they want.

Everything else (to include Proxies, Swarm, Jenkins, any systems integrations, etc.) must go through the broker.

Using brokers like this makes it straightforward to implement the "Down for Maintenance" concept across an entire global topology.
For example, when upgrading p4d services in a global topology, doing the outer-to-inner upgrade procedure, it is best to prevent users from loading the system during the upgrade process.

Using brokers in "stack topology" avoids the significant performance impact of brokers deployed on a different machine than the targeted p4d. While running on the same host, the impact of brokers is relatively small.

Brokers are preferred over p4d command triggers for certain use cases. They're independent of p4d and can keep p4d safe from rogue usage patterns.

[appendix]
== SDP Health Checks

If you need to contact Perforce Support to analyze an issue with the SDP on UNIX/Linux, you can use the `/p4/common/bin/sdp_health_check.sh` script. This script is included with the SDP (starting with SDP 2023.1 Patch 3). If your installation does not have this script, it can be downloaded separately. Every version of the `sdp_health_check.sh` script can be used with any and all versions of the UNIX/Linux SDP dating back to 2007, so you don't need to be concerned with version compatibility.

If your Perforce Helix server machine has outbound internet access, execute the following while logged in as the operating system user that owns the `/p4/common/bin` directory (typically `perforce` or `p4admin`):

  cd /p4/common/bin
  [[ -e sdp_health_check.sh ]] && mv -f sdp_health_check.sh sdp_health_check.sh.moved.$(date +'%Y-%m-%d-%H%M%S')
  curl -L -s -O https://swarm.workshop.perforce.com/projects/perforce-software-sdp/download/tools/sdp_health_check.sh
  chmod +x sdp_health_check.sh
  ./sdp_health_check.sh

If your Perforce Helix server machine does not have outbound internet access, acquire the `sdp_health_check.sh` file from a machine that does have outbound internet access, and then transfer that file to your Perforce Helix server machine.

If you have multiple server machines with SDP, possibly including machines running P4D replicas or edge servers, P4Proxy or P4Broker servers, run the health check on all machines of interest.

The `sdp_health_check.sh` script will produce a log file that can be provided to Perforce Support to help diagnose configuration issues and other problems. The script has these characteristics:

* It is always safe to run. It does only analysis and reporting.
* It does only fast checks, and has no interactive prompts. Some log files are captured such as checkpoint.log, but not potentially large ones such as the p4d server log.
* It requires no command line arguments.
* It does not transfer sensitive information.
* It works for any and all UNIX/Linux SDP versions since 2007.