= SDP Windows to Linux Migration Guide
Perforce Professional Services
:revnumber: v2023.2
:revdate: 2024-03-25
:doctype: book
:icons: font
:toc:
:toclevels: 5
:sectnumlevels: 4
:xrefstyle: full

// Attribute for ifdef usage
:unix_doc: true

== DRAFT NOTICE

WARNING: This document is in DRAFT status and should not be relied on yet. It is a preview of a document to be completed in a future release.

== Preface

This guide documents the process for migrating a Helix Core (P4D) service from a Windows server machine to Linux.

A migration can be minimally disruptive to users if planned and executed properly. This document informs the planning and execution of a Windows to Linux migration.

Because P4D on Linux can run in the same case-insensitive mode that is familiar to users operating on P4D on Windows, the migration can be nearly seamless to users. After preparations, the eventual cutover is done with a `p4 failover` operation in a scheduled maintenance window (or a series of failovers if edge servers are involved). A failover smoothly transitions the P4D service from one machine to another, with no loss of data and minimal disruption.

The preparation effort typically involves straightforward (though potentially long-running) tasks for the administrator. If the Windows P4D runs with custom triggers, there will be some degree of complexity depending on how custom triggers are handled. Typical options include porting or discarding (temporarily or permanently) the custom triggers. If any additional custom automation operates on the Windows P4D server directly, similar handling may be required.

Regardless of the effort and potential complexity of handling customization for admins, the migration can culminate in a nearly seamless transition for users.

*Please Give Us Feedback*

Perforce welcomes feedback from our users. Please send any suggestions for improving this document to consulting@perforce.com.

:sectnums:
== Overview

A Windows to Linux migration has these elements:

* Planning.
* Provision new Linux server machines.
* Install Perforce Helix on Linux.
* Set up the Linux replica server spec on Windows.
* Pull and verify archives. This may take a long while if there is a lot of data to pull, potentially requiring multiple iterations of the pull/verify process.
* Do a Dry Run of the cutover.
* Port and/or test triggers.
* Test, Test, Test.
* Correct data issues identified in planning.
* Adjust configurables.
* Craft a Cutover Procedure.
* Execute the Cutover Procedure.

For purposes of this document, it does not matter if the servers are on-premises ("on-prem") or in a private or public cloud environment such as AWS, Azure, or GCP.

Each of these components is covered in detail in this guide.

== Migration Planning

This section covers things to be aware of when planning.

First, it is helpful to review the related document link:https://swarm.workshop.perforce.com/view/guest/perforce_software/sdp/main/doc/SDP_MigrationAndUpgradeGuide.html[SDP Migration and Upgrade Guide]. That document discusses a Big Blue Green Cutover (BBGC) style of migration. For a Windows to Linux migration, use a special type of BBGC called a Failover Migration: After preparations are complete, a `p4 failover` completes the migration to Linux (or a series of failovers if edge servers are involved).

The Windows service may or may not be operated using link:https://swarm.workshop.perforce.com/view/guest/perforce_software/sdp/main/doc/SDP_Guide.Windows.html[Server Deployment Package (SDP) for Windows]. Regardless of whether the Windows service is managed with SDP, the Windows service is largely left alone during the migration.

The target environment will always be set up per best practices as implemented with the link:https://swarm.workshop.perforce.com/view/guest/perforce_software/sdp/main/doc/SDP_Guide.Unix.html[Linux Server Deployment Package (SDP)].

=== Planning for User Impact

The migration can be nearly seamless. Typical impacts to users (humans and automation/bots) for a Windows to Linux migration include:

* Possibly needing to login again after the Linux server becomes the commit (depending on whether certain configurables like `auth.id` are adjusted during migration).
* If SSL is enabled, needing to trust the Linux server after it becomes the commit.
* Depending on how user traffic is directed to the Windows server, there may be an impact:
  - If users connect using a P4PORT that includes an IP address, users will need to change the P4PORT they use.
  - If the failover plan involves changing a DNS name (as opposed to some instantaneous method of traffic redirection), there will be delays associated with DNS changes and DNS cache flushing.
* If the migration is planned to include a change in authentication mechanism, e.g. standard LDAP -> SAML/SSO with the link:https://github.com/perforce/helix-authentication-service[Helix Authentication Service (HAS)], users will need to adapt to this.

Other than being aware of the above, users do not need to do any special preparation for the cutover. For example, users _do not_ need to be concerned about the state of files in their workspaces. Whatever state files in workspaces are in at the time of cutover (checked out to default or numbered pending changelists, shelved or not, etc.) is not affected by the cutover.

=== Failover Migration

This document focuses on the failover style strategy. This entails creating a server spec (ServerID) for a standby server we'll call `p4d_fs_linux`, which will operate for a time as a Linux standby replica of the current production Windows commit server.

Depending on various factors such as data scale, project priority and complexity, etc.,
this Linux replica of the Windows commit server may operate for days, weeks or even months before it is ready for the planned and scheduled failover that will promote the Linux standby server to become the new commit server.

This Failover strategy has several benefits:

* Minimum disruption to end users for the cutover.
* Allows for extensive testing of the new Linux server(s) and infrastructure prior to cutover.
* The effect on the original Windows server(s) and infrastructure is minimal.
* Rollback, while hopefully unnecessary, is straightforward.

While planning and preparation will take time and effort, the disruption to end users can be minimal.

TIP: If your current method of operating Helix Core on Windows does not produce a regular metadata **checkpoint**, a change is required to get at least some basic form of checkpoint process in place. (If you are not sure what a checkpoint is, see: link:https://www.perforce.com/manuals/p4sag/Content/P4SAG/backup-recovery-concepts.html[backup and recovery concepts].)

The Failover strategy requires that the Windows Helix Core P4D service be at version 2019.1 (latest patch) or later. If it is not already at the latest available patch of 2019.1 or a later major version, then the plan should account for first upgrading the Windows service in place to a more recent version, such as 2023.1.

Other strategies could be considered that would not require upgrading in place if avoiding an in-place upgrade is a priority. That would entail longer downtime and other complexity. Such options are not explored in this document.

=== Custom Triggers and Extensions

The largest single variable in terms of effort estimation in planning for a Windows to Linux migration is the effort required to port any custom link:https://EDITME_TRIGGERS_DOC_URL[triggers] or link:https://EDITME_EXTENSIONS_DOC_URL[extensions]. This can be literally zero effort if you have no custom triggers or extensions (or none that need to survive the migration).
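
As a first sizing pass, the trigger table can be scanned for commands that look Windows-specific. The following is a minimal sketch and not part of the SDP: the function name `flag_windows_triggers` is hypothetical, and the patterns (common Windows script and executable extensions, drive-letter paths) are assumptions that will miss some cases. It reads trigger table text, such as the output of `p4 triggers -o`, on standard input.

```shell
#!/bin/bash
# Sketch only: print trigger table lines that look Windows-specific.
# The patterns below are illustrative assumptions, not an exhaustive list.
flag_windows_triggers() {
    # Drop comment lines, then keep lines that mention a Windows script or
    # executable extension, or a drive-letter path such as C:\ or E:/.
    grep -v '^#' |
        grep -Ei '\.(bat|cmd|ps1|exe)([^a-z0-9]|$)|(^|[^a-z0-9])[a-z]:[\\/]'
}

# Example usage against a live server (requires a logged-in super user):
#   p4 triggers -o | flag_windows_triggers
```

Lines that are not flagged still need review, since a portable script can contain Windows-specific logic internally.
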
If porting and/or testing is required, that becomes a software development and testing project on its own that folds into the larger migration project.

Any custom triggers or extensions will need to be reviewed. Any that can't be discarded will need to be evaluated for porting and testing needs.

Triggers written in a native Windows language such as batch or PowerShell, or operated as compiled .exe files, will need to be ported. Even triggers written in more portable languages such as Python or Perl will need testing and may need adjustment to operate in the Linux environment.

Extensions are written in Lua, the interpreter for which is entirely contained in the Helix Core p4d binary itself. As such, custom extensions are less likely to require porting. However, they should still be evaluated and/or tested to be sure they have no Windows OS dependencies in their implementation.

Extensions provided by Perforce Software, such as those associated with Helix Swarm and the Helix Authentication Service, are inherently cross-platform and do not need to be ported.

=== Other Custom Automation on Windows Server Machine

Determine whether you have any custom software that runs directly on the Windows server machine.

Custom automation that executes directly on the Windows server machine itself needs to be evaluated for porting and testing needs.

Because the migration is transparent, any automation that merely connects as a client to the Windows p4d server, such as build server farms, need not be considered (other than possibly needing to login or trust p4d again and/or possibly change the P4PORT, as noted in <<_planning_for_user_impact>>).

=== Find Incompatible Configuration Settings

Using the `p4 configure` command to interact with `db.config` is a good way, and in many cases the only way, to set various configuration items with a Helix Core server.
However, there are certain settings that must not be defined with `p4 configure`, as they conflict with settings the SDP defines with shell environment variables on Linux.

Review the output of the command `p4 configure show allservers` and see if any of the following are set:

* `P4JOURNAL`
* `P4PORT`
* `P4LOG`

If any of these are set with `p4 configure`, the migration plan will need to deal with unsetting them after first ensuring they are set in some other way on the Windows service.

The following is an example of how to replace the `P4LOG` setting if it appears in the output of `p4 configure show allservers`. Note that changing this requires a brief service restart to take effect.

==== Sample Procedure to replace P4LOG configurable

This is an example of how this might be done if the Windows service name is `Perforce`:

 p4 set -S Perforce P4LOG=L:\p4logs\p4d.log

That will set the P4LOG variable so that it is associated with the Windows service named `Perforce`. Once that is done, it can be unset as a configurable, such as in this example:

 p4d.exe -r E:\PerforceRoot "-cunset P4LOG"

Next, stop and then start the Windows service as you normally would.

==== Other Windows Paths in Configuration

Also scan for things like Windows paths, such as structured logs defined to reference a Windows path. Such settings will need to be overridden for the Linux replica's ServerID. For example, if you see:

 any: serverlog.file.11=E:\PerforceRoot\triggers.csv

You'll want to create an override for the Linux replica by doing:

 p4 configure set p4d_fs_linux#serverlog.file.11=/p4/1/logs/triggers.csv

=== Depot Root and Depot Spec Map Fields

Perforce Helix depot specs have a field named `Map:` that, if used, must be eliminated prior to the deployment of a Linux replica. Further, the `server.depot.root` configurable must be set on the commit server.

If done carefully, the changes to set `server.depot.root` and clear the `Map:` field of each depot spec can be done non-disruptively on the live running Windows Perforce Helix Core service. They must be done before creating the checkpoint used to seed the Linux replica.

The key to making the change non-disruptively is to understand that p4d uses the `Map:` field value if it is set to anything other than the default, and otherwise falls back to the `server.depot.root` configurable to find depots. If the value of the `Map:` field of any given depot is `_TheDepotName_/...`, that means the value is not explicitly set.

Before making changes, the singular `server.depot.root` value must be made to work for all depots. A common goal early on is to make the single `server.depot.root` path work without actually moving any files, by using Windows directory symlinks. If individual depots are on different drives, put symlinks to all depots in the directory pointed to by the `server.depot.root` configurable so that p4d can find all depot files from that path.

You may also find that `Map:` fields use Windows UNC paths or Windows junctions.

Special planning may be required if there are any depots of type `archive`.

=== The journalPrefix

The Windows commit server must have the `journalPrefix` value set in order to set up the Linux replica. It can be set to any value that works to enable the p4d service to find its archives, but cannot be unset.

=== Uncompressed Journals

Examine how checkpoints and journals are currently taken in the Windows environment (or if they are taken at all). If journals on the Windows service are compressed, replication will not work. Replicas require uncompressed journals.

As a general rule, the `p4d -jc` command is best done with `-Z`, which compresses the checkpoint file, but not the numbered journal files.

Changes to any custom scripts that manage checkpoints in the Windows environment may be warranted.
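
One way to confirm that rotated journals are uncompressed is to check for the gzip magic bytes rather than trusting file names. The sketch below is illustrative only: the function name `is_gzip_file` and the sample path are hypothetical, it assumes a Unix-like shell with access to copies of the journal files, and the journal location depends on your `journalPrefix` setting.

```shell
#!/bin/bash
# Sketch only: succeed (exit 0) if the given file starts with the gzip
# magic bytes 0x1f 0x8b, which indicates a compressed file.
is_gzip_file() {
    local magic
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example usage (hypothetical path to a rotated journal copied from Windows):
#   if is_gzip_file /tmp/journals/p4_1.jnl.42; then
#       echo "Compressed journal found; replication requires uncompressed journals."
#   fi
```
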

=== Helix Core Components

Consider what Perforce Helix systems are in your environment that may need to be handled, such as:

* Helix Core Server (P4D)
  - Commit Server
  - Edge Servers
  - Filtered Replicas
  - Unfiltered Replicas
  - Standby Servers
* Helix Broker (P4Broker)
* Helix Proxy (P4P)
* Helix Swarm
* P4DTG

=== Helix Core Topology

Is your server a single machine, or are there many server machines? In any case, you'll want to think in terms of a "Big Blue/Green Deploy." Every active Windows server machine in the current production topology (the "Blue" servers), including all replicas, edges, and proxies, will need an equivalent Linux server machine to replace it (the "Green" servers).

Replicas are straightforward to handle. Handling edges is more complex but doable.

Don't forget proxies; they need to be migrated from Windows to Linux too. (Proxies could have been on Linux all along even with a Windows P4D, so check that first.)

=== Moving Archive Files

Once the Linux replicas are set up, a variety of strategies can be used to transfer archive files.

Plan three cycles of `p4verify.sh` to get p4d to pull the archives. The first, starting with no archive files, starts a bulk pull. That could take days or weeks depending on data scale. The second fills in gaps, and the third should be clean.

Depending on the scale of data, you may want to consider using mechanisms outside p4d for transferring some archives (especially the `.gz` files; `,v` files should ideally be transferred with `p4 pull`).

TIP: There are many variations on how to get the archive files in place. Using `p4 pull` has the advantage that, if the Linux p4d writes the archive, it can always find it, even if the path contains unusual Unicode characters. By contrast, files copied outside p4d may not be found by the Linux p4d. However, for bulk pulls of terabytes of data, a Windows port of rsync, at least for `.gz` files, will be much faster.
You'll need a live running rsync service on Linux for the Windows port of rsync to talk to. There are many options here; one way or another, get the files in place so that `p4verify.sh` runs clean.

=== Avoid Case Conversion

If there is a desire to convert the server to become case-sensitive, that should be deferred and done as a separate project. A Windows to Linux migration that preserves the original Windows case-insensitive behavior is minimally disruptive. A case conversion is likely to be disruptive to users and workflows, and is complex enough that it should be relegated to a separate project from a Windows to Linux migration. The case conversion should be done after the Windows to Linux migration is complete.

=== Combining Upgrade with Migration

If the priority is to avoid upgrading or touching the Windows environment, an upgrade to a modern Helix Core version can be done to the Linux server during the cutover, as part of the Windows to Linux migration project.

Alternately, you can upgrade the Windows P4D in place first, and then set up the Linux replica on the same modern P4D version.

If the starting Windows version is 2019.1+, a Failover style migration is possible; otherwise a different strategy is needed.

Typically we recommend doing the failover-then-upgrade in the same maintenance window as the Windows to Linux migration. That is, fail over to the new server on Linux on the same p4d version that Windows was running initially. Then, once on Linux, do the standard SDP upgrade procedure for Linux, using `upgrade.sh`.

=== DRY RUN

At least one Dry Run is required to confidently execute a migration. Plan to have at least one.

In the dry run, the `p4 failover` command is NOT used. Instead, the Linux service is stopped, and the `$P4ROOT/server.id` file is simply hand-edited to contain the ServerID of the commit server. Then the service is restarted.
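
The hand edit of `server.id` can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `set_server_id` is hypothetical, the Linux p4d service must already be stopped using whatever mechanism your SDP installation provides, and the ServerID value shown in the usage comment is a placeholder. A backup copy is kept so the standby's original ServerID can be restored after the dry run.

```shell
#!/bin/bash
# Sketch only: replace the ServerID stored in <P4ROOT>/server.id, keeping a
# backup of the previous value as server.id.bak for easy restoration.
set_server_id() {
    local p4root="$1" new_id="$2"
    local id_file="$p4root/server.id"
    [ -f "$id_file" ] || { echo "No server.id in $p4root" >&2; return 1; }
    cp -p "$id_file" "$id_file.bak" || return 1
    printf '%s\n' "$new_id" > "$id_file"
}

# Example usage, with the Linux p4d service stopped first (the ServerID
# value is a placeholder for your Windows commit server's ServerID):
#   set_server_id "$P4ROOT" YourCommitServerID
```
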
At that point, the Linux server will believe itself to be the new commit server, even though users will still be using the Windows server for real work.

Then the Linux server can be tested in various ways:

* Test connectivity from all user access points.
* Test connectivity from all server access points, including replicas, proxies, and any integrated systems such as Jenkins, Swarm, P4DTG, etc.
* If there are any `ldap` specs, ensure the targeted LDAP servers can be reached from the Linux server. (This may require firewall adjustments.)

=== Setup Linux Replica ServerID

On the Windows commit server, create a server spec to represent p4d on Linux. Call it `p4d_fs_linux`.

== Migration Preparation

=== Provision New Linux Server Machines

EDITME - Add content here.

==== Select Operating System

As of this writing, the best options are:

* Ubuntu 22.04 (or 20.04)
* RHEL/Rocky Linux 9 (or 8)

== Install Perforce Helix on Linux

=== Install Helix Core Software

On the Green Linux server machines that do not yet have any data, use the Helix Installer to do a Configured Install.

WARNING: The Helix Installer is only to be used on truly "green" server machines, those with no Helix Core data on them yet.

 su -
 mkdir -p /hxdepots/reset
 cd /hxdepots/reset
 curl -L -s -O https://swarm.workshop.perforce.com/download/guest/perforce_software/helix-installer/main/src/reset_sdp.sh
 chmod +x reset_sdp.sh
 ./reset_sdp.sh -C > settings.cfg

In `settings.cfg`, change these settings:

* DNS_name_of_master_server=
* P4_PORT=
* Instance=
* Password=
* CaseSensitive=0
* P4USER=
* ServerID=
* ServerType=
* P4BinRel=
* P4APIRel=

Then run the script:

 ./reset_sdp.sh -no_sd -c settings.cfg 2>&1 | tee log.reset_sdp.txt
 su - perforce
 p4 set
 cd /p4/common/site
 [[ -d config ]] || mkdir config
 cd config

=== Create the Linux replica.

Temporary Hack:

 vi /p4/common/site/config/p4_N.vars.local

 export P4MASTER_ID=windows.p4d
 export P4MASTERPORT=192.168.1.5:1666
 export P4PORT=$P4MASTERPORT

 p4login -v

 cd /p4/common/config
 vi SiteTags.cfg

 azwestus2: Azure data center

Add to Protections:

 super group ServiceUsers * //...

 mkrep.sh -t fs -s usw2 -r p4d-commit-wus2

Undo Temporary Hack:

 vi /p4/common/site/config/p4_N.vars.local

 #export P4MASTER_ID=Master
 #export P4PORT=$P4MASTERPORT
 export P4MASTERPORT=10.0.0.4:

[appendix]
== Sample Cutover Procedure

=== Sample Migration Scenario

The following is a sample cutover procedure for a topology with a commit server and an edge server, with custom triggers that have been ported to Linux.

The sample instructions assume the `perforce` OS user on the Linux servers has been set up with the proper shell environment, specifically that the `~/.bashrc` has sourced the `/p4/common/bin/p4_vars` file with the appropriate SDP instance parameter.

The preparation for this sample cutover scenario would have included:

* A set of ported and tested Linux custom triggers (replacing former custom triggers on Windows) deployed on all Linux servers in the `/p4/common/site/bin/triggers` folder.
* A triggers table suited for operation on the Linux server after it becomes the commit server.

=== One Week Prior to Cutover Procedure

**STEP 1: Verify Replication**

Verify that replication is healthy on the Linux replica, the Windows edge, and the Linux replica of the Windows edge server.

=== One Day Prior to Cutover Procedure

**STEP 1: Verify Replication**

Verify that replication is healthy on the Linux replica, the Windows edge, and the Linux replica of the Windows edge server.
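
Replication health can be spot-checked by comparing the replica and master journal positions reported by `p4 pull -lj` on each replica. The following is a rough sketch: the function name `replication_in_sync` is hypothetical, and it assumes the output contains lines beginning with `Current replica journal state` and `Current master journal state`, each followed by a journal number and sequence. Exact equality is a strict test; on a busy server a small, shrinking gap may also be healthy.

```shell
#!/bin/bash
# Sketch only: read `p4 pull -lj` output on stdin and succeed (exit 0) only
# if the replica and master report the same journal number and sequence.
# The line format matched here is an assumption about the command's output.
replication_in_sync() {
    awk '
        /^Current replica journal state/ { r = $0 }
        /^Current master journal state/  { m = $0 }
        END {
            # Strip the label up to the first colon, leaving the position.
            sub(/^[^:]*:[ \t]*/, "", r)
            sub(/^[^:]*:[ \t]*/, "", m)
            if (r != "" && r == m) exit 0
            exit 1
        }'
}

# Example usage on a replica (requires a logged-in user with access):
#   if p4 pull -lj | replication_in_sync; then echo "in sync"; else echo "lagging"; fi
```
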

**STEP 2: Checkpoint Linux Replicas**

On the Linux standby of the Windows commit server, and separately and in parallel on any Linux standbys of Windows edge servers, request a checkpoint:

 p4 admin checkpoint -Z

WARNING: Do a `p4 info` first and confirm that the target ServerID is that of the intended Linux standby server, to avoid taking an unintentional "live checkpoint" of the commit server.

Next, on the Windows commit server, execute a journal rotation:

 p4 admin journal

Once this command has been run, it will trigger the Linux server to start taking a checkpoint. On the Linux server, a checkpoint should immediately appear in the checkpoints directory.

TIP: The checkpoints directory is `/p4/_N_/checkpoints` for the standby of the commit server, and `/p4/_N_/checkpoints.ShortServerID` for the standby of an edge server.

Monitor the checkpoints directory and await the appearance of a `*.md5` file with the same number as the checkpoint. The existence of the MD5 file indicates the successful completion of the checkpoint process. Use the `watch` utility and wait until the `*.md5` file appears.

**STEP 3: Replay Checkpoint to offline_db**

On the Linux commit and edge servers, replay the checkpoint to the offline_db. This can be done regardless of whether the local p4d service is replicating or even online at all, and is neither affected by nor disruptive to the p4d service.

 nohup recreate_offline_db.sh < /dev/null > /dev/null 2>&1 &

Monitor until completion with:

 tail -f $LOGS/recreate_offline_db.log

=== Cutover Procedure

**STEP 1: Verify Replication**

Verify that replication is healthy on the Linux replica, the Windows edge, and the Linux replica of the Windows edge server.

**STEP 2: Disable Scheduled Tasks**

On the old Windows commit and edge server machines, disable any Scheduled Tasks related to backups or checkpoints. Also ensure no long-running checkpoint or backup operations are in progress that won't be complete by the time of the intended cutover.

**STEP 3: Disable Crontabs**

On the new Linux commit and edge server machines, save and then disable all crontabs intended for routine production operation (they may have been left enabled during dry runs).

**STEP 4: Lockout Users with Protections**

**STEP 5: Stop Services**

**STEP 6: Start Services**

**STEP 7: Rotate Journal**

**STEP 8: Verify Replication**

**STEP 9: Failover Edge Server**

**STEP 10: Apply Metadata Changes for Linux**

Apply metadata changes required for operation on Linux and for the commit server now being managed by SDP.

**STEP 11: Failover Commit Server**

**STEP 12: Do Sanity Tests**

**STEP 13: Decide: GO/NO GO**

**STEP 14: Restore Default Protections**

**STEP 15: Direct User Traffic to Linux**

**STEP 16: Enable crontabs**

EDITME

[appendix]
== Why Migrate?

Migrations from Windows to Linux have been the single most consistent theme in Perforce Consulting in over two decades, for many reasons. The procedures have evolved over time, with the modern "failover style" replication being the latest in seamless cutover.

EDITME Add some of the many reasons.