DRAFT NOTICE
This document is in DRAFT status and should not be relied on yet. It is a preview of a document to be completed in a future release.
Preface
This guide documents the process for migrating a Helix Core (P4D) service from a Windows server machine to Linux. A migration can be minimally disruptive to users if planned and executed properly. This document informs the planning and execution of a Windows to Linux migration.
Because P4D on Linux can run in the same case-insensitive mode that is familiar to users of P4D on Windows, the migration can be nearly seamless to users. After preparations, the eventual cutover is done with a p4 failover operation in a scheduled maintenance window (or a series of failovers if edge servers are involved). A failover smoothly transitions the P4D service from one machine to another, with no loss of data and minimal disruption.
The preparation effort typically involves straightforward (though potentially long-running) tasks for the administrator. If the Windows P4D runs with custom triggers or extensions, there will be some added complexity, depending on how those triggers and extensions are handled. Typical options include porting them or retiring them (temporarily or permanently). Aside from triggers and extensions, any additional custom automation that operates directly on the Windows P4D server machine may require similar handling.
Regardless of the effort and complexity that handling customizations may require of administrators, the migration can culminate in a nearly seamless transition for users.
Please Give Us Feedback
Perforce welcomes feedback from our users. Please send any suggestions for improving this document to consulting@perforce.com.
1. Overview
A Windows to Linux Migration has these elements:
- Planning
- Provision New Linux Server machines.
- Install Perforce Helix on Linux.
- Set up the Linux replica server spec on Windows.
- Pull and verify archives. This may take a long while if there is a lot of data to pull, potentially requiring multiple iterations of the pull/verify process.
- Do a Dry Run of the cutover.
- Port and/or Test Triggers
- Test, Test, Test.
- Correct data issues identified in planning.
- Adjust configurables.
- Craft a Cutover Procedure.
- Execute the Cutover Procedure.
For purposes of this document, it does not matter if the servers are on-premises ("on-prem") or in a private or public cloud environment such as AWS, Azure, or GCP.
Each of these components is covered in detail in this guide.
2. Migration Planning
This section covers things to be aware of when planning.
First, it is helpful to review the related document SDP Migration and Upgrade Guide. That document discusses a Big Blue Green Cutover (BBGC) style of migration. For a Windows to Linux migration, use a special type of BBGC called a Failover Migration: after preparations are complete, a p4 failover completes the migration to Linux (or a series of failovers if edge servers are involved).
The Windows service may or may not be operated using the Server Deployment Package (SDP) for Windows. Either way, the Windows service is largely left alone during the migration. The target environment is always set up per best practices as implemented with the Linux Server Deployment Package (SDP).
2.1. Planning for User Impact
The migration can be nearly seamless. Typical impacts to users (humans and automation/bots) for a Windows to Linux migration include:
- Possibly needing to log in again after the Linux server becomes the commit server (depending on whether certain configurables like auth.id are adjusted during the migration).
- If SSL is enabled, needing to trust the Linux server after it becomes the commit server.
- Depending on how user traffic is directed to the Windows server, there may be an impact:
  - If users connect using a P4PORT that includes an IP address, users will need to change the P4PORT they use.
  - If the failover plan involves changing a DNS name (as opposed to some instantaneous method of traffic redirection), there will be delays associated with DNS changes and DNS cache flushing.
- If the migration is planned to include a change in authentication mechanism, e.g. standard LDAP → SAML/SSO with the Helix Authentication Service (HAS), users will need to adapt to this.
Other than being aware of the above, users do not need to do any special preparation for the cutover. For example, users do not need to be concerned about the state of files in their workspaces. Whatever state files in workspaces are in at the time of cutover (checked out to default or numbered pending changelists, shelved or not, etc.) is not affected by the cutover.
2.2. Failover Migration
This document focuses on the failover style strategy. This entails creating a server spec (ServerID) for a standby server we’ll call p4d_fs_linux, which will operate for a time as a Linux standby replica of the current production Windows commit server. Depending on factors such as data scale, project priority, and complexity, this Linux replica of the Windows commit server may operate for days, weeks, or even months before it is ready for the planned and scheduled failover that promotes it to become the new commit server.
This Failover strategy has several benefits:
- Minimum disruption to end users for the cutover.
- Allows for extensive testing of the new Linux server(s) and infrastructure prior to cutover.
- The effect on the original Windows server(s) and infrastructure is minimal.
- Rollback, while hopefully unnecessary, is straightforward.
While planning and preparation will take time and effort, the disruption to end users can be minimal.
If your current method of operating Helix Core on Windows does not produce a regular metadata checkpoint, a change is required to put at least a basic checkpoint process in place. (If you are not sure what a checkpoint is, see: backup and recovery concepts.)
The Failover strategy requires that the Windows Helix Core P4D service be at version 2019.1 (latest patch) or later. If it is not already at the latest available patch of 2019.1 or a later major version, then the plan should account for first upgrading the Windows service in place to a more recent version, such as 2023.1.
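To check where the Windows service stands, the release can be read from the "Server version:" line of p4 info. The bash sketch below assumes the usual P4D/<platform>/<YYYY.N>/<change> layout of that string. The function names are hypothetical, and the check covers only the major release, not whether the latest patch is applied.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check whether a P4D version string is at least 2019.1.
# Assumes the P4D/<platform>/<YYYY.N>/<change> layout; patch level is NOT checked.

# Print the YYYY.N release from a version string (third "/"-separated field).
p4d_release() {
  printf '%s\n' "$1" | cut -d/ -f3
}

# Succeed (exit 0) if the release is >= the minimum (default 2019.1).
p4d_release_at_least() {
  local rel min="${2:-2019.1}"
  rel=$(p4d_release "$1")
  local rel_year=${rel%%.*} rel_minor=${rel#*.}
  local min_year=${min%%.*} min_minor=${min#*.}
  if (( rel_year != min_year )); then
    (( rel_year > min_year ))
  else
    (( rel_minor >= min_minor ))
  fi
}

# Example (run against the Windows commit server):
#   p4d_release_at_least "$(p4 info | grep '^Server version:')" && echo "2019.1 or later"
```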
If avoiding an in-place upgrade is a priority, other strategies that do not require one could be considered. Those would entail longer downtime and other complexity, and are not explored in this document.
2.3. Custom Triggers and Extensions
The largest single variable in effort estimation for a Windows to Linux migration is the effort required to port any custom triggers or extensions. This can be literally zero if you have no custom triggers or extensions (or none that need to survive the migration). If porting and/or testing is required, that becomes a software development and testing project of its own that folds into the larger migration project.
Any custom triggers or extensions will need to be reviewed. Any that can’t be discarded will need to be evaluated for porting and testing needs.
Triggers written in a Windows-native language such as batch or PowerShell, or operated as compiled .exe files, will need to be ported. Even triggers written in more portable languages such as Python or Perl will need testing and may need adjustment to operate in the Linux environment.
Extensions are written in Lua, the interpreter for which is entirely contained in the Helix Core p4d binary itself. As such, custom extensions are less likely to require porting. However, they should still be evaluated and/or tested to be sure their implementation has no Windows OS dependencies.
Extensions provided by Perforce Software, such as those associated with Helix Swarm and the Helix Authentication Service, are inherently cross-platform and do not need to be ported.
2.4. Other Custom Automation on Windows Server Machine
Determine whether you have any custom software that runs directly on the Windows server machine. Such custom automation needs to be evaluated for porting and testing needs.
Because the migration is transparent to clients, any automation that merely connects as a client to the Windows p4d server, such as build server farms, need not be considered (other than possibly needing to log in or trust p4d again and/or change the P4PORT, as noted in Section 2.1, “Planning for User Impact”).
2.5. Depot Root and Depot Spec Map Fields
Perforce Helix depot specs have a field named Map: that, if set to anything other than its default value, must be cleared prior to the deployment of a Linux replica. Further, the server.depot.root configurable must be set on the commit server.
If done carefully, the changes to set server.depot.root and clear the Map: field of each depot spec can be done non-disruptively on the live running Windows Perforce Helix Core service. This must be done before creating the checkpoint used to seed the Linux replica.
The key to making the change non-disruptively is to understand that p4d first checks whether the Map: field is set to anything other than the default, and otherwise falls back to the server.depot.root configurable to find depot files. If the value of the Map: field of any given depot is TheDepotName/..., that means the value is not explicitly set.
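Before changing anything, it can help to list the depots whose Map: field is not the default. The bash sketch below reads tagged output from p4 -ztag depots. It assumes the tagged field names are name and map, and that name precedes map for each depot, so compare against your own server's output first. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name: map" for each depot whose Map field is
# not the default "<name>/...". Reads `p4 -ztag depots` output on stdin.
# Assumes tagged fields named "name" and "map", with "name" appearing first.
nondefault_depot_maps() {
  awk '
    /^\.\.\. name / { name = $3 }
    /^\.\.\. map /  {
      # Keep the full value, which may contain spaces (e.g. Windows paths)
      map = substr($0, length("... map ") + 1)
      if (map != name "/...") print name ": " map
    }
  '
}

# Example: p4 -ztag depots | nondefault_depot_maps
```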
Before making changes, the single server.depot.root value must be made to work for all depots. A common early goal is to make the single server.depot.root path work without actually moving any files, by using Windows directory symlinks. If individual depots are on different drives, put symlinks to all depots in the directory pointed to by the server.depot.root configurable so that p4d can find all depot files from that path. You may also find that Map: fields use Windows UNC paths or Windows junctions.
Special planning may be required if there are any depots of type archive.
2.6. The journalPrefix
The Windows commit server must have the journalPrefix configurable set in order to set up the Linux replica. It can be set to any value that works to enable the p4d service to find its archives, but it cannot be left unset.
2.7. Find Incompatible Configuration Settings
Using the p4 configure command to interact with db.config is a good way, and in many cases the only way, to set various configuration items on a Helix Core server. However, certain settings must not be defined with p4 configure, as they conflict with settings the SDP defines with shell environment variables on Linux.
Review the output of the command p4 configure show allservers and see if any of the following are set:
- P4JOURNAL
- P4PORT
- P4LOG
If any of these are set with p4 configure, the migration plan will need to deal with unsetting them after first ensuring they are set in some other way on the Windows service. The following is an example of how to replace a P4LOG setting that appears in the output of p4 configure show allservers. Note that changing this requires a brief service restart to take effect.
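A quick way to do this review is to filter the p4 configure show allservers output for the three names above. The bash sketch below assumes the "ServerID: name=value" line format seen in that output (as in the serverlog example in this section). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `p4 configure show allservers` output on stdin and
# print any P4JOURNAL, P4PORT, or P4LOG settings. Exit status follows grep:
# 0 if at least one was found, 1 if none were found.
find_conflicting_configurables() {
  grep -E '^[^:]+: (P4JOURNAL|P4PORT|P4LOG)='
}

# Example:
#   p4 configure show allservers | find_conflicting_configurables \
#     && echo "Review the settings above before building the Linux replica."
```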
2.7.1. Sample Procedure to replace P4LOG configurable
This is an example of how this might be done if the Windows service name is Perforce:
p4 set -S Perforce P4LOG=L:\p4logs\p4d.log
That will set the P4LOG variable so that it is associated with the Windows service named Perforce. Once that is done, it can be unset as a configurable, as in this example:
p4d.exe -r E:\PerforceRoot "-cunset P4LOG"
Next, stop and then start the Windows service as you normally would.
2.7.2. Other Windows Paths in Configuration
Also scan for other Windows paths, such as structured logs defined to reference a Windows path. Such settings will need to be overridden for the Linux replica. For example, if you see:
any: serverlog.file.11=E:\PerforceRoot\triggers.csv
You’ll want to create an override for the Linux replica by doing:
p4 configure set p4d_fs_linux#serverlog.file.11=/p4/1/logs/triggers.csv
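When there are many such settings, the override commands can be generated rather than typed by hand. The bash sketch below is hypothetical. It only prints p4 configure set commands for review and runs nothing, it handles only serverlog.file.N entries whose values start with a drive letter, and it uses the p4d_fs_linux ServerID and /p4/1/logs directory from the example above as defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `p4 configure show allservers` output on stdin and
# print (not run) a Linux override for each serverlog.file.N set to a Windows path.
# Defaults come from the example above; adjust the ServerID and log directory.
linux_log_overrides() {
  local server_id="${1:-p4d_fs_linux}" log_dir="${2:-/p4/1/logs}"
  # Capture the configurable name and everything after "<drive>:\"
  local re='^[^:]+: (serverlog\.file\.[0-9]+)=[A-Za-z]:\\(.*)$'
  local line name winpath base
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue
    name="${BASH_REMATCH[1]}"
    winpath="${BASH_REMATCH[2]}"
    base="${winpath##*\\}"   # keep only the file name after the last backslash
    printf 'p4 configure set %s#%s=%s/%s\n' "$server_id" "$name" "$log_dir" "$base"
  done
}

# Example: p4 configure show allservers | linux_log_overrides
```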
2.8. Uncompressed Journals
Examine how checkpoints and journals are currently taken on the Windows environment (or if they are taken at all).
If journals on the Windows service are compressed, replication will not work. Replicas require uncompressed journals.
As a general rule, the p4d -jc command is best run with -Z, which compresses the checkpoint file but not the numbered journal files. Changes to any custom scripts that manage checkpoints in the Windows environment may be warranted.
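To confirm whether existing numbered journals are compressed, one option is to check for the gzip magic bytes (1f 8b) at the start of each file rather than relying on the file name. This bash sketch uses a hypothetical function name and needs a bash environment that can see the journal files, such as a bash shell on the Windows machine or a copy of a journal on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
# Compressed numbered journals cannot be used for replication.
is_gzip() {
  local magic
  # Read the first two bytes and render them as hex, e.g. "1f8b"
  magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
  [[ $magic == "1f8b" ]]
}

# Example (the journals directory is illustrative):
#   for f in /path/to/journals/*; do is_gzip "$f" && echo "compressed: $f"; done
```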
2.9. Avoid Case Sensitivity Conversion
Since this document is about Windows to Linux migrations, the data set will naturally and necessarily be case-insensitive at the start of the project. This document does not discuss case sensitivity conversion, as it is unnecessary for a Windows to Linux migration. If there is a desire to become case-sensitive (for example, to support Linux clients), we advise deferring that as a separate project to be done after the Windows to Linux migration is complete.
A Windows to Linux migration that preserves the original case-insensitive behavior, as described in this document, is minimally disruptive. A case sensitivity conversion is best deferred until after the Windows to Linux migration, for several reasons:
- The conversion to case-sensitive can only be done on Linux.
- Case sensitivity conversion can be disruptive to users and workflows, and may result in data loss (although any data to be lost will be known before the loss).
- Case sensitivity conversion requires significant downtime.
- Case sensitivity conversion requires duplication of 100% of versioned file storage (during development and testing of the case conversion process on your data).
- Case sensitivity conversion may disrupt tooling that interacts with your server.
Generally speaking, a case sensitivity conversion is more complex than a Windows to Linux migration, sufficiently so that we advise treating it as a separate project. The case sensitivity conversion, if done at all, can be started after the Windows to Linux migration is complete. It involves doing neurosurgery on your Helix Core data set using the p4migrate utility.
Further discussion of case sensitivity conversions is outside the scope of this document.
3. Helix Core Topology
The complexity of a Windows to Linux migration project is naturally affected by the baseline complexity of the Helix Core ecosystem operating on Windows.
Is your server a single machine, or are there many server machines? In either case, you’ll want to think in terms of a "Big Blue/Green Deploy." Every active Windows server machine in the current production topology (the "Blue" servers), including all replicas, edges, and proxies, will need an equivalent Linux server machine to replace it (the "Green" servers). Replicas are straightforward to handle. Handling edges and/or filtered replicas adds complexity to be aware of.
Consider what Perforce Helix server machines and services exist in your Windows topology:
3.1. Helix Proxies
In some cases, Linux proxies will have existed with a Windows commit server all along, as running proxies on Linux is advisable even in a topology with Windows servers (for some of the same reasons that a Windows to Linux migration is popular, such as much faster native filesystems).
Any Windows proxies should be migrated to Linux as well. However, while strongly discouraged, a Windows p4p (proxy) can remain in place with a Linux p4d server topology (so long as the topology operates in case-insensitive mode, which we assume in this document).
3.2. Helix Brokers
Helix Brokers should be migrated to Linux as well. However, while strongly discouraged, a Windows p4broker can remain in place with a Linux p4d server topology.
If brokers are configured with any custom software (broker "filter" scripts), porting this software to Linux should be accounted for in planning.
3.3. Helix Core P4D Servers
Every Helix Core topology will have exactly one commit server. If there is only a single server in the topology, it is the commit server. It may have additional p4d servers that extend the topology. Following are types of p4d servers (various types of replicas) and their implications for a Windows to Linux migration:
3.3.1. Edge Servers
3.3.2. Filtered Replicas
3.3.3. Unfiltered Replicas
3.3.4. Standby Servers
3.3.5. Distribution Servers
A Windows to Linux migration has no impact on existing servers of type distribution-server.
EDITME: Should be true, but test this to confirm.
3.3.6. Helix Swarm
Helix Swarm is essentially a client of the Helix Core server, and as such is largely unaffected by a Windows to Linux migration. It may need to change the P4PORT it is configured to use to connect to the commit server, as noted in Section 2.1, “Planning for User Impact”. In the case of Helix Swarm, this involves updating its config.php file and reloading Swarm’s configuration.
3.3.7. Helix Authentication Service
If the Helix Authentication Service (HAS) has been deployed for the Windows commit server, it can be left in place and will be entirely unaffected by and unaware of the Windows to Linux Migration.
Optionally, the HAS service can be moved onto the Linux commit server machine for easier management.
3.3.8. P4DTG
If the Perforce Defect Tracking Gateway (P4DTG) has been deployed for the Windows commit server and operates on a separate server machine, it can be left in place and will be entirely unaffected by and unaware of the Windows to Linux Migration.
If P4DTG operates on Windows, it could be migrated to Linux as well, or left in place. However, while there are many compelling reasons to migrate Helix Core to Linux, there is less of a need to migrate P4DTG if it is stable and operating well. Migrating P4DTG to Linux (e.g. if normalization to all-Linux infrastructure is a goal) can be done entirely independently of, or as part of, the Windows to Linux migration project.
3.4. Moving Archive Files
Once the Linux replicas are set up, a variety of strategies can be used to transfer archive files.
Plan for three cycles of p4verify.sh to get p4d to pull the archives. The first cycle, starting with no archive files, kicks off a bulk pull, which could take days or weeks depending on data scale. The second cycle fills in gaps, and the third should be clean.
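The repeat-until-clean pattern can be scripted. This bash sketch assumes the verification command exits 0 when verification is clean and non-zero otherwise; confirm that p4verify.sh behaves that way in your SDP version before relying on it. In practice you would also let the pull queue drain between cycles (for example, by checking p4 pull -l -s on the replica). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a verify command up to N times, stopping at the
# first clean (exit 0) run. ASSUMPTION: the command's exit status reflects
# whether verification was clean.
verify_until_clean() {
  local max_cycles="$1"; shift
  local cycle
  for (( cycle = 1; cycle <= max_cycles; cycle++ )); do
    echo "Verify cycle $cycle of $max_cycles: $*"
    if "$@"; then
      echo "Cycle $cycle is clean."
      return 0
    fi
  done
  echo "Still not clean after $max_cycles cycles." >&2
  return 1
}

# Example (the instance argument 1 is illustrative):
#   verify_until_clean 3 p4verify.sh 1
```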
Depending on the scale of data, you may want to consider using mechanisms outside p4d for transferring some archives (especially the .gz files; ,v files should ideally be transferred with p4 pull).
There are many ways to get the archive files in place. Using p4 pull has the advantage that, because the Linux p4d writes each archive file itself, it can always find it, even if the path contains unusual Unicode characters. By contrast, files copied outside p4d may not be found by the Linux p4d. However, for bulk transfers of terabytes of data, a Windows port of rsync, at least for .gz files, can be much faster. You’ll need a live running rsync service on Linux for the Windows port of rsync to talk to. Whatever mix of methods you choose, the goal is to get the files in place so that p4verify.sh reports no errors.
3.5. Combining Upgrade with Migration
If the priority is to avoid upgrading or touching the Windows environment, an upgrade to a modern Helix Core version can be done to the Linux server during the cutover, as part of the Windows to Linux migration project.
Alternatively, you can upgrade the Windows P4D in place first, and then set up the Linux replica on the same modern P4D version. If the starting Windows version is 2019.1 or later, a Failover style migration is possible; otherwise a different strategy is needed.
Typically we recommend doing the failover-then-upgrade in the same maintenance window as the Windows to Linux migration. That is, fail over to the new server on Linux on the same p4d version that Windows was initially running. Then, once on Linux, do the standard SDP upgrade procedure for Linux, using upgrade.sh.
3.6. DRY RUN
At least one Dry Run is required to confidently execute a migration. Plan to have at least one.
In the dry run, the p4 failover command is NOT used. Instead, the Linux service is stopped, and the $P4ROOT/server.id file is simply hand-edited to contain the ServerID of the commit server. Then the service is restarted.
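The hand edit can be sketched as a small bash function that keeps a timestamped backup of the original file. The function name is hypothetical, the /p4/1/root path in the example assumes SDP instance 1, and stopping and restarting p4d before and after the edit are left to your usual SDP procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the dry run: back up $P4ROOT/server.id and write a
# new ServerID into it. Stop p4d first and start it again afterwards (not shown).
set_server_id() {
  local p4root="$1" new_id="$2"
  local id_file="$p4root/server.id"
  [[ -n $new_id ]] || { echo "Usage: set_server_id P4ROOT SERVERID" >&2; return 2; }
  if [[ -f $id_file ]]; then
    # Keep the original so the standby identity can be restored after the dry run
    cp -p -- "$id_file" "$id_file.bak.$(date +%Y%m%d-%H%M%S)"
  fi
  printf '%s\n' "$new_id" > "$id_file"
}

# Example: set_server_id /p4/1/root <ServerID of the Windows commit server>
```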
At that point, the Linux server will believe itself to be the new commit server, even though users will still be using the Windows server for real work. The Linux server can then be tested in various ways:
- Test connectivity from all user access points.
- Test connectivity from all server access points, including replicas, proxies, and any integrated systems such as Jenkins, Swarm, P4DTG, etc.
- If there are any ldap specs, ensure the targeted LDAP servers can be reached from the Linux server. (This may require firewall adjustments.)
3.7. Set Up Linux Replica ServerID
On the Windows commit server, create a server spec to represent p4d on Linux. Call it p4d_fs_linux.
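One way to create the spec is to generate it and feed it to p4 server -i on the Windows commit server. This bash sketch is an assumption, not the SDP’s own method. The field values, especially Services: forwarding-standby and Options: mandatory (chosen on the assumption that "fs" in the name stands for forwarding standby), should be checked against p4 help server and p4 server -o p4d_fs_linux for your p4d version. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a server spec for a Linux standby of the commit
# server. Field values are assumptions; compare with `p4 server -o <id>`.
standby_server_spec() {
  local server_id="$1" master_id="$2" description="$3"
  printf 'ServerID:\t%s\n' "$server_id"
  printf 'Type:\tserver\n'
  printf 'Services:\tforwarding-standby\n'
  printf 'Options:\tmandatory\n'
  printf 'ReplicatingFrom:\t%s\n' "$master_id"
  printf 'Description:\n\t%s\n' "$description"
}

# Example (WINDOWS_COMMIT_ID is a placeholder for the current commit ServerID):
#   standby_server_spec p4d_fs_linux WINDOWS_COMMIT_ID "Linux standby of Windows commit" | p4 server -i
```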
4. Migration Preparation
4.1. Provision New Linux Server Machines
EDITME - Add content here.
4.1.1. Select Operating System
As of this writing, the best options are:
- Ubuntu 22.04 (or 20.04)
- RHEL/Rocky Linux 9 (or 8)
5. Install Perforce Helix on Linux
5.1. Install Helix Core Software
On the Green Linux server machines that do not yet have any data, use the Helix Installer to do a Configured Install.
The Helix Installer is only to be used on truly "green" server machines, those with no Helix Core data on them yet.
su -
mkdir -p /hxdepots/reset
cd /hxdepots/reset
curl -L -s -O https://swarm.workshop.perforce.com/download/guest/perforce_software/helix-installer/main/src/reset_sdp.sh
chmod +x reset_sdp.sh
./reset_sdp.sh -C > settings.cfg
In settings.cfg, change these settings:
- DNS_name_of_master_server=
- P4_PORT=
- Instance=
- Password=
- CaseSensitive=0
- P4USER=
- ServerID=
- ServerType=
- P4BinRel=
- P4APIRel=
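These edits can also be made non-interactively. This bash sketch assumes each setting sits on its own KEY=value line with no spaces around the equals sign, as listed above, and uses GNU sed -i. The set_cfg name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in settings.cfg in place.
# Fails if the key is not already present in the file.
set_cfg() {
  local key="$1" value="$2" file="${3:-settings.cfg}"
  if ! grep -q "^${key}=" "$file"; then
    echo "Key $key not found in $file" >&2
    return 1
  fi
  # Escape characters special in a sed replacement (\ and &) and the | delimiter
  local escaped
  escaped=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')
  sed -i -e "s|^${key}=.*|${key}=${escaped}|" "$file"
}

# Example: set_cfg CaseSensitive 0 settings.cfg
```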
Then run the script:
./reset_sdp.sh -no_sd -c settings.cfg 2>&1 | tee log.reset_sdp.txt
su - perforce
p4 set
cd /p4/common/site
[[ -d config ]] || mkdir config
cd config
5.2. Create the Linux replica.
Temporary Hack:
vi /p4/common/site/config/p4_N.vars.local
export P4MASTER_ID=windows.p4d
export P4MASTERPORT=192.168.1.5:1666
export P4PORT=$P4MASTERPORT
p4login -v
cd /p4/common/config
vi SiteTags.cfg
azwestus2: Azure data center
Add to Protections:
super group ServiceUsers * //...
mkrep.sh -t fs -s usw2 -r p4d-commit-wus2
Undo Temporary Hack:
vi /p4/common/site/config/p4_N.vars.local
#export P4MASTER_ID=Master
#export P4PORT=$P4MASTERPORT
export P4MASTERPORT=10.0.0.4:
Appendix A: Sample Cutover Procedure
A.1. Sample Migration Scenario
The following is a sample cutover procedure for a topology with a commit server and an edge server, with custom triggers that have been ported to Linux.
The sample instructions assume the perforce OS user on the Linux servers has been set up with the proper shell environment, specifically that ~/.bashrc sources the /p4/common/bin/p4_vars file with the appropriate SDP instance parameter.
The preparation for this sample cutover scenario would have included:
- A set of ported and tested Linux custom triggers (replacing the former custom triggers on Windows), deployed on all Linux servers in the /p4/common/site/bin/triggers folder.
- A triggers table suited for operation on the Linux server after it becomes the commit server.
A.2. One Week Prior to Cutover Procedure
STEP 1: Verify Replication
Verify that replication is healthy on the Linux replica, the Windows edge, and the Linux replica of the Windows edge server.
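One common check is to compare the replica’s and master’s journal positions reported by p4 pull -lj, run on each replica. This bash sketch assumes output lines of the form shown in its comments; confirm the wording against your p4d version. On a busy server the sequence numbers may differ briefly, so a single mismatch is not necessarily a problem. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `p4 pull -lj` output on stdin and succeed if the
# replica and master journal positions match. ASSUMED line format:
#   Current replica journal state is:   Journal 1237,   Sequence 2680510310.
#   Current master journal state is:    Journal 1237,   Sequence 2680510310.
replication_caught_up() {
  awk '
    /Current replica journal state is:/ { r = $7 " " $9 }
    /Current master journal state is:/  { m = $7 " " $9 }
    END { exit !(r != "" && r == m) }
  '
}

# Example: p4 pull -lj | replication_caught_up && echo "Replica is caught up."
```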
A.3. One Day Prior to Cutover Procedure
STEP 1: Verify Replication
Verify that replication is healthy on the Linux replica, the Windows edge, and the Linux replica of the Windows edge server.
STEP 2: Checkpoint Linux Replicas
On the Linux standby of the Windows commit server, and separately and in parallel on any Linux standbys of Windows edge servers, request a checkpoint:
p4 admin checkpoint -Z
Run p4 info first and confirm that the target ServerID is that of the intended Linux standby server, to avoid taking an unintentional "live checkpoint" of the commit server.
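The p4 info check can be made mechanical so that the checkpoint request is only sent when the expected ServerID answers. This bash sketch reads p4 info output and compares its ServerID: line. The function name is hypothetical, and p4d_fs_linux in the example is the standby of the commit server used throughout this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `p4 info` output on stdin and succeed only if the
# reported ServerID matches the expected one.
require_server_id() {
  local expected="$1" actual
  actual=$(awk -F': ' '/^ServerID: / { print $2; exit }')
  if [[ $actual != "$expected" ]]; then
    echo "Connected to ServerID '${actual:-unknown}', expected '$expected'." >&2
    return 1
  fi
}

# Example: p4 info | require_server_id p4d_fs_linux && p4 admin checkpoint -Z
```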
Next, on the Windows commit server, execute a journal rotation:
p4 admin journal
Running this command triggers each Linux standby to start taking the requested checkpoint. On each Linux standby, a checkpoint file should immediately appear in the checkpoints directory.
The checkpoints directory is /p4/N/checkpoints for the standby of the commit server, and /p4/N/checkpoints.ShortServerID for the standby of an edge server.
Monitor the checkpoints directory and await the appearance of a *.md5 file with the same number as the checkpoint. The existence of the MD5 file indicates the successful completion of the checkpoint process.
Use the watch utility and wait until the *.md5 file appears.
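As an alternative to watch, a polling loop can wait for the MD5 file with a timeout. This bash sketch is hypothetical, and the checkpoint file name pattern in the example (*.ckp.<N>*.md5) is an assumption to adjust to the files your instance actually writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a file matching a glob exists.
# Arguments: glob, timeout in seconds (default 1 day), poll interval (default 30s).
wait_for_file() {
  local pattern="$1" timeout="${2:-86400}" interval="${3:-30}"
  local waited=0
  while :; do
    # compgen -G succeeds if the glob matches at least one path
    compgen -G "$pattern" > /dev/null && return 0
    (( waited >= timeout )) && return 1
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Example (1234 stands for the checkpoint number):
#   wait_for_file "/p4/1/checkpoints/*.ckp.1234*.md5" && echo "Checkpoint complete."
```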
STEP 3: Replay Checkpoint to offline_db.
On the Linux commit and edge servers, replay the checkpoint to the offline_db. This can be done regardless of whether the local p4d service is replicating or even online at all, and it is neither affected by nor disruptive to the p4d service.
nohup recreate_offline_db.sh < /dev/null > /dev/null 2>&1 &
Monitor until completion with:
tail -f $LOGS/recreate_offline_db.log
A.4. Cutover Procedure
STEP 1: Verify Replication
Verify that replication is healthy on the Linux replica, the Windows edge, and the Linux replica of the Windows edge server.
STEP 2: Disable Scheduled Tasks
On the old Windows commit and edge server machines, disable any Scheduled Tasks related to backups or checkpoints. Also ensure no long-running checkpoint or backup operations are in progress that won’t be complete by the time of the intended cutover.
STEP 3: Disable Crontabs
On the new Linux commit and edge server machines, save and then disable all crontabs intended for routine production operation (they may have been left enabled during dry runs).
STEP 4: Lockout Users with Protections
STEP 5: Stop Services
STEP 6: Start Services
STEP 7: Rotate Journal
STEP 8: Verify Replication
STEP 9: Failover Edge Server
STEP 10: Apply Metadata Changes for Linux
Apply metadata changes required for operation on Linux and commit server now being on SDP.
STEP 11: Failover Commit Server
STEP 12: Do Sanity Tests
STEP 13: Decide: GO/NO GO
STEP 14: Restore Default Protections
STEP 15: Direct User Traffic to Linux
STEP 16: Enable crontabs
EDITME
Appendix B: Why Migrate?
Migrations from Windows to Linux have been the single most consistent theme in Perforce Consulting for over two decades, for many reasons. The procedures have evolved over time, with the modern "failover style" replication being the latest in seamless cutover.
EDITME Add some of the many reasons.