= SDP Migration and Upgrade Guide
Perforce Professional Services
:revnumber: v2023.2
:revdate: 2024-02-29
:doctype: book
:icons: font
:toc:
:toclevels: 5
:sectnumlevels: 4
:xrefstyle: full

// Attribute for ifdef usage
:unix_doc: true

== DRAFT NOTICE

WARNING: This document is in DRAFT status and should not be relied on yet. It is a preview of a document to be completed in a future release.

== Preface

This document is useful when getting an existing Perforce Helix Core installation to the optimal operating environment from any starting condition. Whether the starting environment is a large enterprise or "garage band" scale, on-prem or on a public or private cloud, or on any operating system, this guide can help get to the optimal deployment environment for Perforce Helix Core.

This document focuses on software rather than hardware aspects of an optimal operating environment. While hardware is not discussed in detail, the migration and upgrade plans described in this document provide an opportunity to test and change out hardware.

This document does not discuss case sensitivity or case conversions. Those are discussed in the link:SDP_Win2Linux_Guide.html[SDP Windows to Linux Migration Guide].

This guide assumes familiarity with Perforce Helix Core and does not duplicate basic information in the Helix Core documentation.

*Please Give Us Feedback*

Perforce welcomes feedback. Please send any suggestions for improving this document or the SDP to consulting@perforce.com.

:sectnums:
== Introduction

=== Optimal Helix Core Operating Environment

Getting to an optimal operating environment for Perforce Helix Core first requires defining various aspects of optimal. For our purposes in this document, optimal means:

* SDP version 2020.1+. This is the SDP version from which SDP upgrades are automated.
* Helix Core (P4D) version 2019.1+. The P4D 2013.3 and 2019.1 versions were major architectural overhauls requiring special upgrade procedures.
Once the P4D version is 2019.1 or later, future upgrades are standardized.
* P4D server is operating on Linux, on a major version with plenty of support life left in it. As of this writing, that would be RHEL/Rocky Linux 9 or Ubuntu 22.04. RHEL/Rocky Linux 8 and Ubuntu 20.04 are actively supported as well, but with less runway available. For EOL dates of various Linux distros, see:
- link:https://access.redhat.com/product-life-cycles/[RHEL]
- link:https://wiki.rockylinux.org/rocky/version/[Rocky]
- link:https://ubuntu.com/about/release-cycle[Ubuntu]
- link:https://www.suse.com/products/server/[SuSE]
- link:https://endoflife.date[EOL dates for multiple distros]
* Physical layer (server machines, storage subsystems, etc.) is as desired.

The above constitutes the desired *End State* of a migration.

In public cloud environments, an optimal Helix Core server can be deployed instantly in an optimal environment by using the link:https://www.perforce.com/blog/vcs/perforce-enhanced-studio-pack[Enhanced Studio Pack (ESP)], an offering by Perforce Software available on Amazon and Azure marketplaces.

==== Optimal Storage and NFS

This document doesn't focus on hardware or storage components of the definition of optimal for Helix Core. Using NFS isn't part of the optimal definition. At small scale, the cost and complexity introduced by NFS may not be worth the various benefits. However, as the scale of data increases to and above tens of terabytes, options involving more scalable filesystem solutions like NFS start making sense and may even be effectively required. NFS can be used in on-prem and public cloud environments. ESP installations do not use NFS, and thus would require adjustment to handle very large Helix Core data sets.

=== Motivation

==== Global Topology Upgrades

This document is written to support projects commonly referred to as a *global topology upgrade*, sometimes part of an even larger *infrastructure modernization* effort.
Such upgrades are commonly driven by desires to maintain performance, security, supportability, and access to new product features, and in some cases to escape custom aspects of local infrastructure.

If _all_ defined aspects of the desired End State are already met, you don't need this document. Instead, use the standard upgrade procedure documented in the link:https://swarm.workshop.perforce.com/view/guest/perforce_software/sdp/main/doc/SDP_Guide.Unix.html#_upgrades[Upgrades section of the SDP Guide].

==== Helix Remote Administration

Another potential motivation is to get assistance with managing Helix Core servers, or perhaps to get out of that role entirely (perhaps due to departure of key personnel). If there is interest in turning over the keys to your Perforce Helix servers, consider link:https://www.perforce.com/support/consulting/helix-remote-admin[the Helix Remote Administration program (HRA)]. To be eligible for HRA, customers must be on an optimal environment. Signing up for the program commonly entails a process referred to as "HRA Onboarding," which essentially means doing a Migration-Style Upgrade to an optimal environment, as outlined in this document.

=== Migration Style Upgrades

This document focuses on the Migration-Style Upgrade strategy, as opposed to in situ (in place) upgrades. In situ upgrades are preferred when your deployment environment is already optimal, as defined above in <<_optimal_helix_core_operating_environment>>. If _all_ of the aspects are currently in the desired state, you don't need this document. Instead, use the standard upgrade procedure documented in the link:https://swarm.workshop.perforce.com/view/guest/perforce_software/sdp/main/doc/SDP_Guide.Unix.html#_upgrades[Upgrades section of the SDP Guide]. If the hardware, operating system, or P4D/SDP versions are not as desired, this guide is for you.

A Migration-Style Upgrade is great when you need to make a _big change_ in the least disruptive way possible.
A key characteristic of a Migration-Style Upgrade is that your original server environment is largely left alone, with little or no change. Typically the original environment remains available for some time after the upgrade, with the general idea that the old environment can eventually be decommissioned, archived, and/or simply deleted.

=== Big Blue Green Cutover

In a Migration-Style Upgrade, a new set of server machines and supporting infrastructure (such as storage) is deployed that reflects the desired End State. This set of servers is referred to as the Green servers or infrastructure. The current, live production equipment is referred to as the Blue servers or infrastructure.

In preparation for an eventual cutover, Helix Core data is brought into the Green environment. This is usually non-disruptive to users operating on the Blue (live production) environment. The Green environment operates for a time as a test environment, allowing opportunity to test various aspects of the new infrastructure before it becomes the production infrastructure. Depending on risks and needs, testing can be cursory or extensive, lasting days or months.

TIP: If your current method of operating Helix Core does not produce a regular link:https://www.perforce.com/manuals/p4sag/Content/P4SAG/backup-recovery-concepts.html[backup of checkpoints and versioned files], a change to the Blue environment will be required to get at least some basic form of checkpoint process in place. This may involve disruptive operations that might need to be scheduled in a maintenance window before the Green environment can be set up initially.

A Big Blue Green Cutover (BBGC) is planned with the eventual goal of cutting over from the entire Blue infrastructure to the Green infrastructure in one single maintenance window. The _Big_ in Big Blue Green Cutover indicates that a phased cutover is not an option. In some types of projects, using a phased approach to migrations can mitigate risk.
However, in this type of project, a phased cutover actually introduces more risk and complexity, because it requires operating bi-directionally across the Blue and Green infrastructures. In BBGC, there is a one-way, one-time migration from Blue to Green.

== Migration Planning

=== Define Start State

The starting environment can be pretty much anything:

* Any P4D version going back to 1995.
* P4D server machine operating on Windows, UNIX, macOS, Linux, or another platform.
* Any legacy method of managing Helix Core:
- An older version of SDP (prior to 2020.1).
- Home-grown custom scripts.
- Management scripts provided by a 3rd party such as ICManage.
- The p4dctl service, possibly installed with the helix-p4d package installation.
- Manual procedures.
- No management whatsoever.

=== Take Inventory

At the outset of a Migration-Style Upgrade, take stock of everything that comprises the Helix Core ecosystem. The inventory should be comprehensive of everything that is part of your infrastructure, regardless of whether you intend for it to be affected by the current upgrade project. In the simplest case, the inventory might consist of a single server machine with a single p4d commit server.

Some items to include in the inventory are:

* All server machines involved in the ecosystem, including those running Helix Core software (such as p4d servers) as well as those not operating any Helix Core services but from which automation runs, such as build server farms.
* All Helix Core software components, such as:
- Server products: p4d, p4broker, p4p.
- Other components: Helix Swarm, P4DTG, Helix DAM, etc.
* Any customization done using Helix Core custom features:
- Any custom Helix Core triggers.
- Any custom Helix Core broker scripts.
* Any other custom automation.
* Any integrations with 3rd party systems, such as issue/bug trackers, task and agile project management systems.
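As a starting point for populating the inventory, read-only queries against each running p4d server can capture key facts such as the server version, case handling, and server root. The sketch below is illustrative, not part of SDP: the `parse_info` helper is a hypothetical function that extracts a few fields from the tagged output of `p4 -ztag info`, and the canned sample stands in for live output.

```shell
#!/bin/bash
# Inventory sketch (hypothetical helper, not part of SDP).
# parse_info extracts a few inventory-worthy fields from `p4 -ztag info`
# tagged output, where each line has the form: ... <tagName> <value...>
parse_info() {
    awk '$1 == "..." && ($2 == "serverVersion" || $2 == "caseHandling" || $2 == "serverRoot") {
        tag = $2; $1 = ""; $2 = ""; sub(/^ +/, "")
        print tag ": " $0
    }'
}

# Against a live server, you would run something like:
#   p4 -p perforce:1666 -ztag info | parse_info
# Canned sample output is used here for illustration:
sample='... serverVersion P4D/LINUX26X86_64/2023.2/2519561 (2023/12/13)
... serverRoot /p4/1/root
... caseHandling sensitive'

printf '%s\n' "$sample" | parse_info
# prints:
#   serverVersion: P4D/LINUX26X86_64/2023.2/2519561 (2023/12/13)
#   serverRoot: /p4/1/root
#   caseHandling: sensitive
```

Running such a loop over every server machine in the ecosystem, and recording the results per host, gives the inventory a factual baseline to scope against.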
If in doubt about whether a system is potentially affected by or involved in the upgrade, include it in the inventory for consideration.

TIP: Triggers in Helix Core are a means of customizing and extending Helix Core behaviors.

TIP: If an ecosystem has production and non-production elements (such as a copy-of-production sandbox environment), those distinctions should be noted in the inventory. Such distinctions may affect whether the elements are in the scope of the effort, and when they are handled in the schedule.

=== Define End State

At a minimum, the desired *End State* of a Migration-Style Upgrade is as defined above in <<_optimal_helix_core_operating_environment>>. In addition to those aspects, you may choose to add other aspects to your desired *End State*, depending on the goals of your migration. In the mindset of "While we're at it," some common examples of things added to the End State definition are:

* Authentication: Enable a Single Sign-On (SSO) solution using the link:https://github.com/perforce/helix-authentication-service/[Helix Authentication Service].
* Monitoring: Deploy monitoring with link:https://github.com/perforce/p4prometheus[P4Prometheus].

Even if you are "pretty close" to the definition of optimal, say on Rocky Linux 8 but otherwise modern, the Migration-Style Upgrade is the preferred method of getting to the modern topology (on Rocky 9). (If your environment is optimal in all other respects and _only_ the P4D version is older, then you might consider an in situ upgrade.) Other options are not discussed in this document, because the Migration-Style Upgrade has significant benefits that often make it preferable even when other options are possible.

The specific starting environment will impact migration options and procedures, as will be called out in this document. Some _big changes_ that call for a Migration-Style Upgrade include:

* Windows to Linux migration.
Those are discussed in the link:SDP_Win2Linux_Guide.html[SDP Windows to Linux Migration Guide].
* Upgrade of a major operating system version, e.g. CentOS 7 -> Rocky 8, RHEL 8 -> RHEL 9, Ubuntu 20.04 -> Ubuntu 22.04, or even a Linux family change, e.g. CentOS 7 -> Ubuntu 22.04.
* Upgrade of SDP from a version prior to 2020.1, where the link:https://swarm.workshop.perforce.com/view/guest/perforce_software/sdp/main/doc/SDP_Legacy_Upgrades.Unix.html[SDP Legacy Upgrades] document applies, a well-documented but manual upgrade procedure. That document is for in situ upgrades of SDP -- this document avoids the in situ procedure entirely by using a Migration-Style Upgrade.

The primary scenario this document focuses on is a migration to a cloud provider, though only minor adaptations are needed if going to an on-premises environment. We lean toward cloud environments for documentation purposes, because we can assume more, and define more, with a cloud as a target. If your target environment is on-prem, there is a greater likelihood of local policies, practices, and perhaps even different teams within your organization that may be involved.

=== Define Project Scope

Scoping the migration project starts with the inventory. Decide on a per-item basis whether each item in the inventory is to be included in the migration, left alone, or perhaps decommissioned. Pay attention to whether any software components are obsolete or have been sunset. For example, if P4Web (a legacy server that was sunset in 2012) is part of your inventory in the Blue environment, you'll want to plan to replace it with its successor, Helix Swarm.

In addition to scoping based on inventory, the scope must take into account considerations mentioned in the End State, such as adding HA/DR, changing the authentication mechanism, etc.

=== Define Server Templates

It is a good idea to define templates for server machines in some form.
You may require multiple templates depending on the classes of machine you will operate in the Green environment. You will certainly have a Helix Core (p4d) server. You may have a separate template for a Helix Proxy server, another for a Helix Swarm server, etc. A review of the inventory of server machines can guide the list of server templates needed.

Some common forms of defining a server template are:

* A written description (perhaps on a wiki page) that defines key characteristics such as server hardware specifics (RAM/CPU), storage system details, operating system, and the like.
* A "golden image," a labeled virtual machine image in your preferred virtualization infrastructure.
* A script of some kind that converts a base operating system installation into one suitable for Helix Core.
* link:https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html[AWS CloudFormation Templates].
* link:https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/overview[Azure Resource Manager Templates].
* link:https://developer.hashicorp.com/terraform/intro[Terraform].
* The Enhanced Studio Pack (ESP) from Perforce. See <<_enhanced_studio_pack>>.

==== Enhanced Studio Pack

The link:https://www.perforce.com/blog/vcs/perforce-enhanced-studio-pack[Enhanced Studio Pack (ESP)] should be considered if the target environment is Amazon or Azure. ESP is not available for on-prem environments, and not presently available for other cloud environments. Even in cases where ESP is available, it may not be the best choice. Some cases where ESP is not optimal or would need adjustment afterward include:

* You have a corporate policy that dictates machines must be launched from internally produced baseline virtual machine images. ESP is essentially a set of machine images, but of course cannot be based on any customer's unique base images.
(In some cases, an exception can be granted because ESP is reasonably well secured, using link:https://www.perforce.com/press-releases/support-rocky-linux-and-centos-alternatives[Security Enhanced Rocky Linux provided by Perforce OpenLogic].)
* You plan to use advanced storage solutions like NFS.
* Your `perforce` operating system user is defined in LDAP rather than being local. (Note: We recommend using local accounts when you can for optimal reliability, but using an LDAP or NIS account is required when using NFS to ensure numeric user IDs align across a fleet of server machines.)

=== Consider Client Upgrades

Generally speaking, for a global topology upgrade, it is a good idea to plan to upgrade Helix Core client software as well as server components. Helix Core has a powerful feature that allows client and server software to be upgraded independently, so that upgrading the server does not force an upgrade of all clients at the same time, and clients can upgrade ahead of the server version as well. However, the greater the disparity between client and server versions, the greater the risk of issues due to version skew between clients and servers. Further, new clients are going to have the latest security features. Lastly, certain product features and security benefits require clients to be upgraded.

Upgrading clients should be considered during a Migration-Style Upgrade. Typically Migration-Style Upgrades are done infrequently compared to in situ upgrades, perhaps once every 3 to 6 years, and thus are a good time to address client/server version skew. The following should be considered:

* The `p4` command line client binary. Updating this may involve notifying users how to download it themselves, or may involve an admin updating the binary on a system used by others.
* The P4V GUI client, which may be installed by individual users or provided by admins to users.
* Any software built on any of the Helix Core APIs, such as the {cpp} API or any of the APIs derived from the {cpp} API such as P4Perl, P4Python, etc. This may include 3rd party tools. Such software will require recompiling with the new API version, and may possibly require code changes and associated testing.

If your environment has custom automation built with the {cpp} or derived APIs, updating such things will require specialized expertise. If the automation is from a third party, you may need to explore options for getting updated versions from the vendor to match the new Helix Core version, or information about whether an upgrade is needed.

TIP: If updating clients becomes a complex endeavour, the risk calculus becomes a bit complex. Changing more things at once can increase risk, but so can allowing too much version skew. Think of excessive version skew, say 3+ years, as a form of technical debt: sooner or later you'll need to make a payment.

TIP: If you don't have a good sense for what clients are connecting to your Helix Core server, some server log analysis can work wonders. There are various approaches, but the gist is that you can scan a series of p4d server logs, say for a week or so worth of operations, and determine what client programs are connecting to your Helix Core servers.

== Migration Preparation

=== Build the Green Infrastructure

EDITME

== Sample Test Plan

EDITME

== Sample Cutover Procedure

EDITME - Add content here.

[appendix]
== DRAFT NOTICE

WARNING: This document is in DRAFT status and should not be relied on yet. It is a preview of a document to be completed in a future release.