HMS continues modest evolution.
The HMS refactoring is complete and the car is back in the garage!
The HMS had been in a "car apart in the garage" state due to the refactoring that split it from the SDP back into a separate project (which is actually how it started, before being blended into the SDP for a time). From a deployment perspective, HMS is deployed as a layered product on top of the SDP (requiring SDP 2019.2+). From a consumption perspective, HMS and the SDP are separate products.
Customers using the Helix native DVCS features to fetch new versions of the SDP from the Workshop (e.g. to support a 'fetch' and 'merge' process for pulling the vanilla SDP into a potentially customized local implementation) will need a new procedure. The new procedure will fetch both HMS and the SDP from separate areas in the Public Depot, while consolidating them locally for deployment. This updated procedure will be documented soon, and referenced on this page.
News Flash: The HMS Project is being extracted from the SDP project, to once again become a standalone project with its own roadmap.
As Perforce Helix evolves to meet ever-growing enterprise demands, sophisticated global deployment architectures have become commonplace.
There has been a corresponding rise in deployment complexity as more Helix installations take advantage of options such as brokers, SSL-enabled servers, replicas, and edge servers.
That's a lot of complexity to manage! Fear not! The Helix Versioning Engine and the Server Deployment Package (SDP) are well suited to managing it. The Helix Management System evolves and codifies the manual best practices used by Perforce Consultants and enterprise site admins to help customers manage sophisticated enterprise environments.
Simply put: Routine Helix administration tasks should be consistent and simple.
HMS will help with:
Knowing What you Have - Knowing which Helix components, and which versions of them, exist in your topology. HMS v1.0 will not do any form of automated topology discovery, but will provide a well-defined way of describing and tracking all components in use, where they are, and various details of how they are configured.
Consistent Start/Stop/Status - Managing the various Helix Server instances in your environment, including system start, stop, and "Down for Maintenance" modes. The mechanical steps to start and stop a Helix Server vary based (for example) on whether or not a broker is in place for that instance, whether or not SSL is enabled, etc. HMS abstracts those details down to Start/Stop/Status (see the command sketch after this list).
Health Status Ping - For v1.0, we'll quickly see whether each Helix topology component is running or not. (Later, this may expand to include aspects of health monitoring, going beyond mere up/down status.)
Upgrading - Upgrades in a global topology are straightforward and well understood, but there are a lot of moving parts and a lot of commands to type to make them happen. HMS will make this easy by knowing which components have a newer version and/or a newer patch of the same version available. It will be easy to upgrade all Helix topology components, individually or in groups, with the click of a button. In sophisticated topologies involving edge servers and replicas, there will be a built-in awareness of the order in which components must be upgraded, without relying on a human admin to know those details (see the upgrade-order sketch after this list).
When it comes to topology-wide upgrades, enterprises need it all -- control, flexibility (which introduces a degree of complexity), and operational simplicity. They may want to apply a P4D patch to one instance but not others, or upgrade all instances at once. We can present admins with options rather than have them figure out custom upgrade procedures for updating executables and tweaking symlinks to get it right.
Human-Initiated Failover - HMS will execute the steps to achieve a failover via a single command: hms failover.
Stretch Goal: Failover addresses the mechanics comprehensively, even including things that are often outside the scope of Perforce administrators but are truly necessary to achieve failover in some environments, such as DNS updates and Virtual IP configuration changes.
Hardware fault detection and automated failover initiation are explicitly outside the scope of this project. This project's more humble goal is simply to clarify and simplify the mechanics of executing a failover. That said, clarifying and simplifying those mechanics is a necessary first step toward the loftier goals.
Comprehending the Parts - Knowing every detail will help Perforce admins understand the many moving parts that keep a Helix environment happy. Details like:
p4 info -s output from each Helix Server instance

All this and much more is needed, and it should be visible from a source more dynamic and reliably updated than a human-maintained wiki page. The data will be gathered centrally for the human administrator, but kept current automatically (as sketched below).
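To make the Start/Stop/Status abstraction above concrete, here is a purely hypothetical command sketch. Only the hms failover command is named in this document; the other subcommands and arguments are illustrative assumptions, not documented HMS behavior.

```bash
# Hypothetical usage sketch; only "hms failover" appears in this document.
# The other subcommands and arguments are illustrative assumptions.

hms status all    # up/down status of every registered topology component
hms stop 1        # stop instance 1, hiding broker/SSL/p4d ordering details
hms start 1       # start instance 1 without knowing its broker/SSL specifics
hms failover      # execute the steps of a pre-configured failover plan
```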
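For the Upgrading item, here is a minimal sketch of the idea of encoding upgrade order in data rather than in an administrator's memory. The component names, ranks, and ordering shown are assumptions for illustration only, not actual HMS data or Perforce-recommended ordering.

```bash
#!/bin/bash
# Sketch only: drive upgrades from a data-declared order instead of relying on
# the admin to remember it. Ranks, component names, and the upgrade action are
# illustrative assumptions, not HMS behavior.

# Format: "<rank> <instance> <component>", lower rank upgrades first.
upgrade_plan="
2 1 commit-server
1 1 edge-server
1 1 forwarding-replica
"

echo "$upgrade_plan" | awk 'NF' | sort -n | while read -r rank instance component; do
    echo "Would upgrade instance $instance: $component (rank $rank)"
    # A real implementation would invoke the appropriate upgrade script here.
done
```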
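For the "Comprehending the Parts" item, a minimal sketch of gathering p4 info -s output from each instance into one central location that a scheduler could keep current. The server list and output directory below are assumptions for illustration.

```bash
#!/bin/bash
# Sketch only: collect "p4 info -s" from each Helix Server instance into one
# place, so the data can be kept current by cron rather than by hand.
# The P4PORT list and output directory below are illustrative assumptions.

servers="ssl:master.example.com:1666 edge1.example.com:1666"
outdir=/p4/common/site/hms_status
mkdir -p "$outdir"

for port in $servers; do
    safe_name=$(echo "$port" | tr ':/' '__')
    # "p4 info -s" is the short form, omitting data that needs a database lookup.
    p4 -p "$port" info -s > "$outdir/info.${safe_name}.txt" 2>&1
done
```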
With Helix Server 2015.2, High Availability (HA) and Disaster Recovery (DR) solutions are closer to being commoditized than ever before. But they're still not quite a commodity. HMS captures and codifies what Perforce Consultants have done for individual customers with custom solutions, automating all the wiring under a big red Failover button.
A set of failover options is pre-defined and pre-configured with HMS. At the time of execution, the administrator selects from a short list of options. Based on the type of option selected (Local, HA, DR), failover occurs to a target machine pre-defined for that type of failover.
Planned Failover - Planned failover is a planned, generally scheduled event, not a reaction to a problem. In a planned failover, assumptions can safely be made about the state of things. This might occur, for example, to allow master Server A to be powered down for several hours to add RAM, with Server B coming online so that downtime of the Helix ecosystem lasts no more than a few minutes. Nothing is broken, so this type of failover can be nearly transparent.
Unscheduled Failover - Unscheduled failover occurs as a decision by a human administrator, in reaction to something breaking. The administrator must determine the nature of the problem, decide whether failover is needed, and if so, which failover option is best.
The following failover options can be configured:
Local Failover - Local failover is a failover to an offline copy of the Perforce databases on the same machine. This is useful for scenarios where database integrity is in question for some reason, but there's no reason to suspect the hardware is damaged. For example, this might be the case after a sudden power loss, or an error on the part of a human administrator (like removing live databases by accident -- yes, it happens to the best of us).
HA Failover - HA failover involves failing over to another server in the same data center, optionally sharing storage with the master for archive files. Little or no data loss is acceptable for an HA failover, and downtime should be minimal.
DR Failover - DR failover involves failing over to another data center. Some data loss is expected, and higher downtime is deemed acceptable (as it is unavoidable). DR failover can be further classified.
See: HMS Product Road Map.md.
See: SystemComponents.md.
Failover in an enterprise environment may always involve some degree of customization. HMS will capture everything that can be understood based on how the various Perforce software technologies work. Injection points can be identified for handling things likely to be topology-specific, such as redirecting traffic via DNS, integrating with monitoring systems, or tying into failover of other related systems (a sketch follows).
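As a hedged illustration of the injection-point idea, site-specific steps might be wrapped around the core failover mechanics as shown below. The hook locations, hook names, and this wrapper are assumptions for illustration; only the hms failover command is named in this document.

```bash
#!/bin/bash
# Sketch only: wrap site-specific injection points around the core failover.
# Hook paths and names are illustrative assumptions; only "hms failover" is
# named in this document.

hooks=/p4/common/site/hooks

# Pre-failover injection point: e.g. quiet the monitoring system, notify users.
[[ -x "$hooks/pre-failover.sh" ]] && "$hooks/pre-failover.sh" "$@"

# Core failover mechanics.
hms failover "$@"

# Post-failover injection point: e.g. repoint DNS or a virtual IP at the new master.
[[ -x "$hooks/post-failover.sh" ]] && "$hooks/post-failover.sh" "$@"
```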
This software is community supported. Evolution can also be driven by engaging Perforce Consulting. Please DO NOT contact Perforce Support for the Helix Management System, as it is not an officially supported product offering.
This project started as a standalone package layered on the SDP. Near the time of the first production release, circa September 28, 2016, HMS was blended into the SDP for a time, as it drove many SDP changes and was tightly coupled with it for a stretch. After changes in 2019, it is once again a standalone project.