Good News! The HMS Project is being blended into the standard SDP! Follow the Server Deployment Package (SDP) project for updates.
This has several benefits. As a result, this HMS project will cease to be updated as a standalone project in The Workshop.
As Perforce Helix evolves to meet ever-growing enterprise demands, sophisticated global deployment architectures have become commonplace.
There has been a corresponding rise in deployment complexity as more Helix installations take advantage of features such as replication, edge servers, brokers, proxies, and SSL.
That's a lot of complexity to manage! Fear not! The Helix Versioning Engine and the Server Deployment Package (SDP) are well suited to managing it. The Helix Management System (HMS) evolves and codifies manual best practices used by Perforce Consultants and enterprise site admins to help customers manage sophisticated enterprise environments.
Simply put: Routine Helix administration tasks should be consistent and simple.
HMS will help with:
Knowing What you Have - Knowing what components, and what versions of those components, exist in your topology. HMS v1.0 will not do any form of automated topology discovery, but will provide a well-defined way of declaring and tracking all components in use, where they are, and various details of how they are configured.
Consistent Start/Stop/Status - Managing various Helix Server instances in your environment, including system start, stop, and "Down for Maintenance" modes. The mechanical steps to start and stop a Helix Server can vary based, for example, on whether a broker is in place for that instance or whether SSL is enabled. HMS abstracts those details down to Start/Stop/Status.
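As a rough illustration of that abstraction, the per-instance variation can be reduced to data, with the start and stop sequences derived from it. The component commands and ordering below are illustrative assumptions, not HMS's actual implementation:

```python
# Illustrative sketch only: how a start/stop abstraction might derive
# ordered steps from instance configuration. Command names are stand-ins.

def stop_steps(instance):
    """Stop the outermost component (broker) first, then the server."""
    steps = []
    if instance.get("has_broker"):
        steps.append("p4broker stop")
    steps.append("p4d stop")
    return steps

def start_steps(instance):
    """Start in the reverse order: server first, then broker."""
    steps = ["p4d start"]
    if instance.get("has_broker"):
        steps.append("p4broker start")
    return steps
```

The admin sees only Start/Stop/Status; whether a broker is part of the sequence is resolved from the instance's recorded configuration.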
Health Status Ping - For v1.0, each Helix topology component reports whether or not it is running. (Later, this may expand to include aspects of health monitoring, going beyond mere up/down status.)
Upgrading - Upgrades in a global topology are straightforward and well understood, but there are a lot of moving parts and a lot of commands to type. HMS will make this easy by knowing which components have a newer version or a newer patch of the current version available. It will be easy to upgrade all Helix topology components, individually or in groups, with the click of a button. In sophisticated topologies involving edge servers and replicas, there will be built-in awareness of the order in which components must be upgraded, without relying on a human admin to know those details.
When it comes to topology-wide upgrades, enterprises need it all -- control, flexibility (which introduces a degree of complexity), and operational simplicity. They may want to apply a P4D patch to one instance but not others, or upgrade all instances at once. We can present admins with options rather than have them figure out custom upgrade procedures, updating executables and tweaking symlinks to get it right.
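That ordering awareness could be sketched as a simple walk of the server topology. This assumes, for illustration, that downstream servers (replicas and edge servers) are upgraded before the servers they pull from, ending with the commit server; the names and structure here are hypothetical:

```python
# Illustrative sketch of topology-aware upgrade ordering (not HMS code).
# Assumption for this example: downstream servers are upgraded before
# the upstream server they pull from, ending with the commit server.

def upgrade_order(topology):
    """topology maps each server to its upstream target (None for the
    commit server). Returns servers ordered deepest-downstream first."""
    def depth(server):
        d = 0
        while topology[server] is not None:
            server = topology[server]
            d += 1
        return d
    return sorted(topology, key=depth, reverse=True)

topology = {
    "commit": None,             # commit (master) server
    "edge_tokyo": "commit",     # edge server pulling from commit
    "replica_ha": "commit",     # HA replica of commit
    "ro_replica": "edge_tokyo", # read-only replica of the edge
}
```

With this topology, `ro_replica` comes first and `commit` last, so no server is ever newer than a server replicating from it mid-upgrade.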
Human-Initiated Failover - HMS will execute the steps to achieve a failover with a single command.
Stretch Goal: Failover addresses mechanics comprehensively, including things that are often outside the scope of Perforce administrators but are truly necessary to achieve failover in some environments, such as DNS updates and Virtual IP configuration changes.
Hardware fault detection and automated failover initiation are explicitly outside the scope of this project. This project's more humble goal is simply to clarify and simplify the mechanics of executing a failover. That said, these are necessary first steps to those loftier goals.
Comprehending the Parts - Knowing every detail will help Perforce admins understand the many moving parts that keep a Helix environment happy. Details like:
`p4 info -s` from each Helix Server instance
All this and much more is needed and should be visible from a source more dynamic and reliably updated than a human-maintained wiki page. The data will be gathered centrally for the human administrator, but kept current automatically.
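For instance, the output of `p4 info -s` is line-oriented `Key: value` text that is straightforward to gather into a structured central store. The parsing sketch and sample output below are illustrative; actual fields vary by server version and configuration:

```python
# Sketch of turning `p4 info -s` output into structured data a central
# HMS store could track. The sample output is abbreviated and illustrative.

def parse_p4_info(text):
    """Parse 'Key: value' lines from p4 info output into a dict."""
    info = {}
    for line in text.splitlines():
        if ": " in line:
            key, _, value = line.partition(": ")
            info[key.strip()] = value.strip()
    return info

sample = """\
Server address: ssl:perforce.example.com:1666
Server version: P4D/LINUX26X86_64/2015.2/1234567 (2015/12/01)
Case Handling: sensitive
"""
```

Collected periodically from every instance, records like this replace the hand-maintained wiki page with data that stays current automatically.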
With Helix Server 2015.2, High Availability (HA) and Disaster Recovery (DR) solutions are closer to being commoditized than ever before. But they are still not quite a commodity. HMS captures and codifies what Perforce Consultants have done for individual customers with custom solutions, automating all the wiring under a big red Failover button.
A set of pre-defined, pre-configured failover options is defined with HMS. At the time of execution, the administrator selects from a short list of options. Based on the type of option selected (Local, HA, DR), failover occurs to a pre-defined target machine for that type of failover.
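A minimal sketch of what such a pre-configured option table might look like, with hypothetical target machine names:

```python
# Hypothetical representation of pre-configured failover options.
# The option types (Local, HA, DR) come from the text; the target
# machines and data structure are illustrative assumptions.

FAILOVER_OPTIONS = {
    "Local": {"target": "p4master-01", "desc": "offline databases, same machine"},
    "HA":    {"target": "p4master-02", "desc": "standby in the same data center"},
    "DR":    {"target": "p4dr-01",     "desc": "standby in a remote data center"},
}

def select_failover(option):
    """Resolve a chosen failover type to its pre-defined target machine."""
    if option not in FAILOVER_OPTIONS:
        raise ValueError(f"Unknown failover option: {option}")
    return FAILOVER_OPTIONS[option]["target"]
```

Because the targets are decided ahead of time, the moment of crisis requires only choosing a type, not designing a procedure.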
Planned Failover - A planned failover is a scheduled event, not a reaction to a problem. In a planned failover, assumptions can safely be made about the state of things. This might occur, for example, to allow master Server A to be powered down for several hours to add RAM, with Server B coming online to avoid downtime of the Helix ecosystem for more than a few minutes. Nothing is broken, so this type of failover can be nearly transparent.
Unscheduled Failover - An unscheduled failover occurs as a decision by a human administrator, in reaction to something breaking. The human administrator must determine the nature of the problem, decide whether failover is needed, and if so, select the best failover option.
Following is the list of potential failover options that can be configured:
Local Failover - Local failover is a failover to an offline copy of the Perforce databases on the same machine. This is useful for scenarios where the database integrity is in question for some reason, but there's no reason to suspect the hardware is damaged. For example, this might be the case after a sudden power loss, or error on the part of a human administrator (like removing live databases by accident -- yes, it happens to the best of us).
HA Failover - HA failover involves failing over to another server in the same data center, optionally sharing storage with the master for archive files. Little or no data loss is acceptable for an HA failover, and downtime should be minimal.
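The core move in the Local Failover option described above is setting aside the suspect live databases and promoting the offline copy. A minimal sketch, assuming SDP-style directory names (`P4ROOT`, `offline_db`), which are used here purely for illustration:

```python
# Illustrative sketch of a local failover's database swap (not HMS code).
# Assumes the server is already stopped and paths follow SDP conventions.

import os
import time

def local_failover(p4root, offline_db):
    """Set aside the suspect live databases and promote the offline copy."""
    suspect = f"{p4root}.suspect.{int(time.time())}"
    os.rename(p4root, suspect)      # preserve the suspect databases for analysis
    os.rename(offline_db, p4root)   # promote the offline copy to live
    return suspect
```

Keeping the suspect databases aside (rather than deleting them) preserves evidence for diagnosing what went wrong, at the cost of temporary disk usage.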
See: HMS Product Road Map.md.
Failover in an enterprise environment may always involve some degree of customization. HMS will capture everything that can be understood based on how the various Perforce software technologies work, and provide clear injection points for handling things likely to be topology-specific, such as redirecting traffic via DNS or Virtual IP changes.
As of September 28, 2016, the HMS Project has been cancelled as a standalone project. But Fear Not! The goodness of HMS is becoming part of the stock SDP. A new `hms` script will appear in the next release of the SDP.
| Rev | Change | User | Description |
|-----|--------|------|-------------|
| #6 | 25290 | tom_tyler | Rev. HMS/Linux/2016.1/20740 (2016/09/28). |
| #5 | 20741 | tom_tyler | Published note about blending into the SDP. |
| #3 | 20079 | tom_tyler | Updated and refactored. |
| #2 | 20041 | tom_tyler | Lots of updates; added preliminary roadmap. |
| #1 | 19373 | tom_tyler | Populate -o //guest/tom_tyler/hms/main/README.md //guest/perforce_software/hms/main/README.md. |
| #1 | 17234 | tom_tyler | Added README file for Helix Management System. |