#==============================================================================
# Server Deployment Package (SDP) State File
#------------------------------------------------------------------------------
# The state file tracks the usability status of the 3 sets of SDP databases:
# Live, Offline, and Extra. The SDP requires 3 sets of DBs to maintain
# a consistent RTO in all failure scenarios, even if failures occur at
# untimely moments during execution of certain long-running SDP processes,
# such as the regeneration of offline databases from a checkpoint.
#
# Design Mantra:
#    At no time are there fewer than 2 usable copies of databases, with
#    the difference between them being no more than the current/active
#    journal file.
#
# Tag       Description         Location
# --------  -----------------   -----------------
# Live      Live databases      /p4/n/root
# Offline   Offline databases   /p4/n/offline_db
# Extra     Extra copy of DBs   /p4/n/root/save
#
# This file documents States and Transitions possible for the SDP.
# States are normally updated by SDP processing. However, there are a
# few scenarios where human intervention to manually adjust a state file
# is required.
#
# The remainder of this file documents States, Transitions, and Manual
# State File Adjustment Considerations.
#
#==============================================================================
#==============================================================================
# STATES
#==============================================================================
# Nominal States: These represent the state of the SDP in between
# runs of daily, weekly, and manual/live checkpoint operations.
#------------------------------------------------------------------------------
# INITIAL:OK:NR:NR
#    INITIAL applies when the SDP is first installed. The Offline and
#    Spare (Extra) databases are not yet ready.
#
# READY:OK:OK:OK
#    Normal in-between-long-running-processes state, when all sets of DBs
#    are usable to support various recovery procedures.
#------------------------------------------------------------------------------
# Work-In-Progress (WIP) States. These states apply during the execution of
# certain long-running SDP processes.
#------------------------------------------------------------------------------
# LIVE_START:OK:NR:NR    - Preliminary processing. DBs not touched.
# LIVE_WIP_1:OK:NR:NR    - Live checkpoint in progress.
# LIVE_WIP_2:OK:NR:NR    - Offline DBs being made. Extra copy in 'save'.
# LIVE_WIP_3:OK:OK:NR    - Extra copy of DBs being copied to 'save'.
# DAILY_START:OK:OK:OK   - Preliminary processing. DBs not touched.
# DAILY_WIP_1:OK:NR:OK   - Offline DBs being made. Extra in 'save'.
# DAILY_WIP_2:OK:OK:NR   - Extra copy of DBs being copied to 'save'.
# WEEKLY_START:OK:OK:OK  - Preliminary processing. DBs not touched.
# WEEKLY_WIP_1:OK:OK:NR  - Live DBs moved to 'save', old save removed.
# WEEKLY_WIP_2:OK:NR:OK  - Live DBs ready. Dumping checkpoint from 'save'.
# WEEKLY_WIP_3:OK:OK:NR  - Extra copy of DBs being copied to 'offline_db'.
#------------------------------------------------------------------------------
# Exception States
#------------------------------------------------------------------------------
# RESET:NR:OK:OK
#    The RESET state must be set manually. This state should be set when the
#    live databases are suspected of being corrupt, e.g. after a sudden power
#    outage, and if a failover to the HA node is not being performed.
#==============================================================================
# TRANSITIONS
#==============================================================================
# Transition - Live Checkpoint (live_checkpoint.sh)
#    INITIAL -> LIVE_START -> LIVE_WIP_1 -> LIVE_WIP_2 -> LIVE_WIP_3 -> READY
#    READY -> LIVE_START -> LIVE_WIP_1 -> LIVE_WIP_2 -> LIVE_WIP_3 -> READY
# The LIVE_START state applies immediately when live_checkpoint.sh starts.
# The LIVE_WIP_1 state applies when live databases are being checkpointed.
# The LIVE_WIP_2 state applies when the Offline databases are being generated.
# The LIVE_WIP_3 state applies after the Offline databases are usable, and
# while the Extra copies are still being generated.
#------------------------------------------------------------------------------
# Transition - Daily Offline Checkpoint (daily_backup.sh)
#    READY -> DAILY_START -> DAILY_WIP_1 -> DAILY_WIP_2 -> READY
# The DAILY_START state applies immediately when daily_backup.sh starts.
#
# The DAILY_WIP_1 state applies only during the portion of processing when
# the Offline databases are not usable, which is the case when databases
# in the offline_db folder are being regenerated from a checkpoint.
# Should a sudden power outage occur when these databases are incomplete,
# the Extra databases remain usable for a quick recovery.
# The DAILY_WIP_2 state applies after the Offline databases have been
# regenerated from the checkpoint and are usable, while the Extra copy of
# the DBs is being copied to 'save'.
#------------------------------------------------------------------------------
# Transition - Weekly Offline Checkpoint (weekly_backup.sh)
#    READY -> WEEKLY_START -> WEEKLY_WIP_1 -> WEEKLY_WIP_2 -> WEEKLY_WIP_3 -> READY
# The WEEKLY_START state applies immediately when weekly_backup.sh starts.
# The WEEKLY_WIP_1 state applies for the brief moment in time when the
# live databases are unusable during initial processing of weekly_backup.sh,
# when p4d is actually offline.
# The WEEKLY_WIP_2 state applies after the Offline databases are usable, and
# while the Extra copies are still being generated.
# The WEEKLY_WIP_3 state applies after the Offline databases are usable, and
# while the Extra copies are just being copied from offline_db.
#------------------------------------------------------------------------------
# Exception Scenarios:
# When the core SDP scripts (p4d_1_init, live_checkpoint.sh, daily_backup.sh,
# and weekly_backup.sh) start, they check to see that the SDP is in an
# acceptable state, and abort otherwise.
#
# This could happen if, for example, a power outage occurred while databases
# in the offline_db folder were being regenerated from a checkpoint. The
# state file would indicate that the Offline databases aren't ready and
# cannot be used.
#
# This could also happen if two human administrators attempt to execute
# commands simultaneously, e.g. two live_checkpoint.sh runs are attempted at
# the same time by two non-coordinating administrators trying to fight the
# same fire after a failure of some kind.
#==============================================================================
# State File Format
#------------------------------------------------------------------------------
# A state file consists of a series of one-line state entries. The last
# line/entry in the state file represents the current state. Previous
# lines are for historical reference only. The state file lives on the
# /depotdata volume, with the implication that it is on a SAN and writes
# to it are immediately visible on all hosts in the LAN environment.
# It is automatically rotated out by daily_backup.sh whenever it exceeds
# some number of lines (e.g. 500), to avoid performance impact.
#
# State entries start with a prefix of "SDP:"; other lines are ignored.
# State entries contain a series of colon-delimited fields:
#
#    SDP:Host:StateName:LiveDBStatus:OfflineDBStatus:SpareDBStatus:Timestamp
#
# Host is $THIS_HOST in p4_vars, as returned by 'hostname -s'.
#
# The *DBStatus field values are "OK" and "NR" (Not Ready, i.e. not
# guaranteed to be ready).
#==============================================================================
#==============================================================================
# MANUAL ADJUSTMENT
#==============================================================================
# Manual State File Adjustment Considerations.
#------------------------------------------------------------------------------
#
# Rule #1: First, always confirm with other administrators that only one
# person is executing SDP scripts at any given time. Even though the
# SDP has some protection against running dangerous commands concurrently,
# human coordination is always wise.
#
# Rule #2: Adjust the state file by adding a new State entry using
# set_sdp_state.sh, providing a new state name and colon-separated "OK"/"NR"
# values for the 3 database statuses. For example, to confirm manually that
# all is well, do:
#
#    set_sdp_state.sh READY:OK:OK:OK
#
# This appends a new state entry to the state file that looks like:
#
#    SDP:scm01:READY:OK:OK:OK:2011-12-13-012202
#
# Note that the prefix, including the hostname, and the timestamp suffix
# are generated by set_sdp_state.sh.
#
# The state file is in the 'tmp' folder for the appropriate instance of
# Perforce, e.g.:
#
#    /p4/1/tmp/SDP.state
#
# Rule #3: If you are unsure of the state of the live databases, e.g. after
# a sudden power outage, adjust the state file before calling p4d_1_init to
# start Perforce. THIS IS ONLY NECESSARY if you plan to do a local failover;
# that is, if you do not intend to fail over to the HA node.
#
# Adjust the state file by adding a new State entry. The new entry will be
# similar to the previous one, but with the field representing the status
# of the live databases changed to "NR" (not ready). For example, if the
# last State entry is:
#
#    SDP:scm01:DAILY_WIP_2:OK:OK:NR:2011-12-13-042314
#
# Do this to create a new entry:
#
#    set_sdp_state.sh DAILY_WIP_2:NR:OK:NR
#                                 ^^ Changed!
#
# Special Case: If the State Name was READY, change it to LOCAL_FAILOVER, in
# addition to changing the status value for the live databases to NR.
# That is, if the last State entry is:
#
#    SDP:scm01:READY:OK:OK:OK:2011-12-13-042314
#
# Do this to create a new entry:
#
#    set_sdp_state.sh LOCAL_FAILOVER:NR:OK:OK
#
# Rule #4: If (say, after a sudden power outage) you find that the SDP state
# file is beyond the state where the Offline databases were being generated
# (e.g. LIVE_WIP_2), that means the Offline databases are OK. You can
# regenerate the Extra databases from the most recent checkpoint, and then
# manually reset the SDP state by running:
#
#    set_sdp_state.sh READY:OK:OK:OK
#
# If you find that the SDP state file is in one of the *WIP_1 states, that
# indicates that the Offline databases are not usable, and the Extra
# databases must be used instead.
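#
# Before adjusting a state file by hand, it can help to see what the current
# entry says. The following is a minimal sketch, not part of the SDP: the
# function names current_sdp_entry and describe_sdp_entry are hypothetical,
# and it assumes only the entry format documented above (the last line that
# starts with "SDP:" is the current state; fields are colon-delimited).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not shipped with the SDP.

# Print the current state entry: the last line starting with "SDP:".
# All other lines in the state file are ignored.
current_sdp_entry() {
   grep '^SDP:' "$1" | tail -n 1
}

# Split one entry into its documented fields:
#   SDP:Host:StateName:LiveDBStatus:OfflineDBStatus:SpareDBStatus:Timestamp
# The body runs in a subshell so the field variables do not leak out.
describe_sdp_entry() (
   IFS=: read -r prefix host state live offline extra timestamp <<EOF
$1
EOF
   # Reject anything that is not a state entry.
   [ "$prefix" = "SDP" ] || exit 1
   printf 'host=%s state=%s live=%s offline=%s extra=%s time=%s\n' \
      "$host" "$state" "$live" "$offline" "$extra" "$timestamp"
)
```

# For example, with the instance 1 path shown under Rule #2:
#
#    describe_sdp_entry "$(current_sdp_entry /p4/1/tmp/SDP.state)"
#
# This only reads the file. New entries should still be appended with
# set_sdp_state.sh, which generates the host prefix and timestamp.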