daily_checkpoint.sh #4

#!/bin/bash
#==============================================================================
# Copyright and license info is available in the LICENSE file included with
# the Server Deployment Package (SDP), and also available online:
# https://swarm.workshop.perforce.com/projects/perforce-software-sdp/view/main/LICENSE
#------------------------------------------------------------------------------

# This script expects the most recent valid checkpoint to be available in
# $CHECKPOINTS in order for this script to work.
#
# This script is using the following external variables:
#
# SDP_INSTANCE - The instance of Perforce that is being backed up. If not
# set in environment, pass in as argument to script.
#
# P4HOME - Server's home directory.
# P4BIN - Command line client name for the instance being backed up.
# P4DBIN - Server executable name for the instance being backed up.
# P4ROOT - Server's root directory. p4/root, p4_N/root
# P4PORT - TCP/IP port for the server instance being backed up.
# P4JOURNAL - Location of the Journal for the server instance being backed up.
#
#
export SDP_INSTANCE=${SDP_INSTANCE:-Undefined}
export SDP_INSTANCE=${1:-$SDP_INSTANCE}
if [[ $SDP_INSTANCE == Undefined ]]; then
   echo "Instance parameter not supplied."
   echo "You must supply the Perforce instance as a parameter to this script."
   exit 1
fi

. /p4/common/bin/p4_vars "$SDP_INSTANCE"
. /p4/common/bin/backup_functions.sh
LOGFILE=$LOGS/checkpoint.log

######### Start of Script ##########

check_vars
set_vars
check_uid
check_dirs
check_offline_db_usable
ckp_running
/p4/common/bin/p4login
get_journalnum
rotate_last_run_logs
log "Start $P4SERVER Checkpoint"
get_offline_journal_num
truncate_journal
replay_journals_to_offline_db
ROOTDIR=$OFFLINE_DB
dump_checkpoint
recreate_offline_db_files
remove_old_checkpoints_and_journals
if [[ -d "${P4HOME}/journals.rep" ]]; then
   # Keep only the newest file; if cd fails, skip the cleanup so rm never runs elsewhere.
   cd "${P4HOME}/journals.rep" && rm $(ls -t | awk 'NR>1') > /dev/null 2>&1
fi
check_disk_space
remove_old_logs
log "End $P4SERVER Checkpoint"
mail_log_file "$HOSTNAME $P4SERVER Daily maintenance log."
set_counter
ckp_complete

# Change User Description Committed
#17 30210 C. Thomas Tyler Adjusted set_counter() so the checkpoint counter is set consistently on any p4d
server (commit, edge, standby, filtered forwarding replica, etc.).  Also
enhanced auditability of counter setting.

#review-30211
#16 28637 C. Thomas Tyler Tech Preview Feature: Added option to skip regeneration of offline_db.

By default, after creating a checkpoint from the offline_db, the
daily_checkpoint.sh script deletes the db.* files in the offline_db,
and then recreates them from the checkpoint.  This has several
benefits, including verifying the usability of the checkpoint and
replacing potentially bloated databases with more compact databases,
freshly rebuilt from a checkpoint, that can be taken advantage of
when refresh_P4ROOT_from_offline_db.sh is used.

However, in some circumstances it may be better to forego those
benefits in favor of reducing the duration of the daily_checkpoint.sh
script, and maintaining the usability of db.* files in the offline_db
for a fast recovery. This is especially the case where checkpoint
operations take several hours.

This initial implementation is as a Tech Preview feature, sans
documentation. A better follow-on implementation may be tied to
SDP-530 and SDP-568 (rewriting command line usage and logging).

This feature can be enabled with a command like the following, with
this example being for SDP instance 1:
echo 'export SDP_DAILY_OFFLINE_DB_REGEN=0' >> /p4/common/site/config/p4_1.vars.local
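
A minimal sketch of how a script could honor this setting. It assumes that only the value 0 disables regeneration and that an unset variable keeps the default behavior. The function name should_regen_offline_db is hypothetical and is not taken from the SDP.

```bash
#!/bin/bash
# Hypothetical helper, not part of the SDP: succeeds (returns 0) when the
# offline_db should be regenerated after the checkpoint has been dumped.
# Assumption: only the literal value 0 opts out; unset means regenerate.
should_regen_offline_db () {
   [[ "${SDP_DAILY_OFFLINE_DB_REGEN:-1}" != "0" ]]
}

if should_regen_offline_db; then
   echo "Regenerating offline_db from the new checkpoint."
else
   echo "Skipping offline_db regeneration."
fi
```

Defaulting to 1 inside the parameter expansion keeps existing installations on the documented default without requiring any change to their vars files.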
#15 28421 C. Thomas Tyler verify_sdp.sh v5.20.0:
* New checks: /p4/N/bin/p4{d,p,broker}_N need correct symlink target,
and must exist if the corresponding _init script exists.

For p4d, it can be a symlink (for a case-sensitive instance) or script
(for a case-insensitive instance to pass the C1 flag). Either way the
target is checked.

These checks cannot be skipped or converted to warnings.

* Added check that /p4/N/bin/p4{d,p,broker}_N_init scripts have content
that matches templates. This can be skipped with '-skip' or reported as
mere warnings (with '-warn') with a new and documented 'init' category
of test skipping/warning.

#review-28422
#14 27829 C. Thomas Tyler Changed call in daily_checkpoint.sh to verify_sdp.sh to always skip license expiration checks.

#review @robert_cowham @neal_firth
#13 27750 C. Thomas Tyler upgrade.sh v4.6.9:
* Fixed issue where patch-only upgrades of instances after the first
in a multi-instance environment are skipped.
* Corrected error message for scenario where downgrades are attempted; the
logic was correct but error message was confusing.

verify_sdp.sh v5.17.3:
* Extended '-skip version' meaning to also skip new live binary version
comparison checks.

Related updates:
* A call to verify_sdp.sh in the switch_db_files() function in backup_functions.sh
now skips the version check.
* A call to daily_checkpoint.sh now skips the version check.

#review-27743
#12 27455 C. Thomas Tyler Normalized usage of calls to verify_sdp.sh from other scripts.

In some contexts, we may desire "squeaky clean," while in other
contexts we may only need assurance that core functions are
operating OK.

Fixed typo in variable name used by verify_sdp.sh in call from
upgrade.sh that prevented local VERIFY_SDP_SKIP_TEST_LIST def'n
from having any effect.
#11 27192 C. Thomas Tyler Fixed hang issue during journal rotation if journalPrefix is wrong.

The verify_sdp.sh check is now incorporated into daily_checkpoint.sh.

When verify_sdp.sh is called by other scripts, '-skip excess,crontab'
options are supplied to avoid issues during maintenance windows with
possibly overzealous/nitpicky verifications.

#review-27193
#10 26456 C. Thomas Tyler Patch to fix issue with refresh_P4ROOT_from_offline_db.sh behavior
on replicas.

Adjusted behavior for other scripts to ensure proper behavior when run on
replicas vs. edge servers vs. the master server.

Approving patch for testing.
#9 26081 ashaikh Add support for opting out of informational email messages

Currently, we're trying to filter our email to only contain actionable messages. By default, the SDP scripts will always send an email if it's configured when a job has completed. I'd like to add a parameter to suppress informational messages and only get notified if there is an error.

I kept the default behavior of always sending an email.
#8 24191 C. Thomas Tyler Submit on behalf of ashaikh after merging/resolving with current
tip revision.
#7 23266 C. Thomas Tyler Fixes and Enhancements:
* Enabled daily_checkpoint.sh to operate on edge servers, to
keep /p4/N/offline_db current on those hosts for site-local
recovery w/o requiring a site-local replica (though having
a site-local replica can still be useful).
* Disabled live_checkpoint.sh for edge servers.
* More fully support topologies using edge servers, in both
geographically distributed and horizontal scaling "workspace
server" solutions.
* Fix broken EDGESERVER value definition.
* Modified name of SDP counter that gets set when a checkpoint is taken
to incorporate ServerID, so now the counter name will look like
lastSDPCheckpoint.master.1, or lastSDPCheckpoint.p4d_edge_sfo, rather
than just lastSDPCheckpoint.

There will be multiple such counters in a topology that uses edge
servers, and/or which takes checkpoints on replicas.

* Added comments for all functions.
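
The counter naming described above can be sketched as follows. The function name checkpoint_counter_name is hypothetical; only the 'lastSDPCheckpoint.' prefix and the two sample ServerID values come from the description.

```bash
#!/bin/bash
# Hypothetical helper, not part of the SDP: build the per-server counter
# name by appending the ServerID to the fixed 'lastSDPCheckpoint' prefix.
checkpoint_counter_name () {
   local server_id=$1
   echo "lastSDPCheckpoint.${server_id}"
}

checkpoint_counter_name master.1       # lastSDPCheckpoint.master.1
checkpoint_counter_name p4d_edge_sfo   # lastSDPCheckpoint.p4d_edge_sfo
```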

For the master server, journalPrefix remains:
/p4/N/checkpoints/p4_N

The /p4/N/checkpoints is reserved for writing by the
master/commit server only.

For non-standby (possibly filtered) replicas and edge servers,
journalPrefix is:
/p4/N/checkpoints.<ShortServerID>/p4_N.<ShortServerID>

Here, ShortServerID is just the ServerID with the 'p4d_' prefix
trimmed, since it is redundant in this context.  See mkrep.sh,
which enshrines a ServerID (server spec) naming standard, with
values like 'p4d_fr_bos' (forwarding replica in Boston) and
p4d_edge_blr (Edge server in Bangalore).  So the journalPrefix
for the p4d_edge_bos replica would be:
/p4/N/checkpoints.edge_bos/p4_N.edge_bos

For "standby" (aka journalcopy) replicas, journalPrefix is set
to /p4/N/journals.rep, which is written to the $LOGS volume, due
to the nature of standby replicas using journalPrefix to write
active server logs to pre-rotated journals.
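
The journalPrefix rules above can be sketched as a small shell function. This is an illustration under stated assumptions, not the SDP's implementation: the function name journal_prefix and the role labels (master, standby, anything else) are hypothetical, while the paths follow the patterns given above.

```bash
#!/bin/bash
# Hypothetical helper, not part of the SDP: print the journalPrefix for a
# given SDP instance, ServerID, and role.
# Assumption: the role is passed in as 'master', 'standby', or any other
# label for edge servers and non-standby replicas.
journal_prefix () {
   local instance=$1 server_id=$2 role=$3
   # ShortServerID is the ServerID with the 'p4d_' prefix trimmed.
   local short_id=${server_id#p4d_}

   case "$role" in
      master)  echo "/p4/${instance}/checkpoints/p4_${instance}" ;;
      standby) echo "/p4/${instance}/journals.rep" ;;
      *)       echo "/p4/${instance}/checkpoints.${short_id}/p4_${instance}.${short_id}" ;;
   esac
}

journal_prefix 1 p4d_edge_bos edge   # /p4/1/checkpoints.edge_bos/p4_1.edge_bos
```

The `${server_id#p4d_}` expansion removes the prefix only when it is present, so a ServerID without 'p4d_' passes through unchanged.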

Some takeaways to be updated in docs:
* The /p4/N/checkpoints folder must be reserved for checkpoints that
originate on the master. It should be safe to rsync this folder
(with --delete if desired) to any replica or edge server.  This is
consistent with the current SDP.
* I want to change 'journals.rep' to 'checkpoints.<ShortServerID>'
for non-standby replicas, to ensure that checkpoints and journals
taken on those hosts are written to a volume where they are backed
up.
* In sites with multiple edge servers, some sharing archive files
('workspace servers'), multiple edge servers will share the same
SAN. So we want one checkpoints dir per ServerID, and we want that
dir to be on the /hxdepots volume.

Note that the journalPrefix for replicas was a fixed /p4/N/journals.rep.
This was on the /hxlogs volume - a presumably fast-for-writes volume,
but typically NOT backed up and not very large. This change puts it
under /p4/N/checkpoints.* for edge servers and non-standby replicas,
which ensures those servers can generate
checkpoints to a location that is backed up and has plenty of storage
capacity.  For standby replicas only (which cannot be filtered),
the journalPrefix remains /p4/N/journals.rep on the /hxlogs volume.
#6 20940 Russell C. Jackson (Rusty) Drop JOURNALNUM from the rotated log names because it forces you to wait to rotate
the prior logs until you get the journal number and creates a problem where the error
that you couldn't get the journal number ends up at the end of the previous day's log
file, and that is what gets emailed out. That causes confusion for the person trying
to see what the error is.

Moved all rotate_last_run_logs up to the point right after we set the environment.
#5 20749 C. Thomas Tyler Approved and committed, but I believe that the shared data setting is always set to false on the master and we should look at fixing that in another change.

Enhanced p4login again.

Improvements:
Default behavior with no arguments gives the desired results.
For example, if run on a master, we login on the super user P4USER to
P4PORT.  If run on a replica/edge and auth.id is set, we login P4USER
to the P4TARGET port of the replica.

All other login functionality, such as logging in the replication
service user on a replica, logging in supplemental automation users,
is now accessed via new flags.

A usage message is now available via '-h' and '-man' options.  The
new synopsis is:
p4login [<instance>] [-p <port> | -service] [-automation] [-all]

The <instance> parameter is the only non-flag positional parameter,
and can be omitted if SDP_INSTANCE is already defined (as is typical
when called by scripts).

With this change, several other scripts calling either the 'p4login'
script or 'p4 login' commands were normalized to call p4login as
appropriate given the new usage.

Reviewer Note:  Review p4login first, then other files.  Most changes
are in p4login.

In other scripts calling p4login, calls similar to:
$P4BIN -u $P4USER -p $P4PORT login < /path/to/pwd
are replaced with: $P4CBIN/p4login

In other scripts calling p4login, calls similar to:
$P4BIN -p $P4MASTERPORT login < /path/to/pwd
are replaced with: $P4CBIN/p4login -p $P4MASTERPORT

Note that, if auth.id is set, calling 'p4login' actually has the
same behavior as 'p4login -p $P4MASTERPORT', since p4login
called on a replica with auth.id set will just login to the master
port anyway.

Depending on intent, sometimes $P4CBIN/p4login -service
is used.

== Misc Cleanup ==

In doing the cleanup:
* Fixed a hard-coding-to-instance-1 bug in broker_rotate.sh.
* Fixed an inconsistency in recreate_db_sync_replica.sh, where
it did just a regular login rather than a login -a as done in other
places (for compatibility with some multi-interface NIC card
configs).

== p4login Call Normalization ==
Code cleanup was done to normalize calls to p4login, such that:
1) the call starts with $P4CBIN/p4login (not the hard-coded path),
and 2) logic to redirect stdout/stderr to /dev/null was removed,
since it's not necessary with p4login.  (And if p4login ever
does generate any unwanted output, we only fix it in one place).

== Tweak to instance_vars.template ==
This change includes a tweak to set P4MASTERPORT dynamically
on a replica to ensure the value precisely matches P4TARGET
for the given replica.  This will reduce a source of problems
when SSL is used, as it is particularly sensitive to the precise
P4PORT values used, and will also help for environments which
have not yet set auth.id.  If the port cannot be determined
dynamically, we fall back to the old logic using the assigned
value.

== Tweak to SDP_ALWAYS_LOGIN behavior ==
This used to default to 1, now it defaults to 0.  At this
point we should no longer need to force logins, and in fact
doing so can get into a 'p4 login' hang situation with
auth.id set.  Best to avoid unnecessary logins if we
already have a valid ticket.  (I think the need to force a
login may have gone away with p4d patches).

== Obsolete Script ==
With this change, svclogin.sh is now obsolete.  All it was doing
was a few redundant 'p4 login' commands followed by a call to
p4login anyway.

== Testing ==
Our test suite doesn't fully cover this change, so additional
manual testing was done in the Battle School lab environment.
#4 20708 C. Thomas Tyler Per discussion: s/checkpoints.rep/journals.rep/g

This directory name change, used in the journalPrefix configurable, is
intended to clarify that the directory should be targeted to a FAST volume
for use with journalcopy, rather than the LARGE volume as would be
implied when using a directory with "checkpoints" in the name.
#3 19851 Robert Cowham Check for usable offline_db before creating checkpoint work file.
This avoids an error right at the start locking out the utility which
will fix said error!
#2 19768 UnstoppableDrew @tom_tyler @russell_jackson
Bug fix for running p4master_run as root, and some comment header cleanup. Job 000543

p4master_run: Preserve original arguments list and use this when exec'ing as $OSUSER.

backup_functions.sh: Add text about sourcing p4_vars yourself instead of using p4master_run.

update_limits.py: Run p4login directly without p4master_run since p4login calls p4_vars now.

everything else: Remove comment block about needing to run with p4master_run. Reword comment
  about SDP_INSTANCE since it is not always an integer value.
#1 19113 Russell C. Jackson (Rusty) Changed name of daily_backup.sh to daily_checkpoint.sh
Changed name of weekly_backup.sh to recreate_db_checkpoint.sh

Updated crontabs with new names, and changed to run recreate_db_checkpoint
on the 1st Sat. of Jan. and July. For most companies, this is a better
practice than recreating weekly per discussion with Anton.

Remove solaris crontab since Solaris is pretty much dead, and we don't test on it.

Updated docs to reflect name changes, and did a little cleanup of other sections
while I was in there.
//guest/perforce_software/sdp/dev/Server/Unix/p4/common/bin/daily_backup.sh
#13 18587 Russell C. Jackson (Rusty) Reworked the log rotation stuff in backup_functions.sh to make it cleaner and
handle the new log from recreate_offline_db.sh.

Modified recreate_offline_db.sh to add comments about a bad checkpoint. Also
made it create its own log file since it isn't doing a checkpoint. Removed the
log rotation for the same reason.

Moved the LOGFILE setting out to all of scripts to make it more obvious for future
scripts that you need to set that variable in your script so that it doesn't just
default to checkpoint.log.

Moved the functions in weekly_backup.sh and recreate_offline_db.sh into backup_functions.sh
where they belong for consistency.

Modified backup_functions.sh to use a consistent naming convention for all the
rotated log files rather than checkpoint.log being unique.

Replaced all back ticks with the newer bash $() method.

Removed all of the line wrapping since I am pretty sure that none of us are working on an
80 character terminal these days and it is easier to read this way.
#12 18528 Russell C. Jackson (Rusty) #review-18511

Added code to remove all but the most recent file in the checkpoints.rep
       directory. The most recent file is the active journal that has been
       pre-rotated by a poorly designed journalcopy method. The other files in
       this directory are copies of journals that we already have in the regular
       checkpoints directory, so there is no need to keep them.
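
A hedged sketch of the same cleanup, written without parsing `ls` output. The function name prune_all_but_newest is hypothetical, and this is an alternative illustration rather than the code the SDP uses: it compares modification times with the bash `-nt` test and removes every regular file except the newest one.

```bash
#!/bin/bash
# Hypothetical helper, not part of the SDP: delete every regular file in a
# directory except the most recently modified one.
prune_all_but_newest () {
   local dir=$1 newest="" f

   # Nothing to do if the directory does not exist.
   [[ -d "$dir" ]] || return 0

   # First pass: find the newest regular file by modification time.
   for f in "$dir"/*; do
      [[ -f "$f" ]] || continue
      if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
         newest=$f
      fi
   done

   # Second pass: remove everything else.
   for f in "$dir"/*; do
      if [[ -f "$f" && "$f" != "$newest" ]]; then
         rm -f -- "$f"
      fi
   done
   return 0
}

# Example call (the path is illustrative only):
# prune_all_but_newest /p4/1/checkpoints.rep
```

Iterating over a glob keeps file names with spaces intact, which the `ls -t | awk` form does not handle.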
#11 16029 C. Thomas Tyler Routine merge to dev from main using:
p4 merge -b perforce_software-sdp-dev
#10 15778 C. Thomas Tyler Routine Merge Down to dev from main.
#9 15701 C. Thomas Tyler Routine merge down using 'p4 merge -b perforce_software-sdp-dev'.
#8 15375 adrian_waters Routine merge-down from main->dev
#7 15374 adrian_waters - Ensure backup scripts are run as the OSUSER (to prevent accidental running as root); 
- in scripts where LOGFILE value is changed from the 'checkpoint.log'  set by set_vars, ensure the new assignment is before check_dirs is called, otherwise errors could be written to the 'wrong' log
- in 'die()' - detect if running from terminal & also send output to stderr
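
The terminal detection in the last item can be sketched as below. This is an assumption-laden illustration, not the SDP's die(): it treats "running from a terminal" as stdin being attached to a tty (`[[ -t 0 ]]`), and the temporary LOGFILE default exists only to make the example self-contained.

```bash
#!/bin/bash
# Illustrative only: a stand-in log file so the sketch runs on its own.
LOGFILE=${LOGFILE:-$(mktemp)}

# Sketch of a die() that always logs the error and additionally echoes it
# to stderr for interactive runs.
die () {
   local msg="ERROR: $*"
   echo "$msg" >> "$LOGFILE"
   # Assumption: stdin attached to a terminal means an interactive run.
   if [[ -t 0 ]]; then
      echo "$msg" >&2
   fi
   exit 1
}

# Example call (commented out because it would exit the shell):
# die "Could not determine the journal number."
```

With this shape, cron runs write only to the log, while an operator running the script by hand also sees the error immediately.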
#6 13931 C. Thomas Tyler Routine merge-down to dev from main.
#5 13906 C. Thomas Tyler Normalized P4INSTANCE to SDP_INSTANCE to get Unix/Windows
implementations in sync.

Reasons:
1. Things that interact with SDP in both Unix and Windows
environments shouldn't have to account for this obscure
SDP difference between Unix and Windows.  (I came across
this doing CBD work).

2. The Windows and Unix scripts have different variable
names for defining the same concept, the SDP instance.
Unix uses P4INSTANCE, while Windows uses SDP_INSTANCE.

3. This instance tag, a data set identifier, is an SDP concept.
I prefer the SDP_INSTANCE name over P4INSTANCE, so I propose
to normalize to SDP_INSTANCE.

4. The P4INSTANCE name makes it look like a setting that might be
recognized by the p4d itself, which it is not.  (There are other
such things such as P4SERVER that could perhaps be renamed as
a separate task; but I'm not sure we want to totally disallow
the P4 prefix for variable names. It looks too right to be wrong
in some cases, like P4BIN and P4DBIN.  That's a discussion for
another day, outside the scope of this task).

Meanwhile:
* Fixed a bug in the Windows 2013.3 upgrade script that
was referencing undefined P4INSTANCE, as the Windows
environment defined only SDP_INSTANCE.

* Had P4INSTANCE been removed completely, this change would
likely cause trouble for users doing updates for existing
SDP installations.  So, though it involves slight technical debt,
I opted to keep a redundant definition of P4INSTANCE
in p4_vars.template, with comments indicating SDP_INSTANCE should be
used in favor of P4INSTANCE, with a warning that P4INSTANCE
may go away in a future release.  This should avoid unnecessary
upgrade pain.

* In mkdirs.sh, the variable name was INSTANCE rather than
SDP_INSTANCE.  I changed that as well.  That required manual
change rather than sub/replace to avoid corrupting other similar
variable names (e.g.  MASTERINSTANCE).

This is a trivial change technically (a substitute/replace, plus
tweaks in p4_vars.template), but impacts many files.
#4 12169 Russell C. Jackson (Rusty) Updated copyright date to 2015

 Updated shell scripts to require an instance parameter to eliminate the need
 for calling p4master_run.    Python and Perl still need it since you have to set the
environment for them to run in.

 Incorporated comments from reviewers. Left the . instead of source as that seems
more common in the field and has the same functionality.
#3 12028 C. Thomas Tyler Refreshed SDP dev branch, merging down from main.
#2 11485 Russell C. Jackson (Rusty) Brought over changes from RCJ sdp to properly handle Edge servers
 and to properly replicate shelves when replicating from Windows to Linux
#1 10638 C. Thomas Tyler Populate perforce_software-sdp-dev.
//guest/perforce_software/sdp/main/Server/Unix/p4/common/bin/daily_backup.sh
#1 10148 C. Thomas Tyler Promoted the Perforce Server Deployment Package to The Workshop.