sync_replica.sh #24

#!/bin/bash
#==============================================================================
# Copyright and license info is available in the LICENSE file included with
# the Server Deployment Package (SDP), and also available online:
# https://swarm.workshop.perforce.com/projects/perforce-software-sdp/view/main/LICENSE
#------------------------------------------------------------------------------

# Intended to be run on a replica machine to sync replica from its corresponding master
export SDP_INSTANCE=${SDP_INSTANCE:-Undefined} 
export SDP_INSTANCE=${1:-$SDP_INSTANCE} 
if [[ $SDP_INSTANCE == Undefined ]]; then 
   echo "Instance parameter not supplied." 
   echo "You must supply the Perforce instance as a parameter to this script." 
   exit 1 
fi 

# shellcheck disable=SC1091
source /p4/common/bin/p4_vars "$SDP_INSTANCE"
# shellcheck disable=SC1091
source /p4/common/bin/backup_functions.sh
LOGFILE="$LOGS/sync_replica.log"

######### Start of Script ##########
check_vars
set_vars
rotate_last_run_logs
log "Starting sync_replica.sh"

if [[ "${P4REPLICA}" == "FALSE" ]]; then
  die "Error: sync_replica.sh can only run on a replica." 
fi

check_uid

"$P4CBIN/p4login" -p "$P4MASTERPORT"
MASTERJOURNALNUM=$("$P4BIN" -u "$P4USER" -p "${P4MASTERPORT}" counter journal)

if [[ "$MASTERJOURNALNUM" == "" ]]; then
   die "Error:  Couldn't get journal number from master.  Aborting."
fi

# We set JOURNALNUM to one less than the master since we are not truncating the
# journal and replay_journals_to_offline_db assumes that truncate_journal has
# been run.

# shellcheck disable=SC2034
JOURNALNUM=$((MASTERJOURNALNUM-1))

TargetServerJournalPrefix=$(get_target_config_value journalPrefix)
if [[ -n "$TargetServerJournalPrefix" ]]; then
   CheckpointsDir="${TargetServerJournalPrefix%/*}"
else
   die "Could not determine journalPrefix of P4TARGET server of ServerID $SERVERID."
fi

# Determine the target host from P4TARGET, which is a P4PORT setting. Extract
# the server host from this value. The extracted host must be reachable via
# passwordless ssh so that rsync can use it.
if [[ -r "$P4ROOT/db.config" ]]; then
   TargetHost="$("$P4DBIN" -r "$P4ROOT" -k db.config -jd - 2>/dev/null | grep "@${SERVERID}@ @P4TARGET@" | cut -d '@' -f 10)"
   if [[ "$TargetHost" == *:* ]]; then
      TargetHost=${TargetHost%:*}
      TargetHost=${TargetHost#*:}
   else
      die "Could not determine target host from $P4ROOT/db.config.  Tried:\\n$P4DBIN -r $P4ROOT -k db.config -jd - 2>/dev/null | grep @${SERVERID}@ @P4TARGET@ | cut -d '@' -f 10)"
   fi
else
   die "Expected file $P4ROOT/db.config is missing. Could not determine target host."
fi

# You must set up an SSH key pair using "ssh-keygen -t rsa" in order for this
# to work. You need to paste your CLIENT ~perforce/.ssh/id_rsa.pub contents
# into the REMOTE ~perforce/.ssh/authorized_keys file.
if [[ "$SHAREDDATA" == "FALSE" ]]; then
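   # Build the command as an array so that the .nfs* exclude pattern reaches
   # rsync without literal quote characters.
   RsyncCmd=(rsync -av --exclude='.nfs*' --delete "${OSUSER}@${TargetHost}:$CheckpointsDir/" "$CheckpointsDir")
   log "Executing: ${RsyncCmd[*]}"
   "${RsyncCmd[@]}" >> "$LOGFILE" 2>&1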
   rsync_exit_code=$?

   if [[ "$rsync_exit_code" -ne 0 ]]; then
      die "Error: Failed to pull $CheckpointsDir from host $TargetHost.  The rsync exit code was: $rsync_exit_code.  Aborting."
   fi
else
   log "Skipping rsync of $CheckpointsDir because SHAREDDATA is $SHAREDDATA."
fi

recreate_offline_db_files 1
get_offline_journal_num
replay_journals_to_offline_db 1
"$P4CBIN/p4login"
check_disk_space
remove_old_logs
remove_old_checkpoints_and_journals

log "End $P4SERVER sync replica"
mail_log_file "$HOSTNAME $P4SERVER Daily sync replica log."
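
The target host extraction in the script relies on two bash parameter expansions applied to the P4TARGET value. A minimal sketch of that step in isolation follows. The function name `extract_target_host` and the host name `p4master.example.com` are hypothetical and are not part of the SDP. The sketch assumes the value has the form `host:port` or `protocol:host:port`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SDP): reduce a P4PORT-style value
# to the bare host name, using the same two expansions as the script.
extract_target_host() {
   local port_value="$1"
   # As in the script, only values containing a colon are accepted.
   [[ "$port_value" == *:* ]] || return 1
   local host="${port_value%:*}"   # strip the trailing :port
   host="${host#*:}"               # strip a leading protocol such as ssl:
   printf '%s\n' "$host"
}

# Example values are assumptions chosen for illustration.
extract_target_host "ssl:p4master.example.com:1666"
extract_target_host "p4master.example.com:1666"
```

Both calls print `p4master.example.com`. A value with no colon at all makes the function return non-zero, which corresponds to the die() branch in the script.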
# Change User Description Committed
#26 30979 C. Thomas Tyler Eliminated buildup of temp dirs, e.g.
/tmp/tmp.XXXXXXXXXX.

Added remove_jd_tables() function and calls to it to prevent buildup of new
cruft.

Modified remove_old_logs() to cleanup cruft created previously.

#review-30980 @robert_cowham
#25 30267 Robert Cowham Copy files to be dumped via p4d -jd to tmp dir first
to avoid locks on P4ROOT (or offline_db)

SDP-1087
#24 30182 C. Thomas Tyler Added change to rsync to avoid issues with .nfs* files on NFS.
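
As a general illustration of why the quoting of an exclude pattern such as '.nfs*' matters in bash, the sketch below compares a command held in a string with the same command held in an array. The variable names are hypothetical and no rsync command is actually run.

```shell
#!/bin/bash
# Quotes stored inside a string are not re-parsed when the string is
# expanded unquoted, so they end up as part of the argument itself.
cmd_string="rsync -av --exclude='.nfs*' --delete"
set -f                      # disable globbing while splitting the string
# shellcheck disable=SC2086
set -- $cmd_string
set +f
string_arg="$3"             # third word, with literal quote characters

# In an array assignment the shell removes the quotes at parse time.
cmd_array=(rsync -av --exclude='.nfs*' --delete)
array_arg="${cmd_array[2]}"

printf 'string: %s\n' "$string_arg"
printf 'array:  %s\n' "$array_arg"
```

The string form yields `--exclude='.nfs*'` with the quote characters included, while the array form yields `--exclude=.nfs*`, which is the pattern rsync is meant to receive.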
#23 29798 C. Thomas Tyler sync_replica.sh: Enhanced to silence spurious p4d output generated
when scanning db.config to get the target host for the rsync.

Also improved error checking and error messages.
#22 29590 C. Thomas Tyler Another refinement of sync_replica.sh logic to handle all intended use cases:
* HA of commit server, with and without NFS sharing (SHAREDDATA).
* HA of edge server, with and without NFS sharing (SHAREDDATA).

When NFS sharing, the rsync from the target server is disabled, as before.

When rsyncing for full replicas (not NFS-sharing, metadata-only replicas),
the sync_replica.sh script now always rsyncs from a checkpoints directory
based on the journalPrefix of the P4TARGET server.  This is correct for all
scenarios.

If that cannot be determined, the script now does a die() call to avoid
rsyncing to a possibly incorrect path.

When rebuilding the local offline_db, the checkpoints directory based on
the journalPrefix of the P4TARGET server is always used. This directory
should exist, whether due to rsync from the target, or NFS-sharing.

Logic to remove old checkpoints and journals now only cleans in folders
written to by the local replica, to avoid removing files on an NFS-shared
upstream server.

Auditability of checkpoint operations in backup_functions.sh is improved.

#review-29591
#21 29576 C. Thomas Tyler Enhanced sync_replica.sh to support operation on a 'ham'
type replica (HA, Metadata-only). A 'ham' type replica
replicates only metadata, and shares the /hxdepots volume
(via NFS) with its target server.

In this configuration, the SHAREDDATA=TRUE value is set,
and this corresponds to a p4d configuration setting for
the replica of lbr.replication=shared.

In this configuration, the journalPrefix value of the
replica server will differ from that of its target
server.  For example, the commit server may have
the First Form of the journalPrefix, while an HA
of the commit will have the Second Form.  See
'The journalPrefix Standard':

https://swarm.workshop.perforce.com/projects/perforce-software-sdp/view/main/doc/SDP_Guide.Unix.html#_the_journalprefix_standard

As another example, for an edge server and HA of that
edge, both servers will use the Second Form of the
journalPrefix, the form which incorporates a shortened
form of the ServerID into the journalPrefix value. But
since the ServerIDs are different, the actual journalPrefix
values will be different, even though both are of the
Second Form.

The common pattern is that, when configured for NFS sharing,
the sync_replica.sh script should use the journalPrefix
of its target server when determining where to look for
a checkpoint and numbered journal to load into the offline_db.

#review @mark_zinthefer @robert_cowham
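
The checkpoints directory used by sync_replica.sh is the directory portion of the target server's journalPrefix. A minimal sketch of that derivation follows. The function name `checkpoints_dir_from_prefix` is hypothetical, and the two journalPrefix values are illustrative assumptions modeled on the First Form and Second Form, not values read from a real server.

```shell
#!/bin/bash
# Hypothetical helper (not part of the SDP): return the directory portion
# of a journalPrefix value, as the script does with ${var%/*}.
checkpoints_dir_from_prefix() {
   local journal_prefix="$1"
   # An empty value corresponds to the die() branch in the script.
   [[ -n "$journal_prefix" ]] || return 1
   printf '%s\n' "${journal_prefix%/*}"
}

# Assumed example values; the ServerID suffix edge_syd is made up.
checkpoints_dir_from_prefix "/p4/1/checkpoints/p4_1"
checkpoints_dir_from_prefix "/p4/1/checkpoints.edge_syd/p4_1.edge_syd"
```

With these inputs the function prints `/p4/1/checkpoints` and `/p4/1/checkpoints.edge_syd`, showing how two servers with different ServerIDs end up with different checkpoints directories.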
#20 28621 C. Thomas Tyler Added support to sync_replica.sh for operation on a replica of an
edge server.

Fixed related error with erroneous output in backup_functions.sh.

#review-28622
#19 26718 Robert Cowham Rename P4MASTER to P4MASTERHOST for clarity with comments in:
- mkdirs.cfg/mkdirs.sh
- p4_<instance>.vars
- other files which reference
Remove unnecessary sed for p4p.template
#18 26156 C. Thomas Tyler Shellcheck v0.6.0 and style compliance changes.

Fixed minor bugs related to capturing output, driven by
shellcheck changes.

Fixed sync_replica.sh for standby replicas with the
configurable rpl.journalcopy.location=1 (SDP-424),
removing an unnecessary and broken check.

Fixed test for pre-existing checkpoints in function
recreate_offline_db_files() so that it checks only
for the master server, fixing an issue where it would
report "No checkpoints found - run live_checkpoint.sh"
when used on a replica where checkpoints might
legitimately not exist. Also fixed the actual test itself.

Replaced P4COMMITSERVER variable with P4MASTERPORT to
support daisy chain scenarios, removing the assumption
that all servers target only the master. (This assumption
was made only in journal_watch.sh).

Enhanced check_vars() to report individual missing
environment variables, and to add more info on how
to fix environment problems (e.g. adding to
p4_vars or p4_N.vars files).

Fixed bug in check_dirs() where a missing directory check
intended to result in a die() call would result in a
syntax error instead.

These files have been field tested.
#17 23130 C. Thomas Tyler Changed rsync commands to avoid using compression when transferring
checkpoints, which are already compressed.

Fixed logging issue where log was truncated in mid-script.
#16 20940 Russell C. Jackson (Rusty) Drop JOURNALNUM from the rotated log names because it forces you to wait to rotate
the prior logs until you get the journal number and creates a problem where the error
that you couldn't get the journal number ends up at the end of the previous day's log
file, and that is what gets emailed out. That causes confusion for the person trying
to see what the error is.

Moved all rotate_last_run_logs up to the point right after we set the environment.
#15 20937 Russell C. Jackson (Rusty) Added check to exit if we are not on a replica server.
#14 20749 C. Thomas Tyler Approved and committed, but I believe that the shared data setting is always set to false on the master and we should look at fixing that in another change.

Enhanced p4login again.

Improvements:
Default behavior with no arguments gives the desired results.
For example, if run on a master, we login on the super user P4USER to
P4PORT.  If run on a replica/edge and auth.id is set, we login P4USER
to the P4TARGET port of the replica.

All other login functionality, such as logging in the replication
service user on a replica, logging in supplemental automation users,
is now accessed via new flags.

A usage message is now available via '-h' and '-man' options.  The
new synopsis is:
p4login [<instance>] [-p <port> | -service] [-automation] [-all]

The <instance> parameter is the only non-flag positional parameter,
and can be omitted if SDP_INSTANCE is already defined (as is typical
when called by scripts).

With this change, several other scripts calling either the 'p4login'
script or 'p4 login' commands were normalized to call p4login as
appropriate given the new usage.

Reviewer Note:  Review p4login first, then other files.  Most changes
are in p4login.

In other scripts calling p4login, calls similar to:
$P4BIN -u $P4USER -p $P4PORT login < /path/to/pwd
are replaced with: $P4CBIN/p4login

In other scripts calling p4login, calls similar to:
$P4BIN -p $P4MASTERPORT login < /path/to/pwd
are replaced with: $P4CBIN/p4login -p $P4MASTERPORT

Note that, if auth.id is set, calling 'p4login' actually has the
same behavior as 'p4login -p $P4MASTERPORT', since p4login
called on a replica with auth.id set will just login to the master
port anyway.

Depending on intent, sometimes $P4BIN/p4login -service
is used.

== Misc Cleanup ==

In doing the cleanup:
* Fixed a hard-coding-to-instance-1 bug in broker_rotate.sh.
* Fixed an inconsistency in recreate_db_sync_replica.sh, where
it did just a regular login rather than a login -a as done in other
places (for compatibility with some multi-interface NIC card
configs).

== p4login Call Normalization ==
Code cleanup was done to normalize calls to p4login, such that:
1) the call starts with $P4CBIN/p4login (not the hard-coded path),
and 2) logic to redirect stdout/stderr to /dev/null was removed,
since it's not necessary with p4login.  (And if p4login ever
does generate any unwanted output, we only fix it in one place).

== Tweak to instance_vars.template ==
This change includes a tweak to set P4MASTERPORT dynamically
on a replica to ensure the value precisely matches P4TARGET
for the given replica.  This will reduce a source of problems
when SSL is used, as it is particularly sensitive to the precise
P4PORT values used, and will also help for environments which
have not yet set auth.id.  If the port cannot be determined
dynamically, we fall back to the old logic using the assigned
value.

== Tweak to SDP_ALWAYS_LOGIN behavior ==
This used to default to 1, now it defaults to 0.  At this
point we should no longer need to force logins, and in fact
doing so can get into a 'p4 login' hang situation with
auth.id set.  Best to avoid unnecessary logins if we
already have a valid ticket.  (I think the need to force a
login may have gone away with p4d patches).

== Obsolete Script ==
With this change, svclogin.sh is now obsolete.  All it was doing
was a few redundant 'p4 login' commands followed by a call to
p4login anyway.

== Testing ==
Our test suite doesn't fully cover this change, so additional
manual testing was done in the Battle School lab environment.
#13 20708 C. Thomas Tyler Per discussion: s/checkpoints.rep/journals.rep/g

This directory name change, used in the journalPrefix configurable, is
intended to clarify that the directory should be targeted to a FAST volume
for use with journalcopy, rather than the LARGE volume as would be
implied when using a directory with "checkpoints" in the name.
#12 20170 Russell C. Jackson (Rusty) Moved password and users into the config directory to allow for instance specific
users and passwords. Ran into a case where two different teams were sharing the same
server hardware and needed this type of differentiation. Surprised that we haven't hit
this sooner.

Also defaulted mkdirs to use the numeric ports since this is the most common
installation.
#11 18587 Russell C. Jackson (Rusty) Reworked the log rotation stuff in backup_functions.sh to make it cleaner and
handle the new log from recreate_offline_db.sh.

Modified recreate_offline_db.sh to add comments about a bad checkpoint. Also
made it create its own log file since it isn't doing a checkpoint. Removed the
log rotation for the same reason.

Moved the LOGFILE setting out to all of scripts to make it more obvious for future
scripts that you need to set that variable in your script so that it doesn't just
default to checkpoint.log.

Moved the functions in weekly_backup.sh and recreate_offline_db.sh into backup_functions.sh
where they belong for consistency.

Modified backup_functions.sh to use a consistent naming convention for all the
rotated log files rather than checkpoint.log being unique.

Replaced all back ticks with the newer bash $() method.

Removed all of the line wrapping since I am pretty sure that none of us are working on an
80 character terminal these days and it is easier to read this way.
#10 16335 C. Thomas Tyler Routine Merge Down to dev from main using:
p4 merge -b perforce_software-sdp-dev
#9 16029 C. Thomas Tyler Routine merge to dev from main using:
p4 merge -b perforce_software-sdp-dev
#8 15778 C. Thomas Tyler Routine Merge Down to dev from main.
#7 15701 C. Thomas Tyler Routine merge down using 'p4 merge -b perforce_software-sdp-dev'.
#6 15374 adrian_waters - Ensure backup scripts are run as the OSUSER (to prevent accidental running as root); 
- in scripts where LOGFILE value is changed from the 'checkpoint.log'  set by set_vars, ensure the new assignment is before check_dirs is called, otherwise errors could be written to the 'wrong' log
- in 'die()' - detect if running from terminal & also send output to stderr
#5 13906 C. Thomas Tyler Normalized P4INSTANCE to SDP_INSTANCE to get Unix/Windows
implementations in sync.

Reasons:
1. Things that interact with SDP in both Unix and Windows
environments shouldn't have to account for this obscure
SDP difference between Unix and Windows.  (I came across
this doing CBD work).

2. The Windows and Unix scripts have different variable
names for defining the same concept, the SDP instance.
Unix uses P4INSTANCE, while Windows uses SDP_INSTANCE.

3. This instance tag, a data set identifier, is an SDP concept.
I prefer the SDP_INSTANCE name over P4INSTANCE, so I propose
to normalize to SDP_INSTANCE.

4. The P4INSTANCE name makes it look like a setting that might be
recognized by the p4d itself, which it is not.  (There are other
such things such as P4SERVER that could perhaps be renamed as
a separate task; but I'm not sure we want to totally disallow
the P4 prefix for variable names. It looks too right to be wrong
in some cases, like P4BIN and P4DBIN.  That's a discussion for
another day, outside the scope of this task).

Meanwhile:
* Fixed a bug in the Windows 2013.3 upgrade script that
was referencing undefined P4INSTANCE, as the Windows
environment defined only SDP_INSTANCE.

* Had P4INSTANCE been removed completely, this change would
likely cause trouble for users doing updates for existing
SDP installations.  So, though it involves slight technical debt,
I opted to keep a redundant definition of P4INSTANCE
in p4_vars.template, with comments indicating SDP_INSTANCE should be
used in favor of P4INSTANCE, with a warning that P4INSTANCE
may go away in a future release.  This should avoid unnecessary
upgrade pain.

* In mkdirs.sh, the variable name was INSTANCE rather than
SDP_INSTANCE.  I changed that as well.  That required manual
change rather than sub/replace to avoid corrupting other similar
variable names (e.g.  MASTERINSTANCE).

This is a trivial change technically (a substitute/replace, plus
tweaks in p4_vars.template), but impacts many files.
#4 12169 Russell C. Jackson (Rusty) Updated copyright date to 2015

 Updated shell scripts to require an instance parameter to eliminate the need
 for calling p4master_run.    Python and Perl still need it since you have to set the
environment for them to run in.

 Incorporated comments from reviewers. Left the . instead of source as that seems
more common in the field and has the same functionality.
#3 12107 C. Thomas Tyler Routine merge down from 'main' to 'dev', resolved with
'p4 resolve -as'.
#2 12028 C. Thomas Tyler Refreshed SDP dev branch, merging down from main.
#1 10638 C. Thomas Tyler Populate perforce_software-sdp-dev.
//guest/perforce_software/sdp/main/Server/Unix/p4/common/bin/sync_replica.sh
#1 10148 C. Thomas Tyler Promoted the Perforce Server Deployment Package to The Workshop.