SDP-246

tom_tyler (C. Thomas Tyler)
C. Thomas Tyler created this job, modified by swarm-user
Closed
Sourcing p4_vars hangs if live_checkpoint.sh is running.

Clearly the 'p4d -cshow' check is the culprit here.

There is a reasonable expectation when sourcing an environment file
that:
* it'll take about a millisecond, perhaps only a nanosecond, and
* it won't do anything 'active', only set up the environment.

With the recent fix to avoid calling 'p4d -cshow' if there are no
db.* files, we are no longer doing anything active.  Yay.
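
The guard described above could look roughly like the following. This is
a minimal sketch, not the actual p4_vars code: the function name
has_db_files is hypothetical, and it only illustrates skipping the
'p4d -cshow' call when P4ROOT contains no db.* files.

```sh
# Hypothetical helper (not the actual p4_vars code).
# Returns 0 if the given directory contains at least one db.* file.
has_db_files() {
    for f in "$1"/db.*; do
        # If the glob matched nothing, $f is the literal pattern and
        # the -e test fails, so we fall through to 'return 1'.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Usage sketch in an environment file (assumes P4ROOT is already set):
# if has_db_files "$P4ROOT"; then
#     p4d -r "$P4ROOT" -cshow
# fi
```

With a check like this, sourcing the file on a machine with an empty
P4ROOT never invokes p4d at all.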

This was first experienced when running 'sudo su - perforce';
it hung while live_checkpoint.sh was running, because the
~perforce/.bashrc file contained a line to do
'source /p4/common/bin/p4_vars N', which is a common and
reasonable thing to do.

This may be related to SDP-245.
27065: Fixed issue where 'source p4_vars' hangs if load_checkpoint.sh is running.

Added new semaphore file, $P4ROOT/P4ROOT_not_usable.txt.  This is used in
a way similar to 'offline_db_usable.txt' in the offline_db, except that this
file only exists when the databases in P4ROOT are not usable. This is the
opposite of how offline_db_usable.txt works, because P4ROOT is expected to
be usable 99.9% of the time.  p4d_base will refuse to start p4d if this file
exists, protecting against possible operator errors (like trying to start
p4d when a checkpoint is still loading).
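
A start guard along these lines might be sketched as follows. The
function name refuse_if_not_usable and the error message are assumptions
made for illustration; the real logic in p4d_base may differ.

```sh
# Hypothetical start guard (illustrative only; not the p4d_base code).
# Returns non-zero if the semaphore file exists in the given P4ROOT.
refuse_if_not_usable() {
    p4root="$1"
    if [ -e "$p4root/P4ROOT_not_usable.txt" ]; then
        echo "Error: $p4root/P4ROOT_not_usable.txt exists; refusing to start p4d." >&2
        return 1
    fi
    return 0
}

# Usage sketch (assumes P4ROOT is set by the caller):
# refuse_if_not_usable "$P4ROOT" || exit 1
```

Because the file exists only while the databases are not usable, the
normal start path costs a single file test.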

Added check_file_dne() function to verify_sdp.sh to confirm a named file does not exist.
Added checks in verify_sdp.sh that P4ROOT_not_usable.txt does not exist in P4ROOT
or offline_db.
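
For illustration, a "does not exist" check could be written as below.
This is a sketch under assumptions: the argument list, the messages, and
the ErrorCount counter are hypothetical, and the real check_file_dne()
in verify_sdp.sh may be implemented differently.

```sh
# Sketch of a "file does not exist" check. The real check_file_dne()
# in verify_sdp.sh may take different arguments and report differently.
ErrorCount=0   # hypothetical error counter

check_file_dne() {
    file="$1"
    if [ -e "$file" ]; then
        echo "Error: file exists but should not: $file" >&2
        ErrorCount=$((ErrorCount + 1))
        return 1
    fi
    echo "Verified: file does not exist: $file"
    return 0
}

# Usage sketch (P4ROOT and OFFLINE_DB are assumed to be set):
# check_file_dne "$P4ROOT/P4ROOT_not_usable.txt"
# check_file_dne "$OFFLINE_DB/P4ROOT_not_usable.txt"
```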

Modified switch_db_files() (called by refresh_P4ROOT_from_offline_db.sh) to properly
use the new P4ROOT_not_usable.txt safety file.
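
One way a db swap could use the safety file is sketched below. The
wrapper name with_p4root_flagged is hypothetical, and leaving the file
in place on failure is a design assumption of this sketch, not a
statement about what switch_db_files() actually does.

```sh
# Hypothetical wrapper (not the actual switch_db_files() code):
# run a command while P4ROOT is flagged as not usable.
with_p4root_flagged() {
    p4root="$1"; shift
    flag="$p4root/P4ROOT_not_usable.txt"
    # Create the semaphore before touching any db.* files.
    echo "P4ROOT not usable since $(date)" > "$flag" || return 1
    if "$@"; then
        rm -f "$flag"   # clear the flag only after a successful run
        return 0
    fi
    echo "Error: command failed; leaving $flag in place." >&2
    return 1
}

# Usage sketch (swap_db_files is a placeholder for the real swap step):
# with_p4root_flagged "$P4ROOT" swap_db_files
```

Keeping the file after a failed swap means a later attempt to start p4d
is refused until an operator has looked at the state of P4ROOT.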

Fixed bugs in p4d_base that could cause p4d_init.log to be overwritten if error output
was generated.

Removed call to 'backup_functions.sh' in p4d_base, as on balance it added more complexity
than needed.
27064: Fixed issue where 'source p4_vars' hangs if load_checkpoint.sh is running.

#review-27065
Status: Closed
Project: perforce-software-sdp
Severity: C
Reported By: C. Thomas Tyler
Reported Date:
Modified By: swarm-user
Modified Date:
Owned By: tom_tyler
Dev Notes
2020/12/13 ttyler:
I had expected this P4D server job, released in P4D 2019.2 Patch 2,
would be the fix for this:

#1898491 (Job #93870) **
    'p4d -cset', 'p4d -cunset' and 'p4d -cshow' no longer take write
    locks on all tables; instead only the tables required are locked.
    'p4d -cshow' now only takes read locks.

That is good, but not sufficient.  With P4D 2020.2, for example, if
load_checkpoint.sh is running to reseed a replica, sourcing p4_vars
still hangs for that instance. An SDP fix is needed.
Component: core-unix
Type: Bug