SDP-302

akwan
Suspended
Parallelized checkpoint processing to reduce duration.

Enable parallel checkpoints, and include test suite coverage for
same.

Excerpt of email from Alan Kwan:
---
I've framed out a pseudo-code implementation of how it could behave
as backup_functions in SDP:

dump_checkpoint_parallel()

- get the list of db files (this can be optimized to sort by largest
  or smallest, to keep work queues as saturated as possible)
- get a p4_var variable set to the number of worker threads; if unset,
  use logic to determine a just-in-time value:
    - figure out the CPU core count
    - check the system's active load value
    - set the number of threads to core count minus active load, minus
      1 (if the result is less than 1, set it to 1, i.e. not parallel)
- define the work queue (ls -1 /p4/1/offline_db/)
- insert code to execute against the work queue (based on
  http://hackthology.com/a-job-queue-in-bash.html ), staying within the
  thread limit and working until the queue is drained - checkpoint
  files are named /p4/1/checkpoints/p4_1.ckp.db.have.number.gz (along
  with their MD5 files)
    - rewrite the offline_
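A rough bash sketch of the heuristic and work queue described above
(function names, paths, and the Linux-only /proc/loadavg load check are
illustrative assumptions, not SDP code):

```shell
# Worker count heuristic: core count minus 1-minute load minus 1, floor 1.
calc_threads() {
  local cores load t
  cores=$(nproc)
  load=$(awk '{printf "%d", $1}' /proc/loadavg)  # integer part of 1-min load (Linux)
  t=$(( cores - load - 1 ))
  [ "$t" -lt 1 ] && t=1  # 1 means "not parallel"
  echo "$t"
}

# Minimal bash work queue (cf. the hackthology article): run one command
# per line of stdin, keeping at most $1 jobs in flight.
run_queue() {
  local max="$1" cmd
  while IFS= read -r cmd; do
    while [ "$(jobs -rp | wc -l)" -ge "$max" ]; do
      sleep 0.2  # throttle until a worker slot frees up
    done
    bash -c "$cmd" </dev/null &
  done
  wait  # drain remaining workers
}
```

Per-table dump commands (sorted largest-first, e.g. with ls -S, to keep
the queue saturated) would then be piped into run_queue "$(calc_threads)";
the per-table p4d -jd invocation itself depends on the p4d support
discussed in the Dev Notes.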

restore parallel would implement something similar - get the list of
compressed checkpoint files, throw them in a work queue, and run
p4d -jr -z on each one into the same offline_db folder until they're
all done.
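
A hypothetical shape for that restore side (the function name and the
per-table checkpoint naming are assumptions; p4d -jr -z replays a
compressed checkpoint):

```shell
# Hypothetical parallel restore: replay each compressed per-table
# checkpoint into offline_db, at most $3 at a time.
restore_parallel() {
  local ckp_dir="$1" offline_db="$2" max="$3" ckp
  for ckp in "$ckp_dir"/p4_1.ckp.db.*.gz; do
    [ -e "$ckp" ] || continue  # no matches: glob left literal
    while [ "$(jobs -rp | wc -l)" -ge "$max" ]; do
      sleep 0.2  # throttle until a worker slot frees up
    done
    p4d -r "$offline_db" -jr -z "$ckp" </dev/null &
  done
  wait  # all tables restored before declaring success
}
```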

augment remove_old_checkpoints_and_journals to incorporate these
sorts of checkpoints

Excerpt of email from Robert Cowham:
---

An alternative step along the way is to use pigz or similar for
parallel compression, which is where a lot of the time is spent.
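
A sketch of how a pigz step might slot in after the dump (the function
name and the .md5 side-file convention are assumptions; it falls back
to plain gzip where pigz isn't installed):

```shell
# Compress a finished (uncompressed) checkpoint file, using pigz for
# parallel compression when available, else single-threaded gzip.
compress_ckp() {
  local file="$1"
  if command -v pigz >/dev/null 2>&1; then
    pigz -p "$(nproc)" "$file"  # use all cores; replaces file with file.gz
  else
    gzip "$file"
  fi
  md5sum "$file.gz" > "$file.gz.md5"  # keep the MD5 alongside the checkpoint
}
```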

Typically the focus should be on the 3-7 or so files which comprise
the vast majority of the data (db.have, db.rev and friends, db.integed,
and/or db.label, depending).

I would also be tempted to tar the results into one file after
zipping / before unzipping, for ease of management.
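
That tar step could be as simple as the following sketch (the function
name and file naming are illustrative):

```shell
# Bundle the per-table .gz checkpoint files into a single tar archive
# after zipping, for ease of management; untar before unzipping/restoring.
bundle_checkpoints() {
  local ckp_dir="$1" archive="$2"
  ( cd "$ckp_dir" && tar -cf "$archive" ./*.gz )
}
```
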
Status
Suspended
Project
perforce-software-sdp
Severity
C
Reported By
akwan
Reported Date
Modified By
tom_tyler
Modified Date
Owned By
tom_tyler
Dev Notes
[2021/07/06 tom_tyler]: This job has been suspended.  It turns out
some needed p4d support (a command to get a list of checkpointed
tables) isn't available. Also, there is hope that a future release of
p4d will provide this capability without the need for scripting.

While there are implementations of the parallel checkpoint mechanism
that have been made to work (by checkpointing all tables whether they
need it or not), this is the sort of thing that can never be allowed
to fail.  We decided that this feature, while valuable, is best done
as a p4d feature rather than an SDP feature.  When the needed
functionality is added to p4d, this job will be re-opened.

[2020/08/18 tom_tyler]: Re-opening this job to re-add this feature,
with full test suite coverage.

Older Notes:

This can be done reliably, but will be sophisticated.  We may want
to add an optional new setting in instance_vars.template, e.g.
PARALLEL_CHECKPOINTS with a default value of 0.

Then either dump_checkpoint() or dump_checkpoint_parallel() would be
called depending on whether that new var is set to 1 or not.  So
by default it would still do single-threaded checkpoints, and
would do parallel checkpoints if explicitly enabled.
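
In other words, the dispatch might look like this (the wrapper name is
hypothetical; dump_checkpoint and dump_checkpoint_parallel are as
discussed above):

```shell
# Choose serial vs. parallel checkpointing based on the optional
# PARALLEL_CHECKPOINTS setting from instance_vars, defaulting to 0 (serial).
dump_checkpoint_dispatch() {
  if [ "${PARALLEL_CHECKPOINTS:-0}" -eq 1 ]; then
    dump_checkpoint_parallel
  else
    dump_checkpoint
  fi
}
```
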
Component
core-unix
Type
Feature