# SDP Feature-Port Scope: russell_jackson fork vs perforce_software 2025.2
> ## OUTCOME (completed port)
> All phases below were implemented. Files touched (under `Server/Unix/p4/common/`):
>
> **New scripts copied/ported from upstream:** `bin/ccheck.sh` (+ `config/configurables.cfg`),
> `bin/check_dir_ownership.sh`, `bin/depot_verify_chunks.py`, `bin/edge_vars`,
> `bin/journal_watch.sh`, `bin/keep_offline_db_current.sh`, `bin/load_checkpoint.sh`,
> `bin/opt_perforce_sdp_backup.sh`, `bin/p4sanity_check.sh`, `bin/proxy_rotate.sh`,
> `bin/refresh_P4ROOT_from_offline_db.sh`, `bin/request_replica_checkpoint.sh`,
> `bin/run_if_broker.sh`, `bin/run_if_proxy.sh`, `bin/sdp_health_check.sh`, `bin/verify_sdp.sh`.
>
> **Patched existing fork scripts:** `bin/backup_functions.sh` (copy_jd_table, remove_jd_tables,
> get_target_config_value, rsync_with_preflight, copy_readonly_clients_dir, request_replica_checkpoint,
> get_latest_checkpoint_with_md5; copy_readonly wired into switch_db_files), `bin/upgrade.sh`
> (p4 storage -w + p4 upgrades polling + 2nd journal rotation), `bin/edge_dump.sh` (partitioned-storage
> tables), `bin/replica_status.sh` (archive-replication check), `bin/replica_cleanup.sh`,
> `bin/broker_rotate.sh` (bug fix), `bin/gen_default_broker_cfg.sh` (net.autotune), `bin/p4dstate.sh`
> (lslocks -J), `bin/p4login` (full modern replacement), `bin/p4_vars` (p4login support vars),
> `bin/p4d_base` (ownership preflight gate, start-only).
>
> **MUST be validated on a non-prod instance before relying on them** (high blast radius / untestable here):
> `p4login`, `p4d_base` (start gate), `verify_sdp.sh` (tune its `-skip` list for this fork's older
> layout), `load_checkpoint.sh` / `refresh_P4ROOT_from_offline_db.sh` (DB restore), `upgrade.sh`.
>
> **Known limitations:** `load_checkpoint.sh` edge-server path requires `edge_vars` (now shipped);
> `keep_offline_db_current.sh` replays from local CHECKPOINTS/JOURNALS (NFS-shared target checkpoints
> must be reachable locally) and its `replay_journals_to_offline_db` was intentionally NOT given the
> upstream `useTargetJournalPrefix` arg to avoid changing the fork's custom two-pass replay. Not ported:
> wholesale p4_vars/instance_vars env contract, systemd enforcement, mkrep.sh, edge_shelf_replicate.sh,
> p4ftpd_base, p4review2.py, p4brokerstate.sh/p4pstate.sh (ship as broken admin-edit templates upstream).
> ## POST-PORT REFINEMENTS
> - **Config migration to the config-file model:** the 15 fork-specific configurables now
> live in `config/configurables.cfg` and are applied via `bin/ccheck.sh -fix`;
> `setup/configure_new_server.sh` was slimmed to setup-only (it calls `ccheck.sh` for
> configurables). `journalPrefix` in the cfg points at `journals.rotated` (fork layout).
> - **Rename:** `helix_binaries/get_helix_binaries.sh` → `p4_binaries/get_p4_binaries.sh`
> (Perforce's P4 rebrand); FTP URL and product names left unchanged.
> - **Removed as unused:** the `backup_functions.sh` helpers `get_target_config_value` and
> `get_latest_checkpoint_with_md5` were deleted (no dynamic journalPrefix logic remains;
> `copy_jd_table`/`remove_jd_tables` are retained and still used).
> ## MULTI-AGENT REVIEW FIXES (submitted as Perforce change 32803)
> A multi-agent review of the full changeset confirmed: no dangling references to the
> removed functions, no leftover `helix` references, and clean `bash -n` across all scripts.
> It found 9 real issues — **all fixed**. Several were pre-existing upstream defects
> inherited by verbatim copy, so the fork is now cleaner than upstream on those.
>
> 1. **(High)** `bin/backup_functions.sh` `rsync_with_preflight`: rsync `--stats` size is in
> BYTES (not KB) and stock GNU rsync adds comma separators, so the disk-space safety check
> was off by ~1024× and could crash bash arithmetic. Now strips commas, converts bytes→KB,
> defaults to 0.
> 2. **(High)** `config/configurables.cfg` `hcc|filesys.P4JOURNAL.min` and `hcc|filesys.P4LOG.min`:
> a stray `Exact` field (8 fields instead of 7) made `ccheck -fix` bail under the `hcc`
> profile. Removed.
> 3. **(High)** `bin/upgrade.sh`: p4d version thresholds were 2-digit (`"18.2"`/`"19.1"`) but
> compared against 4-digit `p4d -V` output. Fixed to `"2018.2"`/`"2019.1"`.
> 4. **(Med)** `bin/upgrade.sh`: `start_p4d` was nested inside the version gate. Made
> unconditional so p4d is never left stopped after a DB upgrade.
> 5. **(Low)** `bin/p4login`: `$JDTmpDir` was referenced unguarded under `set -u` when
> db.config is unreadable. Guarded with a `[[ -r "$P4ROOT/db.config" ]]` check.
> 6. **(Low)** `bin/load_checkpoint.sh`: a compressed numbered journal was stored without its
> `.gz` suffix, aborting replay. Suffix added.
> 7. **(Low)** `bin/proxy_rotate.sh`: dead `check_dirs 2` argument (fork's `check_dirs` takes
> none) removed.
> 8. **(Low)** `p4_binaries/get_p4_binaries.sh`: long-form `YYYY.N` year whitelist stopped at
> 2024, rejecting `2025.2` (the default is `r25.2`). Added 2025/2026.
> 9. **(Low)** `config/configurables.cfg` `always|dm.user.resetpassword`: only 6 fields
> (missing `ServerIDType`). Fixed to 7.
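
The bytes-to-KB conversion described in fix 1 can be sketched as follows. This is a minimal illustration, not the fork's `rsync_with_preflight`: the function name `rsync_stats_to_kb` is hypothetical, and it assumes the size is read from the `Total transferred file size:` line of `rsync --stats` output (the fork may parse a different line).

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper, not the fork's code).
# Reads `rsync --stats` text on stdin and prints the transferred size in KB,
# rounded up. Prints 0 when the line is missing or not a plain integer.
rsync_stats_to_kb() {
    local bytes
    # Assumed source line: "Total transferred file size: 1,234,567 bytes"
    bytes=$(awk -F': ' '/^Total transferred file size:/ {print $2; exit}')
    bytes=${bytes%% *}                      # drop the trailing " bytes"
    bytes=${bytes//,/}                      # strip comma separators
    [[ $bytes =~ ^[0-9]+$ ]] || bytes=0     # default to 0 on anything unexpected
    echo $(( (10#$bytes + 1023) / 1024 ))   # bytes -> KB; 10# forces base 10
}
```

Piping `Total transferred file size: 1,234,567 bytes` into the function prints `1206`. Input without that line prints `0`, so a later arithmetic comparison against free disk space cannot fail on an empty or comma-laden operand.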
**Scope:** `Server/Unix/p4/common/bin`. This document compares the fork against upstream
Rev. SDP/MultiArch/2025.2/32234. **Already ported:** partitioned/readonly client
directory handling (`rsync_with_preflight` + `copy_readonly_clients_dir`, wired
into `switch_db_files`).
## Guiding constraints (why this is field-level, not file-level)
- The fork is **not uniformly older**. It is independently ahead of upstream in
several places: `printf %q` safe re-exec (`p4d_base`), array arg passing
(`p4master_run`), `p4d -xu` pre-start upgrade, `local`-scoped `ps_functions.sh`,
pid-protecting `kill_idle.sh`, modernized `update_limits.py`, and a
`SERVER_TYPE`-based `run_if_*` design. A blind overwrite would REGRESS these.
- The **env contract diverges fundamentally**. The fork's `p4_vars` uses a
hostname-qualified `P4PORT`, `SERVER_TYPE` (from `sdp_server_type.txt`),
`JOURNALS=journals.rotated`, `RSYNCUSER`, and a `serverid.vars` model. Upstream
uses a `db.config`-driven `instance_vars` model. **Do NOT replace `p4_vars` /
`instance_vars` wholesale** — add individual variables only.
- The fork is intentionally **init-based, not systemd**. Do not re-enable the
systemd-enforcement blocks.
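
The "add individual variables only" rule can be followed with default-if-unset assignments, which leave any value the fork's `p4_vars` already set untouched. A minimal sketch, assuming a hypothetical wrapper function `sdp_additive_defaults` and placeholder default values (the real defaults should come from whichever script consumes each variable):

```bash
#!/usr/bin/env bash
# Sketch only: the wrapper name and the default values are placeholders.
sdp_additive_defaults() {
    # ":=" assigns only when the variable is unset or empty, so values
    # already exported by the fork's p4_vars are preserved.
    : "${SDP_ALWAYS_LOGIN:=0}"        # placeholder default
    : "${SDP_AUTOMATION_USERS:=}"     # placeholder: empty list
    export SDP_ALWAYS_LOGIN SDP_AUTOMATION_USERS
}
```

Because nothing is assigned unconditionally, sourcing this after the existing `P4PORT` / `SERVER_TYPE` / `JOURNALS` setup cannot change the fork's env contract.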
---
## TIER 1 — Modern p4d feature support (highest value, your stated goal)
| # | Item | Upstream source | Modern p4d feature supported | Risk | Effort |
|---|------|-----------------|-------------------------|------|--------|
| 1 | **`p4 storage -w` + `p4 upgrades` polling** in upgrade flow | `upgrade.sh` | Waits for async db.storage upgrade (2019.1) and background upgrade tasks (2020.2+) before rotating/replaying journals. The fork's `upgrade.sh` does journaled `-xu` but does NOT wait — this is the one genuinely **dangerous** modern-p4d gap. | LOW–MED | SMALL |
| 2 | **Post-upgrade second journal rotation** | `upgrade.sh` | Rotates journal after a major upgrade so upgrade DB changes flow into offline_db. Master/commit only. | MED | SMALL |
| 3 | **`edge_vars` partitioned-storage table lists** | `edge_vars`, `edge_dump.sh`, `recover_edge.sh` | Excludes/seeds modern partitioned-storage db.* tables (`db.storagesh/sx`, `db.haveg`, `db.workingg`, `db.locksg`, `db.resolveg`) per p4d version, instead of the fork's hardcoded inline table list. | MED | MED |
| 4 | **`request_replica_checkpoint` (`p4 admin checkpoint -Z`)** | `request_replica_checkpoint.sh` + `backup_functions.sh` | journalcopy/standby checkpoint-at-next-rotation workflow, parallel `-p -m -N`. | LOW | SMALL |
| 5 | **Archive-replication health check** | `replica_status.sh` (lbr.replication) | Detects librarian/archive transfer failures (`pull -ls`, "Transfer of librarian file failed"), version-gated `pull -ljv`/`-lj`. Fork checks only metadata journal lag — archive failures go silent today. | MED | MED |
| 6 | **`keep_offline_db_current.sh`** | upstream-only script | Keeps a standby's offline_db current on NFS-shared checkpoints without full checkpoints. | MED | MED |
| 7 | **Modern `p4login`** | `p4login` (v4.6.2) | `auth.id`-aware login, service/automation users, edge `ExternalAddress`, P4AUTH/P4BROKERPORT login, SSL `p4 trust`, encrypted password file. The fork's `p4login` is a 2-line stub; the upstream version is signature-compatible with existing callers. | MED | LARGE |
| 8 | **Proxy/broker modernization** | `p4p_base`, `gen_default_broker_cfg.sh`, `p4broker_base` | `net.autotune`/`-v track` proxy flags, SSL `p4 trust` of target/listen, multi-config broker (`*.broker.<host>.cfg`). | MED | MED |
| 9 | **`check_dir_ownership.sh` preflight** | `p4d_base` + script | Fast (`-maxdepth 1`) P4ROOT ownership check before start; explicitly designed around large partitioned-client / server.locks dirs. | LOW–MED | MED |
**Foundational helpers** (port first — they unlock #4/#5/#6 and improve safety):
`copy_jd_table()` / `remove_jd_tables()` (read db.config/db.counters from a temp
copy instead of the live DB), `get_target_config_value()` (P4TARGET→`configure show`
discovery), `get_latest_checkpoint_with_md5()`. All LOW risk / SMALL effort.
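
Several Tier 1 rows are version-gated (2019.1 for the db.storage wait, 2020.2+ for `p4 upgrades`). A minimal sketch of such a gate, comparing four-digit `YYYY.N` release strings numerically. Both function names are hypothetical, and the parser assumes the release is the third `/`-separated field of a `Rev.` line, the same shape as the upstream revision string `Rev. SDP/MultiArch/2025.2/32234` quoted in the scope statement.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helpers, not the fork's upgrade.sh).

# Print the release field from a "Rev. NAME/PLATFORM/RELEASE/CHANGE ..." line.
release_from_rev() {
    local rev=${1#Rev. }
    echo "$rev" | cut -d/ -f3
}

# Return 0 when release $1 (e.g. "2020.2") is >= threshold $2 (e.g. "2019.1").
# Both arguments must use the four-digit year form.
p4d_version_ge() {
    local have_year=${1%%.*} have_minor=${1#*.}
    local want_year=${2%%.*} want_minor=${2#*.}
    (( 10#$have_year > 10#$want_year )) && return 0
    (( 10#$have_year < 10#$want_year )) && return 1
    (( 10#$have_minor >= 10#$want_minor ))
}
```

A caller could then write `p4d_version_ge "$release" 2019.1 && p4 storage -w`. Comparing year and minor as separate integers avoids the string and floating-point pitfalls of treating `2019.1` as a single number.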
---
## TIER 2 — Robustness / quality (cheap, safe, mostly standalone)
- **`p4sanity_check.sh`** — service smoke test. Standalone (sources only `p4_vars`). LOW/SMALL.
- **`sdp_health_check.sh`** — version-agnostic health report; low entanglement by design. **Better target than `verify_sdp.sh`.** LOW–MED/MED.
- **`check_dir_ownership.sh`** (standalone) — wrong-owner detector after bad restore/rsync. LOW/SMALL.
- **`journal_watch.sh`** — journal-partition free-space watcher + auto-rotate/mail. LOW–MED/SMALL.
- **`depot_verify_chunks.py`** — chunk huge depots for parallel `p4 verify`. Complements fork's `cron_verify.sh`/`p4verify.py`. Needs P4Python. LOW/SMALL–MED.
- **`broker_rotate.sh` bug fix** — fork hardcodes `P4PORT=1666` and calls a meaningless `get_journalnum` on broker hosts. Real defect; fix regardless. LOW/SMALL.
- **`replica_cleanup.sh`** — add `-service` login, disk-space check, mail. LOW/SMALL.
- **`run_if_broker.sh` / `run_if_proxy.sh`** — fork lacks these gates. LOW/SMALL.
- **`proxy_rotate.sh`, `p4brokerstate.sh`, `p4pstate.sh`** — proxy/broker log-rotation + state-capture diagnostics. LOW/SMALL.
- **`p4dstate.sh`** — add `lslocks -J` (JSON, machine-parseable) capture. LOW/SMALL.
- **New additive instance vars** — `SDP_MAX_START/STOP_DELAY_*`, `SDP_ALWAYS_LOGIN`, `SDP_AUTOMATION_USERS`, `SDP_VERSION`, `SDP_ADMIN_PASSWORD_FILE` (only wire up alongside their consumers). LOW/SMALL.
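
The wrong-owner detection that `check_dir_ownership.sh` provides can be sketched with a depth-limited `find`. This is an illustration under stated assumptions, not the upstream script: the function name `find_wrong_owner` and its return codes are hypothetical, and only the `-maxdepth 1` approach is taken from the Tier 1 table.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper, not upstream check_dir_ownership.sh).
# Lists the directory $1 and its direct children that are not owned by
# user $2 (a name or numeric UID). Depth is capped at 1, so large
# subtrees are never walked.
# Returns 0 if all are owned correctly, 1 if any are not, 2 if find failed.
find_wrong_owner() {
    local dir=$1 owner=$2 bad
    bad=$(find "$dir" -maxdepth 1 ! -user "$owner" -print) || return 2
    [[ -z $bad ]] && return 0
    printf '%s\n' "$bad"
    return 1
}
```

A start script could call `find_wrong_owner "$P4ROOT" "$OSUSER" || exit 1` before launching p4d, where `OSUSER` stands for whatever variable holds the service account in the local environment.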
---
## TIER 3 — High value but heavy / entangled (decide case-by-case)
- **`verify_sdp.sh`** (87KB) — full SDP self-verification harness; biggest single capability gap but tightly bound to the 2025.2 layout/configurables. HIGH/LARGE. Prefer `sdp_health_check.sh` first.
- **Full `upgrade.sh` orchestrator** (1576 vs 169 lines) — 5-phase preflight-driven multi-binary upgrade. Pulls in `verify_sdp.sh` + `get_helix_binaries.sh`. HIGH/LARGE. (But the cheap subset — Tier 1 #1/#2 — captures most of the safety benefit.)
- **`refresh_P4ROOT_from_offline_db.sh` / `load_checkpoint.sh`** — modern swap/restore tools (parallel + compressed checkpoint aware). Overlap with fork's `recreate_db_*` lineage. HIGH/LARGE.
- **`ccheck.sh` + `configurables.cfg`** — config-drift / security audit. Needs the baseline data file ported too. MED/MED.
- **`opt_perforce_sdp_backup.sh`** — DR backup of the SDP tooling layer itself. MED/MED–LARGE.
---
## DO NOT port
- Wholesale `p4_vars` / `instance_vars` (breaks fork's P4PORT/SERVER_TYPE/JOURNALS contract).
- systemd enforcement blocks (fork is intentionally init-based).
- Wholesale `db.config`-driven replica resolution.
- Whole-file overwrite of `p4d_base`, `p4master_run`, `ps_functions.sh`, `kill_idle.sh`, `update_limits.py` (fork is ahead in places — merge fields only).
- `mkrep.sh` (fork's `mkstandby*`/`mkedge*` cover it), `edge_shelf_replicate.sh` (obsolete on modern p4d), `p4ftpd_base` (obsolete), `p4review2.py` (Swarm supersedes).
---
## Recommended phased order
1. **Phase 0 (foundational, LOW/SMALL):** `copy_jd_table`/`remove_jd_tables`,
`get_target_config_value`, `get_latest_checkpoint_with_md5` into `backup_functions.sh`.
2. **Phase 1 (critical correctness):** Tier 1 #1 + #2 (upgrade.sh storage/upgrades polling).
3. **Phase 2 (cheap robustness):** Tier 2 standalone scripts + `broker_rotate.sh` fix.
4. **Phase 3 (replica/standby features):** Tier 1 #4, #5, #6, #3.
5. **Phase 4 (larger):** Tier 1 #7 (p4login), #8 (proxy/broker), #9; then Tier 3 as desired.
---
## SNAPSHOT-BASED LIVE CHECKPOINT (later enhancement; submitted as change 32806)
Added to `bin/live_checkpoint.sh` + `bin/backup_functions.sh` (config in `bin/p4_vars`). Goal:
shrink the live-server outage of a live checkpoint from the WHOLE checkpoint dump (the `p4d -jc`
lock, minutes–hours) to the few seconds needed to take a consistent snapshot of P4ROOT, then build
the checkpoint FROM the snapshot offline.
**Flow (`snapshot_checkpoint`, master-only, parallel-aware):** build+validate provider create command
→ rotate journal (non-fatal) → `p4d -r $P4ROOT -c "<create>"` (p4d "lock tables, run command, unlock"
= consistent snapshot, brief lock) → expose snapshot as a readable root → `dump_checkpoint_from_root`
(offline) → guaranteed teardown. On any failure it returns non-zero and `live_checkpoint.sh` falls
back to the in-place `checkpoint()`.
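
The control flow above (teardown on every path, then fall back on failure) can be sketched with stub steps. All `step_*` names are hypothetical stand-ins for the fork's real helpers, and the stub `step_dump` fails on purpose to exercise the fallback.

```bash
#!/usr/bin/env bash
# Sketch only: step_* functions are hypothetical stubs that record their calls.
LOG=""
step_create()             { LOG+="create "; }
step_expose()             { LOG+="expose "; }
step_dump()               { LOG+="dump-failed "; return 1; }  # simulated failure
step_destroy()            { LOG+="destroy "; }
step_inplace_checkpoint() { LOG+="inplace "; }

snapshot_checkpoint_sketch() {
    local rc=0
    step_create || return 1       # no snapshot exists yet, nothing to tear down
    if step_expose; then
        step_dump || rc=1
    else
        rc=1
    fi
    step_destroy || rc=1          # teardown runs on success and on failure
    return "$rc"
}

live_checkpoint_sketch() {
    # Any non-zero return from the snapshot path triggers the in-place fallback.
    snapshot_checkpoint_sketch || step_inplace_checkpoint
}

live_checkpoint_sketch
echo "$LOG"
```

With the failing stub, the log records create, expose, dump-failed, destroy, inplace in that order. Everything runs in the calling shell rather than a subshell, so state set by the expose step is still visible when the teardown step runs.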
**Methods** (selected by `detect_snapshot_method` in the precedence order below; override with `SNAPSHOT_METHOD=auto|reflink|aws|azure|gcp|off`):
1. **reflink** — local copy-on-write clone of `db.*` (XFS reflink=1 / btrfs). Fully local, the
primary/testable path; no config.
2. **aws / azure / gcp** — create a volume snapshot, materialize a temp volume from it, attach, mount,
checkpoint, then detach/delete. Config-driven (`SNAPSHOT_*` vars in `p4_vars`); needs the provider
CLI + an instance role with snapshot/volume permissions, plus root/sudo for mount.
3. Otherwise, fall back to the in-place `checkpoint()`.
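
Validating the `SNAPSHOT_METHOD` override before doing any work could look like the sketch below. The function name `snapshot_method_usable` is hypothetical. The accepted values and the `snapshot_{create_script,expose,destroy}_<method>` naming come from this document; the real checks in `backup_functions.sh` may differ.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper, not the fork's validation code).
# Returns 0 when $1 is an accepted SNAPSHOT_METHOD value and, for a concrete
# method, all three per-method helper functions are defined in this shell.
snapshot_method_usable() {
    local method=${1:-auto} fn
    case $method in
        auto|off) return 0 ;;            # nothing method-specific to verify
        reflink|aws|azure|gcp) ;;
        *) echo "Invalid SNAPSHOT_METHOD: $method" >&2; return 1 ;;
    esac
    for fn in create_script expose destroy; do
        # declare -F succeeds only if a function with that name exists.
        if ! declare -F "snapshot_${fn}_${method}" >/dev/null; then
            echo "Missing helper: snapshot_${fn}_${method}" >&2
            return 1
        fi
    done
}
```

Rejecting a mistyped method or a missing helper up front means the caller can fall back to the in-place checkpoint before the journal has been rotated.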
**Supporting changes:** refactored `dump_checkpoint` → `dump_checkpoint_from_root <root>` (reused by the
snapshot path; preserves the `SNAPSHOT_SCRIPT` hook). New helpers: `detect_snapshot_method`,
`snapshot_{create_script,expose,destroy}_<method>`, `snapshot_rotate_journal` (non-fatal rotate),
`snapshot_wait_for_device`, `snapshot_build_create_script`, `snapshot_expose`, `snapshot_cleanup`,
`snapshot_checkpoint`, `_snapshot_checkpoint_run`.
**Multi-agent review fixes (in the same change):** mode-aware checkpoint existence-skip (parallel `-jdpm`
vs serial `.gz`); half-written-checkpoint guard restored; `SNAPSHOT_METHOD` validation + `declare -F`
guard; cloud teardown subshell-leak fixed (expose sets a global `SnapRoot`, runs in-shell so
volume/mount state persists for cleanup); non-fatal journal rotation so failures fall back instead of
aborting; build/validate before rotating; bounded device-wait poll instead of a fixed sleep.
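
The "bounded device-wait poll instead of a fixed sleep" fix follows a common pattern, sketched below. The function name `wait_for_path` and its defaults are hypothetical; the fork's `snapshot_wait_for_device` may test for a block device (`-b`) rather than plain existence (`-e`).

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper, not the fork's snapshot_wait_for_device).
# Polls until path $1 exists. Gives up after about $2 seconds (default 60),
# checking every $3 seconds (default 1).
# Returns 0 as soon as the path appears, 1 on timeout.
wait_for_path() {
    local path=$1 timeout=${2:-60} interval=${3:-1} waited=0
    until [[ -e $path ]]; do
        (( waited >= timeout )) && return 1
        sleep "$interval"
        waited=$(( waited + interval ))
    done
}
```

Unlike a fixed `sleep`, this returns immediately when the device is already attached and fails with a clear status when it never shows up, which lets the caller fall back instead of proceeding against a missing volume.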
**MUST validate on a non-prod instance:** the journal-boundary consistency (rebuild offline_db from a
snapshot checkpoint, replay journals, `p4 verify` vs. a control checkpoint) and the cloud paths'
environment-specific config (volume IDs, device naming incl. AWS Nitro nvme, mount privilege).
Transactions that land between the journal rotation and the `p4d -c` lock are a documented, accepted residual risk, so run during a quiet window.
| # | Change | User | Description | Committed |
|---|---|---|---|---|
| #3 | 32807 | Russell C. Jackson (Rusty) | Update SDP_PORT_SCOPE.md: document the snapshot-based live checkpoint feature (reflink/aws/azure/gcp via p4d -c, fallback to in-place checkpoint) and its multi-agent review fixes, submitted in change 32806. | |
| #2 | 32804 | Russell C. Jackson (Rusty) | Update SDP_PORT_SCOPE.md: document post-port refinements (config-file migration to configurables.cfg/ccheck, helix_binaries->p4_binaries rename, removal of unused helpers) and the 9 multi-agent review fixes applied in change 32803. | |
| #1 | 32803 | Russell C. Jackson (Rusty) | Modernize russell_jackson SDP fork from upstream 2025.2. - Port modern p4d features: partitioned/readonly clients, upgrade-safety (p4 storage -w / p4 upgrades polling), checkpoint/replica/edge tooling, proxy & broker SSL trust, modern p4login, dir-ownership preflight. - Add scripts: get_p4_binaries.sh (renamed from helix), ccheck.sh, verify_sdp.sh, sdp_health_check.sh, journal_watch.sh, load_checkpoint.sh, refresh_P4ROOT, request_replica_checkpoint.sh, keep_offline_db_current.sh, gen_sudoers.sh, etc. - Migrate configurables to configurables.cfg applied via ccheck.sh -fix; slim configure_new_server.sh to setup-only. - upgrade.sh: dry-run default, verified clean rollback point. - Fixes from multi-agent review (rsync byte/KB+comma, cfg field counts, version thresholds, etc.). See SDP_PORT_SCOPE.md for the full manifest. | |