SDP_PORT_SCOPE.md #3


# SDP Feature-Port Scope: russell_jackson fork vs perforce_software 2025.2

> ## OUTCOME (completed port)
> All phases below were implemented. Files touched (under `Server/Unix/p4/common/`):
>
> **New scripts copied/ported from upstream:** `bin/ccheck.sh` (+ `config/configurables.cfg`),
> `bin/check_dir_ownership.sh`, `bin/depot_verify_chunks.py`, `bin/edge_vars`,
> `bin/journal_watch.sh`, `bin/keep_offline_db_current.sh`, `bin/load_checkpoint.sh`,
> `bin/opt_perforce_sdp_backup.sh`, `bin/p4sanity_check.sh`, `bin/proxy_rotate.sh`,
> `bin/refresh_P4ROOT_from_offline_db.sh`, `bin/request_replica_checkpoint.sh`,
> `bin/run_if_broker.sh`, `bin/run_if_proxy.sh`, `bin/sdp_health_check.sh`, `bin/verify_sdp.sh`.
>
> **Patched existing fork scripts:** `bin/backup_functions.sh` (copy_jd_table, remove_jd_tables,
> get_target_config_value, rsync_with_preflight, copy_readonly_clients_dir, request_replica_checkpoint,
> get_latest_checkpoint_with_md5; copy_readonly wired into switch_db_files), `bin/upgrade.sh`
> (p4 storage -w + p4 upgrades polling + 2nd journal rotation), `bin/edge_dump.sh` (partitioned-storage
> tables), `bin/replica_status.sh` (archive-replication check), `bin/replica_cleanup.sh`,
> `bin/broker_rotate.sh` (bug fix), `bin/gen_default_broker_cfg.sh` (net.autotune), `bin/p4dstate.sh`
> (lslocks -J), `bin/p4login` (full modern replacement), `bin/p4_vars` (p4login support vars),
> `bin/p4d_base` (ownership preflight gate, start-only).
>
> **MUST be validated on a non-prod instance before relying on them** (high blast radius / untestable here):
> `p4login`, `p4d_base` (start gate), `verify_sdp.sh` (tune its `-skip` list for this fork's older
> layout), `load_checkpoint.sh` / `refresh_P4ROOT_from_offline_db.sh` (DB restore), `upgrade.sh`.
>
> **Known limitations:** `load_checkpoint.sh` edge-server path requires `edge_vars` (now shipped);
> `keep_offline_db_current.sh` replays from local CHECKPOINTS/JOURNALS (NFS-shared target checkpoints
> must be reachable locally) and its `replay_journals_to_offline_db` was intentionally NOT given the
> upstream `useTargetJournalPrefix` arg to avoid changing the fork's custom two-pass replay. Not ported:
> wholesale p4_vars/instance_vars env contract, systemd enforcement, mkrep.sh, edge_shelf_replicate.sh,
> p4ftpd_base, p4review2.py, p4brokerstate.sh/p4pstate.sh (ship as broken admin-edit templates upstream).

> ## POST-PORT REFINEMENTS
> - **Config migration to the config-file model:** the 15 fork-specific configurables now
>   live in `config/configurables.cfg` and are applied via `bin/ccheck.sh -fix`;
>   `setup/configure_new_server.sh` was slimmed to setup-only (it calls `ccheck.sh` for
>   configurables). `journalPrefix` in the cfg points at `journals.rotated` (fork layout).
> - **Rename:** `helix_binaries/get_helix_binaries.sh` → `p4_binaries/get_p4_binaries.sh`
>   (Perforce's P4 rebrand); FTP URL and product names left unchanged.
> - **Removed as unused:** the `backup_functions.sh` helpers `get_target_config_value` and
>   `get_latest_checkpoint_with_md5` were deleted (no dynamic journalPrefix logic remains;
>   `copy_jd_table`/`remove_jd_tables` are retained and still used).
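
Because the configurables now live in a data file, a malformed entry can stop `ccheck.sh -fix` partway through. The sketch below shows one way to pre-check the file. It assumes entries are pipe-delimited records of seven fields and that lines starting with `#` are comments; `cfg_bad_field_counts` is a hypothetical helper, not part of the fork or of upstream.

```bash
# Hypothetical helper: print "line_number:field_count" for every entry in a
# pipe-delimited cfg file whose field count differs from the expected one.
# Blank lines and lines starting with '#' are skipped.
# Returns non-zero when at least one bad entry was found.
cfg_bad_field_counts() {
    local cfg_file=$1 expected=${2:-7}
    awk -F'|' -v want="$expected" '
        /^[[:space:]]*(#|$)/ { next }
        NF != want { printf "%d:%d\n", NR, NF; bad = 1 }
        END { exit bad + 0 }
    ' "$cfg_file"
}
```

Running it as `cfg_bad_field_counts config/configurables.cfg` before `ccheck.sh -fix` would list any suspect lines without changing the server.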

> ## MULTI-AGENT REVIEW FIXES (submitted as Perforce change 32803)
> A multi-agent review of the full changeset confirmed: no dangling references to the
> removed functions, no leftover `helix` references, and clean `bash -n` across all scripts.
> It found 9 real issues — **all fixed**. Several were pre-existing upstream defects
> inherited by verbatim copy, so the fork is now cleaner than upstream on those.
>
> 1. **(High)** `bin/backup_functions.sh` `rsync_with_preflight`: rsync `--stats` size is in
>    BYTES (not KB) and stock GNU rsync adds comma separators, so the disk-space safety check
>    was off by ~1024× and could crash bash arithmetic. Now strips commas, converts bytes→KB,
>    defaults to 0.
> 2. **(High)** `config/configurables.cfg` `hcc|filesys.P4JOURNAL.min` and `hcc|filesys.P4LOG.min`:
>    a stray `Exact` field (8 fields instead of 7) made `ccheck -fix` bail under the `hcc`
>    profile. Removed.
> 3. **(High)** `bin/upgrade.sh`: p4d version thresholds were 2-digit (`"18.2"`/`"19.1"`) but
>    compared against 4-digit `p4d -V` output. Fixed to `"2018.2"`/`"2019.1"`.
> 4. **(Med)** `bin/upgrade.sh`: `start_p4d` was nested inside the version gate. Made
>    unconditional so p4d is never left stopped after a DB upgrade.
> 5. **(Low)** `bin/p4login`: `$JDTmpDir` was referenced unguarded under `set -u` when
>    db.config is unreadable. Guarded with a `[[ -r "$P4ROOT/db.config" ]]` check.
> 6. **(Low)** `bin/load_checkpoint.sh`: a compressed numbered journal was stored without its
>    `.gz` suffix, aborting replay. Suffix added.
> 7. **(Low)** `bin/proxy_rotate.sh`: dead `check_dirs 2` argument (fork's `check_dirs` takes
>    none) removed.
> 8. **(Low)** `p4_binaries/get_p4_binaries.sh`: long-form `YYYY.N` year whitelist stopped at
>    2024, rejecting `2025.2` (the default is `r25.2`). Added 2025/2026.
> 9. **(Low)** `config/configurables.cfg` `always|dm.user.resetpassword`: only 6 fields
>    (missing `ServerIDType`). Fixed to 7.
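
Fix 1 can be illustrated with a small parser. This is a sketch under the assumption that the size comes from the `Total transferred file size:` line of `rsync --stats`; `rsync_stats_to_kb` is a hypothetical name, not the function in `backup_functions.sh`.

```bash
# Hypothetical helper: read `rsync --stats` output on stdin and print the
# transfer size in KB.
rsync_stats_to_kb() {
    local bytes
    # Field 5 of the matching line is the byte count, e.g. "1,234,567".
    # gsub removes the thousands separators before printing.
    bytes=$(awk '/^Total transferred file size:/ { gsub(/,/, "", $5); print $5; exit }')
    # Default to 0 when the line is missing or the value is not numeric,
    # so the arithmetic below cannot fail on an empty string.
    [[ "$bytes" =~ ^[0-9]+$ ]] || bytes=0
    # Convert bytes to KB, rounding up so a partial KB still counts.
    echo $(( (10#$bytes + 1023) / 1024 ))
}
```

A caller could then compare the result with the available space reported in KB (for example by `df -k`) without mixing units.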



This document scopes `Server/Unix/p4/common/bin` and compares the fork against
upstream Rev. SDP/MultiArch/2025.2/32234. **Already ported:** partitioned/readonly
client directory handling (`rsync_with_preflight` + `copy_readonly_clients_dir`,
wired into `switch_db_files`).

## Guiding constraints (why this is field-level, not file-level)
- The fork is **not uniformly older**. It is independently ahead of upstream in
  several places: `printf %q` safe re-exec (`p4d_base`), array arg passing
  (`p4master_run`), `p4d -xu` pre-start upgrade, `local`-scoped `ps_functions.sh`,
  pid-protecting `kill_idle.sh`, modernized `update_limits.py`, and a
  `SERVER_TYPE`-based `run_if_*` design. A blind overwrite would REGRESS these.
- The **env contract diverges fundamentally**. The fork's `p4_vars` uses a
  hostname-qualified `P4PORT`, `SERVER_TYPE` (from `sdp_server_type.txt`),
  `JOURNALS=journals.rotated`, `RSYNCUSER`, and a `serverid.vars` model. Upstream
  uses a `db.config`-driven `instance_vars` model. **Do NOT replace `p4_vars` /
  `instance_vars` wholesale** — add individual variables only.
- The fork is intentionally **init-based, not systemd**. Do not re-enable the
  systemd-enforcement blocks.

---

## TIER 1 — Modern p4d feature support (highest value, your stated goal)

| # | Item | Upstream source | Modern p4d feature covered | Risk | Effort |
|---|------|-----------------|-------------------------|------|--------|
| 1 | **`p4 storage -w` + `p4 upgrades` polling** in upgrade flow | `upgrade.sh` | Waits for async db.storage upgrade (2019.1) and background upgrade tasks (2020.2+) before rotating/replaying journals. The fork's `upgrade.sh` does journaled `-xu` but does NOT wait — this is the one genuinely **dangerous** modern-p4d gap. | LOW–MED | SMALL |
| 2 | **Post-upgrade second journal rotation** | `upgrade.sh` | Rotates journal after a major upgrade so upgrade DB changes flow into offline_db. Master/commit only. | MED | SMALL |
| 3 | **`edge_vars` partitioned-storage table lists** | `edge_vars`, `edge_dump.sh`, `recover_edge.sh` | Excludes/seeds modern partitioned-storage db.* tables (`db.storagesh/sx`, `db.haveg`, `db.workingg`, `db.locksg`, `db.resolveg`) per p4d version, instead of the fork's hardcoded inline table list. | MED | MED |
| 4 | **`request_replica_checkpoint` (`p4 admin checkpoint -Z`)** | `request_replica_checkpoint.sh` + `backup_functions.sh` | journalcopy/standby checkpoint-at-next-rotation workflow, parallel `-p -m -N`. | LOW | SMALL |
| 5 | **Archive-replication health check** | `replica_status.sh` (lbr.replication) | Detects librarian/archive transfer failures (`pull -ls`, "Transfer of librarian file failed"), version-gated `pull -ljv`/`-lj`. Fork checks only metadata journal lag — archive failures go silent today. | MED | MED |
| 6 | **`keep_offline_db_current.sh`** | upstream-only script | Keeps a standby's offline_db current on NFS-shared checkpoints without full checkpoints. | MED | MED |
| 7 | **Modern `p4login`** | `p4login` (v4.6.2) | `auth.id`-aware login, service/automation users, edge `ExternalAddress`, P4AUTH/P4BROKERPORT login, SSL `p4 trust`, encrypted password file. Fork's `p4login` is a 2-line stub. Signature-compatible with callers. | MED | LARGE |
| 8 | **Proxy/broker modernization** | `p4p_base`, `gen_default_broker_cfg.sh`, `p4broker_base` | `net.autotune`/`-v track` proxy flags, SSL `p4 trust` of target/listen, multi-config broker (`*.broker.<host>.cfg`). | MED | MED |
| 9 | **`check_dir_ownership.sh` preflight** | `p4d_base` + script | Fast (`-maxdepth 1`) P4ROOT ownership check before start; explicitly designed around large partitioned-client / server.locks dirs. | LOW–MED | MED |
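
Item 1 in the table combines a version gate with a bounded wait. The sketch below shows the general shape only. `version_ge` and `wait_for_upgrades` are hypothetical names, and the parsing assumes each line printed by the status command ends in a state word such as `completed`, `running`, or `failed`; check the real `p4 upgrades` output on the target release before relying on this.

```bash
# version_ge A B: succeed when release A (YYYY.N) is at or above release B.
# Both arguments must use the 4-digit year form, e.g. 2019.1.
version_ge() {
    local a_year=${1%%.*} a_minor=${1#*.} b_year=${2%%.*} b_minor=${2#*.}
    (( a_year > b_year || (a_year == b_year && a_minor >= b_minor) ))
}

# wait_for_upgrades MAX_TRIES DELAY CMD...: run CMD repeatedly until every
# line it prints contains "completed".
# Returns 0 when all steps are complete, 2 if any line reports "failed",
# and 1 if MAX_TRIES polls pass without completion.
wait_for_upgrades() {
    local max_tries=$1 delay=$2 try out pending
    shift 2
    for (( try = 1; try <= max_tries; try++ )); do
        if out=$("$@"); then
            grep -q 'failed' <<<"$out" && return 2
            # Count lines that do not report completion.
            pending=$(printf '%s' "$out" | grep -vc 'completed' || true)
            (( pending == 0 )) && return 0
        fi
        sleep "$delay"
    done
    return 1
}
```

With a release string held in a variable (hypothetically `P4D_RELEASE`), the upgrade flow could run `p4 storage -w` when `version_ge "$P4D_RELEASE" 2019.1` succeeds, and `wait_for_upgrades 60 5 p4 upgrades` when `version_ge "$P4D_RELEASE" 2020.2` succeeds, before rotating the journal. Note that a 2-digit threshold such as `18.2` never compares correctly against a 4-digit release, which is the defect described in review fix 3.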

**Foundational helpers** (port first — they unlock #4/#5/#6 and improve safety):
`copy_jd_table()` / `remove_jd_tables()` (read db.config/db.counters from a temp
copy instead of the live DB), `get_target_config_value()` (P4TARGET→`configure show`
discovery), `get_latest_checkpoint_with_md5()`. All LOW risk / SMALL effort.

---

## TIER 2 — Robustness / quality (cheap, safe, mostly standalone)

- **`p4sanity_check.sh`** — service smoke test. Standalone (sources only `p4_vars`). LOW/SMALL.
- **`sdp_health_check.sh`** — version-agnostic health report; low entanglement by design. **Better target than `verify_sdp.sh`.** LOW–MED/MED.
- **`check_dir_ownership.sh`** (standalone) — wrong-owner detector after bad restore/rsync. LOW/SMALL.
- **`journal_watch.sh`** — journal-partition free-space watcher + auto-rotate/mail. LOW–MED/SMALL.
- **`depot_verify_chunks.py`** — chunk huge depots for parallel `p4 verify`. Complements fork's `cron_verify.sh`/`p4verify.py`. Needs P4Python. LOW/SMALL–MED.
- **`broker_rotate.sh` bug fix** — fork hardcodes `P4PORT=1666` and calls a meaningless `get_journalnum` on broker hosts. Real defect; fix regardless. LOW/SMALL.
- **`replica_cleanup.sh`** — add `-service` login, disk-space check, mail. LOW/SMALL.
- **`run_if_broker.sh` / `run_if_proxy.sh`** — fork lacks these gates. LOW/SMALL.
- **`proxy_rotate.sh`, `p4brokerstate.sh`, `p4pstate.sh`** — proxy/broker log-rotation + state-capture diagnostics. LOW/SMALL.
- **`p4dstate.sh`** — add `lslocks -J` (JSON, machine-parseable) capture. LOW/SMALL.
- **New additive instance vars** — `SDP_MAX_START/STOP_DELAY_*`, `SDP_ALWAYS_LOGIN`, `SDP_AUTOMATION_USERS`, `SDP_VERSION`, `SDP_ADMIN_PASSWORD_FILE` (only wire up alongside their consumers). LOW/SMALL.
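
The shallow ownership check mentioned for `check_dir_ownership.sh` can be sketched with `find -maxdepth 1`. The function names below are hypothetical and this is not the upstream script; it only illustrates why the scan stays fast when subdirectories are very large.

```bash
# Hypothetical helper: list DIR itself and the entries directly inside it
# that are not owned by OWNER (a user name or numeric uid). -maxdepth 1
# keeps find from descending into subdirectories.
wrong_owner_entries() {
    local dir=$1 owner=$2
    find "$dir" -maxdepth 1 ! -user "$owner" -print
}

# Hypothetical gate: succeed only when nothing at the top level of DIR has
# the wrong owner; otherwise report the offenders on stderr.
ownership_ok() {
    local bad
    bad=$(wrong_owner_entries "$1" "$2") || return 1
    if [[ -n "$bad" ]]; then
        printf 'wrong owner:\n%s\n' "$bad" >&2
        return 1
    fi
}
```

A start script could call `ownership_ok "$P4ROOT" "$OSUSER"` (variable names assumed) and refuse to start the server on failure. The trade-off is that a wrongly owned file deeper in the tree is not detected.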

---

## TIER 3 — High value but heavy / entangled (decide case-by-case)

- **`verify_sdp.sh`** (87KB) — full SDP self-verification harness; biggest single capability gap but tightly bound to the 2025.2 layout/configurables. HIGH/LARGE. Prefer `sdp_health_check.sh` first.
- **Full `upgrade.sh` orchestrator** (1576 vs 169 lines) — 5-phase preflight-driven multi-binary upgrade. Pulls in `verify_sdp.sh` + `get_helix_binaries.sh`. HIGH/LARGE. (But the cheap subset — Tier 1 #1/#2 — captures most of the safety benefit.)
- **`refresh_P4ROOT_from_offline_db.sh` / `load_checkpoint.sh`** — modern swap/restore tools (parallel + compressed checkpoint aware). Overlap with fork's `recreate_db_*` lineage. HIGH/LARGE.
- **`ccheck.sh` + `configurables.cfg`** — config-drift / security audit. Needs the baseline data file ported too. MED/MED.
- **`opt_perforce_sdp_backup.sh`** — DR backup of the SDP tooling layer itself. MED/MED–LARGE.

---

## DO NOT port
- Wholesale `p4_vars` / `instance_vars` (breaks fork's P4PORT/SERVER_TYPE/JOURNALS contract).
- systemd enforcement blocks (fork is intentionally init-based).
- Wholesale `db.config`-driven replica resolution.
- Whole-file overwrite of `p4d_base`, `p4master_run`, `ps_functions.sh`, `kill_idle.sh`, `update_limits.py` (fork is ahead in places — merge fields only).
- `mkrep.sh` (fork's `mkstandby*`/`mkedge*` cover it), `edge_shelf_replicate.sh` (obsolete on modern p4d), `p4ftpd_base` (obsolete), `p4review2.py` (Swarm supersedes).

---

## Recommended phased order
1. **Phase 0 (foundational, LOW/SMALL):** `copy_jd_table`/`remove_jd_tables`,
   `get_target_config_value`, `get_latest_checkpoint_with_md5` into `backup_functions.sh`.
2. **Phase 1 (critical correctness):** Tier 1 #1 + #2 (upgrade.sh storage/upgrades polling).
3. **Phase 2 (cheap robustness):** Tier 2 standalone scripts + `broker_rotate.sh` fix.
4. **Phase 3 (replica/standby features):** Tier 1 #4, #5, #6, #3.
5. **Phase 4 (larger):** Tier 1 #7 (p4login), #8 (proxy/broker), #9; then Tier 3 as desired.

---

## SNAPSHOT-BASED LIVE CHECKPOINT (later enhancement; submitted as change 32806)

Added to `bin/live_checkpoint.sh` + `bin/backup_functions.sh` (config in `bin/p4_vars`). Goal:
shrink the live-server outage of a live checkpoint from the WHOLE checkpoint dump (the `p4d -jc`
lock, minutes–hours) to the few seconds needed to take a consistent snapshot of P4ROOT, then build
the checkpoint FROM the snapshot offline.

**Flow (`snapshot_checkpoint`, master-only, parallel-aware):** build+validate provider create command
→ rotate journal (non-fatal) → `p4d -r $P4ROOT -c "<create>"` (p4d "lock tables, run command, unlock"
= consistent snapshot, brief lock) → expose snapshot as a readable root → `dump_checkpoint_from_root`
(offline) → guaranteed teardown. On any failure it returns non-zero and `live_checkpoint.sh` falls
back to the in-place `checkpoint()`.
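
The control flow described above (validate first, tear down no matter what, fall back on any failure) can be sketched as follows. The step functions are passed in by name and are stand-ins; `run_snapshot_flow` and `checkpoint_with_fallback` are hypothetical names, not the fork's `snapshot_checkpoint`.

```bash
# run_snapshot_flow VALIDATE ROTATE CREATE EXPOSE DUMP TEARDOWN
# Each argument is the name of a function or command implementing one step.
# Returns non-zero if any step fails; it never exits the calling script.
run_snapshot_flow() {
    local validate=$1 rotate=$2 create=$3 expose=$4 dump=$5 teardown=$6 rc=0
    # Validate and rotate before anything is created, so a failure here
    # leaves nothing to clean up.
    "$validate" && "$rotate" || return 1
    # Run the remaining steps in order, stopping at the first failure.
    { "$create" && "$expose" && "$dump"; } || rc=1
    # Teardown runs whether or not the steps above succeeded.
    "$teardown" || rc=1
    return "$rc"
}

# checkpoint_with_fallback FALLBACK STEP...: try the snapshot flow and run
# FALLBACK (the in-place checkpoint) only if the flow reports failure.
checkpoint_with_fallback() {
    local fallback=$1
    shift
    run_snapshot_flow "$@" || "$fallback"
}
```

Keeping the flow free of `exit` calls is what lets the caller decide between falling back and aborting.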

**Methods** (chosen by `detect_snapshot_method` in the precedence order below; override with `SNAPSHOT_METHOD=auto|reflink|aws|azure|gcp|off`):
1. **reflink** — local copy-on-write clone of `db.*` (XFS reflink=1 / btrfs). Fully local, the
   primary/testable path; no config.
2. **aws / azure / gcp** — create a volume snapshot, materialize a temp volume from it, attach, mount,
   checkpoint, then detach/delete. Config-driven (`SNAPSHOT_*` vars in `p4_vars`); needs the provider
   CLI + an instance role with snapshot/volume permissions, plus root/sudo for mount.
3. fall back to the in-place `checkpoint()`.
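
A minimal sketch of that precedence follows. It is not the fork's `detect_snapshot_method`: `reflink_supported` and `pick_snapshot_method` are hypothetical names, `SNAPSHOT_CLOUD` is an invented variable standing in for the real `SNAPSHOT_*` settings, and the reflink probe assumes GNU `cp` with `--reflink=always`.

```bash
# Succeed when the filesystem holding DIR can make a copy-on-write clone.
# The probe copies a small temp file; cp fails when reflinks are unsupported.
reflink_supported() {
    local dir=$1 src dst rc=1
    src=$(mktemp "$dir/.reflink_probe.XXXXXX") || return 1
    dst="$src.clone"
    echo probe > "$src"
    cp --reflink=always "$src" "$dst" 2>/dev/null && rc=0
    rm -f "$src" "$dst"
    return "$rc"
}

# Print the method to use for DIR. An explicit SNAPSHOT_METHOD wins; "auto"
# tries reflink, then a configured cloud provider, then "off" (in-place).
pick_snapshot_method() {
    local dir=$1 method=${SNAPSHOT_METHOD:-auto}
    case "$method" in
        reflink|aws|azure|gcp|off) echo "$method"; return 0 ;;
        auto) ;;
        *) echo "invalid SNAPSHOT_METHOD: $method" >&2; return 1 ;;
    esac
    if reflink_supported "$dir"; then
        echo reflink
    elif [[ "${SNAPSHOT_CLOUD:-}" =~ ^(aws|azure|gcp)$ ]]; then
        echo "$SNAPSHOT_CLOUD"
    else
        echo off
    fi
}
```

Rejecting unknown values up front means a typo in the setting surfaces as an error rather than as a silent fallback.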

**Supporting changes:** refactored `dump_checkpoint` → `dump_checkpoint_from_root <root>` (reused by the
snapshot path; preserves the `SNAPSHOT_SCRIPT` hook). New helpers: `detect_snapshot_method`,
`snapshot_{create_script,expose,destroy}_<method>`, `snapshot_rotate_journal` (non-fatal rotate),
`snapshot_wait_for_device`, `snapshot_build_create_script`, `snapshot_expose`, `snapshot_cleanup`,
`snapshot_checkpoint`, `_snapshot_checkpoint_run`.

**Multi-agent review fixes (in the same change):** mode-aware checkpoint existence-skip (parallel `-jdpm`
vs serial `.gz`); half-written-checkpoint guard restored; `SNAPSHOT_METHOD` validation + `declare -F`
guard; cloud teardown subshell-leak fixed (expose sets a global `SnapRoot`, runs in-shell so
volume/mount state persists for cleanup); non-fatal journal rotation so failures fall back instead of
aborting; build/validate before rotating; bounded device-wait poll instead of a fixed sleep.
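
The bounded poll that replaced the fixed sleep can be sketched generically. `wait_until` is a hypothetical helper, not the fork's `snapshot_wait_for_device`; it retries any predicate command until it succeeds or a time limit passes.

```bash
# wait_until TIMEOUT INTERVAL CMD...: run CMD every INTERVAL seconds until it
# succeeds. Returns 0 on success and 1 once TIMEOUT seconds have been spent
# waiting. TIMEOUT and INTERVAL are whole seconds.
wait_until() {
    local timeout=$1 interval=$2 waited=0
    shift 2
    until "$@"; do
        (( waited >= timeout )) && return 1
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}
```

For a newly attached volume the call might look like `wait_until 60 1 test -b /dev/nvme1n1`, where the device path is only an example. Unlike a fixed sleep, this returns as soon as the device appears and fails cleanly when it never does.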

**MUST validate on a non-prod instance:** the journal-boundary consistency (rebuild offline_db from a
snapshot checkpoint, replay journals, `p4 verify` vs. a control checkpoint) and the cloud paths'
environment-specific config (volume IDs, device naming incl. AWS Nitro nvme, mount privilege).
Transactions committed between the journal rotation and the `p4d -c` lock (the rotate→lock gap) are a
documented, accepted residual risk; run during a quiet window.

---

| # | Change | User | Description |
|---|--------|------|-------------|
| 3 | 32807 | Russell C. Jackson (Rusty) | Update SDP_PORT_SCOPE.md: document the snapshot-based live checkpoint feature (reflink/aws/azure/gcp via p4d -c, fallback to in-place checkpoint) and its multi-agent review fixes, submitted in change 32806. |
| 2 | 32804 | Russell C. Jackson (Rusty) | Update SDP_PORT_SCOPE.md: document post-port refinements (config-file migration to configurables.cfg/ccheck, helix_binaries->p4_binaries rename, removal of unused helpers) and the 9 multi-agent review fixes applied in change 32803. |
| 1 | 32803 | Russell C. Jackson (Rusty) | Modernize russell_jackson SDP fork from upstream 2025.2. Full description follows. |

Description of change 32803:

- Port modern p4d features: partitioned/readonly clients, upgrade-safety (p4
  storage -w / p4 upgrades polling), checkpoint/replica/edge tooling, proxy &
  broker SSL trust, modern p4login, dir-ownership preflight.
- Add scripts: get_p4_binaries.sh (renamed from helix), ccheck.sh, verify_sdp.sh,
  sdp_health_check.sh, journal_watch.sh, load_checkpoint.sh, refresh_P4ROOT,
  request_replica_checkpoint.sh, keep_offline_db_current.sh, gen_sudoers.sh, etc.
- Migrate configurables to configurables.cfg applied via ccheck.sh -fix; slim
  configure_new_server.sh to setup-only.
- upgrade.sh: dry-run default, verified clean rollback point.
- Fixes from multi-agent review (rsync byte/KB+comma, cfg field counts, version
  thresholds, etc.).

See SDP_PORT_SCOPE.md for the full manifest.