## Intro

This document is written to help new Workshop admins get started with the Workshop infrastructure used by the Community Development Group. It is a work in progress, so feel free to update the information below as you see fit.

If you want to hack on the Swarm code, see [the swarm dev quickstart](swarm-dev-quickstart.md).
## VMs

Most VMs used are now upgraded to Ubuntu 14.04 except the following:

- maillist (aka frankie-vm): running 12.04 - should be okay to upgrade to 14.04.
- forums: running 12.04 with IPBoard - needs some serious testing before upgrading to 14.04 (will upgrade PHP 5.3 -> 5.5 and a lot of other things).
- wayfarer-wiki: this host is going away - do the following before nuking the VM:
  - Move all Apache redirection stuff to `wayfarer-p4d`.
  - Archive the MySQL dump of the mediawiki.
### Public VMs

We document the list of all the VMs used at:

https://confluence.perforce.com:8443/display/COM/Machines+and+VMs

Pay extra attention to:

- `wayfarer-p4d`: our production P4D instance that backs everything. As of 2015/07/17 we are running P4D on 1666 *unencrypted*. An SSL broker is deployed on port 1667; use `P4PORT=ssl:workshop.perforce.com:1667` for encrypted connections.
- `wayfarer-swarm`: Swarm instance that powers swarm.workshop.perforce.com.
- `wayfarer-{p4d,swarm}-stage`: staging servers for the production servers.
- `forums.perforce.com`: runs our forum software IPBoard with a bridge to the mailing list `perforce-user@perforce.com`, which runs on `maillist.perforce.com`.
- `frankie-vm` aka `maillist.perforce.com`: runs a vanilla Mailman install to host our external mailing lists.
- `wayfarer-gf`: Git Fusion instance that connects to wayfarer-p4d - HTTPS only. Anonymous clones can be done with the Perforce user "guest". For example:

      git clone https://guest:@git.workshop.perforce.com/<git-repo>

- `wayfarer-search`: p4search (VM up, not installed as of 2015/07/17).

The above VMs run in the DMZ - locked down with no access to the internal networks.
In general, running the following command will give you all the important (internal) IPs:

```
host -l perforce.com | egrep '(wayfarer-|frankie-vm|forums|wayport)'
forums.perforce.com has address 10.199.2.42
forums-vm.perforce.com has address 10.199.2.42
frankie-vm.perforce.com has address 10.199.2.85
wayfarer-gf.perforce.com has address 10.199.2.54
wayfarer-p4d.perforce.com has address 10.199.2.50
wayfarer-p4d-stage.perforce.com has address 10.199.2.56
wayfarer-search.perforce.com has address 10.199.2.52
wayfarer-swarm.perforce.com has address 10.199.2.53
wayfarer-swarm-stage.perforce.com has address 10.199.2.57
wayfarer-wiki.perforce.com has address 10.199.2.51
wayport.perforce.com has address 10.199.2.187
```
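If you need those name/IP pairs in a script, the `host -l` output is easy to parse. A minimal sketch (the function name is made up for illustration; the sample input is taken from the listing above):

```python
# Parse `host -l` output lines of the form "<name> has address <ip>"
# into a name -> IP dictionary. Lines that don't match are skipped.
def parse_host_output(text):
    hosts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[1:3] == ["has", "address"]:
            hosts[parts[0]] = parts[3]
    return hosts

sample = """\
wayfarer-p4d.perforce.com has address 10.199.2.50
wayfarer-swarm.perforce.com has address 10.199.2.53"""

print(parse_host_output(sample))
# → {'wayfarer-p4d.perforce.com': '10.199.2.50', 'wayfarer-swarm.perforce.com': '10.199.2.53'}
```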
## VMs from the inside

`das.perforce.com` is a subnet in the internal network.

- `eco-test.das.perforce.com`: Jenkins slave - see @tgray.
- `eco-dev.das.perforce.com`: has a P4D running on 1666, used to store stuff that was deemed not appropriate for `p4poke:1666` - get rid of it if you can. Also used as a host on the inside for quick deployment (our Capistrano deployment involves uploading a big zip file to `wayfarer-swarm` - very slow if you are sipping coffee in your local cafe).
- `eco.eng.perforce.com.au`: a read-only replica of `wayfarer-p4d` - just in case bad things happen (tm).
## Swarm deployment

From @llam:

- Make sure you have Capistrano installed (version and dependencies located in the [Gemfile](https://swarm.perforce.com/files/workshop/main/swarm/Gemfile)).
- Sync //workshop/main/ on your local machine.
- cd into the swarm directory.
- Run the appropriate `cap` command.

If you see: `failing due to Problem accessing Perforce: []; cannot continue`

- export P4PORT=server.perforce.com:1666
- export P4CLIENT=<your p4 client>

From @tgray:

- `cap -T` lists all the tasks.
- `cap staging deploy` runs "staging" and "deploy".
- `cap production deploy` to push to prod.
- And `cap production deploy:rollback` is your friend.
On `wayfarer-swarm*` make sure that you are in the `www-data` and `team` (for access to the password file) groups, with `sudo` privileges.
## Passwords

For the workshop project we store passwords in two places:

- [P4D HOST]:/p4/common/bin/adminpass (for the SDP)
- [SWARM HOST]:/var/www/swarm/shared/data/password (for Capistrano)

Passwords are evil - get rid of them if you can. ;-)
## External -> internal job replication

We do *one-way replication* of jobs filed against selected projects under the `perforce_software` name to our internal P4D server (p4poke:1666). This is done by the following script:

    //workshop/dev/pvt/jobxfr/jobxfr.py

Now part of `p4util`:

https://swarm.workshop.perforce.com/projects/p4util/files/main/p4util/admin/jobxfr.py

Due to the nature of job replication, the script requires a custom configuration file, and it's archived on our internal server:

    //workshop/dev/pvt/jobxfr/workshop-to-p4prod.conf

https://swarm.perforce.com/files/workshop/dev/pvt/jobxfr/workshop-to-p4prod.conf

All replication state is stored in the key `jobxfr.p4prod`.

Currently the script runs from `lcheung`'s cronjob on `eco-dev.das.perforce.com`. If you plan on nuking the account, make sure you back up the P4TICKET in `/home/lcheung/.p4tickets` (or get IT to reset the password for the user `workshop-jobxfr`).

Currently we are only replicating the following projects:

    project=perforce-software-p4connect project=perforce-software-sdp project=perforce-software-p4api-net

Edit the [crontab](https://swarm.perforce.com/files/workshop/dev/pvt/jobxfr/crontab) to enable replication for more projects.

In case the script fails, look for errors in eco-dev.das.perforce.com:/home/lcheung/jobxfr.log.
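The selection logic above boils down to a set-membership test on the project a job was filed against. A minimal sketch of that idea (this is not the actual jobxfr.py code; the `Project` field name and the job dicts are hypothetical, for illustration only):

```python
# Projects whose jobs get copied to the internal server, mirroring the
# project= filters in the crontab above.
REPLICATED_PROJECTS = {
    "perforce-software-p4connect",
    "perforce-software-sdp",
    "perforce-software-p4api-net",
}

def should_replicate(job):
    # `job` stands in for a tagged job record; the "Project" key is an
    # assumption made for this sketch, not the real jobspec field name.
    return job.get("Project") in REPLICATED_PROJECTS

jobs = [
    {"Job": "job000001", "Project": "perforce-software-sdp"},
    {"Job": "job000002", "Project": "some-other-project"},
]
to_copy = [j["Job"] for j in jobs if should_replicate(j)]
print(to_copy)  # → ['job000001']
```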
## Disk setup on wayfarer-{p4d,swarm}*

Both wayfarer-p4d and wayfarer-p4d-stage are using the Linux Volume Manager (LVM) for all the Perforce storage needs. Both have a volume group (VG) named "sdp" with 3 logical volumes (LVs):

```
LV Name                /dev/sdp/metadata
LV Size                10.00 GiB

LV Name                /dev/sdp/logs
LV Size                5.00 GiB

LV Name                /dev/sdp/depotdata
LV Size                30.00 GiB
```

LVM was used to provide isolation between the filesystems and to minimize downtime during disk upgrades.

To expand a disk online, do the following:

- Email IT to add a new virtual disk to the VM (online - no shutdown of the VM required).
- Confirm the new disk is listed in `/proc/partitions`.
- Partition the disk. For example, if the new disk is /dev/sde, set up a single partition `sde1` and set the partition type to "Linux LVM" (id "8e").
- Initialize the new partition as a physical volume (PV) for LVM:

      pvcreate /dev/sde1

- Add the new PV to the volume group (VG). Say the VG is called `sdp`:

      vgextend sdp /dev/sde1

- Run the following to check exactly how much space is available for your logical volume (LV):

      vgdisplay

  You will see something similar to:

  ```
  --- Volume group ---
  VG Name               sdp
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               31.50 GiB
  PE Size               4.00 MiB
  Total PE              8063
  Alloc PE / Size       8063 / 31.50 GiB
  Free PE / Size        XXXX / YYYY GiB
  VG UUID               uVktxp-6EOO-fvLg-JM7f-tjKN-M4m5-tyuiRm
  ```

- Note the number XXXX (the free PE count) above. Say your LV is /dev/sdp/depotdata (or /dev/mapper/sdp-depotdata); you can grow your LV by that many extents with (note the `+`, which adds to the current size rather than setting an absolute size):

      lvextend -l +XXXX /dev/sdp/depotdata

- Expand the filesystem on the LV (for ext filesystems, which can be grown online while mounted):

      resize2fs /dev/sdp/depotdata
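As a sanity check, the sizes in the `vgdisplay` output above can be cross-checked from the physical extent (PE) numbers: size = extent count × PE size. For example, using the "Total PE" and "PE Size" values shown:

```python
# Cross-check VG size from extent arithmetic (just arithmetic, not an LVM
# command). Values are the "PE Size" and "Total PE" lines from vgdisplay.
PE_SIZE_MIB = 4.00
TOTAL_PE = 8063

vg_size_gib = TOTAL_PE * PE_SIZE_MIB / 1024
print(round(vg_size_gib, 2))  # → 31.5, matching the "VG Size 31.50 GiB" line
```

The same arithmetic tells you how much a `lvextend -l +XXXX` will actually add: XXXX free extents × 4 MiB each.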
### Removing Virtual Disk

In the rare event that you need to remove unused disks from the VM, you need to free up the block device and deregister it from the running kernel.

- If it's a regular filesystem, unmount the filesystem.
- If the disk is part of an LVM volume group (VG), you need to run a few more commands. For example, if your disk is `sdb` and has a single LVM partition `sdb1` which is part of the VG `sdp`, do the following:

  - Move the physical extents (PEs) off the outgoing PV (`pvmove` takes the source PV; LVM relocates the extents to the remaining PVs in the VG):

        pvmove /dev/sdb1

  - Remove the PV from the VG:

        vgreduce sdp /dev/sdb1

- Capture the output of `fdisk -l /dev/sdb` for IT's reference (see below).
- Tell the kernel to forget the whole block device:

      echo 1 > /sys/block/sdb/device/delete

- Notify IT that the block device is ready for removal by forwarding the `fdisk` output above. This may require rebooting the VM depending on how the VM is configured and/or if there are snapshots created against the VM.

## Deleting Users (Spammers) from the workshop

Removing users from the workshop can be tricky because we have a trigger that prevents users from being removed while they are still a member of any group.
If you have [p4util][p4util] you can purge a user with:

```
python -m p4util.adm.deluser <user>
```

[p4util]: https://swarm.workshop.perforce.com/projects/p4util/
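The constraint the trigger enforces can be sketched as a simple membership check (hypothetical data structures and function name - this is not the actual trigger code, just the rule it implements):

```python
# A user may only be deleted once it belongs to no group.
def can_delete(user, groups):
    # `groups` maps group name -> set of member user names.
    return all(user not in members for members in groups.values())

groups = {"p4util-members": {"alice"}, "swarm-admins": {"bob"}}
print(can_delete("spammer123", groups))  # → True: not in any group
print(can_delete("alice", groups))       # → False: still a group member
```

So before purging a user, remove them from every group they belong to (p4util's deluser handles this for you).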
## Pulling in live data from wayfarer-p4d to wayfarer-p4d-stage

Wiki: https://confluence.perforce.com:8443/display/COM/How+to+refresh+data+on+wayfarer-p4d-stage

Script: https://swarm.perforce.com/files/workshop/dev/pvt/scripts/les-refresh-data-from-prod.sh

Run the above script on wayfarer-p4d-stage as root. It does the following:

- Stops the Perforce server.
- Removes old copies of online+offline db.*, journal and checkpoint files.
- Moves existing online+offline db.*, journal and checkpoint files out of the way.
- Grabs the latest checkpoint from `wayfarer-p4d` and restores it.
- Starts up the server.
- Fixes up the protection table to enable access from `wayfarer-swarm-stage` instead of `wayfarer-swarm`.
- Fixes up the offline database in offline_db.
- Syncs depot files over.
- Re-deploys Swarm to stage with `cap staging deploy` - it will re-generate the tickets used by Swarm as part of the deployment.

Note that:

- We do not have a Git Fusion server connected to `wayfarer-p4d-stage` - thus stubs were installed instead.
- Tickets on `*-stage` are different from the ones in production. This is handled by the deployment scripts in Capistrano.

## Changing project ownership

At this point it's mostly a manual process:

- Create the project in the web app with the new owner's name. This may involve resetting the owner's password or needing to obtain the password from the owner.
- Manually add members over.
- p4 duplicate //source/project/files/... //target/project/files/...
- Use the moveFollowers route to move followers over: http://$SWARM_URL/admin/moveFollowers/source-project/target-project
- Delete the original project in the web app (there is a Delete button in the Edit Project page). This may involve modifying the owner list, since they will all receive a confirmation email as a result of the project deletion.
- Obliterate the original files.
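The follower-move step is just a set merge between the two projects. A toy illustration (hypothetical data and function - this is not the Swarm moveFollowers implementation, only the effect it has):

```python
# Followers of the source project are merged into the target project's
# follower set, and the source is emptied.
projects = {
    "source-project": {"followers": {"alice", "bob"}},
    "target-project": {"followers": {"carol"}},
}

def move_followers(projects, src, dst):
    projects[dst]["followers"] |= projects[src]["followers"]
    projects[src]["followers"] = set()

move_followers(projects, "source-project", "target-project")
print(sorted(projects["target-project"]["followers"]))  # → ['alice', 'bob', 'carol']
```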