Glasgow cold-start procedures
How to bring UKI-ScotGrid-Glasgow out of a power-failure-induced slumber
- Prerequisites: check that the aircon is working and that the basement routers are stable
- Bring back power to underfloor sockets via reset button on main switchboard
- The UPS in the server rack should power up, as should all the APC Masterswitches (but NO machines should be on apart from svr031)
- Bring up svr031 fully (disabling nagios may be advisable?)
- Power up disk037
powernode --host=disk037 --on
- Power up the rest of the disk servers
powernode --host=disk032-036,disk038-041 --on
- Bring up the servers
powernode --host=svr016-030 --on # hey, check the actual list of servers - it keeps expanding....
- Bring up the worker nodes, staggering the power-on to prevent network flooding
for i in $(seq -w 1 140); do powernode --host=node$i --on ; sleep 20 ; done
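The staggered loop above can be rehearsed before it is run for real. The following is only a sketch: the function name power_on_nodes and the DRY_RUN and STAGGER variables are hypothetical and not part of the site tooling, and it assumes the workers are named node001 to node140, as `seq -w 1 140` implies. By default it prints the powernode commands instead of executing them.

```shell
#!/bin/sh
# Sketch of a staggered power-on with a dry-run default.
# Hypothetical names (not from the original page): power_on_nodes, DRY_RUN, STAGGER.
# Assumes worker nodes are named node001..node140, as `seq -w 1 140` produces.

power_on_nodes() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        host=$(printf 'node%03d' "$i")
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: print the command instead of running it.
            echo "powernode --host=$host --on"
        else
            powernode --host="$host" --on
            # Stagger the boots to avoid flooding the network.
            sleep "${STAGGER:-20}"
        fi
        i=$((i + 1))
    done
}
```

Calling `power_on_nodes 1 5` prints the five commands it would run; `DRY_RUN=0 power_on_nodes 1 140` would actually power the nodes on, 20 seconds apart unless STAGGER is set to something else.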
Next you'll need to check that:
- the NFS mounts to disk037 are behaving
- the torque workers are in a sane state (this lists any nodes that are down, offline or unknown)
svr016 $> pbsnodes -l
- the monitoring infrastructure looks normal (ganglia, nagios, emails etc.)
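Two of these checks lend themselves to small helpers. This is a rough sketch rather than anything installed on the cluster: summarise_node_states, check_mount and the /mnt/disk037 path used in the usage note are hypothetical names, and it assumes `pbsnodes -l` prints one "node state" pair per line and that the coreutils `timeout` command is available.

```shell
#!/bin/sh
# Sketch of post-boot checks. Hypothetical names (not from the original page):
# summarise_node_states and check_mount.

# Count nodes per state from `pbsnodes -l` style input ("<node> <state>" per line).
summarise_node_states() {
    awk 'NF >= 2 { count[$2]++ } END { for (s in count) print s, count[s] }' | sort
}

# Return 0 if the directory answers within 10 seconds, non-zero otherwise.
check_mount() {
    timeout 10 ls -d "$1" >/dev/null 2>&1
}
```

On svr016, `pbsnodes -l | summarise_node_states` would give a per-state count of the unhealthy workers. On a machine that mounts from disk037, `check_mount /mnt/disk037 || echo "mount not responding"` flags a missing or slow mount, with the real mount point substituted for the placeholder path. A process stuck on a hard NFS mount may not respond to the timeout, so treat a hang here as a failure too.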