Glasgow cold-start procedures

From GridPP Wiki

How to bring UKI-ScotGrid-Glasgow out of a power-failure-induced slumber

  • Prerequisites: check that the air conditioning is running and the basement routers are stable
  • Restore power to the underfloor sockets using the reset button on the main switchboard
  • The UPS in the server rack should power up, as should all the APC Masterswitches (but NO machines should be on apart from svr031)
  • Bring up svr031 fully (disabling nagios may be advisable?)
  • Power up disk037
powernode --host=disk037 --on 
  • Power up the rest of the disk servers
powernode --host=disk032-036,disk038-041 --on
  • Bring up the servers
powernode --host=svr016-030 --on # check the actual list of servers first - it keeps expanding
  • Bring up the nodes, staggering the power-on to prevent network flooding
for i in $(seq -w 1 140); do powernode --host=node$i --on ; sleep 20 ; done
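
The staggered node power-on can also be wrapped in a small function so that it can be rehearsed before touching real hardware. This is only a sketch: POWERNODE and STAGGER are hypothetical shell variables introduced here (they are not powernode options), and the only powernode flags used are --host and --on, as in the commands above. Node names are assumed to be zero-padded to three digits, which is what seq -w 1 140 produces.

```shell
# Hypothetical knobs (not powernode options): override them for a dry run.
POWERNODE="${POWERNODE:-powernode}"   # command to run for each node
STAGGER="${STAGGER:-20}"              # seconds to wait between nodes

# Power on node<first>..node<last>, zero-padded to three digits to match
# the names produced by `seq -w 1 140`. Arguments are plain integers.
power_on_range() {
    i="$1"
    while [ "$i" -le "$2" ]; do
        host=$(printf 'node%03d' "$i")
        # Report a failed power-on on stderr and carry on with the next node.
        "$POWERNODE" --host="$host" --on || echo "power-on failed: $host" >&2
        sleep "$STAGGER"
        i=$((i + 1))
    done
}

# Dry run (prints the arguments instead of calling powernode):
#   POWERNODE=echo STAGGER=0 power_on_range 1 3
# Real run:
#   power_on_range 1 140
```

Setting POWERNODE=echo prints the arguments that would be passed, which gives a quick check of the host names before the real run.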


Next, check that the NFS mounts to disk037 are behaving, then check the state of the torque workers (pbsnodes -l lists any nodes marked down, offline or unknown):

svr016 $> pbsnodes -l

Finally, check that the monitoring infrastructure looks normal (ganglia, nagios, emails etc.).
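
The first two checks can be partly scripted. The helpers below are a rough sketch, not part of the site tooling: the function names are made up for illustration, the mount check assumes a Linux host with /proc/mounts, and it only confirms that a mount from the server is listed, not that it is responding.

```shell
# Succeed if the mounts table lists an NFS mount served by the given host.
# The second argument (default /proc/mounts) is only there to allow testing.
nfs_mounted_from() {
    awk -v s="$1" '
        $3 ~ /^nfs/ && $1 ~ ("^" s "[.:]") { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Count non-empty lines on stdin. `pbsnodes -l` prints one line per node
# that is down, offline or unknown, so 0 means no worker is flagged.
count_listed_nodes() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Example use, on a machine that mounts from disk037 and on svr016 respectively:
#   nfs_mounted_from disk037 || echo "no NFS mount from disk037" >&2
#   pbsnodes -l | count_listed_nodes
```

A non-zero count from the second helper is the cue to look at the full pbsnodes -l output and see which workers have not come back.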