RAL Tier1 Machine Room Walkthrough

From GridPP Wiki

Latest revision as of 10:41, 26 February 2008

This page is an aid for people carrying out a physical check of the machines at the Tier1 who do not normally do it. It is currently a draft and may need rewording or a different layout to make sense to everyone.

N.B. Ear defenders are mandatory in A5 lower.

General

Keep access clear. Look for trip hazards, waste and lifted floor tiles. Report any obstructions to Ops and remove them if safe to do so.

A1 upper

  • Look for red lights on disk servers. Ignore lights on the arrays with missing drives; these are the spares.
  • Look at the rear of the worker nodes. Note any lights that are different when compared with the other machines in the rack.
  • Check for red lights on enigma (in the bottom of the lone "dev" rack, near the console).
  • Check machines (and associated arrays) on the shelving: touch, goc01, goc02, wallace (AFS), sting (helpdesk), shelob (cvs, www)
  • Check switches are on in the comms rack in the far corner.
  • Check Manchester tower servers (csfnfs31,32,44,45) for any lights and beeps.
  • Check csfnfs02 for beeps or red lights.

A5 upper

  • Pop into Ops and find out if there have been any issues overnight, especially regarding tapes (CASTOR, ADS).

A5 lower

Ear defenders are required in A5 lower. Tier1 racks are labelled with the Tier1 logo.

  • Walk around all the disk servers (Viglen, Clustervision, Compusys).
    • Look for red lights and listen for beeps (there is no need to remove your ear defenders; you will still hear them).
    • Look for solid blue lights on Viglen disk servers.
    • Magenta lights on Clustervision disk servers mean that they are rebuilding.
    • If a red attention light is found on the Compusys arrays, check the message on the LCD. If the message is "Controller BBU is charging", follow this procedure:
      • Press ESC to acknowledge the message. The display should now read "Force controller write-through".
      • Press ESC again. The message should now be "Ctlr Def write policy restore".
      • Press ESC a third time. The red light should disappear.
        If the above procedure does not work or the message is different, do not acknowledge the alarm. Contact the fabric team.
  • Check the lights on the disks of the Compusys arrays:
    • Blue = activity
    • Green = OK
    • Red = disk fault. This should be accompanied by a red light on the array controller and a fault message on the LCD.
  • Check the back of the Tier1 worker nodes for differing lights and beeps.
  • Check the service nodes in the Compusys racks.
  • Glance at the LCDs on the air conditioning units. Inform Ops of any errors, although they should already know.
  • Check that the A5 lower exterior access doors are shut.
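
The Compusys checks above can be summarised as a small decision aid. The following Python sketch is illustrative only and is not an existing Tier1 tool: the names `BBU_SEQUENCE`, `DISK_LIGHTS`, `next_step` and `disk_light_meaning` are hypothetical, and it assumes the LCD shows the three messages worded as on this page. It encodes the rule that ESC is pressed only while the display follows the expected sequence, and that anything else is passed to the fabric team.

```python
# Expected LCD messages, in order, as described in this walkthrough.
# Assumption: the LCD wording matches the text on this page.
BBU_SEQUENCE = [
    "Controller BBU is charging",
    "Force controller write-through",
    "Ctlr Def write policy restore",
]

# Meaning of the disk lights on the Compusys arrays, as listed above.
DISK_LIGHTS = {
    "blue": "activity",
    "green": "OK",
    "red": "disk fault",
}

PRESS_ESC = "press ESC"
DONE = "done"
ESCALATE = "stop; contact the fabric team"


def next_step(esc_presses, lcd_message, red_light_on=True):
    """Suggest the next action for a red attention light on a Compusys array.

    esc_presses  -- how many times ESC has been pressed so far (0, 1, 2 or 3)
    lcd_message  -- the text currently shown on the LCD
    red_light_on -- whether the red attention light is still lit
    """
    if esc_presses < len(BBU_SEQUENCE):
        expected = BBU_SEQUENCE[esc_presses]
        # Compare case-insensitively so minor capitalisation differences pass.
        if lcd_message.strip().lower() == expected.lower():
            return PRESS_ESC
        # A different message must not be acknowledged.
        return ESCALATE
    # All three presses made: the red light should now be off.
    return ESCALATE if red_light_on else DONE


def disk_light_meaning(colour):
    """Return the documented meaning of a disk light, or None if not listed."""
    return DISK_LIGHTS.get(colour.strip().lower())
```

For example, `next_step(0, "Controller BBU is charging")` suggests pressing ESC, while any other initial message returns the escalation advice without acknowledging the alarm.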