RAL Tier1 weekly operations Fabric 20091026

Summary of week gone

Developments

  • All:
  • Martin:
    • Prep for HEPiX
  • Ian:
    • Prep for HEPiX
    • Quattor development
  • James T:
    • A/L
  • Jonathan:
    • Completed update of SSH keys for root across the farm
    • Configured and started atlasbackup on nfs1
    • Nagios configuration updates
    • 3 days leave
  • James A:
    • Working with manufacturers and suppliers towards a solution for the problems with half of the 2008 storage purchase.
    • Looked after the batch and storage farms.
    • Worked on SINDES in any spare time.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • gdss297: replaced 4x2GB memory; fixed and back in production.
    • gdss126: double disk failure. Completed verifying array.
    • gdss207 fixed and ready for deployment.
    • gdss120 fixed and given back to castor.
    • Working on 2008 disk servers and worker nodes.
    • Working on gdss67, 86, 126, 140, 143 and 383.

Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)
- | EMC arrays serving 3D/LFC/FTS databases made unstable by attempts to stabilise the Castor EMC arrays | Tuesday 6 Oct am | not in sight | Catastrophic | All

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All
  • Martin:
    • At HEPiX
  • Ian:
    • At HEPiX
  • James T:
    • On leave
  • Jonathan:
    • Migrate Tier1 home filesystem to nfs1 (/home/tier1)
    • Configure Nagios slave in Quattor
    • Nagios configuration updates
  • James A:
    • Continue working with manufacturers and suppliers towards a solution for the problems with half of the 2008 storage purchase.
    • Look after the batch and storage farms.
    • Work on SINDES in any spare time.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Continue working on 2008 disk servers and worker nodes.
    • Continue working on gdss67, 86, 126, 140, 143 and 383.

Absences

  • James T
    • On A/L from Thursday 15th October until Monday 2nd November.


Fabric On-Call

  • Mon-Fri:

Advanced Warning of Requirements and Blocking issues

Services Issues

  • Various requests for hardware.
