RAL Tier1 weekly operations Fabric 20091026
From GridPP Wiki
Summary of week gone
Developments
- All:
- Martin:
- Prep for HEPiX
- Ian:
- Prep for HEPiX
- Quattor development
- James T:
- A/L
- Jonathan:
- completed update of SSH keys for root across farm
- configured and started atlasbackup on nfs1
- Nagios configuration updates
- 3 days leave
- James A:
- Working with manufacturers and suppliers towards a solution for the problems with half of the 2008 storage purchase.
- Looked after the batch and storage farms.
- Worked on SINDES in any spare time.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- gdss297: replaced 4x2GB memory; fixed and back in production.
- gdss126: double disk failure. Completed verifying array.
- gdss207 fixed and ready for deployment.
- gdss120 fixed and given back to castor.
- Working on 2008 disk servers and worker nodes.
- Working on gdss67, 86, 126, 140, 143 and 383.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
 | EMC arrays serving 3D/LFC/FTS databases made unstable by attempts to stabilise the Castor EMC arrays | Tuesday 6 Oct, am | not in sight | Catastrophic | All |
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- @ HEPiX
- Ian:
- @ HEPiX
- James T:
- on Leave
- Jonathan:
- Migrate Tier1 home filesystem to nfs1 (/home/tier1)
- Configure Nagios slave in Quattor
- Nagios configuration updates
- James A:
- Continue working with manufacturers and suppliers towards a solution for the problems with half of the 2008 storage purchase.
- Look after the batch and storage farms.
- Work on SINDES in any spare time.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continue working on 2008 disk servers and worker nodes.
- Continue working on gdss67, 86, 126, 140, 143 and 383.
Absences
- James T
- James T on A/L from Thursday 15th October until Monday 2nd November.
Fabric On-Call
- Mon-Fri:
Advanced Warning of Requirements and Blocking issues
Services Issues
- Various requests for hardware.