RAL T1 weekly ops Fabric 20110516
Developments
- All:
- Tim:
- James A:
- Restructured disk server templates to remove duplication.
- Created new Storage-D machine type for facilities instance.
- Thinking about metrics and statistics.
- Cheney:
- Change IP addresses of facilities arrays.
- Investigate Storage-D lockup.
- Certificates problem.
- Fix some backups.
- Apply errata to some facilities kit.
- Documentation for DMF DR.
- Server stats.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- quattor02 showing same errors again.
- Firmware update on all Viglen 2007 disk servers (ongoing).
- Firmware update on Jetstor systems (ongoing); three updated so far.
- gdss502 passed acceptance test. (Given back to Castor team)
- Add SL09, V09 and SL10 to Adaptec Storage Manager for monitoring.
- SL08 testing: 3 disk servers with multiple drive failures. Review with MJB.
- gdss294: read-only file system.
- gdss293 passed memory test.
- Dell system from Castor Rack H reported (iDRAC failure).
- gdss206 replaced drives and rebuild completed. (Back into production)
- Fabric hardware failure analysis meeting with Andrew and James.
- Martin:
- Ian:
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Tim:
- Cheney:
- Backups for SCT
- James A:
- Assisting with finalisation of Facilities instance configs.
- Developing Job Plan.
- Computer room tours.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continue hardware failure metrics.
- Continue SL08 testing.
- Continue decommissioning old batch systems (R 27).
- Continue labelling racks and systems in UPS and HPD room.
- Martin:
- Ian:
Absences
Fabric On-Call
- Monday - Sunday