RAL Tier1 weekly operations Fabric 20101220
From GridPP Wiki
Developments
- All:
- Martin:
- Ian:
- Tim:
- James A:
- James T:
- Cheney:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss380: still with Streamline for a fix (crashed with a single faulty drive).
- gdss417: acceptance testing (crashed with a single faulty drive).
- gdss280: replaced 16-port RAID card; configured and installed with Quattor.
- gdss117: replaced RAID card and 3 drives; configured and installed with Quattor.
- Job plan review.
- Hardware failure metrics.
- Streamline/Areca disk servers crashed due to a single faulty drive (ongoing).
- gdss364: replaced 16-port RAID card (back in production).
- gdss135: given back to the Castor team.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of Type = Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Tim:
- Cheney:
- James T:
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Job plan review update.
- Update the wiki on hardware spares for the Christmas period.
- Continue hardware failure metrics.
- Continue decommissioning old batch systems (R 27).