RAL Tier1 weekly operations Fabric 20101004
Developments
- All:
- Martin:
- Ian:
- Tim:
- Jonathan:
- James A:
- James T:
- 2.1.9 upgrade work
- Switched to replacement loggers in R89
- Migrated /stage/sl3-lcg-exp from csfnfs58 to gdss142
- Migrated dteamTest from gdss51 to gdss87
- Cheney:
- Quattorising the CASTOR facilities
- Patching
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems. (R 27)
- gdss110 fsprobe errors. (Acceptance testing)
- gdss380 replaced RAID card and restarted acceptance testing. (Crashed with single faulty drive)
- gdss417 started acceptance testing. (Crashed with single faulty drive)
- gdss405 EDAC memory error. Replaced 4 x 2 GB memory.
- gdss280 crashed during acceptance testing. (Probably RAID card)
- srm205 upgraded memory.
- Updated post-mortem for gdss280 & gdss417.
- Hardware failure stats/graphs.
- gdss408 given back to production.
- Installing Streamline 2009 disk server for testing with James T.
- Streamline/Areca disk servers crashed due to single faulty drive. (Ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Tim:
- Cheney:
- Quattorise the CASTOR facilities
- Jonathan:
- James T:
- A/L until 14th October
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Update daily status of Streamline 2009 disk server testing.
- Continue decommissioning old batch systems. (R 27)
Absences
- James T on A/L until 14th October
- Jonathan on partial retirement (not in on Monday and Friday)
Fabric On-Call
- Kashif Hafeez