RAL Tier1 weekly operations Fabric 20101011
Revision as of 14:17, 25 October 2010 by Kashif Hafeez
Developments
- All:
- Martin:
- Ian:
- Tim:
- Jonathan:
- James A:
- James T:
- Cheney:
  - Quattorising the facilities.
  - Fix castor151.
  - Copy over database archive logs.
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - Decommissioning old batch systems (R 27).
  - gdss110: fsprobe errors (acceptance testing).
  - gdss380 failed acceptance testing with the new RAID card as well (crashed with a single faulty drive).
  - gdss417 started acceptance testing (crashed with a single faulty drive).
  - Changed network settings in the BIOS of the Streamline 2009 disk servers.
  - gdss280 crashed during acceptance testing (probably the RAID card).
  - Arranged a visit from Streamline engineers for gdss490. Received and back in the rack.
  - Updated post-mortem for gdss280 and gdss417.
  - Hardware failure stats/graphs.
  - Fixed a couple of new Dell machines.
  - gdss512 received back from LSI (USA).
  - Streamline 2009 disk server testing in the absence of James T.
  - Streamline/Areca disk servers crashed due to a single faulty drive (ongoing).
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Ian:
- Tim:
- Cheney:
  - Quattorise the facilities.
- Jonathan:
- James T:
  - A/L until 14th October.
- James A:
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - Update daily status of Streamline 2009 disk server testing.
  - Continue decommissioning old batch systems (R 27).
Absences
- James T on A/L until 14th October
- Jonathan on partial retirement (not in on Monday and Friday)
Fabric On-Call
- Kashif Hafeez