RAL Tier1 weekly operations Fabric 20101101
Developments
- All:
- Martin:
- Ian:
- Tim:
- Jonathan:
- James A:
- James T:
- Cheney:
- investigating why CASTOR facilities got wiped
- writing DB backup check scripts
- set up test of Amanda backup & recovery for Kevin H
- investigate web intrusions on hinode
- applying quatted OS updates
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems. (R 27)
- gdss380 still with Streamline for fix. (Crashed with single faulty drive)
- gdss417 acceptance testing. (Crashed with single faulty drive)
- gdss280 crashed again with replacement RAID card borrowed from gdss338. (Testing)
- gdss569 finish testing.
- gdss463 replaced RAID card but couldn't fix the problem.
- Hardware failure stats/graphs.
- lcgwms03 replaced drive in sdb. (hotswap)
- gdss408 replaced memory. (Replacement memory added in gdss377)
- Pack Streamline faulty parts for collection.
- Streamline/Areca disk servers crashed due to single faulty drive. (ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney: early warning, likely to be off most of November; dates subject to change
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Tim:
- Cheney:
- more investigation into how CASTOR facilities got wiped
- finish setting up database backup checks
- Jonathan:
- James T:
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continuing decommissioning of old batch systems. (R 27)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney: early warning, likely to be off most of November; dates subject to change
Fabric On-Call
- Kashif Hafeez