RAL Tier1 weekly operations Fabric 20101115
From GridPP Wiki
Latest revision as of 16:29, 22 November 2010
Developments
- All:
- Martin:
- Ian:
- Tim:
- Jonathan:
- James A:
- James T:
- Cheney:
  - Blown drives replaced.
  - Make script to restore from tape for db group.
  - Script for db group backup checks.
  - Write Nagios checks for castor fac.
  - Fix castor303.
  - Cut over sls to jiscmail server only.
  - Backups.
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - Decommissioning old batch systems. (R 27)
  - gdss380 still with Streamline for fix. (Crashed with single faulty drive.)
  - gdss417 acceptance testing. (Crashed with single faulty drive.)
  - gdss280 crashed again with replacement RAID card borrowed from gdss338. (Testing.)
  - Annual Hearing review.
  - gdss117 failed during test.
  - Hardware failure stats/graphs.
  - Meeting with Gareth from Streamline about SL08 issues.
  - Streamline/Areca disk servers crashed due to single faulty drive. (Ongoing.)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Tim:
- Cheney:
  - Backups of various sorts.
- Jonathan:
- James T:
- James A:
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - Continued decommissioning of old batch systems. (R 27)
Absences
- Jonathan: on partial retirement (not in on Mondays and Fridays).
- Cheney: date for being off has changed, now Nov 24th. Early warning: likely to be off for most of December; date subject to change.
Fabric On-Call
- Kashif Hafeez