RAL Tier1 weekly operations Fabric 20110131
From GridPP Wiki
Developments
- All:
- Martin:
- Ian:
- Tim:
- James A:
- James T:
  - Strategy meeting
  - Preparation for CMS SL5 upgrade
  - Disk sweep in Kash's absence
  - Security log searches
  - Project management course
- Cheney:
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - Decommissioning old batch systems (R 27).
  - gdss380: add new MAC address in DHCP and re-install.
  - gdss189: read-only filesystem (SCSI errors).
  - gdss496: SCSI errors. Reported to Streamline with logs (intervention).
  - Tier1 strategy meeting.
  - Fabric hardware failure metrics.
  - JetStor systems: more drive failures.
  - lcgbdii0652 moved into UPS room with Richard.
  - gdss337: replaced 4 x 2 GB memory. Back into production.
  - gdss98 given back to Castor team.
  - gdss280: started acceptance test. Replacement disk server for gdss283.
  - Clear test area.
  - gdss435: replaced 4 x 2 GB memory. Back into production.
  - lcgec01: replaced drive using hot-swap method.
  - SL 2010 and Viglen 2010 disk servers in testing.
  - SL 2009: auto rebuild on hot spare fails. Set rebuild priority from Low to High.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB.
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Ian:
- Tim:
- Cheney:
- James T:
  - CMS SL5 upgrade
  - SL08 investigations
  - Puppet -> Quattor work
- James A:
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - SL 2009: auto rebuild on hot spare fails. Set rebuild priority from Low to High.
  - Continue hardware failure metrics.
  - SL08 testing.
  - Continue decommissioning old batch systems (R 27).
Absences
Fabric On-Call
- Monday - Sunday