RAL Tier1 weekly operations Fabric 20110214
From GridPP Wiki
Revision as of 16:37, 21 February 2011 by Kashif hafeez
Developments
- All:
- Martin:
- Ian:
- Tim:
- James A:
- James T:
- Correct installation of V10 machines with Quattor
- WAN tuning on cmsWanIn and cmsWanOut
- SL08 testing
- Investigating blocking processes
- Cheney:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss380: add new MAC address to DHCP and re-install.
- Change control for Adaptec RAID cards (SL09 & SL10).
- gdss496: start acceptance test (intervention).
- High battery temperature messages on V10 and SL10 disk servers.
- Fabric hardware failure metrics.
- Update firmware on Jetstor systems.
- gdss502: drive failures and failed stripes (verify fix started).
- gdss510: faulty motherboard, reported to Streamline.
- gdss66: returned to the CASTOR team.
- gdss280: passed acceptance test and returned to production.
- SL 2009: automatic rebuild onto hot spare fails. Set rebuild priority from Low to High.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for the week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Ian:
- Tim:
- Cheney:
- James T:
- Gen upgrade to SL5 64-bit
- Apply new WAN tuning to all CMS disk servers
- Disk servers as iSCSI targets
- SL08 testing
- Re-install all V10/SL10 machines
- A/L Thursday
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- SL 2009: automatic rebuild onto hot spare fails. Set rebuild priority from Low to High.
- Hardware failure metrics continue.
- SL08 testing.
- Continue decommissioning old batch systems (R 27).
Absences
- James T A/L Thursday
Fabric On-Call
- Monday - Sunday