RAL T1 weekly ops Fabric 20111031
From GridPP Wiki
Developments
- All:
- Tim:
- James A:
- Cheney:
  - Meeting with Mark van de Sanden.
  - Fix Oracle backups.
  - Edit out unused partition from Oracle backups.
  - Fix ctsd12.
  - Fix castor201.
  - Amanda rollout.
  - Edit firewall rules.
  - Upgrade acsss to 8.0.2.
  - Fix zora firewall rule problem.
  - Remove conflicting Amanda package.
  - Fix Xen backup.
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - Decommissioning old disk servers/batch systems (Viglen 2006 started).
  - gdss296 failed acceptance test.
  - Updating wiki page about how to report faults to vendors.
  - gdss540 reporting abnormal battery temperature after firmware update.
  - gdss456 started 7-day acceptance testing.
  - lcgcts13: firmware updated and reinstalled (fixed); moved back into rack.
  - gdss538 given back to Castor team for preprod.
  - gdss396 failed acceptance test.
  - gdss295 failed acceptance test.
  - Change control for updating firmware on Adaptec controllers on Viglen 2009 disk servers accepted.
  - Updated firmware on d0t1 disk servers in Viglen 2009 generation.
- Martin:
- Ian:
Operational Issues and Incidents
| Index | Description | Start | End | Severity | Affected VO(s) |
|---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB.
| Component | Description | Start | End | Affected VO(s) | Type |
|---|---|---|---|---|---|
Development priorities
- All:
- Tim:
- Cheney:
  - Continue Amanda rollout for ADS shutdown.
  - Confer with DB group on iptables rules.
- James A:
- Kash:
  - Drive replacement.
  - Fixing broken WNs.
  - Hardware failure review and metrics continue.
  - Continue decommissioning old disk servers/batch systems (R 27).
  - Continue labelling racks and systems in the UPS and HPD rooms.
- Martin:
- Ian:
Absences
Fabric On-Call
- Kashif (Monday to Sunday)