RAL T1 weekly ops Fabric 20111003
From GridPP Wiki
Developments
- All:
- Tim:
- James A:
- Cheney:
- Upgrade ACSLS and Solaris.
- Add new secret tape servers.
- Massage backup reporting.
- Improve Solaris backups.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old disk servers/batch systems (started with the Viglen 2006 systems).
- 5 SL08 disk servers partitioned and installed for re-deployment.
- AFS2 drive failure (reported).
- Firmware update completed on Viglen 2007 AMD disk servers.
- gdss542: support.zip (log bundle) sent to Adaptec.
- Replaced 10 drives in Viglen 2009 disk servers (SMART errors).
- gdss581: logs sent to vendors.
- gdss456: read-only file system; double disk failure.
- gdss295: fsprobe errors; memory test started.
- gdss403: kernel panic.
- Martin:
- Ian:
Operational Issues and Incidents
| Index | Description | Start | End | Severity | Affected VO(s) |
|---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type: Down, At Risk, or Cancelled entries that are in, or planned to go into, GOCDB.
| Component | Description | Start | End | Affected VO(s) | Type |
|---|---|---|---|---|---|
Development priorities
- All:
- Tim:
- Cheney:
- Finish off the ACSLS and Solaris upgrade.
- Get tsbn stats working again.
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Hardware failure review and metrics continue.
- Continue decommissioning old disk servers/batch systems (R 27).
- Continue labelling racks and systems in the UPS and HPD rooms.
- Martin:
- Ian:
Absences