RAL T1 weekly ops Fabric 20110725
From GridPP Wiki
Developments
- All:
- Tim:
- T10KC drive issues
- T10KC tape server work
  - Castor rack reconciliation
  - Work experience student
- SGI Test kit install
- Atlas tape pool consolidation
- CMS tape pool consolidation
- James A:
- Added support for Nortel/Avaya switches to Observium.
- Gave a tour to work experience students from ISIS.
- Added checks to disk server deployment scripts to avoid multiple template snafus.
- Changed IPs of Streamline 2008 WNs.
- Cheney:
- Backups - stability and performance improvements
- DMF - testing of integrity and performance
  - Looked through DNS for servers to back up
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old disk servers/batch systems.
- Appointment with Physio.
- gdss335 went down. (Kernel panic)
  - 15 SL08 disk servers (with no errors) ready for deployment.
- EMC PSU failure. (Report)
  - gdss208: re-created RAID array.
- gdss193 back into production.
- Replaced 10 drives in Viglen 2009 disk servers. (SMART errors)
- Configured network ports and re-installed SL10 disk servers to fix network.
  - Added SL10 disk servers to Adaptec Storage Manager for monitoring.
- gdss96 Kernel panic. (Started memory test)
- Martin:
- Ian:
- On Leave
Operational Issues and Incidents
| Index | Description | Start | End | Severity | Affected VO(s) |
|---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
| Component | Description | Start | End | Affected VO(s) | Type |
|---|---|---|---|---|---|
Development priorities
- All:
- Tim:
- Install T10KC tape servers
- Further T10KC tape drive problem investigation
- SGI test kit testing
- ADS shutdown work
- Amanda install time-line planning
- Cheney:
- Backups
- DMF testing
- James A:
- Change IPs of Viglen 2008 WNs.
  - Feed patches back to Observium.
- Request and installation of certificates on all 2010 Storage Nodes.
- Fixing and repackaging acceptance tests for distribution.
- Kash:
- Drive replacement.
- Fixing broken WNs.
  - Continue hardware failure review and metrics.
  - Continue decommissioning old disk servers/batch systems (R 27).
  - Continue labelling racks and systems in UPS and HPD room.
- Martin:
- Ian:
- Helping Aslan (Nuffield Bursary student) get started
- Install additional hypervisor
  - Start evaluation of additional EqualLogic array
Absences
- Ian on leave Wednesday PM, Thursday and Friday
- Tim on leave Thursday and Friday
Fabric On-Call
- Ian Monday-Tuesday; Kash Wednesday-Sunday