RAL Tier1 weekly operations castor 24/12/2012
From GridPP Wiki
Preparation for Christmas
- Make sure there are no disk servers in draining status (DONE)
- Make sure all outstanding tickets regarding checksum problems are closed
- Verify that vcert is working for Jens' Christmas CIP development/tests
Operations News
- RAID controller firmware was upgraded on all V08 disk servers to prevent further crashes and transitory data loss on these servers.
Operations Problems
- (Tue) A 4-hour network outage caused by a failed board on Router A made all Tier 1 services inaccessible.
- Problem with PreProd DB (Fortuna) filling up the opt partition
- Quattor bug means that root passwords are unusable on all SL5 systems. This has been fixed for all production disk servers as an emergency change this week, in preparation for the Christmas holiday. It has yet to be fixed on other systems.
Blocking Issues
none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB: none
Advanced Planning
Tasks
- Simplify and document Quattor templates to make them easier to maintain
- Test and certify 2.1.13-5 with simplified Quattor templates
Interventions
- Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS,CUPV,VDQM) from 2.1.11 to 2.1.13
Staffing
- Castor on Call person: Matthew
- Staff absence/out of the office: Christmas Holiday week