RAL Tier1 weekly operations castor 24/12/2012
From GridPP Wiki
Preparation for Christmas
- Make sure there are no disk servers in draining status (DONE)
- Make sure all outstanding tickets regarding checksum problems are closed
- Verify that vcert is working for Jens' Christmas CIP development/tests
- RAID controller firmware was upgraded on all V08 disk servers to prevent further crashes and transient data loss on these disk servers.
- (Tue) A 4-hour network outage caused by a failed board on Router A made all Tier 1 services inaccessible.
- Problem with PreProd DB (Fortuna) filling up the opt partition
- A Quattor bug means that root passwords are unusable on all SL5 systems. This was fixed on all production disk servers as an emergency change this week, in preparation for the Christmas holiday. It has yet to be fixed on other systems.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB: none
- Simplify and document Quattor templates to make them easier to maintain
- Test and certify 2.1.13-5 with simplified Quattor templates
- Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS,CUPV,VDQM) from 2.1.11 to 2.1.13
- Castor on Call person
- Staff absence/out of the office:
- Christmas Holiday week