RAL Tier1 weekly operations castor 25/06/2012
From GridPP Wiki
Contents
Operations News
- none
Operations Problems
- Latest tested kernel + errata (kernel-2.6.18-308.4.1.el5 + 20120514) caused daemon crashes on ATLAS SRM lcgsrm03 so we have rolled back to the previous setup (kernel-2.6.18-274.12.1.el5 + 20120305) and will not be updating the SRMs this Wednesday
- Intermittent failures in CMS VO tests over the weekend
Blocking Issues
none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Lead by |
---|---|---|---|---|---|
ORACLE 11g upgrade plus server reboots and errata updates (STC) | 27/06/12 08:45 | 27/06/12 14:30 | Downtime | All | Rich |
Advanced Planning
Tasks
- Test and certify 2.1.12-4 (Matthew, Chris)
- Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
- Selection of disk-only prototype solution (Shaun, Rob, Brian, James)
Interventions
- Upgrade repack to 2.1.12-4 (Jun)
- Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jul)
- Upgrade to 2.1.12 on Tier1 instances once we are happy with TM and TG in performance (Jul)
Staffing
- Castor on Call person: Chris
- Staff absence/out of the office:
- (Mon) ..