RAL Tier1 weekly operations castor 27/05/2013
From GridPP Wiki
Revision as of 13:14, 28 May 2013 by Matt viljoen (Talk | contribs)
Operations News
- None
Operations Problems
- (Tue) After the morning networking downtime, an Oracle problem appeared on the newly upgraded 2.1.13 Facilities CASTOR, relating to "in-doubt distributed transactions". About 10% of transactions are failing as a result.
- (Tue) A problem on the ATLAS SRM database caused all four SRMs to die at the same time. Impact was minimal, as they quickly re-established their connections.
Blocking Issues
- Puppet cannot be upgraded until someone spends time learning to administer it (to replace Chris); this may delay an SL6 upgrade.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
- None
Advanced Planning
Tasks
- None
Interventions
- Upgrade central services (NS, CUPV, VDQM) from 2.1.11-9 to 2.1.13-9
- Upgrade stagers from 2.1.12 to 2.1.13
Staffing
- CASTOR on-call person
  - Rob
- Staff absence/out of the office:
  - (Wed) Matthew and Shaun at CERN