RAL Tier1 weekly operations castor 05/03/2012
From GridPP Wiki
- Gen upgraded to 2.1.11-8. The only SL4 nodes left are now the older tape servers.
- (Thu) SRMs given an emergency upgrade to 2.11-1 to work around FED crashing. All SL4 RPMs on lcgsrm03 upgraded to SL5 to try to fix the underlying problem.
- SRM crashes continued until Thursday, when we upgraded the SRMs, which fixed the crashing.
- Newly deployed gdss535 (lhcb) was found to have an incorrect routing table; it was removed from production and reinstalled.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
|Description||Start||End||Type||Affected VO(s)||Lead by|
|Move to new DB hardware with DataGuard||6 Mar 10:00||6 Mar 14:00||Downtime||All||Richard|
|CIP 2.2.0 upgrade (STC)||TBD||TBD||At-risk||All||Matthew|
- Test and re-apply CIP upgrade
- Move Tier1 instances to the new database infrastructure, with a DataGuard backup instance in R26.
- Stress testing of *11 generation disk servers in preprod during March
- Switch from LSF to Transfer Manager after 2.1.11 upgrade. Will need to better stress-test TM on preprod with more disk servers.
- Start using Tape Gateway once CERN have been using it in production for approx. 2 months.
- Castor on Call person: Chris
- Staff absence/out of the office:
- Shaun at EUDAT (Wed-Fri)
- Rob on training all week