RAL Tier1 weekly operations castor 27/02/2012
From GridPP Wiki
- ATLAS CASTOR instance upgraded to 2.1.11-8
- Puppet upgraded to 2.7.11-1
- 'go-faster stripes' enabled on all 'B' and 'C' tape drives
- Preprod is now configured with the lcgc*03 headnodes (destined for Gen), plus a preprod name server (NS) for ALICE xrootd testing.
- The preprod SRMs are now configured with updated RPMs and are ready for testing. It is hoped that the update will reduce the periodic SRM crashes.
- The periodic ATLAS SRM crashes are continuing. The automatic restarter did not kick in on Thursday, so the SRM was briefly blacklisted.
- Running cleanLostFiles against 5 disk servers caused a stager slowdown on Thursday evening. From now on, we will run no more than 3 cleanLostFiles threads at a time, and none out of hours.
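The new cleanLostFiles rule can be expressed as a simple gate that a wrapper script might check before starting another thread. This is a minimal Python sketch, not an existing CASTOR tool. The function and constant names are hypothetical, it does not invoke cleanLostFiles itself, and the definition of "out of hours" (anything outside Monday to Friday, 09:00 to 17:00) is an assumption. Only the cap of 3 threads comes from the note above.

```python
from datetime import datetime, time

# Cap taken from the policy above; everything else here is hypothetical.
MAX_CLEANLOSTFILES_THREADS = 3
# Assumed definition of "in hours" (not stated in the report): Mon-Fri, 09:00-17:00.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def in_working_hours(now: datetime) -> bool:
    """Return True if `now` is a weekday (Mon=0 .. Fri=4) inside the assumed window."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def may_start_cleanlostfiles(active_threads: int, now: datetime) -> bool:
    """Hypothetical gate: allow one more cleanLostFiles thread only during
    working hours and only while fewer than the cap are already running."""
    return in_working_hours(now) and active_threads < MAX_CLEANLOSTFILES_THREADS


if __name__ == "__main__":
    # Monday morning with 2 threads running: a third may start.
    print(may_start_cleanlostfiles(2, datetime(2012, 2, 27, 10, 0)))
    # Thursday evening: refused regardless of thread count.
    print(may_start_cleanlostfiles(0, datetime(2012, 2, 23, 19, 0)))
```

Keeping the check as a pure function of the thread count and the clock makes the policy easy to test on its own; how the running threads are counted is left to whatever wrapper would call it.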
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
| Description | Start | End | Type | Affected VO(s) | Led by |
|---|---|---|---|---|---|
| CASTOR 2.1.11-8 LHCb stager upgrade, incl. move to new hardware + SL5 + Quattor | 27/02/2012 08:00 | 27/02/2012 16:00 | Downtime | LHCb | Matthew |
| CASTOR 2.1.11-8 Gen stager upgrade, incl. move to new hardware + SL5 + Quattor | 29/02/2012 08:00 | 29/02/2012 16:00 | Downtime | Gen | Matthew |
| CIP 2.2.0 upgrade (STC) | TBD | TBD | At-risk | All | Matthew |
- Test and re-apply CIP upgrade
- Move the Tier1 instances to the new database infrastructure, which has a Data Guard backup instance in R26.
- Stress testing of *11 generation disk servers in preprod during March
- Switch from LSF to Transfer Manager (TM) after the 2.1.11 upgrade. TM will first need more thorough stress-testing on preprod with more disk servers.
- Start using Tape Gateway once CERN have been using it in production for approx. 2 months.
- CASTOR on-call person: MV
- Staff absence/out of the office: