RAL Tier1 weekly operations castor 27/02/2012
From GridPP Wiki
Revision as of 09:51, 24 February 2012 by Matt viljoen
Operations News
- ATLAS upgraded to 2.1.11-8
- Puppet upgraded to 2.7.11-1
- 'go-faster stripes' enabled on all 'B' and 'C' tape drives
- preprod now configured with lcgc*03 headnodes (destined for Gen) + preprod NS for Alice xrootd testing
- preprod SRMs now configured with updated RPMs and ready for testing. It is hoped that the updated RPMs will reduce the periodic SRM crashing.
Operations Problems
- ATLAS SRM periodic crashing is continuing. The restarter didn't kick in on Thursday, which led to a short period of being blacklisted.
- cleanLostFiles running against 5 disk servers caused stager slowdown on Thursday evening. From now on we will run no more than 3 cleanLostFiles threads and none out of hours.
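The new cleanLostFiles limit could be enforced with a small guard like the sketch below. This is only an illustration, not the script used at RAL: the function names are hypothetical, and the working-hours window (Monday to Friday, 09:00 to 17:00) is an assumption, since the report does not define "out of hours".

```python
from datetime import datetime, time

MAX_CONCURRENT = 3        # limit stated in the report
WORK_START = time(9, 0)   # assumed start of working hours
WORK_END = time(17, 0)    # assumed end of working hours


def in_working_hours(now: datetime) -> bool:
    """Return True on Monday-Friday between WORK_START and WORK_END."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def allowed_new_runs(now: datetime, running: int) -> int:
    """Return how many additional cleanLostFiles runs may start at `now`."""
    if not in_working_hours(now):
        return 0  # none out of hours
    return max(0, MAX_CONCURRENT - running)
```

For example, with one run already active on a weekday afternoon, `allowed_new_runs` would permit two more; in the evening or at the weekend it would permit none.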
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Led by |
---|---|---|---|---|---|
CASTOR 2.1.11-8 LHCb Stager upgrade, inc. move to new hardware+SL5+Quattor | 27/02/2012 08:00 | 27/02/2012 16:00 | Downtime | LHCb | Matthew |
CASTOR 2.1.11-8 Gen Stager upgrade, inc. move to new hardware+SL5+Quattor | 29/02/2012 08:00 | 29/02/2012 16:00 | Downtime | Gen | Matthew |
CIP 2.2.0 upgrade (STC) | TBD | TBD | At-risk | All | Matthew |
Advanced Planning
- Test and re-apply CIP upgrade
- Move Tier1 instances to new Database infrastructure with a Dataguard backup instance in R26.
- Stress testing of *11 generation disk servers in preprod during March
- Switch from LSF to Transfer Manager (TM) after 2.1.11 upgrade. TM will first need more thorough stress-testing on preprod with more disk servers.
- Start using Tape Gateway once CERN have been using it in production for approx. 2 months.
Staffing
- Castor on Call person: MV
- Staff absence/out of the office:
- ..