RAL Tier1 weekly operations castor 27/02/2012

Operations News

ATLAS upgraded to 2.1.11-8
Puppet upgraded to 2.7.11-1
'go-faster stripes' enabled on all 'B' and 'C' tape drives
preprod now configured with lcgc*03 headnodes (destined for Gen) + preprod NS for Alice xrootd testing
preprod SRMs now configured with updated RPMs and ready for testing. It is hoped that the updated RPMs will reduce the periodic crashing.

ATLAS SRM periodic crashing is continuing. The restarter didn't kick in on Thursday, which led to a short period of being blacklisted.
Running cleanLostFiles against 5 disk servers caused a stager slowdown on Thursday evening. From now on we will run no more than 3 cleanLostFiles threads at a time, and none out of hours.
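
The new cleanLostFiles limit could be enforced by a small wrapper along the following lines. This is only a sketch: the function names, the server names in the usage note and the definition of "in hours" (Monday to Friday, 09:00 to 17:00) are assumptions for illustration, not the actual tooling or agreed hours.

```python
from datetime import datetime, time

MAX_CLEANERS = 3  # limit stated in the notes above

# Assumed working hours; the notes do not define "out of hours".
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def in_hours(now: datetime) -> bool:
    """True on Monday-Friday between WORK_START and WORK_END (assumed)."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def servers_to_start(pending, running, now):
    """Return the disk servers a new cleanLostFiles thread may be started on.

    pending: disk servers still waiting to be cleaned (hypothetical names)
    running: number of cleanLostFiles threads already active
    """
    if not in_hours(now):
        return []  # nothing is started out of hours
    free_slots = max(0, MAX_CLEANERS - running)
    return list(pending)[:free_slots]
```

With five servers pending and nothing running, servers_to_start returns only the first three during assumed working hours, and an empty list for a Thursday-evening request like the one that caused the slowdown.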

Entries in/planned to go to GOCDB

Description	Start	End	Type	Affected VO(s)	Lead by
CASTOR 2.1.11-8 LHCb Stager upgrade, inc. move to new hardware+SL5+Quattor	27/02/2012 08:00	27/02/2012 16:00	Downtime	LHCb	Matthew
CASTOR 2.1.11-8 Gen Stager upgrade, inc. move to new hardware+SL5+Quattor	29/02/2012 08:00	29/02/2012 16:00	Downtime	Gen	Matthew
CIP 2.2.0 upgrade (STC)	TBD	TBD	At-risk	All	Matthew

Test and re-apply CIP upgrade
Move Tier1 instances to new Database infrastructure with a Dataguard backup instance in R26.
Stress testing of *11 generation disk servers in preprod during March
Switch from LSF to Transfer Manager after 2.1.11 upgrade. Will need to better stress-test TM on preprod with more disk servers.
Start using Tape Gateway once CERN have been using it in production for approx. 2 months.