RAL Tier1 weekly operations castor 02/01/2012

Operations News

atlasStager var partition close to the limit on 26th Dec
read-only disk server gdss307 (cmsWanIn) on 27th Dec
large number of errors in aliceDisk on 28th Dec; investigation showed all disk servers were busy writing files and some of them were timing out
efficiency around 40% for aliceDisk on 29th Dec due to a similar problem to that on 28th Dec. The number of xrootd job slots was changed from 50 to 100 to accommodate the requests which were timing out
atlasStager var partition close to the limit and puppetmaster02 unresponsive on 31st Dec; puppetmaster02 was restarted

Entries in/planned to go to GOCDB

Description	Start	End	Type	Affected VO(s)	Lead by
Stage 1 of move to new CASTOR DB hardware	05/01/2012 08:30	05/01/2012 16:00	Downtime	All	Rich
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor (STC)	16/01/2012 08:00	18/01/2012 16:00	Downtime	All	Shaun
CIP 2.2.0 upgrade (STC)	26/01/2012 10:00	26/01/2012 12:00	At-risk	All	Matthew
Stage 2 of CASTOR DB move (STC)	07/02/2012 08:00	07/02/2012 16:00	Downtime	All	Rich
CASTOR 2.11-8 upgrade, inc. move to new hardware+SL5+Quattor (STC)	13/02/2012 08:00	24/02/2012 16:00	Downtime	All	Matthew

Move Tier1 instances to the new database infrastructure, which comes with a Dataguard backup instance in R26