RAL Tier1 weekly operations castor 23/01/2012

Operations News

The ORACLE RM problem recurred on Preprod during stress testing. ORACLE provided a workaround for 10g, which was confirmed to work (the problem is fixed in 11g).
(Fri) The cmsWanIn and cmsFarmRead diskpools were merged into a common diskpool, cmsTape.

(Mon) The ATLAS 2.11 SRM upgrade was unsuccessful and was rolled back because the necessary DB optimization procedures had not been carried out.
The aliceDisk diskpool filled up, which caused operational problems for ALICE.
(Sun) Further DNS problems caused operational problems for the Tier1, including CASTOR.

Entries in/planned to go to GOCDB

Description	Start	End	Type	Affected VO(s)	Lead by
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor	23/01/2012 10:00	23/01/2012 12:00	Downtime	CMS	Shaun
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor	26/01/2012 10:00	26/01/2012 12:00	Downtime	Gen	Shaun
CIP 2.2.0 upgrade (STC)	26/01/2012 12:00	26/01/2012 15:00	At-risk	All	Matthew
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor	30/01/2012 10:00	30/01/2012 12:00	Downtime	CMS	Shaun
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor	02/02/2012 10:00	02/02/2012 12:00	Downtime	LHCb	Shaun
Stage 2 of CASTOR DB move (STC)	07/02/2012 08:00	07/02/2012 16:00	Downtime	All	Rich
CASTOR 2.11-8 upgrade, inc. move to new hardware+SL5+Quattor (STC)	13/02/2012 08:00	24/02/2012 16:00	Downtime	All	Matthew

Move Tier1 instances to the new database infrastructure, which has a Dataguard backup instance in R26.