RAL Tier1 weekly operations castor 20/06/2011

Operations News

Tape servers upgraded to 2.1.10-0
On Thursday the DB team changed an OS parameter on all remaining nodes, at ORACLE's request, to fix internal logging. Whether this change is successful will only be known at the next intervention.
No increase in the failure rate has been seen on the 4 WNs upgraded to CASTOR Client 2.1.10-1

On Wednesday, continuing high load from CMS caused instability on LSF, which led to 1h30m of unscheduled downtime. Restarting LSF did not help, as large logs were re-created on startup. LSF only started after all of its logs were deleted. The logging level has now been reduced on this instance.

The lack of production-class hardware running ORACLE 10g needs to be resolved before CASTOR for Facilities can guarantee the same level of service as the Tier1 instances. The hardware has arrived and we are awaiting installation.

Entries in/planned to go to GOCDB

Upgrade of CASTOR clients on WNs to 2.1.10-0
Upgrade Tier1 tape subsystem to 2.1.10-1, which allows us to support files >2TB and T10KC
Move Tier1 instances to new Database infrastructure with a Dataguard backup instance in R26
Move Facilities DB instance to new Database hardware running 10g
Upgrade SRMs to 2.11, which incorporates VOMS support
Start migrating from T10KA to T10KC media later this year
Quattorization of remaining SRM servers
Hardware upgrade, Quattorization and upgrade to SL5 of Tier1 CASTOR headnodes