RAL Tier1 weekly operations castor 17/01/2011

Operations News

Started testing with CEDA on Facilities
Agreed a preliminary structure for CASTOR configuration within Quattor
Possible cause of disk servers becoming unresponsive identified as use of TCP by rsyslog. Investigating switching to UDP.
Certification now upgraded to 2.1.9-6
Several disk servers went into intervention with various errors, mostly in CMS
Discussed merging of ATLAS disk pools. Best approach will be discussed with CERN later this week.
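
Note on the rsyslog item above: the idea behind trying UDP is that a UDP sender hands each datagram to the kernel and moves on, whereas a TCP sender can stall if the receiving end stops draining the connection. A minimal Python sketch of the UDP side follows. The collector address (127.0.0.1:5514) and the function name are hypothetical, chosen only for illustration; this shows the transport behaviour and is not rsyslog configuration.

```python
import socket


def send_udp_log(message: str, host: str = "127.0.0.1", port: int = 5514) -> int:
    """Send one log line as a single UDP datagram.

    host/port are placeholder values (assumption), not a real collector.
    Returns the number of bytes handed to the kernel. No connection is
    set up and no acknowledgement is awaited, so a slow or absent
    receiver does not hold up the sender.
    """
    payload = message.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # "<14>" is a syslog priority prefix (facility user, severity info).
    print(send_udp_log("<14>diskserver: test message"))
```

The trade-off is that UDP gives no delivery guarantee, so messages can be lost silently if the collector is down.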

T2K caused a DoS on the SRM: their production jobs were sending 75k srmLs requests/hr. T2K are looking into this.
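
To put the T2K figure in per-second terms, and to show what client-side pacing would look like, here is a small Python sketch. The 75k/hr figure is from the item above; the 10,000/hr target is purely illustrative (an assumption, not an agreed limit), and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def per_second(requests_per_hour: float) -> float:
    """Convert an hourly request count to an average per-second rate."""
    return requests_per_hour / SECONDS_PER_HOUR


def min_interval(target_per_hour: float) -> float:
    """Seconds a client should wait between requests to stay at or
    below target_per_hour on average."""
    if target_per_hour <= 0:
        raise ValueError("target_per_hour must be positive")
    return SECONDS_PER_HOUR / target_per_hour


if __name__ == "__main__":
    # Observed load from the notes: 75k srmLs requests per hour.
    print(f"observed: {per_second(75_000):.1f} requests/s")
    # Illustrative target only (assumption): 10,000 requests per hour.
    print(f"spacing for 10k/hr: {min_interval(10_000):.2f} s between requests")
```

75k requests/hr averages out at just under 21 srmLs calls per second, sustained.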

Lack of production-class hardware running ORACLE 10g needs to be resolved prior to CASTOR for Facilities going into full production

Entries in/planned to go to GOCDB

Description	Start	End	Type	Affected VO(s)	Lead by
Update ATLAS disk servers to SL5 64bit	17/01/2011 08:00	18/12/2011 16:00	Downtime	ATLAS	MV

Upgrade CMS, Gen disk servers to SL5 64bit and Quattorize the non-Quattorized disk servers
CASTOR certification and upgrade to 2.1.9-10 which incorporates the fix for gridftp-internal to support multiple service classes, enabling checksums for Gen

Castor on Call person: Chris
Staff absence/out of the office:
- Matthew, Shaun at CERN (Thu-Fri)
- Richard (Fri)