RAL Tier1 weekly operations castor 17/01/2011
From GridPP Wiki
- Started testing with CEDA on facilities
- Agreed a preliminary structure for CASTOR configuration within Quattor
- Possible cause of disk servers becoming unresponsive identified as use of TCP by rsyslog. Investigating switching to UDP.
- Certification now upgraded to 2.1.9-6
- Several disk servers went into intervention with various errors, mostly in CMS
- Discussed merging of ATLAS disk pools. Best approach will be discussed with CERN later this week.
- T2K caused a DoS on the SRM: their production jobs were issuing around 75k srmLs requests/hr. T2K are looking into this.
- Lack of production-class hardware running Oracle 10g needs to be resolved before CASTOR for Facilities goes into full production
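
For scale, the srmLs rate reported for T2K in the list above can be converted to a per-second figure. This is a minimal sketch of the arithmetic only. The names `REQUESTS_PER_HOUR` and `per_second` are illustrative and do not come from any CASTOR or SRM tooling, and the result is an hourly average that says nothing about bursts.

```python
# Convert the hourly srmLs request rate quoted in the notes into an
# average per-second rate. All names here are illustrative assumptions.
REQUESTS_PER_HOUR = 75_000  # figure quoted for T2K production jobs
SECONDS_PER_HOUR = 3600


def per_second(requests_per_hour: float) -> float:
    """Return the average requests per second for an hourly rate."""
    return requests_per_hour / SECONDS_PER_HOUR


rate = per_second(REQUESTS_PER_HOUR)
print(f"{rate:.1f} requests/s")  # 20.8 requests/s
```

An average of roughly 21 requests per second, sustained over an hour, gives a sense of the load the SRM was seeing.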
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
|Description||Start||End||Type||Affected VO(s)||Lead by|
|Update ATLAS disk servers to SL5 64bit||17/01/2011 08:00||18/12/2011 16:00||Downtime||ATLAS||MV|
- Upgrade CMS, Gen disk servers to SL5 64bit and Quattorize the non-Quattorized disk servers
- CASTOR certification and upgrade to 2.1.9-10 which incorporates the fix for gridftp-internal to support multiple service classes, enabling checksums for Gen
- Castor on Call person: Chris
- Staff absence/out of the office:
- Matthew, Shaun at CERN (Thu-Fri)
- Richard (Fri)