RAL Tier1 weekly operations castor 17/01/2011
From GridPP Wiki
Matt Viljoen
Latest revision as of 11:46, 17 January 2011
Operations News
- Started testing with CEDA on facilities
- Agreed a preliminary structure for CASTOR configuration within Quattor
- Possible cause of disk servers becoming unresponsive identified as use of TCP by rsyslog. Investigating switching to UDP.
- Certification now upgraded to 2.1.9-6
- Several disk servers went into intervention with various errors, mostly in CMS
- Discussed merging of ATLAS disk pools. Best approach will be discussed with CERN later this week.
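For the rsyslog item above, the TCP/UDP choice comes down to one character in a legacy-syntax forwarding rule: `@@` forwards over TCP, a single `@` over UDP. The fragment below is only a sketch. The log host name and port are hypothetical, since these notes do not say where the disk servers forward their logs.

```
# Hypothetical central log host; the real target is not given in these notes.

# TCP forwarding (double @). If the remote end stalls, the sender may block
# once its queue fills, which would be consistent with the suspected cause.
*.* @@loghost.example.org:514

# UDP forwarding (single @). Fire-and-forget, so the sender does not wait,
# but messages can be lost without notice.
*.* @loghost.example.org:514
```

Only one of the two rules would be active at a time. The trade-off is reliability of log delivery against the risk of a stalled connection holding up the server.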
Operations Issues
- T2K caused a DoS on the SRM: their production jobs were hitting us with 75k srmLs requests/hr. T2K is looking into this.
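To put the 75k requests/hr figure in terms of sustained load, it can be converted to per-second and per-minute rates. This is a small sketch; the function names are illustrative, and it assumes the requests were spread evenly over the hour, which the notes do not state.

```python
# Convert the reported hourly srmLs rate into per-second and per-minute figures.
# Assumes an even spread over the hour; real traffic was probably burstier.
REQUESTS_PER_HOUR = 75_000  # "75k srmLs requests/hr" from the issue above


def rate_per_second(per_hour: float) -> float:
    """Average requests per second for a given hourly total."""
    return per_hour / 3600


def rate_per_minute(per_hour: float) -> float:
    """Average requests per minute for a given hourly total."""
    return per_hour / 60


if __name__ == "__main__":
    print(f"{rate_per_second(REQUESTS_PER_HOUR):.1f} requests/s")    # 20.8 requests/s
    print(f"{rate_per_minute(REQUESTS_PER_HOUR):.0f} requests/min")  # 1250 requests/min
```

That is an average of roughly 21 srmLs calls every second from a single VO.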
Blocking issues
- Lack of production-class hardware running ORACLE 10g needs to be resolved prior to CASTOR for Facilities going into full production
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Lead by |
---|---|---|---|---|---|
Update ATLAS disk servers to SL5 64bit | 17/01/2011 08:00 | 18/12/2011 16:00 | Downtime | ATLAS | MV |
Advanced Planning
- Upgrade CMS, Gen disk servers to SL5 64bit and Quattorize the non-Quattorized disk servers
- CASTOR certification and upgrade to 2.1.9-10, which incorporates the fix for gridftp-internal to support multiple service classes, enabling checksums for Gen
Staffing
- Castor on Call person: Chris
- Staff absence/out of the office:
  - Matthew, Shaun at CERN (Thu-Fri)
  - Richard (Fri)