RAL Tier1 weekly operations castor 17/01/2011

From GridPP Wiki

Operations News

  • Started testing with CEDA on facilities
  • Agreed a preliminary structure for CASTOR configuration within Quattor
  • Possible cause of disk servers becoming unresponsive identified as use of TCP by rsyslog. Investigating switching to UDP.
  • Certification now upgraded to 2.1.9-6
  • Several disk servers went into intervention with various errors, mostly in CMS
  • Discussed merging the ATLAS disk pools. The best approach will be discussed with CERN later this week.
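
On the rsyslog item above: in rsyslog's forwarding-rule syntax, a double `@@` before the target sends messages over TCP and a single `@` sends them over UDP. The change under investigation might look roughly like the sketch below. The log host `loghost.example.org`, the port and the `*.*` selector are placeholders for illustration, not the values used at RAL.

```
# Current behaviour (assumed): forward all messages over TCP.
# *.* @@loghost.example.org:514

# Proposed: forward the same messages over UDP.
*.* @loghost.example.org:514
```

UDP forwarding does not block the sender when the receiver is slow or unreachable, at the cost of possibly dropping messages.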

Operations Issues

  • T2K production jobs caused a denial of service on the SRM by issuing about 75k srmLs requests per hour (roughly 21 per second). T2K are looking into this.

Blocking issues

  • The lack of production-class hardware running Oracle 10g needs to be resolved before CASTOR for Facilities goes into full production

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s) | Lead by
Update ATLAS disk servers to SL5 64bit | 17/01/2011 08:00 | 18/12/2011 16:00 | Downtime | ATLAS | MV

Advanced Planning

  • Upgrade CMS, Gen disk servers to SL5 64bit and Quattorize the non-Quattorized disk servers
  • CASTOR certification and upgrade to 2.1.9-10, which incorporates the fix for gridftp-internal to support multiple service classes, enabling checksums for Gen


  • CASTOR on-call person: Chris
  • Staff absence/out of the office:
    • Matthew, Shaun at CERN (Thu-Fri)
    • Richard (Fri)