RAL Tier1 weekly operations castor 26/12/2010

Operations News

  • Secondary job managers have been installed on the LSF machines of the remaining instances (LHCb, Gen). This guards against the intermittent bug in which the job manager (JM) stops processing requests for no apparent reason.

Operations Issues

  • ..

Blocking issues

  • Lack of production-class hardware running ORACLE 10g needs to be resolved prior to CASTOR for Facilities going into full production

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description                             Start             End               Type      Affected VO(s)  Lead by
Update ATLAS disk servers to SL5 64bit  17/01/2011 08:00  18/12/2011 16:00  Downtime  ATLAS           MV

Advanced Planning

  • CASTOR for Facilities instance in production by end of 2010
  • Upgrade ATLAS, CMS, Gen disk servers to SL5 64bit and Quattorize the non-Quattorized disk servers
  • CASTOR certification and upgrade to 2.1.9-10, which incorporates the fix for gridftp-internal to support multiple service classes, enabling checksums for Gen
  • CASTOR upgrade to 2.1.9-10 and SRM upgrade to 2.10 to fix the "unavailable" status reported to FTS when disk servers are draining


  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • (Christmas holiday - cover from home only)