RAL Tier1 weekly operations castor 18/10/2010

From GridPP Wiki

Work previous week

  • Matthew:
    • Debugging and fixing zero-sized file problems on LHCb
    • Remaining 2.1.9 upgrade planning
  • Shaun:
    • ..
  • Chris:
    • Castor Facilities work
    • Castor on duty person
    • Preparation for Gen upgrade
  • Richard:
    • Prepare for testing GEN instance [ongoing]
    • Prepare Quattor structure for "cert in a box" [ongoing]
  • Brian:
    • ..
  • Jens:
    • ..

Operations Issues

  • The LHCb timeouts were caused by I/O contention on the database node running the LHCb stager, because the backup script was running on the same node. The LHCb stager was moved to a different node on 11/10/10 and RAL were unbanned by LHCb afterwards.
  • On 12/10/10 neptune4 rebooted, briefly affecting the LHCb SRMs.
  • On 13/10/10 the index on id2type on the ATLAS stager became corrupted, and the ATLAS instance had to be brought down from 11:15 to 12:49 for the index to be rebuilt.
  • On 15/10/10 LHCb reported further cases of zero-sized files. This time the cause was an instance of the stager running on the wrong headnode (LSF). The problem was quickly identified and the 24 affected files were corrected.

Blocking issues

  • Lack of production-class hardware running ORACLE 10g needs to be resolved prior to CASTOR for Facilities going into production

Planned, Scheduled and Cancelled Interventions

Entries in, or planned for, GOCDB

Description                 | Start            | End              | Type     | Affected VO(s)
Update Gen to 2.1.9         | 25/10/2010 08:00 | 27/10/2010 18:00 | Downtime | Gen
Update CMS to 2.1.9 (STC)   | 08/11/2010 08:00 | 10/11/2010 18:00 | Downtime | CMS
Update ATLAS to 2.1.9 (STC) | 22/11/2010 08:00 | 24/11/2010 18:00 | Downtime | ATLAS
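
The scheduled windows above can be sanity-checked before they are entered in GOCDB. The following is a minimal sketch in Python, not an official tool: the rows are copied by hand from the table, and the helper names (`parse_windows`, `find_overlaps`, `duration_hours`) are hypothetical and chosen only for illustration. It parses each start and end time, rejects any window that ends before it starts, and reports the length of each downtime and any overlap between consecutive windows.

```python
from datetime import datetime

# Date format used in the table above (day/month/year hour:minute).
FMT = "%d/%m/%Y %H:%M"

# Rows copied by hand from the table: (description, start, end).
INTERVENTIONS = [
    ("Update Gen to 2.1.9", "25/10/2010 08:00", "27/10/2010 18:00"),
    ("Update CMS to 2.1.9 (STC)", "08/11/2010 08:00", "10/11/2010 18:00"),
    ("Update ATLAS to 2.1.9 (STC)", "22/11/2010 08:00", "24/11/2010 18:00"),
]


def parse_windows(rows):
    """Parse the rows into (description, start, end) tuples sorted by start."""
    windows = []
    for description, start, end in rows:
        start_dt = datetime.strptime(start, FMT)
        end_dt = datetime.strptime(end, FMT)
        if end_dt <= start_dt:
            raise ValueError(f"{description}: end is not after start")
        windows.append((description, start_dt, end_dt))
    return sorted(windows, key=lambda w: w[1])


def find_overlaps(windows):
    """Return pairs of consecutive windows that overlap.

    Only neighbours in start order are compared. That is enough to tell
    whether any overlap exists, but it does not list every overlapping pair.
    """
    overlaps = []
    for (desc_a, _, end_a), (desc_b, start_b, _) in zip(windows, windows[1:]):
        if start_b < end_a:
            overlaps.append((desc_a, desc_b))
    return overlaps


def duration_hours(window):
    """Length of one window in hours."""
    _, start_dt, end_dt = window
    return (end_dt - start_dt).total_seconds() / 3600


if __name__ == "__main__":
    windows = parse_windows(INTERVENTIONS)
    for window in windows:
        print(f"{window[0]}: {duration_hours(window):.0f} h")
    print("Overlapping pairs:", find_overlaps(windows))
```

With the three rows as listed, each window spans 58 hours and no two windows overlap.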

Advanced Planning

  • Upgrade disk servers to a 64-bit OS
  • Upgrade to 2.1.9-8 after all instances are upgraded to 2.1.9-6
  • CASTOR for Facilities instance in production by end of 2010


  • Castor on Call person: Chris
  • Staff absences:
    • Shaun (CHEP all week)
    • Matthew (Thu PM)
    • Jens (Mon-Wed)