RAL Tier1 weekly operations castor 11/10/2010

From GridPP Wiki

Work previous week

  • Matthew:
    • LHCb monitoring & investigating performance problems
    • Planning Castor Facilities work
    • CoD work
  • Shaun:
  • Chris:
    • Castor Facilities work
  • Richard:
    • ..
  • Brian:
    • ..
  • Jens:
    • ..

Operations Issues

  • A number of LHCb CASTOR jobs are failing, apparently because of an internal network timeout between the disk servers and the stager. LHCb SAM tests seem to be particularly affected.

Blocking issues

None

Planned, Scheduled and Cancelled Interventions

Entries in, or planned to go into, GOCDB

Description                 | Start            | End              | Type     | Affected VO(s)
Update Gen to 2.1.9 (STC)   | 25/10/2010 08:00 | 27/10/2010 18:00 | Downtime | Gen
Update CMS to 2.1.9 (STC)   | 08/11/2010 08:00 | 10/11/2010 18:00 | Downtime | CMS
Update ATLAS to 2.1.9 (STC) | 22/11/2010 08:00 | 24/11/2010 18:00 | Downtime | ATLAS

Advanced Planning

  • Upgrade disk servers to a 64-bit OS
  • Upgrade to 2.1.9-8 after all instances are upgraded to 2.1.9-6
  • CASTOR for Facilities instance in production by end of 2010

Staffing

  • Castor on Call person: Chris
  • Staff absences:
    • ..