RAL Tier1 weekly operations castor 25/10/2010

Work previous week

  • Matthew:
    • CaF planning, and in discussions to resolve the hardware for running ORACLE 10g
    • Investigating the crashing ATLAS SRMs
    • Writing change control document with JT for 64-bit disk server update
  • Shaun:
    • ..
  • Chris:
    • Castor Facilities work
    • Castor on duty person
    • Preparation for Gen upgrade
    • Discussing with ATLAS how to manage permissions in CASTOR
  • Richard:
    • Started running functional tests on Facilities instance
  • Brian:
    • ..
  • Jens:
    • ..

Operations Issues

  • On 19/10/10 the db team noticed two Unique Constraint Violations on the LHCB Stager, without any ill effect on CASTOR. The CERN CASTOR team confirmed this as normal behaviour.
  • On 20/10/10 an ATLAS user was executing disallowed workflows (to get around the fact that ATLAS end users are not allowed to run jobs at RAL), which repeatedly crashed all ATLAS SRM machines. The ATLAS user was subsequently banned. ATLAS are looking at ways of allowing non-privileged users onto RAL.
  • CMS repack activities coincided with a larger number of recalls, causing recall delays and inconsistent states that required tape streams to be manually reset.

Blocking issues

  • Lack of production-class hardware running ORACLE 10g needs to be resolved prior to CASTOR for Facilities going into production

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description                     Start              End                Type       Affected VO(s)
Update Gen to 2.1.9-6           25/10/2010 08:00   27/10/2010 18:00   Downtime   Gen
Update CMS to 2.1.9-6 (STC)     08/11/2010 08:00   10/11/2010 18:00   Downtime   CMS
Update ATLAS to 2.1.9-6 (STC)   22/11/2010 08:00   24/11/2010 18:00   Downtime   ATLAS

Advanced Planning

  • Upgrade disk servers to 64-bit OS
  • CASTOR upgrade to 2.1.9-10 and SRM upgrade to 2.10 to fix the unavailable status being reported to FTS when disk servers are draining
  • CASTOR for Facilities instance in production by end of 2010

Staffing

  • Castor on Call person: Matthew
  • Staff absence/out of the office:
    • Jens (Mon-Fri)