RAL Tier1 weekly operations castor 18/07/2011

From GridPP Wiki

Operations News

  • 7 SL08 disk servers (140TB) deployed to lhcbRawRdst
  • GC policy for all remaining tape-based service classes is now Least Recently Used (LRU)
  • FT, CT and DB Team have decided to use Amanda as a means of backing up the new Tier1 databases to tape.
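
For readers unfamiliar with LRU garbage collection, the idea can be sketched as follows. This is a minimal illustration only, not CASTOR's actual GC code: the `DiskCopy` record, its fields and the `select_for_gc` function are hypothetical names introduced here, and a real policy would likely weigh other factors as well.

```python
from typing import NamedTuple


class DiskCopy(NamedTuple):
    # Hypothetical record for one file copy held on a disk server.
    path: str
    size_bytes: int
    last_access: float  # Unix timestamp of the most recent access


def select_for_gc(copies, bytes_to_free):
    """Pick the least recently used copies until enough space is covered."""
    selected = []
    freed = 0
    # Oldest access time first: these are the "least recently used" copies.
    for copy in sorted(copies, key=lambda c: c.last_access):
        if freed >= bytes_to_free:
            break
        selected.append(copy)
        freed += copy.size_bytes
    return selected
```

Under this sketch, a copy that has just been read moves to the back of the queue, so frequently accessed files tend to stay on disk while stale ones are removed first.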

Operations Problems

  • Gen SRM problems on night of Wed/Thu
  • On Thu PM ATLAS Stager database got into an inconsistent state (a sub-request without an entry in the id2type table) which caused approx. 2 hours of unscheduled downtime. Cause unknown.

Blocking Issues


Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

  • Move Tier1 instances to new Database infrastructure with a Dataguard backup instance in R26
  • Move Facilities DB instance to new Database hardware running 10g
  • Upgrade SRMs to 2.11 which incorporates VOMS support
  • Start migrating from T10KA to T10KC media later this year
  • Certify 2.1.11 and evaluate the new LSF replacement
  • Quattorization of remaining SRM servers
  • Hardware upgrade, Quattorization and upgrade to SL5 of Tier1 CASTOR headnodes


  • CASTOR on-call person: Matthew
  • Staff absence/out of the office:
    • (Mon-Wed) Shaun on training