RAL Tier1 weekly operations castor 14/11/2011

From GridPP Wiki
Revision as of 08:47, 18 November 2011 by Matt viljoen (Talk | contribs)

Operations News

  • none

Operations Problems

  • Root certificates were not upgraded on the LHCb SRMs, causing some pilot jobs to fail. This was fixed on Thursday.
  • The CMS jobmanager froze on Friday morning at 04:00. It was restarted automatically before it caused problems.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

  • Move Tier1 instances to the new database infrastructure, which has a Dataguard backup instance in R26
  • Upgrade SRMs to 2.11 which incorporates VOMS support
  • Certify 2.1.11 and evaluate the Transfer Manager (the new LSF replacement)
  • Quattorization of remaining SRM servers
  • Hardware upgrade, Quattorization and upgrade to SL5 of the Tier1 CASTOR headnodes

Staffing

  • CASTOR on-call person: Chris
  • Staff absence/out of the office:
    • none