RAL Tier1 weekly operations castor 30/01/2012

From GridPP Wiki
Revision as of 15:53, 30 January 2012 by Matt viljoen

Operations News

  • CMS and Gen were successfully upgraded to SRM 2.11.
  • Stress testing with the Transfer Manager (TM) has uncovered a number of problems that require further investigation. We have decided not to use the TM immediately after upgrading.

Operations Problems

  • LHCb's own tests caused a high number of failures within LSF on Wednesday, which may have adversely affected end users.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s) | Lead by
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor | 30/01/2012 10:00 | 30/01/2012 12:00 | Downtime | ATLAS | Shaun
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor | 02/02/2012 10:00 | 02/02/2012 12:00 | Downtime | LHCb | Shaun
CIP 2.2.0 upgrade (STC) | 02/02/2012 12:00 | 22/02/2012 15:00 | At-risk | All | Matthew
Stage 2 of CASTOR DB move (STC) | 07/02/2012 08:00 | 07/02/2012 16:00 | Downtime | All | Rich
CASTOR 2.11-8 upgrade, inc. move to new hardware+SL5+Quattor (STC) | 13/02/2012 08:00 | 24/02/2012 16:00 | Downtime | All | Matthew

Advanced Planning

  • Move Tier1 instances to the new database infrastructure, which includes a Dataguard backup instance in R26
  • Switch from LSF to Transfer Manager after 2.1.11 upgrade
  • Start using Tape Gateway once CERN have been using it in production for approx. 2 months.

Staffing

  • CASTOR on-call person: Matthew
  • Staff absence/out of the office:
    • (Mon-Wed) Chris at Contrail conference