RAL Tier1 weekly operations castor 13/02/2012

From GridPP Wiki
Revision as of 10:14, 10 February 2012 by Matt viljoen (Talk | contribs)

Operations News

  • xrootd testing finished on Preprod.

Operations Problems

  • New CIP information was not being published because it was being cached somewhere. When this was fixed on Thursday, the SAM OPs tests started failing because they could not find GlueSAPath. The CIP has been rolled back until the problem is fixed.
  • (Fri) The SRM daemons crashed in the early hours, causing problems for ATLAS between 04:00 and 06:00.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s) | Lead by
CASTOR 2.11-8 NS upgrade, inc. move to new hardware+SL5+Quattor | 14/02/2012 08:00 | 24/02/2012 13:00 | Downtime | All | Matthew
CASTOR 2.11-8 CMS Stager upgrade, inc. move to new hardware+SL5+Quattor | 20/02/2012 08:00 | 20/02/2012 16:00 | Downtime | CMS | Matthew
CASTOR 2.11-8 ATLAS Stager upgrade, inc. move to new hardware+SL5+Quattor | 22/02/2012 08:00 | 22/02/2012 16:00 | Downtime | ATLAS | Matthew
CASTOR 2.11-8 LHCb Stager upgrade, inc. move to new hardware+SL5+Quattor | 27/02/2012 08:00 | 27/02/2012 16:00 | Downtime | LHCb | Matthew
CASTOR 2.11-8 Gen Stager upgrade, inc. move to new hardware+SL5+Quattor | 29/02/2012 08:00 | 29/02/2012 16:00 | Downtime | Gen | Matthew

Advanced Planning

  • Move Tier1 instances to the new database infrastructure, which has a Dataguard backup instance in R26
  • Switch from LSF to Transfer Manager after 2.1.11 upgrade
  • Start using Tape Gateway once CERN have been using it in production for approx. 2 months.

Staffing

  • CASTOR on-call person: Chris
  • Staff absence/out of the office:
    • (Wed PM) Matthew on annual leave