RAL Tier1 weekly operations castor 23/01/2012

From GridPP Wiki

Operations News

  • The ORACLE RM problem recurred on Preprod during stress testing. ORACLE provided a workaround for 10g, which has been confirmed to work (the problem is fixed in 11g).
  • (Fri) The cmsWanIn and cmsFarmRead diskpools were merged into a single diskpool, cmsTape.

Operations Problems

  • (Mon) The ATLAS SRM 2.11 upgrade was unsuccessful and was rolled back because the required DB optimization procedures had not been carried out.
  • The aliceDisk diskpool filled up, causing operational problems for ALICE.
  • (Sun) Further DNS problems caused operational problems for the Tier1, including CASTOR.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s) | Led by
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor | 23/01/2012 10:00 | 23/01/2012 12:00 | Downtime | CMS | Shaun
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor | 26/01/2012 10:00 | 26/01/2012 12:00 | Downtime | Gen | Shaun
CIP 2.2.0 upgrade (STC) | 26/01/2012 12:00 | 26/01/2012 15:00 | At-risk | All | Matthew
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor | 30/01/2012 10:00 | 30/01/2012 12:00 | Downtime | CMS | Shaun
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor | 02/02/2012 10:00 | 02/02/2012 12:00 | Downtime | LHCb | Shaun
Stage 2 of CASTOR DB move (STC) | 07/02/2012 08:00 | 07/02/2012 16:00 | Downtime | All | Rich
CASTOR 2.11-8 upgrade, inc. move to new hardware+SL5+Quattor (STC) | 13/02/2012 08:00 | 24/02/2012 16:00 | Downtime | All | Matthew

Advanced Planning

  • Move Tier1 instances to the new database infrastructure, which has a Dataguard backup instance in R26.

Staffing

  • CASTOR on-call person: Shaun
  • Staff absence/out of the office:
    • none