RAL Tier1 weekly operations castor 07/03/2011

From GridPP Wiki

Operations News

  • Testing of the 2.1.10 NS upgrade completed successfully.

Operations Issues

  • On 28/2 the ATLAS JM stopped processing requests at 05:04 for 19 minutes. The automatic strace script did not catch this because debugging activity was ongoing during that period.
  • On 2/3 the ATLAS Stager ran out of space on /var, resulting in an out-of-hours callout.

Blocking Issues

  • The lack of production-class hardware running Oracle 10g must be resolved before CASTOR for Facilities goes into full production. The hardware has arrived and is awaiting installation.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s)
Upgrade NS to 2.1.10-0 and switch EMC to isolating transformer | 9 March 08:00 | 9 March 15:00 | Downtime | ALL
Upgrade ATLAS to 2.1.10-0 (STC) | 28 March 08:00 | 28 March 16:00 | Downtime | ATLAS
Upgrade CMS to 2.1.10-0 (STC) | 29 March 08:00 | 29 March 16:00 | Downtime | CMS
Upgrade LHCb, Gen to 2.1.10-0 (STC) | 30 March 08:00 | 30 March 16:00 | Downtime | LHCb, Gen

Advanced Planning

  • Move the Tier1 instances to the new database infrastructure, with a Dataguard backup instance in R26
  • Move the Facilities instance to new database hardware running Oracle 10g
  • Start migrating from T10KA to T10KC media later this year

Staffing

  • Castor on Call person: Chris
  • Castor on Day Duty person: Matthew
  • Staff absence/out of the office:
    • Shaun at LHCb Jamboree (Mon, Tue)
    • Matthew on A/L (Tue PM)