RAL Tier1 weekly operations castor 25/06/2012

From GridPP Wiki

Operations News

  • none

Operations Problems

  • The latest tested kernel + errata (kernel-2.6.18-308.4.1.el5 + 20120514) caused daemon crashes on the ATLAS SRM lcgsrm03. We have rolled back to the previous setup (kernel-2.6.18-274.12.1.el5 + 20120305) and will not be updating the SRMs this Wednesday.
  • Intermittent failures in CMS VO tests over the weekend

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s) | Lead by
ORACLE 11g upgrade plus server reboots and errata updates (STC) | 27/06/12 08:45 | 27/06/12 14:30 | Downtime | All | Rich

Advanced Planning

Tasks

  • Test and certify 2.1.12-4 (Matthew, Chris)
  • Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
  • Selection of disk-only prototype solution (Shaun, Rob, Brian, James)

Interventions

  • Upgrade repack to 2.1.12-4 (Jun)
  • Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jul)
  • Upgrade to 2.1.12 on Tier1 instances once we are happy with the performance of TM and TG (Jul)

Staffing

  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • (Mon) ..