RAL Tier1 weekly operations castor 18/06/2012

From GridPP Wiki

Operations News

  • Successful upgrade of all Tier 1 instances to 2.1.11-9

Operations Problems

  • John Gordon pointed out that we are still over-reporting on disk capacity for Gen due to the shared nature of the service
  • (Tue) 35-minute network outage in the morning
  • Problems between RAL and FZK since this week's interventions. It is not clear whether these are related to CASTOR.
  • (Fri) After a partition on the Gen LSF machine went read-only, the DLF machine was reconfigured with all relevant services. Downtime approx. 30 min.

Blocking Issues


Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s) | Lead by
ORACLE 11g upgrade plus server reboots and errata updates (STC) | 27/06/12 09:00 | 27/06/12 17:00 | Downtime | All | Rich

Advanced Planning


  • Test and certify 2.1.12-4 (Matthew, Chris)
  • Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
  • Selection of disk-only prototype solution (Shaun, Rob, Brian, James)


  • Upgrade repack to 2.1.12-4 (Jun)
  • Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jul)
  • Upgrade to 2.1.12 on Tier1 instances once we are happy with TM and TG in performance (Jul)


  • CASTOR on-call person: Shaun
  • Staff absence/out of the office:
    • (Mon) Matthew - ambulance service training