RAL Tier1 weekly operations castor 02/04/2012

From GridPP Wiki

Operations News

  • (Wed) Installed new LSF licenses (old ones expired on 29/3/12)
  • Upgraded repack from 2.1.10-1 to 2.1.11-8 then 2.1.11-9. Also quattorized the repack headnode.
  • Certification now virtualized, quattorized and puppetized; awaiting testing
  • Transfer Manager (TM) is now considered sufficiently tested to move into production on the first instance (Gen). However, the setup changes still need to be made in Quattor, and Ganglia monitoring for TM needs to be completed.

Operations Problems

  • puppetmaster02 partition filled up, resulting in a callout. Foreman test instance was turned off.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description              Start  End  Type     Affected VO(s)  Lead by
CIP 2.2.0 upgrade (STC)  TBD    TBD  At-risk  All             Matthew

Advanced Planning


  • Test and re-apply CIP upgrade (Jens, Matthew)
  • Test and certify 2.1.12-4 and 2.1.11-9 (Matthew, Chris)
  • Stress testing of Transfer Manager (TM) (Shaun, All) DONE
  • Ganglia monitoring for TM (Rob, Chris)
  • Re-instantiate certification on VMs using Quattor+Puppet (Rob)
  • Stress testing of CV11 generation disk servers on preprod (Rob, Matthew)
  • Selection of disk-only prototype solution (Shaun, Rob, Brian, James)


  • Upgrade repack to 2.1.12-4 (Apr)
  • Switch from LSF to TM after 2.1.11-8 upgrade. Will need to better stress-test TM on preprod with more disk servers. (Apr)
  • Switch to Tape Gateway (TG) once it has been tested on repack (May)
  • Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jun)
  • Upgrade Oracle to 11g (Jun)
  • Upgrade Tier1 instances to 2.1.12 once we are happy with the performance of TM and TG (Jul)


  • Castor on Call person: Shaun
  • Staff absence/out of the office:
    • Chris A/L
    • (Tue) Matthew
    • (Wed,Thu) Shaun
    • (Fri) everyone, Good Friday