RAL Tier1 weekly operations castor 25/02/2013

From GridPP Wiki

Operations News

  • Functional tests pass under 2.1.13-9, tested with the January errata and kernel. We are now ready to start stress testing this version in preparation for upgrading the Facilities instance.
  • Four production tape servers now upgraded to 2.1.13-9 (3 in Tier 1 and 1 in Facilities). We plan to upgrade the rest this week.

Operations Problems

  • Lost 68 unmigrated files on a Gen disk server (gdss594) that suffered a double disk failure. A post-mortem of this incident is under way.

Blocking Issues

  • Cannot upgrade Puppet until someone spends time learning to administer it (to replace Chris); this may delay an SL6 upgrade.
  • aliceDisk still full. ALICE are aware.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Test and certify 2.1.13-9 with simplified Quattor templates
  • Turn off Amanda backups

Interventions

  • Upgrade tape servers to 2.1.13-9
  • Upgrade central services (NS, CUPV, VDQM) from 2.1.11-9 to 2.1.13-9
  • Upgrade stagers from 2.1.12 to 2.1.13

Staffing

  • Castor on Call person
    • Shaun
  • Staff absence/out of the office:
    • (Tue+Wed AM) Rob A/L