RAL Tier1 weekly operations castor 18/03/2013

From GridPP Wiki
Revision as of 13:24, 18 March 2013 by Matt viljoen

Operations News

  • No problems following the upgrade of the V09 generation of WNs to the 2.1.13-9 CASTOR client. The remaining generations are being upgraded this week.
  • rsyslog TCP logging is now being sent from all headnodes to the second central syslog servers. We have requested that Amanda backups be turned off across all CASTOR headnodes apart from puppetmaster and puppetdev.
  • WAN rates seem to have doubled since the C300 replacement on Tuesday.
  • We have not yet managed to reproduce the 2.1.13 DB bug since turning on Oracle auditing and stager debugging.

Operations Problems

  • The bug previously seen on Facilities, where recalls fail to be scheduled to a disk server, has now also been seen on CMS. Tim implemented the fix, which involved invalidating all recalls.

Blocking Issues

  • Puppet cannot be upgraded until someone spends time learning to administer it (to replace Chris); this may delay an SL6 upgrade.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Test and certify 2.1.13-9 with simplified Quattor templates

Interventions

  • Upgrade tape servers to 2.1.13-9
  • Upgrade central services (NS, CUPV, VDQM) from 2.1.11-9 to 2.1.13-9
  • Upgrade stagers from 2.1.12 to 2.1.13

Staffing

  • CASTOR on-call person:
    • Matthew
  • Staff absence/out of the office:
    • Shaun at ISGC (all week)