RAL Tier1 weekly operations castor 11/03/2013

From GridPP Wiki

Latest revision as of 10:56, 8 March 2013

Operations News

  • Tape verification server (lcgcadm01) now up and running against the Tier 1 CASTOR instances
  • V09 generation of WNs upgraded to 2.1.13-9 CASTOR client
  • rsyslog TCP logging now being sent from all headnodes, including non-rsyslog-compliant daemons (e.g. nsd, xrd)
  • fetch-crl now running every 6 hours on all SRMs

Operations Problems

  • (Sun-Tue) A disk manager bug stopped all transfers on gdss590 for 2.5 days. The only error in the diskmanager log was: "Detected stuck ActivityControl thread. Killed connections to transfermanagerd". We haven't seen this error before, but will look out for it in future.
  • Still seeing DB problems ("ORA-32108: max column or parameter size not specified") when testing 2.1.13-9. Have tried Oracle Instant Client versions 11.2.0.3.0-1, 11.2.0.2.0 and 10.2.0.3-4
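
One possible way to look out for the stuck-thread error on gdss590 is a small log scan. The sketch below is illustrative only: the function names are hypothetical, the log path is left to the caller because this report does not give it, and only the message text is taken from the report above.

```python
import re
from typing import Iterable, List

# Message text as quoted in this report; everything else is an assumption.
STUCK_PATTERN = re.compile(r"Detected stuck ActivityControl thread")


def find_stuck_thread_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the stuck-thread message."""
    return [line.rstrip("\n") for line in lines if STUCK_PATTERN.search(line)]


def scan_log(path: str) -> List[str]:
    """Scan a log file (path supplied by the caller) for the message."""
    # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
    with open(path, "r", errors="replace") as handle:
        return find_stuck_thread_lines(handle)
```

A check like this could be run periodically against the disk server log, with any returned lines raised as an alert; how that would tie into existing monitoring is not covered here.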

Blocking Issues

  • Can't upgrade Puppet until someone spends time learning to administer it (to replace Chris); this may delay an SL6 upgrade

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Test and certify 2.1.13-9 with simplified Quattor templates
  • Turn off Amanda backups

Interventions

  • Upgrade tape servers to 2.1.13-9
  • Upgrade central services (NS,CUPV,VDQM) from 2.1.11-9 to 2.1.13-9
  • Upgrade stagers from 2.1.12 to 2.1.13

Staffing

  • Castor on Call person
    • Rob
  • Staff absence/out of the office:
    • Shaun at EUDAT and ISGC (all week)
    • Jens at EUDAT (Mon-Tue) and A/L (Fri)