RAL Tier1 weekly operations castor 29/10/2012

From GridPP Wiki

Latest revision as of 15:53, 29 October 2012

Operations News

  • LHCb upgraded to 2.1.12-10. Two problems followed this intervention (see below), but neither was related to the CASTOR upgrade itself.

Operations Problems

  • (23/10/12) LHCb headnode lcgclsf04 developed hardware problems after the reboot during the LHCb upgrade. It had to be transparently swapped out with the spare machine lcgsrm03 at around 16:15, which involved a DNS alias change. The operational impact was of the order of minutes.
  • (23/10/12) gdss535 was inadvertently reinstalled upon reboot during the LHCb upgrade. It was not moved to production until 15:30, and problems continued after that because the certificates for gsiftp had not been installed properly. This was not fixed until Friday at 10:50.

Blocking Issues

  • Central syslog collection of central service logs must be enabled before we turn off Amanda backups on all CASTOR headnodes.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description              | Start          | End            | Type     | Affected VO(s) | Lead by
2.1.12-10 Stager upgrade | 30/10/12 08:00 | 30/10/12 14:00 | Downtime | Gen            | Matthew

Advanced Planning

Tasks

  • Test and certify 2.1.13-5 with simplified Quattor templates

Interventions

  • Upgrade stagers to 2.1.13, and upgrade central services (NS, CUPV, VDQM) from 2.1.11 to 2.1.13

Staffing

  • CASTOR on-call person
    • Matthew
  • Staff absence/out of the office:
    • (Mon-Wed) Shaun at EUDAT
    • (Tue-Fri) Jens at Contrail
    • (All week) Rob on annual leave (A/L)