RAL Tier1 weekly operations castor 27/11/2012

From GridPP Wiki

Latest revision as of 15:38, 23 November 2012

Operations News

  • ..

Operations Problems

  • (Tue) Another power outage brought all services down at around 12:00. CASTOR did not return to service until Thursday at 14:40.
  • Memory leak problems are continuing on Gen and ATLAS. A 6-hourly restarter has been implemented.

Blocking Issues

Central syslog collection of central service logs must be enabled before we turn off Amanda backups on all CASTOR headnodes.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Simplify and document Quattor templates to make them easier to maintain
  • Test and certify 2.1.13-5 with simplified Quattor templates

Interventions

  • Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS, CUPV, VDQM) from 2.1.11 to 2.1.13

Staffing

  • CASTOR on-call person
    • Chris
  • Staff absence/out of the office:
    • Jens (most of the week)
    • (Thu PM-Fri) Matt, Shaun, Chris, Rob, Rich at CASTOR F2F, CERN