RAL Tier1 weekly operations castor 24/12/2012

Latest revision as of 22:57, 21 December 2012

Preparation for Christmas

  • Make sure there are no disk servers in draining status (DONE)
  • Make sure all outstanding tickets regarding checksum problems are closed
  • Verify that vcert is working for Jens' Christmas CIP development/tests

Operations News

  • RAID controller firmware was upgraded on all V08 disk servers to prevent further crashes and transitory data loss on these servers.

Operations Problems

  • (Tue) A 4-hour network outage caused by a failed board on Router A made all Tier 1 services inaccessible.
  • The PreProd DB (Fortuna) had a problem with the opt partition filling up.
  • A Quattor bug means that root passwords are unusable on all SL5 systems. This was fixed on all production disk servers as an emergency change this week, in preparation for the Christmas holiday. It has yet to be fixed on other systems.

Blocking Issues

None

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Simplify and document Quattor templates to make them easier to maintain
  • Test and certify 2.1.13-5 with simplified Quattor templates

Interventions

  • Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS, CUPV, VDQM) from 2.1.11 to 2.1.13

Staffing

  • Castor on-call person
    • Matthew
  • Staff absence/out of the office:
    • Christmas Holiday week