RAL Tier1 weekly operations castor 05/05/2014

From GridPP Wiki

Operations News

  • 3 new V'13 disk servers were deployed into cmsDisk.

Operations Problems

  • cmsDisk was very full: all but the three recently added CV'13 disks were full, which resulted in timeouts and a string of callouts. CMS have since deleted many files, which has improved matters.
  • One of the 3 new V'13 disk servers installed in cmsDisk on 1st May has failed (others of this revision have also failed before going into production). The issue is currently being investigated by fabric, and the remaining 2 servers are to stay in cmsDisk for now.
  • A few SUM test failures for Atlas over the weekend of 26/27th April. The cause is not obvious and the issue has not recurred.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

  • CASTOR 2.1.14 upgrade for Tier 1. Possible date for first stage of intervention (NS upgrade) is May 27th.
  • Deployment of 2013 generation disk servers.

Advanced Planning

Tasks

  • CASTOR 2.1.14 for Tier 1

Interventions

Staffing

  • Castor on Call person
    • Matt until Tuesday / Rob thereafter
  • Staff absence/out of the office:
    • Chris out Tues/Wed