RAL Tier1 weekly operations castor 05/05/2014

From GridPP Wiki

Latest revision as of 15:56, 2 May 2014

Operations News

  • 3 new V'13 disk servers were deployed into cmsDisk.

Operations Problems

  • cmsDisk was very full: all disks except the three recently added CV'13 ones were full, which resulted in timeouts and a string of callouts. CMS have since deleted many files, which has improved matters.
  • One of the 3 new V'13 disk servers installed in cmsDisk on 1st May has failed (other servers of this revision have also failed before going into production). The issue is currently being investigated by Fabric, and the remaining 2 servers are to stay in cmsDisk for now.
  • A few SUM test failures for ATLAS over the weekend of 26/27 April; the cause is not obvious and the issue has not recurred.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

  • CASTOR 2.1.14 upgrade for Tier 1. A possible date for the first stage of the intervention (NS upgrade) is May 27th.
  • Deployment of 2013 generation disk servers.

Advanced Planning

Tasks

  • CASTOR 2.1.14 for Tier 1

Interventions

Staffing

  • Castor on Call person
    • Matt until Tuesday / Rob thereafter
  • Staff absence/out of the office:
    • Chris out Tues/Wed