Difference between revisions of "RAL Tier1 weekly operations castor 07/04/2014"

From GridPP Wiki
(Created page with "== Operations News == * Facilities CASTOR was successfully upgraded to 2.1.14-11 == Operations Problems == * CMS load continues to cause problems, we had to restart transfer/...")
 
 
== Operations News ==
 
* Facilities CASTOR was successfully upgraded to 2.1.14-11
 +
* The 2.1.14 upgrade has been repeated on Preprod, this time with the NS Compatibility flag enabled, as it will be on Tier 1 when we stagger upgrades across the instances after the initial NS upgrade
  
 
== Operations Problems ==
* CMS load continues to cause problems; we had to restart transfer/diskmanagers to get things working again (Monday 10:45 and Tuesday 17:30)
+
* A 2.1.14 bug was uncovered by Facilities where the DiskManager timeout (set to 2 min) prevented recalled files from being returned to users. We've disabled this timeout.
* transfermanagerd restarted on fdscdlf02 Thursday
+
* The vcert SRM and name server are not accessible due to issues with the hypervisor after the rack move; some config may be required to bring it back. Dimitrios is looking into this.
+
* We had a node crash on Neptune, causing brief issues with the Atlas SRM. This is a known issue that has already been logged with Oracle.
+
  
 
== Blocking Issues ==

Revision as of 13:20, 4 April 2014

Operations News

  • Facilities CASTOR was successfully upgraded to 2.1.14-11
  • The 2.1.14 upgrade has been repeated on Preprod, this time with the NS Compatibility flag enabled, as it will be on Tier 1 when we stagger upgrades across the instances after the initial NS upgrade

Operations Problems

  • A 2.1.14 bug was uncovered by Facilities where the DiskManager timeout (set to 2 min) prevented recalled files from being returned to users. We've disabled this timeout.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Atlas would like to store c. 2 million EVNT Monte Carlo files. Brian to discuss with Alastair. Other Tier 1s are not keen, but RAL Tier 1 / CASTOR should be able to cope with this.

Interventions

Staffing

  • Castor on Call person
    • (Mon-Wed) Matthew
    • (Thu-Fri) Rob?
  • Staff absence/out of the office:
    • (Mon-Fri) Chris A/L
    • (Mon-Wed) Matt in DL then First Aid training
    • (Thu-Fri) Matt A/L