RAL Tier1 weekly operations castor 06/01/2014

From GridPP Wiki

Latest revision as of 15:57, 6 January 2014

Operations News

  • No major problems over the holiday period and excellent availability.
  • 2.1.14 stress testing on preprod restarted on 2 Jan with the latest configuration changes (xrootd weightings increased within the transfer manager), following instabilities during previous stress testing runs.
  • Elastic logging has been turned back on for preprod, using UDP instead of TCP, after it was discovered that logging over TCP can lead to CASTOR instability.
  • After the CASTOR overhead was decreased to 1% on 5 production disk servers, they have started to fill up with no adverse effects. We will soon roll out the change to all disk servers.
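
The report does not say why logging over TCP destabilised CASTOR. One common reason for preferring UDP for log shipping is that a UDP send never waits on the receiver, so a slow or unreachable log collector cannot stall the daemon that is logging. The sketch below illustrates that fire-and-forget pattern only; it is an assumption about the general approach, not the actual CASTOR or Elastic configuration, and the function name, host, port and record fields are hypothetical.

```python
import json
import socket


def make_udp_log_sender(host: str, port: int):
    """Return a send(record) function that ships JSON log lines over UDP.

    Hypothetical helper for illustration; not part of CASTOR.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Non-blocking: a full send buffer raises an error instead of stalling the caller.
    sock.setblocking(False)

    def send(record: dict) -> bool:
        payload = json.dumps(record).encode("utf-8")
        try:
            sock.sendto(payload, (host, port))
            return True
        except OSError:
            # Drop the log line rather than block or crash the logging daemon.
            return False

    return send


if __name__ == "__main__":
    # Hypothetical collector address; nothing needs to be listening there.
    send = make_udp_log_sender("127.0.0.1", 5514)
    send({"daemon": "stagerd", "level": "INFO", "msg": "example log line"})
```

The trade-off is that log lines can be lost silently, which is usually acceptable for monitoring data but not for anything that must be audited.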

Operations Problems

  • The ATLAS stager started swapping; the PoC was called out and restarted it. There appears to be a slight memory leak in the 2.1.13-9 stager, which results in swapping if the daemon has not been restarted for approximately 2 months on ATLAS, the busiest instance. It takes longer to cause problems on the other instances.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

  • none

Advanced Planning

Tasks

  • CASTOR 2.1.14 + SL5/6 testing

Interventions

  • none

Staffing

  • CASTOR on-call person
    • Matthew
  • Staff absence/out of the office:
    • none