RAL Tier1 weekly operations castor 06/01/2014

Operations News

No major problems over the holiday period and excellent availability.
2.1.14 stress testing on preprod restarted on 2 Jan with the latest configuration changes (xrootd weightings increased within the transfer manager), following instabilities during previous stress testing runs.
Elastic logging has been turned back on for preprod using UDP instead of TCP, after it was discovered that logging over TCP can lead to CASTOR instability.
After the CASTOR overhead was decreased to 1% on 5 production disk servers, they have started to fill up with no adverse effects. We will soon roll out the change to all disk servers.

The ATLAS stager started swapping; the PoC was called out and restarted it. There appears to be a slight memory leak in the 2.1.13-9 stager, which results in swapping if the daemon has not been restarted for approximately 2 months on ATLAS, the busiest instance. It takes longer to cause problems on the other instances.

Entries in/planned to go to GOCDB

Tasks

Interventions