RAL Tier1 weekly operations castor 06/01/2014
From GridPP Wiki
- No major problems over the holiday period and excellent availability.
- 2.1.14 stress testing on preprod restarted on 2 Jan with the latest configuration changes (xrootd weightings increased within the transfer manager), following instability during previous stress-testing runs.
- Elastic logging turned back on for preprod, now using UDP instead of TCP, after it was discovered that logging over TCP can lead to CASTOR instability.
- Since the CASTOR overhead was decreased to 1% on 5 production disk servers, they have started to fill up with no adverse effects. We will soon roll out the change to all disk servers.
- The ATLAS stager started swapping; the PoC was called out and restarted it. There appears to be a slight memory leak in the 2.1.13-9 stager that causes swapping if the daemon has not been restarted for approx. 2 months on ATLAS, the busiest instance. It takes longer to cause problems on the other instances.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
- CASTOR 2.1.14 + SL5/6 testing
- CASTOR on-call person
- Staff absence/out of the office: