RAL Tier1 weekly operations castor 06/01/2014
From GridPP Wiki
Matt viljoen
Latest revision as of 15:57, 6 January 2014
Operations News
- No major problems over the holiday period and excellent availability.
- 2.1.14 stress testing on preprod was restarted on 2 Jan with the latest configuration changes (xrootd weightings increased within the transfer manager), following instabilities during previous stress testing runs.
- Elastic logging has been turned back on for preprod, using UDP instead of TCP, after it was discovered that logging over TCP can lead to CASTOR instability.
- Since the CASTOR overhead was decreased to 1% on 5 production disk servers, they have started to fill up with no adverse effects. We will soon roll out the change to all disk servers.
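The report does not say how the logging client is implemented, so the following is only a generic sketch of why UDP is the safer transport here: a datagram send is fire-and-forget, so a slow or unreachable log collector cannot stall the sending daemon. The function name, record fields, and the policy of dropping records on error are all assumptions made for illustration and are not taken from CASTOR.

```python
import json
import socket


def send_log_udp(record, host, port):
    """Send one log record as a single UDP datagram.

    Hypothetical helper: returns the number of bytes handed to the
    kernel, or 0 if the record had to be dropped.
    """
    payload = json.dumps(record).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP has no connection, acknowledgement or retry, so sendto
        # does not wait for the receiver to accept or read the record.
        return sock.sendto(payload, (host, port))
    except OSError:
        # Assumed policy: lose the log record rather than block the daemon.
        return 0
    finally:
        sock.close()
```

The trade-off is that records can be lost silently, which is usually acceptable for diagnostic logging but not for accounting data.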
Operations Problems
- The ATLAS stager started swapping; the PoC was called out and restarted it. There appears to be a slight memory leak in the 2.1.13-9 stager, which results in swapping if the daemon has not been restarted for approx. 2 months on ATLAS, the busiest instance. It takes longer to cause problems in the other instances.
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
- none
Advanced Planning
Tasks
- CASTOR 2.1.14 + SL5/6 testing
Interventions
- none
Staffing
- Castor on Call person
- Matthew
- Staff absence/out of the office:
- none