RAL Tier1 weekly operations castor 14/04/2014
Operations News
- The NN_FILE_STAGERTIME constraint has been removed for the Facilities CASTOR database, completing the 2.1.14 upgrade. The upgrade was expected to be transparent, but some daemons did not reconnect, TM and VMGR in particular. This was fixed by restarting the services.
- The 2.1.14 upgrade has been repeated on Preprod, this time with the NS Compatibility flag enabled, as it will be on Tier 1 when we do staggered upgrades across the instances after the initial NS upgrade.
- The xrootd timeout in castor.conf is now set to 30s for all nodes.
Operations Problems
- A 2.1.14 bug was uncovered by Facilities, where the DiskManager timeout (set to 2 min) prevented recalled files from being returned to users. We have disabled this timeout.
- gdss673 failed after draining and has been removed from CASTOR for Fabric intervention.
- An ATLAS user caused a callout by specifying an incorrect space token on write.
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB: none
Advanced Planning
Tasks
- ATLAS would like to store c. 2 million EVNT Monte Carlo files – Brian to discuss with Alastair. Other Tier 1s are not keen, but RAL Tier 1 / CASTOR should be able to cope with this.
- CASTOR 2.1.14 for Tier 1
Interventions
Staffing
- CASTOR on-call person
- Rob
- Staff absence/out of the office:
- (Mon) Chris A/L
- (Mon-Tues) Matt A/L
- (Mon-Thu) Shaun A/L