RAL Tier1 weekly operations castor 14/04/2014

Operations News

The NN_FILE_STAGERTIME constraint has been removed for the Facilities CASTOR database, completing the 2.1.14 upgrade. This upgrade was thought to be transparent, but some daemons didn't reconnect, TM and VMGR in particular. This was fixed by restarting the services.
The 2.1.14 upgrade has been repeated on Preprod, this time with the NS Compatibility flag enabled, as it will be in Tier 1 when we do staggered upgrades across the instances after the initial NS upgrade.
The xrootd timeout in castor.conf is now set to 30s for all nodes.

A 2.1.14 bug was uncovered by Facilities in which the DiskManager timeout (set to 2 min) prevented recalled files from being returned to users. We've disabled this timeout.
gdss673 failed after draining and has been removed from CASTOR for Fabric intervention.
An ATLAS user caused a callout by specifying an incorrect space token on write.

Entries in/planned to go to GOCDB: none

Tasks

ATLAS would like to store approximately 2 million EVNT Monte Carlo files – Brian to discuss with Alastair. Other Tier 1s are not keen, but the RAL Tier 1 / CASTOR should be able to cope with this.
CASTOR 2.1.14 for Tier 1

Interventions

CASTOR on-call person
- Rob
Staff absence/out of the office:
- (Mon) Chris A/L
- (Mon-Tues) Matt A/L
- (Mon-Thu) Shaun A/L