RAL Tier1 weekly operations castor 20/09/2013

Operations News

All cmsTape disk servers scheduled for redeployment have been removed from CASTOR
ATLAS have modified the timing of their deletion scripts for ATLASSCRATCHDISK to circumvent the timeout problems observed.
- The underlying cause of these is still not understood
Kernel and errata updates have been performed on preprod, with Bruno and Chris working on vcert
Rob believes he has a solution to stop SRMs logging into older files, but it needs a kernel update and reboot.
- Postponed until next week

There was a dblink error observed on production (as seen before on facilities)
- Problem was resolved in the same manner with an (almost) minimal amount of downtime
- This should also be applied to the standby (John to investigate)
ATLAS hammercloud tests have shown large but intermittent failure rates; the cause is under investigation (Alastair)
Brian believes there are about 1.5M files in scratch disk which are dark
- Shaun to start off a namespace dump of scratch disk
No progress on HBASE logging
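
For the dark-file item above, a minimal sketch of how a namespace dump could be compared against a catalogue listing. It assumes "dark" means files present in the storage namespace but not registered in the experiment catalogue, and that both inputs are available as one path per line. The function name and sample paths are hypothetical and are not taken from any CASTOR or ATLAS tool.

```python
def find_dark_files(namespace_paths, catalogue_paths):
    """Return paths present in the namespace dump but absent from the catalogue.

    Assumption: both inputs are iterables of path strings, one path each.
    """
    catalogue = {p.strip() for p in catalogue_paths if p.strip()}
    namespace = {p.strip() for p in namespace_paths if p.strip()}
    # Set difference: everything on storage that the catalogue does not know about.
    return sorted(namespace - catalogue)


# Hypothetical sample data, not real CASTOR or ATLAS paths.
namespace_dump = [
    "/example/scratch/file_a",
    "/example/scratch/file_b",
    "/example/scratch/file_c",
]
catalogue_listing = [
    "/example/scratch/file_a",
]

dark = find_dark_files(namespace_dump, catalogue_listing)
print(len(dark), "dark files")
```

With real dumps the same function could be fed open file handles, since iterating a file yields one line at a time; for millions of entries the two sets should still fit in memory on a typical server, though that is an assumption about path lengths.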

Entries in/planned to go to GOCDB

Tasks

Interventions

Castor on Call person
- Rob
Staff absence/out of the office:
- (Mon-Thu) Matt at RDA
- (Wed-Thu) Shaun at EUDAT