RAL Tier1 weekly operations castor 20/09/2013
From GridPP Wiki
Operations News
- All cmsTape disk servers scheduled for redeployment have been removed from CASTOR
- ATLAS have modified the timing of their deletion scripts for ATLASSCRATCHDISK to circumvent the timeout problems observed.
- The underlying cause of these is still not understood
- Kernel and errata updates have been performed on preprod, with Bruno and Chris working on vcert
- Rob believes he has a solution to stop SRMs logging into older files but needs a kernel update and reboot.
- Postponed until next week
Operations Problems
- There was a dblink error observed on production (as seen before on facilities)
- Problem was resolved in the same manner with (almost) minimal downtime
- This should also be applied to the standby (John to investigate)
- ATLAS HammerCloud tests have shown large but intermittent failure rates; the cause is under investigation (Alastair)
- Brian believes there are about 1.5M files in scratch disk which are dark
- Shaun to start off a namespace dump of scratch disk
- No progress on HBASE logging
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
- none
Advanced Planning
Tasks
- CASTOR 2.1.14 + SL6 testing, once 2.1.14 is released.
Interventions
- none
Staffing
- CASTOR on-call person
- Rob
- Staff absence/out of the office:
- (Mon-Thu) Matt at RDA
- (Wed-Thu) Shaun at EUDAT