RAL Tier1 weekly operations castor 27/01/2014
From GridPP Wiki
Operations News
- Testing of 2.1.14 ongoing.
Operations Problems
- Caltech are having problems writing a file into CASTOR. CASTOR reports that the file is already present, although there is no evidence that it exists.
- The xroot daemon that runs on the backup CMS transfermanager node failed on Tuesday, logging a segfault in /var/log/messages. The CMS CASTOR instance was down for around 20 minutes. A look back through the logs suggests that this has happened several times over the last six months. Puppet is to be adjusted so that it notifies us when it restarts a service.
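The look back through the logs for earlier xroot segfaults could be repeated with a small script. The following is a minimal sketch only, not the procedure actually used: the process name `xrootd`, the function name `find_segfaults` and the assumed kernel message layout (`<name>[<pid>]: segfault at ...`) are illustrative assumptions and should be checked against the real entries in /var/log/messages.

```python
import re

# Assumed layout of a kernel segfault report in syslog:
#   "<timestamp> <host> kernel: <name>[<pid>]: segfault at <addr> ..."
# This pattern is an assumption; adjust it to match the real log entries.
SEGFAULT_RE = re.compile(
    r"kernel:.*?\b(?P<name>[\w.-]+)\[(?P<pid>\d+)\]: segfault at "
)


def find_segfaults(lines, process="xrootd"):
    """Return the log lines that report a segfault of the given process.

    `lines` is any iterable of strings, for example an open log file.
    `process` defaults to "xrootd", which is an assumed daemon name.
    """
    hits = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match and match.group("name") == process:
            hits.append(line.rstrip("\n"))
    return hits
```

Passing an open file handle for /var/log/messages (and any rotated copies) to `find_segfaults` would list the matching entries, whose timestamps can then be compared with the times Puppet restarted the service.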
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
- none
Advanced Planning
Tasks
- CASTOR 2.1.14 + SL5/6 testing
- iptables to be installed on lcgcviewer01 to harden the logging system against the injection of junk data by security scans.
- Quattor cleanup process. First step is to deal with 200-odd servers in 'misc'.
Interventions
- none
Staffing
- CASTOR on-call person
- Matt
- Staff absence/out of the office: