RAL Tier1 weekly operations castor 10/12/2012
From GridPP Wiki
- The new rsyslog configuration has been tested and works with non-rsyslog logs (e.g. xrootd, nsd), so once it is rolled out we can turn off backups on the headnodes.
- The tape verification script has been tested and works at RAL. It is the tape equivalent of Shaun's checksumValidator script for disk servers.
- A new CIP that fixes the bug whereby some service classes wrongly report an UNDEFINED path in CASTOR is ready for testing.
- (Mon) Poor performance on the ATLAS stager. Stats were rebuilt, but this caused numerous locking sessions, which persisted after the stats rebuild was halted and cleared only when the node hosting the ATLAS stager was restarted.
- (Tue) There appeared to be a transient network failure lasting ~5 minutes around 07:55, which affected batch, transfers and the CASTOR DB.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB: none
- Simplify and document Quattor templates to make them easier to maintain
- Test and certify 2.1.13-5 with simplified Quattor templates
- Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS,CUPV,VDQM) from 2.1.11 to 2.1.13
- Castor on Call person
- Staff absence/out of the office:
- (Mon) Matthew A/L
- (Mon-Wed) Chris at SDB user meeting, The Hague
- (Mon-Wed) Brian at ATLAS Jamboree, CERN
- (Thu-Fri) DS Group Away Day, DL