RAL Tier1 weekly operations castor 10/12/2012

From GridPP Wiki

Operations News

  • The new rsyslog configuration has now been tested and works with non-rsyslog logs (e.g. xrootd, nsd), which means that once it is rolled out we can turn off backups on the headnodes.
  • The tape verification script has now been tested and works at RAL. It is a tape equivalent of Shaun's checksumValidator script for disk servers.
  • A new CIP is ready for testing. It fixes the bug whereby some service classes wrongly report an UNDEFINED path in CASTOR.
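
For readers unfamiliar with what checksum validation of this kind involves, the following is a minimal sketch only. It is not the tape verification script or the checksumValidator script, neither of which is reproduced here. It assumes the recorded checksum is an Adler-32 value stored as a hex string; the function names adler32_of_file and verify_file are hypothetical.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string."""
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running checksum back in so large files are read in pieces
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")


def verify_file(path, expected_hex):
    """Return True if the recomputed checksum matches the recorded one.

    expected_hex is assumed to be the checksum recorded when the file was
    written (hypothetical input; where it is stored is site-specific).
    """
    return adler32_of_file(path) == expected_hex.strip().lower().zfill(8)
```

A verification pass would call verify_file for each file read back from a tape or disk server and report any mismatch for follow-up.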

Operations Problems

  • (Mon) Poor performance on the ATLAS stager. Stats were rebuilt, but this caused numerous locking sessions. These persisted after the stats rebuild was halted and cleared only when the node hosting the ATLAS stager was restarted.
  • (Tue) There appeared to be a transient network failure lasting ~5 minutes around 07:55, which affected batch, transfers and the CASTOR DB.

Blocking Issues


Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning


  • Simplify and document Quattor templates to make them easier to maintain
  • Test and certify 2.1.13-5 with simplified Quattor templates


  • Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS,CUPV,VDQM) from 2.1.11 to 2.1.13


  • CASTOR on-call person:
    • Matthew
  • Staff absence/out of the office:
    • (Mon) Matthew A/L
    • (Mon-Wed) Chris at SDB user meeting, The Hague
    • (Mon-Wed) Brian at ATLAS Jamboree, CERN
    • (Thu-Fri) DS Group Away Day, DL