RAL Tier1 weekly operations castor 27/01/2014

From GridPP Wiki
Revision as of 15:49, 27 January 2014 by Rob appleyard (Talk | contribs)


Operations News

  • Testing of CASTOR 2.1.14 is ongoing.

Operations Problems

  • Caltech are having problems writing a file into CASTOR. CASTOR reports that the file is already present, although there is no evidence that it exists.
  • The xroot daemon that runs on the backup CMS transfermanager node failed on Tuesday, with a segfault logged in /var/log/messages. The CMS CASTOR instance was down for around 20 minutes. A look back through the logs suggests that this has happened several times over the last 6 months. Puppet is to be adjusted so that it notifies us when it restarts a service.
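
The look back through the logs mentioned above could be repeated with a short script. The following is a minimal sketch only: it assumes the usual Linux syslog layout for kernel segfault messages, and the process name "xrootd", the host name and the sample lines are illustrative rather than taken from the actual RAL logs.

```python
import re
from collections import Counter

# Assumed shape of a kernel segfault entry in syslog, for example:
#   "Jan 21 10:15:32 node01 kernel: xrootd[12345]: segfault at 0 ip ..."
# The exact layout on the real node may differ.
SEGFAULT_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\S+\s+\S+\s+kernel:.*?"
    r"(?P<proc>[\w.-]+)\[\d+\]: segfault at"
)


def count_segfaults(lines, process):
    """Count segfault entries for `process`, grouped by (month, day)."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.match(line)
        if match and match.group("proc") == process:
            counts[(match.group("month"), int(match.group("day")))] += 1
    return counts


# Made-up sample lines, used only to exercise the function.
sample = [
    "Jan 21 10:15:32 node01 kernel: xrootd[12345]: segfault at 0 ip 0000003a1b2c3d4e "
    "sp 00007fff12345678 error 4 in libexample.so[3a1b200000+1000]",
    "Jan 21 10:16:01 node01 puppet-agent[2222]: Finished catalog run",
    "Nov  3 02:41:10 node01 kernel: xrootd[999]: segfault at 10 ip 0000000000400000",
    "Nov  3 05:00:00 node01 kernel: other[1]: segfault at 0 ip 0000000000400000",
]

segfaults_per_day = count_segfaults(sample, "xrootd")
```

In practice the lines would come from /var/log/messages and its rotated copies instead of the sample list. Syslog timestamps of this form carry no year, so entries from different years would need to be separated by file.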

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

  • none

Advanced Planning

Tasks

  • CASTOR 2.1.14 + SL5/6 testing
  • iptables to be installed on lcgcviewer01 to harden the logging system against the injection of junk data by security scans.
  • Quattor cleanup process. First step is to deal with 200-odd servers in 'misc'.

Interventions

  • none

Staffing

  • CASTOR on-call person
    • Matt
  • Staff absence/out of the office: