RAL Tier1 weekly operations castor 10/02/2014

From GridPP Wiki

Operations News

  • XROOT with GSI authentication is now enabled on Gen and has been successfully used by T2K.
  • The new headnodes for preprod are ready to be deployed.
  • Testing of CASTOR 2.1.14 is ongoing.

Operations Problems

  • We have a persistent problem on our SRMs where queries to the name server do not get a response. The incidence of these errors correlates very strongly with typical working hours (9am-5pm on weekdays). Investigation into the cause is ongoing.
  • We had a callout on the CMS SRMs on Saturday which was related to the ongoing FTS testing.
  • LHCb noticed a sudden rise in Input Data Resolution errors at RAL. The errors arose because a new user's DN had not been added to our grid-mapfiles. Investigation showed that the host certificate on lcgccvm02 had expired, which stopped the VOMS handshake when updating the grid-mapfiles. We have now implemented a Nagios certificate lifetime check on this box.
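
As an illustration of the certificate lifetime check mentioned above, here is a minimal sketch of a Nagios-style check in Python. It is not the check deployed on lcgccvm02. The function names, the 30-day and 7-day thresholds, and the use of the openssl command line tool to read the expiry date are all assumptions made for the example. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import ssl
import subprocess
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def days_remaining(enddate_line, now=None):
    """Return days until expiry from a line like
    'notAfter=Feb 10 12:00:00 2015 GMT' (as printed by
    'openssl x509 -enddate -noout')."""
    not_after = enddate_line.strip().split("=", 1)[1]
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400.0


def classify(days_left, warn_days=30, crit_days=7):
    """Map remaining lifetime to a Nagios status and message.
    The thresholds are hypothetical defaults, not site policy."""
    if days_left < 0:
        return CRITICAL, "CRITICAL - certificate expired %.1f days ago" % -days_left
    if days_left < crit_days:
        return CRITICAL, "CRITICAL - certificate expires in %.1f days" % days_left
    if days_left < warn_days:
        return WARNING, "WARNING - certificate expires in %.1f days" % days_left
    return OK, "OK - certificate valid for %.1f more days" % days_left


def check_cert_file(path, warn_days=30, crit_days=7):
    """Check a PEM certificate on disk; the path is supplied by the caller."""
    try:
        result = subprocess.run(
            ["openssl", "x509", "-enddate", "-noout", "-in", path],
            capture_output=True, text=True, check=True,
        )
        days_left = days_remaining(result.stdout)
    except (OSError, subprocess.CalledProcessError, IndexError, ValueError) as exc:
        # Anything we cannot interpret is reported as UNKNOWN, not OK.
        return UNKNOWN, "UNKNOWN - could not read certificate: %s" % exc
    return classify(days_left, warn_days, crit_days)
```

A wrapper script would call check_cert_file() with the path of the host certificate, print the returned message and exit with the returned code so that Nagios can pick up the status. Reporting unreadable certificates as UNKNOWN rather than OK is a deliberate choice here, so that a broken check does not hide an expired certificate.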

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB


Advanced Planning

Tasks

  • CASTOR 2.1.14 + SL5/6 testing
  • iptables to be installed on lcgcviewer01 to harden the logging system against the injection of junk data by security scans.
  • Quattor cleanup process is ongoing.
  • Installation of new Preprod headnodes

Interventions

  • none

Staffing

  • Castor on Call person
    • Matt
  • Staff absence/out of the office:
    • None