RAL Tier1 weekly operations castor 24/02/2014

From GridPP Wiki
Revision as of 15:59, 24 February 2014 by Rob appleyard (Talk | contribs)

Operations News

  • We have now increased the number of threads on the ATLAS Stager and Transfermanager (in addition to the NS) in an attempt to reduce occurrences of the "threads busy with CASTOR" errors we see on the SRMs. The number of errors has gone down, but it is unclear whether this is due to the change or simply to lighter load.
  • T2K's tape recall problems have been fixed by some adjustments to CASTOR settings and an increased timeout on the transfers.
  • The new disk server generation will be deployed into preprod for CASTOR testing in the next week.

Operations Problems

  • We had some CMS SUM test failures between Tuesday and Thursday, which are believed to be due to load on the disk servers.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB


Advanced Planning

Tasks

  • CASTOR 2.1.14 + SL5/6 testing. The change control went through today with few problems.
  • iptables to be installed on lcgcviewer01 to harden the logging system against the injection of junk data by security scans.
  • Quattor cleanup process is ongoing.
  • Installation of new preprod headnodes.

Interventions

  • none

Staffing

  • CASTOR on-call person
    • Matthew
  • Staff absence/out of the office:
    • Matt A/L (Thu)