RAL Tier1 weekly operations castor 09/12/2013

From GridPP Wiki
Revision as of 21:23, 6 December 2013 by Matt viljoen

Operations News

  • Stress testing from the batch farm (based on xrootd) has started against 2.1.14-5. No major issues found yet.
  • The 2.1.14 logging browser prototype is now working with pure rsyslog+Logstash (i.e. no HBase/HDFS).
  • Testing of the new SHA-2 certificates (personal and host) against CASTOR preprod has confirmed that everything works.

Operations News (ongoing)

  • We have decreased the CASTOR overhead to 1% on 5 production disk servers. We are waiting for them to fill up and will monitor them closely before rolling the change out to all disk servers.

Operations Problems

  • A small number (of order 10^2) of ATLAS files were identified as lost during their mass rename, but none recently.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

  • none

Advanced Planning

Tasks

  • CASTOR 2.1.14 + SL5/6 testing

Interventions

  • none

Staffing

  • CASTOR on-call person:
    • Matthew
  • Staff absence/out of the office:
    • (Mon/Tue) Shaun, Rob, Bruno, Chris, Tim at CASTOR F2F at CERN
    • (Mon/Tue) Matthew working from home; available for operations support.
    • (Thu AM) Matthew A/L
    • (Fri) Matthew A/L