RAL Tier1 weekly operations castor 01/04/2016

From GridPP Wiki

Operations News

  • NSS patching on Tier1 was successful
  • Eight 2014-generation disk nodes were deployed to atlasStripInput to ensure the 2016-17 storage pledges are met.
  • CERN (Steve Murray) suggested deploying 2.1.16 to the tape servers.

Operations Problems

  • Draining is not working for ATLAS (although it does seem to work for LHCb). Is this due to file count?
  • The transfermanager on the ATLAS dlf was not performing TM tasks but was reporting itself as up.
  • 2.1.15: problems with the configuration required for production to solve slow file open times. Andrey reports that CERN use 100GB of memory for the CASTOR DB servers to run 2.1.15 (vs our 32GB). Oracle are not providing adequate support at the moment, so 2.1.15 deployment will not be scheduled for now.

Planned, Scheduled and Cancelled Interventions

  • CASTOR 2.1.15


Long-term projects

  • RA has produced a Python script to handle the SRM DB duplication issue that is causing callouts. The script has been tested and will now be put into production as a cron job. This should be a temporary fix, so a bug report should be made to the FTS development team, via AL.
  • JJ: Glue 2 for CASTOR, used for publishing information. RA is writing the data-getting end in Python; JJ is writing the Glue 2 end in LISP. No schedule as yet.
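
For reference, the kind of duplicate detection such a clean-up script performs can be sketched as below. This is only an illustration, not RA's script: the row layout and the field names `id` and `surl` are hypothetical, and a real version would read from and write to the SRM database rather than work on in-memory dictionaries.

```python
from collections import defaultdict


def find_duplicates(rows, key_fields):
    """Group rows by the given key fields and return only groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


def rows_to_remove(duplicates, order_field):
    """For each duplicate group, keep the row with the smallest order_field and return the rest."""
    doomed = []
    for group in duplicates.values():
        ordered = sorted(group, key=lambda row: row[order_field])
        doomed.extend(ordered[1:])
    return doomed


if __name__ == "__main__":
    # Hypothetical sample data: 'id' and 'surl' are illustrative field names only.
    sample = [
        {"id": 1, "surl": "srm://example/a"},
        {"id": 2, "surl": "srm://example/a"},
        {"id": 3, "surl": "srm://example/b"},
    ]
    dups = find_duplicates(sample, ["surl"])
    for row in rows_to_remove(dups, "id"):
        # Dry run: report what would be deleted instead of deleting it.
        print("would remove", row)
```

Keeping detection separate from removal makes a dry run easy, which may help with the code review requested under New Actions.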

Advanced Planning

Tasks

  • CASTOR 2.1.15 implementation and testing
  • Deployment of SRM 2.14

Staffing

  • CASTOR on-call person next week
    • RA


New Actions

  • AS & RA to chase MB about the new DB hardware
  • RA to write change control document for new SRM version.
  • RA/BD to experiment with draining problems
  • RA to get someone to code review his SRM_DB_DUPLICATES blatting script.

Existing Actions

  • BD: MICE ticket asking for a separate tape pool for d0t1 for Monte Carlo
  • GS: is there any documentation on handling broken CIPs? (Raised following the CIP failure at the weekend.)
  • GS: callout for CIP only in waking hours?
  • BD to clarify if separating the DiRAC data is a necessity
  • BD to ensure the ATLAS consistency check is quattorised
  • BD re. WAN tuning proposal - discuss with GS, does it need a change control?
  • RA to try stopping tapeserverd mid-migration to see if it breaks - ask Tim.
  • RA (was SdW) to modify cleanlostfiles to log to syslog so we can track its use - under testing
  • GS to investigate how/if we need to declare xrootd endpoints in GOCDB BDII - in progress
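
As an illustration of the cleanlostfiles action above, a minimal way for a Python tool to record each invocation in syslog is sketched below. It is not the actual cleanlostfiles change: the function names are hypothetical, and the `/dev/log` socket path is an assumption that holds on typical Linux hosts.

```python
import logging
import logging.handlers


def make_audit_logger(name="cleanlostfiles", handler=None):
    """Return a logger that sends messages to syslog (or to a supplied handler)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if handler is None:
        # Assumption: the local syslog daemon listens on /dev/log (usual on Linux).
        handler = logging.handlers.SysLogHandler(address="/dev/log")
    # Prefix each message with the tool name so it is easy to grep for.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def log_invocation(logger, user, argv):
    """Record who ran the tool and with which arguments."""
    logger.info("invoked by user=%s args=%s", user, " ".join(argv))


if __name__ == "__main__":
    import getpass
    import sys

    audit = make_audit_logger()
    log_invocation(audit, getpass.getuser(), sys.argv[1:])
```

Passing the handler in as a parameter lets the logging be exercised in testing without a syslog daemon.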