RAL Tier1 weekly operations castor 22/04/2016

Agenda:

1. Problems encountered this week

  CMS - AAA
  LHCb job failure

2. Upgrades/improvements made this week

  gfal

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  2.1.15 
  Progress
  Planning

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operations News

gfalcat does not work with CASTOR. The underlying issue has been fixed for gfalcopy but not for gfalcat (the gfal developers are responsible for the fix) - tracking
atlasScratch: ATLAS users are still having problems accessing atlasScratch files - investigations ongoing
GDSS771 crashed - now in draining
Draining is not working for ATLAS (it does, however, seem to work for LHCb). Brian has changed parameters as recommended by Shaun, with no improvement. The manual method of draining still works: diskServerLs and stager_get (to move a file to another disk server)

RA has produced a Python script to handle the SRM DB duplication issue that is causing callouts. The script has been tested and will now be put into production as a cron job. This should be a temporary fix, so a bug report should be made to the FTS development team via AL.
JJ – Glue 2 for CASTOR, used for publishing information. RA is writing the data-getting end in Python; JJ is writing the Glue 2 end in LISP. No schedule as yet.

Tasks

GS to ask Kashif about RAID firmware updates on d0t1 v2011 machines and whether there are other batches of machines that should be upgraded

BD to check whether D drives have arrived for WLCG
BD to report draining issues to CERN
BD to work with George P (new hire) to hand over the WAN tuning work
RA/AS new tool for monitoring srm db dups - the user type
RA to get someone to code review his SRM_DB_DUPLICATES blatting script
BD MICE ticket - asking for a separate tape pool for d0t1 for Monte Carlo
GS is there any documentation on handling broken CIPs? (raised following the CIP failure at the weekend)
GS Callout for CIP only in waking hours?
RA to ensure the ATLAS consistency check is quattorised - Rob to talk to Andrew L
RA to try stopping tapeserverd mid-migration to see if it breaks - ask Tim.
RA (was SdW) to modify cleanlostfiles to log to syslog so we can track its use - under testing
GS to investigate how/if we need to declare xrootd endpoints in GOCDB BDII - progress
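
Note on the cleanlostfiles syslog task: a minimal sketch of how a tool could record each invocation in syslog so its use can be tracked. This is not the actual cleanlostfiles change. It assumes a Python tool, the usual Linux /dev/log socket, and a hypothetical key=value message format; the function names and the example target are illustrative only.

```python
import getpass
import logging
import logging.handlers


def format_audit_message(user, action, target):
    """Build a single-line key=value audit record (hypothetical format)."""
    return "user=%s action=%s target=%s" % (user, action, target)


def make_syslog_logger(name="cleanlostfiles", address="/dev/log"):
    """Return a logger that forwards records to the local syslog daemon.

    The /dev/log socket path is the usual Linux default; it is an
    assumption here and may differ on other systems.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.handlers.SysLogHandler(address=address)
        # Prefix each record with the tool name so it is easy to grep for.
        handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Record who ran the tool and against what (example values only).
    log = make_syslog_logger()
    log.info(format_audit_message(getpass.getuser(), "run", "example-target"))
```

Keeping the message on one line with fixed keys means usage can later be counted with a simple grep over the syslog files.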