RAL Tier1 weekly operations castor 22/04/2016
From GridPP Wiki
Revision as of 09:32, 6 May 2016 by Rob Appleyard 7f7797b74a
Agenda:
1. Problems encountered this week
- CMS - AAA LHCb job failure
2. Upgrades/improvements made this week
- gfal
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
- 2.1.15 Progress Planning
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operations News
- New MICE user set up
Operations Problems
- gfalcat does not work with CASTOR. The underlying issue has been fixed for gfalcopy but not for gfalcat (gfal developers responsible) - tracking
- AtlasScratch: ATLAS users are still having problems accessing atlasScratch files - investigations ongoing
- GDSS771 crashed - now in draining
- Draining is not working for ATLAS (it does, however, seem to work for LHCb). Brian has changed the parameters as recommended by Shaun, with no improvement. The manual method of draining still works: diskServerLs, then stager_get (to move the file to another disk server)
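The manual draining fallback above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the procedure in use: it assumes the file list from diskServerLs has been saved as plain text with one CASTOR path per line (the real output format may differ), and that stager_get takes the file name via -M and the service class via -S, which should be checked against the local man page. The function name and the example paths are hypothetical. The sketch only builds the command lines and does not run them.

```python
def build_stager_get_commands(listing, service_class=None):
    """Turn a plain-text file listing into stager_get command lines (dry run)."""
    commands = []
    for line in listing.splitlines():
        path = line.strip()
        # Skip blank lines and anything that is not an absolute CASTOR path.
        if not path.startswith("/castor/"):
            continue
        cmd = ["stager_get", "-M", path]  # -M <file>: assumed flag
        if service_class:
            cmd += ["-S", service_class]  # -S <svcclass>: assumed flag
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical paths and service class, for illustration only.
    example = "/castor/example.org/atlas/file1\n\n/castor/example.org/atlas/file2\n"
    for cmd in build_stager_get_commands(example, "exampleSvcClass"):
        print(" ".join(cmd))
```

Keeping this as a dry run lets the generated commands be reviewed before anything is issued against a draining disk server.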
Planned, Scheduled and Cancelled Interventions
- CASTOR 2.1.15
Long-term projects
- RA has produced a Python script to handle the SRM DB duplication issue, which is causing callouts. The script has been tested and will now be put into production as a cron job. This should be a temporary fix, so a bug report should be made to the FTS development team, via AL.
- JJ – Glue 2 for CASTOR, used for publishing information. RA is writing the data-gathering end in Python; JJ is writing the Glue 2 end in LISP. No schedule as yet.
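For illustration, the general shape of a duplicate clean-up like the SRM DB one could look like the sketch below. This is not RA's script: the real SRM schema is not described in these minutes, so the table name entries, its columns id and name, and the use of an in-memory SQLite database as a stand-in are all assumptions. It finds values stored more than once and keeps only the lowest-numbered row for each.

```python
import sqlite3


def find_duplicates(conn):
    """Return (name, count) for every name stored more than once."""
    return conn.execute(
        "SELECT name, COUNT(*) FROM entries "
        "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
    ).fetchall()


def remove_duplicates(conn):
    """Keep the lowest id per name and delete the rest; return rows deleted."""
    cur = conn.execute(
        "DELETE FROM entries WHERE id NOT IN "
        "(SELECT MIN(id) FROM entries GROUP BY name)"
    )
    conn.commit()
    return cur.rowcount


def demo_connection():
    """Build an in-memory stand-in database with one duplicated name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO entries (name) VALUES (?)",
        [("alice",), ("bob",), ("alice",), ("alice",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(find_duplicates(conn))    # duplicated names with their counts
    print(remove_duplicates(conn))  # number of surplus rows removed
```

Reporting the duplicates separately from deleting them means a cron job could run the report step alone until the delete has been reviewed.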
Advanced Planning
Tasks
- CASTOR 2.1.15 implementation and testing
- Deployment of SRM 2.14
Staffing
- All in
- Castor on Call person next week
New Actions
- GS to ask Kashif re RAID firmware updates on d0t1 v2011 machines, and whether there are other batches of machines that should be upgraded
Existing Actions
- BD check if D drives have arrived for WLCG
- BD report draining issues to CERN
- BD to work with George P (new hire) to hand over the WAN tuning work
- RA/AS new tool for monitoring srm db dups - the user type
- RA to get someone to code review his SRM_DB_DUPLICATES blatting script
- BD MICE ticket - asking for a separate tape pool for d0t1 for Monte Carlo
- GS is there any documentation re handling broken CIPs? (raised following CIP failure at weekend)
- GS Callout for CIP only in waking hours?
- RA ensure quattorising atlas consistency check - Rob to talk to Andrew L
- RA to try stopping tapeserverd mid-migration to see if it breaks - ask Tim.
- RA (was SdW) to modify cleanlostfiles to log to syslog so we can track its use - under testing
- GS to investigate how/if we need to declare xrootd endpoints in GOCDB BDII - progress
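As a rough illustration of the cleanlostfiles action above (logging each use to syslog), the sketch below shows one way a tool can record its invocations. It assumes the tool is written in Python, which these minutes do not state; the function names, the message format and the example arguments are hypothetical. It uses the standard library's logging.handlers.SysLogHandler, which by default sends to localhost port 514 over UDP.

```python
import logging
import logging.handlers


def make_logger(handler=None):
    """Return a logger for the tool; defaults to syslog on localhost:514 (UDP)."""
    logger = logging.getLogger("cleanlostfiles")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    if handler is None:
        handler = logging.handlers.SysLogHandler()
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def log_invocation(logger, user, args):
    """Record who ran the tool and with which arguments."""
    logger.info("user=%s args=%s", user, " ".join(args))
```

A caller would create the logger once at start-up, for example make_logger(), and then call log_invocation(logger, "someuser", ["--example-arg"]) before doing any work, so that every run leaves a trace even if it fails later.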