RAL Tier1 weekly operations castor 22/04/2016
From GridPP Wiki
Latest revision as of 09:34, 6 May 2016
Agenda:
1. Problems encountered this week
- CMS - AAA
- LHCb job failure
2. Upgrades/improvements made this week
- gfal
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
- 2.1.15
- Progress
- Planning
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operations News
- New MICE user set up
Operations Problems
- gfal-cat does not work with CASTOR. The underlying issue has been fixed for gfal-copy but not for gfal-cat (gfal developers responsible) - tracking
- AtlasScratch: ATLAS users are still having problems accessing atlasScratch files - investigations ongoing
- GDSS771 crashed - now draining
- Draining is not working for ATLAS (it does, however, seem to work for LHCb). Brian changed the parameters as recommended by Shaun, with no improvement. The manual method of draining still works: diskServerLs and stager_get (to move a file to another disk server)
Planned, Scheduled and Cancelled Interventions
- CASTOR 2.1.15
Long-term projects
- RA has produced a Python script to handle the SRM DB duplication issue, which is causing callouts. The script has been tested and will now be put into production as a cron job. This should be a temporary fix, so a bug report should be made to the FTS development team, via AL.
- JJ - Glue 2 for CASTOR, used for publishing information. RA is writing the data-getting end in Python; JJ is writing the Glue 2 end in LISP. No schedule as yet.
Advanced Planning
Tasks
- CASTOR 2.1.15 implementation and testing
- Deployment of SRM 2.14
Staffing
- All in
- Castor on Call person next week
New Actions
- GS to ask Kashif about RAID firmware updates on d0t1 v2011 machines, and whether there are other batches of machines that should be upgraded
Existing Actions
- BD check if D drives have arrived for WLCG
- BD report draining issues to CERN
- BD to work with George P (new hire) to hand over the WAN tuning work
- RA/AS new tool for monitoring srm db dups - the user type
- RA to get someone to code review his SRM_DB_DUPLICATES blatting script
- BD MICE ticket - asking for a separate tape pool for d0t1 for Monte Carlo
- GS is there any documentation re handling broken CIPs (raised following CIP failure at weekend)
- GS Callout for CIP only in waking hours?
- RA ensure quattorising atlas consistency check - Rob to talk to Andrew L
- RA to try stopping tapeserverd mid-migration to see if it breaks - ask Tim.
- RA (was SdW) to modify cleanlostfiles to log to syslog so we can track its use - under testing
- GS to investigate how/if we need to declare xrootd endpoints in GOCDB BDII - progress
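For the cleanlostfiles action above (logging to syslog so its use can be tracked), the general pattern might look like the sketch below. This is only an illustration using Python's standard syslog module. The minutes do not say what language cleanlostfiles is written in or what arguments it takes, so the function names, the message layout, and the choice of the LOG_USER facility are all assumptions.

```python
import getpass
import syslog


def build_audit_message(user, args):
    """Return a one-line record of who ran the tool and with which arguments.

    The "user=... args=..." layout is a hypothetical format, not the real one.
    """
    return "user=%s args=%s" % (user, " ".join(args))


def log_invocation(args, user=None, ident="cleanlostfiles"):
    """Send one audit line to the local syslog and return the line sent."""
    if user is None:
        user = getpass.getuser()
    message = build_audit_message(user, args)
    # LOG_PID adds the process id to each entry; LOG_USER is an assumed facility.
    syslog.openlog(ident, syslog.LOG_PID, syslog.LOG_USER)
    syslog.syslog(syslog.LOG_INFO, message)
    syslog.closelog()
    return message
```

Calling log_invocation(sys.argv[1:]) once at start-up would leave one line per run in the system log, which could then be searched by the ident string to see how often the tool is used and by whom.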