Difference between revisions of "RAL Tier1 weekly operations castor 20/07/2015"
From GridPP Wiki
Revision as of 15:45, 20 July 2015
Operations News
- Proposed CASTOR face to face W/C Oct 5th or 12th
Operations Problems
- CMS still upset. We have asked them to define exactly why their jobs are slow.
- Brian and Shaun investigating double putstart problem
- The gridmap file on the webdav host lcgcadm04.gridpp.rl.ac.uk is not auto-updating - needed for lhcb and vo.dirac.ac.uk
Blocking Issues
- GridFTP bug in SL6 - stops any globus copy if a client is using a particular library. This is a show stopper for SL6 on disk servers.
Planned, Scheduled and Cancelled Interventions
- Stress test SRM, possibly deploy the week after (Shaun)
Advanced Planning
Tasks
- Proposed CASTOR face to face W/C Oct 5th or 12th
- Discussed CASTOR 2017 planning, see wiki page.
Interventions
Staffing
- Castor on Call person next week
  - Rob
- Staff absence/out of the office:
  - Rob out Monday afternoon
  - Chris out Wed morning
Actions
- Shaun to modify cleanlostfiles to log to syslog so we can track its use
- Shaun to look into GC improvements - notify if file in inconsistent state
- Shaun to replace canbemigr test - to test for files that have not been migrated to tape (warn 8h / alert 24h)
- Rob/Jens to look at information provider re DiRAC (reporting disk only etc)
- All to book meeting with Rob re draining / disk deployment / decommissioning ...
- Rob to look into procedural issues with CMS disk server interventions
- Bruno to document processes to control services previously controlled by puppet
- Gareth to arrange meeting castor/fab/production to discuss the decommissioning procedures
- Gareth to investigate providing checks for /etc/noquatto on production nodes & checks for fetch-crl - ONGOING
- Rob to remove Facilities disk servers from cedaRetrieve to go back to Fabric for acceptance testing.
- Rob to get jobs thought to cause CMS pileup
- Bruno to put SL6 on preprod disk
- Bruno / Rob to write change control doc for SL6 disk
- Shaun testing/working gfalcopy rpms
- Someone to find out what access protocol MICE use
Completed actions
- Rob/Gareth to write some new docs to cover oncall procedures for CMS with the introduction of unscheduled xroot reads
- Rob/Alastair to clarify what we are doing with 'broken' disk servers
- Gareth to ensure that there is a ping test etc to the atlas building