Monday 6th November 2017, 15.00 GMT
43 Open UK Tickets this month.
There are 15 "IPv6 deployment at WLCG Tier-2 Sites" tickets; only 5 have been acknowledged so far, but then the tickets did land on a Friday. We can discuss these in the meeting.
Storage Accounting Deployment
These were the other big blob of tickets to land in the last week, for Oxford, Birmingham, Durham, Manchester and Brunel. The Brunel ticket mentions the latest monitoring page for this:
http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html
The 23 remaining regular tickets, site by site:
SUSSEX
122772 (11/7/16)
The webdav/xroot ticket - after rebuilding the system from scratch and getting help from Dan it looks like xroot still isn't playing ball. The last update has a few questions in it that could do with some storage experts weighing in. In progress (18/10)
RALPP
131328 (25/10)
CMS "low hammercloud xroot success rate" ticket. Chris has been working hard on this, but is left in need of some answers looking at his last post. Waiting for reply (30/10)
131565 (2/11)
A CMS ticket for local stageout failures, due to the "unmerged" area filling up by the looks of it. Chris increased the size of this area, and asked some questions on quotas. Waiting for reply (2/11)
130264 (28/8)
Biomed ticket about invalid publishing from the RALPP CEs. The problem seems to have fixed itself despite no joy on the matching Brunel ticket (130263) and Chris not doing anything. Chris asks if Biomed wants to still track the issue. Waiting for reply (30/10)
OXFORD
129931 (4/8)
Failing atlas http SAM tests at Oxford. Have you tried to upgrade your DPM yet? On hold (19/9)
CAMBRIDGE
130787 (28/9)
LHCB jobs dying at Cambridge. John tried to fix the problem by upping the CPU limit, but the ticket needs feedback from LHCB. Waiting for reply (6/11)
BRISTOL
131590 (3/11)
A Friday ticket from CMS, regarding network links or something (I still don't speak CMS). Assigned (3/11) Update - the two issues are likely related, plus Lukasz sees some other errors and is tidying things up. In progress.
131641 (6/11)
A fresh CMS ticket, that could be related to the previous one - this is about connection problems killing transfers ("connection limit exceeded" errors). Assigned (6/11) Reducing the number of simultaneous connections and allowing the database to catch up fixed things. Solved.
BIRMINGHAM
129930 (4/8)
Failing atlas http SAM tests at Birmingham. Mark put in a handy update this morning, with plans to reinstall in the next couple of weeks to see if that helps. Thanks Mark! On hold (6/11)
SHEFFIELD
131472 (31/10)
Atlas transfers having trouble at Sheffield. Elena notes that she is working on balancing disk servers, which we all know is slow work. In progress (31/10)
MANCHESTER
131171 (18/10)
Atlas VAC jobs failing at Manchester - this has been discussed heavily on lists and in the atlas uk meetings, and I think some conclusions have been made? Or did I dream that? In progress (24/10)
LIVERPOOL
131623 (4/11)
Atlas deletion error ticket - Steve is on it, the problem being on one of the disk servers. In progress (6/11) Update - solved. The server is back from the brink.
QMUL
130262 (28/8)
Biomed complaining that the QM Storm SE is publishing invalid glue2 data. Daniel spotted that this is due to storm not publishing glue2 data at all. Biomed wants to know if there's a ticket about this - perhaps we should suggest they submit one? In progress (27/10)
IMPERIAL
(who win kudos for already closing their IPv6 tickets)
131126 (16/10)
Debugging CMS job problems. After some digging it looks like the jobs are having problems accessing files that they should be able to access without issue. A mystery. Waiting for reply (is this still the right status?) (2/11) Update - Daniela couldn't replicate the issue so rightly closed the ticket.
131663 (6/11)
A fresh ticket from Brian, asking to check on the status of a file. Assigned (6/11) Solved with the file declared lost
BRUNEL
130263 (28/8)
The other Biomed publishing ticket, waiting on the ARC devs to patch the patch that Raul has been trying out. On hold (13/10)
TIER 1
131652 (6/11)
Jobs failing with gfal-copy errors, although Brian hasn't been able to replicate them. In progress (6/11)
131213 (19/10)
CMS having issues with fallback requests to RAL, tracked down to some dodgy xrootd servers which were restarted. Waiting on hearing if things are fixed. Waiting for reply (23/10)
131299 (24/10)
CMS Hammercloud failure ticket. A hint has been dropped that the error message might be due to root CAs being off on the ECHO server. Has this been checked? In progress (24/10)
130949 (6/10)
CMS transfers failing to RAL disk, the root problem caused by there being no room on the disk servers! Chris has been helping, finding about 100TB unaccounted for. Any luck generating those file lists? In progress (25/10)
130207 (24/8)
A MICE ticket regarding Castor. I think all the issues have been solved; this ticket is just being left open whilst "new" disk servers are freed up to go into the disk pool. How goes that process? On hold (25/10)
127597 (7/4)
A CMS request to check the RAL networking. Gareth updated the ticket a few weeks back with news on the state of the firewall. On hold (5/10)
124876 (7/11/16)
ECHO gridftp ops tests failing, due to the tests not having the right path in them. Alastair has poked the ticket to get the tests fixed (125026). On hold (25/10)
117683 (18/11/15)
Castor not publishing glue 2. This is being worked on slowly in the background, but the ticket could do with a quarterly update. On hold (6/7)