Monday 8th October 2018, 14.30 BST
42 Open UK Tickets this month.
Last update 4/10. There has been some recent conversation on this ticket; the servers are v6-configured but things are not quite right - perhaps with the v6 routing?
Last update 4/10. Some positive news, with hopefully some v6 addresses rolling out soon - but that old bugbear of v6 DNS being a problem is showing up again. Ben finishes his update by asking whether the firewall rules remain the same between v6 and v4.
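As a general aside (my sketch, not from the ticket): v6 firewall rules usually mirror the v4 ones rule-for-rule, with one important difference - ICMPv6 must not be blocked wholesale, since IPv6 depends on it for neighbour discovery and path MTU discovery. Something along these lines (port 2811 is the standard GridFTP control port; the rules themselves are illustrative):

```shell
# v4: allow established traffic and the GridFTP control channel
iptables  -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables  -A INPUT -p tcp --dport 2811 -j ACCEPT

# v6: same shape as the v4 rules, plus the essential ICMPv6 allowance -
# blocking ICMPv6 breaks neighbour discovery and PMTU discovery
ip6tables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
ip6tables -A INPUT -p tcp --dport 2811 -j ACCEPT
ip6tables -A INPUT -p ipv6-icmp -j ACCEPT
```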
Last update 10/7. Any news on this front? There seemed to be a lot of exasperation in the last, short post back in July.
Last update 25/9. Some good news here, with the Cambridge perfsonars dual-stacked and added to the mesh. Duncan noticed some low throughput for the v6 traffic, but they are otherwise working.
Last update 27/8. At the last update in August Mark mentioned that their Central I.T. was waiting on some shiny new infrastructure before they could provide v6 DNS. Has a timescale for rolling this out appeared in the last 6 weeks? Pressure definitely needs to be applied, I think.
Last update 4/6. Steve and Co were waiting on new switches so that their v6 performance wouldn't be terrible, plus there were internal negotiations going on. Any news on any of this?
Last update 10/9 (from Duncan). Any news at all on this? The perfsonar was dual-stacked but, as Duncan pointed out, there's no v6 DNS.
Last update 13/7. Kashif once again mentioned v6 DNS being a blocker. Any progress in pressuring them?
Last update 4/9. It's not been that long since your last update where your move to Plan B (or is it Plan C, D, or Z?) was mentioned. Any news in that short space of time?
Last (proper) update 16/1. Any news here? Things seemed really positive for a while, but that was 2 seasons ago. Really needs an update.
Last (proper) update 25/4. The perfsonars are dual-stacked, but no news on the storage (again due to v6 DNS problems, IIRC). Another ticket that really needs an update.
Last update: 6/9. There have been some IPv6 misadventures at ECDF, but a lot of effort has been put into getting things working. Any luck getting your pool nodes dual-stacked (or finding out when you'll be able to do this)?
Last update 10/7. Things were looking up for a while in the last update, but I take it from the silence since then that things haven't made much progress?
Back to the regular tickets, site-by-site as is the tradition.
A very fresh CMS ticket for transfer failures to RALPP. Assigned (8/10)
A t2k ticket, where a user noticed that you can upload a zero-sized file but then cannot download it. I'm not sure how relevant this is, but Chris can replicate it with the Imperial SE and reckons it's a dCache "feature". It might be that this will go unresolved. In progress (26/9)
Request to upgrade the Glasgow perfsonars. With the release of 4.1 Gareth is working on it, and would like to build the new perfsonars using the docker images. Duncan has suggested giving it a go with the perfsonar-testpoint image to see how things go. In progress (21/9)
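For reference, trying out the testpoint container is a fairly low-effort experiment. A sketch along the lines of the standard perfSONAR Docker instructions (host networking is needed so the measurement tools see the real interfaces; the container name is my choice):

```shell
# Pull and run the perfSONAR testpoint image with host networking,
# so throughput/latency tests use the host's actual network stack
docker pull perfsonar/testpoint
docker run -d --net=host --name perfsonar-testpoint perfsonar/testpoint

# Check the measurement tools are available inside the container
docker exec perfsonar-testpoint pscheduler troubleshoot
```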
A ROD ticket for failed SRM tests; Rob notes it was likely caused by some storage services on a disk server falling over. Being fixed (and indeed the tests are working now). In progress (8/8)
The Durham request to upgrade perfsonar. Adam has put upgrading onto their todo list in the last update. In progress (26/9)
Atlas transfer failures to Sheffield. Acknowledged by the site, but have you had any luck tackling the issue? From today's update by the DDM shifters it looks like it's ongoing. In progress (8/10)
Atlas spotted that SRM space reporting at Manchester was broken. Robert set them straight - it was due to a bug in a draining script moving data outside of the tokens. Fixing this is a slow process; Robert estimated on the order of weeks. On hold (20/9)
Liverpool's SE not working for biomed, due to there being no space left on the communal area. Liverpool have a spacetoken for biomed, but it was going unused. John and Stephan helped the VO with how to query these. I suspect this ticket can be closed soon. In progress (4/10)
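For anyone else needing to do this, gfal2's CLI can interrogate SRM space tokens. A sketch (the endpoint hostname and path here are illustrative placeholders, not Liverpool's real SE):

```shell
# Illustrative SRM endpoint - substitute the real SE hostname/port
ENDPOINT="srm://se.example.ac.uk:8446/srm/managerv2?SFN=/"

# List the space tokens visible to your proxy
gfal-xattr "$ENDPOINT" spacetoken

# Show the metadata (total/used/free space) for a named token
gfal-xattr "$ENDPOINT" "spacetoken.description?BIOMED"
```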
Low availability ticket for Lancaster. Being tough to get a clear 30 days due to a collection of downtimes and the Lancaster SE playing up a bit. On hold (8/10)
APEL accounting ticket for QM's slurm batch system. Lots of discussion and a related APEL ticket (118969). I'm not sure if there's much further input to be had from the site for now? In progress (12/9)
A t2k ticket complaining that QM data access is slow. One of several tickets tackling a known issue with the QMUL StoRM (particularly SRM). Dan helpfully provided a bunch of alternatives and suggestions to help the user - so useful they should be documented! In progress (14/9)
A fresh ROD ticket - all SE-based tests... Assigned (8/10)
LHCb having file access problems at QM (although I think file metadata access problems would be more exact). The ticket mentions a database move - did you get round to this? In progress (18/9)
LHCb FTS transfer problems - Dan notes that a rack had power problems which required physical intervention. The rack is back on, so hopefully transfers will work again. In progress (8/10)
An Atlas ticket for the same issues. In progress (8/10)
A request to install singularity from CMS. Dan mentioned right at the start that this would be part of their CentOS7 move. Is this on the horizon? On hold (17/4)
CMS production job stage outs from Brunel to I.C. failing. Daniela cannot reproduce this by hand even though the problems persist for CMS. An environment to test this out by hand has been provided so Raul could try it out on a WN directly. In progress (5/10)
CMS noticed a few transfers failing - a pool node had fallen over and then had filesystem troubles. All fixed now, so we're in the "wait and see if things go green" stage. Waiting for reply (8/10)
Loosely related to 137468 (this ticket uncovered that issue), CMS stageout failures at Brunel. I didn't quite follow the thread, but diagnostics were being run over the weekend. Did they reveal anything? In progress (5/10)
LHCb data transfer problems at Brunel. A lack of information has made Raul's job of debugging this difficult. Today Vladimir responded with something that could help a bit. In progress (8/10)
CMS xroot config change ticket. In July a multi-point plan was laid out, how goes it? In progress (3/7)
100IT have a ticket: 137306
Orphaned ticket: 136687
I think this ticket, regarding third-party http transfers and the FTS, can be closed - I'm not sure anyone's looking at it.
THE TIER 1
A ROD ticket due to bdii problems causing SRM test failures. The issues are known about and being worked through, but as of the last check on Friday the problems persisted. In progress (5/10)
Atlas seeing poor transfer efficiency to tape and disk at RAL. Tim narrowed the errors down to a pair of sources. One of the sources (TRIUMF) has spotted the cause at their side (a v6 networking issue I failed to fully understand), it may or may not be a similar problem for EELA-UTFSM. In progress (5/10)
LHCb noticing a high (5%) background failure rate for jobs at RAL. The theory is a network issue or a problem with Castor. Waiting on the submitter to get back from his hols. Waiting for reply (24/9)
LHCb stuck FTS transfers. There's been a long break in looking at this, waiting on Catalin to get back from a well-earned break. On hold (1/10)
A t2k ticket about 0-sized files, asking how to deal with them in the LFC (where they seem to have a bunch of these). It appears to be unrelated to the recent LFC issues. There's some discussion at the Tier 1 about what to do. In progress (25/9)
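If the decision is simply to purge the catalogue entries, the standard LFC client tools should cover it. A hedged sketch (the LFC host and paths below are illustrative assumptions, not taken from the ticket):

```shell
# Assumption: point the clients at your LFC instance
export LFC_HOST=lfc.example.ac.uk

# Long-list a directory - the size column shows 0 for the suspect entries
lfc-ls -l /grid/t2k.org/some/dir

# Remove a single catalogue entry. Note this only touches the catalogue;
# if a physical replica exists on storage, lcg-del removes both together.
lfc-rm /grid/t2k.org/some/dir/zero-sized-file
```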
A fresh CMS ticket regarding some transfer failures to METU (although the error could be at either end). In progress (8/10)
The old ECHO gridftp test ticket - it's looking now like a simple authorisation problem for the test's robot certificate - so maybe we're nearly there fixing this. In progress (8/10)