Difference between revisions of "Past Ticket Bulletins 2018"

Revision as of 14:17, 12 November 2018

Monday 5th November 2018, 14.00 GMT
41 Open UK Tickets this month

SUSSEX
138071 (2/11)
A fresh ticket from atlas about SRM problems. The lack of links in the ticket made it hard for Leo to debug, and he has asked for clarification. Waiting for reply (2/11)

BIRMINGHAM
138026 (31/10)
A ticket concerning the alice VOBOX at Birmingham. It looks like the problem went away on its own and this ticket can be closed, but it appears that there will be other conversations to have about alice needs at Birmingham at a later date. In Progress (can be closed) (3/11)

137801 (17/10)
A Birmingham related ticket rather than a ticket for the site, tracking the decommissioning of their DPM SE. I can't quite remember how things are properly done, but shouldn't this be put On Hold until the 28th November? In progress (22/10)

BRISTOL
138041 (1/11)
A CMS ticket concerning failing transfers. Lukasz has traced the problem to the files being on disk but not in the namespace, and emailed the dpm support list for help fixing this (if a fix is possible). In progress (5/11)

OXFORD
137941 (25/10)
Sno+ had problems accessing data on the Oxford SE due to BDII issues. Kashif fixed the IPv6 routing problems that were the cause of these, and things are working once again. Another ticket that can be closed. In progress (30/10) Update - closed

GLASGOW
134689 (23/4)
Request to upgrade Perfsonar boxes to CentOS7. Gareth gave his plan and (good) reasons why they won't be able to do this just yet at Glasgow - getting v6 working comes first. On hold (30/10)

ECDF
137985 (29/10)
Atlas deletion errors at Edinburgh. Andy reckons this is a consistency problem as the system tries to delete files that aren't there anymore, and has asked whether it's lots of different file deletion attempts failing or the same few deletion attempts failing repeatedly. I used to have a dodgy bash script that could help with that (by working on the downloaded xml from the DDM pages), but I don't think it made it off our old SE I'm afraid. Waiting for reply (1/11)
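
The long-lost bash script isn't reproduced here, but the question Andy asked (many different files failing, or the same few failing over and over) boils down to counting failures per file. A minimal Python sketch, assuming the failed paths have already been pulled out of whatever the DDM pages export; the function name and the sample paths are made up for illustration:

```python
from collections import Counter


def summarise_deletion_failures(failed_paths, top=3):
    """Summarise failed deletion attempts.

    Returns (total attempts, number of distinct files, the `top` most
    frequently failing files with their counts).
    """
    counts = Counter(failed_paths)
    return len(failed_paths), len(counts), counts.most_common(top)


# Hypothetical sample input: one entry per failed deletion attempt.
failed = [
    "/dpm/example/atlas/file_a",
    "/dpm/example/atlas/file_a",
    "/dpm/example/atlas/file_b",
    "/dpm/example/atlas/file_a",
]

attempts, distinct, worst = summarise_deletion_failures(failed)
print(f"{attempts} failed attempts across {distinct} distinct files")
for path, n in worst:
    print(f"{n:4d}  {path}")
```

If the number of distinct files is close to the number of attempts it's lots of different files; if a handful of paths dominate the count it's the same few failing repeatedly.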

DURHAM
134687 (23/4)
Request to upgrade Perfsonar to CentOS7. It was mentioned verbally that this has been postponed to be part of the "CentOS 7 Big Push" early next year, could that be put into the ticket? Be aware that Perfsonar support on SL6 will end soon though (we're already in the "grace period"). In progress (26/9)

SHEFFIELD
137732 (15/10)
One of the ROD availability tickets, waiting for the time to pass. There's been a good stretch of green 100%s in the argo monitoring, so things are looking good. On hold (15/10)

138095 (5/11)
Another ROD ticket, this is for the "APEL-Pub" tests. Set In Progress (5/11) Solved - test gone green

MANCHESTER
137112 (11/9)
An atlas ticket about Manchester's Space Token numbers being broken after trouble with a draining script moving data outside of the tokens. The process to move them back was expected to take weeks. How are things looking now? Tim provided some figures from rucio a few weeks back, but that picture might be out of date now. On Hold (16/10)

LANCASTER
136635 (9/8)
A very long running low availability ticket caused by issues with Lancaster's SE. The recent problems were caused by the CertLifetime check not working for a while whilst the move to DOME was underway, returning an "Unknown" status. As every other aspect of our DPM worked in that time I've requested a recomputation, which may or may not be a bit cheeky. On Hold (5/11)

137996 (30/10)
Another ROD ticket (sorry ROD shifters), this time failing a non-critical http test. The issue has been tracked to a problem in the DOME code and a fix will hopefully be out this month. Until then... On Hold (5/11)

QMUL
134573 (17/4)
CMS request to install singularity, on hold until the QM move to CentOS7. CMS has re-poked the ticket, asking again for the site's plans. On Hold (31/10) Update - thanks for the, err, update.

BRUNEL
133956 (9/3)
CMS ticket regarding Brunel's xroot configs. Raul has done some work involving DOME in the background, perhaps some of that progress could be used to update the ticket? In progress (16/10)

TIER 1
138033 (1/11)
Atlas singularity jobs failing at RAL, with some reference to similar issues for SKA. It's being looked at, and Tim has provided some extra observations. In progress (1/11)

137650 (9/10)
CMS seeing low HC xroot success rates at RAL. Lots of back and forth on the ticket, I don't think a conclusion has been reached yet though. In progress (2/11)

138077 (2/11)
CMS SAM tests failing at RAL. Things seem to have healed themselves, but John has asked some team members to check the logs. In progress (5/11)

138103 (5/11)
CMS transfers failing - the cause looks to be a zero-sized "stub file" causing issues, and it's being investigated. In progress (5/11)

138002 (29/10)
CMS problems with the FTS, with a lot of sites seeing "bad transfer quality". Investigations pointed to an IPv6 problem that has since been fixed. However Gareth couldn't see an endemic issue with the RAL FTS whilst looking through the plots, and has asked for clarification. Waiting for reply (5/11) Update - closed, the bad periods were too short in timescale to show up on the plots.

137897 (23/10)
enmr.eu jobs have zero normalised CPU hours in the accounting portals. It seems to have been a problem with the data the site reported. Catalin has asked the VO if anything changed around the 16th of October, and to resubmit some more jobs so they can watch them. Waiting for reply (5/11)

137752 (15/10)
A request to replicate the OSG cvmfs repositories on the EGI stratum 1s. These have been replicated to the RAL servers, so I don't know what the next steps are. In progress (2/11)

136199 (18/7)
One of a few LHCB FTS tickets, it looks like the work here has progressed nicely so maybe this ticket can be closed? Or is it waiting on all LHCB FTS issues to be solved? In progress (1/11) Update - this ticket has been closed, the other issues are tracked elsewhere.

137822 (18/10)
FTS servers seemingly in a bad state for LHCB. I think this is being worked on, but no news in this particular ticket. In progress (22/10)

138028 (1/11)
LHCB noticing files cannot be staged from tape to disk. The issue is somewhat understood (there is a copy of the file on disk elsewhere), but it's unclear why the resulting disk-to-disk transfer fails so the ticket is being kept open. In progress (1/11)

136701 (14/8)
LHCB ticket regarding the high background rate of failures putting job output into RAL. I don't think a conclusion has been reached, the last update has Chris from LHCB collecting some more stats and saying that LHCB hope to be using direct xroot connections in the not too distant future. In progress (17/10) Update - closed as per a conversation at last week's Tier 1 Liaison meeting.

137153 (12/9)
T2K having trouble with zero sized files in the LFC. The LFC devs have been contacted for help, but that was a few weeks back. Any news from them? In progress (10/10)

Monday 29th October 2018, 15.00 GMT
32 Open UK Tickets this week.

Ding Dong, the Ticket is Dead
124876
I'm happy to announce the solving of our oldest ticket, this Tier 1 ROD ticket from 2016. Thanks for keeping at it guys!

RHUL
131603 (3/11/17)
The only IPv6 ticket I'll look at this week (although thanks to everyone who has updated their ticket in the last week). Simon reports that RHUL are farming the job of v6 DNS out to JANET. Could this be an option for others to suggest to their IT departments? In progress (29/10)

GLASGOW and DURHAM
134689
134687
These two "please upgrade your perfsonar boxes" tickets are a bit neglected. I think there's been work on these, just nothing reported on the tickets.

QMUL
134573 (17/4)
This CMS request to install singularity could do with an update now that Summer has been and gone, even if it's just a new hopeful date for deployment. On hold (17/4)

BRISTOL
137789 (16/10)
This CMS transfer failure ticket had a good start, but no news on it for a few weeks. It was looking positive in the last update though, so maybe it's solved? In Progress (17/10)

TIER 1
137153 (12/9)
Did the LFC experts come back with a solution to this T2K user's problem? In Progress (10/10)

137822
136199
Both these tickets concern LHCB and the FTS - is there any joy with either of them?

Monday 22nd October 2018, 14.30 BST
37 Open UK Tickets this week.

BIRMINGHAM
137801 (17/10)
For information only, the ticket charting the decommissioning of Birmingham's DPM SE. In Progress (17/10)

LANCASTER
136635 (9/8)
Lancaster is having no luck shifting this availability alarm due to SRM CertLifetime checks erroring constantly, causing 100% "Unknown" status. Has anyone seen this before? Could we blag that it's a problem with the check? On hold (8/10)

TIER 1
137195 (14/9)
Any luck fixing the Castor LDIFs to clear these ROD alarms? (I don't seem to have permission to check the links to the tests myself). In progress (10/10)

136199 (18/7)
LHCB FTS ticket. Last update was a prod from Christophe last week asking if there's any news. In progress (17/10)

IPv6 Tickets
(with reference to the IPv6 site status table)

RALPP: 131616
No update for a long while on either the ticket or the table after some promising progress. In Progress (31/1)

OXFORD: 131615
Last update on the ticket back in July describes v6 as the blocker, is this still the case? On Hold (13/7)

CAMBRIDGE: 131614
Some nice progress with the site perfsonar boxen dual-stacked. The table is also up to date. Nice. In progress (8/10)

BRISTOL: 131613
Some recent discussion with CMS and Duncan, xroot works for v6 but srm/gridftp doesn't. I think steps have been made to fix this (from an entry in the table) but I'm not sure what they were! In progress (11/10)

BIRMINGHAM: 131612
Mark's update back in August painted a familiar picture of central IT dragging their heels. As ALICE consider v6 a necessary thing Mark has been asked to keep up the pressure. No recent update on the table. On hold (27/8)

GLASGOW: 131611
Some good progress here, with some v6-ining (or dual-stacking) of the perfsonar boxes and a hope to be able to dual-stack the storage by the end of the year. Duncan spotted a few problems (that might be ip6table related), but I understand Gareth was off last week. The table entry is up to date too, sterling stuff. In progress (12/10)

EDINBURGH: 131610
Dual-stacking has caused a lot of interesting times at ECDF. Rob gave a good update on their situation last month, and the table is up to date. Interesting that v6 traffic won't go through the JANET link for the site (it must have at one point, right?). Any recent news about dual-stacking your disk nodes? In Progress (10/9)

DURHAM: 131609
Adam provided an update last week, with (again) reverse IPv6 DNS being a sticking point (despite v6 traffic otherwise flowing fine). Adam's expected timeframe for being able to do a full v6 rollout is mid-2019. Whilst it's depressing it's good to know this ahead of time. No update on the table since 2015 though, but looking at the last entry there you could just change the date. In progress (should be on hold?) (17/10)

SHEFFIELD: 131608
Things looked a bit hopeful with work in this area (re-)scheduled to start in July, but it looks like things stalled. Any news? The table mirrors the last update. On hold (10/7)

MANCHESTER: 131607
The perfsonars are dual-stacked but IIUI v6 reverse DNS is again the sticking point that prevents dual-stacking the storage. The table entry is more up to date than the ticket. On hold (25/4)

LIVERPOOL: 131606
The Liver-admins were waiting on new routers needed to sort out the site's v6 problems back in June. Any news at all since? Bad news is better than no news in this case. In progress (On hold it?) (4/6)

UCL: 131604
An update today that Ben is hoping to reinstall and dual-stack their perfsonar this week. As they have no storage that will be them sorted if it all goes to plan. In progress (22/10)

RHUL: 131603
Perfsonars are dual-stacked, but the v6 reverse lookup is a problem again which is a problem for storage. Table and ticket are up to date, but has this work been properly handed over to Antonio and Simon? In progress (10/9)
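
As reverse v6 DNS keeps cropping up as the sticking point, here's a rough Python sketch of what is being asked of central IT: the PTR name an address needs under ip6.arpa, and a quick check of whether a reverse lookup currently resolves. It only uses the standard library (ipaddress and socket); the helper names are made up for illustration, and the address shown is from the 2001:db8::/32 documentation range rather than any site's real allocation.

```python
import ipaddress
import socket


def reverse_name(address):
    """Return the PTR record name (ip6.arpa or in-addr.arpa) for an address."""
    return ipaddress.ip_address(address).reverse_pointer


def has_reverse_dns(address):
    """Return True if a reverse lookup for the address resolves to a name."""
    try:
        socket.gethostbyaddr(address)
        return True
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return False


# Documentation-range address, not a real host.
print(reverse_name("2001:db8::1"))
```

Calling has_reverse_dns() on a storage node's v6 address from outside the site gives a quick yes/no on whether the PTR record is visible yet.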

Tuesday 16th October 2018, 10.30 BST

33 Open UK Tickets today - down 10 from last week!

Just a few tickets light up:

TIER 1
124876 (7/11/2016)
This ancient ROD ticket really needs some input. I think the problem might be on the ECHO side now, as I suspect the changes to the batch environment have been rolled out (which caused the recent change in error message). In Progress (23/7) (There was a later update, but that was by me so I'm not counting it).

LIVERPOOL
137458 (28/9)
I would say that this biomed ticket has been successfully dealt with by the site. You've given them a terabyte of space and the instructions to use and monitor it, what more could a VO want? In Progress (4/10)

RALPP
137361 (24/9)
Any word back from the dcache mailing list about this zero sized file feature? It might be that this is a ticket that we just need to "unsolve" and move on from. In progress (26/9)

IPv6 Tickets
I'll go over all the v6 tickets next week, so you have something to look forward to. Kudos to Glasgow for dual-stacking their perfsonars this last week though! Please can sites update their tickets and the IPv6 status table if they haven't done so in the last ~6 weeks, and if you've updated one but not t'other can you make sure they "align": https://www.gridpp.ac.uk/wiki/IPv6_site_status


Monday 8th October 2018, 14.30 BST
42 Open UK Tickets this month.

IPv6 Tickets
BRISTOL: 131613
Last update 4/10. There is some recent conversation on this ticket, servers are v6 configured but things are not quite right - perhaps with the v6 routing?

UCL: 131604
Last update 4/10. Some positive news with hopefully some v6 addresses rolling out soon - but that old bugbear of v6 DNS being a problem is showing up again. Ben finishes his update asking if firewall rules remain the same between v6 and v4.

DURHAM: 131609
Last update 10/7. Any news on this front? There seemed to be a lot of exasperation in the last, short post back in July.

CAMBRIDGE: 131614
Last update 25/9. Some good news here, with the Cambridge perfsonars dualstacked and added to the mesh. Duncan noticed some low throughputs for the v6 traffic, but they are otherwise working.

BIRMINGHAM: 131612
Last update 27/8. At the last update in August Mark mentioned that their Central I.T. was waiting on some shiny new infrastructure before they could provide v6 DNS. Has a timescale on getting this rolled out appeared in the last 6 weeks? Pressure definitely needs to be applied I think.

LIVERPOOL: 131606
Last update 4/6. Steve and Co were waiting on new switches so that their v6 performance wouldn't be terrible, plus there were internal negotiations going on. Any news on any of this?

RHUL: 131603
Last update 10/9 (from Duncan). Any news at all on this? The perfsonar was dual-stacked but as Duncan pointed out no v6 DNS.

OXFORD: 131615
Last update 13/7. Kashif once again mentioned v6 DNS as a blocker. Any progress pressuring them?

GLASGOW: 131611
Last update 4/9. It's not been that long since your last update where your move to Plan B (or is it Plan C, D, or Z?) was mentioned. Any news in that short space of time?

RALPP: 131616
Last (proper) update 16/1. Any news here? Things seemed really positive for a while, but that was 2 seasons ago. Really needs an update.

MANCHESTER: 131607
Last (proper) update 25/4. The perfsonars are dualstacked, but no news on the storage (again due to v6 DNS problems IIRC). Another ticket that really needs an update.

ECDF: 131610
Last update 6/9. There have been some IPv6 misadventures at ECDF, but a lot of effort has been put into getting things working. Any luck on getting your pool nodes dualstacked (or finding out when you'll be able to do this)?

SHEFFIELD: 131608
Last update 10/7. Things were looking up for a while in the last update, but I take it from the silence since things haven't made much progress?

Back to the regular tickets, site-by-site as is the tradition.

RALPP
137633 (8/10)
A very fresh CMS ticket for transfer failures to RALPP. Assigned (8/10)

137361 (24/9)
A t2k ticket, where a user notices that you can upload a zero-sized file but you cannot then download it. I'm not sure why this is relevant, but Chris can replicate with the Imperial SE and reckons it's a dcache "feature". It might be that this will go unresolved. In progress (26/9)

GLASGOW
134689 (23/4)
Request to upgrade the Glasgow perfsonars. With the release of 4.1 Gareth is working on it, and would like to build the new perfsonars using the docker images. Duncan has suggested giving it a go with the perfsonar-testpoint image to see how things go. In progress (21/9)

ECDF
137627 (8/8)
A ROD ticket for failed SRM tests, Rob notes likely caused by some storages on a disk server falling over. Being fixed (and indeed the tests are working now). In progress (8/8)

DURHAM
134687 (23/4)
The Durham request to upgrade perfsonar. Adam has put upgrading onto their todo list in the last update. In progress (26/9)

SHEFFIELD
137491 (1/10)
Atlas transfer failures to Sheffield. Acknowledged by the site, but have you had any luck tackling the issue? From today's update by the DDM shifters it looks like it's ongoing. In progress (8/10)

MANCHESTER
137112 (11/9)
Atlas spotted that SRM space reporting at Manchester was broken. Robert set them straight - it was due to a bug in a draining script moving data outside of the tokens. Fixing this is a slow process, which Robert estimated to be of the order of weeks. On hold (20/9)

LIVERPOOL
137458 (28/9)
Liverpool's SE not working for biomed, due to there being no space left on the communal area. Liverpool have a spacetoken for biomed, but it was going unused. John and Stephan helped the VO with how to query these. I suspect this ticket can be closed soon. In progress (4/10)

LANCASTER
136635 (9/8)
Low availability ticket for Lancaster. It's proving tough to get a clear 30 days due to a collection of downtimes and the Lancaster SE playing up a bit. On hold (8/10)

QMUL
132929 (18/1)
APEL accounting ticket for QM's slurm batch system. Lots of discussion and a related APEL ticket (118969). I'm not sure if there's much further input to be had from the site for now? In progress (12/9)

137180 (13/9)
A t2k ticket complaining about the QM data access being slow. One of several tickets tackling a known issue with the QMUL STORM (particularly SRM). Dan helpfully provided a bunch of alternatives and suggestions to help the user - so useful it should be documented! In progress (14/9)

137631 (8/10)
A fresh ROD ticket - all SE based tests... Assigned (8/10)

136719 (15/8)
LHCB having file access problems at QM (although I think file metadata access problems would be more exact). The ticket mentions a database move, did you get round to this? In progress (18/9)

137622 (8/10)
LHCB FTS transfer problems - Dan notes that a rack had power problems which required physical intervention. The rack is back on so hopefully transfers will work again. In progress (8/10)

137617 (7/10)
An atlas ticket for the same issues. In progress (8/10)

134573 (17/4)
A request to install singularity from CMS. Dan mentioned right at the start that this would be part of their CentOS7 move. Is this on the horizon? On hold (17/4)

IMPERIAL
137468 (28/9)
CMS production job stage outs from Brunel to I.C. failing. Daniela cannot reproduce this by hand even though the problems persist for CMS. An environment to test this out by hand has been provided so Raul could try it out on a WN directly. In progress (5/10)

137352 (24/9)
CMS noticed a few transfers failing - a pool node had fallen over and then had filesystem troubles. All fixed now, so we're in the wait and see if things go green stage. Waiting for reply (8/10)

136687 (28/9)
Loosely related to 137468 (this ticket uncovered that issue), CMS stageout failures at Brunel. I didn't quite follow the thread, but diagnostics were being run over the weekend. Did they reveal anything? In progress (5/10)

137451 (28/9)
LHCB data transfer problems at Brunel. A lack of information had made Raul's job debugging this difficult. Vladimir responded with something that could help a bit today. In progress (8/10)

133956 (9/3)
CMS xroot config change ticket. In July a multi-point plan was laid out, how goes it? In progress (3/7)

100IT have a ticket: 137306

Orphaned ticket: 136687
I think this ticket regarding third party http transfers and the FTS can be closed, I'm not sure anyone's looking at it.

THE TIER 1
137195 (14/9)
A ROD ticket due to bdii problems causing SRM test failures. The issues are known about and being worked through, but at last check on Friday the problems persist. In progress (5/10)

137391 (25/9)
Atlas seeing poor transfer efficiency to tape and disk at RAL. Tim narrowed the errors down to a pair of sources. One of the sources (TRIUMF) has spotted the cause at their side (a v6 networking issue I failed to fully understand), it may or may not be a similar problem for EELA-UTFSM. In progress (5/10)

136701 (14/8)
LHCB noticing a high (5%) background failure rate for jobs at RAL. The theory is a network issue or a problem with Castor. Waiting on the submitter to get back from his hols. Waiting for reply (24/9)

136199 (18/7)
LHCB stuck FTS transfers. There's been a long break in looking at this, waiting on Catalin to get back from a well-earned break. On hold (1/10)

137153 (12/9)
A t2k ticket about 0-sized files, asking how to deal with them in the LFC (where they seem to have a bunch of these). It appears to be unrelated to the recent LFC issues. There's some discussion at the Tier 1 about what to do. In progress (25/9)

137634 (8/10)
A fresh CMS ticket regarding some transfer failures to METU (although the error could be at either end). In progress (8/10)

124876 (7/11/16)
The old ECHO gridftp test ticket - it's looking now like a simple authorisation problem for the test's robot certificate - so maybe we're nearly there fixing this. In progress (8/10)


Monday 24th September 2018, 15.00 BST
50 Open UK tickets this week.

VOMS 137342 (23/9)
T2K noticed a voms outage last night, which Robert fixed first thing. Just checking if things are back working for them now (and that little question of why they couldn't get proxies from the other two UK servers). Waiting for reply (24/9)

OXFORD/LHCB
136687 (13/8)
This ticket was originally created to track a side issue involving 3rd party transfers, but I've seen no chatter on it since the 17th of August (and that was me). Is it still relevant? In progress (17/8)

TIER 1 LFC issues fixed.
136884 (t2k)
137254 (ROD)
Just to point out some good progress: it looks like the LFC has been fixed. Darren has kept on top of prodding the tickets too.

TIER 1 FTS ticket
136199 (18/7)
This LHCB FTS ticket hasn't had any input in it since August, is the issue still an issue? In progress (7/8)

124876 (17/11/16)
Any news on this old ROD/ECHO/gridftp test ticket, the most ancient of tickets? It actually looks like the error message has changed, which is something. In progress (23/7)


Monday 3rd September 2018, 14.30 BST.
47 Open UK Tickets this month.

SUSSEX
131617 (3/11/17)
The site's IPv6 ticket. Good news from Leo today, with all external services dual-stacked (the perfsonar just needs adding to the mesh). Nice one! Waiting for reply (3/9)

RALPP
136958 (30/8)
t2k had their replications to ralpp timing out, but Chris noticed that they've run out of space. He's kindly trying to free them up a bit more room. In progress (31/8)

136927 (29/8)
CMS transfer failures, which look to be due to a bad file. It's been invalidated and a proper replica moved to the site. This should be just about done with? In progress (30/8)

131616 (3/11/17)
RALPP v6 ticket. Any updates? In progress (16/1)

OXFORD
131615 (3/11/17)
Oxford's v6 ticket. Last update was July, I suspect there's been no progress over the summer. On hold (13/7)

136687 (13/8)
A bit of an odd LHCB FTS ticket as it's intended to mirror the issue for a site rather than track an issue at a site. From my understanding these are known issues with third party http transfers? In progress (17/8)

CAMBRIDGE
131614 (3/11/17)
Cambridge's v6 ticket. Last update was back in June, any more news on the move to a new address block? On hold (5/6)

BRISTOL
131613 (3/11/17)
And Bristol's v6 ticket. Winnie kept us apprised of the situation back in July. On hold (16/7)

BIRMINGHAM
129930 (4/8/17)
The http ticket that kind of tracks the move to EOS now. I do wonder if it's worth keeping this ticket around much anymore, as there's an epic Jira ticket charting the migration. On hold (14/8)

131612 (3/11/17)
The Birmingham v6 ticket. Mark gave some not great news last month. Andrea has asked to turn up the pressure. Maybe we can help? On hold (27/8)

GLASGOW
134689 (23/4)
Request to upgrade perfsonar to C7. Perfsonar 4.1 is out now if you feel like revisiting this. On hold (14/8) Update - Gareth has come up with some questions for Duncan about how best to go about setting up a perfsonar for the mesh. Waiting for reply.

131611 (3/11/17)
Glasgow's v6 ticket. Any new (aspirational) plans that need to go into the ticket? On hold (26/2)

ECDF
131610 (3/11/17)
Just the old v6 ticket at ECDF. Any news since your misadventures back in May? On hold (28/5)

DURHAM
136909 (28/8)
Atlas deletion error ticket. The Durham guys are just back in the office and poking the ticket. In progress (3/9)

134687 (23/4)
The other outstanding request to upgrade the perfsonar host. In progress (14/8)

131609 (3/11/17)
Durham's v6 ticket. From the feel of the July update I'm going to hazard a guess that there's not been any recent progress. In progress (should be On Hold?) (10/7)

SHEFFIELD
136014 (10/7)
Atlas transfer errors, evolved to problems with the error message "job has been canceled because it stayed in the queue for too long". Is this an FTS-side error? Perhaps the SE is simply being overworked? In progress (30/8) Update - Elena reports that after some DPM config changes efficiency is up to 73%

131608 (3/11/17)
Sheffield's v6 ticket. How goes the work that was supposed to resume in July? On hold (10/7)

MANCHESTER
136976 (1/9)
A fresh atlas transfer error ticket. Robert found that their DPM's mysql database wasn't responding, but a restart should have fixed it (and indeed a peek at the monitoring shows this to be the case). In progress (1/9) Update - solved with transfers at 100% again.

131607 (3/11/17)
Manchester's v6 ticket. Any news on this since the April update? On hold (24/4)

LIVERPOOL
131606 (3/11/17)
Just the v6 ticket at Liverpool. Any news since the June update? In progress (4/6)

LANCASTER
136793 (20/8)
A ticket from snoplus as the Lancaster SE wasn't in the cern bdii. It looks like none of the Lancaster resources are in the cern bdii, even though other top bdiis know about this. A bit of a head scratcher. Has anyone else been "censored" (or censured) by the CERN bdii? In progress (3/9)

136635 (9/8)
A low availability ticket, not too far off being able to close it. On hold (9/8)

UCL
134686 (23/4)
Request to upgrade perfsonar. No news since the ticket was acknowledged. On hold (23/4)

131604 (3/11/17)
UCL's v6 ticket. There was a re-poking of the network team back in May but no news since. On hold (4/5)

RHUL
131603 (3/11/17)
Just the v6 ticket at Royal Holloway. How's it going? It looks like lack of v6 DNS was the problem here again. In progress (perhaps should be On Hold?) (6/2)

QMUL
136719 (15/8)
LHCB having file access problems (again?). Daniel thought it might be the SE misbehaving under load that's causing the problems. There was some testing, but I'm not sure of the conclusion. Waiting for reply (23/8)

136550 (4/8)
t2k having file access problems, with the root cause being the top bdii they were using being broken (sounds similar to what Lancaster has been seeing). It looks like the problem has gone away here though, so I think this ticket can be closed. In progress (14/8)

136714 (14/8)
The same t2k user having problems reliably copying files, but again this issue seems fixed. In progress (15/8)

136918 (28/8)
t2k not noticing downtime notices. It looks like this ticket can be closed too. In progress (3/9)

136178 (17/7)
It's a seemingly solved t2k ticket at Queen Mary that hasn't been closed by the user. You don't see many of those around. In progress (14/8)

136712 (14/8)
LHCB noticed they weren't running (many) jobs at QM. Dan explained why (all very reasonable). It looks to me this ticket is resolved. In progress (14/8)

136576 (6/8)
A low-availability ROD ticket after the cooling troubles. The A/R numbers are almost up to par. On hold (6/8) Update - closed by Kashif with his ROD hat on.

132929 (18/1)
APEL accounting for slurm ticket. Dan has been working on this, and has spread the new accounting scripts around his CEs and APEL box. In progress (3/9)

134573 (17/4)
Request from CMS to install singularity. Dan has it on his to do list, is the move to C7 still planned for the end of the Summer (i.e. soonish)? On hold (17/4)

BRUNEL
136806 (21/8)
CMS jobs having problems at Brunel. There was an interesting case where WNs lost v4 connectivity whilst maintaining v6 and thus were still able to get jobs, but the root cause looks to be problems with the xroot fallback mechanism. I think this might be above the site's metaphorical paygrade. In progress (31/8) Update - it's looking like the site-side issues are solved, just waiting on one last round of checks.

133956 (9/3)
A CMS ticket to reconfigure the site's xrootd configs. Postponed due to waiting on a move to C7/DOME. Have you made any progress with this? FYI we're planning on turning DOME on at Lancaster soonish. In progress (3/7)

TIER 1
136884 (27/8)
lcg-cr not working for t2k, an lfc ticket that's been ported over to RAL as it looks like their database is corrupted. In progress (29/8)

136840 (23/8)
A Sno+ ticket, which looks to be related to the LFC issues. In progress (29/8)

136942 (29/8)
t2k noticing timeouts copying ONLINE_AND_NEARLINE files at RAL. After investigation it led to the RALPP ticket above, and this ticket was left unclosed. In progress (can be closed) (30/8)

136701 (14/8)
LHCB would like to investigate the high background failure rate of jobs transferring their data out at RAL. A lot of back and forth on the ticket. Waiting for reply (3/9)

136967 (31/8)
CMS Phedex transfers from RAL to FNAL failing. Checking on it has been passed to the ECHO team. In progress (31/8) Update- solved after some files were declared lost.

136366 (25/7)
Removing MICE from the batch queues. It looks like submission has been successfully disabled. In progress (20/8)

136757 (17/8)
MICE VO voms configs missing from the LFC. This looks to be fixed (although the suspected database problems might interfere with stuff). In progress (21/8)

136028 (10/7)
CMS have issues reading files on ECHO, which looks to be a xroot problem (I couldn't follow the ticket). Chris B has put a lot of effort into this, and Brian Bockelman is roped into the ticket now. In progress (29/8)

136199 (18/7)
An LHCB ticket to the FTS team, progress on the ticket stalled nearly a month ago (have people been on holiday?). In progress (7/8)

124876 (17/11/16)
Getting ECHO gridftp ROD tests working. Things were looking quite good, but it looks like the ticket is waiting on a WN config change to be rolled out at RAL still? The tests are still broken. In progress (23/7)


Monday 13th August 2018, 15.00 BST

42 Open UK Tickets this month.

SUSSEX
131617 (3/11/17)
Only this IPv6 ticket at Sussex. End of August was the timescale for the IPv6 rollout given in the network planning document. How's that looking? In progress (4/6)

RALPP
136259 (20/7)
A t2k.org ticket with the user having some trouble with downloading files using the gfal tools. Chris gave some tips but no news from the user side. Any luck finding out what was up with the bdii info? If that seems sorted I reckon this ticket can be closed. In progress (20/7)

131616 (3/11/17)
RALPP's IPv6 ticket. This poor chap seems neglected and really could do with an update, or putting out of its misery. In progress (16/1)

OXFORD
131615 (3/11/17)
Again only the v6 ticket for Oxford. Kashif provided a recent (although sadly not very positive) update. On hold (13/7)

CAMBRIDGE
131614 (3/11/17)
Another v6 ticket, at last update (in June) John was waiting on the University to migrate to a new v6 block. Any news on this? No worries if there's not. On hold (5/6)

BRISTOL
131613 (3/11/17)
Yet another v6 ticket. In an update only last month Winnie restated the Bristol position, which is the unique state of having dual-stacked all the "hard stuff" and just waiting to be able to v6-ify the perfsonar. On hold (16/7)

BIRMINGHAM
129930 (4/8/17)
The atlas ticket that heralded the "Dawn of EOS" at Birmingham. I believe that there is work going on towards getting EOS working for atlas at the site, but I see no updates in the ticket.... Just hinting. On hold (23/4)

131612 (3/11/17)
And Birmingham's v6 ticket. Some positive(ish) news, with Mark waiting on the v6 DNS entries for the perfsonar. Any joy getting that done? On hold (28/5)

GLASGOW
134689 (23/4)
A request from Duncan to upgrade the site's perfsonar to CentOS7. Gareth was waiting on 4.1 to be released first - which at last check (29th June) was only in beta. On hold (24/4)

131611 (3/11/17)
v6 ticket. I don't really want to ask how it's all going... On hold (26/2)

EDINBURGH
136391 (26/7)
Atlas jobs failing at ECDF-RDF with lcg-cp not found. It appears that there's some confusion in the site's AGIS configs; Teng asked last week for some changes which he thinks need to be made. In progress (6/8)

131610 (3/11/17)
The ECDF IPv6 ticket. After a promising start the ticket is on hold waiting on updates from the networking team. On hold (21/5)

DURHAM
134687 (23/4)
Durham's request to update their perfsonar, they've adopted the Glasgow position on this matter, but might end up updated "by accident". In progress (3/7)

131609 (3/11/17)
And Durham's v6 ticket. In a recent update there was sadly no positive progress. In progress (10/7)

SHEFFIELD
136014 (10/7)
Atlas transfer errors at Sheffield. Elena cleaned the database, but although things looked better her end atlas still report troubles. I'm reminded of this setup hint that might help (setting control_idle_timeout to 7200). In progress (6/8)

136043 (11/7)
ROD low availability ticket. On hold (11/7)

131608 (3/11/17)
As you may have guessed, the Sheffield v6 ticket. Elena gave an update last month, although sadly the news wasn't very positive. On hold (10/7)

MANCHESTER
136631 (9/8)
Atlas seeing deletion errors. Alessandra has had to kick things a few times but things were looking good this morning. In progress (13/8)

131607 (3/11/17)
Manchester's v6 ticket. Perfsonar is v6'd, but IIRC it might be a while until the SE is dual-stacked due to v6 DNS issues. Any progress with this from your Central IT people? On hold (14/5)

LIVERPOOL
136667 (12/8)
Atlas deletion error ticket. John is away, but my recommendation is just check you have httpd running on all your pool nodes. In progress (13/8)

131606 (3/11/17)
The Liverpool v6 ticket. At last check not a sausage was to be heard about the new routers that will sort out the v6 performance issues seen by the site. In progress (4/6)

LANCASTER
136648 (10/8)
Atlas transfer ticket for Lancaster as the site recovers from a DPM headnode migration that was a little rough (to say the least). We might have fixed things (or at least made things less bad). In progress (13/8)

136635 (9/8)
ROD Availability ticket, another side effect of the DPM migration. On hold (9/8)

UCL
134686 (23/4)
Request to upgrade the site perfsonar hosts to C7. No news on this at all. On hold (23/4)

131604 (3/11/17)
UCL's v6 ticket. In a May update Ben noted he had re-poked UCL Networks. No news since - has anyone heard from Ben about anything? Just checking he's okay down there. On hold (4/5)

RHUL
131603 (3/11/17)
The only ticket for RHUL is their v6 ticket. Now that you are Post-Govind how are the site's v6 plans looking? In progress (6/2)

QMUL
136550 (4/8)
A ticket from t2k that I only just noticed didn't have the "notify site" field filled in. The user was having SE connection problems, but I think this coincided with other problems at the site, so maybe the issue is fixed now? Again the same user is seeing bdii problems in his gfal commands. Assigned (13/8)

136178 (17/7)
A kind of parent ticket to the above one, here the same t2k user was seeing timeout issues. The situation seems a little confusing. In progress (7/8)

136576 (6/8)
Availability ticket after the aircon failures. On hold (6/8)

136641 (9/8)
Atlas transfer failures after some load problems. Following the link it looks like the issues are cleared up for the last 48 hours. Dan notes in the ticket that QM might need to buy a new Storm headnode next time out. In progress (11/8)

134573 (17/4)
Request from CMS to install singularity. Planned for the end of Summer move to C7. Any news on that? On hold (17/4)

132929 (18/1)
Problems with APEL at QM. No news in the ticket since May. In progress (10/5)

BRUNEL
133956 (9/3)
Changing the CMS xroot config ticket. Raul is postponing the work until the move to C7/dome. In progress (3/7)

136649 (10/8)
CMS notes some possible missing files at Brunel, which Raul cannot find a record for in the last 4 weeks. Raul has asked for some extra details, as they've had phantom files at the site before. In progress (13/8)

TIER 1
136199 (18/7)
LHCB are seeing a lot of stuck transfers on the RAL FTS. In progress (7/8)

136655 (10/8)
LHCB spotted a missing file (with no replicas) at RAL. Sadly the file was dead, so I think this ticket can be closed. In progress (13/8)

136028 (10/7)
A CMS ticket with troubles reading files off of RAL disk - which I think are due to xroot problems (it's a very long ticket and I only had time to skim it). Chris has asked if there's a CMSSW expert that they can get involved. In progress (13/8)

136358 (25/7)
CMS xroot access problems from the RAL WN. On hold whilst other issues are cleared up. On hold (13/8)

136563 (6/8)
CMS spotted possibly corrupt files at RAL, the ticket has been referred to the ECHO team, but that was a week ago now. Of course there have been problems... In progress (6/8)

136665 (11/8)
CMS ticket for the site being down, the ticket is referred to the unscheduled downtime and other issues. In progress (12/8)

136366 (25/7)
A ticket from MICE asking for them to be removed from the batch system at RAL as the experiment winds down. Catalin asks if the removal can be tested to see if it fails as it should. Waiting for reply (8/8)

124876 (7/11/16)
The old Ops gridftp test ticket for ECHO. Waiting on a change to be rolled out to the RAL WNs to get the new tests working. In progress (28/6)

Monday 30th July 2018, 15.00 BST
41 Open UK Tickets this week.

MANCHESTER
136402 (26/7)
This atlas transfer error ticket looks like it hasn't been spotted. A quick peek at the DDM site shows that the errors are ongoing too. Assigned (26/7) Update - Alessandra noted that the ticket can be closed as the errors have abated.

QMUL
136414 (27/7)
Similarly a t2k ticket that seems to have been missed, although I suspect that was because Dan was busy with the air-conditioning problems and the resulting downtime. Once things are looking better and your SE is back up can you please field this ticket. Assigned (27/7)

136178 (17/7)
When things are less hectic this t2k ticket looks like it's done with (at least to my eyes). In progress (17/7)

ECDF
135976 (6/7)
We discussed this atlas deletion ticket last week in the storage meeting, could you chuck an update onto the end of it please. In progress (25/7)

SHEFFIELD
136014 (10/7)
Could this other atlas transfer failures ticket please get an update from the site? It's had a few updates from the atlas DDM shifters. In progress (27/7)

RALPP
136259 (20/7)
Another t2k user ticket noting trouble with using the gfal tools (he notes that lcg-utils still work). Any luck figuring out the BDII problems? In progress (20/7)

TIER 1
134685 (23/4)
Upgrading the Tier-1's perfsonar seems to have hit a snag getting the MESH to work. Darren asks if the features can be lived with and the ticket can be closed? Waiting for reply (30/7)

136199 (18/7)
LHCB FTS problem ticket. The user had a few questions in his last two posts that might have been missed, although I think things seem in hand. In progress (27/7)

Monday 23rd July 2018, 15.00 BST
41 Open Tickets this week.

IMPERIAL
136112 (14/7)
This atlas ticket is waiting on input from someone from the cloud squad to check on queue status. Elena jumped in front of that task, but was travelling last week. Any joy getting round to it Elena? Waiting for reply (18/7) Update - Elena has updated the queue information and the ticket- thanks!

SHEFFIELD
136014 (10/7)
An atlas ticket for Sheffield transfers, not meaning to pick on you Elena but any luck looking at this ticket too? In Progress (23/7)

ECDF
135976 (6/7)
Another atlas ticket, this one for deletion errors. It landed when all the Edinburgh admins were at CHEP, any luck tackling it now that (I assume) at least some of you are back on home soil? In progress (12/7)

BRISTOL
135925 (3/7)
This CMS transfer failure ticket was reopened after being closed by the VO, so I thought I'd bring it to your attention as it's easy to miss them when this happens. Reopened (20/7)

QMUL
136178 (17/7)
The age old story: a (t2k) user tried to use lcg-tools, Dan suggests they use gfal tools instead, gfal tools work for the user. Unless Dan wants to dig into why obsolete tools weren't working for his site's SE I think this ticket can be closed. In progress (17/7)

BIRMINGHAM
129930 (4/8/2017)
The atlas ticket heralding the coming of EOS at Birmingham. Any chance of us being able to close this ticket before it hits its first anniversary? On hold (23/4)

TIER 1
124876 (7/11/16)
This old ECHO sam test ticket is so close to being closed, just waiting on a small change to be rolled out on the cluster. How goes this? In the meantime I took the liberty of putting the ticket back "In Progress". Reopened (28/6)

136138 (16/7)
A sequel to an old t2k ticket with a similar problem (131815), where bringing files online takes too long. The original request was for the Tier 1 to bring the files online manually if possible, but the submitter has hacked a possible solution by explicitly setting the timeout. I think this ticket is done with, but someone (Brian?) might want to add something to the ticket before it's closed. In progress (17/7) Update - Henry has added in a useful link to an old ticket he had regarding these tools: 135066

136104 (13/7)
A final ticket, another ROD one with a weird date, John asks if he can close the ticket and I reckon he can - but to be sure have the alarms cleared on the dashboard? In progress (23/7) Update - case closed.

Monday 9th July 2018, 15.00 BST
29 Open UK Tickets this week - 14 IPv6 Tickets

SUSSEX: 131617
Some positive news at the start of June, with the ETA on the network work required being the end of August. Is this still on course? In progress (4/6)

RALPP: 131616
Back in January Chris was thumping IPv6 on his perfsonar boxes into shape. I don't think you're still flogging that problem. How goes things? In progress (31/1)

OXFORD: 131615
Some good news delivered with some new switches back in March. Any update since then, now that Summer is well and truly upon us? On Hold (6/3)

CAMBRIDGE: 131614
Last month John reported that the University was moving to a new address block and thus all v6 addresses would need to be changed. Prudently he's decided to hold back till then. Any idea on the timeline for the change? On hold (5/6)

BRISTOL: 131613
At the last update, back in April, Bristol admins were debating how to get a perfsonar box onto an IPv6 network. Any luck with that? On hold (11/4)

BIRMINGHAM: 131612
Some good progress IPv6-ing his perfsonar box back in May, did you get the v6 DNS working? On hold (28/5)

GLASGOW: 131611
I don't want to really ask this, as I have an inkling as to the answer and it kind of feels like adding insult to injury, but how goes the new data centre? On hold (26/2)

ECDF: 131610
The busiest v6 ticket by far, if I'm reading things right things are waiting until a big networking procurement in September. Is this still the case? On Hold (21/5)

DURHAM: 131609
Any joy with this? Did the IT department get round to deploying v6 reverse DNS? In progress (4/5)

SHEFFIELD: 131608
Things took a positive turn at the end of April, with news that the University would work towards deploying IPv6 - but it was only at the approving the work to be done level at that time. Any news since? On Hold (30/4)

MANCHESTER: 131607
Good progress getting the perfsonar boxes dual-stacked, any plan of attack for doing the same for your DPM? On Hold (25/4)

LIVERPOOL: 131606
Waiting on new routers to deal with the rubbish IPv6 performance Liverpool are seeing. Any news since last month's update? In progress (4/6)

UCL: 131604
Not much news here, Ben's been occasionally repoking UCL Networks. As noted UCL only need to dual stack their perfsonar boxes to satisfy this ticket. On hold (5/6)

RHUL: 131603
At last update in February the perfsonars were dual-stacked but once again v6 DNS lookup wasn't supported yet - did they manage to get it working? On Hold (23/5)

Monday 2nd July 2018, 15.00 BST
33 Open UK Tickets this month.

IPv6 Tickets (14)
SUSSEX: 131617
RALPP: 131616
OXFORD: 131615
CAMBRIDGE: 131614
BRISTOL: 131613
BIRMINGHAM: 131612
GLASGOW: 131611
ECDF: 131610
DURHAM: 131609
SHEFFIELD: 131608
MANCHESTER: 131607
LIVERPOOL: 131606
UCL: 131604
RHUL: 131603

Oxford, Bristol, Sheffield, UCL and maybe Glasgow's tickets are all a little stale. The RALPP, Manchester and RHUL tickets really need an update due to various reasons.

Regular Tickets.

BRISTOL
135618 (11/6)
CMS SAM test stageout failure. Last update from Friday things are waiting for the site-local-config to be changed, which is in hand. In progress (29/6)

134820 (29/4)
CMS DDM quota ticket - after a long while CMS provided some feedback as to what they would like their quotas adjusted to. After this is done then hopefully this ticket can be closed. In progress (26/6)

BIRMINGHAM
129930 (4/8/17)
The "atlas http SAM test" ticket soon to be rendered moot by the deployment of EOS at the site. The ticket could do with a progress update, in part to see if you need atlas to do anything. On hold (23/4)

GLASGOW
134689 (23/4)
Perfsonar upgrade ticket. 4.1 is due out this quarter I believe, so the anointed time to upgrade will soon be upon you. On hold (24/4)

135621 (12/6)
Atlas deletion errors, much discussed. I agree that the site cannot do much, we'll need to keep pressure on the VO to fix things. In progress (21/6)

ECDF
135404 (30/5)
Low availability ROD ticket. The last week's looking green, so it's just a waiting game. On hold (28/6)

DURHAM
134687 (23/4)
Request to upgrade the perfsonar host to Centos 7. Set in progress but no news on this for a while. Could do with an update (even if it's just to take the Glasgow-stance). In progress (30/4)

SHEFFIELD
134947 (4/5)
Atlas transfers timing out. I believe Elena rolled out the gSOAP fix for the original issue, so it looks like a fresh cause of errors has popped up - checking the monitoring they're still there at a rate of a few hundred a day. Reopened (11/6) Update - solved with a DB cleanup.

135851 (28/6)
T2K having trouble downloading files from Sheffield. It turns out that the files have disappeared from the site. The VO has offered to close the ticket, but it's a good idea to suggest that they delete these entries from their LFC first (if they haven't already). In progress (30/6)

LIVERPOOL
135852 (29/6)
A ticket assigned on Friday, atlas complaining that deletions aren't working. I suspect a fallen over httpd daemon. Assigned (29/6) Update - solved. The second pass of the automatic httpd restarter fixed things.

UCL
134686 (23/4)
I am once again regretting not spinning these off into their own list - another request to upgrade the site perfsonar hosts. No news since the ticket was acknowledged. On hold (23/4)

QMUL
134573 (17/4)
CMS request to install Singularity, on hold until the site's C7 upgrade. On hold (17/4)

132929 (18/1)
CMS spotting APEL problems. The ticket's been quiet for a bit, is there a conversation with the APEL devs happening along a different channel? In progress (10/5)

BRUNEL
135871 (1/7)
LHCB have noticed a black hole node gobbling up pilots at Brunel. Assigned (1/7) Solved by offlining the bad node.

133956 (9/3)
CMS xroot config ticket. CMS would like to know if there's any progress? In progress (15/6)

100IT
135786 (22/6)
A simple availability ticket, made a bit horrid by all the auto-replies from the 100IT helpdesk system. On hold (28/6)

TIER 1
134685 (23/4)
The last of the requests to upgrade the perfsonar hosts. Last update was positive. In progress (11/6)

135822 (26/6)
CMS jobs seeing file read problems. Quite a complex issue, but it looks to be a problem with singularity and bind paths - a corresponding singularity issue has been opened (here). In progress (29/6)

124876 (7/11/16)
The oldest ticket, this ROD ticket for ECHO was waiting on the tests to be fixed. As seen in the now solved (125026) it looks like things should work (or at least be able to be made to work). But it requires some setup at the site tweaking the SE_PATH, this is underway. Reopened (28/6)

Monday 25th June 2018, 15.00 BST

38 Open UK Tickets this week.

BRUNEL
133956 (9/3)
Any update for this CMS xroot ticket? The VO re-poked the ticket a few weeks ago and they were still seeing unwanted behaviour. In progress (15/6)

TIER 1
135455 (31/5)
CMS Checksums failing. After Chris cleared up some bad files it still looks like a few more persist at the last update. In progress (4/6) Chris closed the ticket after noting the errors had cleared - cheers for that

135293 (23/5)
ROD ticket for Castor after the publisher was taken offline. Daniela notes that the alarms have gone from the dashboard and asks if things are fixed - I believe that a new (if temporary) publisher was put in place? On hold (25/6)

SHEFFIELD
134947 (4/5)
Atlas file transfers. It looks like atlas are just reusing the old ticket for the latest batch of failures they are seeing. Reopened (11/6)

BRISTOL
134820 (29/4)
Can we please close this CMS ticket? The VO doesn't seem bothered with it. In progress (1/5) A prodding of the ticket has restarted the conversation.

RALPP
135552 (7/6)
CMS SAM test failures after a disk node hit some troubles. Taking a look at the test pages it's looking nice and green again so I suspect this ticket can be closed too. Chris updated and closed this ticket too.

ECDF
135404 (30/5)
Low availability ticket. Looking at the monitoring I see some more blue negative availability days in argo, so it looks like some weird stuff has been going on. It'll be a while before the alarm clears itself (especially as the weekend wasn't great), so this ticket could do with being put on hold. In progress (4/6)

Monday 11th June 2018, 16.00 BST

36 Open UK Tickets this week.

Link to all the UK Tickets.

The only tickets that caught my eye are:

NGI: 135038 - The NGI gocdb check ticket. Does any more input need to be given from sites for this?
QMUL: 134532 - LHCB have got back to this ticket and confirmed things are working, so it can be closed.
BRISTOL: 134820 - I reckon this old CMS ticket requesting information can be closed too.
RHUL: 135542 - Another for the 'can be closed' pile, although CMS would like to see if there were any explanations for the temporary pilot problems that they saw.


Monday 4th June 2018, 14.30 BST
45 Open UK Tickets this month.

IPv6 Tickets.

SUSSEX: 131617
Some good progress here with the last update on Friday painting a hopeful picture of IPv6 come the autumn.

RALPP: 131616
Last update had Chris trying to beat his dual-stacked PS boxes into shape - but this was back in January. Needless to say the ticket needs an update!

OXFORD: 131615
Last update was back in March, with summer the likely timeframe for v6 deployment. Three months on the ticket could do with a slight update to re-confirm this is still the case.

CAMBRIDGE: 131614
It's a similar case for Cambridge.

BRISTOL: 131613
Any news on your plans from back in April to get your PS box onto a v6-enabled network?

BIRMINGHAM: 131612
Some recent good news here with Mark getting his PS box (kind of) v6 pingable, just waiting on the v6 DNS now.

GLASGOW: 131611
Gareth covered his bases well with his update back in February. Hopefully the new build is on schedule.

ECDF: 131610
Andy gave a mixed update a few weeks ago, citing some v6 routing differences and an upcoming wholesale networking overhaul scheduled for September so the ticket is freshly on hold pending more information.

DURHAM: 131609
A quick update last month reports no significant progress.

SHEFFIELD: 131608
Elena gave an update at the end of April, with work on the border routers scheduled for May. Hopefully that went well and you'll have more information soon.

MANCHESTER: 131607
Any plans on dual-stacking your storage after your Perfsonar successes?

LIVERPOOL: 131606
Any news on those ongoing negotiations mentioned in the last March update?

UCL: 131604
Have re-poked their network admins over this.

RHUL: 131603
No news for a while after the February update that v6 reverse-lookup wasn't working.

Back to the regular tickets...

NGI
135038 (9/5)
Review of the GOCDB info for the NGI. On to the second stage of the review now, but it's still a good time for sites to double-check their gocdb entries if they haven't recently. In progress (22/5)

OXFORD
135485 (3/6)
A fresh in ticket from Sno+, concerning the bdii information disappearing from their feeds. Assigned (4/6)

BRISTOL
135121 (15/5)
A ROD ticket for failed webdav tests. The tests were doomed to never work, so Lukasz disabled the endpoint in the gocdb. Daniela reckoned the ticket needs to be closed to try and see if it disables the alarms. In progress (24/5)

135120 (15/5)
Another week or so and this availability ticket should be able to be closed - until then it should be On-Hold'd. Reopened (4/6)

135302 (23/5)
CMS transfer failure ticket. It looks like this ticket hasn't been noticed yet. Assigned (23/5)

134820 (29/4)
This CMS pledge enquiry ticket has had the question answered. I suspect it can be closed. In progress (1/5)

BIRMINGHAM
129930 (4/8/17)
The old atlas http test failure ticket. How goes the EOS migration? On hold (23/4)

GLASGOW
134689 (23/4)
Perfsonar update ticket. Gareth is waiting on 4.1 to be released (which I can't find any news on). On Hold (24/4)

ECDF
135243 (21/5)
ROD ticket for failed srm-put tests - Rob had to restart things to get them working, but had no joy shifting the alarms at first. The tests seem okay for the last day. In progress (24/5)

135314 (24/5)
Another ROD ticket, this one for old IGTF rpms on the workers. As a quick note that may be helpful, an up-to-date version of the certificates is kept in /cvmfs/grid.cern.ch/etc/grid-security/ . In progress (28/5)

135404 (30/5)
The resulting low availability ticket for the previous issues. In progress (30/5)

DURHAM
134687 (23/4)
Request to update the Durham perfsonar. Any news? In progress (30/4)

SHEFFIELD
134947 (4/5)
Atlas transfer failures - one of the C7 DPM problem tickets - see this ticket. On hold (31/5) Update - the Oxford version of this ticket (134945) was solved by upgrading to the latest version of gsoap (as mentioned in the jira ticket).

MANCHESTER
134684 (23/4)
Perfsonar upgrade request ticket. Alessandra still wants to know how necessary this update is (my thoughts are it will be quite necessary, *once* Perfsonar 4.1 is out, but I don't have Duncan's expertise). Waiting for reply (23/4)

UCL
134686 (23/4)
Another perfsonar upgrade ticket, Ben was looking at it at the last update. Any joy? On Hold (23/4)

RHUL
134945 (4/5)
Another atlas transfer ticket due to the C7 DPM troubles. On hold (17/5)

QMUL
134532 (12/4)
The return of an old LHCB download problem, where the turl can't be resolved. Daniel has applied a fix to his production SE. Any news that it's worked? In progress (14/5)

134573 (17/4)
CMS request to install singularity, on hold until the Summer move to C7. On hold (17/4)

132929 (18/1)
CMS seeing SLURM accounting problems. The APEL devs are involved now, and have asked for some parser outputs to test some stuff. In progress (10/5)

IMPERIAL
135464 (1/6)
A CMS ticket about checksum failures that came in on Friday afternoon. Files are being declared invalid after being double-checked, and another transfer failure query has been tacked onto the ticket today. In progress (4/6)

134567 (17/4)
A ticket concerning the site rather than a site ticket, the declaration of some lost Pheno files. I poked it today. In progress (4/6) Closed by Pheno, so all's well.

BRUNEL
133956 (9/3)
A CMS xroot config change ticket. Any luck with rolling out these changes after your troubles getting the new hardware to roll them out onto? In progress (23/4)

THE TIER 1
135367 (28/5)
Another SNO+ information system ticket, this one has a lot of conversation going on in it about Castor publishing even before it landed at the Tier 1 (see the mice ticket below). In progress (4/6)

135133 (15/5)
CMS spotting corrupt files on ECHO, which looked to be not just a problem with the files but perhaps with their metadata as well? A lot of conversation has occurred in this ticket so I'm not entirely sure what has occurred, but corrupt files have been deleted. Waiting for reply (4/6)

134685 (23/4)
Another request to upgrade Perfsonar to C7. At last check some C7 perfsonars were up and running in testing. Any luck getting them into production? In progress (2/5)

135308 (24/5)
MICE problems after the loss of Castor publishing. Henry has hit a problem when trying to combine the workarounds with LFC entries. In progress (1/6)

135293 (23/5)
ROD tickets, again related to the loss of castor publishing. Alastair has put in a request for the SRM Ops tests for Castor to be removed. On Hold (31/5)

134703 (23/4)
CMS transfers failing from RAL_disk. It appears files were being sent to the wrong namespace. Since then there have been a lot of lists of files being searched for. Any luck getting to the bottom of this? In progress (25/5)

135455 (31/5)
CMS checksum verification at RAL. This looks to be a duplication of 135133 but I think you guys already spotted that. In progress (4/6)

127597 (7/4/17)
CMS wanting to know about the RAL networking. After the new firewall went in at the end of April Chris asked for some RAL/RALPP job performance comparisons to try to see how xroot proxies could affect things. No news back, but the question could be lost in the noise. On Hold (30/4)

124876 (7/11/16)
Gridftp tests failing for ECHO due to a problem with the tests - after 117683 was left unsolved this is our oldest ticket. Not a hint of movement on the counter ticket (125026) for a long time. I think we could do with weighing up our options here. On hold (13/11/17)

Monday 21st May 2018, 14.00 BST
43 Open UK Tickets this week.

Sites getting Ticket Updates?
We've had two anecdotal tales of sites not getting email updates to some tickets (135139 at Lancaster and 134945 at RHUL). Has anyone else not been getting emails about their tickets?

NGI
135038(9/5)
The yearly review of our gocdb information. Everyone did check that their site (and their own) details were correct, right? (I had my old office phone number listed). Jeremy is proceeding to the NGI level review now. In progress (21/5)

SKATELESCOPE.EU TICKETS
QMUL: 135042
CAMBRIDGE: 134980
Both these tickets appear to be waiting for submitter/VO input, and perhaps are solved.

BRISTOL
135121 (15/5)
135120 (15/5)
I'm not sure what's going on with the dates on these, but both ROD tickets were submitted by Kashif last week and both are still in the assigned state at time of writing. Assigned (15/5) Update - these tickets have been fielded now- thanks!

134820 (29/4)
This CMS pledge enquiry was answered, so I think it's waiting on CMS input. The ticket likely needs a prod to stir the pot. In progress (1/5)

MANCHESTER
134684 (23/4)
One of the "please upgrade perfsonar" tickets, this is waiting for an answer from Duncan. Waiting for reply (4/5)

DURHAM
134687 (23/4)
Also a perfsonar upgrade ticket, any progress with this? In progress (30/4)

IMPERIAL (kind of)
134567 (17/4)
I think this Pheno corrupt file cleanup ticket is nearly completed, it just needs to be kept an eye on to make sure it doesn't hang around after it's done. In progress (17/5)

QMUL
132929 (18/1)
After re-reading Adrian's last post I think there's a request for information in there to help debug things from the apel side. In progress (10/5)

TIER 1
135133 (15/5)
Chris has asked the ECHO admins to look at this CMS ticket, after seeing some odd behaviour. In progress (17/5)

133992 (12/3)
This other ECHO related ticket, from atlas, looks like it has badly stalled. Is there still an active issue to fix? In progress (19/4)

Monday 14th May 2018, 15.00 BST
43 Open UK Tickets this week.

NGI
135038 (9/5)
The annual review of the UK's information in the gocdb has rolled around again. As part of this could site admins please check their e-mail, telephone number and CSIRT are correct in their site's gocdb entry. In progress (14/5)

VAC
135059 (10/5)
Daniela has submitted a VAC ticket, which she assigned to Andrew. If VAC doesn't have a GGUS Support Unit it probably should get one, right? Assigned (10/5) Update: VAC did have a GGUS SU all along, but Andrew suggests the ticket belongs somewhere else.

Tickets that can be closed(?)
IMPERIAL: 135056
QMUL: 133402
The last updates on both these tickets suggested that they can be closed. Update - Henry would like to keep the IC ticket open a bit longer.


Transfer Error Tickets
RHUL: 134945
OXFORD: 135051
LIVERPOOL: 134868
I thought that these tickets were worth mentioning, with the ongoing conversation and debugging going on in the storage group list.


Friday 4th May 2018, 17.00 BST.

Due to the Bank Holiday no proper look at the tickets this week, just the obligatory link to all the UK tickets.

Raul pointed me to a CMS ticket, 134763, that might be of interest to sites running condor and moving to IPv6. The problem was worked around rather than solved (disabling IPv6 in the CMS condor instances).

Another ticket that caught my eye was the RALPP ticket 134899, which is still just assigned but I believe the issue is already solved.

Monday 30th April 2018, 14.00 BST
49 Open UK Tickets this month.

IPv6 Tickets
SUSSEX: 131617
Had a recent update - thanks for that.

RALPP: 131616
How did the perfsonar dualstacking go?

OXFORD: 131615
I don't think anything has changed since March's update.

CAMBRIDGE: 131614
Whilst I don't think any progress is expected, the ticket could probably do with a token update. Thanks for the quick update.

BRISTOL: 131613
Winnie provided an April update, thanks for that.

BIRMINGHAM: 131612
Thanks also for the update Mark (any news is better than no news).

GLASGOW: 131611
From the silence I don't think there's been any change, but I don't think any was expected.

ECDF: 131610
Things looked reasonably positive at the last update.

DURHAM: 131609
Any luck getting a better connection from your Central IT people?

SHEFFIELD: 131608
No news for a while, but again I don't think any was expected yet. Update - Elena reports that work will start on network changes this month, but no date of when IPv6 will be available has been given.

MANCHESTER: 131607
The Manchester perfsonars are working over v6 now, which is nice.

LIVERPOOL: 131606
There's mention of "negotiations" so there might not be much news for a while!

UCL: 131604
No news here either.

RHUL: 131603
Has the v6 DNS lookup been rolled out yet?

Duncan's Perfsonar Upgrade Tickets
SUSSEX: 134741
Leo is jumping to it.

BIRMINGHAM: 134691
Mark acknowledged the ticket.

LANCASTER: 134690
Matt weeps at what should have been an easy job.

GLASGOW: 134689
Gareth is waiting for the proper perfsonar 4.1 release.

DURHAM: 134687
Adam marked the ticket in progress.

UCL: 134686
Ben hopes to get this done in the next few weeks.

MANCHESTER: 134684
Alessandra asks how necessary this upgrade is? (Presumably because Manchester just got their perfsonar fully up and running for v6).

TIER 1: 134685
Darren assigned the ticket to the appropriate team, but no other news.

And now the rest of the tickets, site by site.

SUSSEX
134415 (5/4)
A standard issue availability ticket, the numbers are healing. On hold (5/4)

BRISTOL
134820 (29/4)
CMS have asked for the 2018 DDM disk pledges. The ticket has not been spotted as of the time of writing. Assigned (29/4) Luke provided an update, with some figures and projections. In Progress

134513 (12/4)
Another CMS ticket, this one is for data transfer problems from the Tier 1s. Has Dr K had a chance to take a look at this yet? In progress (23/4)

134278 (27/3)
Another CMS transfer ticket, with errors due to the file existing. Have you managed to fix the DPM file permission problems? In progress (23/4)

BIRMINGHAM
129930 (4/8/17)
The old atlas http ticket - Mark provided an update that the time of EOS at Birmingham draws closer, which will "solve" this ticket. On Hold (23/4)

ECDF
134740 (25/4)
A different sort of perfsonar ticket from Duncan, asking to clear up some settings on the Edinburgh perfsonars. Any joy? In progress (26/4)

SHEFFIELD
134356 (30/3)
This atlas transfer ticket looks like it can be closed. On hold (28/4)

QMUL
133402 (9/2)
Snoplus jobs having problems at QM. It looks like things are fixed, and Dan had kindly offered to up Sno+'s fairshare to compensate them. Just waiting on VO confirmation that things are working. Waiting for reply (16/4)

133965 (9/3)
LHCB jobs failing due to "no space left on device". It looked like atlas jobs were being poor neighbours. In progress (16/4)

134532 (12/4)
LHCB data access problems at QM, which they think is an old issue raising its head again (129155). Dan isn't convinced, and thinks it might be a problem with using the root protocol on storm. CNAF experts are getting involved (hopefully). In progress (17/4)

132713 (4/1)
HyperK.org support at QM - Daniela has asked for another look and given her current set of errors. In progress (22/4)

134573 (17/4)
CMS request to install Singularity - which Dan has said they will do, when he upgrades to CentOS7 during the Summer. On Hold (17/4)

132929 (18/1)
CMS complaining that APEL accounting is not working for slurm at QM. The apel admins got cc'd into the ticket but no sign of them getting involved yet. In progress (29/1)

134455 (9/4)
LHCB pilots dying at QM. Turned out to be an interesting case of a few black-hole nodes. I think Dan is keeping this ticket open until he figures out the root cause (or a monitoring check).

BRUNEL
133956 (9/3)
CMS requesting xroot changes. Raul is on it after having some hardware delivery hiccups messing up his workflow. In progress (23/4)

134819 (29/4)
As the Bristol ticket, CMS requesting 2018 DDM disk pledges. This ticket also hasn't been spotted, but then it only landed on Sunday. Assigned (29/4)

134826 (30/4)
A fresh ROD ticket, for a lot of errors. Assigned (30/4) I suspect this is part of the aftermath of the disasters Raul described.

THE TIER 1
132708 (4/1)
WMS decommissioning ticket. Just in case you guys have forgotten about this - I suspect you can go onto the next stage now. In progress (18/1)

134468 (9/4)
CMS complaining that the xrootd redirector is not seeing some ECHO files. Turned out to be a stuck redirector, but the ticket got reopened with a different issue. Chris has asked if problems persist. Waiting for reply (30/4)

134769 (26/4)
CMS transfer from RAL to Florida failing. Chris and George jumped on it, but again the problem might have been fleeting. In progress (30/4)

133992 (12/3)
Atlas seeing no such file or directory errors in ECHO. A need for some sort of consistency checking has been identified (as well as the fact that existing tools might be difficult to adjust). No progress for a bit. In progress (19/4)

134619 (19/4)
Another ECHO ticket, this is from Chris B on behalf of CMS, citing problems reading data. Things look like they were fixed but got confused due to other issues. In progress (30/4)

134494 (11/4)
Atlas noticing that the json based space reporting isn't updating. Alastair notes that this was on purpose. Waiting for a reply from the submitter. Waiting for reply (11/4)

127597 (7/4/17)
CMS ticket checking on xroot and network performance. Chris reports that the new firewall is in place, although there are still kinks needing to be worked out. Chris also mentions that a useful exercise will be comparing the Tier 1 and RALPP, to see if the latter's xroot proxies could help. On Hold (30/4)

124876 (7/11/16)
The old gridftp tests failing for ECHO ticket. Not a hint of movement on the counter ticket (125026). On hold (13/11/17)

117683 (18/11/15)
The oldest ticket, glue2 publishing for Castor. Could likely do with a quarterly update. On hold (3/1)

Monday 23rd April 2018, 15.30 BST
56 Open UK Tickets this week.

Upgrading Perfsonar
Duncan has rolled out 8 tickets to sites running old (pre-CentOS7) perfsonars asking them to upgrade. The sites on the receiving end are Lancaster, Glasgow, Birmingham, QM, Durham, UCL, Manchester and the Tier 1.

RHUL
134574 (17/4)
This request from CMS to install Singularity looks like it hasn't been spotted yet. Assigned (17/4) In progress now.

BRUNEL
133956 (9/3)
Request from CMS to update xroot configs. To repeat Brian's questions, any news on this? In progress (9/3) Thanks for the update Raul

QMUL
132713 (4/1)
Daniela has asked that this hyperk ticket be looked at with renewed vigor. In progress (22/2)

TIER 1
133992 (12/3)
One of a few ECHO tickets, there are some problems using existing consistency tools which are being looked at - some interesting points are raised. In progress (19/4)

Monday 26th March 2018, 15.00 BST
38 Open UK Tickets this week.

ECDF
131610 (3/11)
The only IPv6 ticket to see some recent action, there's some interesting information here about dual-stacking VMs running on hypervisors. In progress (22/3) Updated with some extra information and good progress.

134034 (14/3)
This LHCB ticket looks to be long solved, and can be closed. In progress (15/3) Solved

BRISTOL
133806 (2/3)
134081 (17/3)
Both of these Bristol CMS tickets could do with an update to show how things are coming along.

BRUNEL
133956 (9/3)
Have you managed to get started on this CMS xrootd config request? In progress (9/3) Raul updated the list with his plans.

QMUL
132713 (4/1)
Any luck tracking down the hyperk job errors? In progress (6/2)

RHUL
134144 (20/3)
It looks like the SRM problems are RHUL side according to all the atlas monitoring. If it helps I've had success clearing similar looking errors with restarts of the srmv2.2 services. In progress (21/3)

TIER 1
134136 (20/3)
This atlas "no such file" ticket sounds very similar to the issues seen on the ECHO service (133992) and at Lancaster (133991). In progress (20/3)

Monday 19th March 2018, 15.00 GMT
47 Open UK Tickets this week.

GLASGOW
134072 (15/3)
Atlas want sites running v3 of the frontier squid to upgrade to 3.5.27-3.1 (or higher). Nothing wrong with the ticket, just something to bear in mind for any other sites missed by atlas' monitoring. In progress (16/3)

MANCHESTER
134032 (14/3)
Atlas seeing deletion errors - as of Friday the errors persisted. Assigned (16/3)

QMUL
133965 (9/3)
LHCB jobs suffering "no space left on the device" errors at QM (which has happened before IIRC). This ticket might have been missed. Assigned (16/3)

ECDF
134034 (14/3)
LHCB job problems - but things look fixed so it seems the ticket can be closed. In progress (15/3)

IMPERIAL
133818 (4/3)
A question for LHCB rather than IC - Simon answered the queries about the site's pre-sse4_2 nodes, so the ball is in the VO's court now. Waiting for reply (5/3) Update - closed by LHCB.

TIER 1
134037 (15/3)
An interesting LHCB ticket where file access for a file in Castor appears to be working from RAL itself, but not from lxplus or some other places. In progress (15/3) Chris has linked to this similar CMS ticket 134119, in which he notes that he is seeing similar errors from his home ISP.

MISSING FILES THAT WERE NEVER THERE (PERHAPS)
133991 (Lancaster)
133992 (Tier 1)
Two tickets with similar symptoms, where rucio seems to think files are there and the SEs don't. Elena opened a JIRA ticket for the Lancaster problems - https://its.cern.ch/jira/browse/ATLDDMOPS-5434

Monday 12th March 2018, 14.30 GMT
42 Open UK Tickets this week.

SUSSEX
133325 (6/2)
This Availability ticket looks like it can be closed, with the alarms having gone green. In progress (8/3)

DURHAM
133338 (7/2)
Is the subject of this Atlas ticket still causing problems? Lots of things were done at the last update - did they fix the issue? In progress (21/2)

TIER 1
133719 (27/2)
This ECHO ticket hasn't had an update since its acknowledgment, any news? In progress (27/2)

133717 (27/2)
Possibly related, this CMS FTS ticket hasn't had an update this month either. In progress (27/2)

Both of these issues look like they're related to this atlas ticket, which has been getting updates: 133752

133619 (21/2)
I have a feeling that this CMS unmerged file ticket can be closed, but I could be misreading the last updates. It's definitely worth checking to see if it is solved. In progress (12/3)

133764 (1/3)
Finally, this Sno+ BDII ticket can be closed, the problem appears to have been at the source. In progress (8/3)


Monday 5th March 2018, 14.30 GMT
44 Open Tickets this month.

IPv6 Deployment Tickets
Sussex: 131617
Possibly on hold until mid-2018.
RALPP: 131616
Chris had an encouraging update back in January, but hit some snags with a new Perfsonar install. Any joy?
OXFORD: 131615
No update since stating you had dual-stacked Perfsonar boxes back in November. Anything to add? Thanks for the update.
CAMBRIDGE: 131614
No progress expected until the Summer of this year. Is this still the case?
BRISTOL: 131613
Last update hoped progress could happen by February, any news? No recent news
BIRMINGHAM: 131612
Some progress on the v6 infrastructure news, hopefully the bugs Mark described a few weeks back can be ironed out.
GLASGOW: 131611
Gareth provided a recent, if not totally positive, update.
ECDF: 131610
There were some interesting times last week when taking the first steps in dual-stacking the ECDF DPM broke things. Keeping to dual-stacking their test DPM for now.
DURHAM: 131609
Last update at the end of January had no positive movement from central IT on v6 deployment.
SHEFFIELD: 131608
This ticket really could do with an update - even an unexciting one.
MANCHESTER: 131607
IIRC I think reverse lookup works only for the Perfsonar boxes - the ticket could do with an update about this.
LIVERPOOL: 131606
Another ticket that could do with an update, even if it's a boring one. John provided a brief update.
UCL: 131604
No news from central IT at last check back in January.
RHUL: 131603
Perfsonar dual-stacked, but DNS lookup not supported yet.
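
Several of the tickets above hinge on whether v6 reverse DNS lookup works for a host. As a rough aid, the sketch below shows one way a site could check this from Python's standard library. The function names and the example address (taken from the 2001:db8::/32 documentation range) are illustrative assumptions, not anything mandated by the tickets.

```python
import ipaddress
import socket


def ptr_name(address: str) -> str:
    """Return the reverse-DNS (PTR) name for an IPv4 or IPv6 address.

    For IPv6 this is the nibble-reversed name under ip6.arpa.
    """
    return ipaddress.ip_address(address).reverse_pointer


def has_reverse_dns(address: str) -> bool:
    """Return True if the local resolver can map the address back to a name.

    Needs working DNS; any lookup failure is treated as "no reverse entry".
    """
    try:
        socket.gethostbyaddr(address)
        return True
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return False


if __name__ == "__main__":
    example = "2001:db8::1"  # documentation-range address, purely illustrative
    print(ptr_name(example))
    print(has_reverse_dns(example))
```

`ptr_name` is handy when asking central IT for a PTR record, as it gives the exact name that needs to exist; `has_reverse_dns` only tells you what your own resolver sees.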

Common or Garden Tickets

SUSSEX
122772 (11/7/16)
Webdav/Xroot ticket. Some good looking progress on getting this to work, although at last check Leo hit some more problems. In progress (7/2)

133325 (6/2)
Availability ticket. Hopefully given another week of smooth running this can be closed. In progress (12/2)

RALPP
133819 (4/3)
LHCB asked RALPP to provide details of nodes without any SSE4.2 support. As Chris instructed the ticket was reopened by LHCB to request lhcb jobs do not land on these nodes. Reopened (4/3) Update - solved, the nodes are being decommissioned very soon.

OXFORD
133809 (3/3)
Availability ticket, caused by the AC troubles. On hold (5/3)

BRISTOL
133762 (1/3)
CMS Transfer problems, on hold until Friday. On Hold (5/3)

133806 (2/3)
CMS asked sites to deploy Singularity by March 2018, this ticket is the follow up. On hold (5/3)

BIRMINGHAM
129930 (4/8/17)
Atlas http SAM tests failing. Any luck with the puppet scripts Kashif shared with you? On hold (13/2)

GLASGOW
133667 (23/2)
LHCB data access problems at Glasgow. The ticket tailed off a bit, Andrew McNab has offered to help compare Glasgow and Manchester settings. In progress (5/3) Update - everything looks good now after Sam updated xroot across the Glasgow storage. Maarten noted in the xroot changelog the likely fix. I should imagine this ticket can be closed now.

DURHAM
133338 (7/2)
Atlas jobs failing at Durham, with the problems likely to be related to the Arc Control Tower handling of pilots. Adam rolled out some changes, have these fixed things? In progress (21/2)

SHEFFIELD
133019 (24/1)
Availability ticket. Ticking along. On hold (1/3)

133810 (3/3)
Sno+ jobs failing due to cvmfs errors on a node, which Elena has taken offline. I suspect that that's this ticket done with. In progress (4/3)

133770 (2/3)
LHCB jobs failing due to problems on some WNs, Elena has been fixing them, hopefully it's all sorted now. In progress (3/3)

MANCHESTER
133716 (27/2)
Atlas deletion errors - it looks like this ticket has been missed. Assigned (27/2)

QMUL
133402 (9/2)
A good portion of Sno+ jobs failing at QM, due to stage in/out errors. This is likely caused by the reduced network bandwidth being hogged by atlas. Hopefully this will be fixed soon (by restoring the 20GB/s site connection). In progress (22/2)

132713 (4/1)
hyperk.org support ticket. Any news? In progress (6/2)

132929 (18/1)
CMS having problems due to APEL's problems parsing slurm logs (or something like that). APEL support have been called in, but no news yet. In progress (29/1)

IMPERIAL
133683 (24/2)
Atlas seeing a high job failure at Imperial, due to problems with their AGIS configs that they have no control over. Elena proposes closing the ticket and moving the conversation to JIRA. In progress (5/3) Update - atlas are waiting on seeing some running jobs before closing the ticket

133818 (4/3)
Another LHCB ticket asking how many nodes do not have sse4.2 support. Simon reports there are no plans to decommission these nodes yet. Waiting for reply (5/3)

133723 (27/2)
This is a ticket for the Cloud site, Sno+ saw problems. Simon was investigating, and has offlined the cloud site in Dirac to prevent further failures. In progress (27/2) Update - Simon hasn't managed to reproduce any errors, and has suggested closing the ticket for now, reopening if needed.

132688 (3/1)
Another not really an Imperial ticket, I think this lost Pheno file ticket can be closed soon. In progress (29/1) Update - ticket closed

TIER 1
133719 (27/2)
Atlas spotted transfers failing into Echo. It was being investigated, any news? In progress (27/2)

133752 (1/3)
Atlas noticed the FTS was broken. While investigating, Alastair noted that it appears to be an IPv6 issue. In progress (1/3)

133717 (27/2)
Likely related, a similar sounding CMS ticket. Any news? In progress (27/2)

133619 (21/2)
Missing unmerged CMS files at RAL. Chris has been helping a lot, but has asked CMS to double check his working. Waiting for reply (5/3)

133764 (1/3)
Sno+ ticket about the RAL BDII not having SFU information. It looks like the bdii information has recently changed (for the worse). Any news? In progress (2/3) Update - Karin has updated the ticket saying that things have got a lot worse for Sno+, upping the ticket's priority.

132589 (21/12/17)
LHCB killed pilots ticket. Some more investigations into this show that the problem is getting worse. Any luck with your investigation? In progress (23/2)

132708 (4/1)
WMS decommissioning ticket. Nothing to do here until next month I don't think. In progress (18/1)

127597 (7/4/17)
CMS network performance ticket. No news since Chris' comprehensive update in January. On hold (29/1)

124876 (7/11/16)
ECHO gridftp ROD tests not working, due to problems with the tests. No news on the counter ticket, still. On hold (13/11/17)

117683 (18/11/15)
GLUE2 publishing for Castor. A quick update in January reports a prototype version is being tested. On hold (3/1)

Monday 26th February 2018, 14.30 GMT
37 Open UK Tickets this week.

It still seems like a stagnant time on the ticket front. A few tickets that need a poke include this RALPP ticket: 133390, which has been in waiting for reply for a few weeks, and this QMUL ticket: 132929, waiting for some input (or acknowledgement) from the APEL devs.

Glasgow have a few tickets related to some issues with xrootd playing up in various ways at their site (causing errors for lhcb in 133667 and a return of the classic xroot overload problems in 133690). The tickets are being handled with the usual Glasgow panache, but I thought I'd give an opportunity to talk about them.

For the first time in a while (that I can remember at least) a ticket has been (re-)assigned to atlas-adc-cloud-UK - the IC ticket 133683. The root cause of the problems is likely the move to using QM as IC's DATADISK. It could be interesting to watch (hopefully it won't be though!).

Related to the previous tickets, for the Sussex xroot ticket 122772 it is worth atlas re-engaging with this. Plus perhaps the errors seen could be related to xroot playing up rather than a misconfig?


Monday 19th February 2018, 15.30 GMT
35 Open UK Tickets this week.

IPv6 Tickets.
A quick skim over these - does anyone have anything they want to add?

Bristol
133508 (14/2) CMS sites have been asked to set up Rucio test areas - this one hasn't been spotted yet. The Brunel equivalent (133506) contains possibly useful information. Assigned (14/2)

Tier 1
133421 (12/2) This Sno+ transfer ticket looks like it can be closed, the VO reports that things are fixed. In progress (14/2)

QMUL
132713 (4/1) One of the last hyperk support tickets, Daniela had a suggestion but no news on the ticket since. In progress (6/2)

DURHAM
133338 (7/2) This atlas jobs failure ticket has been reopened, with atlas still seeing issues but not sure about the cause (the jobs complain with "cat: output.list: No such file or directory"). Reopened tickets can often sneak by us so I thought I'd bring this one up. Reopened (17/2)

Monday 12th February 2018, 17.00 GMT
46 Open UK Tickets this week.

Link to all the UK Tickets.

It doesn't feel like a very exciting week for tickets - although it's worth noting that Sno+ seem to be having a ticket drive, cleaning up problems that they're seeing.

There's a RHUL ticket (133409) that needs acknowledging, and there's a few tickets from CMS regarding data transfers that just seem confusing to me (133390 and 133389 at RALPP, 133344 at Imperial) - although sites aren't to blame for this confusion!

Completely anecdotally (citing 133424), is it me or does CVMFS feel less robust recently? It of course could just be me.

Finally I'll take this opportunity to do my bi-annual reminder to sites to please check the status of their tickets - when you start working on a ticket please make sure to set it 'In Progress', when you ask a question please mark the ticket 'Waiting for reply' and when you're not going to make any progress for a while please set the ticket 'On Hold'. Finally finally, it's not really worth leaving tickets for too long before closing them - a day or two is usually more than enough.
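
For anyone who likes their reminders in lookup-table form, the status guidance above can be sketched as a small mapping. The situation keys and the function name are made up for illustration and are not part of GGUS; only the three status strings come from the reminder itself.

```python
# Hypothetical helper: the situation names are illustrative, not a GGUS API.
STATUS_FOR_SITUATION = {
    "started_work": "In Progress",          # you have begun working on the ticket
    "asked_question": "Waiting for reply",  # you have asked the submitter something
    "stalled": "On Hold",                   # no progress expected for a while
}


def suggested_status(situation: str) -> str:
    """Return the ticket status the reminder recommends for a situation."""
    try:
        return STATUS_FOR_SITUATION[situation]
    except KeyError:
        raise ValueError(f"unknown situation: {situation!r}") from None


if __name__ == "__main__":
    for situation in STATUS_FOR_SITUATION:
        print(situation, "->", suggested_status(situation))
```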

Monday 5th February 2018, 15.30 GMT
38 Open UK Tickets this month

IPv6 Tickets
Sussex: 131617 On Hold (15/11/17)
RALPP: 131616 Chris put in a nice update a fortnight ago, citing some perfsonar problems. In progress (31/1)
Oxford: 131615 No recent news on the ticket but I think there's v6 progress at Oxford? On hold (7/11/17)
Cambridge: 131614 On hold (15/11/17)
Bristol: 131613 Early February was the estimated time to get the perfsonar boxes dual stacked, how's that looking? On hold (7/11)
Birmingham: 131612 Duncan poked the ticket last month. On hold (11/11/17)
Glasgow: 131611 I think any further news awaits you chaps moving into your new digs (once they're built). On hold (6/11)
ECDF: 131610 Planning is underway, Raul has kindly offered to help. In progress (5/2)
Durham: 131609 The v6 reverse DNS at Durham is still not working, Adam has provided an update on this. In progress (31/1)
Sheffield: 131608 Is there any way we can help encourage the University to enable v6 for you? On hold (6/11/17)
Manchester: 131607 Duncan reckons you now have v6 reverse DNS lookup, so that's good news. On hold (1/2)
Liverpool: 131606 As further progress here is reliant on some upstream routers getting upgraded maybe this ticket should be put on hold? In progress (14/11/17)
Lancaster: 131605 Lancaster is just waiting on some testing from a v6 only endpoint. I'm working on setting up a v6 only UI to see if that helps. In progress (5/2)
UCL: 131604 Waiting on central IT to get back. On hold (15/1)
RHUL: 131603 RHUL's perfsonar boxen are now dualstacked - nice. On hold (31/1)

Regular Tickets:

SUSSEX
122772 (11/7/16)
Atlas xroot/webdav ticket. At last word just before Christmas Leo was waiting on some ports being opened up in the external firewall. Any joy? In progress (19/12/17)

RALPP
133250 (5/2/1042)
A ROD ticket - the date looks a bit suspect (I don't think GGUS has been around for that long). The test (ch.cern.WebDAV) and the server failing it (mover.pp.rl.ac.uk) all sound a bit weird too. Assigned (2/2/2018)

133274 (5/2)
CMS xroot failures. Things were fixed by a trusty restart script, but Chris has asked about the state of the AAA network. Waiting for reply (5/2)

OXFORD
133215 (31/1)
Atlas deletion errors on the newly reinstalled Oxford SE. After consulting on the dpm list Kashif tweaked his mysql settings and is in the "wait and see" phase. In progress (5/2)

BRISTOL
133220 (1/2)
CMS hammercloud jobs hitting their wall clock limit - the reason for which is proving a bit of a mystery. Luke has looked into this very closely so far, but it might be some weird emergent property. In progress (2/2)

BIRMINGHAM
132569 (19/12/17)
Dirac pilots not being able to be submitted to Birmingham. I think the problem is well understood, have the affected VOs been removed from the bdii? Assigned (22/1)

129930 (4/8/17)
Atlas http tests failing at Birmingham. Perhaps Kashif might have some insight into this after his recent DPM adventure? Although maybe this ticket will become moot. On hold (16/11/17)

GLASGOW
133115 (29/1)
Checking if the new LHCB conddb cvmfs mount is mounted. For some odd reason some of Glasgow's CEs are failing/not running the tests, despite all the tests running across the same WNs. In progress (5/2) Update - LHCB seem to think this is a problem with the tests, and so the ticket can be closed.

ECDF
133222 (5/2/3164)
A ROD ticket from the distant future! The tests look okay now, so I suspect this ticket can be closed. Waiting for reply (5/2/2018)

SHEFFIELD
133019 (24/1)
Low availability ticket, all good. On hold (30/1)

133260 (3/2)
Atlas transfers failing. Any luck debugging this? In progress (3/2)

MANCHESTER
131526 (1/11/17)
Storage accounting deployment. Were there some roadblocks for this? On hold (12/1)

LIVERPOOL
133114 (29/1)
New LHCB mountpoint ticket. It looks like this ticket was missed. Assigned (29/1)

RHUL
132715 (4/1)
Supporting hyperk.org. Any word on this? In progress (22/1)

QMUL
132713 (4/1)
Support for hyperk.org. Sadly, despite some fixes, errors persist. In progress (5/2)

132929 (18/1)
CMS APEL problem for QM jobs. Due to a problem with SLURM, Dan originally "unsolved" this ticket. Reopened with some useful tips, but the apel team has been involved to check on this, which was the right call. In progress (29/1)

BRUNEL
132876 (16/1)
CMS seeing reading issues at Brunel. After some expert debugging from Raul I think we're waiting on the CERN ticket 133010. In progress (5/2)

IMPERIAL (kinda)
132688 (3/1)
A lost pheno files ticket that bounced back to IC. Just waiting for word back from users (which may take a while). In progress (25/1)

TIER 1
132589 (21/12/17)
Killed LHCB pilots at the Tier 1. There's a proposal to mark the ticket "unsolved", but Vladimir seems reluctant to do this. In progress (31/1)

117683 (18/11/15)
The old Glue 2 publishing for Castor ticket. Last news is that a prototype version is in testing. On hold (3/1)

127597 (4/7/17)
CMS ticket checking xroot and network performance. Chris provided a good news update - new firewall hardware is on its way. However, this might not fix things; Chris warns more work might be needed. On hold (29/1)

124876 (7/11/16)
Echo failing gridftp nagios tests - due to the tests being broken. Absolutely no movement on the linked ticket to fix the tests (125026). On hold (13/11/17)

132708 (4/1)
The ticket tracking the decommissioning for the RAL WMSseses. It's going well. In progress (18/1)

Monday 29th January 2018, 15.30 GMT
43 Open UK Tickets this week.

New LHCB mountpoint tickets
LHCB have ticketed a bunch of sites to make sure that they have "/cvmfs/lhcb-condb.cern.ch" accessible on their WNs. It's a simple case of check and close; LHCB will do the verification at their end afterwards.
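
For sites wanting to do the "check" part quickly, here is a minimal sketch of one way to test it on a WN. It assumes that being able to list the repository directory is a good enough sign that the mount is accessible; the function name is made up for illustration and is not part of any LHCB or CVMFS tooling.

```python
import os

# Repository path quoted in the LHCB tickets.
LHCB_CONDB = "/cvmfs/lhcb-condb.cern.ch"


def cvmfs_repo_accessible(path=LHCB_CONDB):
    """Return True if the directory at `path` can be listed.

    Hypothetical helper: listing the directory is used here as a
    stand-in for "the repository is accessible". On nodes where cvmfs
    is automounted, the listing attempt is also what triggers the mount.
    """
    try:
        os.listdir(path)
        return True
    except OSError:
        # Covers a missing path, permission problems and I/O errors.
        return False


if __name__ == "__main__":
    state = "accessible" if cvmfs_repo_accessible() else "NOT accessible"
    print(LHCB_CONDB + ": " + state)
```

This only tells you about the node it runs on, so it would need to be run across a sample of WNs (for instance as a short test job) before closing the ticket.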

BIRMINGHAM
132569 (19/12/17)
I'm not sure if some solid actions were planned out that week for this ticket, but it could do with an update. I think the decision was simply to remove the dirac supported VOs from the CREAM CE bdii? Assigned (should be a different status) (22/1)

BRUNEL
132876 (16/1)
I'm not sure what's going on in this CMS xroot ticket, but I'm wondering if the original issue either still exists or was not a Brunel problem after all. This ticket can either be closed, or perhaps put on hold whilst the related CERN ticket is sorted. In progress (23/1)

ECDF
132446 (11/12/17)
It looks like this ticket tracking dirac jobs having batch system problems can be closed after some tweaking of the argus servers. In progress (26/1)

Also I think the corresponding hyperk support ticket 132716 can be closed too.

RHUL
132715 (4/1)
It might well be that you're still in the middle of network maintenance, but a polite reminder of this hyperk support ticket. In progress (22/1)

TIER 1
132712 (4/1)
Still on the hyperk support ticket, this ticket was just waiting on the hyperk configs to get into quattor. Has that happened yet? In progress (23/1) Update - solved

132589 (21/12/17)
Raja has updated the ticket to sadly report that they are still seeing LHCB job deaths at RAL. In progress (29/1) A further update this morning from Vladimir asks to check on a bunch of jobs' statuses.

132708 (4/1)
Just for information, this is the ticket tracking the decommissioning of the RAL WMSses. In progress (18/1)

Monday 22nd January 2018, 15.00 GMT
54 Open UK Tickets this year.

Start with the good news - these tickets look like they can be closed:

BRISTOL
132880 (16/1)
It looks like transfers are working after the firewall fix. In progress (19/1) Solved, but CMS have hit Bristol with another xroot ticket: 132990

QMUL
132615 (26/12/17)
After changing the working directory LHCB jobs don't seem to be running out of space anymore, so the ticket can be closed. In progress (20/1)

TIER 1
132712 (4/1)
There seems to be positive news getting hyperK jobs working at the Tier 1, so maybe this ticket is sorted? In progress (22/1)

RALPP
132830 (12/1)
This complex CMS xroot ticket looks likely to be solved (in fact Chris might be closing the ticket as I type). In progress (19/1) Solved

Now onto the bad:

RHUL
132715 (4/1)
This ticket from Daniela about supporting the hyperK VO seems to have gone unnoticed. Can you please notice it? Assigned (4/1)

RALPP
132851 (15/1)
This CMS xroot ticket might be related to the one above, hence why it's not been tended to (indeed it might be able to be closed too). There's a request for some verbose output of an xrdcp from different CMS peeps, so the conversation is out of the site's hands for now. In progress (17/1)

QMUL
132713 (4/1)
Fixing hyperk jobs at QM on a couple of CEs. Dan gave things a kick a while back; how did that work out? In progress (4/1)

BIRMINGHAM
132569 (19/12)
Daniela spotted Dirac problems at Birmingham. Ultimately this is fallout from the Birmingham move to VAC, Daniela has suggested that Mark remove the VOs from the BDII to stop dirac sending jobs to an almost dead CE. Assigned (should be something else) (22/1)

MANCHESTER
132121 (28/11/17)
Any news or progress with this ticket to the VOMS service? There have been no updates with words in them from any site admins. In progress (1/12/17)

TIER 1
132589 (21/12/17)
As of Raja's last post LHCB pilots are still failing at the Tier 1; this ticket could do with an update from the Tier 1's side. In progress (10/1)

And the Ugly are a few tickets that need updates from the VOs:

MANCHESTER
132468 (14/12/17)
Alessandra updated this atlas transfer ticket with news that she has informed atlas of many lost files that were causing the errors. No news from anyone since. Perhaps someone from cloud support could update things? In progress (4/1)

IMPERIAL
132688 (3/1)
Daniela tried to poke Pheno over some lost files, but has had nothing but silence from them. They must not have been important files. Assigned (19/1)

132692 (3/1)
This LHCB ticket is in the same state as the Pheno one - waiting for someone from the VO to acknowledge the lost files. Assigned (3/1)

132683 (3/1)
The atlas equivalent of the previous two, Brian jumped on it when poked through another channel - so maybe these lines of communication aren't getting to where they should? In progress (22/1)

Extra extra...

Raul pointed out this Brunel ticket, 132876, on tb-support. It points to an IPv6 config issue and has been thrown back towards the T0 to fix things (132993).