Past Ticket Bulletins

From GridPP Wiki
Revision as of 14:21, 29 April 2019

Monday 8th April 2019, 14.00 BST
38 Open Tickets this month.

RALPP
139539 (5/2)
A ticket from Duncan regarding blocked perfsonar ports. The host is failing to talk to itself for odd reasons. Any luck finding the time to look again at this? Duncan posted a few hints a month ago. In Progress (14/2)

OXFORD
140134 (11/3)
Atlas jobs seeing an "Unspecified grid manager error". The classic grid error. This prompted the discussion about atlas asking for jobs not to be killed for using too much memory. Oxford have raised their default memory per job to 4GB (and they won't kill a job unless it uses 1.5 times that). Waiting to see if that fixes things. How does it look atlas side? Waiting for reply (2/4)
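To make the numbers concrete: with a 4GB default and a 1.5x tolerance, a job is only killed once it goes past 6GB. The sketch below assumes the tolerance is applied to each job's peak memory against its requested memory. The constants come from the ticket, but the helper names are hypothetical, not Oxford's actual batch configuration.

```python
DEFAULT_MEM_GB = 4.0  # default memory per job (from the ticket)
KILL_FACTOR = 1.5     # jobs are only killed above 1.5x their allocation


def kill_threshold_gb(requested_gb: float = DEFAULT_MEM_GB) -> float:
    """Memory usage above which a job would be killed (hypothetical helper)."""
    return requested_gb * KILL_FACTOR


def should_kill(peak_usage_gb: float, requested_gb: float = DEFAULT_MEM_GB) -> bool:
    """True only if the job's peak usage exceeds the tolerated threshold."""
    return peak_usage_gb > kill_threshold_gb(requested_gb)


print(kill_threshold_gb())  # 6.0
print(should_kill(5.5))     # False: over the 4GB request but within tolerance
print(should_kill(6.5))     # True
```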

138647 (3/12/2018)
T2K DFC migration ticket. If I had left it a few hours, this ticket would be closed: Kashif has successfully renamed the files without having to do anything DOMEy. Daniela is re-registering the files in the DFC and hopefully this will be sorted soon. In progress (8/4)

131615 (3/11/2017)
Oxford's IPv6 ticket. Kashif provided a light update last month, still not much movement but there are plans made. On hold (13/3)

BIRMINGHAM
140573 (4/4)
A request from biomed to update the .lsc information. Is this even relevant for your site anymore? Either way the ticket needs acknowledging (or straight up "not relevant to our site"-ing). Assigned (4/4)

140584 (4/4)
A ROD ticket for your cream CE Birmingham only has to not get these tickets. Looks like a simple lcg-CA (or whatever they're called these days) update needed on your WN. As this ticket also hasn't been noticed I wonder if Mark was off work on the 4th? Assigned (4/4)

131612 (3/11/2017)
Birmingham's v6 ticket. Things were progressing nicely but slowly back in February. Any time to have any joy on this since then? In progress (5/2)

GLASGOW
140151 (12/3)
LHCB jobs seeing "can't start new thread" errors. The problem isn't understood as far as I can see but it appeared to disappear on its own - on hold to see if it comes back. On hold (27/3)

140222 (15/3)
MICE DFC migration ticket. Daniela repoked to ask for an ETA on getting those checksums. In progress (15/3)

131611 (3/11/2017)
Glasgow's v6 Ticket. Any news in the last two months about why your v6 performance is pants? In progress (5/2)

DURHAM
131609 (3/11/2017)
Just a v6 ticket at Durham. Really could do with an update. On Hold (4/12/2018)

SHEFFIELD
138649 (3/12/2018)
T2K DFC migration ticket. Elena is asking Kashif for his secrets in renaming DPM files. On Hold (8/4)

131608 (3/11/2017)
Sheffield's v6 ticket. Really, really, really needs an update, and should probably be put on hold if there are no positive updates. In progress (30/10/2018)

MANCHESTER
131607 (3/11/2017)
Just the v6 ticket here. Please please can we get an update here? And again, if no positive news is expected, can it be set on hold too? In progress (3/12/2018)

LIVERPOOL
139411 (30/1)
What's the plan for this biomed space token ticket? On hold (1/2)

138648 (3/12/2018)
T2K DFC migration ticket. The VO would like some folders just plain deleted to help clear things up. In progress (8/4)

131606 (3/11/2017)
Liverpool's v6 ticket. As we're at the start of a new FY are plans being formed for the new network upgrades? On hold (6/2)

139683 (14/2)
Not really a problem, the decommissioning ticket for the site's SL6 CE. Nice and by the book. In progress (12/3)

UCL
139101 (8/1)
ROD APEL-Pub alarm for UCL's VAC cluster. I think the site is a bit dead in the water here - has there been any news or progress? At last check Ben couldn't install ViaB so was stuck. In progress (4/3)

RHUL
131603 (3/11/2017)
RHUL's v6 ticket. No news on this since January, is there still no news? Are the right people being prodded and poked? In progress (23/1)

QMUL
140190 (14/3)
LHCB seeing FTS problems. After a lot of poking it seemed the problem went away after a few service and node restarts for unrelated reasons. Brian notices that the FTS monitoring is looking okay now - can this ticket be closed? Or do we want to watch it a bit longer? In progress (2/4)

140628 (8/4)
Atlas jobs failing because the work directory is too large (wait, what? Too much space?). Fresh in today - Dan is on the atlas cloud support list getting advice as I type. Assigned (8/4)

138364 (19/11/2018)
T2K DFC migration ticket. Daniela has repoked the ticket with urgency! In progress (1/3)

134573 (17/4/18)
CMS request to install singularity. Dan rolled this into the C7 migration, how's that going? On hold (5/11/2018)

IMPERIAL
138359 (19/11/2018)
T2K DFC migration master ticket.

140198 (14/3)
MICE DFC migration master ticket.

No real tickets at IC - just master tickets tracking other issues.

BRUNEL
140619 (8/4)
CMS transfers of a file failing, due to a classic disk server down error. Raul is working to bring it back from the brink. In progress (8/4)

140223 (15/3)
MICE DFC migration ticket. Progressing nicely, just one more job to do (a big file move). In progress (8/4)

140598 (4/4)
Another CMS ticket due to the same server being down. In progress (5/4)

THE TIER 1
140599 (5/4)
LHCB data access problems. Restarting a castor server seems to have got things going again, can this ticket be closed then? In progress (5/4)

140577 (4/4)
Really a ticket to LHCB, George noticed loads of file requests coming in with no service class defined, which has the potential to cause issues. It is being looked at (Matt code for I scanned the ticket and got lost), as of Friday there still were a few files coming in with these symptoms. In progress (5/4)

140447 (27/3)
The ever vigilant Duncan spotted v6 outbound packet loss on the RAL perfsonar. Investigations got underway - any results? In progress (2/4)

140220 (15/3)
MICE DFC migration ticket. Daniela has batted questions at both the VO and the Tier 1 today, so it's still ongoing. In progress (8/4)

138665 (4/12/18)
MICE LFC problem ticket. With the progress on the previous issue I think this can be closed (to either solved or unsolved)? On Hold (30/1)

140589 (4/4)
A case of killed LHCB pilots at RAL. James confessed to an accidental docker restart across many nodes that could be responsible for the carnage. Some more investigation is being done, are jobs still being killed? In progress (4/4)

139672 (13/2)
No LIGO pilots running at RAL. Efforts were made but sadly no fruit was borne from them - and there's been no news for a month. Is this being looked at offline? The last update from the VO had a pleading tone to it. In progress (5/3)

138033 (1/11/2018)
The old atlas singularity jobs failing ticket. This was closed for a while, but Alessandra has reopened the case - the last batch of tests revealed some issues with the RAL configuration. Reopened (6/4)


Tuesday 12th March 2019, 10.15 GMT
41 Open Tickets this week.

Biomed .lsc Tickets
A few of these don't seem to have been 'spotted' yet:
139971 (QMUL)
139961 (LIVERPOOL)

When do you give up on a disk server?
139863 (LANCASTER)
A broken disk server has been causing all manner of troubles at Lancaster - and we've not had any luck fixing it. The data's okay (it's the NIC that's broken), but at what point is it easier just to declare the 121TB of data on board lost?

When is an atlas ticket not an atlas ticket?
139741 (Glasgow)
When they forget to fill in the "concerned VO" section. In a slightly similar theme, Sam notes the pain of having to click through files on the DDM monitoring pages to look for failures (I've had some success by downloading the json and grepping through that, but that's still cold comfort when you're dealing with a good few thousand failures, as you can only do 500 failures at a time).
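Here is a rough sketch of the download-and-grep approach, which tallies failure reasons across several downloaded pages of at most 500 entries each. It assumes each page is a JSON object with a `failures` list whose entries carry an `error` string. Those field names are hypothetical, not the real DDM dashboard schema, so they would need adjusting to whatever the export actually contains.

```python
import json
from collections import Counter


def tally_errors(pages):
    """Count failure reasons across JSON pages (e.g. 500 failures per page).

    Assumes each page is a JSON object with a "failures" list whose entries
    have an "error" string; these field names are hypothetical.
    """
    counts = Counter()
    for page in pages:
        data = json.loads(page)
        for entry in data.get("failures", []):
            counts[entry.get("error", "unknown")] += 1
    return counts


# Two tiny made-up pages standing in for downloaded json files.
pages = [
    json.dumps({"failures": [{"error": "checksum mismatch"}, {"error": "timeout"}]}),
    json.dumps({"failures": [{"error": "timeout"}]}),
]
for reason, n in tally_errors(pages).most_common():
    print(n, reason)
```

Summing over pages sidesteps the 500-at-a-time limit: each download only has to be parsed once, and the most common failure reasons come out at the top.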

Whistle-stop Tour of the v6 Tickets
OXFORD: 131615
Last update: 7/1/19. Could do with a bi-monthly update, even if it's "nothing to report, move along".

BIRMINGHAM: 131612
Last update: 5/2/19. Things were looking a little positive last month with Mark wrestling with v6 DNS, were you victorious Mark?

GLASGOW: 131611
Last update: 5/2/19. v6 kind of works at Glasgow, if one defines the concept of working loosely enough. I don't think any more news is expected a month on, but it would be nice to be proved wrong.

ECDF: 131610
Last update: 4/2/19. The chaps at Edinburgh were waiting on DPM 1.11 to come out before dual stacking their storage. Wait longer. Trust me. Just wait.

DURHAM: 131609
Last Update: 4/12/18. Any news whatsoever from your networking team? The ticket has reached a point where it could do with even a null update.

SHEFFIELD: 131608
Last Update: 30/10/18. This really, really, really, really needs an update, especially as the last word was quite positive.

MANCHESTER: 131607
Last update: 3/12/18. Similar for the Manchester ticket. Last word was quite positive, but since then silence.

LIVERPOOL: 131606
Last update: 6/2/19. John gave a good summary of the situation, with the site waiting on upgrades due next FY. Hope it all goes well.

RHUL: 131603
Last update: 23/1/19. No change at RHUL, but it really would be nice to see how outsourcing the v6 DNS to JANET works out.

Tuesday 5th March 2019, 9.30 GMT
47 Open UK Tickets. Skipping over the v6 tickets as we need something to look forward to next week.

Update Biomed's .lsc
RALPP: https://ggus.eu/?mode=ticket_info&ticket_id=139962 (assigned)
SHEFFIELD: https://ggus.eu/?mode=ticket_info&ticket_id=139771 (in progress)
SHEFFIELD AGAIN(?): https://ggus.eu/?mode=ticket_info&ticket_id=139963 (assigned)
LIVERPOOL: https://ggus.eu/?mode=ticket_info&ticket_id=139961 (assigned)
RHUL: https://ggus.eu/?mode=ticket_info&ticket_id=139973 (in progress)
QMUL: https://ggus.eu/?mode=ticket_info&ticket_id=139971 (assigned)
BRUNEL: https://ggus.eu/?mode=ticket_info&ticket_id=139953 (assigned)

A lot of these tickets are fairly neglected - is this just a symptom of the VO's priority?

T2K DFC Migration Tickets
Master Ticket: https://ggus.eu/?mode=ticket_info&ticket_id=138359 (Last update was Daniela giving T2K sensible suggestions, awaiting feedback)
QMUL: https://ggus.eu/?mode=ticket_info&ticket_id=138364 (stalled but hopefully files started moving this week?)
LIVERPOOL: https://ggus.eu/?mode=ticket_info&ticket_id=138648 (how goes the DOME migration?)
SHEFFIELD: https://ggus.eu/?mode=ticket_info&ticket_id=138649 (waiting to hear if T2K still want their data)
OXFORD: https://ggus.eu/?mode=ticket_info&ticket_id=138647 (as above)

Most of the rest of the tickets, site by site:

TIER 1
https://ggus.eu/?mode=ticket_info&ticket_id=139476 (1/2)
This request for an LFC dump for MICE has been sitting for a month, has it fallen under a pile? In progress (4/2)

https://ggus.eu/?mode=ticket_info&ticket_id=138665 (4/12/18)
The LFC access problem ticket that prompted the above LFC dump ticket. On Hold (30/1)

https://ggus.eu/?mode=ticket_info&ticket_id=139990 (1/3)
CMS spotted the RAL xrootd segfaulting and asked for a core dump file. Some back and forth looking at problems in an ongoing conversation. In progress (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139306 (24/1)
Request from Duncan to upgrade the Tier 1 perfsonar boxen, and update a few settings. A few technical hurdles were hit (classic yum problems). Any recent news on this? In progress (27/2)

https://ggus.eu/?mode=ticket_info&ticket_id=139672 (13/2)
LIGO don't have any pilots running at RAL. Catalin has been bug hunting to fix some configs and has just requested the VO send some test jobs. Waiting for reply (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139639 (12/2)
CMS having troubles opening a file at RAL. This has turned into a dodgy file hunt, and Katy has been involved in debugging what's going on. Waiting for the user to try again. Waiting for reply (27/2)

https://ggus.eu/?mode=ticket_info&ticket_id=138033 (1/11/18)
The atlas singularity ticket. Not much news on this one for a while. In progress (31/1)

https://ggus.eu/?mode=ticket_info&ticket_id=139983 (28/2)
T2K having problems bringing files online from RAL TAPE. The ticket sat in limbo for a bit as the site wasn't notified. John confirms the file exists and has passed it on to the castor team. In progress (5/3)

https://ggus.eu/?mode=ticket_info&ticket_id=138500 (26/11/18)
CMS transfer problems between Swierk and RAL. Looks like this might be fixed by changing the block size (not sure in what context), just waiting for monitoring plots to all turn green before closing. In progress (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139723 (15/2)
Atlas permissions problems on echo scratch disk. I've just scanned the ticket, but there's a reference to problems with stat-ing files immediately after upload that were seen at Lancaster. Investigations are ongoing, but perhaps are a little stalled? In progress (26/2)

UCL
https://ggus.eu/?mode=ticket_info&ticket_id=139101 (8/1)
A ROD APEL ticket - Ben is stuck trying to fix it as he's having no luck installing ViaB on his nodes - "Cannot retrieve repository metadata (repomd.xml) for repository: UMDbase.". Although it just occurs to me it might be worth having Ben try again, as he might have tried when the UMD repos were down t'other week. In progress (4/3)

QMUL
https://ggus.eu/?mode=ticket_info&ticket_id=134573 (17/4/18)
CMS request to install singularity. Any recent joy with your C7 deployment to put into the ticket? On Hold (5/11)

https://ggus.eu/?mode=ticket_info&ticket_id=139097 (7/1)
LHCB data transfer problems. Everything looks good now so this ticket can be closed (lhcb say so). In progress (25/2)

RALPP
https://ggus.eu/?mode=ticket_info&ticket_id=139539 (5/2)
Debugging a problem with one of RALPP's perfsonar hosts. Chris thinks there might be an errant expired certificate somewhere on his server. But where? Does it sound similar to anyone? In progress (14/2)

https://ggus.eu/?mode=ticket_info&ticket_id=139915 (26/2)
v6 transfer problem ticket. Chris fixed the immediate problem with a network restart, but the root cause remains unknown. In Progress (28/2)

BRUNEL
https://ggus.eu/?mode=ticket_info&ticket_id=140043 (4/3)
A fresh ROD ticket for the Brunel ARC CE. Raul was provisioning some new hosts and it hit things a bit, he's snuck in a downtime. In progress (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139509 (4/2)
CMS data transfer problems to Brunel, aftershocks from moving to DOME and away from using SRM. Things were looking better after some heroic efforts by Raul. How goes things? In progress (24/2)

https://ggus.eu/?mode=ticket_info&ticket_id=139344 (28/1)
Another CMS transfer ticket(?) The same story as above. Maybe they can be consolidated? In progress (24/2)

BRISTOL
https://ggus.eu/?mode=ticket_info&ticket_id=139870 (25/2)
CMS transfer ticket. Looks like it's being sorted. Waiting for reply (5/3)

BIRMINGHAM
https://ggus.eu/?mode=ticket_info&ticket_id=137801 (17/10/18)
The DPM decommissioning ticket. I reckon this ticket can be closed now, a month on from removing the DPM from gocdb. In progress (5/2)


Tuesday 19th February 2019, 10.30 GMT
43 Open Tickets Today.

Request to Update Biomed .lscs
139776 (Birmingham)
139771 (Sheffield)
A reminder that biomed have updated their voms information, and have started ticketing sites about it.

T2K File Catalog Migration
138359
It might be pointless bringing this up as I think Daniela is off today, but I believe this issue is mostly waiting on feedback from t2k.

ROD Dashboard not updating?
138894 (Birmingham SRM alarm). 138243 (ECDF Availability alarm). Both these ROD tickets seem to be for issues that should have gone away - the SE in the Birmingham ticket is now but a memory and some tarred up log files, whilst the argo monitoring shows that ECDF's availability has been > 95% for the last 30 days. Yet alarms on the ROD dashboard persist. Has a ticket been put in about this?

Double then nothing at Glasgow
139741 Gareth has spotted what looks like more double-transfer-then-delete-it-all FTS transfers at Glasgow. I thought this problem should have been sorted? Or is that only on certain FTSes?

MICE Dump at the Tier 1
139476 Any news on this ticket to generate an LFC dump for MICE?

Stale v6 Tickets
131609 (Durham)
131608 (Sheffield)
131607 (Manchester)
Another polite reminder to please update these tickets - even a quick "nothing new here" update will be fine.


Tuesday 12th February 2019, 10.15 GMT

41 Open Tickets today. A light touch today as you heard more than enough of my voice last week!

Will the new release of DPM (help) fix these tickets?
139509 - Brunel, CMS
139344 - Brunel, CMS
139587 - Lancaster, ATLAS
137996 - Lancaster, ROD
131610 - ECDF, IPv6
DPM 1.11 is out imminently, with a bunch of bug fixes that might sort these out. But a righteously grumpy Raul needed this fix a few weeks ago, and is dubious about DPM's testing infrastructure.

Could the following IPv6 tickets please get an update?
131609 - Durham
131608 - Sheffield
131607 - Manchester


NGI Ticket regarding Birmingham
139506 - We've responded, but the site might have other things to add. One thing I noticed - the new EOS instance doesn't appear to be in the gocdb (unless I'm being blind). Is this intentional?


Monday 4th February 2019, 14.30 GMT
41 Open UK Tickets this month.

NGI
139506 (4/2)
The NGI got a ticket regarding Birmingham's availability figures, which are thrown by the decommissioning of their SE. We need to formulate a response, but we should perhaps ask for an A/R recomputation for January for the site. Assigned (4/2)
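To see why a recomputation helps, here is a minimal sketch assuming the usual WLCG/ARGO-style definitions. Availability is up time over known time. Reliability also drops scheduled downtime from the denominator. The hours are illustrative and are not Birmingham's real January figures.

```python
def availability(up_h, total_h, unknown_h=0.0):
    """Fraction of known (non-unknown) time that the service was up."""
    known = total_h - unknown_h
    return up_h / known if known > 0 else float("nan")


def reliability(up_h, total_h, sched_down_h=0.0, unknown_h=0.0):
    """As availability, but scheduled downtime is also excluded."""
    denom = total_h - sched_down_h - unknown_h
    return up_h / denom if denom > 0 else float("nan")


# Illustrative month of 744 hours in which the service only passed tests
# for half the time (e.g. a retired SE still being tested and failing).
print(availability(372, 744))                          # 0.5
print(round(reliability(372, 744, sched_down_h=186), 3))  # 0.667
# If a recomputation marks the failing period as unknown, it drops out.
print(availability(372, 744, unknown_h=372))           # 1.0
```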

OXFORD
139431 (30/1)
A request from CMS to update the site's site-local-config. Being looked at. In progress (31/1)

138647 (3/12/18)
Ticket tracking the t2k DFC migration at Oxford. Kashif has supplied the best file dump that he can without DOME installed. Daniela has asked the VO if they can enact a "clean slate" solution at Oxford to make life easier for all. In progress (31/1)

131615 (3/11/17)
Oxford's IPv6 ticket. Kashif has kept this up to date, with some semi-positive news - things are moving in the right direction, however slowly. On Hold (7/1)

BRISTOL
139410 (30/1)
CMS ticket for transfer failures from Florida to the site. Investigation suggests that this might be an IPv6 issue. In progress (4/2)

131613 (3/11/17)
Bristol's IPv6 ticket. Good progress here, but more holes needed to be poked in the site's v6 firewall. We'll need to check the PS mesh (still all grey for Bristol's v6 endpoints at time of writing). In progress (4/2)

BIRMINGHAM
137801 (17/10/18)
Ticket tracking the decommissioning of the Birmingham DPM. The node was removed from gocdb and switched off last week. I can't remember how long these tickets need to be kept open - I should look that up really. Just remember to keep your logs for 90 days Mark! In progress (30/1)

138894 (17/12/18)
This ROD ticket for the decommissioned SE might have hit a problem - Mark removed the server from the gocdb but there's still an alarm on the dashboard... On Hold (9/1)

138244 (12/11/18)
Meanwhile since killing off the old DPM completely the Birmingham Availability/Reliability figures have started to fix themselves. On Hold (1/2)

131612 (3/11/17)
Birmingham's v6 ticket. Some good news just before Christmas, hopefully Mark will be able to start dual-stacking once he's cleared his plate a bit. On Hold (24/12/18)

GLASGOW
131611 (3/11/17)
Only the v6 ticket at Glasgow. Last update (today) was a request for info from the v6 ticket watchers. In progress (4/12/18)

EDINBURGH
139240 (21/1)
An LHCB ticket about jobs failing, tracked to a "black hole" node that was taken offline. Last update was waiting on the VO to confirm if the problem has gone away, which they were having problems doing due to having "issues" at the time. If there's no word from LHCB soon then I would close this ticket. In progress (22/1)

138243 (12/11/18)
An availability ticket. I'm a little confused as to why there's still an alarm on the dashboard, as the argo page looks to my eyes like the site has had >85% availability over the last 30 days (only one non-100% day). On Hold (1/2)
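A quick sanity check on the "only one non-100% day" reading shows that even in the worst case the 30-day figure stays well above 85%. The sketch assumes the 30-day value is a plain mean of daily availabilities. ARGO's actual computation may weight by hours or treat unknown periods differently.

```python
def rolling_availability(daily):
    """Plain mean of daily availability fractions over the window."""
    return sum(daily) / len(daily)


# Worst case: 29 perfect days plus one completely failed day.
worst_case = [1.0] * 29 + [0.0]
print(f"{rolling_availability(worst_case):.1%}")  # 96.7%
```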

131610 (3/11/17)
ECDF's v6 ticket. Some positive news back in early December, the ticket could do with an update. In progress (4/12/18)

DURHAM
131609 (3/11/17)
Another site with just the v6 ticket. Last update was the start of December, any news from your network team at all? On Hold (4/12/18)

SHEFFIELD
138649 (3/12/18)
Sheffield's t2k DFC migration ticket. The site's status is the same as Oxford, and was included in Daniela's query to t2k in that ticket. In progress (9/1)

131608 (3/11/17)
Sheffield's v6 ticket. In great need of an update. In progress (30/10)

MANCHESTER
131607 (3/11/17)
Only the v6 ticket at Manchester too. Things were looking good towards the end of last year, any news? In progress (27/11/18)

LIVERPOOL
139411 (30/1)
A request from Biomed querying if they still need to use the -s option to use the site's space token (note that they're still using lcg tools). John replied that currently this is still the case, but in the DOME future it won't be (due to quotatokens being applied to a directory). On Hold (1/2)
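John's point can be pictured with a small model. Under DOME a quotatoken is attached to a directory, so a write picks up the token from the directory it lands in, and the client no longer names a space token with `-s`. This minimal sketch assumes the deepest enclosing directory with a token wins. The paths and token names are made up, and the function is hypothetical, not part of DPM.

```python
from pathlib import PurePosixPath


def quotatoken_for(path, tokens):
    """Return the token of the deepest directory in `tokens` containing `path`.

    `tokens` maps directory paths to token names (hypothetical example data).
    Returns None if no configured directory encloses the path.
    """
    p = PurePosixPath(path)
    best, best_depth = None, -1
    for directory, token in tokens.items():
        d = PurePosixPath(directory)
        # Compare whole path components, so ".../biomed" never matches ".../biomedical".
        if (p == d or d in p.parents) and len(d.parts) > best_depth:
            best, best_depth = token, len(d.parts)
    return best


tokens = {
    "/dpm/example.ac.uk/home": "GENERAL",
    "/dpm/example.ac.uk/home/biomed": "BIOMED_TOKEN",
}
print(quotatoken_for("/dpm/example.ac.uk/home/biomed/user/file.dat", tokens))  # BIOMED_TOKEN
print(quotatoken_for("/dpm/example.ac.uk/home/atlas/file.dat", tokens))        # GENERAL
```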

138648 (3/12/18)
Liverpool's t2k DFC migration ticket. Unlike the other two sites Liverpool is planning on migrating to DOME soonish, so they might not require a "clean slate solution". On Hold (18/12/18)

131606 (3/11/17)
Liverpool's v6 ticket. Last report had the networking team look at this in the New Year (so now-ish) to dual stack the storage, whilst the perfsonars are happily dual-stacked already. Please update the ticket once you know more (which will hopefully be soon-ish). In Progress (5/12/18)

LANCASTER
137996 (30/10/18)
A ROD ticket for an http test failure caused by DPM not handling http file moves quite right. Waiting on an updated version of DPM to get into epel - I will ask the devs today how that's going. On Hold (14/1)

UCL
139101 (8/1)
A ROD ticket for APEL publishing test failures. Ben has called Andrew McNab in for help installing things. In Progress (30/1)

RHUL
131603 (7/11/17)
Just the v6 ticket at RHUL too. Simon confirms that there's been no news on this front. In progress (23/1)

QMUL
139430 (30/1)
Another CMS ticket to update the site-local-config. Daniela has sorted it and has asked CMS to confirm. Waiting for reply (4/2)

139097 (7/1)
LHCB seeing data transfer problems, but this was a while ago. Dan has asked if problems persist. Waiting for reply (30/1)

138364 (19/11/18)
QM's t2k DFC migration ticket. Dan was ready to do the data moving bit, and just asked for confirmation that it needed to be done. Is the move underway Dan? In progress (16/1)

134573 (17/4/18)
CMS request to install singularity. Dan is rolling this into the move to C7, which was in the testing phase last November. Any recent news? On Hold (5/11/18)

IMPERIAL
139454 (31/1)
A ticket from a t2k user having trouble accessing post-DFC migration data at RALPP - which for reasons had to be routed to Imperial. Daniela can't spot any problems, so it looks like a user-side issue, although it might be worth checking the t2k.org .lsc files at RALPP. Assigned (should be something else) (31/1)

138359 (19/11/18)
Daniela runs such a tight ship at IC that she has to assign other issues to her site - this is the DFC migration master ticket. On Hold (22/1)

BRUNEL
139344 (28/1)
CMS transfer failures at Brunel. The storage is working fine, but it looks like some files that CMS thinks should be at Brunel aren't there, with no explanation of where they went. It's being investigated. In progress (4/2)

100IT still have a ticket open: 137306 (last update 16/1)

TIER 1
138361 (19/11/18)
The Tier 1's t2k DFC migration ticket. The ticket looks done with, just waiting on t2k to see if things are okay. That seems to be a little unclear, but that might be a VO side problem. In progress (31/1)

138665 (4/12/18)
The original MICE LFC ticket, on hold whilst the above is sorted out.

139476 (1/2)
With the MICE LFC dead in the water this is the request for a dump to migrate to the DFC. In progress (4/2)

139306 (24/1)
A request from Duncan to upgrade the RAL perfsonar hosts (and fix some configs). In progress (29/1)

138891 (17/12)
A ROD availability ticket that looks a bit off - John thinks this is due to invalid tests being run and has opened a counter ticket: 139198 - from that the test in question is due to be removed this week. On Hold (16/1)

139477 (1/2)
A ROD ticket for a couple of sickly ARC CEs. One node is fixed, the other was already on the naughty step for having a high load (possibly from the A-REX slapd process), and it's being poked and prodded. In progress (4/2)

138500 (26/11/18)
CMS transfers from T2_PL_SWIERK failing. File transfer experts were about to be called in, and the ticket is now On Hold. Is it going to be a tough one to debug? On Hold (30/1)

138033 (1/11/18)
Atlas ticket for singularity job failures at RAL. Still lots of back and forth here, with great efforts from James and Alessandra. In progress (31/1)

139414 (30/1)
LHCB jobs seg faulting. It appears these errors all occurred on VMs, and now that those VMs have passed on, the errors have disappeared too. As there's no easy way to proceed (VM necromancy isn't a thing afaik), it looks like this one can be closed. In progress (4/2)

Tuesday 29th January 2019, 10.00 GMT

36 Open UK Tickets today.

TIER 1
138665 (4/12/18)
My apologies for being a nag, but this MICE LFC ticket still hasn't had an update this side of Christmas. Could someone please take a look and update the ticket (or at least re-acknowledge the ticket's existence). In progress (12/12/18)

138500 (26/11/18)
This CMS transfer ticket is a little quiet, although I suspect that's due to a lot of conversation going on along other channels and work is ongoing on the issue. Are my suspicions correct? In progress (17/1)

QMUL
139097 (7/1)
In a similar nagging tone, no words have been added to this ticket, from either side (site or LHCB). Is the issue still an issue now that (I believe) the works at QM are finished? In progress (8/1)

ECDF
139240 (21/1)
A comment aimed at LHCB rather than the site - have you been able to check that the issue at hand (which looked to be a classic black hole node) has been dealt with? The last report from the VO mentioned there were other issues preventing seeing if things were solved. In progress (22/1)

BIRMINGHAM
137801 (17/10/18)
The aspirational switch off date for the Birmingham DPM was yesterday. How did that go Mark? Do you now feel like a huge weight is off your shoulders? In Progress (22/1)

LIVERPOOL
138943 (19/12/18)
Just in case the Liver lads haven't seen it, this LHCB transfer issue is no more and the ticket can be closed. In progress (28/1)

BRISTOL
131613 (3/11/17)
To keep with the positives, the Bristol IPv6 ticket looks to be almost finished with - firewall ports are open so we just need to see if PS tests run fine. Nice. In progress (29/1)

Monday 21st January 2019, 16.30 GMT
39 Open UK Tickets this week.

First a look at a few regular tickets:

TIER 1
138665 (4/12/18)
This MICE LFC ticket that was mentioned last week could still do with some attention; it hasn't been updated since last year. It looks like a connection issue (and a bit of an odd one at that). In Progress (12/12/18)

RALPP
139222 (18/1)
A ROD ticket for webdav test failures. Chris has asked where to get some help with figuring out the error code seen when the test fails - the test description link in the ticket appears to be broken. In progress (21/1)

QMUL
139097 (7/1)
Any luck fixing these LHCB data transfer failures? In progress (8/1)

THE IPv6 TICKETS
OXFORD: 131615
Kashif provided a comprehensive update at the start of the month - it looks like some progress will soon be made, although it will probably be a slow process due to the low priority of IPv6 with the Oxford networking people. On Hold (7/1)

BRISTOL: 131613
Things are looking positive at Bristol, just waiting on some holes in the site firewall for the perfsonar boxen. Any luck with that? In progress (21/12)

BIRMINGHAM: 131612
More positive news, the new central infrastructure is in place and so hopefully Mark can have a go at dual-stacking soon (I assume after he's killed off his DPM). On Hold (24/12/18)

GLASGOW: 131611
Sadly not so positive an update from Glasgow - Gareth explained how their perfsonar revealed v6 traffic issues when it was dual-stacked (which is its job), so they're waiting on this getting fixed. Luckily the usual sticking point of v6 reverse DNS isn't an issue. In progress (4/12)

ECDF: 131610
Rob explained in the last update how the physical migration of the site *didn't* break the v6 connectivity of the perfsonar and test DPM (yey!). Dual-stacking the production storage was predicted to start around now-ish (give or take a month I assume). Any recent news? No worries if there's not though. In Progress (4/12/18)

DURHAM: 131609
Adam forwarded Duncan's information about the JISC Secondary DNS service to the Durham networking team - v6 packets can otherwise flow (just no DNS!). Any word back from them? On Hold (4/12)

SHEFFIELD: 131608
There was hope that the perfsonar box could be dualstacked in November, but I assume the usual end of the year rush happened. Any luck dual-stacking it this year? An update for this ticket would be great. In progress (30/10/18)

MANCHESTER: 131607
Manchester got a shiny new IPv6 range towards the end of last year. Any luck dual-stacking your storage yet? Any timeframe for doing so if you haven't got round to it yet? In progress (3/12/18)

LIVERPOOL: 131606
John gave a nice chunky update last month - the site stands ready to dual-stack their storage, and is just waiting on getting the WAN routing fixed (hopefully sometime soonish). But at least their perfsonar is happily v6'd. In Progress (5/12/18)

RHUL: 131603
At last check RHUL were waiting on v6 DNS before they could proceed. Simon reported that RHUL were looking at outsourcing this service to JANET, but no word on if/how well that's going/gone. Any news? An update would be appreciated. In progress (29/10)

Monday 14th January 2019, 14.30 GMT

40 Open UK Tickets this week.

T2K DFC Migration on DPMs
Liverpool: 138648
Oxford: 138647
Sheffield: 138649
Lancaster: 138365

A quick summing up of these tickets: to provide the information T2K need (namely adler32 checksums for files that don't already have them) it appears your DPM needs to be DOME'd. We at Lancaster seem to be having the most luck with this so far, so please feel free to prod me about it.

v6-looking transfer problems
Liverpool (lhcb): 138943 (19/12)
RALPP (atlas): 139127 (10/1)

Whilst for different VOs there's a common theme to both of these tickets - it looks like the failing transfers are trying to use IPv6. Any thoughts? Update - both tickets have been looked at further, the Liverpool ticket was a firewall issue and should be fixed. Chris has looked into the RALPP errors and is a little confused as there don't seem to be any v6 routing problems but there are too many v6 transfer failures.

Bristol LHCB Ticket
138402 (21/11/18)
Are the issues described in this ticket still happening? That might be a question for the VO rather than the site. (6/12/18)

Last Year's Tier 1 Tickets:
138665 (LFC access issues)
138500 (CMS transfer failures)
138361 (T2K DFC migration)
A quick note that none of these tickets have had an update from the site yet this year to indicate that they've been picked up again after the Holiday break.

Extra Extra 139152 - This Sheffield LHCB ticket from the weekend seems to have been missed, it looks like there might be a black hole node gobbling up LHCB jobs.
