Past Ticket Bulletins

Monday 3rd June 2019, 14.30 BST
47 Open UK Tickets this month

Yearly GOCDB review
141296 (21/5)
Has everyone checked their site's GOCDB information to make sure it's all up to date?

iris.ac.uk VO tickets
141533
A new VO has been created. Will it be added to the operations portal? Is the plan to make this a gridpp approved VO, or only for IRIS sites? Andrew spoilered this one for us on TB-SUPPORT!

LCGDM "Retirement" tickets
141466 (Glasgow)
141468 (Liverpool)
141469 (Manchester)
Tickets asking for sites' plans now that LCGDM is phased out - all sites have replied.

BRUNEL
141475 (29/5)
This ticket regarding your ARC CEs seems to have been missed. Assigned (29/5)

141435 (28/5)
Similar for this atlas ticket. Assigned (28/5)

Both tickets handled now.

QMUL
141553 (3/6)
It appears that (presumably harmless) CERT warnings are generating alarms in the ROD dashboard. Is that new behaviour? In progress (3/6) Discussed on TB-SUPPORT

CAMBRIDGE
141241 (20/5)
How goes the decommissioning? Are all broadcasts sent out? In progress (20/5)

ECDF
141098 (9/5)
This DUNE ticket has been solved, and can be closed with the blessing of the VO. In progress (29/5)

SHEFFIELD
138649 (3/12/18)
Do you need any extra help fixing the file move Elena? On hold (20/5)

DURHAM
141234 (20/5)
Another case of atlas jobs being killed due to using too much memory. In Progress (29/5)

TIER 1
141262 (21/5)
Any progress with this LHCB ticket, where jobs failed trying to access a file? In progress (22/5)

140870 (25/4)
Has the process of moving these T2K files to where they should be in the CASTOR namespace started? In progress (29/4)

Monday 13th May 2019, 14.00 BST
37 Open UK Tickets this month.

IPv6 Tickets
Oxford: 131615 On Hold (13/3)
This ticket could likely do with a progress update once you jump under the DOME.

Birmingham: 131612 In Progress (5/2)
How goes things with the v6 DNS? The ticket could do with an update, or if you're not having much joy perhaps being set back to "On Hold". Thanks for the update Mark

Glasgow: 131611 In Progress (5/2)
I suspect this ticket should be put On Hold until the big move?

Durham: 131609 On Hold (4/12/18)
This ticket could do with an update (even a null one).

Manchester: 131607 In Progress (9/4)
There's light at the end of the v6 tunnel for Manchester, did you roll deploying this into last week's work?

Liverpool: 131606 On Hold (6/2)
Another ticket that could do with an update, even if it's just to fill the silence.

RHUL: 131603 In Progress (8/4)
Simon and Tony have been poking their network manager but no joy in getting the paperwork to use the JANET services signed off. Anything we can do?

Sheffield: 131608 On Hold (7/5)
Elena has laid out the Sheffield plan - with the SE on the way out this year it's not going to be dual-stacked, and a plan to reinstall their perfsonars next month (dual-stacked?).

The rest of the tickets...

BIRMINGHAM
140679 (10/4)
Ticket tracking the decommissioning of the Birmingham CREAM CE. I believe things have progressed further than is reflected in the ticket? But in general there doesn't seem to be any big uproar over switching it off (yet). In progress (29/4)

OXFORD
141099 (9/5)
A DUNE ticket about jobs not working. It looks like a classic problem of wrong voms settings, and DUNE jobs are running now, so maybe this ticket can be closed. Waiting for reply (13/5)

GLASGOW
141124 (10/5)
Atlas analysis jobs not running properly at Glasgow. Gareth has put the ticket on hold until AC problems at the site are fixed, to see if that solves the issue as well. On Hold (13/5)

140222 (15/3)
MICE DFC migration ticket. I think the conversation has stalled, last update was from Daniela asking if a user was still a user, and if their data can be purged or not. In progress (10/4)

ECDF
140802 (22/4)
LHCB Pilots aborting at ECDF. This seems to come and go in cycles - any joy in figuring out what the underlying issues are? I assume you're waiting to see if all is okay after your recent downtime. In progress (26/4)

141098 (9/5)
Another DUNE ticket (they've found their voice!). Teng has confirmed that ECDF is in a "let's wait and see if it's all working now" period. On Hold (10/5)

140846 (24/4)
ROD ticket for the ECDF CE, a symptom of all the badness that the site's been experiencing. A quick check shows that ce8 is looking fairly green in nagios. In Progress (29/4)

DURHAM
141104 (10/5)
Multiple ROD alarms at the site. Adam was investigating, but things look okayish now. In Progress (10/5) Solved now.

SHEFFIELD
141134 (12/5)
Atlas transfer failures due to what looks like an expired cert. In progress (12/5)

140866 (25/4)
Biomed jobs failing with what look like authentication errors. Any joy/time in looking at this? In progress (25/4)

140987 (2/5)
Another DUNE ticket. A number of small showstoppers have been fixed, so hopefully DUNE jobs are running at Sheffield now? Waiting for reply (10/5)

138649 (3/12/18)
(The last) T2K migration ticket. Elena has it in her schedule, but her plate is full. Hopefully tomorrow. On Hold (13/5)

LIVERPOOL
139411 (30/1)
A ticket from Biomed asking if using the BIOMEDDISK spacetoken could be made automagic. John explained the situation, so what's the plan with this ticket? Keep it open until you're DOME'd? You could close it as the initial question has been answered. On Hold (1/2)

UCL
139101 (8/1)
Any joy getting ViaB working at UCL? In Progress (8/4)

QMUL
141133 (12/5)
LHCB FTS transfer problems. Terry reports power problems at the site, with no news when juice will be restored. Hopefully soon! On Hold (13/5)

141132 (12/5)
Atlas issues for the same reason. The atlas response was quite harsh though. In Progress (13/5)

140190 (14/3)
The old LHCB FTS ticket, left on hold whilst things are investigated in the event problems crop up again. On Hold (23/4)

138364 (19/11/18)
T2K DFC migration ticket - it's all in the VO's hands now and Daniela proposes closing the ticket. In progress (13/5)

134573 (17/4/18)
CMS request to install singularity. I'll ask you how it's going at a later date. On Hold (5/11/18)

BRUNEL
140669 (10/4)
LHCB job failures, by the looks of it on a pair of v6-only WNs. I'm not sure how things are going, but the ticket is being actively investigated on both sides. In Progress (9/5)

IMPERIAL MASTER TICKETS
138359 - T2K DFC migration
140198 - MICE DFC migration

TIER 1
141108 (10/5)
DUNE having problems submitting to the Tier 1. Any joy? In Progress (10/5) It appears some not-as-well-documented-as-it-should-be process needs to be done to get things working.

140870 (25/4)
T2K spotted a number of files that should be on tape according to the DFC but aren't. After a bit of digging it seems that the files simply weren't moved to their correct (according to the DFC) places yet. How goes the move? In Progress (29/4)

140773 (18/4)
LHCB ticket over slow ECHO deletions. Investigating thoroughly, spotting a possible issue with gridftp session reuse. Trying to replicate the issue elsewhere. In progress (8/5)

141105 (10/5)
ROD SRM failures. George notes a problem with the castor certs that is being fixed. In progress (10/5)

140220 (15/3)
MICE DFC migration ticket. The last update had some questions from Daniela to both the CASTOR admins and MICE - neither of which have had a reply in the ticket. In progress (8/4)

140447 (27/3)
Duncan spotted outbound v6 packet loss from the Tier 1. Brian confirms that only v6 is affected, and there is a plan to upgrade the firmware in the border router (I think that's what the plans are?). Hopefully that will get it. On Hold (10/5)

139672 (13/2)
LIGO pilots not running at RAL. Did Catalin get his LIGO membership reinstated for debugging purposes? In progress (30/4)

Tuesday 7th May 2019, 10.30 BST

34 Open UK Tickets today. I'll do a full review next week (so please prune your tickets accordingly).

SHEFFIELD
131608 (3/11/17)
There was no update on this ticket last week - please can we get one this week - preferably today. We don't want to have to deal with an escalation. In Progress (30/10/18)

138649 (3/12/18)
Have you managed to poke Kashif for his recipe to generate the dumps that T2K requested? In Progress (8/4)

UCL
139101 (8/1)
The only update on this ticket was a poke from Daniela. Any news on any other channels? In progress (8/4)

DURHAM
141034 (7/5)
Related to the thread Adam started, Durham's new ARCs aren't playing nice. I thought I'd mention it here in case we didn't have a place to discuss it elsewhere. In Progress (7/5)


Anything I missed?

Monday 8th April 2019, 14.00 BST
38 Open Tickets this month.

RALPP
139539 (5/2)
A ticket from Duncan regarding blocked perfsonar ports. The host is failing to talk to itself due to odd reasons. Any luck finding the time to look again at this? Duncan posted a few hints a month ago. In Progress (14/2)

OXFORD
140134 (11/3)
Atlas jobs seeing an "Unspecified grid manager error". The classic grid error. Prompted the discussion about atlas asking for jobs not to be killed due to using too much memory. Oxford have raised their default memory per job to 4GB (and they won't kill a job unless it uses 1.5 times that). Waiting to see if that fixes things. How does it look atlas side? Waiting for reply (2/4)
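As a rough illustration of the Oxford limits described above, assuming the kill threshold is simply the default request multiplied by 1.5. The function and constant names are made up for this sketch and are not taken from Oxford's batch configuration.

```python
# Hypothetical names; only the figures (4 GB default, 1.5x kill factor)
# come from the ticket summary.
DEFAULT_MEMORY_GB = 4.0
KILL_FACTOR = 1.5


def kill_threshold_gb(default_gb: float = DEFAULT_MEMORY_GB,
                      factor: float = KILL_FACTOR) -> float:
    """Memory usage above which a job would be killed."""
    return default_gb * factor


def would_be_killed(used_gb: float) -> bool:
    """True if a job using `used_gb` exceeds the kill threshold."""
    return used_gb > kill_threshold_gb()


print(kill_threshold_gb())  # 6.0
```

So under these assumptions a job can use up to 6 GB before it is at risk.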

138647 (3/12/2018)
T2K DFC migration ticket. If I had left it a few hours this ticket would be closed, Kashif has successfully renamed the files without having to do anything DOMEy. Daniela is re-registering the files in the DFC and hopefully this will be sorted soon. In progress (8/4)

131615 (3/11/2017)
Oxford's IPv6 ticket. Kashif provided a light update last month, still not much movement but there are plans made. On hold (13/3)

BIRMINGHAM
140573 (4/4)
A request from biomed to update the .lsc information. Is this even relevant for your site anymore? Either way the ticket needs acknowledging (or straight up "not relevant to our site"-ing). Assigned (4/4)

140584 (4/4)
A ROD ticket for your cream CE Birmingham only has to not get these tickets. Looks like a simple lcg-CA (or whatever they're called these days) update is needed on your WN. As this ticket also hasn't been noticed I wonder if Mark was off work on the 4th? Assigned (4/4)

131612 (3/11/2017)
Birmingham's v6 ticket. Things were progressing nicely but slowly back in February. Any time to have any joy on this since then? In progress (5/2)

GLASGOW
140151 (12/3)
LHCB jobs seeing "can't start new thread" errors. The problem isn't understood as far as I can see but it appeared to disappear on its own - on hold to see if it comes back. On hold (27/3)

140222 (15/3)
MICE DFC migration ticket. Daniela repoked to ask for an ETA on getting those checksums. In progress (15/3)

131611 (3/11/2017)
Glasgow's v6 Ticket. Any news in the last two months about why your v6 performance is pants? In progress (5/2)

DURHAM
131609 (3/11/2017)
Just a v6 ticket at Durham. Really could do with an update. On Hold (4/12/2018)

SHEFFIELD
138649 (3/12/2018)
T2K DFC migration ticket. Elena is asking Kashif for his secrets in renaming DPM files. On Hold (8/4)

131608 (3/11/2017)
Sheffield's v6 ticket. Really, really, really needs an update. And probably putting on hold if there are no positive updates. In progress (30/10/2018)

MANCHESTER
131607 (3/11/2017)
Just the v6 ticket here. Please please can we get an update here? And again, if there is no positive news expected can it be set on hold too? In progress (3/12/2018)

LIVERPOOL
139411 (30/1)
What's the plan for this biomed space token ticket? On hold (1/2)

138648 (3/12/2018)
T2K DFC migration ticket. The VO would like some folders just plain deleted to help clear things up. In progress (8/4)

131606 (3/11/2017)
Liverpool's v6 ticket. As we're at the start of a new FY are plans being formed for the new network upgrades? On hold (6/2)

139683 (14/2)
Not really a problem, the decommissioning ticket for the site's SL6 CE. Nice and by the book. In progress (12/3)

UCL
139101 (8/1)
ROD APEL-Pub alarm for UCL's VAC cluster. I think the site is a bit dead in the water here - has there been any news or progress? At last check Ben couldn't install ViaB so was stuck. In progress (4/3)

RHUL
131603 (3/11/2017)
RHUL's v6 ticket. No news on this since January, is there still no news? Are the right people being prodded and poked? In progress (23/1)

QMUL
140190 (14/3)
LHCB seeing FTS problems. After a lot of poking it seemed the problem went away after a few service and node restarts for unrelated reasons. Brian notices that the FTS monitoring is looking okay now - can this ticket be closed? Or do we want to watch it a bit longer? In progress (2/4)

140628 (8/4)
Atlas jobs failing because the work directory is too large (wait, what? Too much space?). Fresh in today - Dan is on the atlas cloud support list getting advice as I type. Assigned (8/4)

138364 (19/11/2018)
T2K DFC migration ticket. Daniela has repoked the ticket with urgency! In progress (1/3)

134573 (17/4/18)
CMS request to install singularity. Dan rolled this into the C7 migration, how's that going? On hold (5/11/2018)

IMPERIAL
138359 (19/11/2018)
T2K DFC migration master ticket.

140198 (14/3)
MICE DFC migration master ticket.

No real tickets at IC - just master tickets tracking other issues.

BRUNEL
140619 (8/4)
CMS transfers of a file failing, due to a classic disk server down error. Raul is working to bring it back from the brink. In progress (8/4)

140223 (15/3)
MICE DFC migration ticket. Progressing nicely, just one more job to do (a big file move). In progress (8/4)

140598 (4/4)
Another CMS ticket due to the same server being down. In progress (5/4)

THE TIER 1
140599 (5/4)
LHCB data access problems. Restarting a castor server seems to have got things going again, can this ticket be closed then? In progress (5/4)

140577 (4/4)
Really a ticket to LHCB, George noticed loads of file requests coming in with no service class defined, which has the potential to cause issues. It is being looked at (Matt code for I scanned the ticket and got lost), as of Friday there still were a few files coming in with these symptoms. In progress (5/4)

140447 (27/3)
The ever vigilant Duncan spotted v6 outbound packet loss on the RAL perfsonar. Investigations got underway - any results? In progress (2/4)

140220 (15/3)
MICE DFC migration ticket. Daniela has batted questions at both the VO and the Tier 1 today, so it's still ongoing. In progress (8/4)

138665 (4/12/18)
MICE LFC problem ticket. With the progress on the previous issue I think this can be closed (to either solved or unsolved)? On Hold (30/1)

140589 (4/4)
A case of killed LHCB pilots at RAL. James confessed to an accidental docker restart across many nodes that could be responsible for the carnage. Some more investigation is being done, are jobs still being killed? In progress (4/4)

139672 (13/2)
No LIGO pilots running at RAL. Efforts were made but sadly no fruit was borne from them - and there's been no news for a month. Is this being looked at offline? The last update from the VO had a pleading tone to it. In progress (5/3)

138033 (1/11/2018)
The old atlas singularity jobs failing ticket. This was closed for a while, but Alessandra has reopened the case - the last batch of tests revealed some issues with the RAL configuration. Reopened (6/4)


Tuesday 12th March 2019, 10.15 GMT
41 Open Tickets this week.

Biomed .lsc Tickets
A few of these don't seem to have been 'spotted' yet:
139971 (QMUL)
139961 (LIVERPOOL)

When do you give up on a disk server?
139863 (LANCASTER)
A broken disk server has been causing all manner of troubles at Lancaster - and we've not had any luck fixing it. The data's okay (it's the NIC that's broken), but at what point is it easier just to declare the 121TB of data on board lost?

When is an atlas ticket not an atlas ticket?
139741 (Glasgow)
When they forget to fill in the "concerned VO" section. In a slightly similar theme, Sam notes the pain of having to click through files on the DDM monitoring pages to look for failures (I've had some success by downloading the json and grepping through that, but that's still a cold comfort when you're dealing with a good few thousand failures as you can only do 500 fails at a time).
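The download-and-grep approach mentioned above could be scripted along these lines. This is only a sketch: it assumes each downloaded page is a JSON list of records with a "reason" field, which is a guess at the layout and not the documented DDM monitoring format.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def load_pages(paths: Iterable[str]) -> List[Dict]:
    """Concatenate failure records from several downloaded JSON pages.

    Each page is assumed (hypothetically) to hold a JSON list of records,
    e.g. one file per batch of 500 failures.
    """
    records: List[Dict] = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            records.extend(json.load(handle))
    return records


def count_reasons(records: List[Dict], pattern: str) -> Counter:
    """Count error reasons containing `pattern` (case-insensitive)."""
    needle = pattern.lower()
    counts: Counter = Counter()
    for record in records:
        reason = record.get("reason", "")  # "reason" is an assumed key
        if reason and needle in reason.lower():
            counts[reason] += 1
    return counts


# Made-up records standing in for a downloaded page.
sample = [
    {"reason": "Checksum mismatch"},
    {"reason": "Connection timed out"},
    {"reason": "Checksum mismatch"},
]
for reason, hits in count_reasons(sample, "checksum").most_common():
    print(hits, reason)
```

Merging the pages first means the 500-at-a-time limit only costs extra downloads, not extra reading.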

Whistle-stop Tour of the v6 Tickets
OXFORD: 131615
Last update: 7/1/19. Could do with a bi-monthly update, even if it's "nothing to report, move along".

BIRMINGHAM: 131612
Last update: 5/2/19. Things were looking a little positive last month with Mark wrestling with v6 DNS, were you victorious Mark?

GLASGOW: 131611
Last update: 5/2/19. v6 kind of works at Glasgow, if one defines the concept of working loosely enough. I don't think any more news is expected a month on, but it would be nice to be proved wrong.

ECDF: 131610
Last update: 4/2/19. The chaps at Edinburgh were waiting on DPM 1.11 to come out before dual stacking their storage. Wait longer. Trust me. Just wait.

DURHAM: 131609
Last Update: 4/12/18. Any news whatsoever from your networking team? The ticket has reached a point where it could do with even a null update.

SHEFFIELD: 131608
Last Update: 30/10/18. This really, really, really, really needs an update, especially as the last word was quite positive.

MANCHESTER: 131607
Last update: 3/12/18. Similar for the Manchester ticket. Last word was quite positive, but since then silence.

LIVERPOOL: 131606
Last update: 6/2/19. John gave a good summary of the situation, with the site waiting on upgrades due next FY. Hope it all goes well.

RHUL: 131603
Last update: 23/1/19. No change at RHUL, but it really would be nice to see how outsourcing the v6 DNS to JANET works out.
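To put a number on how stale some of these tickets are, here is a small sketch that counts the days between a quoted last-update date and this bulletin (12th March 2019). It uses a subset of the dates listed above. The function name and the d/m/yy parsing are choices made for this example.

```python
from datetime import date, datetime

BULLETIN_DATE = date(2019, 3, 12)  # date of this bulletin

# Last-update dates as quoted in the tour above (d/m/yy), a subset of sites.
LAST_UPDATES = {
    "OXFORD": "7/1/19",
    "DURHAM": "4/12/18",
    "SHEFFIELD": "30/10/18",
    "MANCHESTER": "3/12/18",
}


def days_stale(last_update: str, today: date = BULLETIN_DATE) -> int:
    """Days elapsed between a d/m/yy last-update date and `today`."""
    parsed = datetime.strptime(last_update, "%d/%m/%y").date()
    return (today - parsed).days


# Print the stalest tickets first.
for site, stamp in sorted(LAST_UPDATES.items(),
                          key=lambda item: days_stale(item[1]),
                          reverse=True):
    print(f"{site}: {days_stale(stamp)} days since last update")
```

On these dates Sheffield comes out at 133 days, Manchester at 99, Durham at 98 and Oxford at 64.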

Tuesday 5th March 2019, 9.30 GMT
47 Open UK Tickets. Skipping over the v6 tickets as we need something to look forward to next week.

Update Biomed's .lsc
RALPP: https://ggus.eu/?mode=ticket_info&ticket_id=139962 (assigned)
SHEFFIELD: https://ggus.eu/?mode=ticket_info&ticket_id=139771 (in progress)
SHEFFIELD AGAIN(?): https://ggus.eu/?mode=ticket_info&ticket_id=139963 (assigned)
LIVERPOOL: https://ggus.eu/?mode=ticket_info&ticket_id=139961 (assigned)
RHUL: https://ggus.eu/?mode=ticket_info&ticket_id=139973 (in progress)
QMUL: https://ggus.eu/?mode=ticket_info&ticket_id=139971 (assigned)
BRUNEL: https://ggus.eu/?mode=ticket_info&ticket_id=139953 (assigned)

A lot of these tickets are fairly neglected - is this just a symptom of the VO's priority?

T2K DFC Migration Tickets
Master Ticket: https://ggus.eu/?mode=ticket_info&ticket_id=138359 (Last update was Daniela giving T2K sensible suggestions, waiting feedback)
QMUL: https://ggus.eu/?mode=ticket_info&ticket_id=138364 (stalled but hopefully files started moving this week?)
LIVERPOOL: https://ggus.eu/?mode=ticket_info&ticket_id=138648 (how goes the DOME migration?)
SHEFFIELD: https://ggus.eu/?mode=ticket_info&ticket_id=138649 (waiting to hear if T2K want their data still)
OXFORD: https://ggus.eu/?mode=ticket_info&ticket_id=138647 (as above)
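The ticket links in the lists above all follow the same pattern, so a bare ticket number from the other bulletins can be turned into a link (and back again) with a couple of helpers. This is a sketch based only on the URLs shown here; the function names are invented for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

GGUS_BASE = "https://ggus.eu/"  # base of the links quoted in this bulletin


def ggus_ticket_url(ticket_id: int) -> str:
    """Build a ticket link in the same form as the ones listed above."""
    query = urlencode({"mode": "ticket_info", "ticket_id": ticket_id})
    return f"{GGUS_BASE}?{query}"


def ticket_id_from_url(url: str) -> int:
    """Recover the ticket number from such a link."""
    return int(parse_qs(urlparse(url).query)["ticket_id"][0])


print(ggus_ticket_url(138359))
# https://ggus.eu/?mode=ticket_info&ticket_id=138359
```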

Most of the rest of the tickets, site by site:

TIER 1
https://ggus.eu/?mode=ticket_info&ticket_id=139476 (1/2)
This request for an LFC dump for MICE has been sitting for a month, has it fallen under a pile? In progress (4/2)

https://ggus.eu/?mode=ticket_info&ticket_id=138665 (4/12/18)
The LFC access problem ticket that prompted the above LFC dump ticket. On Hold (30/1)

https://ggus.eu/?mode=ticket_info&ticket_id=139990 (1/3)
CMS spotted the ral xrootd segfaulting and asked for a core dump file. Some back and forth looking at problems in an ongoing conversation. In progress (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139306 (24/1)
Request from Duncan to upgrade the Tier 1 perfsonar boxen, and update a few settings. A few technical hurdles were hit (classic yum problems). Any recent news on this? In progress (27/2)

https://ggus.eu/?mode=ticket_info&ticket_id=139672 (13/2)
LIGO don't have any pilots running at RAL. Catalin has been bug hunting to fix some configs and has just requested the VO send some test jobs. Waiting for reply (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139639 (12/2)
CMS having troubles opening a file at RAL. This has turned into a dodgy file hunt, and Katy has been involved in debugging what's going on. Waiting for the user to try again. Waiting for reply (27/2)

https://ggus.eu/?mode=ticket_info&ticket_id=138033 (1/11/18)
The atlas singularity ticket. Not much news on this one for a while. In progress (31/1)

https://ggus.eu/?mode=ticket_info&ticket_id=139983 (28/2)
T2K having problems bringing files online from RAL TAPE. The ticket sat in limbo for a bit as the site wasn't notified. John confirms the file exists and has passed it onto the castor team. In progress (5/3)

https://ggus.eu/?mode=ticket_info&ticket_id=138500 (26/11/18)
CMS transfer problems between Swierk and RAL. Looks like this might be fixed by changing the block size (not sure in what context), just waiting for monitoring plots to all turn green before closing. In progress (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139723 (15/2)
Atlas permissions problems on echo scratch disk. I've just scanned the ticket, but there's a reference to problems with stating files immediately after upload that were seen at Lancaster. Investigations are ongoing, but perhaps are a little stalled? In progress (26/2)

UCL
https://ggus.eu/?mode=ticket_info&ticket_id=139101 (8/1)
A ROD APEL ticket - Ben is stuck trying to fix it as he's having no luck installing ViaB on his nodes - "Cannot retrieve repository metadata (repomd.xml) for repository: UMDbase.". Although it just occurs to me it might be worth having Ben try again as he might have tried when the UMD repos were down t'other week. In progress (4/3)

QMUL
https://ggus.eu/?mode=ticket_info&ticket_id=134573 (17/4/18)
CMS request to install singularity. Any recent joy with your C7 deployment to put into the ticket? On Hold (5/11)

https://ggus.eu/?mode=ticket_info&ticket_id=139097 (7/1)
LHCB data transfer problems. Everything looks good now so this ticket can be closed (lhcb say so). In progress (25/2)

RALPP
https://ggus.eu/?mode=ticket_info&ticket_id=139539 (5/2)
Debugging a problem with one of RALPP's perfsonar hosts. Chris thinks there might be an errant expired certificate somewhere on his server. But where? Does it sound similar to anyone? In progress (14/2)

https://ggus.eu/?mode=ticket_info&ticket_id=139915 (26/2)
v6 transfer problem ticket. Chris fixed the immediate problem with a network restart, but the root cause remains unknown. In Progress (28/2)

BRUNEL
https://ggus.eu/?mode=ticket_info&ticket_id=140043 (4/3)
A fresh ROD ticket for the Brunel ARC CE. Raul was provisioning some new hosts and it hit things a bit, he's snuck in a downtime. In progress (4/3)

https://ggus.eu/?mode=ticket_info&ticket_id=139509 (4/2)
CMS data transfer problems to Brunel, aftershocks from moving to DOME and away from using SRM. Things were looking better after some heroic efforts by Raul. How goes things? In progress (24/2)

https://ggus.eu/?mode=ticket_info&ticket_id=139344 (28/1)
Another CMS transfer ticket(?) The same story as above. Maybe they can be consolidated? In progress (24/2)

BRISTOL
https://ggus.eu/?mode=ticket_info&ticket_id=139870 (25/2)
CMS transfer ticket. Looks like it's being sorted. Waiting for reply (5/3)

BIRMINGHAM
https://ggus.eu/?mode=ticket_info&ticket_id=137801 (17/10/18)
The DPM decommissioning ticket. I reckon this ticket can be closed now, a month on from removing the DPM from gocdb. In progress (5/2)


Tuesday 19th February 2019, 10.30 GMT
43 Open Tickets Today.

Request to Update Biomed .lscs
139776 (Birmingham)
139771 (Sheffield)
A reminder that biomed have updated their voms information, and have started ticketing sites about it.

T2K File Catalog Migration
138359
It might be pointless bringing this up as I think Daniela is off today, but I believe this issue is mostly waiting on feedback from t2k.

ROD Dashboard not updating?
138894 (Birmingham SRM alarm). 138243 (ECDF Availability alarm). Both these ROD tickets seem to be for issues that should have gone away - the SE in the Birmingham ticket is now but a memory and some tarred up log files, whilst the argo monitoring shows that ECDF's availability has been > 95% for the last 30 days. Yet alarms on the ROD dashboard persist. Has a ticket been put in about this?

Double then nothing at Glasgow
139741 Gareth has spotted what looks like more double-transfer-then-delete-it-all FTS transfers at Glasgow. I thought this problem should have been sorted? Or is that only on certain FTSes?

MICE Dump at the Tier 1
139476 Any news on this ticket to generate an LFC dump for MICE?

Stale v6 Tickets
131609 (Durham)
131608 (Sheffield)
131607 (Manchester)
Another polite reminder to please update these tickets - even a quick "nothing new here" update will be fine.


Tuesday 12th February 2019, 10.15 GMT

41 Open Tickets today. A light touch today as you heard more than enough of my voice last week!

Will the new release of DPM (hrlp) fix these tickets?
139509 - Brunel, CMS
139344 - Brunel, CMS
139587 - Lancaster, ATLAS
137996 - Lancaster, ROD
131610 - ECDF, IPv6
DPM 1.11 is out imminently, with a bunch of bug fixes that might sort these out. But a righteously grumpy Raul needed this fix a few weeks ago, and is dubious about DPM's testing infrastructure.

Could the following IPv6 tickets please get an update?
131609 - Durham
131608 - Sheffield
131607 - Manchester


NGI Ticket regarding Birmingham
139506 - We've responded, but the site might have other things to add. One thing I noticed - the new EOS instance doesn't appear to be in the gocdb (unless I'm being blind). Is this intentional?


Monday 4th February 2019, 14.30 GMT
41 Open UK Tickets this month.

NGI
139506 (4/2)
The NGI got a ticket regarding Birmingham's availability figures, which are thrown by the decommissioning of their SE. We need to formulate a response, but we should perhaps ask for an A/R recomputation for January for the site. Assigned (4/2)

OXFORD
139431 (30/1)
A request from CMS to update the site's site-local-config. Being looked at. In progress (31/1)

138647 (3/12/18)
Ticket tracking the t2k DFC migration at Oxford. Kashif has supplied the best file dump that he can without DOME installed. Daniela has asked the VO if they can enact a "clean slate" solution at Oxford to make life easier for all. In progress (31/1)

131615 (3/11/17)
Oxford's IPv6 ticket. Kashif has kept this up to date, with some semi-positive news - things are moving in the right direction, however slowly. On Hold (7/1)

BRISTOL
139410 (30/1)
CMS ticket for transfer failures from Florida to the site. Investigation suggests that this might be an IPv6 issue. In progress (4/2)

131613 (3/11/17)
Bristol's IPv6 ticket. Good progress here, but more holes needed to be poked in the site's v6 firewall. We'll need to check the PS mesh (still all grey for Bristol's v6 endpoints at time of writing). In progress (4/2)

BIRMINGHAM
137801 (17/10/18)
Ticket tracking the decommissioning of the Birmingham DPM. The node was removed from gocdb and switched off last week. I can't remember how long these tickets need to be kept open - I should look that up really. Just remember to keep your logs for 90 days Mark! In progress (30/1)

138894 (17/12/18)
This ROD ticket for the decommissioned SE might have hit a problem - Mark removed the server from the gocdb but there's still an alarm on the dashboard... On Hold (9/1)

138244 (12/11/18)
Meanwhile since killing off the old DPM completely the Birmingham Availability/Reliability figures have started to fix themselves. On Hold (1/2)

131612 (3/11/17)
Birmingham's v6 ticket. Some good news just before Christmas, hopefully Mark will be able to start dual-stacking once he's cleared his plate a bit. On Hold (24/12/18)

GLASGOW
131611 (3/11/17)
Only the v6 ticket at Glasgow. Last update (today) was a request for info from the v6 ticket watchers. In progress (4/12/18)

EDINBURGH
139240 (21/1)
An LHCB ticket about jobs failing, tracked to a "black hole" node that was taken offline. Last update was waiting on the VO to confirm if the problem has gone away, which they were having problems doing due to having "issues" at the time. If there's no word from LHCB soon then I would close this ticket. In progress (22/1)

138243 (12/11/18)
An availability ticket. I'm a little confused as to why there's still an alarm on the dashboard, as the argo page looks to my eyes like the site has had >85% availability over the last 30 days (only one non-100% day). On Hold (1/2)
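For a quick sanity check of a figure like this, the 30-day number can be approximated as the mean of the daily availability percentages. That is a simplifying assumption for this sketch and not necessarily how argo computes it. The daily values below are invented, and only the 85% threshold comes from the summary above.

```python
from statistics import mean
from typing import Sequence

THRESHOLD_PERCENT = 85.0  # threshold quoted in the ticket summary above


def window_availability(daily_percent: Sequence[float], window: int = 30) -> float:
    """Mean availability over the most recent `window` days."""
    recent = list(daily_percent)[-window:]
    if not recent:
        raise ValueError("no availability data supplied")
    return mean(recent)


def should_alarm(daily_percent: Sequence[float],
                 threshold: float = THRESHOLD_PERCENT) -> bool:
    """True if the windowed availability falls below the threshold."""
    return window_availability(daily_percent) < threshold


# Invented data: 29 perfect days and one bad one.
made_up_days = [100.0] * 29 + [60.0]
print(round(window_availability(made_up_days), 1), should_alarm(made_up_days))
```

With a single bad day in the window the mean stays well above the threshold, which is why a lingering alarm looks odd.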

131610 (3/11/17)
ECDF's v6 ticket. Some positive news back in early December, the ticket could do with an update. In progress (4/12/18)

DURHAM
131609 (3/11/17)
Another site with just the v6 ticket. Last update was the start of December, any news from your network team at all? On Hold (4/12/18)

SHEFFIELD
138649 (3/12/18)
Sheffield's t2k DFC migration ticket. The site's status is the same as Oxford, and was included in Daniela's query to t2k in that ticket. In progress (9/1)

131608 (3/11/17)
Sheffield's v6 ticket. In great need of an update. In progress (30/10)

MANCHESTER
131607 (3/11/17)
Only the v6 ticket at Manchester too. Things were looking good towards the end of last year, any news? In progress (27/11/18)

LIVERPOOL
139411 (30/1)
A request from Biomed querying if they still need to use the -s option to use the site's space token (note that they're still using lcg tools). John replied that currently this is still the case, but in the DOME future it won't be (due to quotatokens being applied to a directory). On Hold (1/2)

138648 (3/12/18)
Liverpool's t2k DFC migration ticket. Unlike the other two sites Liverpool is planning on migrating to DOME soonish, so they might not require a "clean slate solution". On Hold (18/12/18)

131606 (3/11/17)
Liverpool's v6 ticket. Last report had the networking team look at this in the New Year (so now-ish) to dual stack the storage, whilst the perfsonars are happily dual-stacked already. Please update the ticket once you know more (which will hopefully be soon-ish). In Progress (5/12/18)

LANCASTER
137996 (30/10/18)
A ROD ticket for an http test failure caused by DPM not quite handling http file moves quite right. Waiting on an updated version of DPM to get into epel - I will ask the devs today how that's going. On Hold (14/1)

UCL
139101 (8/1)
A ROD ticket for APEL publishing test failures. Ben has called Andrew McNab in for help installing things. In Progress (30/1)

RHUL
131603 (7/11/17)
Just the v6 ticket at RHUL too. Simon confirms that there's been no news on this front. In progress (23/1)

QMUL
139430 (30/1)
Another CMS ticket to update the site-local-config. Daniela has sorted it and has asked CMS to confirm. Waiting for reply (4/2)

139097 (7/1)
LHCB seeing data transfer problems, but this was a while ago. Dan has asked if problems persist. Waiting for reply (30/1)

138364 (19/11/18)
QM's t2k DFC migration ticket. Dan was ready to do the data moving bit, and just asked for confirmation of what needed to be done. Is the move underway Dan? In progress (16/1)

134573 (17/4/18)
CMS request to install singularity. Dan is rolling this into the move to C7, which was in the testing phase last November. Any recent news? On Hold (5/11/18)

IMPERIAL
139454 (31/1)
A ticket from a t2k user having trouble accessing post-DFC migration data at RALPP - which for reasons had to be routed to Imperial. Daniela can't spot any problems, so it looks like a user side issue. Although it might be worth checking the t2k.org .lsc files at RALPP. Assigned (should be something else) (31/1)

138359 (19/11/18)
Daniela runs such a tight ship at IC that she has to assign other issues to her site - this is the DFC migration master ticket. On Hold (22/1)

BRUNEL
139344 (28/1)
CMS transfer failures at Brunel. The storage is working fine, but it looks like some files that CMS thinks should be at Brunel aren't there, with no explanation of where they went. It's being investigated. In progress (4/2)

100IT still have ticket: 137306 (last update 16/1)

TIER 1
138361 (19/11/18)
The Tier 1's t2k DFC migration ticket. The ticket looks done with, just waiting on t2k to see if things are okay. That seems to be a little unclear, but that might be a VO side problem. In progress (31/1)

138665 (4/12/18)
The original mice LFC ticket, on hold whilst the above is sorted out.

139476 (1/2)
With the MICE LFC dead in the water this is the request for a dump to migrate to the DFC. In progress (4/2)

139306 (24/1)
A request from Duncan to upgrade the RAL perfsonar hosts (and fix some configs). In progress (29/1)

138891 (17/12)
A ROD availability ticket that looks a bit off - John thinks this is due to invalid tests being run and has opened a counter ticket: 139198 - from that the test in question is due to be removed this week. On Hold (16/1)

139477 (1/2)
A ROD ticket for a couple of sickly ARC CEs. One node is fixed, the other was already on the naughty step for having a high load (possibly from the A-REX slapd process), and it's being poked and prodded. In progress (4/2)

138500 (26/11/18)
CMS transfers from T2_PL_SWIERK failing. File transfer experts were about to be called in, and the ticket is now On Hold. Is it going to be a tough one to debug? On Hold (30/1)

138033 (1/11/18)
Atlas ticket for singularity job failures at RAL. Still lots of back and forth here, with great efforts from James and Alessandra. In progress (31/1)

139414 (30/1)
LHCB jobs seg faulting. It appears these errors all occurred on VMs, and now that those VMs have passed on, the errors have disappeared too. As there's no way to easily proceed (VM necromancy isn't a thing afaik) it looks like this one can be closed. In progress (4/2)

Tuesday 29th January 2019, 10.00 GMT

36 Open UK Tickets today.

TIER 1
138665 (4/12/18)
My apologies for being a nag, but this MICE LFC ticket still hasn't had an update this side of Christmas. Could someone please take a look and update the ticket (or at least re-acknowledge the ticket's existence). In progress (12/12/18)

138500 (26/11/18)
This CMS transfer ticket is a little quiet, although I suspect that's due to a lot of conversation going on along other channels and work is ongoing on the issue. Are my suspicions correct? In progress (17/1)

QMUL
139097 (7/1)
In a similar nagging tone, no words have been added to this ticket, from either side (site or LHCB). Is the issue still an issue now that (I believe) the works at QM are finished? In progress (8/1)

ECDF
139240 (21/1)
A comment aimed at LHCB rather than the site - have you been able to check that the issue at hand (which looked to be a classic black hole node) has been dealt with? The last report from the VO mentioned there were other issues preventing seeing if things were solved. In progress (22/1)

BIRMINGHAM
137801 (17/10/18)
The aspirational switch off date for the Birmingham DPM was yesterday. How did that go Mark? Do you now feel like a huge weight is off your shoulders? In Progress (22/1)

LIVERPOOL
138943 (19/12/18)
Just in case the Liver lads haven't seen it, this LHCB transfer issue is no more and the ticket can be closed. In progress (28/1)

BRISTOL
131613 (3/11/17)
To keep with the positives, the Bristol IPv6 ticket looks to be almost finished with - firewall ports are open so we just need to see if PS tests run fine. Nice. In progress (29/1)

Monday 21st January 2019, 16.30 GMT
39 Open UK Tickets this week.

First a look at a few regular tickets:

TIER 1
138665 (4/12/18)
This MICE LFC ticket that was mentioned last week still could do with some attention, it still hasn't been updated since last year. It looks like a connection issue (and a bit of an odd one at that). In Progress (12/12/18)

RALPP
139222 (18/1)
A ROD ticket for webdav test failures. Chris has asked where to get some help with figuring out the error code seen when the test fails - the test description link in the ticket appears to be broken. In progress (21/1)

QMUL
139097 (7/1)
Any luck fixing these LHCB data transfer failures? In progress (8/1)

THE IPv6 TICKETS
OXFORD: 131615
Kashif provided a comprehensive update at the start of the month - it looks like some progress is soon going to be made, although it looks like it will be a slow process due to the low priority of IPv6 with the Oxford networking people. On Hold (7/1)

BRISTOL: 131613
Things are looking positive at Bristol, just waiting on some holes in the site firewall for the perfsonar boxen. Any luck with that? In progress (21/12)

BIRMINGHAM: 131612
More positive news, the new central infrastructure is in place and so hopefully Mark can have a go at dual-stacking soon (I assume after he's killed off his DPM). On Hold (24/12/18)

GLASGOW: 131611
Sadly not so positive an update from Glasgow - Gareth explained how their perfsonar revealed v6 traffic issues when it was dual-stacked (which is its job), so they're waiting on this getting fixed. Luckily the usual sticking point of v6 reverse DNS isn't an issue. In progress (4/12)

ECDF: 131610
Rob explained in the last update how the physical migration of the site *didn't* break the v6 connectivity of the perfsonar and test DPM (yey!). Dualstacking the production storage was predicted to start around nowish (give or take a month I assume). Any recent news? No worries if there's not though. In Progress (4/12/18)

DURHAM: 131609
Adam forwarded Duncan's information about the JISC Secondary DNS service to the Durham networking team - v6 packets can otherwise flow (just no DNS!). Any word back from them? On Hold (4/12)

SHEFFIELD: 131608
There was hope that the perfsonar box could be dualstacked in November, but I assume the usual end of the year rush happened. Any luck dual-stacking it this year? An update for this ticket would be great. In progress (30/10/18)

MANCHESTER: 131607
Manchester got a shiny new IPv6 range towards the end of last year. Any luck dual-stacking your storage yet? Any timeframe for doing so if you haven't got round to it yet? In progress (3/12/18)

LIVERPOOL: 131606
John gave a nice chunky update last month - the site stands ready to dualstack their storage, but just waiting on getting the WAN routing fixed (hopefully sometime soonish). But at least their perfsonar is happily v6'd. In Progress (5/12/18)

RHUL: 131603
At last check RHUL were waiting on v6 DNS before they could proceed. Simon reported that RHUL were looking at outsourcing this service to JANET, but no word on if/how well that's going/gone. Any news? An update would be appreciated. In progress (29/10)

Monday 14th January 2019, 14.30 GMT

40 Open UK Tickets this week.

T2K DFC Migration on DPMs
Liverpool: 138648
Oxford: 138647
Sheffield: 138649
Lancaster: 138365

A quick summing up of these tickets - to provide the information T2K need (namely adler32 checksums for files that don't already have them) it appears your DPM needs to be DOME'd. We at Lancaster seem to be having the most luck with this so far, so please feel free to prod me about it.
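For anyone wanting to sanity-check a single file by hand, an adler32 checksum can be computed with nothing more than the Python standard library. This is only a rough sketch and not how DPM or DOME generate checksums: the function name adler32_hex is made up for illustration, and the 8-digit lowercase hex output is an assumption about the format the catalogue expects.

```python
import zlib


def adler32_hex(path, chunk_size=1024 * 1024):
    """Return the adler32 checksum of a file as 8 lowercase hex digits.

    Hypothetical helper for illustration; the zero-padded hex format is
    an assumption, so check what the catalogue actually wants.
    """
    checksum = 1  # adler32 is defined to start from 1
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    # Mask to 32 bits and zero-pad to a fixed width.
    return format(checksum & 0xFFFFFFFF, "08x")
```

Calling adler32_hex on a local copy of a file gives a value that can be compared by eye against whatever the storage or the VO reports for it.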

v6-looking transfer problems
Liverpool (lhcb): 138943 (19/12)
RALPP (atlas): 139127 (10/11)

Whilst for different VOs there's a common theme to both of these tickets - it looks like the failing transfers are trying to use IPv6. Any thoughts? Update - both tickets have been looked at further, the Liverpool ticket was a firewall issue and should be fixed. Chris has looked into the RALPP errors and is a little confused as there don't seem to be any v6 routing problems but there are too many v6 transfer failures.

Bristol LHCB Ticket
138402 (21/11/18)
Are the issues described in this ticket still happening? That might be a question for the VO rather than the site. (6/12/18)

Last Year's Tier 1 Tickets:
138665 (LFC access issues)
138500 (CMS transfer failures)
138361 (T2K DFC migration)
A quick note that none of these tickets have had an update from the site yet this year to indicate that they've been picked up again after the Holiday break.

Extra Extra 139152 - This Sheffield LHCB ticket from the weekend seems to have been missed, it looks like there might be a black hole node gobbling up LHCB jobs.

