Difference between revisions of "Past Ticket Bulletins 2017"

From GridPP Wiki

Revision as of 12:57, 21 August 2017

Monday 14th August 2017, 15.30 BST
23 Open tickets this week.

Very few issues stand out this week:
The Oxford HTTP ticket (129931) and the Birmingham ticket (129930) do look very similar, and both issues persist despite work on them.

The Imperial LHCB transfer ticket to SARA (129946) does indeed look to be, as Simon thought, a networking issue - possibly IPv6/LHCONE related. Interesting stuff, although probably not relevant to all.

Finally there is some movement on the Storage Accounting tickets: John has asked Lancaster (129183) to trailblaze a bit, but we hit some snags.

Monday 7th August 2017, 15.00 BST
24 Open UK Tickets this month.

NGI
129964 (7/8)
We got ticketed about our tickets! Or to be more exact, we had too many "very urgent" tickets last month that weren't responded to within 1 working day. Of the three such tickets, the QMUL one was well understood, whilst I suspect the Oxford and RALPP tickets were simply missed (and I'm not sure how very urgent they were). So please please set your tickets in progress as soon as you start them, otherwise I have to deal with tickets about tickets, and that's a bit too meta for me! In progress (7/8)

STORAGE ACCOUNTING TICKETS
Cambridge, Bristol, Edinburgh, Liverpool, Lancaster and IC - all are On Hold awaiting "Phase 2" of this campaign.

SUSSEX
122772 (11/7/16)
Atlas webdav/xroot ticket. Any joy with the xroot? In progress (10/7)

RALPP
129886 (1/8)
A request from CMS to check the HC xroot rates at the site. Chris couldn't see a problem, so has asked for clarification (and some clues). Waiting for reply (2/8)

129913 (3/8)
CMS transfers failing to the site due to lack of space. Ian has created a dump to help CMS clean up their files - he just needs to know where to put it! Any help from anyone who has done this for CMS before would be appreciated. Waiting for reply (7/8)

OXFORD
129931 (4/8)
As seen by Pete's thread on the storage list, atlas http SAM tests are failing at Oxford. Pete is fighting the good fight. In progress (7/8)

BIRMINGHAM
129930 (4/8)
The same ticket for Birmingham. Perhaps Oxford can trailblaze with this issue? (if it's the same problem.) Assigned (4/8)

SHEFFIELD
129520 (12/7)
A standard issue availability ticket. On hold (12/7)

LIVERPOOL
129296 (30/6)
LHCB ticket about the low number of jobs running at the site, explained to be due to the lower number of cores online whilst the machine room is refurbished. The ticket could do with an update though to let us know how things are coming along. In progress (30/6)

QMUL
129692 (24/7)
LHCB transfers failing during QM SE woes at the end of July. It looks like things are much better now, so I suspect that this ticket can be closed. In progress (2/8)

129155 (26/6)
XROOT not working for lhcb at QM - at the last update, shortly before Dan's holiday (and the SE keeling over), a test infrastructure was put into place to get round no one at the site having lhcb credentials. Any luck with this, or have things been too busy? In progress (25/7)

IMPERIAL
129946 (6/8)
LHCB having trouble uploading to SARA. I'm wondering if this is a problem SARA end... Waiting for reply (7/8)

BRUNEL
129807 (27/7)
A long and involved CMS ticket about squids. I didn't follow it all, but one point Raul brought up was that, as far as I know, there isn't a document detailing the ports that sites need open. The ticket has moved to talk about squid monitoring on the side. In progress (4/8)

100IT
Have one ticket (129023) that's being handled well.

TIER 1
129883 (1/8)
CMS low HC xroot rates - a similar ticket to the RALPP one. Problems persist, despite investigation and service restarts. In progress (3/8)

127597 (7/4)
A CMS ticket to check the RAL networking/xroot performance (similar to the previous one), waiting on news from the RAL networking team for the last 3 months as they investigate a firewall problem. On hold (14/6)

128991 (16/6)
Tape support for solidexperiment.org. Waiting on the Tier 1 Resources Meeting at last check. On Hold (20/7)

124876 (7/11/16)
Gridftp SAM tests failing for the echo - still waiting on the tests to be fixed (125026) - no news on the counter ticket since April. On hold (1/1)

117683 (18/11/15)
The most venerable ticket - glue 2 publishing for Castor. Slowly chugging along in the background. On hold (6/7)

Monday 31st of July 2017, 15.30 BST
29 Open UK Tickets.

OXFORD
129723 (24/7)
Oxford had a few tickets regarding their ARC CE running out of memory. After upgrading to a less-leaky version have the problems gone away? In progress (25/7)

QMUL
The site has a few tickets due to some problems with the STORM, detailed in:
129684 (atlas)
129692 (lhcb)
I cheekily solved the related Sno+ ticket:
129726
as the VO reported not seeing any more problems. I hope things are on the mend.
Update - the atlas ticket is now solved, so presumably/hopefully things are fixed for lhcb too.

SHEFFIELD
129054 (19/6)
The ATLAS MCORE failures continue, the submitter dropped Alessandra's name, referring to her work fixing similar sounding problems at Israeli sites. Maybe she can help? In Progress (27/7)

GLASGOW
124052 (25/9/16)
I think this long-standing Glasgow ticket can be closed, Gareth reports it fixed (and shows his working!). Waiting for reply (25/7)

BRUNEL
129807 (27/7)
No problems with the ticket, but it's an interesting chronicle of trying to use RAL squids. In progress (31/7)

TIER 1
129342 (4/7)
This MICE SRM test ticket hasn't had any news on it for a while. Should it be on hold for Summer? In progress (19/7)

129573 (16/7)
This Atlas transfer failure ticket is also a bit quieter than we'd like, with no news for a while from the RAL team. Sadly I've had no luck with DDM today to see if the problem persists. In progress (25/7)

Monday 24th July 2017, 15.30 BST
29 Open UK Tickets this week.

QM SE having a spot of bother?
129692 - LHCB
129684 - Atlas
Two tickets submitted over the weekend suggest that the QM SE has keeled over. No reply to them at the time of writing, but when your SE dies answering tickets is the least of your worries.

TIER 1 Transfer problems
129573 (16/7)
Whilst on the subject of transfer errors, it looks like atlas are still seeing failures with the symptoms described in this ticket. In progress (22/7)

Atlas Jobs at Sheffield
129054 (19/6)
Atlas multicore jobs seemingly being killed by the batch system at Sheffield. Elena narrowed the failures down to a single task, suggesting the jobs were doing something weird - which isn't the first time we've seen a dodgy batch of jobs recently. But following the panda link it looks like these errors are persisting. In progress (21/6)

GLASGOW - the cursed lhcb ticket.
124052 (25/9/2016)
Vladimir replied today that the attempted fix didn't work for svr009. However he didn't answer any of the other questions. Perhaps talking to Raja or Andrew directly in the meeting tomorrow will help clear things up? In progress (24/7)

Tuesday 18th July 2017, 10.00 BST

34 Open UK Tickets this week. Last minute look at the tickets as I was off yesterday!

Waiting for Reply from LHCB
124052 - Glasgow Job-Publishing Ticket
129312 - RALPP ARC CE Ticket
Both of these tickets are waiting for LHCB feedback.

Atlas Multi-core Job Problems?
129054 - Sheffield
129593 - Oxford
Whilst the symptoms are a little different, could there be a link between these two atlas tickets?

Storage Accounting Tickets
A quick note that all 6 of these tickets are chugging along nicely, with only the Liverpool figures still "under scrutiny". There's a plan involving keeping them open.

Monday 10th July 2017, 16.00 BST
34 Open UK Tickets this week.

NGI/SUSSEX
129383 (5/7)
A ticket to the NGI requesting an explanation for Sussex's less-than-stellar second quarter results. I contacted Jeremy and Leo, and got an out of office reply from the latter... In progress (10/7)

SUSSEX
127767 (18/4)
Very much related, an availability/reliability ticket. Daniela noted last week that things took a turn for the worse - perhaps the machines rebelling with Leo away. On hold (6/7)

TIDBIT FROM DUNCAN
As noted in the ECDF ticket 129346, Perfsonar support for CentOS6 is likely to disappear at the start of next year. Perhaps we need an upgrade drive?

Talking of Edinburgh things, it appears that you're having SRM problems today, don't forget this (likely related) atlas ticket when you're done: 129452.

Update this Slater... (sorry for the pun)
I try not to get personal, but two tickets seem to be waiting for input from the ever busy Mark: This Birmingham ticket from Brian (which looks to have solved itself, so it's the best kind of ticket): 129441

And this LHCB ticket which you submitted to the Tier 1 is waiting to see if the problem is solved: 129059

But perhaps you're on a well-deserved holiday too?

Solid Tape Support
128991 (16/6)
Solidexperiment.org's request for tape at the Tier 1 has been semi-bounced back with a request for more details on the requirements. Useful to note if any other experiments start requesting tape. In progress (5/7)

Monday 3rd July 2017, 15.00 BST
36 Open UK Tickets this month.

SUSSEX
122772 (11/7/16)
Start with a golden oldie, the Sussex webdav/xrootd ticket. No news since some promising progress a few months back. As Alessandra asks in the ticket, any news on this? In progress (9/5)

127767 (18/4)
An availability ticket, hopefully the stats will be green enough to close this in a week or two. On hold (26/6)

RALPP
129312 (3/7)
Fresh this morning, an lhcb ticket about one of the ARC CEs not responding (heplnv147). Assigned (3/7)

129316 (3/7)
A ROD ticket about the same server. My powers of deduction suggest that it's a bit poorly. Assigned (3/7)

OXFORD
129201 (27/6)
A glue2 ROD ticket. Kashif sees an intermittent failure and had no joy debugging it so will close the ticket shortly. In progress (29/6)

CAMBRIDGE
129187 (26/6)
One of the "check your storage accounting" tickets, just waiting on word back from JG to try out a new gocdb related to this. Waiting for reply (28/6)

BRISTOL
126864 (28/2)
Enabling LZ (and friends) at Bristol. Things are coming along nicely, and after giving helpful advice about glexec Daniela is primed to send test jobs. In progress (15/6) Update, Winnie has got back reporting that LZ, Dune and LSST are ready for testing. Sweet!

GLASGOW
124052 (25/9/16)
LHCB ticket regarding publishing ARC at Glasgow (aka the Cursed Ticket). Gareth has updated the ticket with two prongs of hope - one is that a CE at Glasgow has been updated to a version of ARC with the hopeful fix in it, and the other is that VAC use by LHCB at Glasgow could make the issue moot. Both need feedback to confirm. Waiting for reply (30/6)

EDINBURGH
129185 (25/6)
The ECDF edition of the storage accounting tickets. Waiting on Andy's return from hols to confirm the figures. In progress (26/6)

SHEFFIELD
129230 (28/6)
Atlas transfers failing with time outs - Elena noted a rogue disk server and hopefully will have kicked it into shape soon. In progress (3/7)

129054 (19/6)
Atlas mcore jobs being killed at Sheffield. At first glance Elena suspected the jobs hit the time limit, but CREAM errors have been spotted. Let us know if you need a hand debugging this. In progress (21/6)

LIVERPOOL
129296 (30/6)
LHCB wondering why they're not seeing more jobs at Liverpool. John politely reported on the state of the Liverpool AC. Hope things stay cool(ish). In progress (30/6)

129184 (26/6)
The Liverpool storage accounting check ticket - John provided some good feedback. In progress (30/6)

129288 (30/6)
Duncan spotted a problem with the Liverpool perfsonar results. John reports a full root partition and has resized it. Hopefully nothing got fubared, playing the wait and see game. In progress (30/6)

LANCASTER
129183 (25/6)
The Red Rose edition of the Storage Accounting Check. I was hoping to close this today, but my dpns-du is still going. In progress (27/6)

129242 (29/6)
Availability ticket after a rocky few weeks, on the road to recovery now. On hold (3/7)

129287 (30/6)
Another ticket from Duncan, the Lancaster perfsonars hadn't initialised their IPv6 address properly after a reboot. Fixed, but needs investigating. In progress (30/6)

RHUL
129223 (28/6)
In its continued decommissioning, Daniela has asked that vo.londongrid.ac.uk be removed from the site. Govind just has the SE to go. In progress (28/6)

QMUL
129221 (28/6)
A similar vo.londongrid.ac.uk decommissioning ticket for QM. Not spotted yet? Assigned (28/6)

129155 (26/6)
LHCB having issues accessing files over xroot at QM. This has only been tried in anger for atlas before, so it looks like some teething troubles are being seen. Daniela has asked for a handy CMS file to try out just in case. In progress (3/7)

IMPERIAL
129182 (25/6)
The Imperial storage accounting ticket, Simon replied and just waiting on the gocdb thing now. Waiting for reply (28/6)

128555 (30/6)
Mysterious CMS gfal problems in jobs at IC. It looks like this is still very much under investigation, although the site is drained of jobs in the meantime. In progress (3/7)

BRUNEL
129222 (28/6)
Just a londongrid vo decommissioning ticket here, progressing nicely. In progress (1/7)

100IT
Have 4 tickets, only one properly fielded:
128976
129023
128095
129156

THE TIER 1
129299 (3/7)
A CMS user spotted errors trying to pull data from RAL. Looks like checksum differences - the user has got back and it looks like the file is corrupt. In progress (3/7)

129211 (27/6)
Atlas transfer failures to Tokyo - they seem to be gone now, despite nothing being fixed or found to be wrong. If still fixed the ticket can be closed. In progress (29/6)

129059 (20/6)
LHCB spotted timeouts, which look to have since been resolved. Just waiting on confirmation of this before closing the ticket. Waiting for reply (28/6)

129098 (22/6)
Another atlas transfer ticket, thanks to Alastair and Gareth for trying to clear things up between the two issues. Are these globus errors still occurring? In progress (29/6)

129228 (28/6)
A CMS ticket for low xroot success rates for HC jobs. A restart of xrootd redirectors seems to have fixed a few issues for Andrew at least, waiting on user feedback. Waiting for reply (30/6)

127597 (7/4)
A CMS ticket checking on site networking at RAL, waiting on an update from the RAL networking team. On hold (14/6)

128991 (16/6)
solidexperiment.org requesting tape support at RAL. Passed on to the castor team. In progress (16/6)

124876 (7/11/2016)
ROD ticket for echo, due to the probes not using the correct paths. Still no movement on the counter-ticket 125026. On hold (1/1)

117683 (18/11/15)
Glue2 publishing for Castor. Some progress, but no news for a few months. On hold (10/5)

Tuesday 27th June 2017
32 Open UK Tickets this week.

No proper update as I was off yesterday, but we have half a dozen tickets from JG regarding verifying our storage accounting (example 129183).

Another ticket of interest is 129072 - asking for vo.londongrid.ac.uk to be removed from the Tier 1 resources, as part of decommissioning the VO.

The Link to the UK tickets, just in case we need it.

Monday 5th June 2017, 14.30 BST
Down to 21 Open UK Tickets this month.

SUSSEX
122772 (11/7/16)
Atlas xroot/webdav ticket, with just xroot to go. Any luck with the xrootd server? In progress (9/5)

127767 (18/4)
Availability ticket - Daniela notes that there are still issues with test jobs not running in time, and advises perhaps reserving a slot for tests. On hold (25/5)

RALPP
127555 (7/4)
Another availability ticket, although Chris points out several valid reasons why this points to the monitoring being fubared, vitiating RALPP's results. On hold (30/5)

OXFORD
128512 (27/5)
LHCB spotted a problem with the Oxford ARC CE, which Kashif seems to have cleared up with a reboot, but no word from lhcb since. Could do with a poke. In progress (30/5)

BRISTOL
126864 (28/2)
Request to enable LZ at Bristol. Winnie is progressing with rolling out the pool accounts etc for LZ, Dune and LSST as fast as her limited time allows. In progress (30/5)

GLASGOW
124052 (25/9/16)
The cursed ticket, requiring new ARC CEs to fix a problem with publishing seen by lhcb. Hopefully enough positive karma has been accrued to give some breathing space to get round to this. On Hold (4/4)

ECDF
128294 (12/5)
Availability ticket - looking into the ECDF argo page is a journey into the unknown (status), with their infamous -1 availability. If your site seems to be capping out at 99% then I'd blame the ECDF guys for stealing that extra 1% :-P In progress (19/5)

SHEFFIELD
127766 (18/4)
Another availability ticket, just waiting for the numbers to soothe themselves. Hopefully not too badly affected by the next ticket. On hold (25/5)

128429 (19/5)
BDII test failure ticket, the alarm was fixed by Elena syncing her clocks, but glue-validator failures are still seen. Kashif rightly recommends running the test manually. In progress (2/6)

LANCASTER
128321 (15/5)
And another Availability ticket. After improving test job flow through our cluster we're on holding the ticket. At least we're in good company. On hold (26/5)

RHUL
128750 (1/6)
Duncan submitted a perfsonar related ticket after spotting some problems, Govind is investigating the issue. In progress (5/6)

IMPERIAL
128555 (30/5)
The Imperials are having an issue with gfal-copy within CMS jobs. Despite not being able to replicate manually, some quiddity of the job environment causes gfal-copy to seg fault. The site has tried a different version and raising the ulimit, but no joy yet despite the efforts. In progress (5/6)

100IT has 3 tickets that I won't go into.

TIER 1
124876 (7/11/16)
gridftp tests failing for echo, due to a problem with the tests. No movement on the counter ticket (125026) since April. On Hold (1/1)

127612 (8/4)
LHCB having problems with the RAL CEs, which seem to be ongoing (although they might have changed in nature). No news on the ticket in the last fortnight though. In progress (23/5)

127967 (27/5)
Enabling MICE pilots at the Tier 1. The accounts are created but it looks like job submission isn't working yet for this role. In progress (25/5)

127240 (21/3)
CMS staging tests. The last entry from the user was a request to clarify what the numbers in the plots meant and for additional plots. In progress (18/5)

127597 (7/4)
CMS would like the Tier 1 to check their xroot/networking performance. In response Andrew L had switched off "lazy download". Andrew asked if this has helped, but the issue is muddled by the firewall at RAL dropping packets, awaiting news from the RAL networking team. In progress (30/5)

117683 (18/11/2015)
Castor Glue 2 publishing ticket. Rob updated that development is still ongoing. On Hold (10/5)

Wednesday 24th May 2017
Due to the Bank Holiday and Matt being on leave for the Ops meeting, here's a link to all the UK tickets to tide you over.

Hope everyone had a nice weekend!

Monday 22nd May 2017, 15.30 BST.
33 Open UK Tickets this week.

TIER 1
128180 (5/5)
An IPv6 readiness ticket for the Tier 1. It could likely do with an update soon. In Progress (8/5)

127968 (27/4)
This MICE data access ticket looks like it can be closed as data can be accessed by the looks of it, probably best to double check first though. In Progress (17/3)

BRUNEL
128434 (20/5)
This CMS ticket, which looked like it could be closed, has been intercepted by others in the VO citing high remote read errors at the site. There might be a more insidious problem with the site network. In progress (21/5)

SHEFFIELD
128074 (3/5)
127904 (24/4)
I just realised that these two tickets (from lhcb and ops) likely have the same root cause (detailed in the INFN ticket https://ggus.eu/?mode=ticket_info&ticket_id=127725).

LIVERPOOL
128328 (15/5)
gocdb entry "webdav for OPS VO". I haven't had a chance to think on this over the last week, any thoughts your end Steve? Or anyone else? In progress (15/5)

BRISTOL
126864 (28/2)
Winnie has asked in this LZ enabling ticket if LZ et al really don't need "sgm*" and "prd*" users. I believe they don't (at least they don't at Lancaster!). Waiting for reply (22/5)

And finally thanks to Daniela for helping Sussex out closing the Sno+ file access ticket!

Monday 15th May 2017, 15.00 BST
34 Open UK Tickets this week - we're whittling them down.

Merseyside Webdav Access
128328 (15/5)
The Liver Lads got a ticket concerning Liverpool's information in the gocdb and how it was lacking a webdav endpoint. I'm not sure DPMs explicitly have a https://<hostname>:443/webdav/<vo> endpoint, and webdav access just uses the same namespace as everything else. But I could be wrong. Steve was on it though. In progress (15/5)

Sneaky Ticket at ECDF
128294 (12/5)
This ROD availability ticket snuck in on Friday and looks like it hasn't been noticed yet. Assigned (12/5)

Also regarding the SE ticket 128129 - we've been bitten by similar sounding issues at Lancaster, our SE was looking really clogged up and service restarts weren't cutting it - we had to reboot which seemed to cheer everything up. And regarding the perfsonar ticket 127940, we had the same error message, which was fixed by sorting out IPv6 for the node, which had gotten into a weird state. Not sure if that applies though.

UK XROOTD Redirector at RAL
127598 (7/4)
Andrew reports that the machine to host this service is installed and awaiting firewall holes. Nice. In progress (12/5)

THANKS TO DANIELA AND DAN for helping Sussex out.
122772 (11/7/16)
Webdav/xroot ticket - atlas have enabled webdav and I assume it's working for them. In progress (9/5)

125503 (9/12/16)
Sno+ troubles after Sussex SE renaming- Daniela has been helping out and devising an LFC renaming strategy. Waiting for reply (10/5)

Monday 8th May 2017, 14.30 BST
42 Open UK tickets this week.

GOCDB CHECK TICKET
127588 (7/4)
Thanks to everyone who has filled in the table - just a few more sites and the Tier 1 to go. In progress (4/5) Update - just the Tier 1 to go now

REALLY COULD DO WITH AN UPDATE

TIER 1
127240 (21/3)
CMS staging tests - the submitter asked for some monitoring links a while ago, and has since re-asked. In progress (27/4) Update - thanks Andrew for providing some monitoring plots.

127598 (7/4)
UK Xroot redirector ticket, it would be a shame if momentum was lost on this. In progress (19/4)

127968 (27/4)
One of several MICE tickets submitted by Ray, this was waiting on word from the storage sysadmins for comment. In progress (27/4)

127612 (8/4)
LHCB are still having problems according to their last update to this RAL CE ticket. In progress (4/5)

SHEFFIELD
127904 (24/4)
Another MICE ticket, the pilot role is having trouble submitting to one CE. In progress (24/4)

BRISTOL
126865 (28/2)
IPv6 transfers for CMS, submitted by Daniela. She's asked for the work to be picked up again. In progress (4/5)

126864 (28/2)
Similarly with this LZ-enablement ticket. In progress (31/3)

CAN BE CLOSED(...?)

QMUL
127551 (6/4)
This Sno+ ticket looks like it can be closed, as David suggests that jobs were running once more. In progress (19/4)

126650 (15/2)
Similarly this cern@school ticket is just waiting confirmation that all is well with their jobs. Waiting for reply (8/5)

OXFORD
127778 (19/4)
It looks like this CMS ticket, about orphaned jobs that got lost at Oxford, can be closed. In progress (4/5)

TIER 1
127916 (25/4)
LHCB srm problems at RAL - a plan is in place so the ticket can be closed (unsolved?) or at least on held. In progress (2/5)

BRUNEL
127518 (5/4)
CMS replied to the question regarding the srm protocol in the site's storage.xml, so maybe this issue can be closed? In progress (26/4)

Friday 28th of April 2017

No proper summary this week due to Mayday- but here's the obligatory UK ticket link.

I will be looking at this NGI gocdb consistency ticket: 127588
The table to fill in your site's progress on this is here.
Thanks to the sites that had filled the table in already!

Update - Duncan submitted a bunch of perfsonar tickets just before the weekend - but as Dan noted in the QM ticket 127938 it looks like the latest perfsonar update hasn't gone smoothly for everyone.

Monday 24th April 2017, 14.30 GMT
36 Open UK tickets this week.

NGI
127588 (7/4)
As mentioned before Easter, this is a yearly review of the information in the gocdb. The deadline for this is Friday. Perhaps a swift wiki-table is in order to check the status? The really important security contact information has been checked this year thanks to the security challenge, so we're good on one front at least. In progress (10/4)

126808 (24/2)
WMS usage ticket - not sure where we want to leave this and I don't think much will happen on it for a while. In progress (20/3)

SUSSEX
122772 (11/7/16)
Atlas webdav/xroot access ticket. I understand thanks to Dan's efforts there was some pre-gridpp progress on this. Can we have what was achieved "in writing" please! On Hold (6/3)

127767 (18/4)
ROD availability ticket from last week - not noticed by the site yet by the looks of it. Assigned (18/4)

127768 (18/4)
The likely cause of the low availability, a ROD "out of date CA test" ticket. Thanks to Daniela for providing some wisdom. In Progress (24/4)

125503 (9/12/16)
Sno+ file access ticket. A strategy was developed a while ago to try to tackle this, any further plans/problems? In progress (30/1)

RALPP
127555 (7/4)
An availability ticket due to the arc monitoring shenanigans. Chris has on held it as per the tradition. On hold (7/4)

OXFORD
127778 (19/4)
A CMS ticket, but I'm not sure it should have been assigned to the site. Some jobs that will never go anywhere somehow ended up at Oxford. Not the site's problem, but the conversation about Oxford's status within CMS production should be interesting. In progress (19/4)

BRISTOL
126865 (28/2)
A CMS related ticket from Daniela, regarding ipv6 phedex transfers from Bristol's SE. There's a question left hanging on the ticket regarding the IPv6 status of the cern gfal2 tools. Anyone have any knowledge about this? In progress (31/3)

127783 (19/4)
Some CMS sam test failures have re-cropped up. Although the link shows all green (or at least sickly yellowy-green) for me so perhaps this can be closed again? Reopened (21/4) Update - solved again, likely a problem with the SAM tests occasionally not running soon enough.

126864 (28/2)
Request to enable LZ at Bristol. How is this progressing? In progress (31/3)

BIRMINGHAM
127319 (27/3)
Another Availability ticket, checking the argo link the Easter period has been full of Unknowns for Birmingham, so I didn't on hold the ticket yet. In progress (3/3)

GLASGOW
124052 (25/9/16)
The cursed ARC job publishing ticket- Gareth braved the bad mojo haunting this ticket to provide an update, hoping to be free of it by the end of May. On hold (4/4)

DURHAM
127832 (21/4)
LHCB job submission to Durham seems to be failing. Hope it's not all gone horribly wrong. Assigned (21/4) Update - In progress, Oliver notes some ldap problems causing trouble, but these are being fixed.

SHEFFIELD
127766 (18/4)
Another ROD availability ticket, tests have turned green (after the website was fixed) so just needs to be waited out. On hold (19/4)

MANCHESTER
127644 (10/4)
After a brief config mishap some icecube GPU jobs ran on some non-GPU nodes. Alessandra pounced on the issue, and asks if the ticket can be closed. Waiting for reply (20/4)

QMUL
126261 (30/1)
One of the QM CEs not working for biomed, at last check the problem persists - probably due to the other woes befalling ce04. In progress (4/4)

127445 (1/4)
A biomed ticket for the other CE, which is also still not working for biomed. In progress (24/4)

126650 (15/2)
cern@school pilot problems. The initial problem was fixed, but one of the CEs is still playing up after getting into a bad state running out of disk. In progress (19/4)

127551 (6/4)
A Sno+ ticket, where Sno+ jobs were having problems competing with atlas for disk space on the nodes - compounded by jobs not cleaning up properly. Not limited to QM (Lancaster had the same issue), but these issues seem to have passed - for now. In progress (19/4)

127352 (28/3)
An Icecube ticket regarding job problems on a GPU node. Dan is taking this node out to use for centos7 testing, which the VO was happy with. This ticket is in limbo now, either needing on holding or solving. In progress (31/3)

127144 (15/3)
LHCB having trouble with ce04 also - with the CE's recent problems I don't know if things would be looking better on this front? Waiting for reply (31/3)

BRUNEL
127518 (5/4)
The last of the CMS tickets asking to remove rfio from site's storage.xml. Raul will deal with this when he's back in the UK, but thanks Daniela for the kind offer of help/being a scapegoat. Are you back yet Raul? In progress (12/4)

127117 (13/3)
A CMS request to upgrade the spacemon client, which Raul didn't get round to before his hols. Could do with an update when you can get to it. In progress (14/3) Update - Raul is trying to tackle this, but has asked for some help as the DPM instructions seem quite atlas specific.

THE IMPERIAL CLOUD (which is sadly not a Star Wars Super-Weapon)
127620 (9/4)
David from snoplus has noticed jobs landing and failing on the Imperial Cloud, and has asked what the heck it is (but he asked more politely). Simon has spotted the problems and disabled the cloud site so it won't eat more jobs, hoping to fix it this week. On Hold (12/4)

100IT have 2 tickets:
127827
127539
But they're being handled fine.

THE TIER ONE
127597 (7/4)
A request from CMS to test xrootd and networking performance at RAL after noticing a large drop in job efficiency when accessing data offsite. Andrew L spotted that this was likely due to using "lazy download", and this is being removed from across the WNs but it is noted that this is needed for CEPH... In progress (12/4)

127240 (21/3)
Another CMS ticket, for a Run2 staging test. Tests were done, Sebastian from CMS has asked for access to some site monitoring plots so they can compare what they see to the "real values". In progress (12/4)

126905 (2/3)
Finishing off commissioning the solidexperiment.org cvmfs server. Just needs one last check user-side to make sure that the stratum replication took and job's a good'un. Waiting for reply (21/4)

127388 (29/3)
An lhcb user is having trouble accessing files at RAL. The user has provided details of how they are trying to access the files using root - it's been donkey's years since I've mucked about with root - is the problem simply that castor won't accept xroot connections like this? In progress (20/4)

127612 (8/4)
An LHCB ticket where all the RAL CEs rejected LHCB jobs. There were some issues for a while, but things petered out over Easter. Any news? Hopefully all is well. In progress (12/4)

127598 (7/4)
A CMS ticket from Chris, regarding a cunning plan that was likely set in motion at GridPP38 - the setting up of a UK XrootD Redirector at RAL to pair with the one at Imperial. Likely stalled due to Easter, but Simon's added some notes on their config changes. In progress (19/4)

124876 (7/11/16)
ROD gridftp tests failing for CEPH, due to a problem with the tests. Alastair poked the ticket to fix the tests (https://www.ggus.org/index.php?mode=ticket_info&ticket_id=125026) - these fixes haven't been implemented yet. On hold (1/1)

117683 (18/11/15)
Castor Glue2 publishing. As expected slow progress here due to the lack of effort available, but the last update was promising. On hold (2/3)

Monday 10th April 2017, 15.00 BST
43 Open UK tickets this week (not quite a record, but getting close).

UK NGI - GOCDB INFORMATION REVIEW
127588 (7/4)
There's a yearly drive to double check the NGI contact information in the GOCDB. There's a list of tasks for NGI managers, but RC managers (aka site admins) are also requested to review their contact details. If everyone could double check the information as requested that would be great - we've got a leg up on this thanks to Ian's Security Contact Challenge. They have asked for this to be completed by the 28th of April - so we'll bring this up again at that week's Ops meeting. In progress (10/4)

CMS RFIO tickets (5/4)
CMS have released a bunch of tickets to Glasgow, Edinburgh, UCL, Brunel and the Tier 1 asking for the rfio protocol to be removed from their site 'storage.xml's. The Edinburgh and UCL tickets might not have been noticed yet as they were still in the "assigned" state at time of writing, the rest seem to be going okay:
UCL: 127526
ECDF: 127527

QMUL Biomed ticket
127445 (1/4)
I don't think this ticket was a poorly thought out April Fool's day joke, so please could it be acknowledged (or just dismissed with a prompt solving/unsolving if it's a load of cobblers). Assigned (1/4)

IC MICE TICKET
127473 (3/4)
This Mice ticket has been bounced around a bit, so you may not have noticed it arrive in the Imperial Inbox (on top of all the other work).

That's all for this week, next week I'm enjoying Lancaster's generous long Easter closure so if there is a meeting I won't be there, so here's a link to all the UK tickets in case you need it.

Monday 3rd April 2017, 14.30 BST
30 UK tickets this month

SUSSEX
122772 (11/7/16)
Atlas webdav/xroot ticket. Any luck, or would you like a hand at GridPP this week? On hold (26/1)

125503 (9/12/16)
Sno+ ticket about file access problems due to a wrong SE name in the LFC. Any word on this too? I think a plan was put in place. In progress (30/1)

RALPP
126902 (2/3)
CMS ticket, I got a bit lost trying to follow it but a moot point as CMS indicate it can be closed. In progress (3/4)

BRISTOL
126864 (28/2)
Request to enable LZ, Daniela has provided the requested information. In progress (31/3) Update - solved

126865 (28/2)
A CMS ticket from Daniela, concerning ipv6 transfer failures to/from Bristol. Things were looking better, although there is an outstanding question that Winnie highlighted about the CERN setup that perhaps Duncan or someone could answer? In progress (31/3)

BIRMINGHAM
127319 (27/3)
A low-availability ticket. Whilst these are boring, it needs to be tended to (i.e. put In Progress or On Hold). Assigned (27/3) In progress - Mark cites a misbehaving DHCP server causing hassle.

GLASGOW
124052 (25/9)
LHCB ticket concerning incorrect job publishing, to be fixed in the next generation of ARC CEs deployed at Glasgow. Sadly the time has come for another update, even if it's a totally dry one. On Hold (31/1)

127160 (16/3)
An availability ticket. Nothing more to say than that. On hold (16/3)

SHEFFIELD
127210 (19/3)
Atlas transfer timeout failures. After coming out of downtime failures persist. Perhaps a similar problem to what we saw at Lancaster last week? As per the post to the storage list those issues were apparently soothed by increasing the DPM threads. In progress (3/4)

MANCHESTER
127464 (3/4)
A very fresh atlas deletion error ticket. In progress (3/4)

127384 (29/3)
LSST authorisation failure ticket. Alessandra has tracked down hopefully all the config errors that crept in during the move from svn to git. Hopefully this is nearly sorted. In progress (31/3)

LIVERPOOL
124819 (3/11/16)
AFS ticket. After the firewall ports were opened the submitter provided some feedback, but no news back from the site. Perhaps just put this ticket out of its misery (like what will soonish happen for AFS itself)? In progress (13/2)

127353 (28/3)
Steve bravely rolled out a small Centos7 test cluster and a Sno+ job accidentally landed on it - they kept it that way to test things out but sadly it looks like their tests failed and they have asked for their jobs to not land on the test cluster anymore. In progress (2/4)

126956 (6/3)
Availability ticket due to the annoying ARC monitoring issues. On hold (27/3)

QMUL
127352 (28/3)
Icecube jobs failing on a QM GPU node - the likely cause has been spotted (old AMD libs sitting on the system with a new nvidia card in it) but it might be a little while till this is fixed. Dan has proposed using this as an opportunity to roll out a Centos7 test node which Icecube were okay with. In progress (31/3)

127144 (15/3)
LHCB saw problems with ce04, which Dan reckons were caused by load and has asked if there are still problems. Waiting for reply (31/3)

126261 (30/1)
A biomed ticket for ce04, although they rechecked if this was still a problem during the aforementioned load problems. There seem to be other errors too though - maybe related to the biomed infrastructure? In progress (31/3)

126650 (15/2)
cern@school errors due to a misconfig in the VO usernames (slurm only does lowercase usernames!). Dan has rolled out the new users and Daniela has rolled out some test jobs. In progress (31/3)
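
For anyone wanting to check their own pool accounts for the same capital-letter gotcha, here is a minimal sketch. The account names are made up for illustration, and the only rule checked is the "no capitals" one mentioned in the ticket - whatever else your batch system insists on isn't covered.

```python
def find_uppercase_usernames(names):
    """Map each account name containing a capital letter to a lowercased suggestion."""
    return {name: name.lower() for name in names if any(c.isupper() for c in name)}


if __name__ == "__main__":
    # Hypothetical pool account names, purely illustrative.
    pool_accounts = ["cernatschool001", "CernAtSchool002", "snoplus01"]
    for bad, suggestion in find_uppercase_usernames(pool_accounts).items():
        print(f"{bad} -> {suggestion}")
```

Running it over the output of your account-creation script before rolling out the users would flag any offenders early.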

127445 (1/4)
Another biomed submission error ticket, I'm not sure if this is a duplicate of 126261. It looks like a similar error (on ce5 this time though). Assigned (1/4)

BRUNEL
127117 (13/3)
A request from CMS to upgrade the spacemon client. Raul was on it. Any luck with this? Although I've just remembered that Raul is in a different hemisphere so that question might fall on a deaf inbox. In progress (14/3)

127126 (14/3)
Availability ticket, again by the looks of it due to the ARC monitoring playing up. On hold (27/3)

TIER 1
127251 (21/3)
A ticket from an atlas user concerning trouble with transfers into castor and some errors the user is seeing. John has requested more information as the files themselves seem present and correct, but someone who has some idea as to what the error messages listed by the submitter mean would be handy. Waiting for reply (27/3) Update - closed as likely a problem with the user's code.

127449 (2/4)
One of the RAL ARCs wasn't working well for LHCB - but the problems appear to have passed and the ticket can be closed now. In progress (3/4)

126905 (2/3)
CVMFS commissioning for the SOLID experiment. With effort from Daniela and Catalin things all look to be working for solid now with /cvmfs/solidexperiment.egi.eu exported nicely and uploadable to by the VO. Looks like another ticket can be closed. Waiting for reply (29/3) Update - before it gets closed there has been a request for some extra information from Catalin.

127388 (29/3)
LHCB troubles accessing some files at RAL. Have these issues passed with the other castor problems from the weekend? In progress (3/4)

127240 (21/3)
CMS request to run staging tests in prep for Run 2. There was a request from CMS for access to some monitoring plots, I assume for the transfer rates between buffers, but it wasn't very clear. In progress (27/3)

126184 (26/1)
Atlas request for site monitoring input. Alessandra went over this in last week's atlas uk meeting. It's not too late to have your say in the google docs. In progress (7/2)

124876 (7/11)
ROD ticket concerning tests to the RAL echo instance. Alastair's counter ticket (ticket 125026) hasn't had an update since last year - I think it needs a kick. On Hold (1/1)

117683 (18/11/15)
Castor Glue 2 publishing. Rob reported some good progress. On Hold (2/3)

NGI
126808 (24/2)
WMS usage ticket - mainly involving Imperial and the Tier 1. There was some worry from Daniela regarding the closure of old WMS tickets due to it being "no longer supported", but there were reassurances that security bugs would be fixed. Are you feeling reassured? In progress (20/3)

Monday 27th March 2017, 15.15 BST
26 Open UK Tickets this week.

STALE INFORMATION
Nearly a third of the UK tickets are not On Hold but have not received an update in over 10 days. Before I go over all the tickets next week please could everyone check their older tickets. In light of that I'll have a delicate review of the tickets this week.
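
The staleness rule above (not On Hold, no update in over 10 days) is easy enough to apply to your own list of tickets. A rough sketch, assuming each ticket is just an (id, status, last update date) tuple - the IDs and dates below are invented, and nothing here talks to GGUS:

```python
from datetime import date

STALE_AFTER_DAYS = 10  # threshold taken from the rule of thumb above


def stale_tickets(tickets, today):
    """Return IDs of tickets that are not On Hold and were last updated more than 10 days ago."""
    stale = []
    for ticket_id, status, last_update in tickets:
        age_in_days = (today - last_update).days
        if status.lower() != "on hold" and age_in_days > STALE_AFTER_DAYS:
            stale.append(ticket_id)
    return stale


if __name__ == "__main__":
    # Hypothetical records, purely illustrative.
    example = [
        (1001, "In progress", date(2017, 3, 10)),
        (1002, "On Hold", date(2017, 1, 1)),
        (1003, "Waiting for reply", date(2017, 3, 25)),
    ]
    print(stale_tickets(example, today=date(2017, 3, 27)))
```

On Hold tickets are skipped regardless of age, which matches how they are treated in these reviews.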

ATLAS MONITORING
126184 (26/1)
Has everyone who had input on the atlas site monitoring survey said their piece? If yes then this ticket has done its job. In progress (22/3)

BAG OF ON-HOLDING
126956 (6/3)
I cheekily set this Liverpool availability ticket on hold as per the S.O.P. On Hold (27/3)

I did the same to this just-as-unjust Brunel ticket: 127126

CAN BE CLOSED
126976 (6/3)
This Sno+ ticket looks to be solved (it was fixed by adjusting the acls and directory permissions after implementing the new spacetoken) - so the ticket can probably be closed. In Progress (25/3)

Monday 20th March 2017, 14.30 GMT
25 Open UK Tickets this week.

But first, a ticket from the UK
127224 (20/3) Thanks to Daniela for filing a ticket about the ARC monitoring weirdness seen across sites the last few months. Let's hope the monitoring team can get to the bottom of this. Update - some ticket confusion mixing this up with 126724 from Steve himself.

NGI
126808 (24/2) The "WMS usage" ticket - Daniela notes that after some disheartening closure of the WMS tickets due to lack of support, withdrawal of the service might need to happen sooner rather than later. There is an attempt to reassure us that security bugs would be fixed in spite of the lack of dev effort. In Progress (20/3)

Not ECDF
127223 (20/3) Daniela's quest to sort the dashboard continues with this request to get someone to look at anomalous alarms at ECDF. Sadly the TPM routed it back to the UK. I've tried to bounce it back. Assigned (20/3) In fairness this ticket was re-routed and completed before I finished writing this up.

TIER 1 IPv6
127185 (17/3) Of interest - a ticket from WLCG requesting that the Tier 1 completes a survey about its IPv6 readiness and plans. In Progress (17/3)

OXFORD
126928 (3/3)
Are the transfer failures that prompted this atlas ticket still plaguing the site after Kashif's fix last week? Looking at the DDM plots myself I don't think they are, so it looks like this ticket can be closed. In progress (15/3)

And finally, SUSSEX and BRISTOL have a few tickets that could do with an update. Although the two Bristol tickets are only a few weeks out of date, and I have not followed through with providing support and encouragement to Sussex, so my bad there.

Thursday 9th March 2017
Matt's on leave so no ticket update from him, but console yourselves with a link to all the UK tickets!

Monday 6th March 2017, 15.00 GMT
21 Open UK Tickets this month

TIER 1 and IMPERIAL
126808 (24/2)
WMS "Usage Survey" ticket. Both WMS sites have replied and it has been noted that the two big WMS users in the UK (mice and t2k.org) can be encouraged to use dirac. In progress (28/1)

SUSSEX
122772 (11/7/16)
Atlas webdav/xroot ticket. It's a bit of a baptism of fire for the new admin - any luck with this? The storage group are always happy to help. On Hold (26/1)

125503 (9/12)
Sno+ file access problems due to what appears to be the SE headnode moving. Any movement here? I think a plan was made at least. In progress (30/1)

RALPP
126902 (2/3)
CMS ticketing RALPP essentially because their multicore jobs aren't getting slots. Chris notes this problem may be compounded by these CMS jobs being particularly RAM-hungry in their resource requirements. It doesn't look like this is a site problem really (as is noted in the ticket). In progress (2/3) Update - CMS have confirmed that this ticket can be closed.

OXFORD
126928 (3/3)
Atlas transfer failures ticket - Kashif spotted that the gridftp service wasn't listening and restarted it - I suspect this ticket can be closed but is it me or has the gridftp service been a bit flakey recently (particularly for Oxford?). In progress (3/3)

BRISTOL
126864 (28/2)
A ticket to track LZ deployment at Bristol. Ticking along. In progress (1/2)

126865 (28/2)
Investigating the cause of Ipv6 CMS Phedex transfer failures at Bristol. I think things are looking good after IPv6-ing the GridFTP server, the ticket was discussing how best to debug transfers. In progress (2/3)

GLASGOW
124052 (25/9/16)
LHCB ticket concerning incorrect CPU publishing from the Glasgow ARCs - Gareth was hoping to have things sorted when they rolled out the next generation of Centos7 ARC CEs. Any joy? On hold (31/1)

ECDF
126349 (3/2)
Availability ticket with some very odd figures - luckily the argo team are looking into what's going on. Waiting for reply (1/3)

126957 (6/3)
Nagios SRM-put test failure ticket - but checking the link all seems okay. Waiting for reply (6/3)

LIVERPOOL
126956 (6/3)
A fresh availability ticket, ripe for On Holding for 30 days. Assigned (6/3) Steve notes that the bad figures are another example of the ARC testing problems.

126936 (3/3)
A confused atlas deletion ticket, as they seem to have Liverpool and Lancaster confused. I suspect that the ticket can be closed as things seem okay at Liverpool. I took some steps to soothe Lancaster in case deletion errors persist and the DDM plots looked okay. In progress (6/3) Update - closed after no issues at either site.

124819 (3/11/16)
AFS ticket - in reply to the University opening the requested port it's noted that some hosts are still having problems (firewall on the machines themselves?) and others look to be behind a NAT. Waiting for reply (should be In Progress) (13/2)

QMUL
126650 (15/2)
cern@school pilots failing at QM (submission command failed type errors). The ticket could do with an update (even a null one). In progress (15/2) Dan found the problem, the c@s user accounts had capitals in them but slurm doesn't like that. He's recreating the account.

126261 (30/1)
QM CEs not working for biomed. Duncan spots that ce04 and 05 might not be working for CMS either. In progress (3/3) Update - Dan sees biomed jobs running on their cluster. Maybe this is no longer an issue?

126838 (27/2)
Atlas "space reporting issue" ticket. Brian is aiding the investigation, spotting a large amount of possible dark data creation via botched deletions during the switch to webdav. The discussion moves onto problems inherent in using webdav for deletions on storm. This has been talked about in the atlas uk meeting (although I'm afraid I phased out during the conversation). In progress (6/3)

TIER 1
126184 (26/1)
Atlas site monitoring survey ticket. Possibly closing soon? Has feedback been provided? In progress (7/2)

126889 (1/3)
Atlas deletion error ticket for the Tier 1. Again it looks like the problem has gone away, although Tomas kindly provided the error message that was being seen for investigation. In Progress (6/3) Update - Brian closed the ticket with a good explanation of what went on.

126905 (2/3)
Finishing up with deployment of the cvmfs support for solidexperiment.org - focusing in part on accepted upload proxies. This looks to have been done, and Dan the new solidexperiment chap has success in uploading software. Daniela has updated the ticket with a few new questions. In progress (6/3)

117683 (18/11/15)
Castor Glue2 publishing ticket. Rob reports that much of the code has been written. On hold (2/3)

124876 (7/11/16)
The echo instance not working for nagios tests due to the wrong path being used. No movement on the child ticket 125026 - it looks like some chasing is needed to be done by someone. On hold (1/1)

Monday 27th February 2017, 16.00 GMT
20 Open UK Tickets this week - just doing the highlights ahead of a full review next week.

NGI (well the Tier 1 and Imperial)
126808 (24/2)
With reference to the last OMB meeting, this is a request for a gathering of statistics for the remaining UK WMS - with an eye to using these statistics to plan WMS decommissioning. Assigned (24/2)

ECDF asks WTF? (where the last F obviously stands for Flip)
126349 (3/2)
A low availability ticket that has left Andy scratching his head a bit and asking for clarification on what is going on - both at his site and across the UK. The picture is muddled by yet another example of the tests not running regularly on ARC CEs. Waiting for reply (24/2)

Monday 20th February 2017, 16.15 GMT
20 Open UK Tickets this week.

Link to all the UK Tickets.
Whilst the number of tickets for the UK is low, a good few of them are looking a bit neglected.

Atlas Pilots at RALPP
126632 (14/2) This is likely not a site problem, but I directed this atlas ticket RALPP's way to see if Chris can shed a little light onto what's up. Assigned (15/2)

Sno Space at Liverpool
126554 (10/2) After discussion last week Liverpool have rolled a Sno+ spacetoken. John is waiting on news from David to see how things work out, and if it all goes well the rest of us that support Sno+ will be asked to follow suit. In progress (20/2)

Atlas Monitoring Survey
126184 Yet another reminder that atlas are collecting feedback concerning site monitoring - feel free to add to the google doc yourself or forward your thoughts to atlas uk cloud support. In progress (7/2)

Monday 13th February 2017, 16.00 GMT
25 Open UK Tickets this Week

ATLAS want your INPUT
126184 (26/1)
Atlas request for input on sites monitoring. In last week's cloud meeting Alastair asked if anyone had any input for this. If you do feel free to add to the google doc linked in the ticket or email your points to the cloud support mailing lists. In progress (7/2)

TOKEN AFFECTION
126554 (10/2)
Sno+ jobs failed at Liverpool, and once again John B had to educate a user group that space tokens are a thing (thanks John!). Would everyone who supports Sno+ be willing to roll out a space token for them? We don't know at this stage how much space would be needed; at this point it mainly seems to be for job stage back. In progress (13/2)

UNRELIABLE AVAILABILITY
126349 - ECDF
125743 - RALPP

Both of these availability tickets are confusing the sites and myself (although the latter is still quite easy to do). ECDF are getting negative results again (and a lot of unknowns) and RALPP seem to be not updating results very often at all, suffering a several day lag by the looks of it.


Monday 6th February 2017, 14.30 GMT
23 Open UK tickets this month

FRESH IN THIS MORNING - BRISTOL
https://ggus.eu/?mode=ticket_info&ticket_id=126454 (7/6) As seen on TB-SUPPORT, CMS are having test failures at Bristol and Winnie is left without a CMS site support at the moment. I see some replies already on the list, I'll leave this slot here for hopefully helpful discussion. On Hold (7/6)

SUSSEX
125503 (9/12/16)
Sno+ file download failure ticket, due to the wrong SE name in the LFC for the files. Jeremy M reports that he is looking into creating a DNS alias and asking the CA sage (aka Jens) to shape the necessary certificate. In progress (30/1)

122772 (11/7/16)
Webdav/xroot deployment ticket from atlas. Jeremy M reports the appointment of their new admin, which is great stuff. This is one of the first things on his todo list. I'll repeat the usual "we're here to help" message. No point suffering in silence! On hold (26/1)

Fresh in last night - 126438 - atlas seeing srmPut failures, but the error is 'file already exists'. A problem with rucio?

RALPP
125743 (27/12/16)
An availability ticket. A few blips on the nagios page, but I don't think there's anything to see here really. On Hold (29/1)

125815 (5/1)
Atlas ticket regarding space not being released after deletion. Chris has beaten his dcache into shape, and asked for the deletions to be re-attempted. Waiting for reply (30/1)

OXFORD
126371 (4/2)
Atlas transfer failures. Kashif spotted that the dpm-gsiftp daemon had failed, and got it back up. I suspect this ticket can be closed if the daemon is stable? In progress (4/2)

121924 (2/6/16)
Perfsonar rate ticket? Any news? If not, is there likely to be any? On Hold (5/12/16)

125822 (5/1)
The Oxford edition of the "Space not released after deletion" issue. Kashif too has been tinkering with his SE, tweaking and (re-)starting httpd daemons and asks for a fresh list of files to check. Waiting for reply (27/1)

BIRMINGHAM
126131 (24/1)
Availability ticket. The numbers are on the mend so the ticket is On Hold (30/1)

GLASGOW
125867 (9/1)
LHCB seeing cvmfs-related job failures on WNs at Glasgow. Gareth has updated cvmfs across the Glasgow nodes and asks if the issue has calmed down. Waiting for reply (31/1)

124052 (25/9/16)
Another LHCB ticket, about the arc publishing incorrect job numbers. Gareth provided an update regarding the Glasgow plans, rolling the fix for this into the Centos7 migration. Thanks Gareth! On Hold (31/1)

EDINBURGH
126349 (3/2)
Another availability ticket, although today's numbers look to be okay so hopefully the cause of the troubles has passed. Looks like this ticket hasn't been noticed yet though. Assigned (3/2) Andy noted that the argo numbers seem nonsensical with negative availability for a few days! But things are on the mend now. Looks like a simple case of On Holding the ticket for the next 26 days.

LIVERPOOL
124819 (3/11/16)
The last AFS ticket, John B reports that the university has stopped firewalling UDP port 7001 and asks if things are better now. Waiting for reply (3/2)

126167 (25/1)
Decommissioning ticket for the last CREAM CE at Liverpool (which will also see the end of torque at the site). Downtime for the service will be on the 14th (Happy Valentine's Day?) and the service will be switched off properly come the 28th. In progress (30/1)

QMUL
125627 (19/12/16)
Atlas transfers failing to the QM test SE. Dan increased the space to 10TB to soothe the last batch of failures, just waiting to hear if that worked. Waiting for reply (26/1)

126261 (30/1)
Biomed nagios tests not working for ce4 at QM. The problem persists. In progress (2/2)

126312 (1/2)
Atlas spotted QM's squid had fallen over. Dan has noticed problems since upgrading to v3 of frontier-squid, although the issues could also be related to IPv6 on the hosts (of the two squids at QM the one that fell over was also the one that has an IPv6 address in DNS). Keeping the ticket open to see if things stay up. In progress (1/2)

TIER 1
126296 (1/2)
CMS SAM tests failing against srm-cms-disk.gridpp.rl.ac.uk. All transfers "by hand" pass without trouble, and Gareth points out that this service is not in production in the GOCDB, so tests shouldn't even be running against it! Waiting for reply (6/2) Update - CMS got back that this is the endpoint specified in PhEDeX so this is why it was tested. If this is wrong it will need to be changed.

126376 (5/2)
Another batch of CMS SAM test failures. This includes the srm-cms-disk issue again. John K restarted the CMS xroot directors to try to clear the CE test errors that were being seen - things were looking up. In progress (6/2)

126184 (26/1)
Request from atlas for input on the new site monitoring schemes, linked in the ticket. The appropriate people were being chased. In progress (26/1)

124876 (7/11/16)
echo instance at RAL failing nagios tests due to the tests not using the right path. The ticket addressing this (125026) has had no progress since just before Christmas and so could do with a shake up. On Hold (1/1)

117683 (18/11/15)
Glue 2 publishing for Castor ticket. Did Jens and Rob have any luck tackling this in the pre-Christmas get together? On Hold (7/12/16)

Monday 30th January 2017, 15.15 GMT
24 Open UK Tickets this week

QMUL
126156 (25/1)
A quite interesting ticket from John Gordon regarding QM having >100% efficiency. Within the ticket Dan debugs his homegrown slurm accounting scripts. Possibly of interest to others - some good stuff in this ticket. In progress (26/1)
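
The details of what tripped up the QM scripts live in the ticket rather than here, but as a general illustration of how a homegrown script can report over 100%, here is a sketch of the usual CPU efficiency sum. It assumes efficiency is CPU time divided by (wall time x cores); the function name and job figures are invented, and leaving out the core count is just one plausible way to overshoot, not necessarily what happened at QM.

```python
def cpu_efficiency(cpu_seconds, wall_seconds, cores=1):
    """CPU time used as a fraction of the CPU time available (wall time x cores)."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


if __name__ == "__main__":
    # Hypothetical 8-core job: one hour of wall time, seven core-hours of CPU.
    cpu, wall, cores = 7 * 3600, 3600, 8
    print(f"with core count:    {cpu_efficiency(cpu, wall, cores):.1%}")
    print(f"without core count: {cpu_efficiency(cpu, wall):.1%}")
```

If multicore jobs are accounted as though they were single core, the second figure is the sort of number that ends up published.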

A few other tickets at QM could do with a poke though:
126012 (17/1)
Nagios BDII ticket, problem keeps cropping up.

126234 (28/1)
LHCB pilots failing and jobs not returning output, the ticket likely has snuck by you. Assigned (28/1)

RALPP
126240 (29/1)
Whilst this CMS SAM test failure ticket filled me with righteous indignation with its brevity and lack of reference links, it still could do with acknowledging. Assigned (29/1)

(In fairness given my current coffee consumption it doesn't take much to send me off on one.)

GLASGOW
125867 (9/1)
This LHCB cvmfs ticket threatens to go stale - any word on extra failures (or lack thereof)? In progress (16/1)

Talking of Glasgow tickets looking a bit stale: ticket 124052 (arc publishing ticket last updated in September).

TIER 1
126184 (26/1)
Possibly not intended for general consumption, this is an atlas request for feedback concerning the atlas site monitors. In progress (26/1)

Monday 23rd January 2017, 15.30 GMT
21 Open UK Tickets this week

RALPP
126053 (19/1)
This one piqued my interest - CMS users in Florida are having trouble getting at files, seemingly due to their MTU settings - with their default of 9000 things time out, with 1500 things work. Bristol transfers okay. Chris is investigating. In progress (20/1) Update - solved, the problem mysteriously fixed itself.

(also at RALPP is Biomed ticket 126065, which may have not been noticed yet). Update - in progress

OXFORD
125822 (5/1)
Oxford deletions not working. An observational question - is http working as expected on the Oxford nodes? I ask because when poking my nose around pointing my browser at the Oxford SE got me nothing. The file in question I could access using my dteam credentials (and xroot), so it still exists on disk. In progress (23/1)

121924 (2/6/16)
Perfsonar ticket - a polite reminder if you (or anyone else) would like help debugging perfsonar transfer problems with some independent "standard" iperf tests I'm happy to try to help out with them. On hold (5/12)


BIRMINGHAM
Good luck to Mark with his DPM headnode this week! Let us know if you need a hand.

AFS TICKETS (LIVERPOOL and GLASGOW, but mainly Glasgow)
Can you please throw in a soothing update to your AFS tickets when you have a few spare minutes:
124821 - GLASGOW
124819 - LIVERPOOL

SUSSEX SNOPLUS FILES
125503 (9/12/16)
And finally, no news is not good news on this Sno+ ticket for Sussex. It threatens to turn into a game of pass the buck, as the options available to the VO put the responsibility in three very different places. In progress (23/1) Update- Jeremy will look at the dns alias solution, which requires some certificate magic to be done.

TIER 1
124876
The ticket Daniela mentioned, regarding nagios tests for the echo instance. To quote Daniela "The requirement that machines in production should pass basic tests is really not that onerous."

Monday 16th January 2017, 15.00 GMT
21 Open Tickets this week

Bounced back to Bristol
125558 (13/12/16)
This ticket from Lukasz to CMS, concerning decommissioning a queue in the glidein factories, has been reassigned back to Bristol. Assigned (12/1) Update - solved by the site, the initial query sorted.

ANYONE SEEN SOMETHING LIKE THIS BEFORE?

DURHAM
125845 (6/1)
Durham are having intermittent, hard to explain nagios test failures on their arc CE - seeing a few failures a day. Fishing on the site's behalf, has anyone any suggestions about where to look? In progress (13/1) Update - Thanks to Kashif for his input.

GLASGOW
125867 (9/1)
Another piece of unasked for meddling by myself, Glasgow are seeing some greedy behaviour from cvmfs on some nodes running lhcb jobs - has anyone seen something similar? In progress (16/1)

AND FINALLY...

SUSSEX
125503 (9/12/16)
As seen on TB-SUPPORT, I stuck my 2-yen's worth in to this Sno+ ticket and got a little out of my depth. Either Sussex will need to alias their new SE to the old one or there will need to be some heavy LFC operations for Sno+ (either by them or the LFC admins). Thanks to Simon, Catalin and Henry for their input. In progress (16/1)

Monday 9th January 2017, 14.30 GMT
HAPPY NEW YEAR!

22 Open UK Tickets this year.

SUSSEX
124614 (24/10/16)
An availability/reliability ticket. The New Year is looking greener on the argo pages for Sussex, so hopefully there will be plain sailing until the alarm clears. On Hold (6/1)

125503 (9/12/16)
Snoplus file download failures. Doing a spot of investigation myself it looks like the Sno+ guys didn't convert their lfns when Sussex did an SE migration last year, I've informed them thusly. Waiting for reply (9/1)

122772 (11/7/16)
Webdav and xroot frontend ticket. Hopefully the new admin at Sussex will start wrangling this soon. On Hold (21/11/16)

RALPP
125815 (5/1)
A CMS ticket regarding space not being released after deletion. It is likely a dcache problem, but a similar issue was seen at Oxford for atlas (125822). Chris has asked for some problem surls. In progress (5/1)

125743 (27/12/16)
Another availability ticket - I had to dig deep into argo to convince myself tests were running but things are looking okay. On hold (6/1)

OXFORD
125822 (5/1)
Atlas deletion problems at Oxford - probably unrelated to the RALPP issue. There's mention of a similar issue seen at Liverpool, but no specifics- Kashif has asked for more information and supplied a dark data dump. In progress (9/1)

121924 (2/6/16)
Perfsonar throughput drop ticket. Suspected to be a problem with just the perfsonar tests, it likely warrants a spot of further investigation - perhaps someone with a "regular" iperf endpoint could help? On hold (5/12/16)

BIRMINGHAM
122771 (11/7/16)
xroot/webdav ticket from atlas. Mark finished off 2016 with some good progress - looks like permission issues to my eyes. On Hold (22/12/16)

GLASGOW
125867 (9/1)
lhcb seeing cvmfs problems on some Glasgow nodes. Gareth has his prodding stick out and removed the nodes from production just to be safe. In progress (9/1)

124821 (3/11)
AFS ticket. Not very exciting. On hold (16/11/16)

124052 (25/9)
LHCB arc job number publishing ticket. I believe tackling this is on the to-do list. On hold (26/9/16)

DURHAM
125845 (6/1)
ROD arc ce test ticket - I think this snuck by the Durham admins, understandable on the first Friday of the year. Assigned (6/1)

SHEFFIELD
125853 (6/1)
Apel publishing ROD ticket. Elena has fixed things, but it will take some time to trickle through. This ticket will want on holding until then I reckon. Waiting for reply (9/1) Update - solved, tests all green now.

MANCHESTER
125664 (20/12)
This is a ticket to Andrew with his VAC dev hat on, asking for a way to keep VAC and dirac versions in sync. Some good discussion going on. In progress (6/1)

LIVERPOOL
124819 (3/11/16)
Another AFS ticket - John provided an update before on holding it. On hold (16/12/16)

RHUL
125855 (6/1)
Biomed have asked if they're being purposely excluded from accessing ce3. I'm not sure if Raul is back yet, the ticket could do with some fielding. Assigned (6/1/17) Update - solved, biomed enabled on the queues.

QMUL
125627 (19/12/16)
Atlas noticing problems on a test SE at QM, which Dan was trying out a UMD4 install on. On hold (19/12)

TIER 1
125856 (6/1)
LHCB file access ticket, this has been investigated and the Tier 1 team have come back with a few questions. Waiting for reply (9/1)

125157 (24/11/16)
Creation of extras-fp7.eu cvmfs repo - chugging along nicely in spite of the holidays, with most stratum-1 replications in place. In progress (3/1)

124876 (7/11/16)
Ticket following getting nagios tests working for the RAL echo instance. Alastair provided a summary of the issue to start the new year off, with a reference to ticket 125026. On Hold (1/1)

125480 (9/12/16)
Physical/logical core publishing mismatch. After some discussion the ticket was held for the holidays. On Hold (21/12/16)

117683 (18/11/15)
Glue 2 publishing for Castor - Jens and Rob hopefully had a chance to have a bit of a bash at this before Christmas. Hope that went well! On hold (7/12/16)