Difference between revisions of "Past Ticket Bulletins 2018"

From GridPP Wiki

Revision as of 09:36, 25 June 2018

Monday 11th June 2018, 16.00 BST

36 Open UK Tickets this week.

Link to all the UK Tickets: http://tinyurl.com/nwgrnys

The only tickets that caught my eye are:

NGI: 135038 - The NGI gocdb check ticket. Does any more input need to be given from sites for this?
QMUL: 134532 - LHCB have got back to this ticket and confirmed things are working, so it can be closed.
BRISTOL: 134820 - I reckon this old CMS ticket requesting information can be closed too.
RHUL: 135542 - Another for the 'can be closed' pile, although CMS would like to see if there were any explanations for the temporary pilot problems that they saw.


Monday 4th June 2018, 14.30 BST
45 Open UK Tickets this month.

IPv6 Tickets.

SUSSEX: 131617
Some good progress here with the last update on Friday painting a hopeful picture of IPv6 come the autumn.

RALPP: 131616
Last update had Chris trying to beat his dual-stacked PS boxes into shape - but this was back in January. Needless to say the ticket needs an update!

OXFORD: 131615
Last update was back in March, with summer the likely timeframe for v6 deployment. Three months on the ticket could do with a slight update to re-confirm this is still the case.

CAMBRIDGE: 131614
It's a similar case for Cambridge.

BRISTOL: 131613
Any news on your plans from back in April to get your PS box onto a v6-enabled network?

BIRMINGHAM: 131612
Some recent good news here with Mark getting his PS box (kind of) v6 pingable, just waiting on the v6 DNS now.

GLASGOW: 131611
Gareth covered his bases well with his update back in February. Hopefully the new build is on schedule.

ECDF: 131610
Andy gave a mixed update a few weeks ago, citing some v6 routing differences and an upcoming wholesale networking overhaul scheduled for September, so the ticket is freshly on hold pending more information.

DURHAM: 131609
A quick update last month reports no significant progress.

SHEFFIELD: 131608
Elena gave an update at the end of April, with work on the border routers scheduled for May. Hopefully that went well and you'll have more information soon.

MANCHESTER: 131607
Any plans on dual-stacking your storage after your Perfsonar successes?

LIVERPOOL: 131606
Any news on those ongoing negotiations mentioned in the last March update?

UCL: 131604
Have re-poked their network admins over this.

RHUL: 131603
No news for a while after the February update that v6 reverse-lookup wasn't working.

Back to the regular tickets...

NGI
135038 (9/5)
Review of the GOCDB info for the NGI. On to the second stage of the review now, but it's still a good time for sites to double-check their gocdb entries if they haven't recently. In progress (22/5)

OXFORD
135485 (3/6)
A fresh ticket from Sno+, concerning the bdii information disappearing from their feeds. Assigned (4/6)

BRISTOL
135121 (15/5)
A ROD ticket for failed webdav tests. The tests were doomed to never work, so Lukasz disabled the endpoint in the gocdb. Daniela reckoned the ticket needs to be closed to try and see if it disables the alarms. In progress (24/5)

135120 (15/5)
Another week or so and this availability ticket should be able to be closed - until then it should be On-Hold'd. Reopened (4/6)

135302 (23/5)
CMS transfer failure ticket. It looks like this ticket hasn't been noticed yet. Assigned (23/5)

134820 (29/4)
This CMS pledge enquiry ticket has had the question answered. I suspect it can be closed. In progress (1/5)

BIRMINGHAM
129930 (4/8/17)
The old atlas http test failure ticket. How goes the EOS migration? On hold (23/4)

GLASGOW
134689 (23/4)
Perfsonar update ticket. Gareth is waiting on 4.1 to be released (which I can't find any news on). On Hold (24/4)

ECDF
135243 (21/5)
ROD ticket for failed srm-put tests - Rob had to restart things to get them working, but had no joy shifting the alarms at first. The tests seem okay for the last day. In progress (24/5)

135314 (24/5)
Another ROD ticket, this one for old IGTF rpms on the workers. As a quick note that may be helpful, an up-to-date version of the certificates is kept in /cvmfs/grid.cern.ch/etc/grid-security/ . In progress (28/5)

135404 (30/5)
The resulting low availability ticket for the previous issues. In progress (30/5)

DURHAM
134687 (23/4)
Request to update the Durham perfsonar. Any news? In progress (30/4)

SHEFFIELD
134947 (4/5)
Atlas transfer failures - one of the C7 DPM problem tickets - see this ticket. On hold (31/5) Update - the Oxford version of this ticket (134945) was solved by upgrading to the latest version of gsoap (as mentioned in the jira ticket).

MANCHESTER
134684 (23/4)
Perfsonar upgrade request ticket. Alessandra still wants to know how necessary this update is (my thoughts are it will be quite necessary, *once* Perfsonar 4.1 is out, but I don't have Duncan's expertise). Waiting for reply (23/4)

UCL
134686 (23/4)
Another perfsonar upgrade ticket, Ben was looking at it at the last update. Any joy? On Hold (23/4)

RHUL
134945 (4/5)
Another atlas transfer ticket due to the C7 DPM troubles. On hold (17/5)

QMUL
134532 (12/4)
The return of an old LHCB download problem, where the turl can't be resolved. Daniel has applied a fix to his production SE. Any news that it's worked? In progress (14/5)

134573 (17/4)
CMS request to install singularity, on hold until the Summer move to C7. On hold (17/4)

132929 (18/1)
CMS seeing SLURM accounting problems. The APEL devs are involved now, and have asked for some parser outputs to test some stuff. In progress (10/5)

IMPERIAL
135464 (1/6)
A CMS ticket about checksum failures that came in on Friday afternoon. Files are being declared invalid after being double-checked, and another transfer failure query has been tacked onto the ticket today. In progress (4/6)

134567 (17/4)
A ticket concerning the site rather than a site ticket, the declaration of some lost Pheno files. I poked it today. In progress (4/6) Closed by Pheno, so all's well.

BRUNEL
133956 (9/3)
A CMS xroot config change ticket. Any luck with rolling out these changes after your troubles getting the new hardware to roll them out onto? In progress (23/4)

THE TIER 1
135367 (28/5)
Another SNO+ information system ticket, this one has a lot of conversation going on in it about Castor publishing even before it landed at the Tier 1 (see the mice ticket below). In progress (4/6)

135133 (15/5)
CMS spotting corrupt files on ECHO, which looked to be a problem not just with the files but perhaps with their metadata as well. A lot of conversation has occurred in this ticket so I'm not entirely sure what has occurred, but corrupt files have been deleted. Waiting for reply (4/6)

134685 (23/4)
Another request to upgrade Perfsonar to C7. At last check some C7 perfsonars were up and running in testing. Any luck getting them into production? In progress (2/5)

135308 (24/5)
MICE problems after the loss of Castor publishing. Henry has hit a problem when trying to combine the workarounds with LFC entries. In progress (1/6)

135293 (23/5)
ROD tickets, again related to the loss of castor publishing. Alastair has put in a request for the SRM Ops tests for Castor to be removed. On Hold (31/5)

134703 (23/4)
CMS transfers failing from RAL_disk. It appears files were being sent to the wrong namespace. There have since been a lot of lists of files being searched for. Any luck getting to the bottom of this? In progress (25/5)

135455 (31/5)
CMS checksum verification at RAL. This looks to be a duplication of 135133 but I think you guys already spotted that. In progress (4/6)

127597 (7/4/17)
CMS wanting to know about the RAL networking. After the new firewall went in at the end of April Chris asked for some RAL/RALPP job performance comparisons to try to see how xroot proxies could affect things. No news back, but the question could be lost in the noise. On Hold (30/4)

124876 (7/11/16)
Gridftp tests failing for ECHO due to a problem with the tests - after 117683 was left unsolved this is our oldest ticket. Not a hint of movement on the counter ticket (125026) for a long time. I think we could do with weighing up our options here. On hold (13/11/17)

Monday 21st May 2018, 14.00 BST
43 Open UK Tickets this week.

Sites getting Ticket Updates?
We've had two anecdotal tales of sites not getting email updates to some tickets (135139 at Lancaster and 134945 at RHUL). Has anyone else not been getting emails about their tickets?

NGI
135038 (9/5)
The yearly review of our gocdb information. Everyone did check that their site (and their own) details were correct, right? (I had my old office phone number listed). Jeremy is proceeding to the NGI level review now. In progress (21/5)

SKATELESCOPE.EU TICKETS
QMUL: 135042
CAMBRIDGE: 134980
Both these tickets appear to be waiting for submitter/VO input, and perhaps are solved.

BRISTOL
135121 (15/5)
135120 (15/5)
I'm not sure what's going on with the dates on these, but both ROD tickets were submitted by Kashif last week and both are still in the assigned state at time of writing. Assigned (15/5) Update - these tickets have been fielded now- thanks!

134820 (29/4)
This CMS pledge enquiry was answered, so I think it's waiting on CMS input. The ticket likely needs a prod to stir the pot. In progress (1/5)

MANCHESTER
134684 (23/4)
One of the "please upgrade perfsonar" tickets, this is waiting for an answer from Duncan. Waiting for reply (4/5)

DURHAM
134687 (23/4)
Also a perfsonar upgrade ticket, any progress with this? In progress (30/4)

IMPERIAL (kind of)
134567 (17/5)
I think this Pheno corrupt file cleanup ticket is nearly completed, it just needs to be kept an eye on to make sure it doesn't hang around after it's done. In progress (17/5)

QMUL
132929 (18/1)
After re-reading Adrian's last post I think there's a request for information in there to help debug things from the apel side. In progress (10/5)

TIER 1
135133 (15/5)
Chris has asked the ECHO admins to look at this CMS ticket, after seeing some odd behaviour. In progress (17/5)

133992 (12/3)
This other ECHO related ticket, from atlas, looks like it has badly stalled. Is there still an active issue to fix? In progress (19/4)

Monday 14th May 2018, 15.00 BST
43 Open UK Tickets this week.

NGI
135038 (9/5)
The annual review of the UK's information in the gocdb has rolled around again. As part of this could site admins please check their e-mail, telephone number and CSIRT are correct in their site's gocdb entry. In progress (14/5)

VAC
135059 (10/5)
Daniela has submitted a VAC ticket, which she assigned to Andrew. If VAC doesn't have a GGUS Support Unit it probably should get one, right? Assigned (10/5) Update: VAC did have a GGUS SU all along, but Andrew suggests the ticket belongs somewhere else.

Tickets that can be closed(?)
IMPERIAL: 135056
QMUL: 133402
The last updates on both these tickets suggested that they can be closed. Update - Henry would like to keep the IC ticket open a bit longer.


Transfer Error Tickets
RHUL: 134945
OXFORD: 135051
LIVERPOOL: 134868
I thought that these tickets were worth mentioning, given the ongoing conversation and debugging in the storage group list.


Friday 4th May 2018, 17.00 BST.

Due to the Bank Holiday no proper look at the tickets this week, just the obligatory link to all the UK tickets.

Raul pointed me to a CMS ticket, 134763, that might be of interest to sites running condor and moving to IPv6. The problem was worked around rather than solved (by disabling IPv6 in the CMS condor instances).

Another ticket that caught my eye was the RALPP ticket 134899, which is still just assigned but I believe the issue is already solved.

Monday 30th April 2018, 14.00 BST
49 Open UK Tickets this month.

IPv6 Tickets
SUSSEX: 131617
Had a recent update - thanks for that.

RALPP: 131616
How did the perfsonar dualstacking go?

OXFORD: 131615
I don't think anything has changed since March's update.

CAMBRIDGE: 131614
Whilst I don't think any progress is expected, the ticket could probably do with a token update. Thanks for the quick update.

BRISTOL: 131613
Winnie provided an April update, thanks for that.

BIRMINGHAM: 131612
Thanks also for the update Mark (any news is better than no news).

GLASGOW: 131611
From the silence I don't think there's been any change, but I don't think any was expected.

ECDF: 131610
Things looked reasonably positive at the last update.

DURHAM: 131609
Any luck getting a better connection from your Central IT people?

SHEFFIELD: 131608
No news for a while, but again I don't think any was expected yet. Update - Elena reports that work will start on network changes this month, but no date of when IPv6 will be available has been given.

MANCHESTER: 131607
The Manchester perfsonars are working over v6 now, which is nice.

LIVERPOOL: 131606
There's mention of "negotiations" so there might not be much news for a while!

UCL: 131604
No news here either.

RHUL: 131603
Has the v6 DNS lookup been rolled out yet?

Duncan's Perfsonar Upgrade Tickets
SUSSEX: 134741
Leo is jumping to it.

BIRMINGHAM: 134691
Mark acknowledged the ticket.

LANCASTER: 134690
Matt weeps at what should have been an easy job.

GLASGOW: 134689
Gareth is waiting for the proper perfsonar 4.1 release.

DURHAM: 134687
Adam marked the ticket in progress.

UCL: 134686
Ben hopes to get this done in the next few weeks.

MANCHESTER: 134684
Alessandra asks how necessary this upgrade is? (Presumably because Manchester just got their perfsonar fully up and running for v6).

TIER 1: 134685
Darren assigned the ticket to the appropriate team, but no other news.

And now the rest of the tickets, site by site.

SUSSEX
134415 (5/4)
A standard issue availability ticket, the numbers are healing. On hold (5/4)

BRISTOL
134820 (29/4)
CMS have asked for the 2018 DDM disk pledges. The ticket has not been spotted as of the time of writing. Assigned (29/4) Luke provided an update, with some figures and projections. In Progress

134513 (12/4)
Another CMS ticket, this one is for data transfer problems from the Tier 1s. Has Dr K had a chance to take a look at this yet? In progress (23/4)

134278 (27/3)
Another CMS transfer ticket, with errors due to the file existing. Have you managed to fix the DPM file permission problems? In progress (23/4)

BIRMINGHAM
129930 (4/8/17)
The old atlas http ticket - Mark provided an update that the time of EOS at Birmingham draws closer, which will "solve" this ticket. On Hold (23/4)

ECDF
134740 (25/4)
A different sort of perfsonar ticket from Duncan, asking to clear up some settings on the Edinburgh perfsonars. Any joy? In progress (26/4)

SHEFFIELD
134356 (30/3)
This atlas transfer ticket looks like it can be closed. On hold (28/4)

QMUL
133402 (9/2)
Snoplus jobs having problems at QM. It looks like things are fixed, and Dan had kindly offered to up Sno+'s fairshare to compensate them. Just waiting on VO confirmation that things are working. Waiting for reply (16/4)

133965 (9/3)
LHCB jobs failing due to "no space left on device". It looked like atlas jobs were being poor neighbours. In progress (16/4)

134532 (12/4)
LHCB data access problems at QM, which they think is an old issue raising its head again (129155). Dan isn't convinced, and thinks it might be a problem with using the root protocol on storm. CNAF experts are getting involved (hopefully). In progress (17/4)

132713 (4/1)
HyperK.org support at QM - Daniela has asked for another look and given her current set of errors. In progress (22/4)

134573 (17/4)
CMS request to install Singularity - which Dan has said they will do, when he upgrades to CentOS7 during the Summer. On Hold (17/4)

132929 (18/1)
CMS complaining that APEL accounting is not working for slurm at QM. The apel admins got cc'd into the ticket but no sign of them getting involved yet. In progress (29/1)

134455 (9/4)
LHCB pilots dying at QM. Turned out to be an interesting case of a few black-hole nodes. I think Dan is keeping this ticket open until he figures out the root cause (or a monitoring check).

BRUNEL
133956 (9/3)
CMS requesting xroot changes. Raul is on it after having some hardware delivery hiccups messing up his workflow. In progress (23/4)

134819 (29/4)
As the Bristol ticket, CMS requesting 2018 DDM disk pledges. This ticket also hasn't been spotted, but then it only landed on Sunday. Assigned (29/4)

134826 (30/4)
A fresh ROD ticket, for a lot of errors. Assigned (30/4) I suspect this is part of the aftermath of the disasters Raul described.

THE TIER 1
132708 (4/1)
WMS decommissioning ticket. Just in case you guys have forgotten about this - I suspect you can go onto the next stage now. In progress (18/1)

134468 (9/4)
CMS complaining that the xrootd redirector is not seeing some ECHO files. Turned out to be a stuck redirector, but the ticket got reopened with a different issue. Chris has asked if problems persist. Waiting for reply (30/4)

134769 (26/4)
CMS transfer from RAL to Florida failing. Chris and George jumped on it, but again the problem might have been fleeting. In progress (30/4)

133992 (12/3)
Atlas seeing no such file or directory errors in ECHO. A need for some sort of consistency checking has been identified (as well as the fact that existing tools might be difficult to adjust). No progress for a bit. In progress (19/4)

134619 (19/4)
Another ECHO ticket, this is from Chris B on behalf of CMS, citing problems reading data. Things look like they were fixed but got confused due to other issues. In progress (30/4)

134494 (11/4)
Atlas noticing that the json based space reporting isn't updating. Alastair notes that this was on purpose. Waiting for a reply from the submitter. Waiting for reply (11/4)

127597 (7/4/17)
CMS ticket checking on xroot and network performance. Chris reports that the new firewall is in place, although there are still kinks needing to be worked out. Chris also mentions that a useful exercise will be comparing the Tier 1 and RALPP, to see if the latter's xroot proxies could help. On Hold (30/4)

124876 (7/11/16)
The old gridftp tests failing for ECHO ticket. Not a hint of movement on the counter ticket (125026). On hold (13/11/17)

117683 (18/11/15)
The oldest ticket, glue2 publishing for Castor. Could likely do with a quarterly update. On hold (3/1)

Monday 23rd April 2018, 15.30 BST
56 Open UK Tickets this week.

Upgrading Perfsonar
Duncan has rolled out 8 tickets to sites running old, CentOS7 perfsonars asking them to upgrade. The sites on the receiving end are Lancaster, Glasgow, Birmingham, QM, Durham, UCL, Manchester and the Tier 1.

RHUL
134574 (17/4)
This request from CMS to install Singularity looks like it hasn't been spotted yet. Assigned (17/4) In progress now.

BRUNEL
133956 (9/3)
Request from CMS to update xroot configs. To repeat Brian's questions, any news on this? In progress (9/3) Thanks for the update Raul

QMUL
132713 (4/1)
Daniela has asked that this hyperk ticket be looked at with renewed vigor. In progress (22/2)

TIER 1
133992 (12/3)
One of a few ECHO tickets, there are some problems using existing consistency tools which are being looked at - some interesting points are raised. In progress (19/4)

Monday 26th March 2018, 15.00 BST
38 Open UK Tickets this week.

ECDF
131610 (3/11)
The only IPv6 ticket to see some recent action, there's some interesting information here about dual-stacking VMs running on hypervisors. In progress (22/3) Updated with some extra information and good progress.

134034 (14/3)
This LHCB ticket looks to be long solved, and can be closed. In progress (15/3) Solved

BRISTOL
133806 (2/3)
134081 (17/3)
Both of these Bristol CMS tickets could do with an update to show how things are coming along.

BRUNEL
133956 (9/3)
Have you managed to get started on this CMS xrootd config request? In progress (9/3) Raul updated the list with his plans.

QMUL
132713 (4/1)
Any luck tracking down the hyperk job errors? In progress (6/2)

RHUL
134144 (20/3)
It looks like the SRM problems are RHUL side according to all the atlas monitoring. If it helps I've had success clearing similar looking errors with restarts of the srmv2.2 services. In progress (21/3)

TIER 1
134136 (20/3)
This atlas "no such file" ticket sounds very similar to the issues seen on the ECHO service (133992) and at Lancaster (133991). In progress (20/3)

Monday 19th March 2018, 15.00 GMT
47 Open UK Tickets this week.

GLASGOW
134072 (15/3)
Atlas want sites running v3 of the frontier squid to upgrade to 3.5.27-3.1 (or higher). Nothing wrong with the ticket, just something to bear in mind for any other sites missed by atlas' monitoring. In progress (16/3)

MANCHESTER
134032 (14/3)
Atlas seeing deletion errors - as of Friday the errors persisted. Assigned (16/3)

QMUL
133965 (9/3)
LHCB jobs suffering "no space left on the device" errors at QM (which has happened before IIRC). This ticket might have been missed. Assigned (16/3)

ECDF
134034 (14/3)
LHCB job problems - but things look fixed so it seems the ticket can be closed. In progress (15/3)

IMPERIAL
133818 (4/3)
A question for LHCB rather than IC - Simon answered the queries about the site's pre-sse4_2 nodes, so the ball is in the VO's court now. Waiting for reply (5/3) Update - closed by LHCB.

TIER 1
134037 (15/3)
An interesting LHCB ticket where file access for a file in Castor appears to be working from RAL itself, but not from lxplus or some other places. In progress (15/3) Chris has linked to this similar CMS ticket 134119, in which he notes that he is seeing similar errors from his home ISP.

MISSING FILES THAT WERE NEVER THERE (PERHAPS)
133991 (Lancaster)
133992 (Tier 1)
Two tickets with similar symptoms, where rucio seems to think files are there and the SEs don't. Elena opened a JIRA ticket for the Lancaster problems - https://its.cern.ch/jira/browse/ATLDDMOPS-5434

Monday 12th March 2018, 14.30 GMT
42 Open UK Tickets this week.

SUSSEX
133325 (6/2)
This Availability ticket looks like it can be closed, with the alarms having gone green. In progress (8/3)

DURHAM
133338 (7/2)
Is the subject of this Atlas ticket still causing problems? Lots of things were done at the last update - did they fix the issue? In progress (21/2)

TIER 1
133719 (27/2)
This ECHO ticket hasn't had an update since its acknowledgment, any news? In progress (27/2)

133717 (27/2)
Possibly related, this CMS FTS ticket hasn't had an update this month either. In progress (27/2)

Both of these issues look like they're related to this atlas ticket, which has been getting updates: 133752

133619 (21/2)
I have a feeling that this CMS unmerged file ticket can be closed, but I could be misreading the last updates. It's definitely worth checking to see if it is solved. In progress (12/3)

133764 (1/3)
Finally, this Sno+ BDII ticket can be closed, the problem appears to have been at the source. In progress (8/3)


Monday 5th March 2018, 14.30 GMT
44 Open Tickets this month.

IPv6 Deployment Tickets
Sussex: 131617
Possibly on hold until mid-2018.
RALPP: 131616
Chris had an encouraging update back in January, but hit some snags with a new Perfsonar install. Any joy?
OXFORD: 131615
No update since stating you had dual-stacked Perfsonar boxes back in November. Anything to add? Thanks for the update.
CAMBRIDGE: 131614
No progress expected until the Summer of this year. Is this still the case?
BRISTOL: 131613
Last update hoped progress could happen by February, any news? No recent news
BIRMINGHAM: 131612
Some news of progress on the v6 infrastructure, hopefully the bugs Mark described a few weeks back can be ironed out.
GLASGOW: 131611
Gareth provided a recent, if not totally positive, update.
ECDF: 131610
There were some interesting times last week when taking the first steps in dual-stacking the ECDF DPM broke things. Keeping to dual-stacking their test DPM for now.
DURHAM: 131609
Last update at the end of January had no positive movement from central IT on v6 deployment.
SHEFFIELD: 131608
This ticket really could do with an update - even an unexciting one.
MANCHESTER: 131607
IIRC reverse lookup works only for the Perfsonar boxes - the ticket could do with an update about this.
LIVERPOOL: 131606
Another ticket that could do with an update, even if it's a boring one. John provided a brief update.
UCL: 131604
No news from central IT at last check back in January.
RHUL: 131603
Perfsonar dual-stacked, but DNS lookup not supported yet.

Common or Garden Tickets

SUSSEX
122772 (11/7/16)
Webdav/Xroot ticket. Some good looking progress on getting this to work, although at last check Leo hit some more problems. In progress (7/2)

133325 (6/2)
Availability ticket. Hopefully given another week of smooth running this can be closed. In progress (12/2)

RALPP
133819 (4/3)
LHCB asked RALPP to provide details of nodes without any SSE4.2 support. As Chris instructed, the ticket was reopened by LHCB to request that lhcb jobs do not land on these nodes. Reopened (4/3) Update - solved, the nodes are being decommissioned very soon.

OXFORD
133809 (3/3)
Availability ticket, caused by the AC troubles. On hold (5/3)

BRISTOL
133762 (1/3)
CMS Transfer problems, on hold until Friday. On Hold (5/3)

133806 (2/3)
CMS asked sites to deploy Singularity by March 2018, this ticket is the follow up. On hold (5/3)

BIRMINGHAM
129930 (4/8/17)
Atlas http SAM tests failing. Any luck with the puppet scripts Kashif shared with you? On hold (13/2)

GLASGOW
133667 (23/2)
LHCB data access problems at Glasgow. The ticket tailed off a bit, Andrew McNab has offered to help compare Glasgow and Manchester settings. In progress (5/3) Update - everything looks good now after Sam updated xroot across the Glasgow storage. Maarten noted in the xroot changelog the likely fix. I should imagine this ticket can be closed now.

DURHAM
133338 (7/2)
Atlas jobs failing at Durham, with the problems likely to be related to the Arc Control Tower handling of pilots. Adam rolled out some changes, have these fixed things? In progress (21/2)

SHEFFIELD
133019 (24/1)
Availability ticket. Ticking along. On hold (1/3)

133810 (3/3)
Sno+ jobs failing due to cvmfs errors on a node, which Elena has offline. I suspect that that's this ticket done with. In progress (4/3)

133770 (2/3)
LHCB jobs failing due to problems on some WNs, Elena has been fixing them, hopefully it's all sorted now. In progress (3/3)

MANCHESTER
133716 (27/2)
Atlas deletion errors - it looks like this ticket has been missed. Assigned (27/2)

QMUL
133402 (9/2)
A good portion of Sno+ jobs failing at QM, due to stage in/out errors. This is likely caused by the reduced network bandwidth being hogged by atlas. Hopefully this will be fixed soon (by restoring the 20GB/s site connection). In progress (22/2)

132713 (4/1)
hyperk.org support ticket. Any news? In progress (6/2)

132929 (18/1)
CMS having problems due to APEL's problems parsing slurm logs (or something like that). APEL support have been called in, but no news yet. In progress (29/1)

IMPERIAL
133683 (24/2)
Atlas seeing a high job failure rate at Imperial, due to problems with their AGIS configs that they have no control over. Elena proposes closing the ticket and moving the conversation to JIRA. In progress (5/3) Update - atlas are waiting on seeing some running jobs before closing the ticket.

133818 (4/3)
Another LHCB ticket asking how many nodes do not have sse4.2 support. Simon reports there are no plans to decommission these nodes yet. Waiting for reply (5/3)

133723 (27/2)
This is a ticket for the Cloud site, Sno+ saw problems. Simon was investigating, and has offlined the cloud site in Dirac to prevent further failures. In progress (27/2) Update - Simon hasn't managed to reproduce any errors, and has suggested closing the ticket for now, reopening if needed.

132688 (3/1)
Another not really an Imperial ticket, I think this lost Pheno file ticket can be closed soon. In progress (29/1) Update - ticket closed

TIER 1
133719 (27/2)
Atlas spotted transfers failing into Echo. It was being investigated, any news? In progress (27/2)

133752 (1/3)
Atlas noticed the FTS was broken. While investigating, Alastair noted that it appears to be an IPv6 issue. In progress (1/3)

133717 (27/2)
Likely related, a similar sounding CMS ticket. Any news? In progress (27/2)

133619 (21/2)
Missing unmerged CMS files at RAL. Chris has been helping a lot, but has asked CMS to double check his working. Waiting for reply (5/3)

133764 (1/3)
Sno+ ticket about the RAL BDII not having SFU information. It looks like the bdii information has recently changed (for the worse). Any news? In progress (2/3) Update - Karin has updated the ticket saying that things have got a lot worse for Sno+, upping the ticket's priority.

132589 (21/12/17)
LHCB killed pilots ticket. Some more investigations into this show that the problem is getting worse. Any luck with your investigation? In progress (23/2)

132708 (4/1)
WMS decommissioning ticket. Nothing to do here until next month I don't think. In progress (18/1)

127597 (7/4/17)
CMS network performance ticket. No news since Chris' comprehensive update in January. On hold (29/1)

124876 (7/11/16)
ECHO gridftp ROD tests not working, due to problems with the tests. No news on the counter ticket, still. On hold (13/11/17)

117683 (18/11/15)
GLUE2 publishing for Castor. A quick update in January reports a prototype version is being tested. On hold (3/1)

Monday 26th February 2018, 14.30 GMT
37 Open UK Tickets this week.

It still seems like a stagnant time on the ticket front. A few tickets that need a poke include this RALPP ticket: 133390, which has been in waiting for reply for a few weeks, and this QMUL ticket: 132929, waiting for some input (or acknowledgement) from the APEL devs.

Glasgow have a few tickets related to some issues with xrootd playing up in various ways at their site (causing errors for lhcb in 133667 and a return of the classic xroot overload problems in 133690). The tickets are being handled with the usual Glasgow panache, but I thought I'd give an opportunity to talk about them.

For the first time in a while (that I can remember at least) a ticket has been (re-)assigned to atlas-adc-cloud-UK - the IC ticket 133683. The root cause of the problems is likely the move to using QM as IC's DATADISK. It could be interesting to watch (hopefully it won't be though!).

Related to the previous tickets, for the Sussex xroot ticket 122772 it is worth atlas re-engaging with this. Plus perhaps the errors seen could be related to xroot playing up rather than a misconfig?


Monday 19th February 2018, 15.30 GMT
35 Open UK Tickets this week.

IPv6 Tickets.
A quick skim over these - does anyone have anything they want to add?

Bristol
133508 (14/2) CMS sites have been asked to set up Rucio test areas - this one hasn't been spotted yet. The Brunel equivalent (133506) contains possibly useful information. Assigned (14/2)

Tier 1
133421 (12/2) This Sno+ transfer ticket looks like it can be closed, the VO reports that things are fixed. In progress (14/2)

QMUL
132713 (4/1) One of the last hyperk support tickets, Daniela had a suggestion but no news on the ticket since. In progress (6/2)

DURHAM
133338 (7/2) This atlas jobs failure ticket has been reopened, with atlas still seeing issues but not sure about the cause (the jobs complain with "cat: output.list: No such file or directory"). Reopened tickets can often sneak by us so I thought I'd bring this one up. Reopened (17/2)

Monday 12th February 2018, 17.00 GMT
46 Open UK Tickets this week.

Link to all the UK Tickets.

It doesn't feel like a very exciting week for tickets - although it's worth noting that Sno+ seem to be having a ticket drive, cleaning up problems that they're seeing.

There's a RHUL ticket (133409) that needs acknowledging, and there are a few tickets from CMS regarding data transfers that just seem confusing to me (133390 and 133389 at RALPP, 133344 at Imperial) - although sites aren't to blame for this confusion!

Completely anecdotally (citing 133424), is it me or does CVMFS feel less robust recently? It of course could just be me.

Finally I'll take this opportunity to do my bi-annual reminder to sites to please check the status of their tickets - when you start working on a ticket please make sure to set it 'In Progress', when you ask a question please mark the ticket 'Waiting for reply' and when you're not going to make any progress for a while please set the ticket 'On Hold'. Finally finally, it's not really worth leaving solved tickets for too long before closing them - a day or two is usually more than enough.
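As a rough aide-memoire, the reminder above can be written down as a lookup table. This is only an illustrative sketch: the situation names and the `suggested_status` helper are made up for this example and are not part of GGUS or any real tool; only the three status strings come from the reminder itself.

```python
# Hypothetical situation labels (invented for this sketch) mapped to the
# ticket statuses named in the reminder above.
STATUS_FOR_SITUATION = {
    "started_work": "In Progress",
    "asked_question": "Waiting for reply",
    "no_progress_expected": "On Hold",
}


def suggested_status(situation):
    """Return the status a ticket should be set to for a given situation.

    Raises KeyError for a situation this sketch does not know about.
    """
    return STATUS_FOR_SITUATION[situation]


if __name__ == "__main__":
    for situation in STATUS_FOR_SITUATION:
        print(f"{situation}: set the ticket to '{suggested_status(situation)}'")
```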

Monday 5th February 2018, 15.30 GMT
38 Open UK Tickets this month

IPv6 Tickets
Sussex: 131617 On Hold (15/11/17)
RALPP: 131616 Chris put in a nice update a fortnight ago, citing some perfsonar problems. In progress (31/1)
Oxford: 131615 No recent news on the ticket but I think there's v6 progress at Oxford? On hold (7/11/17)
Cambridge: 131614 On hold (15/11/17)
Bristol: 131613 Early February was the estimated time to get the perfsonar boxes dual stacked, how's that looking? On hold (7/11)
Birmingham: 131612 Duncan poked the ticket last month. On hold (11/11/17)
Glasgow: 131611 I think any further news awaits you chaps moving into your new digs (once they're built). On hold (6/11)
ECDF: 131610 Planning is underway, Raul has kindly offered to help. In progress (5/2)
Durham: 131609 The v6 reverse DNS at Durham is still not working, Adam has provided an update on this. In progress (31/1)
Sheffield: 131608 Is there any way we can help encourage the University to enable v6 for you? On hold (6/11/17)
Manchester: 131607 Duncan reckons you now have v6 reverse DNS lookup, so that's good news. On hold (1/2)
Liverpool: 131606 As further progress here is reliant on some upstream routers getting upgraded maybe this ticket should be put on hold? In progress (14/11/17)
Lancaster: 131605 Lancaster is just waiting on some testing from a v6 only endpoint. I'm working on setting up a v6 only UI to see if that helps. In progress (5/2)
UCL: 131604 Waiting on central IT to get back. On hold (15/1)
RHUL: 131603 RHUL's perfsonar boxen are now dualstacked - nice. On hold (31/1)

Regular Tickets:

SUSSEX
122772 (11/7/16)
Atlas xroot/webdav ticket. At last word just before Christmas Leo was waiting on some ports being opened up in the external firewall. Any joy? In progress (19/12/17)

RALPP
133250 (5/2/1042)
A ROD ticket - the date looks a bit suspect (I don't think GGUS has been around for that long). The test (ch.cern.WebDAV) and the server failing it (mover.pp.rl.ac.uk) all sound a bit weird too. Assigned (2/2/2018)

133274 (5/2)
CMS xroot failures. Things were fixed by a trusty restart script, but Chris has asked about the state of the AAA network. Waiting for reply (5/2)

OXFORD
133215 (31/1)
Atlas deletion errors on the newly reinstalled Oxford SE. After consulting on the dpm list Kashif tweaked his mysql settings and is in the "wait and see" phase. In progress (5/2)

BRISTOL
133220 (1/2)
CMS hammercloud jobs hitting their wall clock limit - the reason for which is proving a bit of a mystery. Luke has looked into this very closely so far, but it might be some weird emergent property. In progress (2/2)

BIRMINGHAM
132569 (19/12/17)
Dirac pilots not being able to be submitted to Birmingham. I think the problem is well understood, have the affected VOs been removed from the bdii? Assigned (22/1)

129930 (4/8/17)
Atlas http tests failing at Birmingham. Perhaps Kashif might have some insight into this after his recent DPM adventure? Although maybe this ticket will become moot. On hold (16/11/17)

GLASGOW
133115 (29/1)
Checking if the new lhcb condb cvmfs mount is mounted. For some odd reason some of the Glasgow CEs are failing/not running the tests, despite all the tests running across the same WNs. In progress (5/2) Update - LHCB seem to think this is a problem with the tests, and so the ticket can be closed.

ECDF
133222 (5/2/3164)
A ROD ticket from the distant future! The tests look okay now, so I suspect this ticket can be closed. Waiting for reply (5/2/2018)

SHEFFIELD
133019 (24/1)
Low availability ticket, all good. On hold (30/1)

133260 (3/2)
Atlas transfers failing. Any luck debugging this? In progress (3/2)

MANCHESTER
131526 (1/11/17)
Storage accounting deployment. Were there some roadblocks for this? On hold (12/1)

LIVERPOOL
133114 (29/1)
New LHCB mountpoint ticket. It looks like this ticket was missed. Assigned (29/1)

RHUL
132715 (4/1)
Supporting hyperk.org. Any word on this? In progress (22/1)

QMUL
132713 (4/1)
Support for hyperk.org. Sadly, despite some fixing, errors persist. In progress (5/2)

132929 (18/1)
CMS APEL problem for QM jobs. Due to a problem with SLURM, Dan originally "unsolved" this ticket. Reopened with some useful tips, but the apel team has been involved to check on this, which was the right call. In progress (29/1)

BRUNEL
132876 (16/1)
CMS seeing reading issues at Brunel. After some expert debugging from Raul I think we're waiting on the CERN ticket 133010. In progress (5/2)

IMPERIAL (kinda)
132688 (3/1)
A lost pheno files ticket that bounced back to IC. Just waiting for word back from users (which may take a while). In progress (25/1)

TIER 1
132589 (21/12/17)
Killed LHCB pilots at the Tier 1. There's a proposal to mark the ticket "unsolved", but Vladimir seems reluctant to do this. In progress (31/1)

117683 (18/11/15)
The old Glue 2 publishing for Castor ticket. Last news is that a prototype version is in testing. On hold (3/1)

127597 (4/7/17)
CMS ticket checking xroot and network performance. Chris provided a good news update - new firewall hardware is on its way. However this might not fix things, Chris warns more work might be needed. On hold (29/1)

124876 (7/11/16)
Echo failing gridftp nagios tests - due to the tests being broken. Absolutely no movement on the linked ticket to fix the tests (125026). On hold (13/11/17)

132708 (4/1)
The ticket tracking the decommissioning for the RAL WMSseses. It's going well. In progress (18/1)

Monday 29th January 2018, 15.30 GMT
43 Open UK Tickets this week.

New LHCB mountpoint tickets
LHCB have ticketed a bunch of sites to make sure that they have "/cvmfs/lhcb-condb.cern.ch" accessible on their WNs. It's a simple case of check and close, LHCB will do the verification their end afterwards.
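For the "check" half of check-and-close, something along these lines could be run on a WN. It is a minimal sketch, assuming that simply listing the directory is enough to trigger the usual cvmfs automount and that a successful listing means the repository is accessible; the `cvmfs_repo_accessible` helper name is invented here and is not an LHCB or CVMFS tool.

```python
import os

# Repository path quoted in the LHCB tickets.
LHCB_CONDB = "/cvmfs/lhcb-condb.cern.ch"


def cvmfs_repo_accessible(path=LHCB_CONDB):
    """Return True if the directory at `path` can be listed, False otherwise.

    Assumption: on an automounted cvmfs setup, listing the path triggers
    the mount, so a failure shows up here as an OSError.
    """
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "accessible" if cvmfs_repo_accessible() else "NOT accessible"
    print(f"{LHCB_CONDB}: {state}")
```

This only confirms that the path can be listed on the node it is run on; it does not replace the verification LHCB do their end.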

BIRMINGHAM
132569 (19/12/17)
I'm not sure if some solid actions were planned out that week for this ticket, but it could do with an update. I think the decision was simply to remove the dirac supported VOs from the CREAM CE bdii? Assigned (should be a different status) (22/1)

BRUNEL
132876 (16/1)
I'm not sure what's going on in this CMS xroot ticket, but I'm wondering if the original issue either still exists or was not a Brunel problem after all. This ticket either can be closed, or perhaps put on hold whilst the related CERN ticket is sorted. In progress (23/1)

ECDF
132446 (11/12/17)
It looks like this ticket tracking dirac jobs having batch system problems can be closed after some tweaking of the argus servers. In progress (26/1)

Also I think the corresponding hyperk support ticket 132716 can be closed too.

RHUL
132715 (4/1)
It might well be that you're still in the middle of network maintenance, but a polite reminder of this hyperk support ticket. In progress (22/1)

TIER 1
132712 (4/1)
Still on the hyperk support ticket, this ticket was just waiting on the hyperk configs to get into quattor. Has that happened yet? In progress (23/1) Update - solved

132589 (21/12/17)
Raja has updated the ticket to sadly report that they are still seeing LHCB job deaths at RAL. In progress (29/1) A further update this morning from Vladimir asks to check on a bunch of jobs' statuses.

132708 (4/1)
Just for information, this is the ticket tracking the decommissioning of the RAL WMSses. In progress (18/1)

Monday 22nd January 2018, 15.00 GMT
54 Open UK Tickets this year.

Start with the good news - these tickets look like they can be closed:

BRISTOL
132880 (16/1)
It looks like transfers are working after the firewall fix. In progress (19/1) Solved, but CMS have hit Bristol with another xroot ticket: 132990

QMUL
132615 (26/12/17)
After changing the working directory LHCB jobs don't seem to be running out of space anymore, so the ticket can be closed. In progress (20/1)

TIER 1
132712 (4/1)
There seems to be positive news getting hyperK jobs working at the Tier 1, so maybe this ticket is sorted? In progress (22/1)

RALPP
132830 (12/1)
This complex CMS xroot ticket looks likely to be solved (in fact Chris might be closing the ticket as I type). In progress (19/1) Solved

Now onto the bad:

RHUL
132715 (4/1)
This ticket from Daniela about supporting the hyperK VO seems to have gone un-noticed. Can you please notice it? Assigned (4/1)

RALPP
132851 (15/1)
This CMS xroot ticket might be related to the one above, hence why it's not been tended to (indeed it might be able to be closed too). There's a request for some verbose output of an xrdcp from different CMS peeps, so the conversation is out of the site's hands for now. In progress (17/1)

QMUL
132713 (4/1)
Fixing hyperk jobs at QM on a couple of CEs. Dan had a kick of things a while back, how did that work out? In progress (4/1)

BIRMINGHAM
132569 (19/12)
Daniela spotted Dirac problems at Birmingham. Ultimately this is fallout from the Birmingham move to VAC, Daniela has suggested that Mark remove the VOs from the BDII to stop dirac sending jobs to an almost dead CE. Assigned (should be something else) (22/1)

MANCHESTER
132121 (28/11/17)
Any news or progress with this ticket to the VOMS service? There have been no updates with words in them from any site admins. In progress (1/12/17)

TIER 1
132589 (21/12/17)
LHCB pilots are still failing at the Tier 1 at Raja's last post, this ticket could do with an update from the Tier 1's side. In progress (10/1)

And the Ugly are a few tickets that need updates from the VOs:

MANCHESTER
132468 (14/12/17)
Alessandra updated this atlas transfer ticket with news that she has informed atlas of many lost files that were causing the errors. No news from anyone since. Perhaps someone from cloud support could update things? In progress (4/1)

IMPERIAL
132688 (3/1)
Daniela tried to poke Pheno over some lost files, but has had nothing but silence from them. Must have not been important files. Assigned (19/1)

132692 (3/1)
This LHCB ticket is in the same state as the Pheno one- waiting for someone from the VO to acknowledge the lost files. Assigned (3/1)

132683 (3/1)
The atlas equivalent of the previous two, Brian jumped on it when poked through another channel - so maybe these lines of communication aren't getting to where they should? In progress (22/1)

Extra extra...

Raul pointed out on tb-support this Brunel ticket 132876, which points to an IPv6 config issue and has been thrown back towards the T0 to fix things (132993).