Difference between revisions of "Past Ticket Bulletins 2016"

From GridPP Wiki

Revision as of 14:55, 28 November 2016

Monday 21st of November, 14.30 GMT
21 Open UK Tickets this week.

TIER 1
124606 (24/10)
This CMS consistency checking *really* needs an update - for three weeks the user has been requesting some news. Even a simple placation would be worth something at this point. In Progress (1/11)

124876 (7/11)
Nagios tests (not unexpectedly) failing against the RAL ECHO endpoint. On Daniela's advice Alastair filed a ticket to the monitoring group (125026). That ticket has been answered and closed; as Daniela asks, did it help clarify things? On Hold (15/11) Update - Alastair has updated the ticket and is feeding his findings back upstream.

124785 (2/11)
Another CMS ticket, about xroot server settings, that seems to be going a bit stale after being reopened. Has it simply escaped notice? Reopened (10/11)

BRISTOL
125083 (18/11)
FYI for the Bristol admins - this CMS waiting room ticket might have snuck by when it came in last Friday. Assigned (18/11) Update - solved.


ATLAS XROOT/webdav tickets
122771 - Birmingham
122772 - Sussex
Both these tickets could do with an update, even if it's an unexciting "nothing to see here" one.
Thanks to Jeremy M for updating the Sussex ticket - January isn't that far away!


OXFORD
121924 (2/6)
Similarly this Oxford perfsonar performance ticket could do with some kind of update. On Hold (10/8)

LSST tickets
120351 - Glasgow
120350 - RAL
An FYI that Alessandra has started tackling sites that had LSST job problems again. After a spot of arc being arc at Glasgow both sites are able to run "hello world" jobs, ready for some proper testing.

AFS jobs
Just a suggestion, but sites who have decided not to close the AFS tickets assigned to them might want to put them On Hold.

Friday 11th November 2016, 15.30 GMT
30 Open UK Tickets at the moment - keeping the review light:

CMS want their Storage Dumps from RAL
https://ggus.eu/?mode=ticket_info&ticket_id=124606
They've upped the urgency on this request for the storage consistency checks - please can someone from the Tier 1 soothe them?

WMS for na62
124478 (RAL)
124241 (IC)
I might be misunderstanding things by lumping these tickets together, but in the IC ticket Daniela notes that the bug listed in 124499 has been fixed and the fix rolled out, so has asked if na62 want to try it again. Daniela also adds some suggestions to the RAL ticket to help debug things. Thanks Daniela!

REOPENED CMS Tickets
124785 - TIER 1
124848 - RHUL
For information - these two tickets have been reopened/re-assigned, and we know from experience that tickets in these states have a habit of sneaking past our collective guard.

THE 5 AFS TICKETS
I won't list them, but I would regard these tickets as quite closable once you've forwarded the issue on to the local admins (if you are the local admin too, feel free to keep them open for your convenience though!).

Finally, the link to all the UK Tickets.

Monday 7th November 2016, 22.00 GMT
34 Open UK Tickets this month.

AFS TICKETS (?!)
A bunch of tickets landed on our doorsteps concerning AFS - which is odd, as they were all meant for the respective Tier 3s.

124805 (3/11) AFS being blocked at RALPP, Chris has requested some changes to the RAL firewall but there's no hurry. In progress (3/11)
124823 (3/11) Birmingham's AFS ticket. Mark is unblocking UDP port 7001 in the Birmingham firewalls. In progress (4/11)
124821 (3/11) Glasgow's copy of the AFS ticket - Gareth points out that it isn't really a site problem, but has kindly passed the information on. On Hold (3/11)
124822 (3/11) Manchester's AFS ticket, again the University firewall is the likely culprit. In progress (4/11)
124819 (3/11) And Liverpool's AFS ticket... John also thinks it's the University blocking UDP. In progress (3/11)
124816 (3/11) Finally RHUL's AFS ticket. Simon forwarded it to the Tier 3. In progress (7/11)

Andrew has suggested in the Manchester ticket that these shouldn't have been submitted - being a Tier 3 and not a Tier 2 problem. I'd suggest that sites would be fine solving the ticket after passing on the details to their local counterparts, perhaps leaving contact details in the solution.

SUSSEX
124614 (24/10)
Low availability ticket, nothing exciting here. On Hold (26/10)

122772 (11/7)
xrootd/webdav ticket from atlas. Although no progress is expected the ticket could do with an update (even a null one). On Hold (26/7)

RALPP
124684 (27/10)
Another availability ticket. On Hold (3/11)

OXFORD
124487 (17/10)
IPv6 had fallen over on the Oxford perfsonar - Kashif has fixed things and Duncan has put Oxford back into the mesh, so with any luck this ticket can be closed soon. In progress (3/11) Update - solved this morning.

121924 (2/6)
Another perfsonar ticket for Oxford, this one regarding a drop in throughput. How are things looking? On Hold (10/8)

BRISTOL
124796 (3/11)
CMS ticket about Bristol being moved to the waiting room. Has it snuck under the Monday radar? Assigned (3/11)

BIRMINGHAM
122771 (11/7)
atlas xrootd/webdav ticket. Mark has tried to get xroot to work but things don't seem to be playing ball. Mark will try again when he has time - let the rest of us know if you need help Mark! In progress (25/10)

GLASGOW
120351 (22/3)
Enabling LSST at Glasgow. The VO was enabled but things weren't working - any news? Although I'm not sure any of us have had much spare time over the last month! On hold (5/10)

124862 (6/11)
atlas deletions having problems at Glasgow where the DPM is playing up, but the good fight is being fought and it looks like things are nearly sorted. In progress (7/11)

122378 (28/6)
perfsonar at Glasgow - David has reinstalled the nodes and Duncan has meshed Glasgow up, things are looking good. In progress (7/11) Update - some last minute teething troubles, but David has given everything what is hopefully one last kick.

124052 (25/9)
LHCB ticket about incorrect running/waiting jobs being published on the Glasgow ARCs. Is the plan to wait for an official release with the fix in? On hold (26/9)

EDINBURGH
124758 (1/11)
A low availability ticket for ECDF. Looks like it was caused by Ops jobs getting caught in the queues. This might have been enough to clear the alarm; if not, it's likely you'll need to put the ticket On Hold till it clears itself. In progress (4/11)

QMUL
124556 (20/10)
Biomed ticketed QM over a pair of CEs not working for them - which is expected as they're in the middle of being de/re-commissioned. On Hold (20/10)

IMPERIAL
124241 (5/10)
The IC WMS not working for a na62 user. NA62 is in the process of moving to Dirac, so the ticket is being kept on hold for reference; Daniela has however found a possibly related bug that could be the root cause of this issue. On hold (7/11)

BRUNEL
124428 (13/10)
CMS transfers failing after someone cut the Brunel network link. Hopefully IPv4 traffic will be restored fully later this week. This is completely out of the site's hands, but an interesting study of how Tier 2s need their phat network pipes these days. In progress (5/11) Update - Raul solved the ticket, the extra emergency network plumbing seems to have solved the congestion problems.

100IT - have an availability ticket 124511

TIER ONE
124876 (7/11)
Nagios gridftp failures for the new echo interface - which shouldn't really be tested, but needs to be set to production for atlas tests. Sound similar to the ECDF/Archer problem? Maybe PRODUCTION=Y, MONITORING=N needs to be a reinstated option? In progress (7/11) Alastair has added some extra input to the issue - perhaps though the ticket should be waiting for reply now. I'm not entirely sure if the questions posed were rhetorical or not.

124606 (24/10)
CMS consistency check, delayed by the usual consistency checker being on leave - hopefully another team member knows the invocations. In progress (1/11)

124785 (2/11)
CMS have noted that the two xroot servers at RAL need an extra config added (or something - CMS ways appear eldritch and arcane to my atlas-tempered worldview). The ticket has been acknowledged but no news. In progress (2/11)

120350 (22/3)
Enabling LSST at RAL. Test jobs failed here too. Maybe the 3 sites that had LSST payloads fail (the TIER 1, Glasgow and Lancaster) could put their heads together with a site where the jobs work and take some of the pressure off of Alessandra? On Hold (12/9)

122827 (12/7)
SNO+ disk space query, upended by the departure of Matt M. David, his replacement, has provided his details and is waiting to see what the next batch of MC looks like - so this ticket should probably go On Hold. Waiting for reply (21/10)

121687 (20/5)
Packet loss for the Tier 1 Perfsonar. Brian notes the replacement of the UK Light router seems to have improved the picture somewhat, as hopefully will moving the perfsonar host within the network infrastructure. Waiting to see how that pans out. In progress (26/10)

124877 (7/11)
Nagios tests failing for one of the Tier 1 ARCs - being looked at, and it's a very fresh ticket. In Progress (7/11)

124478 (17/10)
Another WMS ticket from na62. This one got confused as ideally it needed help from WMS devs. Hopefully the na62 move to Dirac will render this ticket moot as well. On hold (1/11)

123504 (19/8)
T2K proxy expiration problem ticket. This ticket really should just be closed with the departure of Jon Perkin and the rumoured likelihood that t2k will be quiet on the grid for a while. Waiting for reply (28/10)

117683 (18/11/2015)
Glue 2 publishing for Castor. Jens provided an update last month (thanks Jens!), citing the lack of resources to commit to this - but there is a promising prototype (still far from production ready but better than naught). On hold (5/10)

FINALLY, IN MEMORY OF EFDA-JET
Despite being decommissioned it still has two tickets:
122198 - Decommissioning ticket, waiting for 90 days to pass before the site and ticket can be closed (end of this month).
124237 - poor Gordon found himself in the ridiculous situation of having to ticket a decommissioned site for low availability in order to stop the ROD dashboard alarming. There's a life lesson in there somewhere, but I don't want to dwell on it too much.


Monday 31st October 2016, 15.40 GMT
29 Open UK Tickets this week.

Tier 1
124244 (5/10)
LHCB having cvmfs-ish problems, no news for a while (since the day of submission). In progress (5/10)

124606 (24/10)
CMS consistency checking ticket. The ticket asks for lists of LFNs, perhaps it got lost in the noise of last week? The submitter is getting restless. In progress (24/10)

124478 (17/10)
A WMS ticket from an na62 user - this ticket is in a weird limbo as it calls for help from a WMS support unit which of course no longer exists. This ticket risks getting stuck. Also Dan asks in the ticket how to get na62 added to the list of VOs in GGUS - I'll look into this (unless someone has that information handy?).

ECDF
124592 (22/10)
LHCB problems with an ECDF arc ce. Andy thought he had fixed things, but asked for confirmation a week ago. Waiting for reply (24/10)

Imperial
124241 (5/10)
NA62 having problems with the IC WMS. Daniela asked if there's another UI the Imperials can use to test things as they could not reproduce the error with their UI. Waiting for reply (17/10)

Liverpool
123962 (19/9)
John's schooling of Biomed in the art of Spacetokening continues with the creation of the spacetoken BIOMEDDISK. Perhaps if other sites are supporting biomed on their SEs they could follow suit as an incentive to biomed? In progress (31/10)

IPv6 Perfsonar
124487 (Oxford)
124616 (Durham)
Some sites appear to be having problems with their IPv6 - but teething problems are expected I suppose. Oliver asks for Durham if reverse DNS is needed for IPv6 mesh tests to work (my thought is yes, but I'm often wrong).

Monday 24th October 15.00 BST
32 Open UK Tickets
Just another lite update this week.

Looks like they can be closed:
123504 (20/9) - Tier 1 T2K ticket, with T2K quiet on the grid and Jon P's departure I would close this ticket.

124411 (12/10) - QMUL ticket regarding IceCube GPU glideins. Looks like things are working and the ticket can be closed, interesting stuff though.

Need an Update:
122378 (4/10) - Glasgow Perfsonar ticket. I'm being a bit unfair singling this one out though.

123870 (14/9) - Manchester Perfsonar ticket.

121687 (10/10) - Tier 1 Perfsonar ticket - a reminder to update on how things look after a fortnight of statistics with the new router.

123962 (13/10) - Liverpool biomed ticket, after being taught the wonders of space tokens by John I noticed that Sorina's update contained a question that could do with a response.

124244 (5/10) - Tier 1 Ticket from LHCB for a cvmfs issue at RAL. Is this still a problem? No update for 19 days.

124431 (14/10) - Tier 1 ticket from atlas regarding a frontier squid. Checking the plots things still don't look right (at least to my eyes). Update - thanks to Alessandra for closing the ticket, the (unmaintained) monitoring was lying to us!


Monday 17th October 2016, 16.00 BST
34 Open UK Tickets this week.

We have a lot of open tickets for the UK, but none seem to urgently need an update[1]. This week's mission, should you choose to accept it, is to look at any UK Tickets your site might have and do what you can with them.

[1] Except maybe the Birmingham ticket 124319, which I think can be closed.

Monday 10th October 2016, 14.30 BST
32 Open UK Tickets this week.

Security Jobs (ECDF, Bristol and RHUL)
124267 (6/10)
Security nagios jobs haven't been able to run at 5 CEs in the UK - 3 at ECDF, one at RHUL and one at Bristol. Govind has already queried the ticket for RHUL, and received an answer. In progress (7/10)

RALPP
123804 (9/9)
An availability ticket in its last stages, the RALPP numbers have soothed back to 85%/85%, and Chris asks if he can close this ticket now - has the dashboard stopped alarming? It would be good to know these thresholds so that sites don't prematurely close their tickets. In progress (10/10)

IMPERIAL
123959 (18/9)
This Sno+ Dirac ticket has confirmed that it can be closed. Good news from this ticket is that David Auty has taken over Matt M's role for Sno+. Waiting for reply (can be closed) (7/10)

Sneaky Ticket at Liverpool
124298 (7/10)
This LHCB ticket is still just assigned, wondered if it snuck under your radar? Looks like one of your ARC CEs is playing up. Assigned (7/10)

New Router at RAL
121687 (20/5)
Gareth updated this perfsonar ticket with news of the new router replacement at RAL. Let's hope the upgrade's payoff will show in the perfsonar's stats! In progress (10/10)


Monday 3rd October 2016, 15.15 BST
31 Open UK tickets this month.

SUSSEX
122614 (6/7)
An NGI ticket concerning the availability problems at Sussex, afaics this is just waiting on a month or so of smooth running so we can sign off on the time of troubles. On Hold (19/9)

123740 (6/9)
A common or garden ROD availability ticket, on hold as is the SOP. Looks like things are quite green down Brighton way, so hopefully this can be closed soon. On Hold (20/9)

122772 (11/7)
Atlas ticket asking about webdav and xroot endpoints. On Hold until Sussex get an admin to do this - if it becomes a problem before then we will need to offer assistance. (By we I mean someone with a STORM who has a clue about what's going on...). On hold (26/7)

OXFORD
121924 (2/6)
Duncan spotted a drop in perfsonar rates at Oxford. Put on hold due to staff shortages, this ticket could do with an update (even a null one). On Hold (10/8)

BRISTOL
124051 (25/9)
An lhcb job submission problem at Bristol. Some cunning investigation traced the problem to a change of default client in the v5.1 arc tools (gridftp to a-rex), so Winnie and Lukasz are asking for the new ports to be open. They are however worried that they will need to upgrade their CE to match the major version of the arc tools for things to work smoothly. In progress (27/9)

BIRMINGHAM
122771 (11/7)
Another atlas xrootd/webdav deployment ticket. At last update Matt was in position to start rolling out these changes - any news? In progress (20/9)

GLASGOW
124052 (25/9)
A ticket from lhcb about the much reported on ARC CE job publishing problems. Discussed last week, but I'm always up for more discussion! On hold (26/9)

120351 (22/3)
Enabling LSST. Gareth has rolled out support to one of the Glasgow ARC CEs and is ready for testing (the CE for LSST, not Gareth himself). Proper job. Waiting for reply (28/9)

122378 (28/6)
Glasgow's perfsonars being out of commission. Rebuild ETA was in the next week or so, still on course for that? On Hold (19/9)

EDINBURGH
123996 (20/9)
LHCB jobs being murdered by the ECDF batch system, found due to them being submitted without a default wall time. Marcus made the batch system less murdery and pilots have stopped being killed, so it looks like this ticket can be closed. In progress (23/9) Update - solved

123732 (5/9)
ce5 and ce7 (both creams) throwing up ROD alarms, it looks like their state needs to be reviewed. Here's a friendly nudge to review it. Hint hint. On hold (20/9)

122653 (7/7)
The other ROD ticket, regarding the archer facing CE. I don't think this has had any progress on it. Is it causing (more of) a problem for the ROD yet? Waiting for reply (probably should be On Hold instead) (26/7)

SHEFFIELD
124003 (20/9)
Atlas transfer problems to Sheffield - Elena is undertaking a server rebalancing exercise to smooth this out but this will take time. On Hold (28/9)

124036 (23/9)
An expired argus certificate caused Sheffield to get a ROD availability ticket, but it's being handled and on-holded so all's well. On hold (29/9)

MANCHESTER
123870 (13/9)
Duncan ticketed over poor perfsonar throughput results to Manchester. Marked in progress, but any news in the investigation? In progress (14/9)

LIVERPOOL
123962 (19/9)
As discussed over the last few weeks, Biomed had trouble using the Liverpool SE as the shared area had filled up. John helpfully gave a quick introduction to space tokens to help them out, but no news from biomed since. A re-poke is likely in order. In progress (20/9)

IMPERIAL
123959 (18/9)
Sno+ DIRAC jobs failing without logs, likely due to being killed when exceeding memory limits. No update on this for a while - but then it will probably need to be looked at in a different light if we no longer have a Matt M. In progress (19/9)

BRUNEL
123947 (16/9)
CMS asking Brunel to do some investigation into some issues they see at other sites. I think some conclusions have been made but I'm not sure what they are! The ticket is very long - a testament to the effort put in. In progress (30/9)

124153 (29/9)
A very fresh ROD SRM-Put ticket. Assigned (29/9) And solved - a disk server was playing up.

100IT
123753 (6/9)
I think this ticket should be closed, I'll prod it to make sure, don't want them cluttering up our GGUS! In progress (19/9)

THE TIER 1
124183 (2/10)
An "ALARM" ticket from lhcb, with problems seen copying from the RAL WNs to the RAL BUFFER. Looks like the problem was solved before Sunday teatime, so I think this ticket can be closed. In progress (2/10) Solved, the root problem was an imbalance in lhcb data causing the majority of the work to be done by a minority of the disk servers - being looked at.

124188 (3/10)
atlas reckoned a frontier squid at RAL is down. The picture looks confusing, and perhaps the ticket should be waiting for reply so atlas double check their results. In progress (3/10)

120350 (22/3)
Enabling LSST at RAL. Test jobs failed, but in-depth debugging has yet to occur - on hold till then (probably until after San Francisco). On hold (12/9)

122827 (12/7)
Sno+ requesting more disk space. There has been some discussion, but as with the Imperial ticket this will need to be reviewed. Waiting for reply (probably should be On Hold instead really) (19/9)

123504 (19/8)
T2K proxy problems between the RAL WMS and Sheffield. Another ticket that might be in an orphaned state after recent news of people leaving. Waiting for reply (20/9)

122364 (27/6)
cvmfs support for the solidexperiment.org VO. On hold awaiting the VO to gain some traction, any signs of solid progress? No rush if there isn't... yet! On Hold (24/8)

121687 (20/5)
Packet loss on the RAL perfsonar, which was due an update once the allotted time had passed and a key bit of routing kit was replaced (as John pointed out in the mini-update on Friday). On Hold (30/9)

117683 (18/11/15)
glue2 publishing for castor. Really, really, really could do with an update - even a null one! On hold (17/2)

NGI
119995 (7/3)
Culling the uncertified (sites). UKI-ScotGrid-Gla-PPS was confirmed no longer needed by people at Glasgow, so this is nearly done. In progress (19/9)

122198 (17/6)
Decommissioning JET ticket, just waiting for 90 days before the site can be officially removed. On hold (19/9)

Monday 26th of September 2016, 14.30 BST

31 Open UK Tickets this week.
Link to all the UK tickets.

An office move and the resulting chaos have left me not able to give the tickets a proper going over this week, but a proper review of all of them is due next week anyhow.

Highlights are the continued problems caused to LHCB by ARC publishing problems with tickets at Glasgow (124052) and Brunel (124045). In the Glasgow ticket Raja cites this bug report, which many of us have been following.

Somewhat related is the QM ticket 123451, although the root problem here is SLURM's lack of support for cputime rather than wall time.

Thanks to John for teaching Biomed the basics of Space Tokens in 123962.

Finally this Tier 1 ticket from T2K: 123504. Has Elena been informed that the problem might be at the Sheffield end?

Monday 19th of September 2016, 15.30 BST
27 Open UK tickets. Keeping it light this week.

No more room at Liverpool (at least for the little VOs)
123962 (19/9)
Biomed ticketed Liverpool over not being able to access the Liverpool SE, which John reveals is due to the shared area running out of room. John mentions Space Tokens as a possible (and the only) solution for this problem. In my (and I think everyone here's) opinion this isn't a site problem per se, but perhaps we could use it as an opportunity to educate biomed in spacetoken usage? In progress (19/9) Biomed updated the ticket, wanting to know more about the magic of space tokens. Also they tried to clean up some data but had no joy in actually freeing up space.

Brunel on the Case for CMS
123947 (16/9)
CMS have asked Brunel to investigate an issue they have been seeing with their glideins on Condor/ARC systems - Brunel have one CE with the problem and the other CEs without, so are in a great position to compare and contrast. Raul has been working hard getting to the bottom of this. In progress (19/9)

Sno+ having Sno luck with jobs in Dirac.
123959 (18/9)
Matt M ticketed the dirac team after seeing a high number of jobs failing without getting any logs back. Simon tracked this (at Imperial at least) to jobs going over their VMEM limit, which has Matt confused. Simon provided a script to extract job memory usage in the ticket, which may be of interest to some. In progress (19/9)

Certifiably Uncertified.
119995 (7/3)
Purging long dead sites from the system - UKI-ScotGrid-Gla-PPS is the last on the list for exorcising from the gocdb. Just checking (on Jeremy's behalf) it's not still defunct? In Progress (19/9)

Suspended from the DTEAM (not a UK ticket, but it affects a lot of us).
123909 (15/9)
As spotted (and escalated) by Jeremy, the ticket covering last week's surprise dteam suspensions has been updated with an explanation of what caused the trouble. In Progress (16/9)

Another few tickets to look at:
123504 (19/8)
T2K ticket to the Tier 1, regarding trouble with proxy renewal at Sheffield via the WMS. The discussion points to the problem possibly being with the Sheffield CE, so Elena should be made aware of this. The current suggestion is to remove Sheffield from the T2K production list. Waiting for reply (20/9)

123230 (2/8)
Atlas checksum tests failing at Sussex. With thanks to input from Dan and Brian and prodding by Jeremy we're getting to the bottom of this. Thanks all! In progress (20/9)

Monday 12th September 2016, 15.00 GMT
33 Open UK tickets. Down to 29 this morning!

SUSSEX
I'm just going to skim the Sussex tickets, I'll contact Jeremy M about these again offline.
122772 (On Hold) - Atlas webdav/xroot ticket.
123230 (In progress) - Atlas transfer failures, was partially fixed as of 15/8.
123740 (Assigned) - Recent low availability ticket.
123733 (Assigned) - Ops SRM-LS test failures.
122614 (In progress) - Technically a ticket to the NGI concerning the Sussex status, not a Sussex ticket.

RALPP
123859 (12/9)
CMS noticed that the Phedex agents appear to be down at RALPP. Fresh this afternoon. Assigned (12/9)

123858 (12/9)
A duplicate of above, I believe because you run 2 Phedex boxes? Assigned (12/9) Update - one of these was a duplicate, but they're both solved, network issues seemed to be the underlying problem.

123804 (9/9)
Low availability ticket, Chris provided an explanation and notes that there have been no tests since - possibly related to the problems noticed in this morning's EGI broadcast. In progress (9/9)

OXFORD
121924 (2/6)
A ticket from Duncan concerning a drop in perfsonar throughput rates at Oxford. Currently on hold - any ideas perhaps when you'll get round to looking at this? On hold (10/8)

BRISTOL
123860 (12/9)
Another fresh "Phedex is down" ticket from CMS. Assigned (12/9) Update - solved alongside the RALPP tickets.

BIRMINGHAM
122771 (11/7)
Atlas ticket requesting xroot and webdav endpoints. The submitter requests an update. In progress (12/9)

GLASGOW
120351 (22/3)
Enabling LSST at Glasgow. Any news, or plans for having any news soonish? On hold (19/7)

122378 (28/6)
No perfsonar results for Glasgow, the server appeared borked so David took it down. Is it due to rise from the ashes soonish? On hold (28/6)

EDINBURGH
123732 (5/9)
Nagios ticket for ce6 and ce7, which I believe are defunct CEs? Marcus has put the ticket on hold until Andy is back. On hold (5/9)

123164 (28/7)
Another nagios ticket, a glue2.validate one this time. Andy was confused about where this was coming from, and I assume the alarm is still happening. Perhaps the glue-validator will yield some clues? Waiting for reply (29/8)

122653 (7/7)
The third nagios ticket for ECDF, this one covers the odd saga of the ARCHER queue that would ideally be "IN PRODUCTION, NOT MONITORED". Waiting for reply (this probably should be On Hold instead) (5/9)

DURHAM
123810 (9/9)
LHCB noticed that the arc CEs were producing an incorrect number of running/waiting jobs (was this the catalyst for the tb-support thread on the same subject?). The ticket could do with acknowledgment, perhaps someone in the know could lend Durham a hand. Assigned (9/9) In progress now

SHEFFIELD
123851 (12/9)
APEL-pub ROD ticket. Matt R has rerun the apel publishing scripts by hand and is awaiting the results, if this doesn't work we might need to ask the apel people how things are looking their end. In progress (12/9)

MANCHESTER
123813 (10/9)
Atlas deletion errors, probably due to a downed disk server that has been brought back up. In progress (12/9) Solved.

LANCASTER
123789 (8/9)
LHCB jobs failing at Lancaster, probably because they're sensitive to some problems we had with our NFS server housing home and sandbox areas. We've hopefully soothed our problems by upping the number of nfs threads, we're in the wait and see period. In progress (12/9)

UCL
123734 (5/9)
ROD apel ticket for UCL. Ben is investigating, it looks like some VAC boxes are having trouble talking to the network. In progress (7/9)

QMUL
123400 (15/8)
Low availability ticket for QM, but Daniela notes the alarms appear to have cleared so this should be able to be closed. On hold (12/9)

123451 (18/8)
LHCB had problems with a QM CE, which Dan noticed was fubared and needed a reinstall. Should hopefully be back online soon? On hold (18/8)

BRUNEL
Raul solved the two tickets before I could get to them, nicely done.

100IT have a ticket - 123753 (6/9) but you don't have to bring yourselves to look at it.

The TIER 1
123794 (9/9)
Atlas noticed a lot of analysis job failures, after some prodding it turned out that the culprit was one dodgy worker node. Case closed? In progress (9/9)

122827 (12/7)
Sno+ asking for more disk, this has developed into some discussion and Alastair has added a few more points. Matt M has expanded on the Sno+ needs, and has decided to make more use of Tier 2 space for Sno+, which may or may not keep them ticking over until Echo is upon us. In progress (24/8)

120350 (22/3)
Enabling LSST at RAL. "Proper" test jobs have failed, Alessandra has put the ticket on hold until the issue can be debugged properly.

123504 (19/8)
Jon Perkins noticed that the WMSes at RAL didn't seem to be updating proxies at Sheffield for some long T2K jobs. A conversation was started but seemed to have stalled, it seemed some weird resource matching errors were going on. The landscape might have changed in the last 3 weeks however. In progress (23/8)

122364 (27/6)
cvmfs support for solidexperiment.org. After some solid progress the ticket is on hold waiting for someone VO side to try to roll out some experiment software in anger. On hold (24/8)

117683 (18/11/2015)
Developing glue2 support for Castor. Any update will do?! On hold (5/4)

NGI
119995 (7/3)
The culling of the uncertified sites. The old Glasgow Pre-production service was mentioned too. In progress (23/8)

So long EFDA-JET
122198 (17/6)
The jet decommissioning ticket. The other related ticket (123291) was closed as I wrote this report, so it won't be long until this ticket should be closed. Bye Jet! On Hold (1/9)

Friday 19th August 2016, 11.30 BST
29 Open UK tickets today.
Link to all the UK Tickets.

VOMS servers
123333 (9/8)
After the blip with the VOMS servers last week Daniela opened this ticket - it looks like the problem is fixed now, and this ticket can be closed. Assigned (17/8)

BRISTOL
123419 (16/8)
A low availability ticket for Bristol, which has Winnie a little confused as the "created_at" date for the issue is back in July - Winnie asks for confirmation that this is actually an old, stale issue - the last 3 weeks of tests look good for Bristol, but the nagios link cuts off mid-July. Waiting for reply (16/8)

QMUL
123400 (15/8)
Another low availability ticket - following the nagios link here I see a lot of "-1.00" entries - which I think are caused by tests returning unknown statuses - Daniela is rightfully suspicious of it. There may be clues in this talk, but probably worth doing as suggested and just On Holding the ticket. Assigned (15/8)


A few tickets that could do with an update:
117683 - Glue 2 for Castor
122364 - cvmfs support for solidexperiment.org (looks like it's nearly done).

And finally, so long JET!
122198 (17/6)
EFDA-JET's decommissioning date is the 25th.


Monday 8th August 2016, 15.00 BST
28 UK Tickets this week.

SUSSEX - I'll chase Jeremy M up on these offline.
121797 (26/5)
Sno+ Dirac Jobs failing.

123230 (2/8)
Atlas transfers failing with "checksum denied" errors. Update - Jeremy M is looking at things, his job made easier thanks to the helpful tips given.

122772 (11/7)
Atlas requesting root and webdav endpoints.

120735 (11/4)
Low availability ticket - After the advent of Argo Daniela wisely suggests closing this ticket and opening again if the issue reappears. Update - Jeremy M closed the ticket as per the suggestion

See also:
122614 (6/7)
NGI ticket regarding low Sussex availability numbers - things were looking better but still a lot of unknown tests. I need to chase this up too. On Hold (1/8)

OXFORD
121924 (2/6)
Duncan noticed a drop in Perfsonar throughput at Oxford - Pete G is looking at this, but if it looks like it's going to be slow going maybe this ticket needs onholding (an update would be better though). In progress (7/6)

BRISTOL
120455 (29/3)
CMS validation of HTCondor CEs at Bristol - on hold awaiting the accounting getting sorted. On hold (29/6)

BIRMINGHAM
122771 (11/7)
Atlas request for webdav and root endpoints. http is working (nice one) but Mario for atlas reports that a naive xrdcp didn't work. How's your firewall? In progress (2/8)

GLASGOW
120351 (22/3)
LSST at Glasgow. On hold for a bit (19/7)

122378 (28/6)
Duncan spotted there were no Perfsonar throughput results for Glasgow. David put them in for a rebuild - any joy? On Hold (28/6)

ECDF
122653 (7/7)
ROD ticket for the archer ARC CE that will never work for ARC. At the last update Kashif suggested a "testing" type in gocdb - a good plan. Waiting for reply (should be on hold?) (26/7)

122981 (20/7)
Atlas ticket tracking CentOS7 testing at ECDF. In progress (4/8)

123164 (27/8)
glue2-validate ROD ticket, probably caused by the sudden demise of their old SL6 cluster. Have you managed to exorcise the ghosts of your old cluster from your publishing? In progress (3/8)

DURHAM
123293 (5/8)
LHCB having job submission problems with one of the Durham CEs (ce4). Fresh ticket. Assigned (5/8)

SHEFFIELD
123323 (8/8)
Fresh ROD apel-publishing ticket. In progress (8/8)

123314 (7/8)
Atlas transfer failure to Sheffield, the submitter cites ticket 121746. In progress (7/7)

QMUL
120204 (15/3)
LHCB having trouble submitting to QM's dual-stacked CEs. No news on the external ticket dealing with the underlying issue (120586). On hold (25/4)

EFDA-JET
122198 (17/6)
Decommissioning of the EFDA-JET grid site. In progress (5/8)

123291 (5/8)
Biomed ticket tracking their side of the JET SE decommissioning. On hold (5/8)

TIER 1
122827 (12/7)
Sno+ asking if they could have some disk space to go alongside their tape - which the Tier 1 might not be able to provide. Some discussion over this, worth going over in the storage meeting. In progress (4/8)

122364 (27/6)
Commissioning solidexperiment.org at the Tier 1 - this seems to be chugging along okay, but no news in nearly a month. In progress (18/7)

123276 (4/8)
ROD ticket, but the affected endpoint srm-biomed is going in for decommissioning so will be set to "not monitored" soon. In progress (4/8)

120350 (22/3)
Enabling LSST at the Tier 1. Job submission is working through Dirac to all CEs now, so just waiting on a final thumbs up on this one. Sweet. In progress (3/8)

121687 (20/5)
Packet loss on the RAL perfsonar, awaiting a router replacement, expected sometime in September. On hold (5/7)

119841 (1/3)
http support at RAL on lcgcadm04 - the last update suggests that the needed use case will be supported in the next release. On hold (2/8)

117683 (18/11/2015)
Castor not publishing glue2 - which requires some development slogging. Any news? On hold (5/4)

121258 (6/5)
Decommissioning the WMS lcgwms06. This ticket should probably be on hold (or perhaps even closed - I'll need to double-check proc 12). In progress (28/6) Update - closed. Cheers!

NGI
119995 (7/3)
Cleaning up the uncertified sites in the UK. Jeremy was on it. In progress (18/7)

Monday 19th July 2016, 14.00 BST
39 Open UK tickets this week!

SUSSEX
120735 (11/4)
Availability ticket - things appear to be looking up at Sussex with their Storm up and running again - hope it will stay up. In Progress (12/7)
Good news for the NGI ticket about Sussex's poor figures: 122614

LSST at GLASGOW and the TIER 1?
120351 (Glasgow)
120350 (Tier 1)
A pincer movement on you guys as Alessandra also asks in the tickets, any news? Update - thanks to Gareth for an honest appraisal of the situation at Glasgow.

Other Availability Tickets - RHUL and BRISTOL (13/7/16)
122854 (RHUL)
122853 (Bristol)
These two tickets seem to have snuck past the sentinels at their respective sites - they probably just need On Holding, currently just Assigned. Update - both On Hold now, thanks!

(also for RHUL is the Ops ticket 122851 - still just Assigned as well).

LHCB at DURHAM
122662 (7/7)
LHCB jobs at Durham are running into difficulties, Oliver asks if LHCB can take a look as the jobs are consistently hitting batch system limits and wasting CPU resources because of this. Waiting for reply (18/7)

MANCHESTER
122379 (28/6)
Robert tracked the reason behind the latency perfsonar issues seen - the owamp disk limits were set too low (what is presumably the default). Robert asks Duncan what they have at Imperial in terms of both configs and amount of space used (for what it's worth Lancaster has 10GB in the configs too, but only 21MB in /var/lib/owamp ...). Waiting for reply (18/7)

Decommissioning EFDA-JET
122198 (17/6)
Just an FYI, JET are in their final downtime. In Progress (12/7)

A spot of ticket buildup at BIRMINGHAM
122771 - Webdav & xroot ticket, Assigned.
122416 - Pilot role ticket, On Hold.
121125 - Missing atlas dump ticket, On Hold.
Tickets seem to be ganging up at the site - let us know if you need a hand. The atlas dump ticket is looking quite crusty, but all are important.

TIER 1
122827 (12/7)
Sno+ are having regrets after saying a while ago they'd be okay with all tape and no proper disk. Matt M has requested a disk area that isn't scratch. This is being looked at. In Progress (13/7)

122818 (12/7)
Alessandra noticed that ATLAS Event Service jobs were failing because the RAL Object Store was down. Alastair replied that the service was being configured, but there was no way to create a downtime for it and failover was not working as expected. Interesting stuff. In progress (12/7)

121258 (6/5)
Decommissioning one of the RAL WMSes. The service is in downtime and stopped. Are you waiting to delete it from the gocdb and close the ticket? In progress (28/6)


Monday 11th July 2016, 15.30 BST
36 Open UK tickets this week!

NGI
122614 (6/7)
The hammer is threatening to come down on Sussex (with reference to 120735) - I've let Jeremy M know offline about this - hopefully we can get things looking better this week. In progress (11/7)

ATLAS WEBDAV/XROOT TICKETS
122695 - Tier 1
122772 - SUSSEX
122771 - BIRMINGHAM
122770 - SHEFFIELD
Tickets from atlas trying to get the last few sites that don't seem to have xroot or webdav working or advertised (the Sheffield one looks like just a misconfiguration).

A polite nudge to Mark and Matt to check the Birmingham tickets in general, I think you have a few just assigned or need shimmying along...

ECDF
122653 (7/7)
Edinburgh having a CE that shouldn't be monitored being monitored has bitten them again (duplicate/expansion on 120004). Andy once again wishes that there was a way of unlinking the CE from the monitoring in gocdb. Waiting for reply (8/7)

GLASGOW
122498 (3/7)
It looks like this MICE ticket can be closed - it all looks OK. In progress (6/7)

Monday 4th July 2016, 13.00 BST
29 Open Tickets this month, arranged by site.

SUSSEX
121797 (26/5)
Sno+ dirac jobs failing at Sussex - looks to be a separate issue from the now closed pilot ticket (118289). In progress (13/6)

120735 (11/4)
A Low-availability ROD ticket. Hopefully this will "resolve itself" soon. On hold (6/6) Update - see Daniela's observation on TB-SUPPORT - looks like things are weird at Sussex.

RALPP
122463 (1/7)
Atlas were seeing a "stalled" xrootd connection on the xroot door to RALPP's dcache, but it turned out that they were using the wrong url. The details are being changed in AGIS, and the submitter asks if this xroot door's hostname is planned on being kept as is. In progress (1/7)

OXFORD
121924 (2/6)
Duncan noticed a drop in perfsonar performance for Oxford - this is being brought up with the Oxford networking team. Any news from them? In progress (7/6)

BRISTOL
120455 (29/3)
CMS validation of the HTCondor CE at Bristol. Still waiting on the accounting getting sorted before things can kick off again properly. On hold (29/6)

BIRMINGHAM
122416 (29/6)
Daniela spotted that a number of VOs were missing pilot roles at Birmingham. On hold as Matt is away on holiday and Mark is unsure if he'll be able to get round to rolling out the changes. On hold (4/7)

121125 (28/4)
Missing atlas dumps, but Matt is on holiday so no news. On hold (4/7)

GLASGOW
120351 (22/3)
Enabling LSST at Glasgow. Gareth reports that things are picking up with their new Identity Management System almost in production, once this is satisfactorily sorted LSST will be enabled at the site. On hold (1/7)

122378 (28/6)
Duncan spotted that the Glasgow test results were all orange, like they had had too much fake tan applied to them. David and Co have decided to put them into scheduled downtime pending a reinstall. On hold (28/6)

121929 (2/6)
Biomed still "accidentally" on the Glasgow SE. A date for purging biomed from the SE has been set for the 30th of July. In progress (24/6)

122498 (3/7)
A rare mice ticket - they had trouble getting their data. Sam reports that it was a config error on some of the mice-containing disk servers, which should be fixed now. Waiting for reply (4/7)

ECDF
120004 (7/3)
ROD ticket for the archer facing arc CE that will be perpetually failing nagios tests, it doesn't look like much can be done. On Hold (23/6)

SHEFFIELD
122517 (4/7)
Low availability ROD ticket. Elena is unsure of the reasons for the low metrics, and is having trouble digging up useful information. In progress (4/7)

MANCHESTER
122379 (28/6)
Duncan spotted that the Perfsonar latency results weren't looking right - after a bit of discussion of the right places to look (with the useful link) Robert restarted the services and things look better - perhaps the ticket can be closed? In progress (29/6)

LIVERPOOL
122414 (29/6)
Nagios failures after Liverpool's abrupt power "cut" last week - some services still seem unhappy. In progress (4/7)

122514 (4/7)
Another nagios ticket, for the arc CEs but likely the same underlying reason. Assigned (4/7)

RHUL
122417 (29/6)
A ticket to the CMS factory admins, asking for the entry for some RHUL resources into the system. There were some teething issues but they were fixed, and the factory overseers have asked if these endpoints are ready to be put into the production glidein factories? Assigned (should be something else) (1/7) Update - Govind has replied so looks like this ticket will be done soon.

QMUL
120204 (15/3)
LHCB trouble submitting to QM's dual stack CE, due to problems outside the site's control. Looking at the related ticket (120586), the ETA on a patch that should fix this behavior is the 8th of this month. On hold (25/4)

IMPERIAL
122515 (4/7)
A very fresh ticket - CMS have complained that the permissions on a file folder are wrong. Daniela replies that this was on purpose - Imperial hasn't supported Heavy Ion data and asks why the change? Waiting for reply (4/7)

EFDA-JET
122198 (17/6)
The decommissioning of the EFDA-JET grid site ticket. Full switch off is scheduled for the 25th of August, downtime commences from the 12th of July. In progress (21/6)

121899 (1/6)
EFDA-JET ROD availability ticket. A bit of a moot point! On hold (28/6)

TIER 1
121258 (6/5)
Decommissioning of one of the RAL WMSes. Access stopped last week as announced. In Progress (28/6)

119841 (1/3)
HTTP support for lcgcadm04.gridpp.rl.ac.uk, currently being referred to the developers. No news for a while, any movement behind the scenes? On hold (26/4)

120350 (22/3)
Enabling LSST at RAL. At last check the remaining (large) hurdle was deploying the users to the worker nodes across the site. Any joy? In progress (6/5)

121687 (20/5)
Packet loss for the RAL perfsonar, investigation is awaiting a known network intervention which will replace a router. Do we know when this work will take place? On Hold (23/5)

122364 (27/6)
Getting cvmfs support for the solidexperiment.org VO. Catalin has set up /cvmfs/solidexperiment.egi.eu as the egi.eu namespace is the most suitable, Daniela gives the thumbs up for this and has set the ticket on hold for a fortnight pending other work. On Hold (29/6)

120810 (13/4)
The neat decommissioning of srm-biomed.gridpp.rl.ac.uk, reopened pending removal from the bdii. In progress (24/6)

117683 (18/11/15)
Getting CASTOR to publish GLUE2 information. No news for a while on this, could do with an update (even if it's a null update). On Hold (5/4)

NGI
119995 (7/3)
NGI uncertified site ticket, Jeremy is on it. In progress (28/6)

Monday 27th June 2016, 11.00 BST
25 Open UK Tickets this week - down a lot from last week.

VOMS admins doing us a Solid
122263 (Manchester)
122337 (Imperial)
122336 (Oxford)

FYI - setting up the VOMS servers for solidexperiment.org, so it will soon be ready for us to support.

EFDA-JET
122198 (17/6)
Decommissioning EFDA-JET - the broadcasts have been sent out. Assigned (should be in progress) (21/6)

NGI
119995 (7/3)
Uncertified NGI sites for the chopping - this could do with at least an update. (17/5)

SUSSEX
118289 (10/12/15)
Pilots at Sussex. We managed to have a little look at this last week, but it was only a little one. There was a recent re-yaiming which might have changed the landscape somewhat, and some pilots seemed to be getting through. Waiting for reply (23/6)

(linked to the Sno+ version 121797)

BRISTOL
120455 (29/3)
CMS validation of the HTCondor CE - CMS have asked for an update. In progress (17/6)

LSST Tickets
120350 (Tier 1)
120351 (Glasgow)
Any news?

BIRMINGHAM
121125 (28/4)
Missing ATLAS dumps at Birmingham - no news for a while. Have you tried uploading them with rfcp? Has anyone got xrdcp to work for uploading their dumps automatically? In progress (1/6)

Monday 20th of June 2016, 13.00 BST
32 Open UK tickets this week - just doing the highlights due to HEPSYSMAN.

So long JET
122198 (17/6)
EFDA-JET are decommissioning, this is the ticket tracking that process. We should make sure that things keep on track. Assigned (17/6)

LANCASTER
122188 (16/6)
It might not be news to others, but Lancaster ran afoul of LHCB's newish 4GB job memory requirements, causing some job failures when they ran out of memory (we were only allocating 3GB per job). Should be okay now though with an increased memory allocation per job. Solved (20/6)

RHUL
121575 (16/5)
Happy news, this ROD availability ticket looks like it can be closed. In progress (17/6)

BRISTOL
122172 (15/6)
Bristol have hit on the "classic" problem of nagios tests timing out before the corresponding job is scheduled. Lukasz wonders if artificial job reservation is going to be needed to stop this from happening. In progress (16/6)

QMUL
122193 (16/6)
Multiple DNS entries for the QM (dual stacked) perfsonar hosts are causing intermittent test failures, Duncan points to documentation that asks that perfsonar hosts only have one hostname.

Monday 13th June 2016, 16.00 BST
29 Open UK Tickets this week (down from 35 last week).

Not much exciting going on on the ticket front, and I have to send my apologies for today's meeting.

Here's the link to the UK tickets: http://tinyurl.com/nwgrnys

There are some cries for help on the ticket front, Kashif has already asked for help with the Oxford httpd ticket 122069, and a forewarning that Jeremy M will be seeking guidance with the Sussex issues at next week's HEPSYSMAN (118289 & 121797).


Monday 6th June 2016, 14.00 BST
35 Open UK Tickets this month.

NGI
121987
The NGI's very urgent response times for May weren't up to par - I dug into the reason why and updated the ticket. In progress (6/6) Update - solved once explanation given.

119995 (7/3)
Uncertified NGS sites to clear up - Jeremy has been on it. In Progress (17/5)

SUSSEX
118289 (10/12/15)
Pilot ticket - Jeremy M thought he had got it, but Daniela's tests say otherwise (although the errors look like the CE playing up). In progress (26/5)

121797 (26/5)
Sno+ dirac jobs failing at Sussex - looks like the same problem as above. No word from the site - I'll poke Jeremy M offline. Assigned (26/5)

120735 (11/4)
Low availability ROD ticket. Hopefully Sussex will have a clear month. On Hold (6/6)

OXFORD
121641 (18/5)
Wrong capacities reported in REBUS - this ball has landed in Oxford's court, with Pete G looking at the SE publishing. Assigned (I set it In Progress) (3/6)

121924 (2/6)
An interesting ticket from Duncan, concerning a drop in perfsonar throughput performance at Oxford. Still just Assigned (2/6)

BRISTOL
120455 (29/3)
Validation of a new HTCondor CE at Bristol by CMS. At last check Bristol were testing the CERN accounting daemon, but that was a few weeks ago. Any news? In progress (9/5)

121989 (6/6)
Super-fresh ROD ticket (.glexec.CREAMCE-JobSubmit tests). Assigned (6/6) Update - solved 10 minutes after I wrote this.

BIRMINGHAM
121125 (28/4)
Missing ATLAS SE dumps. At last check Matt W was having troubles with xrdcp-ing the dumps into his DPM, Alessandra suggested that others succeeded using rfcp (with the caveat that rfcp might not be around much longer). In progress (1/6)

GLASGOW
120135 (11/3)
HTTP support ticket. Any news? An update would be nice, no matter how vacuous. On holding the ticket would be even better. In progress (7/4) Update - Solved, tests are green.

121929 (2/6)
Glasgow's SE "not working" for Biomed - which it isn't meant to - but biomed support was still being published. Gareth is sorting that out, and will close the ticket once the Biomed references are purged. All good. In progress (3/3)

120351 (22/3)
Enabling LSST at Glasgow. I'll repeat Alessandra's question in the ticket - any news? On hold (5/5)

EDINBURGH
121465 (11/5)
ROD Availability ticket, just waiting for time to pass. On hold (31/5)

121990 (6/6)
BDII issues caused a few ROD test failures - Marcus fixed things around lunchtime, hopefully they heal up soon (to answer Marcus's question, I believe BDII changes typically take 2-3 hours to fully propagate). In progress (6/6) Update - can likely be closed, tests are green again.

120004 (7/3)
ROD test failures for the ARCHER test CE. The last update has Andy asking if the ticket could be put on "long term hold"? That's if someone can't manually edit the gocdb to set this service "Monitored=N, Production=Y". On hold (24/5)

SHEFFIELD
121991 (6/6)
A fresh ROD ticket, srm tests were failing. Elena freed up some space and things should be good now. Waiting for reply (6/6) Update - tests are passing, another for the solved pile?

LIVERPOOL
121759 (25/5)
Another Availability ROD ticket. John identified the cause as likely due to the DPM publishing problems after a cert upgrade (that's got me too before). Just needs time to heal these wounds now. On hold (27/5) Update - actually the ticket isn't on hold, but it should be...hint...

RHUL
121575 (16/5)
Yet another availability ticket (May was not kind to the UK). Likely needs On Holding whilst the metrics "fix" themselves - if the underlying problems have passed. In progress (16/5)

QMUL
120352 (22/3)
LSST support at QM. Dan reports today that LSST should be enabled on three CEs. Nice one. Waiting for testing. (6/6)

120204 (15/3)
LHCB problems, due to an issue submitting jobs to dual stack CEs from CERN. The referenced issue (120586) has had a priority bump and a few extra parties cc'd in, so hopefully there will be some movement on it. On Hold (25/4)

BRUNEL
121573 (16/5)
ROD BDII issue ticket - possibly due to multiple site BDIIs (although I was under the same impression as Daniela). Kashif has opened a related ticket (121760) which Raul commented on today to ask for confirmation that the issues are related. On hold (27/5)

121813 (27/5)
Brunel failing CMS validation - likely due to cvmfs playing up on two nodes. The ticket has turned into a conversation on CMS site settings, and seems to be chugging along fine. In progress (6/6)

EFDA-JET
121899 (1/6)
Low availability JET ticket. Assigned (1/6)

121837 (30/5)
JET SE not working for biomed. I thought that JET stopped supporting Biomed a while ago, I'll need to check my notes. Assigned (30/5)

100IT
121189 (2/5)
A ticket I don't understand! Waiting for reply (16/5)

TIER 1
119841 (1/3)
HTTP support at the Tier 1 - on hold awaiting dev support. On Hold (26/4)

121687 (20/5)
Another perfsonar performance ticket from Duncan. A router that could be the cause is due to be replaced, things will be looked into in more detail after. On hold (23/5)

121894 (1/6)
A request for the Tier 1's plans to deploy a "LHCOPN IPv6 Peering, incl. dualstack Perfsonar". The upcoming router replacement is a blocker for this. In progress (1/6)

120810 (13/4)
Biomed requiring a bit of extra reassurance during the decommissioning of their volume. In progress (20/6)

120350 (22/3)
Enabling LSST at RAL. Things were looking good, but it looks like progress stalled rolling out the VO to the workers (aka the hard bit). Any news? In progress (6/5)

121322 (10/5)
A Sno+ user having trouble accessing files at the Tier 1. Whilst the issue appears to be fixed for the example file, the user lists a few more files, a subset of which they have trouble downloading. Reopened (3/6)

117683 (18/11)
Castor not publishing glue 2. Awaiting some background dev work. Any news? On hold (5/4)

DECOMMISSIONING TICKETS
120664 - GenScratch Disk Pool at the Tier 1.
121258 - WMSes & LB at the Tier 1 (I previously misadvertised this as being a Glasgow decommissioning ticket, thus revealing to everyone my secret - that I don't actually properly read every ticket).
All handled perfectly.

Friday 27th May 2016

Matt's on leave w/c 30th May, he'll be back the week after for a full ticket review. Hopefully by then there will be fewer tickets for him to report on. That would make Matt a happy chappy!

In the mean time here's a link to all the UK tickets.
And here's the link to the Other VO Nagios.

Monday 23rd May 2016, 15.00 BST
37 Open UK Tickets this week.

Concentrating on tickets that look like they can be closed (if not now then soon):

TIER 1: 120954
This LHCB ticket to clean up DNS aliases looked to have the hard parts done.

TIER 1: 121698
CMS failures over the weekend, solved by increasing the max file limit by a factor of 10. Looks like this sorted the problem.

RALPP: 118628
Daniela reports that (after their voms change) LZ jobs submitted to RALPP okay - so maybe this LZ ticket can be wrapped up?

SUSSEX: 120714
This ROD ticket looks sorted for Sussex, Gareth has asked the site to set it to solved. Update - solved

RHUL: 121231
An LHCB ticket, problems were found and solved, pilots are flowing once again. Mark gives the thumbs up to solve the ticket.

GLASGOW: 120973
Ticket tracking the retirement of the WMSii and LB. I suspect the Glasgow chaps know to close (and are looking forward to closing) this ticket once they've completed the last few steps.

QMUL: 121574
It looks like the alarm triggering this ROD BDII ticket disappeared on its own, so feel free to close the ticket (as Gareth suggested).

LHCB VOFEED tickets
ECDF: 121360
BRUNEL: 121388
Both these VOFEED tickets have asked for feedback from lhcb on what way to proceed.

TIER 1 SNO+ TICKETS
120920
121322
Snoplus have two open tickets with the Tier 1 regarding file access - the first is regarding xrootd problems, the second accessing files from tape. Both tickets could do with an update, I believe both tickets have their root cause in Castor not playing ball.

Update - Birmingham
121125
Atlas dumps ticket for Birmingham - Matt reports he's trying to get the xroot to upload the dumps locally into the DPM. Did anyone have success with this?

Monday 16th May 2016, 15.00 BST
42 Open UK Tickets this week.

GOCDB/VOFEED mismatch tickets
There are 7 open tickets left from last week's campaign to clean up the VO tags featured in the gocdb. Only the Birmingham ticket is still in the "assigned" state, the rest are undergoing discussion or requesting feedback/clarification.
BIRMINGHAM 121450
RALPP 121464
LIVERPOOL 121394
BRUNEL 121388
BRISTOL 121386 Update - closed, thanks Winnie!
ECDF 121360
RHUL 121421

QUESTIONING ROD
121465 (11/5)
This ECDF availability ticket is "on the mend", but Andy has asked how the numbers are calculated. Waiting for reply (16/5) (This goes in hand with Andy's question in ECDF's other ROD ticket 120004)

120714 (9/4)
I think this Sussex ROD ticket is solved, the link to the tests looks green (in a good way). I think it can be closed? In progress (28/4)

OXFORD 120019 (7/3)
Talking of tickets that probably can be closed, I think this CMS subscription change request issue is solved? Either way it could do with an update. In progress (29/4)

RHUL 121516 (12/5)
A biomed ticket, possibly the same networking problems affecting them that affected atlas jobs (121540). It looks like this ticket snuck past your sentries, and could do with acknowledgment. Assigned (12/5) Update - updated and in progress, hope the networking problems go away.

BIRMINGHAM
121125 (28/4)
Did you chaps have any luck getting your dumps working? Taking a peek myself I see that your dumps directories are still empty. Let us know if you need a hand. In progress (4/5)

Are there any other tickets or issues people want bringing up?

And finally, the Other VO Nagios...

Monday 9th May 2016, 13.00 BST
39 Open UK Tickets this month

So long and thanks for all the jobs - decommissioning tickets.
120973 (Glasgow, 2 WMSes and an LB).
121258 (Tier 1, just one WMS).
120664 (Tier 1, GenScratch disk pool).
Not much else to say, nothing to see here. Move along...

NGI
119995 (7/3)
Cleaning up old uncertified NGS sites. Any joy Jeremy? In Progress (18/4)

NEUGRID CVMFS STRATUM PROBLEMS
121179 (2/5)
The neugrid stratum at the Tier 1 isn't behaving - no site was notified with this ticket so it likely dodged people's notice. I sent it RAL's way - feel free to bounce elsewhere if it isn't a problem at the Tier 1. Assigned (9/5) Update - the submitter confirms things are fixed, it looks like the ticket can be closed.

SUSSEX
Ops tests woes:
121028 (25/4) -cream CE
120735 (11/4) -Availability
120714 (9/4) -CA distro.
Being handled as best Jeremy M can - it looks like the last two issues are on the mend. Not sure about the first one.

118289 (10/12/15)
gridpp pilot role ticket. No news for a while, but hopefully a familiar face will sweep in and save the day soon. On Hold (25/1)

RALPP
120282 (18/3)
Atlas-centric HTTP support ticket. Chris is putting the site in downtime next week to upgrade the dcache hardware and version, and we'll see how this looks after. On hold (6/5)

118628 (5/1)
LZ pilot ticket. No news after testing the test version of Arc didn't go so well, and so Chris decided to wait until they have a newer umd4 CE to try it out on, or at least until the fix makes it into the proper repos. The reminder date has passed, any news? On Hold (22/3)

OXFORD
120019 (7/3)
CMS federation subscription change for Oxford. Kashif has worked on this and it looks like it might be fixed. Any news? In progress (29/4)

121139 (22/4)
Enabling skatelescope.eu on the Oxford VOMS. Kashif kicked it but Robert's tests didn't work, so debugging is ongoing. In progress (6/5)

BRISTOL
121024 (25/4)
CMS transfer problems. Phedex was upgraded, but a few more problems with some dodgy datasets came up - Lukasz seems to have it all in hand though. In progress (6/5)

120455 (29/3)
A spot of self-ticketing, here Lukasz asked CMS to validate their new HTCondor CE. A lot of conversation in the ticket (some regarding CMS multicore), the last entry has Lukasz looking at the CERN Condor accounting daemon. Assigned (could do with being changed to a different status) (9/5)

BIRMINGHAM
121125 (28/4)
The atlas storage dump is missing at Birmingham - Matt is looking for it (I had more trouble than I should have setting up this cron job at Lancaster - I forgot my 'nix-admining basics! The shame!). In progress (4/5)

120948 (20/4)
Ops availability ticket, on hold whilst things recover - naught to see here. On Hold (20/4)

GLASGOW
120135 (11/3)
Another atlas-centric http TF ticket. The ticket could do with an update/on holding. In progress (7/4)

120351 (22/3)
Enabling LSST at Glasgow, on hold awaiting the new identity management system[1]. Alessandra posted a helpful link here - how goes things? (5/5) Update - I noticed that 117706 (enabling pilots for pheno and friends) is done so hopefully this is just a roundtuit?

[1] Robin's started working on a CentOS7 argus server build with ansible at Lancaster if that's relevant to your, or anyone else's, interests.

ECDF
121227 (4/5)
A crusty cream CE is causing ROD Ops test failures at ECDF - Andy and Marcus are deciding its fate. In progress (5/5) Update - the immediate issue was solved, and the ticket closed.

120004 (7/3)
The ARCHER facing test CE suffering ROD failures. Was a decision reached about whether or not to put the service in downtime or similar? I see the CE is in a short downtime at the moment. On Hold (25/4) Update - Andy is unsure what to do and has asked for some advice, or if perhaps a special case can be made for this CE in the monitoring/gocdb.

121285 (8/5)
Fleeting atlas transfer problems, caused by a network blip. The blip has passed, and Marcus asks if there are any more problems seen? Waiting for reply (9/5)

SHEFFIELD
121279 (7/5)
Atlas transfer failures - Elena noticed that the files don't actually exist at Sheffield and will declare them lost forthwith. In progress (8/5)

MANCHESTER

120998 (22/4)
skatelescope.eu VO creation ticket, nearly done. On Hold (4/5)

120430 (24/3)
Enabling Icecube VO at Manchester. It seems quite involved (gpu jobs sound quite exciting!), things look to be moving along nicely. In progress (5/5)

RHUL
121257 (6/5)
ROD ticket for multiple problems - a CE fell over and is being looked at (the CE problems might explain the BDII failures). In progress (6/5)

121231 (5/5)
LHCB pilots dying at RHUL. After finding a few problems and fixing them Govind wonders if problems persist. Waiting for reply (8/5)

QMUL
121245 (5/5)
Friday ROD issues - looks like multiple CEs were/are having a bad time of it. Assigned (5/5)

120352 (22/3)
Enabling LSST at QM. Alessandra posted the link to the information that Dan asked for. In Progress (5/5)

120204 (15/3)
The well-understood problem with lhcb jobs submitting to QM's dual-stack CEs. Waiting on 120586, where there has been no news for a month, although the last entry seemed positive. On Hold (25/4)

100IT (for 100% completeness)
121189 (2/5) - Being handled.
121271 (6/5) - Assigned
(interestingly this ticket asks for support for dteam as a child of 121262).

And Finally...

THE TIER 1
120810 (13/4)
Biomed asked that their castor storage pool that's being decommissioned (see 120664) be set to read-only prior to the decommissioning date. Gareth pointed out that this request is redundant, as the disk pool is set to be made read only as detailed in the decommissioning announcement. On Hold (27/4)

120350 (22/3)
Enabling LSST at RAL. Andrew L reports good progress, still some work to go through. In progress (6/5)

https://ggus.eu/?mode=ticket_info&ticket_id=120920 (19/4)
Sno+ having xrootd problems at RAL. A lot of back and forth going on, the issue is being worked on. In progress (6/5)

117683 (18/11/15)
Castor not publishing glue2. This is being worked on slowly in the background, requires no small amount of dev work. On Hold (5/4)

119841 (1/3)
HTTP support ticket from the HTTP TF. On Hold whilst the developers are consulted. On Hold (26/4)

120954 (21/4)
SRM endpoint simplification for LHCB. At last check it looked good to remove the old alias, with a thumbs up from LHCB. Waiting for reply (should be "In progress" I think) (3/5)

121147 (29/4)
CMS file reading failures at the Tier 1. Andrew L checked things and they looked okay, and asked for some clarification and extra information but no word back. Waiting for reply (29/4)


Tuesday 3rd May 2016, 10.00 BST
36 Open UK tickets this week.

The bank holiday threw me off, but here's what a brief dredge of the tickets this morning dragged up:

NGI (TIER 1?)
121179
I think this ticket about the neugrid.egi.eu cvmfs is meant for the Tier 1 Stratum-1 admins (citing a problem with cvmfs-egi.gridpp.rl.ac.uk).

GET YOUR SKATELESCOPE.eu ON
120998
skatelescope.eu was the name settled on for this VO, IC and OXFORD have got child tickets to roll out the new VO to the backup VOMSESeses.


RALPP
121155
CMS noted that the RALPP PhEDEx agents decided to take the bank holiday off too. Assigned (29/4)

OXFORD
121175
Oxford got a ticket due to ATLAS using up all their space - as discussed many a time this is not a site problem - thanks to Elena for defending the site's honour.

LIVERPOOL
121092
As seen on TB-SUPPORT when Steve put a call out for advice, Liverpool were/are seeing multicore atlas jobs fail due to a lost heartbeat. Alessandra's digging revealed batch system memory restrictions as the likely culprit, but we can chat about it if it doesn't get brought up elsewhere.

QMUL
120352
Enabling LSST at QMUL - Dan has asked for some LSST details: "What's the software directory? Is it available via cvmfs? Typically how many accounts have you set up at other sites (10 / 50 100) ? No production role needed?". Waiting for reply (29/4)

similarly:
TIER 1
120350
The Tier 1 LSST ticket, this may contain the answers that Dan seeks - as Alessandra notes some VO information seems to have once again disappeared from the Ops Portal.

ECDF
120004
ROD ticket for the Archer-fronting CE, which doesn't really work but needs to look like it's in production for atlas to send tests. How long before this becomes a problem for the ROD Dashboard? Could ATLAS jobs be easily forced to a service in downtime?

GLASGOW
120973
WMS and L&B decommissioning ticket. The Ticket Pedant is saddened by the unchanged default status of this ticket...


Monday 25th of April 2016, 15.30 BST
31 Open UK Tickets.

A NEW CHALLENGER APPEARS
120998 (22/4)
Squire McNab has ticketed himself (which always feels like a weird thing to do) to set up the skatelescope.eu VO on the Manchester VOMS. No doubt many of us will be interested in enabling a SKA VO. Assigned (22/4)

A FEW FEWER WMSes
120973 (21/4)
Glasgow have announced the retirement of their WMSes and Logging and Bookkeeping server at the end of next month, with the Downtime starting in a fortnight (9/5). Assigned (On Hold or In Progress it?) (21/4)

The Tier 1 has a few tickets that piqued my interest:
120954 (21/4)
LHCB would like to amalgamate their endpoints at the Tier 1 - bringing the tape and the disk behind the same name. Brian rounded it out with a question- I think for LHCB. In progress (should be waiting for reply?) (25/4)

119841 (1/3)
This HTTP support ticket almost certainly looks like it should be On Hold, possibly awaiting some development work. In progress (22/3)

Talking of On Hold:
120204 (15/3)
This LHCB ticket for QMUL looks like it should be put On Hold, as it is awaiting an external fix that's outside the site's control (see ticket https://ggus.eu/?mode=ticket_info&ticket_id=120586). In progress (14/4)

And finally:
120019 (7/3)
A CMS ticket asking for a change of federation subscription for Oxford. I know Kashif and Pete are looking at it, but do you need a hand from someone who knows the arcane CMS ways? In progress (5/4).


Monday 18th April 2016, 15.30 BST
33 Open UK Tickets this week.

RALPP having a bad time?
120872 (cms)
120879 (lhcb)
I hope everything's not too (or at all) melty at RALPP. Both tickets still just assigned.

Update - there were bad times, caused by the condor collector filling up its filesystem, but things should be sorted now and both tickets are solved.

BIRMINGHAM
120860 (15/4)
Biomed are once again finding that they're running out of room at Birmingham. It seems like they either are very unsure of what data their users may or may not be producing, or have (possibly unrealistic) views on what other user groups can and should be doing with their data. Assigned (15/4)

MANCHESTER
120706 (8/4)
This Biomed ticket looks like it took the Low Road whilst you were taking the High Road, missing each other along the way. Assigned (13/4) Update - In progress, Biomed have been purged from the Manchester information system and Alessandra has asked for the site to be removed from any static lists.

TIER 1
120664 (7/4)
The ticket tracking the retirement of one of the RAL disk volumes (this one supporting biomed, na62 and mice). All above board, but it could do with being set in progress or on hold. Assigned (7/4)

120810 (13/4)
I think related to the above ticket, Biomed have asked that write access be removed to their volume. In Progress (13/4)

120624 (5/4)
Atlas Consistency Checking Ticket - I don't think this should be in "waiting for reply" any more. Waiting for reply (13/4)

119841 (1/3)
HTTP Task force ticket. No news for a while, but it looked like the situation might be a complicated one to fix - perhaps the ticket needs on holding whilst it's sorted out? In Progress (22/3)

Monday 4th April 2016, 14.00 BST
26 Open UK Tickets this month.

NGI
119995 (7/3)
Uncertified site ticket for the UK - Jeremy is on the case, and there appears to be no need to rush. In progress (4/4)

120588 (4/4)
A fresh ticket, saying we have achieved insufficient "Quality of Support performance" - we had an average of a 1.4 day response time for very urgent tickets during March.

I've looked into this using the ggus report viewer and I believe we're being accused of a crime we only technically committed (if I'm looking at things right). We only had 2 "very urgent" tickets in this period, and one of them the site forgot to put In Progress, so had an erroneous response time of two and a half days. When averaged with the single other very urgent ticket this gave us an average response time > 1. Poor statistics is a right blimmer. I've updated the ticket - which was solved whilst I wrote the report.

The take home from this - please remember to set your tickets In Progress! It does actually matter (kinda).
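To make the arithmetic concrete, here is a small Python sketch. The 2.5-day figure and the 1.4-day average come from the report above; the 0.3-day value for the second ticket is not stated anywhere and is simply what those two numbers imply, so treat it as an assumption.

```python
# Response times in days for the two "very urgent" tickets in March.
# 2.5 is the erroneous figure from the ticket that was never set In Progress;
# 0.3 is an ASSUMED value, back-calculated from the reported 1.4-day average.
response_days = [2.5, 0.3]

average = sum(response_days) / len(response_days)
print(f"average response time: {average:.1f} days")

# With only two tickets, the single outlier decides the result on its own:
# even an instant reply on the other ticket leaves the average above 1 day.
best_case = (2.5 + 0.0) / 2
print(f"best possible average with the outlier: {best_case:.2f} days")
```

With a sample of two, no response time on the second ticket could have pulled the average under a day, which is why setting the status promptly matters.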

SUSSEX
118337 (14/12/15)
Sussex Storage down for Sno+ - I assume this is still the case? Jeremy M replied a while ago but no news since. On Hold (15/2)

117894 (23/11/15)
One of the last Atlas Consistency Checking tickets - in a similar state to the former. On Hold (25/1) Update - Solved by Alessandra, can make do without for Sussex

118289 (10/12/15)
gridpp pilots at Sussex- again no news. On Hold (25/1)

I was supposed to poke the Sussex tickets before Easter but local things came up - I will prod them after tomorrow's meeting if we don't get a chance to discuss them during.

RALPP
118628 (5/1)
LZ support at RALPP. Chris tried to roll out the LZ-friendly test version of ARC to a production server but hit a roadblock and had to roll back. Chris is waiting on the fix to go out into the proper repositories, and is interested to see how things fare on a test centos7/umd4 ArcCE he has brewing (no pun intended). On hold (22/3)

120282 (18/3) Atlas HTTP taskforce ticket. Chris has asked that the tests be re-aimed at another, less-loaded server. Waiting for reply (1/4)

OXFORD
120019 (7/3)
A CMS ticket asking the Oxford T3 to change its xrootd federation subscription. Ewan was the chap who first-responded to this ticket, quiet since - it needs some attention. In progress (7/3)

117892 (23/11/15)
The other holdout of the Atlas Storage Consistency Checking tickets, and again in a similar state. In progress (24/3)

120345 (22/3)
An atlas ticket asking Oxford to update their xroot monitoring settings. Kashif battled this issue with Ilija's help, and with luck it can be closed. In progress (31/3)

BIRMINGHAM
119957 (4/3)
A ROD availability ticket after their SE DB crisis, just waiting for the alarms to go green. On hold (31/3)

GLASGOW
117706 (19/11/15)
Pheno (and other?) pilots at Glasgow. Gareth reports that they should have their new identity management system up and running soon (if it arrived on time). On Hold (23/3)

118052 (30/11/15)
ATLAS HTTP Taskforce ticket. Reopened just before Easter after tests started failing with TLS issues. Reopened (24/3)

120351 (22/3)
The first of a few enable-LSST tickets - On Hold until the new identity management system is up and running. On hold (23/3)

120135 (11/3)
I'm not entirely sure why you chaps got a second http TF ticket, but you have (for a slightly different issue). In progress (1/4)

EDINBURGH
120004 (7/3)
ROD ticket for the test ARC CE fronting ARCHER, where tests fail as expected. I remember years ago being among many who couldn't think of a good reason to keep the "Production=yes, Monitoring=no" option, so they got rid of it - but it would perfectly apply here. How long can the ROD keep this ticket on hold before the dashboard self-destructs? On hold (29/3)

SHEFFIELD
118764 (12/1)
Another HTTP TF ticket. Elena kicked the services a while ago, but no news since (and the tests are still not passing by the looks of things). In progress (24/2)

114460 (18/6/15)
gridpp pilots at Sheffield. Did you get round to having a look at this? In progress (29/2)

MANCHESTER
120430 (24/3)
Ticket tracking setting up Manchester for Icecube glideins (the coolest of VOs...). It opens with a request to the Manchester site admins to enable their user (looks like just the one pilot DN), but no reply (as the Mancunians might have missed that the ticket has turned on them). Assigned (24/3)

LANCASTER 120412 (24/3)
Atlas deletion errors at Lancaster - caused by a few files badly drained back in 2014. I'm trying to figure out a clever, database-y way of listing all the files on these long gone servers (the best I've got so far is `select * from Cns_file_replica where host like 'fal-pygrid-%';`, but of course the dpns mapping isn't that straightforward). Expect a cry for help soon! In progress (4/4)
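One possible shape for the "database-y" listing, as a rough sketch only: find the replicas on the dead hosts, then rebuild each namespace path by walking up the parent entries. It runs against a toy in-memory SQLite database rather than a real DPM, and it assumes a Cns_file_metadata table with fileid, parent_fileid and name columns and a root entry whose parent is 0. Those schema details, the host names and the file names are all illustrative assumptions, so check them against the real database before trusting any of it.

```python
import sqlite3

# Toy stand-in for the DPM namespace DB. The column names and the
# "root has parent_fileid 0" convention are ASSUMPTIONS for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cns_file_metadata (fileid INTEGER, parent_fileid INTEGER, name TEXT);
CREATE TABLE Cns_file_replica (fileid INTEGER, host TEXT, sfn TEXT);
INSERT INTO Cns_file_metadata VALUES
  (1, 0, '/'), (2, 1, 'dpm'), (3, 2, 'example.org'),
  (4, 3, 'lost.root'), (5, 3, 'fine.root');
INSERT INTO Cns_file_replica VALUES
  (4, 'fal-pygrid-99.example.org', 'hypothetical-sfn-a'),
  (5, 'other-01.example.org', 'hypothetical-sfn-b');
""")

def full_path(conn, fileid):
    """Rebuild a namespace path by following parent_fileid up to the root."""
    parts = []
    while True:
        row = conn.execute(
            "SELECT parent_fileid, name FROM Cns_file_metadata WHERE fileid = ?",
            (fileid,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no metadata entry for fileid {fileid}")
        parent, name = row
        if parent == 0:  # reached the root entry, stop climbing
            break
        parts.append(name)
        fileid = parent
    return "/" + "/".join(reversed(parts))

# Same filter as the query quoted above, but keeping only the file ids.
rows = conn.execute(
    "SELECT fileid FROM Cns_file_replica "
    "WHERE host LIKE 'fal-pygrid-%' ORDER BY fileid"
).fetchall()
lost = [full_path(conn, fid) for (fid,) in rows]
print(lost)
```

Walking one file at a time is slow on a large namespace; caching directory paths would help, but the per-file version is easier to check by eye.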

RHUL
119509 (12/2)
Sno+ job directories being cleaned up prematurely. It looks like this problem could have been transient - Matt M submitted some test jobs and didn't see the problem, and is re-testing with some proper work. Hopefully those tests completed okay. In progress (22/3)

QMUL
120352 (22/3)
Request to enable LSST at QM. Dan has asked for a reminder after/during GRIDPP36. On hold (24/3)

120204 (15/3)
LHCB having issues with some of the QM CEs. The reasons for this are unclear - pilots stopped around the start of March and the problem persisted at last check. In progress (17/3)

THE TIER 1
117683 (18/11/15)
CASTOR not publishing GLUE2. It's being worked on in people's spare time - any recent news? If not, maybe progress is slow enough to warrant on-holding the ticket. In progress (17/2)

119841 (1/3)
HTTP TF ticket, this time for LHCB. Proxy functionality isn't working (although regular cert/key pair access is okay) - this functionality was never turned on and is being looked into. In progress (22/3)

120350 (22/3)
Request to enable LSST at the Tier 1. Daniela notes that the Tier 1 will likely hit the same problem as RALPP for LZ (118628), Andrew L concurs. Pool accounts have been requested, things chug along nicely. In progress (22/3)

Monday 21st March 2016, 15.15 GMT
29 Open UK Tickets this week.

After Ewan
Now that Ewan's living it up at his new job the Oxford tickets might need extra shepherding - let us know if you need help Kashif. The tickets are:

117892 (23/11/15)
Atlas consistency checking ticket. On Hold (16/3)

120019 (7/3)
CMS federation subscription change request. In progress (7/3)

120052 (8/3)
HTTP TF ticket. It appears to be looking hopeful though. In progress (14/3)

Whilst we're talking HTTP TF:
GLASGOW 120135 (11/3)
Looks like this ticket has snuck by, or maybe you chaps just never got roundtuit. Assigned (11/3)

SHEFFIELD
117886 (23/11/15) Atlas consistency check ticket - Elena's working on it, but the dump script fails as her DPM has run out of connections. Odd. In progress (21/3) - Update already - Elena ramped up the number of connections in my.cnf and things started working - just having trouble uploading the dumps now.

And I don't like to nag but the other two Sheffield tickets could do with an update:
118764 (http tf) and 114460 (pilot rollout)

QMUL
120204 (15/3)
A dearth of LHCB pilots at QM. Dan spotted that *something* broke at the start of March, and handily gave a list of suspects. Not sure which one is spoiling things though... In progress (17/3)

And that's all from me. The SUSSEX tickets will need chasing up again, I'll do that - plus the NGI ticket 119995 is a bit quiet. Finally, thanks to Alessandra for wrangling the Atlas Consistency Checking tickets.

Update - the RHUL Atlas Consistency Checking ticket looks on the verge of closure: 117881

Other VO Nagios looks clean. Nice one!

Monday 14th March 2016, 14.00 GMT

27 Open UK Tickets.

The Highlight(s):
The HTTP TF Tickets to DPM sites have mostly been reborn, seemingly changing tack from "http ain't working on your DPM" to "this ain't working all that well on your DPM - probably due to https".

The take home message from these tickets is:

"The DPM team strongly recommends disabling https on the disk servers. It is frequently a source of problems and has a significant performance penalty. Access is still authenticated and authorised on the head node which passes a token to the disk, so the setup is secure."

An example of one of these tickets (Manchester, by virtue of being the most recently updated): 120139

And um, that's it for interesting tickets AFAICS (over 50% of our tickets fall under atlas consistency checks, http TF tickets or rolling out pilot accounts). Let me know if I'm missing some excitement somewhere.

Looking at the other VO nagios... nope, that looks fine too (at time of writing). How peaceful...

Monday 7th March 2016, 14.30 GMT

28 Open UK Tickets this month.

NGI
119995 (7/3)
In some kind of clean up operation 5 old NGS sites that are uncertified have been identified for the "chopping block". Assigned (7/3)

ATLAS CONSISTENCY CHECKING SCRIPTS
SUSSEX 117894 On Hold (25/1)
OXFORD 117892 On Hold (12/1)
SHEFFIELD 117886 On Hold (29/1)
MANCHESTER 117885 On Hold (10/1)
RHUL 117881 On Hold (1/2)
QMUL 117880 Waiting for reply (25/2)

SUSSEX
119383 (5/2)
Low availability ticket - site recovering. On Hold (25/2)

118289 (10/12/15)
gridpp pilots, grounded after Matt RB left. Daniela has reiterated the need for this (as banning the site for the gridpp VO will ban it for snoplus too). On Hold (3/3)

118337 (14/12/15)
Sno+ having problems with the Sussex SE. The Sussex SE has been replaced, which will require some work with the Sno+ LFC (or aliasing magic). On Hold (15/2)

RALPP
118628 (5/1)
Getting LZ pilots working at RALPP. After trying out a patched version of ARC on a test CE there still appear to be a few problems with submission - no update for a few weeks though. In progress (15/2)

120006 (7/3)
A freshly squeezed ROD ticket. In progress (7/3) Update - dcache ws restarted just in case, but not sure what's going wrong. Nagios error messages aren't helpful.

BRISTOL
119930 (3/3)
A CMS user having trouble getting a file - it appears GFAL worked where xrdcp didn't. I suspect this ticket can be closed, the user seemed happy (and very polite!). Assigned (can be closed) (4/3) Update - solved

BIRMINGHAM
118155 (4/12/15)
Biomed problems with the Birmingham SE, ending with them greenlighting the removal of all their dark data (which I believe is all the biomed data still left on the SE). Matt's started the purge. In progress (7/3)

GLASGOW
118052 (30/11/15)
HTTP TF ticket - things seem to be intermittently working, Georgios spotted some interesting issues - but at least right now the SE looks all green. In progress (16/2)

117706 (19/11/15)
A pilot ticket, this one pheno-centric. Waiting on some infrastructure work at Glasgow. On hold (15/1)

ECDF
120004 (7/3)
A ROD ticket to the ARCHER facing ARC CE. Andy knows this will be a problem child, and has asked if there's a way to pull it from the ROD monitoring in a way that will still allow it to look in-production to ATLAS? Waiting for reply (7/3)

SHEFFIELD
118764 (12/1)
HTTP TF ticket. Things look a little odd on the probe page, but there's a fair amount of green. Any news? In progress (25/1)

114460 (18/6/15)
Pilot ticket - Elena rolled out the pilots but things didn't seem to work as intended. Any luck with this last week? In progress (29/2)

LIVERPOOL
119983 (4/3)
Some hardware (RAID) faults on a few pool nodes have been causing problems for some atlas users, but the Liver-lads are fighting the good fight. In progress (7/3) Update - solved. But I personally would like to hear about what hardware was failing in the Storage meeting.

RHUL
119795 (28/2)
Atlas transfer error ticket - fallout from the files lost during RHUL's draining troubles. Being declared lost. In progress (28/2) Update - spawned a ticket to track the cleanup: 120009

119509 (12/2)
Sno+ jobs are occasionally failing at RHUL with what looks to be premature sandbox cleanup problems. Govind is back in the saddle, and asked that some jobs be sent his way for testing. In progress (3/3)

QMUL
119013 (21/1)
CMS enabling QM and Glasgow as T3s - although the buck seems to have stopped at QM. After a lot of work it looks like we're waiting on the production team to greenlight the two sites. We might want to chase them up sooner rather than later. Waiting for reply (29/2).

IMPERIAL
119617 (19/2)
The CMS multicore adventure at Imperial. The jobs have run, so that looks good - CMS have asked if there is any form of reservation at the site, to which Simon replied with a resounding "kind of". Waiting for reply (7/3)

100IT
116358 (22/9/15)
Ongoing problems with missing images - work is still continuing on this, but I won't go into it. In progress (2/3)

TIER 1
116864 (12/10/15)
CMS AAA test problems. CMS report that things seem to look better this week (EU redirector open and read tests are OK), and wonder if anything has changed? Has it? In progress (23/2) Update - Andrew L reports nothing changed. Maybe it was the nice Grid Pixies? We don't see them very often!

117683 (18/11/15)
CASTOR not publishing GLUE2. Jens reports that there's been slow progress due to lack of time and ongoing CASTOR upgrade work, but slow progress is better than no progress! In progress (17/2)

Monday 29th February 2016, 15.00 GMT
Link to the 31 Open UK Tickets

A light review this week, some notes:
Still nothing from atlas on the Storage Consistency Check tickets-the ball is firmly in atlas' court.

Sheffield has two tickets that need some love:
118764 (http support)
114460 (pilot rollout)

Plus this Birmingham Biomed ticket has been left hanging (after Biomed gave the go ahead for purging their dark data at the site):118155.
(although I appreciate that Matt has had bigger fish to fry recently! I don't envy having to restore your DPM DB).

Helios is expiring: The Helios VO has hit a spot of bother and asked the Manchester VOMS admins to do...something. Robert has asked for clarification: 119363

And that's all I'll go into.

Looking at the other VO nagios

I see some persistent failures for pheno and t2k with the Imperial SE - getTURLS failures (failing on the http protocol). I saw something like this at Lancaster but for the life of me can't remember what we fixed. Still I don't think this is a functional functional test!

Monday 22nd February 2016, 15.30 GMT
37 Open UK Tickets this week.

NGI
118930 (18/1)
This information system ticket really needs some attention. Assigned (19/1)

CMS Multicore
Brunel: 119618
Imperial: 119617
RALPP: 119616
CMS are to be rolling multicore pilots soonish and requested some information to set up their test queues with. Brunel might have missed the ticket, the other two are chugging along nicely. Update - Brunel's updated their ticket, so all's good.

Whilst we're talking CMS
119013 (21/1)
This ticket (wrongly assigned to just QMUL at the moment) seems to have become an odd catchall for enabling Glasgow and QM as Tier 3s. The CMS guys seem to think jobs should be flowing/trickling now, so maybe this can be closed? Assigned (18/2)

RHUL
119509
Govind is away and when the admin isn't looking things start breaking - in the case of this ticket Sno+ have disabled submission to RHUL so the ticket should be On Holded (I didn't want to On Hold the ticket myself, as that's a recipe for the ticket getting forgotten about). Or perhaps someone has a suggestion to tackle the problem? Assigned (12/2)

100IT
119534 (15/2)
ROD ticket for 100IT, where they're accused of failing a test that they shouldn't be failing. David opened a ticket about this (https://ggus.eu/index.php?mode=ticket_info&ticket_id=119513) but it has not received any attention at all - was it submitted to the right group? In progress (22/2)

GLASGOW
118052 (30/11)
HTTP support on the Glasgow SE. You seem to have been "upgraded" to "failing intermittently" (a possible title for my autobiography). Did you change anything to upgrade your status? In progress (16/2)

TIER 1
119389 (5/2)
This LHCB data transfer ticket to the Tier 1 has been waiting for a reply for a while now. Any news from lhcb? Waiting for reply (15/2)

Those 8 Atlas Storage Consistency Check Tickets
A chat about this at the Thursday atlas UK cloud meeting revealed that the chap handling these has gone to Argentina. It was unclear whether this was business, pleasure or as a GGUS fugitive escaping the grumpiness of dozens of site admins.

Updates:

Unsolved but not Forgotten, the tarball glexec tickets
ECDF: 95303
Lancaster: 95299

Can be solved
Brunel: 119682 This ROD ticket looks like it's sorted now. Good stuff!


Monday 15th February 2016, 13.30 GMT

37 Open UK Tickets.
Link to them all: http://tinyurl.com/nwgrnys

A few highlights:

BRUNEL
118740 (10/1)
Atlas MCORE problems at Brunel. Raul has experimented with restricting MC jobs to nodes where the Condor Memory Checking is disabled, with promising results. Waiting for reply (13/2)

QMUL
119013 (21/1)
Enabling CMS T3 - this ticket has been reopened for QM. Dan has asked for some clarification and information with respect to xroot settings for CMS. The status could do with a tweak... Reopened (12/2)

RALPP
118628 (5/1)
The deployment of LZ pilots hitting an arc bug. Chris has managed to get ahold of and deploy the updated packages on his test CE (impressive turnaround!), and wonders if it works now. Waiting for reply (11/2)

And I think that's it - still a lot of atlas consistency checking tickets that I will mention in the Thursday atlas meeting - although I think Alastair and Brian are aware of them.

Other VO Nagios
I haven't looked at this for a while, the Imperial SE seems to have been seeing problems for pheno and t2k.org for nearly a fortnight.

Monday 8th February 2016, 13.30 GMT
43 Open UK Tickets this month. Going over all of them, in kinda-alphabetical order.

NGI
118930 (18/1)
That NGI information ticket, linked to the "wrong" (according to some) information being published by the UK arc CEs. This has haunted us for a while, the consensus was the ticket is a load of B-word and not really worth worrying over - but it does warrant a response (from someone other than Steve J). Assigned (19/1)

SUSSEX
With Matt RB off to pastures green Sussex is in limbo - I'll contact Jeremy M concerning this last week's fresh tickets.

117894 (23/11)
Atlas Consistency Checking. On hold (25/1)

118289 (10/12)
Gridpp Pilots. On hold (25/1)

118337 (14/12)
The Sussex SE was not working for Sno+ - the most serious of these older issues. On hold (25/1)

119383 (5/2)
ROD Availability ticket. Assigned (5/2)

119384 (5/2)
ROD CA distribution ticket. Maybe the two ROD tickets are correlated (i.e. if we fix this one the previous one will soothe itself?) Assigned (5/2)

RALPP
118945 (19/1)
Poor CMS SAM results for RALPP due to digi-reco work pummeling the RALPP storage - Chris has asked for the digi-reco workload to stop at RALPP, then asked for clarification as to why the site was still in unknown state. Waiting for reply (25/1) Solved - it was them, not RALPP - a restart of the SAM services looks to have cleared the issue.

118628 (5/1)
LZ Pilot deployment at RALPP. Chris has submitted a bug report to nordugrid to fix the issue (http://bugzilla.nordugrid.org/show_bug.cgi?id=3529), which was fixed and should be available in the next release. On Hold (26/1) Update - Chris is trying to get hold of a pre-release to test things.

OXFORD
119197 (29/1)
CMS has asked to change some CRAB site configs at T3s - Daniela has asked Chris B if he's the one looking after this for Oxford. Assigned (3/2)

117892 (23/11)
Atlas consistency checks. Ewan has firmly and clearly put this on the backburner. On hold (12/1)

BIRMINGHAM
118155 (4/12)
Biomed having a clear up of their stuff on the Brummie SE. Franck has given the nod for deleting the dark data left in the DPM after their cleanup efforts. It's on their heads now! In progress (2/2)

117890 (23/11)
Another Atlas Storage Consistency Checking ticket. Any chance to have a look at this again? On hold (15/12)

GLASGOW
117706 (19/11)
Another pilot ticket, this time for pheno. Glasgow were going to roll this into their overhaul of their identity management gubbins, but the Universe messed with their plans. How goes things? On hold (15/1)

118052 (30/11)
HTTP support on the Glasgow SE. I suspect progress here took a similar shoeing to the identity management plan - but the ticket could do with an update (and maybe on holding). In Progress (4/1)

ECDF
118787 (12/1)
Another HTTP ticket. Let us know if you need a hand Marcus and Andy. Or if you're too busy to make this a priority consider on-holding it. In progress (12/1)

95303 (1/7)
Tarball glexec ticket. On hold for a very long time.

An update on this - I managed to put in some good hours on trying to build a relocatable glexec last week, successfully building from source glexec and the lcas/lcmaps stack. *But* I still have rpath problems - short of attacking every lib file with patchelf I'm not sure how to proceed, and the process is such a mess that I'm not sure if I'll ever manage to make it into a proper recipe (much like my cocoa-butter shortbread).
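For what it's worth, the "attack every lib file with patchelf" option is at least easy to script. The sketch below is not a tested recipe: the install prefix, the `$ORIGIN/../lib` rpath and the `*.so*` file pattern are all assumptions, and it only prints the commands unless told otherwise. It relies on nothing more than patchelf's `--set-rpath` option.

```python
import subprocess
from pathlib import Path

def rpath_commands(prefix, rpath="$ORIGIN/../lib"):
    """Build one patchelf command per shared library found under prefix."""
    root = Path(prefix)
    if not root.is_dir():
        return []
    # "*.so*" is a crude pattern: it catches libfoo.so and libfoo.so.1,
    # and may catch non-libraries too. Symlinks are skipped so that each
    # real file is only patched once.
    libs = sorted(
        p for p in root.rglob("*.so*") if p.is_file() and not p.is_symlink()
    )
    return [["patchelf", "--set-rpath", rpath, str(lib)] for lib in libs]

def apply_commands(commands, dry_run=True):
    """Print the commands, or actually run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical install prefix for the relocatable build.
    apply_commands(rpath_commands("/opt/glexec-tarball"))
```

A relative `$ORIGIN` rpath is what makes the tree relocatable in principle, since the loader resolves it against each library's own location. Whether glexec's setuid requirements tolerate that is a separate question this sketch does not answer.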

SHEFFIELD
119374 (5/2)
A fresh ticket from Biomed, about incorrect/no dynamic information being published at Sheffield. In progress (5/2) Update - see Steve B's post to TB-SUPPORT for clues, Elena is retackling these problems today.

118789 (12/1)
ROD Information system ticket, almost certainly caused by the same underlying issue. Is the bdii service on your CEs silently dying or failing to update?

114460 (18/6)
Gridpp Pilots. Changes were implemented but at last check things weren't working right. How goes it now? In progress (20/1)

117886 (23/11)
Atlas Storage Consistency Check ticket - any luck with this? On hold (29/1)

118764 (12/1)
HTTP support ticket for the Sheffield SE. Have you had a chance to have a look at this? In progress (25/1)

The Storage list can lend a hand fixing either of these issues (which goes for everyone of course).

MANCHESTER
118679 (7/1)
HTTP support (atlas edition). Hit a problem due to there being no outside-a-space-token space at Manchester. On Hold (12/1)

118674 (7/1)
HTTP Support (lhcb edition). As above. On Hold (12/1)

117885 (23/11)
Atlas Storage Consistency Checks - hit the same problem as the previous 2 tickets. On hold (10/1)

118603 (4/1)
A VOMS ticket rather than a site ticket, removal of the nsccs.ac.uk VO. The VO has been removed from the other UK voms servers. In progress (5/2) Update - solved

LANCASTER
95299 (1/7)
Lancaster's glexec tarball ticket. See the entry above - although I really need to update the ticket properly! Practice what you preach, Matt! On hold.

RHUL
119380 (5/2)
ROD Low availability ticket - the site is in the green now, so it's the usual 30-day wait. On hold (8/2)

117881 (23/11)
Atlas SCC ticket. On hold until March. On hold (1/2)

QMUL
117723 (19/11)
Pilots at QM. Dan's been working on this, and asked Daniela for a picture of what should be enabled[1] - Any joy? In progress (27/1)

[1] http://www.hep.ph.ic.ac.uk/~dbauer/dirac/site_pilot_status.html

117880 (23/11)
Atlas SCC ticket (wish I had started using that acronym sooner). Just waiting for the nod from atlas that all is well. Dan included the script he uses that may be useful for other STORM sites. Waiting for reply (4/2)

118985 (21/1)
QM has banished biomed from their queues until QM have a cgroupy solution to the ill-behaved biomed user jobs. Biomed have asked that the ban be reconsidered and problem users be dealt with by the VO. QM are perfectly right to say no to this, but it'll be nice to not leave them hanging. On hold (1/2)

119348 (4/2)
LHCB have noticed cvmfs issues on some nodes, which Dan couldn't replicate. Dan ponders that perhaps this is caused by ephemeral memory issues on the nodes, noting more swap being used recently. Waiting for reply (4/2)

119409 (8/2)
Fresh ROD emi glexec ticket - things exploded at the weekend but the QM admins are fighting the good fight. In progress (8/2)

IMPERIAL
119294 - but this got solved by the time I got to it (it concerned a java update breaking md5).

BRUNEL
117878 (23/11)
Atlas SCC - Raul provided an example and is waiting on atlas to give a yay or nay before deploying. Waiting for reply (18/1)

118740 (10/1)
Atlas MCORE problems at Brunel, looks to be caused by some extreme Condor oddness, Raul reconfigured Condor to give a better view. Any joy? In progress (25/1)

100IT
119002 (Reopened)
116358 (In Progress)
Not going into detail with these as I'm not sure what the crack is with 100IT.

AND FINALLY...

THE TIER 1
118809 (12/1)
The Tier 1 provided feedback on configuring memory limits for batch jobs; the ticket has been left open for follow-up. On hold (13/1)

116864 (12/10)
CMS AAA tests failing. Andrew L reports that the CASTOR headnode has received what sounds like a big fix which will hopefully improve things. In progress (29/1)

119389 (5/2)
LHCB data transfer problem to RAL. Being looked at. In progress (5/2)

117683 (18/11)
Another publishing ticket. How we love those! This one about CASTOR not publishing GLUE 2. Code was written by Jens and Rob but not integrated, something that works might be a long way off. That was a month ago, any news since? In progress (5/2)

109358 (15/10) or (5/2)
This ticket is weird - it started in a "waiting for reply" state and was apparently issued in 2014! I can't find a ticket with this number in my records though. Sno+ are unable to use the RAL WMS - it's being looked at. In progress (5/2)


Monday 1st February 2016, 14.30 GMT
50 Open UK Tickets this week, no Ops meeting scheduled so postponing a full review.

org.bdii.GLUE2-Validate tickets
We have 8 sites with these tickets (7 as Bristol have slain theirs), these are being discussed on TB-SUPPORT. A lot of these are still just assigned though - even if the issue is not really our fault we still need to handle the ticket proper. Rising above it all and all that.

If someone has submitted or knows of a counter-ticket for this issue please let me know.

NGI
Talking about a pain in the Information System, the UK still has this ticket to close (which has a similar root problem): 118930

CMS Siteconf problems.
GLASGOW 119196
EDINBURGH 119195
OXFORD 119197

CMS have spotted a number of misconfigured T3s across the globe (on a Friday afternoon) - the fix seems to be straightforward enough and Glasgow look like they're done already. Proper job!

ATLAS CONSISTENCY CHECKS
We still have 8 tickets open on this issue, although a couple are waiting for feedback from atlas. I'll bring this up in the Thursday UK atlas meeting to see if we can't shimmy along the tickets waiting for atlas feedback.

PILOTS
117723
Whilst investigating pilot issues at QM Daniela reminds us of this page that tells us what Dirac things should be going on at your site. Might be handy to preempt problems:
http://www.hep.ph.ic.ac.uk/~dbauer/dirac/site_pilot_status.html

118628
Whilst rolling out similar changes for LZ at RALPP Chris stumbled upon a problem, for which he submitted a bug report to nordugrid: http://bugzilla.nordugrid.org/show_bug.cgi?id=3529

AND FINALLY

QMUL
118985 (21/1)
Biomed have got back to Dan asking whether, rather than banning them altogether until he has a cgroup-corral to put their jobs in, he would be willing and able to supply a list of the problem users. Of course this requires that there be any non-problem users in the VO... On hold (1/2)

Monday 25th January 2016, 14.30 GMT

"OTHER VO" NAGIOS
Looks like hepgrid2.ph.liv.ac.uk at Liverpool is playing up for all VOs, and the Sheffield SE is misbehaving for the gridpp VO. Other than that it looks clean.

43 Open UK Tickets this week.

That ticket to the NGI...
118930 (18/1)
Steve J put in a comprehensive reply about what Liverpool do to get their publishing kinda right. The view on this ticket from last week was to close it with a <carefully|harshly> worded statement about why this is a bit of a pointless request. Who was formulating the reply? If it was me I dropped that ball! Assigned (19/1)

Pilots Problems.
BRUNEL: 117710 Pheno. On Hold (19/11/15)
QMUL: 117723 Pheno - hopefully sorted. Waiting for reply (25/1)
SHEFFIELD: 114460 gridpp et al. In Progress (20/1)
RALPP: 118628 LZ (and maybe LSST?). In progress (14/1)

We have a few pilot rollout tickets, the last two being worked on but proving problematic.

RHUL
119027 (22/1)
As seen on the gridpp-storage list, Sno+ have asked RHUL (and will no doubt ask others) for storage space (~20TB). In progress (22/1)

(for the interest of others, Govind's other thread on gridpp-storage was likely triggered by https://ggus.eu/?mode=ticket_info&ticket_id=118553)

QMUL
118985 (21/1)
QM have banished biomed from their cluster until they have a batch system that can put Biomed jobs in a c-group cage (looking at slurm). On Hold (21/1)

BIRMINGHAM
118155 (4/12)
Talking of Biomed, they've asked if they've successfully cleaned up all their files on the Birmingham SE - a cheeky uberftp onto your SE suggests the biomed directory is still full of cra.. I mean, files. In Progress (20/1)

HTTP TF Tickets
118787 (ECDF)
118764 (SHEFFIELD)
Feel free to poke the gridpp storage group for help with these. (I left out the 2 Manchester tickets as their immediate showstopper isn't their configs- but they can ask for help too!).

ATLAS CONSISTENCY CHECKS
Manchester, Oxford, Birmingham, Sussex, RHUL, Sheffield, Brunel and QMUL still open - a mix of chugging along nicely and being very much "On Hold".

Monday 18th January 2016, 14.00 GMT
49(!!) Open UK Tickets this week

NGI
118930 (18/1)
The NGI received a ticket concerning incorrect or missing glue information for the Tier 1, Brunel, Imperial, Liverpool, Durham, Glasgow, Bristol, Oxford and RALPP. The variables in question are GlueSubClusterPhysicalCPUs, GlueSubClusterLogicalCPUs and GlueHostProcessorOtherDescription. There are some extra instructions in the ticket - it would be nice if we didn't have to create child tickets (hint hint...).
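For anyone wanting a quick local sanity check before replying, here is a minimal sketch that scans LDIF text (for example, output saved from a query against your own site BDII) and lists entries that lack any of the three variables named in the ticket. The function names are made up for illustration, "missing" here only means absent or empty (it makes no attempt to judge whether a published value is correct), and the parser is deliberately simplified: it skips folded continuation lines and does not decode base64 values.

```python
# Sketch only: function names are hypothetical and the LDIF handling is simplified.
REQUIRED = (
    "GlueSubClusterPhysicalCPUs",
    "GlueSubClusterLogicalCPUs",
    "GlueHostProcessorOtherDescription",
)


def parse_ldif(text):
    """Split LDIF text into a list of {attribute: value} dicts, one per entry."""
    entries = []
    current = {}
    for raw in text.splitlines():
        if raw.startswith(" "):
            # Folded continuation of the previous line: ignored in this sketch.
            continue
        line = raw.strip()
        if not line:
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def missing_glue_attributes(text):
    """Map each entry's dn to the required attributes that are absent or empty."""
    report = {}
    for entry in parse_ldif(text):
        missing = [name for name in REQUIRED if not entry.get(name)]
        if missing:
            report[entry.get("dn", "<no dn>")] = missing
    return report
```

Feed it only the subcluster entries, otherwise every unrelated object in the dump will be reported as missing all three attributes.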

ATLAS CONSISTENCY CHECKS (10 tickets)
Progress, or at least non-exciting but reassuring updates, on these. Birmingham and Glasgow tickets could do with an update (even if it's a "nothing to see here").

The QMUL ticket had an update providing feedback that might be useful to others too:
https://ggus.eu/?mode=ticket_info&ticket_id=117880

HTTP TF (5 tickets)
ECDF, Manchester, Sheffield and Glasgow are on the HTTP TF list - although no tickets are stale at the moment.

TIER 1 RECOMMENDATIONS
118809 (12/1) An interesting ticket asking T0 and T1s to fill in a questionnaire on configuring batch job memory limits - the Tier 1 have done their bit and the ticket is On Hold pending feedback.

GLASGOW
118732 (9/1)
This ticket has got confusing - atlas want a dump for files "lost" at Glasgow that by the looks of it actually never made it to the site in the first place... Waiting for reply (15/1)

TIER 1 DUPLICATES
Are these three CMS tickets the same (or similar or related) issues - or am I just getting my wires crossed?
118494 (23/12/15)
116864 (12/10/15)
118722 (8/1)

CAN BE CLOSED (I THINK)
IC - 118162 (lfc ticket)
QM - 118839 (atlas mcore job failures - doesn't look like the problem persists).

NEARLY THERE:
Lancaster - 118637 (squid misconfiguration hammering stratum-0)
Birmingham - 118155 (biomed SE use - biomed now think they deleted all data at Birmingham).

Monday 11th January 2016, 14.30 GMT
48(!) Open UK Tickets this week

  • VOMS TWEAK

118603: nsccs.ac.uk has been requested to be removed from the gridpp voms servers. Just "Assigned" to the UK as a whole at the moment.

  • THE HTTP TASK FORCE STRIKES

Lancaster, RHUL and Manchester all had http TF tickets alongside Glasgow. Your site might be next! It'll be worth checking the monitoring pages and reviewing the documentation if you are:
atlas: http://cern.ch/go/h8Rr
lhcb: http://cern.ch/go/Bk8J
https://twiki.cern.ch/twiki/bin/view/LCG/HTTPTFSAMProbe

  • TRANSFER ODDITIES

118494: The Tier-1 have a CMS ticket where xrootd is expecting a file which phedex and DAS don't think is at RAL. Is this even a site problem?

118728: In a similar vein, QMUL have an atlas ticket where a single file is refusing to be transferred - Dan has noticed a number of write attempts followed by immediate deletion. Checksumming causing a problem?

  • LOW HANGING FRUIT - tickets that can probably be closed, or are close to it.

IMPERIAL 118162
A ticket for the Imperial LFC, which appeared to be working (for Janusz at least).

RALPP 117740
Atlas datadisk cleanup ticket. Elena confirmed that the step09 directory can go for the chop. Not sure if Brian has had a chance at looking at the users directory contents yet.

BRISTOL 118311
I suspect that this CMS SAM ticket can be closed as the CEs were all green.

  • ATLAS CONSISTENCY CHECKS

As requested at the Thursday atlas meeting here's the outstanding consistency check tickets.

IMPERIAL: 117879
Not much news, (understandably) low priority for the site.

SUSSEX: 117894
It doesn't look like Matt got round to this before he left.

SHEFFIELD: 117886
Set in progress but no news since.

OXFORD: 117892
A similar case here - I assume it's on Ewan's to-do list before he heads off to pastures green.

BIRMINGHAM: 117890
Matt was going to look at this again in the New Year. Any joy?

RHUL: 117881
Govind was going to try to get to this before Christmas. Any luck?

GLASGOW: 117889
Back in 2015 the dumps were run and Sam asked for some clarification. Considering Glasgow's current state any dump made using these tools might be full of lies, but I know that you chaps are working on this problem.

BRUNEL 117878
Raul asked some questions in his ticket, for which atlas only replied last week.

QMUL: 117880
Dan has created dumps and has asked for the all clear before he sets up the monthly cron.

TIER 1: 117846
Dumps have been created, but gfal and castor issues have slowed down the checking process (gfal-cat doesn't seem to work with castor).

MANCHESTER: 117885
This ticket was recently On-Holded, as currently Manchester has 0 free space outside of tokens whilst a few disk servers are down.

Monday 4th January 2016, 14.30 GMT
HAPPY NEW YEAR EVERYONE!

38 Open UK Tickets this year.

All-the-UK-tickets URL: http://tinyurl.com/nwgrnys

As Jeremy spotted, with Matt RB off to pastures new the Sussex tickets are looking a bit neglected, especially as one was reopened after his departure:
118337
118289

Finally in this Glasgow ticket the submitter gave two new links for the http taskforce monitoring: 118052

The links to the http tf monitoring pages are:
atlas: http://cern.ch/go/h8Rr
lhcb: http://cern.ch/go/Bk8J