Monday 12th September 2016, 15.00 GMT
33 Open UK tickets. Down to 29 this morning!
I'm just going to skim the Sussex tickets; I'll contact Jeremy M about these again offline.
122772 (On Hold) - Atlas webdav/xroot ticket.
123230 (In progress) - Atlas transfer failures, was partially fixed as of 15/8.
123740 (Assigned) - Recent low availability ticket.
123733 (Assigned) - Ops SRM-LS test failures.
122614 (In progress) - Technically a ticket to the NGI concerning the Sussex status, not a Sussex ticket.
CMS noticed that the Phedex agents appear to be down at RALPP. Fresh this afternoon. Assigned (12/9)
A duplicate of above, I believe because you run 2 Phedex boxes? Assigned (12/9) Update - one of these was a duplicate, but they're both solved, network issues seemed to be the underlying problem.
Low availability ticket, Chris provided an explanation and notes that there have been no tests since - possibly related to the problems noticed in this morning's EGI broadcast. In progress (9/9)
A ticket from Duncan concerning a drop in perfsonar throughput rates at Oxford. Currently on hold - any ideas perhaps when you'll get round to looking at this? On hold (10/8)
Another fresh "Phedex is down" ticket from CMS. Assigned (12/9) Update - solved alongside the RALPP tickets.
Atlas ticket requesting xroot and webdav endpoints. The submitter requests an update. In progress (12/9)
Enabling LSST at Glasgow. Any news, or plans for having any news soonish? On hold (19/7)
No perfsonar results for Glasgow; the server appeared borked so David took it down. Is it due to rise from the ashes soonish? On hold (28/6)
Nagios ticket for ce6 and ce7, which I believe are defunct CEs? Marcus has put the ticket on hold until Andy is back. On hold (5/9)
Another nagios ticket, a glue2.validate one this time. Andy was confused about where this was coming from, and I assume the alarm is still happening. Perhaps the glue-validator will yield some clues? Waiting for reply (29/8)
The third nagios ticket for ECDF, this one covers the odd saga of the ARCHER queue that would ideally be "IN PRODUCTION, NOT MONITORED". Waiting for reply (this probably should be on hold instead) (5/9)
LHCB noticed that the arc CEs were producing an incorrect number of running/waiting jobs (was this the catalyst for the tb-support thread on the same subject?). The ticket could do with acknowledgment, perhaps someone in the know could lend Durham a hand. Assigned (9/9) In progress now
APEL-pub ROD ticket. Matt R has rerun the apel publishing scripts by hand and is awaiting the results; if this doesn't work we might need to ask the apel people how things are looking at their end. In progress (12/9)
Atlas deletion errors, probably due to a downed disk server that has been brought back up. In progress (12/9) Solved.
LHCB jobs failing at Lancaster, probably because they're sensitive to some problems we had with our NFS server housing home and sandbox areas. We've hopefully soothed our problems by upping the number of nfs threads; we're now in the wait-and-see period. In progress (12/9)
ROD apel ticket for UCL. Ben is investigating; it looks like some VAC boxes are having trouble talking to the network. In progress (7/9)
Low availability ticket for QM, but Daniela notes the alarms appear to have cleared so this can probably be closed. On hold (12/9)
LHCB had problems with a QM CE, which Dan noticed was fubared and needed a reinstall. Should hopefully be back online soon? On hold (18/8)
Raul solved the two tickets before I could get to them, nicely done.
100IT have a ticket - 123753 (6/9) but you don't have to bring yourselves to look at it.
The TIER 1
Atlas noticed a lot of analysis job failures; after some prodding it turned out that the culprit was one dodgy worker node. Case closed? In progress (9/9)
Sno+ asking for more disk; this has developed into some discussion and Alastair has added a few more points. Matt M has expanded on the Sno+ needs, and has decided to make more use of Tier 2 space for Sno+, which may or may not keep them ticking over until Echo is upon us. In progress (24/8)
Enabling LSST at RAL. "Proper" test jobs are failing; Alessandra has put the ticket on hold until the issue can be debugged properly.
Jon Perkins noticed that the WMSes at RAL didn't seem to be updating proxies at Sheffield for some long T2K jobs. A conversation was started but seems to have stalled; some weird resource matching errors appeared to be going on. The landscape might have changed in the last 3 weeks however. In progress (23/8)
cvmfs support for solidexperiment.org. After some solid progress the ticket is on hold waiting for someone VO side to try to roll out some experiment software in anger. On hold (24/8)
Developing glue2 support for Castor. Any update will do?! On hold (5/4)
The culling of the uncertified sites. The old Glasgow Pre-production service was mentioned too. In progress (23/8)
So long EFDA-JET
The JET decommissioning ticket. The other related ticket (123291) was closed as I wrote this report, so it shouldn't be long until this ticket can be closed too. Bye Jet! On Hold (1/9)