Past Ticket Bulletins 2013

Monday 9th December 2013, 15.30 GMT</br> 34 Open tickets in the UK. In the interests of efficiency/laziness I only looked at the tickets that were updated in the last 7 days as I went over all the tickets last week. And here's what I spied (there's not much going on really).

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=99556 (6/12)</br> As seen on TB-SUPPORT, a ticket is in for an NGI level argus server at the Tier 1. I'm sure this will be discussed elsewhere in the meeting. In progress (9/12)

https://ggus.eu/ws/ticket_info.php?ticket=97385 (17/9)</br> The HyperK cvmfs ticket. This one is almost done, Catalin remarks that once he's happy he'll solve this ticket. The other cvmfs tickets (Sno+, cern@school, T2K) are also chugging along nicely. In progress (9/12)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=98594 (4/11)</br> LHCB problems transferring job results out of Sheffield. If progress has stalled could the ticket be On Held? Or if it's still chugging along can we get an update? In progress (27/11)

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=99198 (26/11)</br> WN-glexec Nagios test failures. Daniela extended the ticket one more time on the 8th, it really could do with some love (as tickets can't be extended forever). In progress (3/12)

https://ggus.eu/ws/ticket_info.php?ticket=99524 (6/12)</br> This Nagios ticket (CADist-Check) looks like it can be closed, as Daniela reminds us the onus is on us to solve our tickets (in all senses of "solved"!). In progress (6/12)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=99428 (4/12)</br> Queen Mary's perfsonar latency box appears to be broken somehow, in a non-obvious way (from my observations, perfsonar's preferred way of breaking). Chris is looking at it, but might have to ask on the perfsonar list (I had forgotten that there was a perfsonar list). In progress (9/12)

SOLVED CASE PILE</br> There isn't much excitement on the Solved Case pile. The ngs.ac.uk removal tickets were dealt with quickly by the UK Vomses Teamses. The lfc webdav ticket (91658) has been solved, with a read-only lfc ready to be prodded. And a number of sites have been solving their publishing problems - I found the RAL one quite interesting as it has the summary of what they did at RAL to get their publishing to work with Condor (https://ggus.eu/ws/ticket_info.php?ticket=99162).

Monday 2nd December, 14.30 GMT </br> 39 Open UK Tickets this week. Site by Site, in no particular order, we have:

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=99198 (26/11)</br> Sussex have received a glexec.WN nagios ticket. It could be that just glexec is broken. Pete G acknowledged the ticket on behalf of the site (is Jeremy M still having trouble with GGUS?). In progress (28/11)

https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)</br> Sussex needing a refreshed Perfsonar. Emyr reports that it will be the new guy's first job, whenever he or she lands. On hold (20/11)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=98923 (15/11)</br> RALPP are being Nagiosed about their dcache not being a SHA-2 compliant version - but it is. It's just their publishing that's broken. Chris is scratching his head over it. In progress (2/12) Update - Chris took care of business, tracked a bug in the dcache publishing, fixed it and submitted a ticket to the dcache devs. Solved.

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=99362 (2/12)</br> Oxford have been asked to remove ngs.ac.uk from their backup voms server. Kashif is away, so it may not happen this week. In progress (2/12)

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=99377 (2/12)</br> Bristol's ARC CE is producing errors for the Ops nagios tests. They're aware of the problem, and will put the CE in downtime if it keeps giving them gyp. In progress (2/12)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=97068 (5/9)</br> Duncan ticketed Glasgow about their perfsonar. Gareth reported that they'd get to it after the SL6 migration (which was the mantra for a lot of us over the last few months), but this was back in October. Please can you show the ticket (and more importantly the issue at hand) some love! On hold (15/10) Update - Dave showed the necessary love to the ticket.

https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7)</br> Support for HyperK on the Glasgow WMS. After t'other weekend's power shenanigans Dave thinks he's got it sorted; could Chris (or another HyperK member) please test? Waiting for reply (2/12)

https://ggus.eu/ws/ticket_info.php?ticket=98253 (21/10)</br> A CMS ticket that has morphed into "getting CMS glideins to work at Glasgow" (if I'm reading it right). Things are progressing (in spite of power cuts and top BDIIs playing up); at last word Daniela spotted something off in the site xml file. In progress (26/11)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=95303 (1/7)</br> Edinburgh's GlExEC ticket. No progress, although Wahid has made his opinions on the matter quite clear. It's still on me (with my tarball hat on) I'm afraid. On hold (29/11)

https://ggus.eu/ws/ticket_info.php?ticket=99179 (25/11)</br> ECDF got a ticket over using a "buggy" version of the BDII (at Lancaster we had to update the site-BDII to fix this). Wahid correctly corrected the ticket to a "Change Request" not an "Incident". Still probably best to try to do this soon if you can, as it preempts a new set of nagios tests. On hold (26/11)

https://ggus.eu/ws/ticket_info.php?ticket=99180 (25/11)</br> Along the same vein, this ticket is about the publishing of default values. On hold (25/11) Update - Andy has asked some questions to clarify which CEs are publishing bad values.

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=95302 (1/7)</br> Durham's gLeXeC ticket. Ewan S reports that it's still being worked on. In progress (26/11)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=95301 (1/7)</br> Sheffield's GlexeC ticket. There was a hope to get this done mid-November - does anyone know the new timeframe or deadline? On hold (29/10)

https://ggus.eu/ws/ticket_info.php?ticket=98594 (4/11)</br> LHCB transfer problems from Sheffield. This is being worked on, but I don't think it's understood yet. In progress (27/11)

https://ggus.eu/ws/ticket_info.php?ticket=97039 (4/9)</br> Biomed complaining about lack of dynamic publishing at Sheffield. There have been a few bashes at it, but nothing has worked so far. As a note, I had some problems recently after an update that needed me to set the ldap user up as an operator (in qmgr) for our torque server. Elena referenced similar problems in ticket 98748. On hold (13/11)
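
For anyone hitting the same thing, this is roughly what that fix looks like (a sketch only - the CE hostname is a placeholder, and "ldap" stands for whatever account your GIP dynamic scheduler plugin runs as):

 # On the torque server: let the account running the info provider query the full queue state.
 qmgr -c "set server operators += ldap@ce01.example.ac.uk"
 # Confirm it took:
 qmgr -c "print server" | grep operators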

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=97066 (5/9)</br> A perfsonar ticket. Alessandra's revised date for getting this done seems to be the 9/12. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=99334 (29/11)</br> The kinda parent ticket to the other two voms tickets, requesting the purging of ngs.ac.uk from the voms servers. Waiting on the other two tickets to be solved. On hold (2/12)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=95299 (1/7)</br> Lancaster's GlExEc ticket. If I don't sort out the tarball glexec soon I'm going to have to commit Seppuku to atone. Any volunteers to be my second? On hold (17/7)

UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=98792 (11/11)</br> Nagios JobSubmit failures. The site's in downtime, and Ben is working on the SL6 upgrade. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=98542 (1/11)</br> SL6 upgrade plan ticket. Ben's in the middle of upgrading now, he set the reminder date for today. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=98719 (7/11)</br> Brian submitted a request from atlas to bring the UCL dpm up to a "minimum level" and enable WebDav. Still in the middle of the SL6 upgrade, so this work has been delayed slightly. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=99174 (25/11)</br> Obsolete Glue2 entries ticket. Ben will fix this before putting the site back in production. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=99176 (25/11)</br> "Publishing default values" ticket, twin to 99174. As above. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=98125 (17/10)</br> Atlas file transfer problems that I believe preceded the current downtime. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=95298 (1/7)</br> UCL's glExEc ticket. Will be worked on after the SL6 upgrade. On hold (25/11)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=94746 (10/6)</br> That one where QM's SE publishes that it supports Biomed. After going around the Storm developers (whose response was a politely worded "not our problem") it's back in Chris' hands. On hold (25/11)

https://ggus.eu/ws/ticket_info.php?ticket=99294 (28/11)</br> Brian has asked for some space juggling. This ticket seems to have flown under the radar. Assigned (28/11) Update - Solved by Chris

BRUNEL</br> https://ggus.eu/ws/ticket_info.php?ticket=99316 (29/11)</br> Nagios "Apel-Pub" ticket. Raul cites a ticket he has open with the APEL devs (99320). In progress (29/11)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=97485 (21/9)</br> LHCB job failures at JET. Krishan is fighting the good fight; it may be a missing package or two causing the trouble, according to Vladimir. In progress (28/11)

https://ggus.eu/ws/ticket_info.php?ticket=95295 (1/7)</br> Jet's gLExEc ticket. It's installed, but not quite working right. A ticket has been submitted to the Argus devs (98609). In progress (2/12)

https://ggus.eu/ws/ticket_info.php?ticket=99197 (26/11)</br> Nagios gLexEc ticket. Working on it alongside t'other issues. On hold (2/12)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> The most venerable of our tickets, "correlated packet-loss on perfsonar host". Did a new latency host get installed? On hold (18/10)

https://ggus.eu/ws/ticket_info.php?ticket=97868 (8/10)</br> T2K's cvmfs request ticket. Ben asked to test with a ROOT tarball, but no news since. In progress (18/11)

https://ggus.eu/ws/ticket_info.php?ticket=98249 (21/10)</br> SNO+ cvmfs request. Catalin asked some questions, no reply from Sno+ yet. The SYSTEM has sent its second warning/reminder. They have 7 days to comply! Waiting for reply (18/11)

https://ggus.eu/ws/ticket_info.php?ticket=97385 (17/9)</br> HyperK's cvmfs ticket. Chugging along nicely, but no news for a while. In progress (18/11)

https://ggus.eu/ws/ticket_info.php?ticket=98122 (17/10)</br> cern@school's cvmfs request, also on its second reminder. Waiting for reply (18/11)

https://ggus.eu/ws/ticket_info.php?ticket=97025 (3/9)</br> A ticket left open as a reminder about the RAL myproxy server's idiosyncrasies. Last word was that they hoped to have it replaced soon. On hold (5/11)

https://ggus.eu/ws/ticket_info.php?ticket=99162 (25/11)</br> A "publishing default values" ticket. Looked to be fixed, but then got reopened on the RAL guys today, with default values being published for "GLUE2ComputingShareEstimatedAverageWaitingTime" and "GLUE2ComputingShareEstimatedWorstWaitingTime". Reopened (2/12)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> LFC webdav support ticket. Chris reports that his first tests worked well, and that this ticket can be closed. Top stuff. In progress (27/11) Update- Solved.

Monday 25th November 2013, 15.30 GMT</br> 41 Open UK tickets today.

Information System Tickets:</br> RALPP, ECDF, Lancaster, Liverpool, UCL, Brunel, RHUL and the Tier 1 all got tickets about their information system (this is a prelude to information system probes going into the SAM tests). </br> I asked for some clarification in the Lancaster ticket, as our resource bdiis are up to date and recently reconfigured, but as these tickets are super-fresh don't panic about them.

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=99186 (25/11)</br> Not a reflection on the site (the ticket is 10 minutes old at time of writing), but the subject interested me "NAGIOS *emi.cream.glexec.CREAMCE-JobSubmit-/ops/Role=pilot* failed on heplnv146.pp.rl.ac.uk@UKI-SOUTHGRID-RALPP". Are glexec failures becoming critical? Assigned (25/11)

Which reminds me, I'll be taking a look at all your (and my own...*whimper*) glexec tickets next week.

https://ggus.eu/ws/ticket_info.php?ticket=98923 (15/11)</br> Picking on RALPP again, this other (SHA2) nagios ticket got reopened. Looks like you're just not publishing your dcache version. To the ldifs! Reopened (25/11)
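
If it helps anyone else chasing the same thing, this is the sort of query that shows what your site BDII is actually publishing for the SE (a sketch - swap in your own site BDII host and GOCDB site name):

 ldapsearch -x -LLL -H ldap://site-bdii.example.ac.uk:2170 \
   -b "mds-vo-name=UKI-EXAMPLE-SITE,o=grid" \
   '(objectClass=GlueSE)' GlueSEImplementationName GlueSEImplementationVersion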

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=98882 (14/11)</br> Emyr fixed Sussex's STORM (hang on, I thought Emyr had escaped?) The site's been whitelisted for testing since the 21st, if things are looking good I suggest closing this ticket. In progress (21/11)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=98594 (4/11)</br> This LHCB ticket, regarding file uploading troubles running at Sheffield post SL6 upgrade, is looking a bit neglected. Does anyone else know of any post-SL6 tweaks that they needed to apply (say a cheeky undocumented rpm) to get LHCB to work after their move to SL6? In Progress (13/11)

cvmfs@RAL tickets</br> https://ggus.eu/ws/ticket_info.php?ticket=98249 (SNO+)</br> https://ggus.eu/ws/ticket_info.php?ticket=98122 (cern@school)</br> Both of these tickets have received their first warning for being in the "waiting for reply" state for too long.

https://ggus.eu/ws/ticket_info.php?ticket=97868 (t2k)</br> T2K don't have software to put into their stratum-0 yet, but would like to test with a ROOT tarball. No word from Catalin over this modest testing plan (at least on the ticket, you might be beavering away offline on this). In progress (18/11)

https://ggus.eu/ws/ticket_info.php?ticket=97385 (hyperK)</br> A similar story here (I think work is just progressing offline; hopefully we haven't entered a nightmarish universe where anything not documented in GGUS tickets doesn't happen - yet). In progress (18/11)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7)</br> WMS support for HyperK at Glasgow. Chris spotted a problem, Dave said he'd get on it on Monday (which unless Dave had a 9 day weekend was a week ago). Any luck? In progress (15/11)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=97485 (21/9)</br> I mentioned this LHCB ticket last week, as this recurring problem has stumped everyone involved. The JET guys have asked LHCB for some information to try to help them debug the problem. Waiting for reply (18/11)

I've no doubt missed something, having rushed this out in half the time I usually take, so I'll cover my shoddiness with my usual line that if I've missed any tickets of interest, please bring them up at the meeting or online.


Monday 18th November 15.00 GMT</br> 39 Open UK tickets this week. None of them are really exciting, a lot of "business as usual" this week, so I'm not going to go over them all. If you have been poked by Mohit, Sam or Guenter on your tickets please can you address their concerns.

Scraping the barrel of interestingness:</br> https://ggus.eu/ws/ticket_info.php?ticket=97485 (21/9)</br> Jet are still losing LHCB jobs to these weird (seemingly) cert-based errors, even after a reinstall of their nodes. Has anyone seen anything like it before? Waiting for reply (11/11)

My ticket-sense tingles over the state of SUSSEX, now that Emyr has headed off to greener pastures. Especially as their Storm is in distress.

That's about it really. Of course I could be wrong, or missed something in my state of GGUS jadedness. So feel free to mention any tickets you want to talk about (particularly any you've submitted yourself).

http://tinyurl.com/cblj3ab

Monday 11th November 2013, 15.00 GMT</br> 50 Open UK tickets this week, most can be grouped into handy ticket blobs. Handy as I was a little short on time to do a decent review.

SL6 MIGRATION PLANS</br> RALPP, BRISTOL and UCL have all given feedback, with the first two at least 50% migrated and UCL working on it.

BRIAN'S STORAGE WRANGLING.</br> Brian was busy last week submitting tickets to sites to either move space between tokens or to update their storage to the latest and greatest (or at least not so crusty) versions.

SHEFFIELD, RALPP, MANCHESTER and ECDF all have requests to move space to LOCALGROUPDISK. (The Manchester ticket is still in the "assigned" state).

RALPP, RHUL, SUSSEX and UCL all have requests to upgrade their storage to the "minimum baseline version". (The SUSSEX, RALPP and UCL tickets are still just assigned).

BACKUP GridPP VOMS.</br> Only two sites left to roll out the changes fully: Glasgow (just one WMS to go) and Sussex (who haven't acknowledged the ticket yet - 98623).

GLEXEC</br> Not much movement here (correct me if I'm wrong)- Bristol have reached the almost done stage (not quite working for Ops). EFDA-JET are almost there too - in the debugging stage, Maarten has given some good feedback.

ECDF, DURHAM, SHEFFIELD, LANCASTER, UCL and QMUL are still working on glexec.


TIER 1 CVMFS REQUESTS</br> HYPERK https://ggus.eu/ws/ticket_info.php?ticket=97385</br> T2K https://ggus.eu/ws/ticket_info.php?ticket=97868</br> CERN@SCHOOL https://ggus.eu/ws/ticket_info.php?ticket=98122</br> SNO+ https://ggus.eu/ws/ticket_info.php?ticket=98249

These tickets are in a state of mild limbo whilst the infrastructure is updated and other technical issues worked out. They should at least be on-holded (on held?). The T2K one in particular needs some feedback - there was a request for feedback in the ticket and it's at risk of being auto-closed by THE SYSTEM.


SOME MISCELLANEOUS TICKETS

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=98594 (4/11)</br> LHCB have been having problems getting their data out at Sheffield. It reminds me of some problems that once plagued Liverpool (although, like my memory of once taking 32 inch waist trousers, this could be wrong). In progress (5/11)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7) The Glasgow lads have rolled out HyperK support on their WMS, but have requested that it be tested before they wrap up the ticket. Waiting for reply (4/11)

That's all that caught my eye, of course please bring up any other tickets that I missed.

Monday 4th November 2013, 14.00 GMT</br>

Remember remember, the 5th of November. </br> GGUS tickets and Plot;</br> I know of no pretext</br> Why your GGUS tickets,</br> Should ever be forgot!</br>

On to the tickets (46 of them this month). I realised too late that we probably wouldn't be doing a ticket review today, so please just read through this in your own time.

GridPP Backup Voms Parent Ticket:</br> https://ggus.eu/ws/ticket_info.php?ticket=98614

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=98469 (29/10)</br> Gareth submitted a ticket to note the decommissioning of a bunch of RAL CEs tomorrow (listed in the ticket). On hold (29/10)

https://ggus.eu/ws/ticket_info.php?ticket=98249 (21/10)</br> SNO+ asking for cvmfs access at the RAL stratum-0. Waiting on the Stratum-1 to be upgraded to cvmfs v2.1 (which is a boat all the new cvmfs repos will be in). In Progress (30/10)

https://ggus.eu/ws/ticket_info.php?ticket=97385 (17/9)</br> The ticket tracking the HyperK cvmfs repo deployment. Presumably affected by the above issue, as well as a logistical one on figuring out how to get software in there. JK asks if this should be put "On Hold" whilst these things are figured out. Jeremy has asked if the issues need to be split as well as other questions. In Progress (28/10)

https://ggus.eu/ws/ticket_info.php?ticket=97868 (8/10)</br> T2K's cvmfs I-want-a-repo ticket. Hit the same cvmfs version problem as the previous two, but is also waiting on feedback from the VO itself since the 21/10. Waiting for reply (30/10)

https://ggus.eu/ws/ticket_info.php?ticket=98122 (17/10)</br> cern@school getting in on the cvmfs action. Being worked on, has the same issue as the other stratum-0 tickets. In Progress (30/10)

https://ggus.eu/ws/ticket_info.php?ticket=98607 (4/11)</br> Atlas noticed some Castor access problems ("Too many threads busy"), which Alastair notes are probably due to some over-eager deletion tasks he was running. Alastair has paused his deletions, and will resume them at a slower rate once things are working again. In progress (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=97759 (4/10)</br> The Tier-1's "SHA2" ticket. I believe that these CEs are being decommissioned tomorrow (98469) so hopefully this issue will resolve itself. Worth keeping an eye on. On hold (4/10)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Request from Chris for webdav support on the RAL LFC. No news on this since August, it needs an update. On Hold (9/8)

https://ggus.eu/ws/ticket_info.php?ticket=98337 (23/10)</br> MICE were experiencing slow uploads to Castor. This one fell through the cracks for a few days, some questions back to MICE have been asked (partly to see if the problems are still there). Waiting for reply (30/10)

https://ggus.eu/ws/ticket_info.php?ticket=97025 (3/9)</br> The outstanding issue with the old RAL myproxy server's hostname not being in its certificate. A newer machine doesn't have this problem, but hasn't been declared production ready yet (looking in the gocdb at myproxy.gridpp.rl.ac.uk). Any news? On hold (12/9)
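
As an aside, the hostname-versus-certificate check is easy to repeat yourself. Something along these lines (a sketch - it assumes the standard myproxy port 7512 and that openssl is to hand) prints the subject DN the server actually presents:

 echo | openssl s_client -connect myproxy.gridpp.rl.ac.uk:7512 2>/dev/null \
   | openssl x509 -noout -subject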

https://ggus.eu/ws/ticket_info.php?ticket=98214 (19/10)</br> CMS noticed Hammercloud failures at RAL. The problem disappeared, so this ticket can be closed - it looks like CMS have left that up to the RAL chaps. In progress (can be solved) (21/10)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> "correlated packet-loss on perfsonar host". An update from Brian says that there's a planned reinstall of the latency host on new hardware to rule out endpoint troubles. On hold (18/10)

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)</br> Duncan asked Sussex to check their perfsonar box. Emyr replied that there was a plan to reinstall it. No news for a while. On hold (14/8)

https://ggus.eu/ws/ticket_info.php?ticket=98172 (18/10)</br> A SHA2 test ticket for Sussex's Storm SE. Planning the upgrade to a version that passes muster. In progress (28/10)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=98544 (1/11)</br> RALPP were asked about their SL6 upgrade plans by Alessandra since the SL6 deadline has passed. Chris posted a comprehensive reply. On hold (1/11)

https://ggus.eu/ws/ticket_info.php?ticket=97834 (7/10)</br> A SHA2 ticket for RALPP's dcache SE. An upgrade is planned, but keeps getting pushed back (due to stuff happening). Current planned date is 12/11. On hold (1/11)

CAMBRIDGE</br> https://ggus.eu/ws/ticket_info.php?ticket=98597 (4/11)</br> Cambridge got an APEL-Pub (a good name for a GridPP bar?) nagios test failure. John is waiting on an open ticket he has with the APEL team during his transition to EMI3 apel (97957). In progress (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=95306 (1/7)</br> glExec ticket. John has almost vanquished this, test failures look to not be related to gLexec at all. Just waiting to pass enough tests to declare the issue closed.

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=98543 (1/11)</br> Bristol's SL6 WN migration plan ticket. A swift reply from Lukasz (with a savannah link) says they'll be half way there soon. In progress (1/11)

https://ggus.eu/ws/ticket_info.php?ticket=96261 (30/7)</br> A cms user's problems with their jobs failing during stage out. It should be fixed now (or at least a new issue should show up!), but getting word back from the user is difficult. Personally I'd just solve it and they can reopen if it's still broken. Waiting for reply (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=95305 (1/7)</br> Bristol's Glexec ticket. All the SL6 WNs (behind lcgce01) are gleXeced, so this is tied into moving the remaining SL5 nodes to SL6. On hold (23/10)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7)</br> Request to enable HyperK on the Glasgow WMS. Gareth has enabled them on the WMS and in argus, and asked for some testing. Waiting for reply (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=98253 (21/10)</br> CMS have spotted jobs failing due to full WNs, related to another user filling up the disk space (Biomed, 98239). The ticket then snowballed to include problems with the CMS environment after the move to SL6, and a move over to xroot for cms jobs. The documentation linked is handily hidden from all who are not cms. Gareth asks that anyone with CMS credentials please forward him the info linked in https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWIntTrivial Waiting for reply (1/11)

https://ggus.eu/ws/ticket_info.php?ticket=97068 (5/9)</br> Glasgow's perfsonar wasn't behaving right. The plan is to reinstall the box, but only after the SL6 upgrade is done and the dust has settled from that. On hold (15/10)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=96002 (22/7)</br> SHA2 ticket for Edinburgh. The ticket could do with an update (so could your CE :-P). Ribbing aside, I believe the offending CE will be switched off soon now that the SL6 upgrade is passed. On hold (20/8)

https://ggus.eu/ws/ticket_info.php?ticket=95303 (1/7)</br> ECDF's glexeC ticket. As a tarball site, it's all on me. On hold (21/8)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=98610 (4/11)</br> Nagios tests failing at Durham. Ewan reports a poorly site BDII, keeping a stern eye on it. In progress (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=98585 (4/11)</br> Atlas having troubles accessing files at Durham. Acknowledged, but no other news yet. In progress (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=95302 (1/7)</br> Durham's gLExec ticket. There were some teething problems, but they look to be fixed. Any more news? On hold (21/10).

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=98594 (4/11)</br> LHCB having problems uploading their job outputs from Sheffield. Looks to be a local network problem? In progress (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=95301 (1/7)</br> Sheffield's glexEc ticket. Disk server problems have pushed glexec configuration work back. On hold (29/10)

https://ggus.eu/ws/ticket_info.php?ticket=97039 (4/9)</br> Biomed complaining about lack of dynamic publishing at Sheffield. Due to having bigger fish to fry Elena has had to put this on the back burner. On hold (21/10)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=97066 (5/9)</br> Dodgy perfsonar at Manchester. Set on hold until after SL6 and the Manchester network has finished playing up. Are you going to start soon? On hold (9/9)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=95299 (1/7)</br> GLeXEC ticket. Now that I have almost nothing left to upgrade I'm working on this, as well as a bunch of other tarball related requests. On hold (17/7)

https://ggus.eu/ws/ticket_info.php?ticket=98403 (25/10)</br> LHCB having trouble on the "upgraded" Lancaster clusters. Working on it with LHCB. In progress (4/11)

UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=95298 (1/7)</br> glexec ticket. Planned to be done after SL6, comparatively low priority. On hold (14/10)

https://ggus.eu/ws/ticket_info.php?ticket=98125 (17/10)</br> Atlas transfer problems to/from UCL. Site blacklisted a lot, "globus_xio: System error in connect: Connection refused globus_xio: A system call failed: Connection refused" errors. In progress (1/11)
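
A "Connection refused" from globus_xio usually just means nothing is listening on the GridFTP port, so a quick reachability check is worth doing before digging any deeper (a sketch - the SE hostname is a placeholder and 2811 is the usual GridFTP control port):

 nc -zv se01.example.ac.uk 2811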

https://ggus.eu/ws/ticket_info.php?ticket=98542 (1/11)</br> SL6 WN migration plan ticket. No reply from the site yet. Assigned (1/11)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=95297 (1/7)</br> RHUL's gLeXeC ticket. Govind got it working for Ops, but not the "big three". Almost! Reopened (30/10)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=98592 (4/11)</br> cmtsite timeout failures for some atlas jobs. Dan thinks he tracked it down to some bad user jobs hammering the storage, and initiated some containment procedures. Hopefully this will have got it. In progress (4/11)

https://ggus.eu/ws/ticket_info.php?ticket=98376 (24/10)</br> Sno+ question about queue attributes at QM, which snowballed into a problem with software install jobs failing. Looks like a post-reconfiguration problem, but atlas jobs filling up the site are making it difficult to test the fixes. In progress (2/11)

https://ggus.eu/ws/ticket_info.php?ticket=95296 (1/7)</br> Queen Mary's glEXec ticket. Last word was that things are almost there. On hold (23/10)

https://ggus.eu/ws/ticket_info.php?ticket=98427 (26/10)</br> LHCB pilots aborted at QM. Looks fixed (was an old issue rearing its head: 88669), so the ticket can probably be solved. In progress (1/11)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=97485 (21/9)</br> LHCB jobs failing at JET with handshake errors ("certificate verify failed"). Even after upgrading CA certs and the WNs to SL6 the problem persists (same sort of error by the looks of it). Team JET are still battling at this. In progress (30/10)
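
For "certificate verify failed" type errors, two quick checks on a worker node sometimes save a lot of head-scratching (a sketch; the paths and package name are the usual grid defaults and may differ on your setup):

 # Does the host certificate verify against the installed CA directory?
 openssl verify -CApath /etc/grid-security/certificates /etc/grid-security/hostcert.pem
 # Which EGI CA bundle release is actually installed?
 rpm -q ca-policy-egi-core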

https://ggus.eu/ws/ticket_info.php?ticket=95295 (1/7)</br> Jet's glExEc ticket. Team Jet have deployed this, but are having issues and submitted a ticket describing their problems (https://ggus.eu/ws/ticket_info.php?ticket=98609). In progress (4/11)


Monday 21st October 17.00 BST</br>

In the ticket limelight this week:

NGI/SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=97941 (10/10)</br> The NGI got ticketed over the slow progress on the SUSSEX APEL ticket 97139. Although the powers that be are satisfied they're keeping this ticket open until the underlying issue is solved. In progress (21/10)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=98122 (17/10)</br> This request for cernatschool to have access to the RAL cvmfs stratum zero has been around the houses a few times, so the RAL team might not have noticed it yet. Assigned (18/10)

https://ggus.eu/ws/ticket_info.php?ticket=97908 (9/10)</br> Chris ticketed the Tier 1, but it's a useful reminder to everybody - please have the GridPP "backup" voms servers added to your configs. The deadline for this is the end of this month. In progress (22/10)
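
For anyone unsure what "adding the backup voms servers" boils down to: it's essentially one extra line per VO per server in your vomses configuration (plus the matching LSC files). The format is sketched below - the hostname, port and DN are placeholders, so take the real values from the VO ID card rather than from me:

 # /etc/vomses/<vo>-<backup-voms-host> - a single quoted five-field line:
 "<vo>" "<backup-voms-host>" "<vomses port>" "<DN of the backup voms host certificate>" "<vo>"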

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=97039 (4/9)</br> This biomed ticket needs some love, or to be put On Hold again. But some attention to it would be nice. In progress (11/9)

That's about it really, I haven't had time to go through them all like I usually do. My apologies again.

Obligatory link to the tickets when I haven't done a thorough job: </br> http://tinyurl.com/cblj3ab

Monday 14th October 2013, 15.15 BST</br> Today's ticket bit will be quick - the ggus landscape hasn't changed too much since last week's full review anyhoo. Plus I think a good number of people are at CHEP (I'm really just trying to justify my laziness now).

A few things of interest:

Of the 38 open UK tickets only 2 are green (and one of those was reopened). Not sure if this is good, bad or just an observation.

https://ggus.eu/ws/ticket_info.php?ticket=97385 (17/9)</br> The HyperK cvmfs repo is kinda up and running - some interesting stuff here. In progress (14/10)

https://ggus.eu/ws/ticket_info.php?ticket=97868 (8/10)</br> t2k.org are also wanting to get on the cvmfs bandwagon. The RAL guys are working on it. In progress (14/10)

Congratulations to Birmingham for solving their glexeC ticket. Cambridge are nearly there, just got to get passing the Glexec tests.

You can check your site's (and others') tickets here:</br> http://tinyurl.com/cblj3ab

If you have any tickets of interest please let me (and everyone else) know.

Monday 7th October 2013, 14.30 BST</br> 38 Open UK tickets this month (nice, they're going down).

GLEXEC tickets - keeping these separate. Congratulations to Manchester for vanquishing their lack of glexec during their SL6 upgrade.

CAMBRIDGE</br> https://ggus.eu/ws/ticket_info.php?ticket=95306 </br> John said in his last update that they intend to roll out gLExec alongside their SL6 upgrade, which should be happening this week. How goes it? On hold (2/9)

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=95305</br> gLexec will be dealt with after the other stuff (SL6, new CEs etc). On hold (11/7)

BIRMINGHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=95304</br> Mark enabled gLexec, but had a few bugs that needed ironing out for Alice. I'd suggest solving it again if you think these are fixed Mark. In progress (26/9)

  • Note the LHCb test is now passing intermittently.

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=95303</br> Waiting on the tarball - *tarball hat on* Sorry guys, I've dropped the ball here. Keeping the lights on at Lancaster ate all of September and didn't leave me much time to tackle this problem. I hope to have an update soon in a big tarball refresh. On hold (21/8)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=95302</br> It looked like Durham were close, but Mike was on leave. Any news? On hold (2/9)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=95301</br> Like most others Elena is rolling this out with the SL6 upgrade, so soon hopefully. On hold (1/10)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=95299</br> See my apology to ECDF. On hold (17/7)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=95297</br> Coming along with SL6. On hold (2/9)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=95296</br> glExec is basically working on QM's SL6 nodes, so it's just a matter of time before glexEc is rolled out across their nodes. On hold (12/8)

UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=95298</br> Ben hoped to roll this out by the end of September along with the SL6 upgrade. On hold (29/8)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=95295</br> There was a plan to roll this out in a few weeks time, back in July. On hold (19/7)

Common or Garden Tickets:

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=95469 (5/7)</br> The last of the Unresponsive VO child tickets (supernemo). Jeremy has mentioned that he will close this ticket, it just looks like he hasn't got around to it yet. Malgorzata has chimed in asking for the decommissioning ticket to be opened. In progress (7/10)

  • JC: I had thought I'd closed it, but only did so immediately after the comment! Now 'solved'.

https://ggus.eu/ws/ticket_info.php?ticket=95442 (4/7)</br> The unresponsive VO master ticket. Nearly done now. On hold (12/8)

https://ggus.eu/ws/ticket_info.php?ticket=95833 (17/7)</br> Decommissioning of ral-ngs2. On hold whilst migrating some of the remaining services to Scarf. On hold (23/9)

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=97823 (7/10)</br> A ticket has come into Manchester requesting that their voms stops supporting minos as part of the VO decommissioning process. Assigned (7/10)

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)</br> Duncan asked Sussex to check their Perfsonar. At last word they were going to reinstall it with the latest version of perfsonar. Is anything likely to happen soon? On hold (14/8)

https://ggus.eu/ws/ticket_info.php?ticket=97139 (9/9)</br> APEL-Pub nagios test failures. Kashif has extended the ticket again. Other ROD shifters might not be so kind! In progress (18/9)

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=96261 (30/7)</br> Bristol are seeing some stage out problems for a CMS user's jobs. They're trying valiantly to fix, but not having much luck. In progress (3/10)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=97068 (5/9)</br> Glasgow's perfsonar boxen need a kick too. Dave reports that they'll review the ports being used whilst updating their Perfsonar box to the latest and greatest version. On hold (18/9)

https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7)</br> Request for WMS HyperK support. The plan was to do it after GridPP. It's after GridPP. Just mentioning that... On hold (18/9)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=96002 (22/7)</br> A SHA-2 nagios ticket. Enough said. You might want to update your CEs for other reasons before SHA-2 becomes "mandatory". On hold (20/8)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=97378 (17/9)</br> Another Apel-Pub nagios test ticket. Things are looking much better at Durham, and Stuart from the Apel team has gotten involved. In progress (7/10)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=97039 (4/9)</br> Biomed complaining about 44444444444 waiting jobs at Sheffield. Have you had a chance to take a peek at the problem Elena? On hold (11/9)
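
(For reference, 444444 is the usual GLUE "unknown" placeholder, so it generally means the dynamic info provider on the CE isn't running or can't see the batch system. A quick way to see what's actually being published - a sketch, with a placeholder site BDII host:)

 ldapsearch -x -LLL -H ldap://site-bdii.example.ac.uk:2170 -b o=grid \
   '(objectClass=GlueCE)' GlueCEStateWaitingJobs GlueCEStateRunningJobs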

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=97066 (5/9)</br> Duncan spotted that the Manchester perfsonar wasn't working very well. Alessandra reports that they won't get to this until after the SL6 upgrade and they've finished debugging their central switch. On hold (9/9)

LIVERPOOL</br> https://ggus.eu/ws/ticket_info.php?ticket=97682 (1/10)</br> Liverpool's perfsonar has fallen ill after a power cut. It seems these boxes eventually go bad. The Liver lads can't get the perfSONAR-BUOY tests to start, it's looking like a reinstall is on the cards. In progress (2/10)

UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=97783 (5/10)</br> Atlas are seeing transfer failures into UCL. There appears to be a discrepancy between real and reported space causing the issue. In progress (7/10)

https://ggus.eu/ws/ticket_info.php?ticket=97461 (20/9)</br> Atlas transfer failures caused by network troubles by the looks of it. Network engineers are putting in some 10G cards tomorrow (the 8th), so hopefully that will soothe it. In progress (3/10)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=97615 (27/9)</br> atlas complaining that the QM SE is not responding fast enough to lcg-stmd queries. Dan asked for some deserved clarification. Things are better, but Chris would like the Storm developers to get involved. On hold (2/10)

https://ggus.eu/ws/ticket_info.php?ticket=97819 (7/10)</br> hone have noticed a lack of disk space on some QM nodes affecting their jobs. Dan has tracked this down to epic not cleaning up after themselves; we've noticed the same at Lancaster. Dan has let epic know of their error. In progress (7/10)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=97385 (17/9)</br> Request for a HyperK (the most breakfast-cereal-sounding VO) cvmfs repo. Things are being worked on, but slowly. If things are going too slowly Chris, pipe up! On hold (26/9)

https://ggus.eu/ws/ticket_info.php?ticket=97025 (3/9)</br> A ticket that keeps popping up in one form or another, concerning the RAL myproxy server's certificate. This has been worked on before, with a new service in stand by. The ticket has been held to prevent the Tier 1 being hassled about this again. On hold (12/9)

https://ggus.eu/ws/ticket_info.php?ticket=97516 (23/9)</br> t2k had problems FTSing their files about. It looks like this could have been caused by temporary network problems at external sites. The Tier 1 guys would like to know if the problem persists for t2k. Waiting for reply (30/9)

https://ggus.eu/ws/ticket_info.php?ticket=97759 (4/10)</br> The Tier 1's SHA-2 ticket. The plan is currently to let all the current out of date CEs die natural deaths as the clusters are decommissioned from under them. If they get a reprieve then they'll get upgraded. On hold (4/10)

https://ggus.eu/ws/ticket_info.php?ticket=97479 (20/9)</br> Atlas were seeing high failure rates on RAL SL5 nodes. Maybe an old cvmfs bug. With the move to SL6 this wasn't too much of a worry. Can this ticket be closed now? On hold (30/9)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> I forgot to get a birthday card for this ticket - "correlated packet-loss on perfsonar host". Any news? On hold (17/6)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Chris' request for webdav support on the RAL LFC. Last word was that a few holes needed to be poked in the RAL firewall for the service, then silence. On hold (9/8)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=97485 (21/9)</br> lhcb job failures, probably due to an out of date set of CA certificates. Jet have updated and asked lhcb if things have started working. Waiting for reply (2/10)

Monday 30th September 2013</br>

First up a gentle reminder that if you've asked the submitter a question in a ticket (usually "Is it still broke for you?") remember to set the ticket to "Waiting for Reply". Then it's obvious to us watching that any tardiness on the ticket is the user's fault.

Secondly, a general request: as next week we'll have a full review, it would be good for sites to have a bit of an autumnal clean of their tickets - update what needs an update, close what can be closed.

Thirdly, I've noticed a few cases of "boomerang tickets" over the last week, with tickets submitted by sites faithfully returning to their submitters. Yet another thing to watch out for!

GLEXEC Hall of People with GLEXEC tickets:</br> CAMBRIDGE, RHUL, DURHAM, UCL, SHEFFIELD, EDINBURGH, QMUL, EFDA-JET, LANCASTER, BRISTOL and MANCHESTER all have gLEXec tickets open. It would be nice if they all received updates over the next week. BIRMINGHAM almost got gLexec dusted, but their ticket got reopened on them (95304). Looks like they should be out of the woods soon though.

NGI https://ggus.eu/ws/ticket_info.php?ticket=95469 (5/7)</br> Unresponsive VO master ticket. Malgorzata asks if the ticket can be passed to VO services to start the decommissioning process for the VOs that have had their day. On Hold (23/9)

SUSSEX https://ggus.eu/ws/ticket_info.php?ticket=97139 (9/9)</br> APEL test failures at Sussex. If you guys are stuck (no shame there, accounting problems are hard), then I suggest you set the ticket On Hold and/or open a support ticket with the apel guys (and cross reference it here). In progress (18/9)

EFDA-JET https://ggus.eu/ws/ticket_info.php?ticket=97485 (21/9)</br> This Jet ticket is looking a little crusty, could someone in Southgrid please have a poke, IIRC this just needs an upgrade of the CA cert rpms. In progress (23/9)

RAL (sort of)</br> https://ggus.eu/ws/ticket_info.php?ticket=97360 (17/9)</br> You may have heard about this infamous epic ticket last week at GridPP. It'll make you laugh, it'll make you cry. Then probably make you cry some more. It documents how circumstances have conspired to allow a single misconfigured CE to break all WMS/CREAM interactions - others with a better understanding could say more. In Progress (possibly can be closed) (27/9)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=97103 (6/9)</br> Another issue that cropped up in conversation at GridPP last week was this one at Durham, the likely suspect to their GridFTP problems being network security tools on the Durham firewall. Hopefully CIS will purge the IDS from your subnet! In progress (24/9)

From Steve:

I'd like to add this one, related to APEL. Sites upgrading from EMI2/UMD2 APEL to EMI3/UMD3 APEL should be aware of this bug. You'd be better off sticking with EMI2 for now.

https://ggus.eu/ws/ticket_info.php?ticket=97528


Monday 16th September 15.00 BST</br> It's all hands on deck here at Lancaster tomorrow as we move the shared cluster to SL6 (and SGE). So I may or may not be in the meeting tomorrow, depending on how things go! I would have skipped the review this week, but we're not getting one next week due to GridPP so I'll squeeze in a quick look at the tickets.

So here are the tickets that caught my eye this week:

MOST NEGLECTED TICKET:</br> SUSSEX https://ggus.eu/ws/ticket_info.php?ticket=97139</br> Hiding at the back of the cupboard with the marmite is this ticket, assigned a week ago. Being a ROD ticket this is at risk of escalating, and the ROD can't really do anything unless the ticket is at least acknowledged. Assigned (9/9)

HYPERK</br> @RAL</br> https://ggus.eu/ws/ticket_info.php?ticket=96233</br> https://ggus.eu/ws/ticket_info.php?ticket=96235</br> Catalin reports that the additional voms servers have been enabled on the WMS and LFC at RAL; both tickets await some testing.</br> @GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=96234</br> Chris would have it known that HyperK are now in the Operations portal, but there's no rush in enabling them at Glasgow. I don't know if there's a general call for the sites to support the new VO?

RAL</br> https://ggus.eu/ws/ticket_info.php?ticket=95996 (22/7)</br> This is the SHA-2 ticket, it's set In Progress but looks like it's waiting on work that is still forthcoming, so probably could be On Holded, or at least updated. In Progress (5/9)

https://ggus.eu/ws/ticket_info.php?ticket=97168 (9/9)</br> Duncan spotted problems with the londongrid lfc. The ticket was acknowledged, but no news since. In Progress (9/9)

GLEXEC TICKETS</br> We're approaching the period where a lot of the sites with open GLEXEC tickets said they'd be starting to roll out GLEXEC. I'll save looking at these till the October ticket review.

UNRESPONSIVE VO TICKETS</br> Should the remaining two of these be closed and replaced with fresh, VO decommissioning tickets?

Monday 9th September 2013, 15.00 BST</br> We have 50 Open UK tickets- a number of issues have cropped up but I don't see any patterns. I went through all the tickets last week, so we'll just skim them today. Everyone's doing a good job, and a lot of the fresh issues that have cropped up are being handled nicely.

GLEXEC</br> Not much movement on the GLEXEC tickets since last week, but none was expected really.

HyperK</br> https://ggus.eu/ws/ticket_info.php?ticket=96235</br> https://ggus.eu/ws/ticket_info.php?ticket=96233</br> Whilst testing the HyperK VO Chris noticed some problems when he used proxies from voms2 and voms3 for the WMS and LFC, I guess due to a misconfiguration at the RAL end (and not problems with the voms servers), but I'm often wrong. Both In Progress (6/9)

PERFSONAR</br> Duncan has been busy poking sites that were looking bad in:</br> http://perfsonar.racf.bnl.gov:8080/exda/?page=25&cloudName=UK</br> A few sites have fixed the problems already, other sites are working on it. No tickets seem stalled (the Durham one was reopened though, they hadn't quite quashed the gremlins).

Well that's not many tickets is it. Nothing too exciting on the solved ticket pile either.

If anyone has any issues they want brought up please let us know.

And finally, a reminder that if you thought you read something in one of the Ticket Round-Ups that has since been overwritten and you can't be bothered trawling through the wiki history to resurrect the information I keep the old Ticket Round Ups here:</br> https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins

(or if you're really looking for a blast from the past: https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2012 ).

Monday 2nd September 2013, 14.00 BST</br> Ye gads, it's the start of another month already. So that means we're going over all the UK tickets again.

40 Open UK tickets this month.

CAMBRIDGE, BRISTOL, BIRMINGHAM, EDINBURGH, DURHAM, SHEFFIELD, MANCHESTER, LANCASTER, UCL, RHUL, QMUL and EFDA-JET all have glexec tickets. A lot of these could do with some updating, as a change of tactic I rolled them into each site's ticket roundup below.

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=95469 (5/7)</br> One of the remaining "Unresponsive VO" tickets, belonging to Supernemo. It's pretty much decided that the VO is defunct, we just need to decide what to do next. On hold (20/8)

https://ggus.eu/ws/ticket_info.php?ticket=95472 (5/7)</br> The other "Unresponsive VO" ticket, this one for minos. Pretty much the same situation as Supernemo. On hold (27/8)

https://ggus.eu/ws/ticket_info.php?ticket=95442 (4/7)</br> The "Unresponsive VOs" master ticket, just the above tickets left. On hold (12/8)

https://ggus.eu/ws/ticket_info.php?ticket=94780 (11/6)</br> Ye olde 100IT cloud site creation ticket. It's chugging along, the company has signed the OLA (ticket 96634). In progress (27/8)

https://ggus.eu/ws/ticket_info.php?ticket=95833 (17/7)</br> Decommissioning ral-ngs2. The last decommissioning ticket that I can see, was this guy forgotten about? On hold (25/7).

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)</br> Sussex's perfsonar box playing up. Have you got round to reinstalling it yet? On hold (14/8)

CAMBRIDGE</br> https://ggus.eu/ws/ticket_info.php?ticket=95306 (1/7)</br> gLexec deployment ticket. Late summer is approaching, how goes things? On hold (9/7)

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=96990 (2/9)</br> CMS seeing some transfer problems, it's being looked at. 'nuff said. In progress (2/9)

https://ggus.eu/ws/ticket_info.php?ticket=96261 (30/7)</br> After several quiet weeks the submitter of this ticket got back to Bristol, with the permissions problem still existing. He included some more information, maybe some bad mapping is going on. In progress (27/8)

https://ggus.eu/ws/ticket_info.php?ticket=95305 (1/7)</br> Glexec deployment ticket. On hold pending a laundry list of other tasks being completed. How goes it? On hold (11/7)

BIRMINGHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=95304 (1/7)</br> glExec deployment ticket. It's September now, how goes it? On hold (17/7)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=96231 (29/7)</br> Sno+ had some WMS issues at Glasgow, which were probably caused by a temporary hiccup. Waiting on word back from Sno+ for a while. Waiting for reply (8/8)

https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7)</br> HyperK support on the Glasgow WMS. Glasgow were wanting to hold off until HyperK are in the Ops portal- which they still aren't. On hold (2/8)

EDINBURGH</br> https://ggus.eu/ws/ticket_info.php?ticket=96002 (22/7)</br> SHA-2 ticket. The ticket needs soothing, even if you don't solve it (it's *just* an update, although I successfully killed my backend database when I upgraded from my far too crusty cream version). On hold (20/8)

https://ggus.eu/ws/ticket_info.php?ticket=95303 (1/7)</br> GLexec ticket. Presumably waiting on me to get the tarball sorted (https://ggus.eu/ws/ticket_info.php?ticket=95832). On hold (27/8)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=96628 (15/8)</br> Durham failing APEL pub tests. This ticket has been extended a lot. Are you in the middle of an EMI3 APEL move? Needs looking at, I think it's been extended as much as it can without an explanation. In progress (16/8)

https://ggus.eu/ws/ticket_info.php?ticket=96001 (22/7)</br> SHA-2 ticket. Will be completed this month. On hold (2/9)

https://ggus.eu/ws/ticket_info.php?ticket=96758 (21/8)</br> Another nagios error, this time SRM-Put failing. Again in urgent need of some input or else the ROD will have to escalate. They don't like doing that. In progress (21/8)

https://ggus.eu/ws/ticket_info.php?ticket=95302 (1/7)</br> glExec deployment. On hold until mid-September due to leave after some progress. On hold (2/9)

https://ggus.eu/ws/ticket_info.php?ticket=96791 (22/8)</br> Biomed noticed transfer failures to/from Durham. Maybe related to 96758? Could do with at least On-holding. In progress (23/8)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=95301 (1/7)</br> GLExec deployment. Elena said last week that she aims to finish it off soon. How goes it? On hold (21/8)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=95300 (1/7)</br> glexEC deployment ticket. Manchester will deploy this on the 14th of October - which if the tests become critical on the 1st of October might cause some hassle to them (but this was only a tentative date). On hold (1/7)

https://ggus.eu/ws/ticket_info.php?ticket=96837 (24/8)</br> Atlas transfers failing with "INVALID_PATH" errors. A disk server decided to take the bank holiday off and one of the switches is still playing up - waiting for things to calm down before closing. On hold (2/9)

https://ggus.eu/ws/ticket_info.php?ticket=96081 (23/7)</br> SHA-2 ticket. Planning to upgrade this mid-October during Manchester's big push. On hold (23/7)

LIVERPOOL</br> https://ggus.eu/ws/ticket_info.php?ticket=96940 (30/8)</br> SHA-2 failures, although you guys are late to the party on this one. Perhaps something whacked your ldifs and you're publishing old and stale information? In progress (2/9)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=96589 (12/8)</br> Failing APEL-Pub tests. An upgrade to EMI3 apel hasn't gone smoothly, and we're trying to work out why (as nothing seems to be broken). See ticket 95365 for details - that's where all the work is happening. On hold (13/8)

https://ggus.eu/ws/ticket_info.php?ticket=95299 (1/7)</br> GleXEC deployment. Still waiting on the tarball, leave and more pressing site issues have caused there to be no progress in the last month. On hold (17/7)

UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=95298 (1/7)</br> gleXec deployment. Hope to get this done after the SL6 deployment, by the end of September. On hold (29/8)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=95297 (1/7)</br> GlEXeC deployment. Plans to roll it out before the end of August went awry due to SL6 delays. Hope to do the two together. On hold (2/9)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=96618 (14/8)</br> Hone find that jobs are failing with not enough disk errors on cn633. Chris took the node offline and put the ticket on hold awaiting a permanent solution. On hold (19/8)

https://ggus.eu/ws/ticket_info.php?ticket=96856 (26/8)</br> I believe this is the same as above (but for a different node), although most jobs do not have a problem. In progress (27/8)

https://ggus.eu/ws/ticket_info.php?ticket=95296 (1/7)</br> gLeXeC deployment. It appears to be working for the SL6 nodes QM have deployed, just need to roll out more SL6 nodes. On hold (12/8)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=95295 (1/7)</br> GlexeC deployment. Any ideas what the plan is with JET? On hold (19/7)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=96235 (29/7)</br> Enabling HyperK on the RAL LFC. This is done, and Catalin has requested some tests (but IIRC Chris is on leave). Waiting for reply (22/8)

https://ggus.eu/ws/ticket_info.php?ticket=96233 (29/7)</br> WMS support for hyperk.org. Again this has been done, and just needs some testing. Waiting for reply (28/8)

https://ggus.eu/ws/ticket_info.php?ticket=96321 (2/8)</br> Sno+ SRM tests failing at the Tier-1, primarily caused by Castor not speaking Vomese. Kashif has suggested we ignore this test until such time as Castor learns how to read extensions. It needs some action though. In progress (23/8)

https://ggus.eu/ws/ticket_info.php?ticket=96968 (31/8)</br> CMS spotted a black hole node at RAL, with cvmfs problems. Offlined. In progress (31/8)

https://ggus.eu/ws/ticket_info.php?ticket=95996 (22/7)</br> SHA-2 ticket. One of the WMS is upgraded, but the CEs (and the other two WMS) need upgrading too. Confident that it will be done by the deadline though. In progress (23/8)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> The LFC Webdav support ticket. Between leave and higher priority work I don't think there's been much progress on this over the last month. In Progress (9/8)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/12)</br> The oldest, and most neglected ticket - "correlated packet-loss on perfsonar host". Needs some sprucing up. On hold (17/6)

TICKETS FROM THE UK</br> My will to live is fading, but an argus ticket from Dan at QM was mentioned on TB-SUPPORT, which might affect some other UK sites:</br> https://ggus.eu/ws/ticket_info.php?ticket=96228

Tuesday 27th August 2013, 09.30 BST</br> 45 Open tickets this week. For the ticket list click here.

Some progress since last week. There remain gLexec tickets, SHA-2 tickets, NGS site decommissioning tickets and Unresponsive VO tickets (minos and supernemo, no change). Below I'm looking at the 'red' tickets that are not 'on hold', plus other interesting tickets.

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=96634 (15/8)</br> The "cloud" site, 100IT, has received a certification ticket. Assigned (15/8) (Child ticket of https://ggus.eu/ws/ticket_info.php?ticket=94780)


DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=96530 (9/8)</br> Another 444444 waiting job ticket, could do with a bit of an update. In progress (12/8). No update since last ops meeting.

https://ggus.eu/ws/ticket_info.php?ticket=95302 (1/7)</br> Durham's gLexec ticket. Could do with a spot of soothing - either an update or on-holding. In progress (12/7). No updates in recent weeks.

Added a COD ticket https://ggus.eu/ws/ticket_info.php?ticket=96628. APEL pub failure. In progress (15/8)


TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=95996 (22/7)</br> One of the SHA-2 tickets, this could do with an update or the ROD will have to declare it overdue. In progress (22/7). Catalin responded.

https://ggus.eu/ws/ticket_info.php?ticket=96321 (2/8)</br> Sno+ SAM jobs failing at RAL. Probably a problem with the cert these jobs are run under, I've involved Kashif in the ticket. Waiting for reply (19/8) Update - Kashif and Chris have had an exchange about this, as discussed last week the problem is due to Castor being VOMS unaware.

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=96625 (15/8)</br> Issue with installed certificates. In progress (15/8)

Monday 12th August 2013, 14.00 BST</br> 55 Open UK tickets this week. Let's take a couple of deep breaths, then go through them all. Yep, all of them!

NGI tickets:

Unresponsive VOs. (5/7):</br> https://ggus.eu/ws/ticket_info.php?ticket=95442</br> Master Ticket, on hold.</br> https://ggus.eu/ws/ticket_info.php?ticket=95473 </br> The gridpp ticket, Jeremy is onto this, hoping to wrap it up soon. In progress (12/8) SOLVED</br> https://ggus.eu/ws/ticket_info.php?ticket=95472</br> minos ticket. The state of the minos VO is still unknown, but suspected to be defunct. This was set in progress by a ticket manager, although technically it still doesn't have a home. In progress (26/7)</br> https://ggus.eu/ws/ticket_info.php?ticket=95469</br> Supernemo ticket. Gianfranco has confirmed it's "not his problem" anymore, and given a few names to try to contact at UCL. In progress (29/7)

NGS decommissioning.</br> https://ggus.eu/ws/ticket_info.php?ticket=95833 ral-ngs2</br> https://ggus.eu/ws/ticket_info.php?ticket=96141 oxford-ngs2 </br> https://ggus.eu/ws/ticket_info.php?ticket=96128 manchester-ngs2</br> https://ggus.eu/ws/ticket_info.php?ticket=96538 NGS-SHEFFIELD</br> Nothing to see here really - on hold until JK is back from leave, and it doesn't really affect us.

Other NGI tickets:</br> https://ggus.eu/ws/ticket_info.php?ticket=94780</br> Cloud Site Creation request. The NGI has been asked for an update, JK has asked others for feedback but is currently on leave. In progress (probably should be on hold if JK isn't back for a while) (5/8)


gLExec tickets. (1/7):</br> SUSSEX https://ggus.eu/ws/ticket_info.php?ticket=95309 Some progress. On hold (23/7)</br> CAMBRIDGE https://ggus.eu/ws/ticket_info.php?ticket=95306 Get to it in late summer. On hold (9/7)</br> BRISTOL https://ggus.eu/ws/ticket_info.php?ticket=95305 After the current work-pile is conquered. Also as an aside, going for an arc ce? Interesting. On hold (11/7)</br> BIRMINGHAM https://ggus.eu/ws/ticket_info.php?ticket=95304 Aim to do it in ~August, along with other upgrades. On hold (9/7)</br> ECDF https://ggus.eu/ws/ticket_info.php?ticket=95303 On hold. (1/7)</br> DURHAM https://ggus.eu/ws/ticket_info.php?ticket=95302 Some progress made, but things stalled. Should be put on hold if things don't pick up again. In progress (8/8)</br> SHEFFIELD https://ggus.eu/ws/ticket_info.php?ticket=95301 On hold (10/7)</br> MANCHESTER https://ggus.eu/ws/ticket_info.php?ticket=95300 Will do it in October upgrade. On hold (1/7)</br> LANCASTER https://ggus.eu/ws/ticket_info.php?ticket=95299 Trying to get it to work on the tarball. Not having much luck. On hold (17/7)</br> UCL https://ggus.eu/ws/ticket_info.php?ticket=95298 Won't start until end of August. On Hold (29/7)</br> RHUL https://ggus.eu/ws/ticket_info.php?ticket=95297 Another for the end of August. On hold (16/7)</br> QMUL https://ggus.eu/ws/ticket_info.php?ticket=95296 Almost there, just need to roll out SL6 to all their nodes. On hold (12/8)</br> EFDA-JET https://ggus.eu/ws/ticket_info.php?ticket=95295 Some confusion over Jet's status was had. Otherwise waiting until later to deploy this. On Hold (19/7)

SHA-2 (22/7)</br> ECDF https://ggus.eu/ws/ticket_info.php?ticket=96002 On hold (23/7)</br> DURHAM https://ggus.eu/ws/ticket_info.php?ticket=96001 Will upgrade in September. On hold (31/7)</br> MANCHESTER https://ggus.eu/ws/ticket_info.php?ticket=96081 Again in the October upgrade. On hold (23/7)</br> LANCASTER https://ggus.eu/ws/ticket_info.php?ticket=95999 Will do this week. On hold (12/8)</br> TIER 1 https://ggus.eu/ws/ticket_info.php?ticket=95996 In Progress, but not much news. Maybe it should be put On Hold? In progress (22/7)</br> NEW 12/8 RALPP https://ggus.eu/ws/ticket_info.php?ticket=96588 Just assigned yesterday (12/8)

Common or Garden tickets:

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=96469 (8/8)</br> An ops ticket for CREAMCE-JobSubmit failures. Not acknowledged yet. Assigned (8/8). Now In progress - Emyr reports the BDII config disappeared (auto update accident?).

https://ggus.eu/ws/ticket_info.php?ticket=96470 (8/8)</br> Another ops ticket, for the SRM-GetSURLs tests. Emyr has posted an explanation for the problems. In progress (9/8)

https://ggus.eu/ws/ticket_info.php?ticket=96556 (10/8)</br> Another, slightly younger, ops ticket. This test is CREAMCE-CertLifetime. The cert expired 3 days ago (or an old one has snuck back on the server - that's happened to me more than once). Assigned (10/8)
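
As an aside, a quick way to check for this sort of thing (a hedged sketch - the CREAM hostname below is just a placeholder) is to compare the expiry of the certificate on disk with the one the service is actually presenting on port 8443:
 # Expiry of the host certificate on disk
 openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject -enddate
 # Expiry of the certificate the CREAM CE is actually serving (old certs can sneak back into service)
 echo | openssl s_client -connect creamce.example.ac.uk:8443 2>/dev/null | openssl x509 -noout -enddate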

https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)</br> Duncan has asked you to check your perfsonar - which might be being affected by the firewall work mentioned in 96470. But this ticket is looking mighty neglected. On hold - last "proper" update was (1/7)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=96287 (31/7) Atlas were seeing timeouts on their deletion service at RALPP. Alastair noticed a correlation between the times of these failures and those at the Tier 1 (96079). Chris asked if the errors were spread evenly or came in bursts - Brian posted some information that to me suggests bursts. In progress (6/8) Update - the problem still persists at both T1 and T2

https://ggus.eu/ws/ticket_info.php?ticket=96531 (9/8)</br> Someone (lhcb? I recognise the submitter's name) has spotted 444444 jobs being advertised at RALPP. No news from the site yet, such is the peril of Friday tickets (especially over the summer). But of course you'll fix that as soon as you read this... Assigned (9/8) Update - not only acknowledged, but solved. lcg-info-dynamic-scheduler-pbs.noarch was missing - screwed up dependencies somewhere?

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=96440 (6/8)</br> Actually a ticket for the nagios at Oxford. Chris W noticed ops tests making some odd requests, and noted the old ticket 70066 where he spotted atlas doing similar. Kashif is on the case. In progress (7/8)

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=96261 (30/7)</br> A CMS user had trouble writing into a path at Bristol. Lukasz couldn't see anything wrong, and another user has written to the volume without error, so the submitter has been asked if he still sees a problem. No reply yet. Waiting for reply (5/8)

https://ggus.eu/ws/ticket_info.php?ticket=96483 (8/8)</br> Bristol had some obsolete glue 2 entries in their publishing. The Bristol team are on it. In progress (9/8)

BIRMINGHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=95418 (4/7)</br> Alice, what's the matter? They'd like cvmfs installed at Birmingham. Due to the lack of urgency on this change Mark is leaving it until after the other stuff that needs to be done in this Summer of Upgrades. On hold (17/7)

https://ggus.eu/ws/ticket_info.php?ticket=96555 (10/8)</br> SRM-Put Ops test failures hitting Birmingham. Space has run out, Mark has his shoe horn out to create more but it will take a little while to sort out. In progress (12/8)

https://ggus.eu/ws/ticket_info.php?ticket=96533 (9/8)</br> LHCB have asked for g++ to be installed at Birmingham. Mark asked if this is urgent, and I think the LHCB reply can be summarised as "yes". In progress (9/8)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=96528 (9/8)</br> Glasgow also are having 444444 Waiting jobs on some of their shares. Gareth pointed out that the bad CEs are newer EMI ones - cream developers have been involved. In progress (12/8)

https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7)</br> Request to support the new HyperK VO on the Glasgow WMS. Glasgow would like to wait until the VO was supported on all the VOMS servers and the Operations Portal. Chris points out that it is supported on all the former. The latter is being a pain (is what I think the implication was). On hold (2/8)

https://ggus.eu/ws/ticket_info.php?ticket=96231 (29/7)</br> Sno+ have seen a lot of failures from jobs going through one of Glasgow's WMSii. The problem looks to have been ephemeral, but some zombie job clean up was needed. This was the end of July, Sno+ have been asked if they still have a problem. Waiting for reply (8/8)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=96331 (2/8)</br> Failing the ApelDN publishing ops tests. Turned out "publishGlobalUserName no" snuck into the new CE configuration. Just waiting for the republishing to soak in. In progress (12/8) Solved now.

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=96530 (9/8)</br> Another 444444 waiting jobs ticket. Not acknowledged yet though. Assigned (9/8) Update - In progress

https://ggus.eu/ws/ticket_info.php?ticket=96554 (10/8)</br> Ops CREAMCE-JobSubmit failures. Assigned (10/8)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=96582 (12/8) An atlas user and the UK atlas team have spotted some files that they can't access at Manchester, in ATLASSCRATCHDISK. Assigned (12/8) Update- in progress, machines are back online. Problem with some network kit.

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=94746 (10/6)</br> Biomed haunting the QM SE's information. Reinstalling the SE didn't kill off the entries, Storm-developers have been called in. On hold (31/7)

BRUNEL</br> https://ggus.eu/ws/ticket_info.php?ticket=96217 (29/7)</br> CMS have spotted that the bdii seems to be publishing inconsistent Wall/CPU time (2880 for one, 72 for the other, so one is in minutes, t'other is in hours). This is a known issue, fixed in EMI-3 (ticket 91859). Reading the ticket, Raul doesn't intend to upgrade just to fix this until he's given EMI-3 a proper testing. As he suggested, it's probably best to On Hold it until then. In progress (6/8)
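
For anyone wanting to eyeball what their own BDII is publishing for these limits, a query along the lines below (the site BDII hostname is a placeholder) will dump the Glue 1 values - if I remember the Glue 1 schema right, both attributes are meant to be published in minutes:
 ldapsearch -x -LLL -H ldap://site-bdii.example.ac.uk:2170 -b o=grid \
     '(objectClass=GlueCE)' GlueCEPolicyMaxCPUTime GlueCEPolicyMaxWallClockTime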

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=96526 (9/8)</br> LHCB are seeing some 'certificate verify failed' errors at efda-jet. Not something I've seen before - CA certificate problems maybe? In progress (9/8)

And last but by no means least:

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=96482 (8/8)</br> CMS have noticed transfers from Caltech to RAL failing. Problem looks to be transient, Brian asked if retries also fail. Waiting for reply (8/8)

https://ggus.eu/ws/ticket_info.php?ticket=96235 (29/7)</br> Chris W has asked for a LFC for HyperK. In progress, but slowed by Vacations. In progress (9/8)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> The oldest open ticket: "correlated packet-loss on perfsonar host". Last news was that the upgrade to the Tier 1 backbone/uplink was still in the planning stage. But is the original problem still there? On hold (17/6)

https://ggus.eu/ws/ticket_info.php?ticket=96321 (2/8)</br> The RAL SE is failing Sno+ nagios tests. Looks to be a problem with Kashif being mapped to t2k - problems seem to be authentication based (but what about ops tests - do they pass too? I smell a possible red herring). Waiting for reply (6/8)

https://ggus.eu/ws/ticket_info.php?ticket=96233 (29/7)</br> Request for HyperK support on the RAL WMS. In progress, but again Summer Vacations are slowing things down. In progress (9/8)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Webdav support on the RAL LFC. Some good progress has been made by the looks of it, but again people going on well deserved, probably much needed holiday is slowing things down over the Summer. In progress (9/8)

Monday 29th July 2013 14.30 BST</br> There are 52 Open UK tickets this week. It's business as usual this week, but with so many tickets the risk of me missing something is greater than usual, so let me know if I've skimmed over an important issue for your site or of interest to the UK.

Why so many tickets? We've been hit by several groups of tickets at once: 13 gLExec tickets, 10 Decommissioning NGS sites, 7 SHA-2, 5 Unresponsive VOs (the makings of a terrible Christmas Carol). That leaves only 17 tickets outside these categories.

SHA-2 tickets</br> 7 tickets left for these, affecting Glasgow, Manchester, ECDF, Durham, Lancaster, Imperial and the Tier 1. Depending on when you plan to look at this, those who haven't set their tickets to On hold (Glasgow, IC, Tier 1) might want to if they aren't going to be tackling it soon. Kashif has extended the tickets.

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=95792 (16/7)</br> The HyperK VO has been rolled out and just needs testing now. In Progress (29/7) Update- Tests passed, ticket closed. Tickets have gone out asking WMS sites to support the new VO.

Unresponsive VOs (5/7)</br> https://ggus.eu/ws/ticket_info.php?ticket=95442 - Master ticket.</br> https://ggus.eu/ws/ticket_info.php?ticket=95474 - camont. Waiting for reply, can be closed (22/7)</br> https://ggus.eu/ws/ticket_info.php?ticket=95473 - gridpp. Jeremy waiting on new e-mail lists. In progress (29/7)</br> https://ggus.eu/ws/ticket_info.php?ticket=95472 - minos. The VO is probably dead, Jeremy is checking. Assigned (26/7)</br> https://ggus.eu/ws/ticket_info.php?ticket=95469 - supernemo. Probably also dead, Jeremy is checking here too. In progress (22/7)</br>

GLEXEC (1/7)</br> Birmingham, Cambridge, Bristol, Sussex, ECDF, Durham, Sheffield, Manchester, Lancaster, UCL, RHUL, Queen Mary and EFDA-JET all have open gLexec tickets. Durham and IC are In Progress - ironing out a few bugs (although both tickets could do with a soothing update to keep us in the loop - particularly the Durham one). The rest are on hold, but Sussex also appear to be close to a solution. Others are quoting late summer before tackling gLExec deployment.

NGS Decommissioning.</br> Not much to see here, 10 sites are on the "chopping block" but no progress is expected until late August so they're all on hold.

TIER-1</br> https://ggus.eu/ws/ticket_info.php?ticket=96079 (23/7)</br> Atlas seeing slow deletion rates, caused by occasional time outs for some deletions. Shaun is investigating, and can't see anything in the SRM layer causing this. Waiting for atlas to get back with some examples from their logs to cross-reference. Waiting for reply (24/7)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Webdav for the RAL LFC. Catalin has asked for some advice on setting up the Webdav interface. Is anyone able to help him? Waiting for reply (17/7) Update - Catalin has rolled out the Webdav interface, Chris W will now crusade to get more sites Webdaved. Catalin points out that RAL's Castor doesn't support webdav.

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=96090 (23/7)</br> This issue was brought up in the Storage meeting, but worth mentioning here. Ewan had a problem where his one server with space on it ended up being given a weighting of zero by DPM gremlins. After fixing this things are better, but the one empty disk server is under a lot of stress. As our sites fill up issues like this could become more common. In progress (26/7) (Also see https://ggus.eu/ws/ticket_info.php?ticket=96071, which is the same issue for Sno+. It looks like the issue has been solved for them though).

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=96024 (22/7)</br> It looks like Durham have a black-hole node, sucking jobs into oblivion: the culprit appears to be n36.dur.scotgrid.ac.uk from the atlas monitoring. It could be a red herring, but things are quiet on the ticket from the site end. In progress (29/7) Update - Mike offlined the bad node, but now all their nodes are failing jobs! It never rains but it pours...

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)</br> Duncan poked Sussex to check their perfsonar installation. Not much word on this ticket for a while since Duncan updated with some information. In progress (1/7)

100IT</br> https://ggus.eu/ws/ticket_info.php?ticket=94780 (11/6)</br> The Cloud Site ticket. JK wants to hammer out a few things with NGI members before signing this off, as it is the first Industrial Partner, but he's on leave (hopefully somewhere without this weekend's rain!), so it might be a while before this is finalised.

Nothing exciting in the solved case pile, but my eyes are about to fall out of my head after sifting through the active tickets so I may well have missed something.

Monday 22nd July 2013</br> 45 Open tickets for the NGI this week.</br> http://tinyurl.com/cblj3ab</br>

With so many open tickets, and my day starting with a trip to the vets and ending with me working from home with one eye on a poorly cat, I'm afraid this is the second week in a row with a half-cocked ticket review - sorry about that. The connection at home is a bit slow, making checking individual tickets a pain, hence the different format. There's no excuse for any puns though.

New VO (https://ggus.eu/ws/ticket_info.php?ticket=95792 (16/7) )</br> Chris has submitted a request for the creation of the HyperK VO on the UK voms server. The request is chugging along. In progress (18/7).

NGS wind down.</br> There's a handful of tickets tracking the closure of some ngs resource centres (Keele-NGS, NGS-Leeds and ral-ngs2). I don't think this affects anyone in GridPP, but I like to report anything out of the ordinary.

SHA-2 hitting the fan...</br> As mentioned by Kashif, a number of sites have been handed tickets after failing SHA-2 tests. Liverpool, Lancaster, RALPP, Bristol, ECDF and Durham have all received tickets for one or two of their CREAM CEs, and IC received one for their WMS (which Daniela has already expressed her righteous displeasure about). Most are In Progress already.

UK Cloud Site (https://ggus.eu/ws/ticket_info.php?ticket=94780)</br> There's a request from Malgorzata asking if we can move forward with the cloud site; everything that needed to be set up has been set up.

Unresponsive VOs still being Unresponsive (https://ggus.eu/ws/ticket_info.php?ticket=95442).</br> Some movement here: for one, per https://ggus.eu/ws/ticket_info.php?ticket=95470 babar is now deleted (just in case you still support babar somewhere). supernemo is likely to go the same way (https://ggus.eu/ws/ticket_info.php?ticket=95469). The camont (https://ggus.eu/ws/ticket_info.php?ticket=95474) and minos (https://ggus.eu/ws/ticket_info.php?ticket=95472) tickets still haven't been acknowledged.

gLEXEC-utive Decision.</br> Not much movement on the gLExec deployment front - but from most sites' initial replies progress wasn't expected until after July was over. The list of gLExec-less sites (or sites with broken gLExec) is Sussex, Cambridge, Bristol, Birmingham, ECDF, Durham, Sheffield, Manchester, Lancaster, UCL, RHUL, QMUL and EFDA-JET. There is no shame in being on that list (not yet anyway!).

LFC Webdav (https://ggus.eu/ws/ticket_info.php?ticket=91658)</br> Catalin has had a go at installing the LFC webdav, but would like a hand in implementing the webdav interface.


Tuesday 16th July 2013</br> I'm afraid I was on a train back from the Deep South yesterday, so it will be a light ticket review this week. Please check out the tickets for your site (if any) here:

http://tinyurl.com/cblj3ab

The hot topics are glexec deployment and the unresponsive UK VOs (so the same as last week).

We have a couple of tickets stuck in the assigned state:

https://ggus.eu/ws/ticket_info.php?ticket=95309 - SUSSEX GLEXEC

https://ggus.eu/ws/ticket_info.php?ticket=95472 - minos being unreponsive

https://ggus.eu/ws/ticket_info.php?ticket=95473 - gridpp VO being unresponsive - but IIRC Jeremy was on leave.

https://ggus.eu/ws/ticket_info.php?ticket=95474 - same again, for camont this time.

If anyone has any ticket related problems they want to bring up please let me know before or during the meeting, as there's a good chance I might have missed something this week.


Monday 8th July 2013 14.30 BST</br> 34 Open UK tickets this week. Let's dive in.

Unresponsive VOs hosted by NGI_UK (5/7)</br> https://ggus.eu/ws/ticket_info.php?ticket=95442- The Parent Ticket, there's also mention of oxgrid.ox.ac.uk in the last update.</br> https://ggus.eu/ws/ticket_info.php?ticket=95474 -camont</br> https://ggus.eu/ws/ticket_info.php?ticket=95473 -gridpp</br> https://ggus.eu/ws/ticket_info.php?ticket=95472 -minos.vo.gridpp.ac.uk</br> https://ggus.eu/ws/ticket_info.php?ticket=95471 -Don't know for sure who this pertains to, perhaps vo.northgrid.ac.uk?</br> https://ggus.eu/ws/ticket_info.php?ticket=95470 -babar</br> https://ggus.eu/ws/ticket_info.php?ticket=95469 -Can't quite figure out who this is for either, probably supernemo.vo.eu-egee.org</br>

These tickets are asking for the CiC portal entries to be updated and/or cleaned up for the relevant VOs. It could be that some of these VOs are no longer used and need the Old Yeller treatment. Not much progress on any of these tickets, but they were only opened on the 5th.

GLEXEC TICKETS (1/7)</br> A number of sites have changed the ticket category from an "Incident" to a "Change Request", which I would encourage. It's not an incident unless we fail to meet a deadline!</br> CAMBRIDGE https://ggus.eu/ws/ticket_info.php?ticket=95306 John put in a brief plan. In Progress.</br> BRISTOL https://ggus.eu/ws/ticket_info.php?ticket=95305 In progress.</br> BIRMINGHAM https://ggus.eu/ws/ticket_info.php?ticket=95304 In Progress.</br> ECDF https://ggus.eu/ws/ticket_info.php?ticket=95303 "Not urgent". On hold.</br> DURHAM https://ggus.eu/ws/ticket_info.php?ticket=95302 Plan to implement soon, with support from Glasgow. In progress.</br> SHEFFIELD https://ggus.eu/ws/ticket_info.php?ticket=95301 "gLexec installation is in progress". In progress.</br> MANCHESTER https://ggus.eu/ws/ticket_info.php?ticket=95300 Will roll out gLexec during SL6 deployment on the 14/10. On hold.</br> LANCASTER https://ggus.eu/ws/ticket_info.php?ticket=95299 Working on a gLExec tarball (if possible). In progress.</br> UCL https://ggus.eu/ws/ticket_info.php?ticket=95298 Will try to deploy later in July. On hold.</br> RHUL https://ggus.eu/ws/ticket_info.php?ticket=95297 Working on it. In progress.</br> QMUL https://ggus.eu/ws/ticket_info.php?ticket=95296 Hope to mix it in with the SL6/EMI3 move. Atlas' problems on EMI3/SL6 might delay things though. In progress.</br> EFDA-JET https://ggus.eu/ws/ticket_info.php?ticket=95295 An exchange occurred over JET's nature. In progress.</br> SUSSEX https://ggus.eu/ws/ticket_info.php?ticket=95309 Assigned. My usual thought is that Emyr is on holiday?</br>

OXFORD and RALPP have closed their tickets, as they have gLEXEC already deployed and only got ticketed due to glitches at their sites at the time.

BIRMINGHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=95418 (4/7)</br> ALICE have ticketed the site about enabling cvmfs for them. Assigned (8/7)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=95487 (6/7)</br> LHCB SAM jobs weren't picking up the VO_LHCB_SW_DIR env variable. Assigned (6/7)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=94873 (14/6)</br> The LHCB reply seems to have been to just set the ticket back to "In progress", so you might as well close this one. In progress (2/7)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Catalin has put plans down for rolling out a read-only LFC webdav frontend to the Oracle db as a standalone host alongside the current lfc.gridpp.rl.ac.uk alias. Please let him know if he's missed something. In progress (2/7)


Ticket Summary Supplemental!

Too late I noticed that Raul had asked for some relevant tickets to be brought up:

He submitted:</br> https://ggus.eu/ws/ticket_info.php?ticket=95110</br> regarding his recent (bad) experiences after "upgrading" cvmfs.

Which is pertinent to two active UK tickets at the moment:</br> https://ggus.eu/ws/ticket_info.php?ticket=95125 (Brunel)</br> https://ggus.eu/ws/ticket_info.php?ticket=94880 (Imperial)

At last report the I.C. admins were hopeful that some changes they had made would help sort things out.

Raul's advice: "don't upgrade now as they are trying to release new version that should be more resilient." Hopefully the tickets above will have some hints for those who have already taken the plunge.


Monday 1st July 2013, 14.30 BST

18 Open UK tickets today.

Scratch that: make it 33. We just got hit by 15 "gLExec deployment" tickets, affecting:</br> Sussex, RALPP, Oxford, Cambridge, Bristol, Birmingham, ECDF, Durham, Sheffield, Manchester, Lancaster, UCL, RHUL, QMUL and EFDA-JET. It would be quicker to list who wasn't ticketed! Please can sites write their gLExec plans into their corresponding ticket - I'll review these next week.

NGI_UK</br> https://ggus.eu/ws/ticket_info.php?ticket=94780 (11/6)</br> The creation of the UK Cloud Site continues. On hold (26/6)

https://ggus.eu/ws/ticket_info.php?ticket=94766 (10/6)</br> Atlas were having some problems with transfers to/from the UK, possibly due to FTS3. I'm not entirely certain what's up with this ticket, it might be that it can be closed. Can someone in the know please close it or update it? In Progress (24/6)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=95160 (28/6)</br> Atlas transfer errors to RAL, caused by Castor problems. The problems were hoped to be fixed on Sunday, but it looks like they've cropped up again. In progress (1/7)

https://ggus.eu/ws/ticket_info.php?ticket=95147 (27/6)</br> CMS jobs were having troubles at RAL due to cvmfs problems, which have since hopefully been fixed. Waiting for reply (27/6) SOLVED

https://ggus.eu/ws/ticket_info.php?ticket=95134 (27/6)</br> Another CMS CVMFS related ticket. Has this issue been solved too? In progress (27/6)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Long standing ticket concerning deploying Webdav support on the RAL LFC. According to an announcement in the ticket, the latest version of lcgdm-dav is in epel-testing. On hold (1/7)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/12)</br> "correlated packet-loss on perfsonar host". Not much news as updates depend on RAL's network upgrade schedule, but the upgrade to 40Gb/s backbone is still in the planning stage. On hold (16/6)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=95182 (1/7)</br> atlas are seeing transfer problems. Sheffield had power problems over the weekend, this is probably the aftermath - confirmation that the problems have gone away has been asked for. Waiting for reply (1/7) SOLVED

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=95176 (30/6)</br> Glasgow are having srm problems, which appear to be caused by their xrootd director dying and causing the disks to fill up with log messages. Solved as I typed (1/7)

https://ggus.eu/ws/ticket_info.php?ticket=94945 (18/6)</br> Biomed haven't replied to the request from Gareth to see if they're still having SRM problems. The SYSTEM sent its first reminder on the 26th. Waiting for reply (19/6)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=95166 (28/6)</br> Duncan has asked QM to rename their perfsonar-v6 box "and give it IPv6 and IPv4 only host names.". Assigned (28/6)

https://ggus.eu/ws/ticket_info.php?ticket=94746 (10/6)</br> QM have a ghost SE for Biomed, on hold until a reconfigure wipes out the last traces of support from the information system. On hold (10/6)

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)</br> Duncan has poked Sussex to check their perfsonar installation, Emyr has responded that they've opened new ports- Duncan pointed Sussex to the up-to-datest FAQ http://psps.perfsonar.net/toolkit/FAQs.html In progress (1/7)

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=95161 (29/6)</br> CMS have asked if they should submit only RHEL6 glideins to IC from now on; Simon replies with a "Yes, please". Waiting for reply, probably could be closed (28/6)

https://ggus.eu/ws/ticket_info.php?ticket=94880 (14/6)</br> Another cvmfs related problem, although this one predates last week's peril. Daniela and Simon have been working on it, and have the latest and greatest cvmfs version. No news for a week on this though. In progress (24/6)

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=95126 (26/6)</br> Oxford also had some CMS CVMFS related problems, which should have been solved on the day. Kashif has asked for some further details, no reply yet (an artifact of the automated savannahness of the CMS ticket?). Waiting for reply (27/6)

BRUNEL</br> https://ggus.eu/ws/ticket_info.php?ticket=95125 (26/6)</br> More CVMFS problems, but this time for lhcb. Raul closed it, thinking that the problem was caused by the upgrade of cvmfs at his site, but lhcb have dashed those hopes. LHCB have been providing lists of nodes for which jobs have failed. In progress (1/7)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=94873 (14/6)</br> LHCB asked ECDF to fix their published OS information, which they did. Andy asked for some more information, but nothing has been heard yet. The SYSTEM has sent out its second reminder on the 28th of June, IIRC the user has one week to comply - I'll need to check up on this. Waiting for reply (14/6)

Monday 24th June 15.00 BST

17 Open UK tickets this week, most are of the common or garden variety and being handled, or are slow burners with no news. Here's the ones that stand out a bit.

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=94766 (10/6)</br> This ticket was submitted by atlas over UK sites using FTS3. I was privy to some of the NET2 problems which seem to be largely solved (although looking at the savannah ticket Brian submitted not all is well https://savannah.cern.ch/support/?138095). Is there anything the UK needs to do? In progress (24/6)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=94755 (10/6)</br> Errors retrieving job output from the RAL WMS. The user reports that the problem has gone away, and the related SNO+ ticket https://ggus.eu/ws/ticket_info.php?ticket=94543 (set GLITE_LB_EXPORT_PURGE_ARGS variable to "--cleared 2d --aborted 15d --cancelled 15d --done 60d --other 60d") has been solved, so I think this ticket can be closed too. In Progress (can be closed) (24/6) Update - SOLVED
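
For reference, the 94543 fix boils down to setting that variable in the environment the LB/WMS services start with - a minimal sketch, assuming your install picks it up from a profile-style file (the exact file, and the need for a service restart, will depend on the installation):
 # Assumption: appended to whatever environment/profile file your WMS/LB services source
 export GLITE_LB_EXPORT_PURGE_ARGS="--cleared 2d --aborted 15d --cancelled 15d --done 60d --other 60d"
 # The LB services presumably need restarting to pick this up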

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Enabling webdav on the RAL LFC. Catalin has updated the ticket that currently only a read-only version is available. Chris has replied saying that most use-cases that currently want to be tested will only need read-only access. On hold (19/6)

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=94880 (14/6)</br> LHCB jobs are having what appear to be cvmfs troubles at IC. Despite a lot of poking, prodding and trying to replicate the issue Simon and Daniela are a bit stumped. It doesn't happen all the time, and they're running a recent cvmfs release. The investigation continues. In progress (24/6)

cernatschool</br> https://ggus.eu/ws/ticket_info.php?ticket=94731 (RAL)</br> https://ggus.eu/ws/ticket_info.php?ticket=94732 (GLASGOW)</br> Chris has tested both sites with some success (one WMS at RAL isn't working, one at Glasgow still needs to be configured). Nice work by all.

I don't see any exciting Solved Cases this week, and no other tickets have been brought to my attention in the last 7 days.

Monday 17th June 2013 14.30 BST</br> 15 Open tickets for the UK this week.

UK_NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=94780 (11/6)</br> The UK "Cloud Site" has been created (sporting the name 100IT). It currently has no endpoints, but it'll be interesting to see where this goes. In progress (17/6)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=94543 (4/6)</br> SNO+ having WMS problems at RAL (the old failure to retrieve output bugbear). Catalin and the team have kicked things, but would like to see if their efforts have affected things. Waiting for reply (11/6)</br> (also probably linked to https://ggus.eu/ws/ticket_info.php?ticket=94755)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=94732 (7/6)</br> cernatschool enabling on the GLASGOW WMS. Should be sorted, request for testing from the VO. Chris said he would have a go if possible, and also put cernatschool into the CiC portal but IIRC Chris was traveling a lot last week and is away this one. Anyone else a cernatschool member? Waiting for reply (10/6)</br> (see also the Tier 1 version https://ggus.eu/ws/ticket_info.php?ticket=94731)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=94510 (3/6)</br> The lhcb issue with the QM information publishing has been fixed, a fresh ticket (94722) is tracking the uncovered SGE/bdii init script interaction problem. This ticket can be closed (10/6) Update - Dan closed this ticket. Thanks squire!

Tickets of Interest.

Raul kindly brought this EMI3/SL6 ticket to my attention:</br> https://ggus.eu/ws/ticket_info.php?ticket=94878</br> Wherein voms-proxy-info doesn't know to look at the $X509_USER_PROXY variable, which is causing problems and job deaths. A fix has been made but it will be a week or so before it trickles down to the EMI repos.

If anyone else has any tickets of interest crop up feel free to contact me with them.

Monday 10th June 2013 13.00 BST</br> 16 Open UK tickets this week, some of them so fresh the ink is still wet. And then I come in this morning and find a bunch of them solved, nice work! Shame that the tickets keep coming...


NEW</br> https://ggus.eu/ws/ticket_info.php?ticket=94780 (11/6)</br> The NGI has received a request to instantiate a cloud site. Assigned (11/6)

https://ggus.eu/ws/ticket_info.php?ticket=94766 (10/6)</br> The UK has received a ticket about the use of FTS3, which is causing errors for some atlas transfers (specifically to NET2). Brian is on the case. In progress (11/6)

Tier 1</br> https://ggus.eu/ws/ticket_info.php?ticket=94758 (10/6)</br> CMS noted a SAM test failure at RAL, who are already on it (and think it should have been cleared up). Waiting for reply (to confirm) (10/6) SOLVED

https://ggus.eu/ws/ticket_info.php?ticket=94755 (10/6)</br> A user, from an unspecified VO, is having trouble retrieving data from a RAL WMS. A bit of a cryptic ticket. Assigned (10/6)

https://ggus.eu/ws/ticket_info.php?ticket=94505 (3/6)</br> https://ggus.eu/ws/ticket_info.php?ticket=94615 (5/6)</br> These two tickets are almost identical, CMS Hammercloud failures caused by high load on the CMS Castor instance. In progress (10/6) Both SOLVED

https://ggus.eu/ws/ticket_info.php?ticket=94543 (4/6)</br> Sno+ were having problems retrieving their outputs from the RAL WMS. Daniela referenced https://ggus.eu/ws/ticket_info.php?ticket=92288 and gave a possible fix - I don't know if this was tried out. In progress (5/6)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> Correlated Packet Loss on the RAL Perfsonar. This ticket has passed its reminder date, so could do with updating before it gets too whiffy. On hold (19/3)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Enabling webdav support on the RAL LFC. Waiting for a reply offline regarding the expected update to be put into production. On hold (29/5)

https://ggus.eu/ws/ticket_info.php?ticket=94731 (7/6) Request from Chris to enable cernatschool on the RAL WMS. In progress (10/6)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=94510 (3/6)</br> QM were publishing too large a MaxCPUTime, causing lhcb some grief. The interesting part was that Chris found his changes would take effect when he ran a `/etc/init.d/bdii restart`, but not with a `service bdii restart` (or a reboot). Some sge related variables in /etc/profile.d weren't being seen in the latter two instances. Maarten posted a patch to the bdii script, and discussion brought to light an old ticket from Andrew at ECDF: https://ggus.eu/ws/ticket_info.php?ticket=88284. This ticket looks like it can be closed though. In Progress (10/6)
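
The underlying gotcha, as far as I understand it, is that on RHEL/SL the `service` wrapper launches the init script with a scrubbed environment, so anything exported from /etc/profile.d in your login shell (the sge variables in this case) never reaches the daemon, whereas running the script directly does inherit it. A minimal illustration (SGE_ROOT and the path are just example assumptions, not necessarily the variable involved here):
 export SGE_ROOT=/opt/sge        # e.g. set by an /etc/profile.d script in a login shell
 /etc/init.d/bdii restart        # run directly: the script inherits SGE_ROOT from this shell
 service bdii restart            # 'service' strips the environment, so SGE_ROOT is not seen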

https://ggus.eu/ws/ticket_info.php?ticket=94746 (10/6)</br> Biomed complaining as the QM SE published Biomed support when it is supposed to be decommissioned for the VO. Perhaps some yaim artifact has cropped up in a recent reconfigure? Assigned (10/6)

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=94358 (27/5)</br> Biomed complaining about software version tests failing at IC, due to the probe not containing the up-to-date tweak to take into account the tarballs. Daniela links two tickets trying to get this taken into account (one from me - 90768, one from herself - 89891). I also wonder why Lancaster hasn't received one of these tickets. Waiting for reply (10/6)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=94781 (11/6)</br> An Ops test failure, eu.egi.sec.WN-ops. I think you have a glite-version "hack" in your path which is causing the failure. The official way of publishing your version is to have EMI_TARBALL_BASE in your environment - perhaps it disappeared on the move to SL6? Assigned (11/6)
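
If the variable did go missing in the SL6 move, restoring it should just be a matter of exporting it again on the worker nodes - a minimal sketch, with the file and tarball path below being made-up examples rather than anything ECDF-specific:
 # Hypothetical /etc/profile.d/emi-tarball.sh on the worker nodes
 export EMI_TARBALL_BASE=/opt/emi-wn-tarball/current   # point this at your actual tarball install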

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=94732 (7/6)</br> Chris requested cernatschool support on the Glasgow WMS, Gareth has set things up and requests a test run (but noted that he couldn't see them in the CiC portal-and I couldn't either). Waiting for reply (10/6)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=94602 (5/6)</br> Hone were having jobs aborted on one of RALPP's queues. Chris kicked things but problems continue. In Progress (6/6) SOLVED

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=94241 (21/5)</br> Please close the ticket. I draw the line at closing people's tickets. Don't make me cross that line! In Progress (30/5) SOLVED - THANKS!

UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=94247 (21/5)</br> Atlas WLCG squid change over. Ben has installed the new squid, on holding the ticket until ready for a changeover. On hold (but reminder date passed) (30/5)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=94423 (30/5)</br> Atlas space shuffling request, from LOCALGROUP to DATA. Elena reported that juggling tokens was difficult due to a dodgy disk server. In progress (30/5) SOLVED

Stardate 04-06-13-00.04</br> No proper ticket overview this week, as we're defending the Federation from Klingon hackers.

My net connection is a bit ropey at Starfleet's barracks here at Cosener's, but of the 16 UK tickets there's a RHUL and a QMUL one that need attending to (probably as the admins have been busy defending the free galaxy):</br> https://ggus.eu/ws/ticket_info.php?ticket=94521 (RHUL)</br> https://ggus.eu/ws/ticket_info.php?ticket=94510 (QMUL)</br>

There's a ROD ticket I don't really understand (as it concerns Lancaster, but as far as I know we haven't experienced any 72-hour problems):</br> https://ggus.eu/ws/ticket_info.php?ticket=94519

This Sussex ticket concerning Sno+ looks as if it can be closed:</br> https://ggus.eu/ws/ticket_info.php?ticket=94241

Whilst this other Sno+ ticket to Glasgow looks like it could use some love:</br> https://ggus.eu/ws/ticket_info.php?ticket=94213

Could Kashif or another Nagios expert please comment on this Biomed ticket to IC - Biomed are confused about nagios versions and need advice:

If I see the admins listed tomorrow I'll give you a personal prod.

Live long and prosper!

Tuesday 28th May 10.00 BST

17 Open tickets this morning.

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)</br> Chris has had (some) success testing the new myproxy server, and it's now in the gocdb. This ticket could be closed now, or after another round of testing (just to be sure). In Progress (21/5)

https://ggus.eu/ws/ticket_info.php?ticket=93149 (5/4)</br> Atlas cvmfs errors on nodes that were testing a new version of cvmfs. Atlas seem to think things look okay, so it looks like the testing might be successful. On hold (18/5)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=94096 (15/5)</br> RHUL were ticketed over publishing the obsolete "GLUE2EntityCreationTime". Am I the only one to find tickets like this a little confusing? Govind heroically fought the good fight, and this ticket can now be closed. In Progress- can be closed (22/5)

https://ggus.eu/ws/ticket_info.php?ticket=94246 (21/5)</br> https://ggus.eu/ws/ticket_info.php?ticket=94260 (22/5)

Both these tickets concern changing the squid acls over to the new WLCG squid monitoring (for atlas and cms respectively). Govind has had a go, but no luck so far (I know that getting it to work at Lancaster required some fumbling). Waiting for reply/In Progress (22/5)

UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=94247 (21/5)</br> UCL also got a ticket concerning their squid monitoring acls. In Progress (22/5)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=94301 (23/5)</br> Manchester's DPM is misbehaving during atlas transfers, possibly due to database problems (perhaps due to innodb1 being too large). Alessandra is onto it, but any dpm problem is worth keeping an eye on. In Progress (26/5)

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=94241 (21/5)</br> This SNO+ ticket regarding some submission problems to the Sussex CE looks like it can be closed - Emyr enabled glite-lb-locallogger. I think this was a classic case of atlas jobs working when others didn't due to atlas not using the WMS. In progress (can be closed) (22/5)

With 10 minutes to the meeting I think I'll leave it there.


Monday 20th May</br> 21 Open UK tickets today.

Self-service week for tickets as I'm on leave: http://tinyurl.com/cblj3ab

Tarball sites might want to look at https://ggus.eu/ws/ticket_info.php?ticket=94120

Additional from Jeremy:

RALPP need to look at https://ggus.eu/ws/ticket_info.php?ticket=93905

Mark M: There is a need to test the Oxford setup for the new VO https://ggus.eu/ws/ticket_info.php?ticket=93969

Oxford as a CMS T3: On hold pending a CMS update https://ggus.eu/ws/ticket_info.php?ticket=93532

Durham: Mike looking at this LHCb problem https://ggus.eu/ws/ticket_info.php?ticket=92590

Sheffield: Have you progressed at all with the SNOLUS permissions problem? https://ggus.eu/ws/ticket_info.php?ticket=94027

Mark M: You were due to report back in this ticket on your ScotGrid VO move to Manchester VOMS https://ggus.eu/ws/ticket_info.php?ticket=93939

Alessandra: Do we know why this storage ticket did not make it to Manchester admins? https://ggus.eu/ws/ticket_info.php?ticket=94010

QMUL StoRM ticket now extended: https://ggus.eu/ws/ticket_info.php?ticket=93981 (but still red?)

RAL T1: ATLAS job timeouts still under investigation (on hold) https://ggus.eu/ws/ticket_info.php?ticket=93149

RAL T1: Correlated packet loss under investigation (on hold) https://ggus.eu/ws/ticket_info.php?ticket=86152

RAL T1: myproxy server issue. Now in GOCDB? https://ggus.eu/ws/ticket_info.php?ticket=92266

RAL T1: LFC webdav support https://ggus.eu/ws/ticket_info.php?ticket=91658. This ticket should be updated.

Monday 13th May 14.45 BST</br> 16 Open UK tickets this afternoon.</br>

EMI Upgrade</br> QMUL: https://ggus.eu/ws/ticket_info.php?ticket=93981 (10/5)</br> We thought we had the last of these, but sadly Storm has started triggering an EMI1 alert. As Chris is in the unenviable position of having nowhere to upgrade to (a working EMI3 Storm has an ETA of the end of May), upgrading isn't an option. Chris has put this ticket on hold, but it might be that we need to counter-ticket the nagios team (citing the test as "unsuitable") or get testimony from the Storm devs to back our case *if* someone gets shirty about this. Will keep an eye on this one. On Hold (13/10) Update - Daniela has strongly recommended that we launch a pre-emptive counter-ticket at storm and/or nagios.

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=93149 (5/4)</br> Atlas jobs were failing on nodes testing a new version of cvmfs. A new new version was installed on Friday, and appears to be working, which is good news. Testing is still ongoing though. On hold (13/5)

https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)</br> The new myproxy server is up and running, but no feedback has been given on the ticket. Has feedback been given elsewhere? It's likely that we just want to close this as I'm not sure feedback will be forthcoming, any problems with the new service could be handled in a new, fresh ticket. Waiting for reply (22/4)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=93791 (2/5)</br> LHCB jobs were just staying idle at QM, Chris tracked it down to a bug in the CREAM/SGE interaction - lhcb set a memory requirement which cream wasn't passing on correctly. Chris patched his scripts and submitted a ticket (https://ggus.eu/ws/ticket_info.php?ticket=93956). Probably needs to be Waiting for Reply-ed to get the thumbs up from lhcb. In progress (9/5) UPDATE - This ticket has been solved and verified

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=93905 (7/5)</br> Chris B shuffled this CMS ticket over to GGUS shortly before heading on leave - and it's been untouched ever since. Can anyone else in RALPP comment on it? It could be that the problem is no more (I'm ever the optimist!). In progress (7/5) Update - Rob is on it, wonkiness is still seen.

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=92590 (18/3)</br> This ticket is mega-crusty now (I can think of no other phrase to describe it). GGUS ticket monitoring have involved Claire. Let's not have things escalate over such a benign issue. On hold (18/3)

SOLVED CASES (both freshly conquered)</br> https://ggus.eu/ws/ticket_info.php?ticket=93833 (3/5)</br> Imperial were called out over both of their load-balanced site-BDIIs not being in the GOCDB. Daniela solved the ticket, although I admit to being a little confused by the end. It wasn't in the gocdb, but in the general site information that both BDIIs were expected to be published. Things got mixed up in my head.

https://ggus.eu/ws/ticket_info.php?ticket=89804 (18/12/2012)</br> After no word from Stephane or anyone else from atlas, Sam closed this ticket after deleting Glasgow's groupdisk space token. He was certainly in the right, but it would have been nice for Atlas to sanction the deletion. As Sam correctly pointed out, the problem was communication with the Atlas "central" and not the UK Atlas support, who have been nothing but communicative, helpful and generally great.

Tuesday 7th May, 10.00 BST</br> 17 Open UK tickets this week.

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=93828 (3/5)</br> A request has been put in to retire the camont.gridpp.ac.uk VO. Jeremy has provided the necessary confirmation, Robert is onto it. In progress (7/5)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=93870 (6/5)</br> CMS have issued the Tier 1 with a reminder that their squid must be upgraded by the end of May "In accordance with the WLCG mandate for squid monitoring". The local squid expert is on it. In progress (7/6)

https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)</br> RAL has its new myproxy service up and running, but I think it's still wanting a few users to test it out before closing the ticket. Waiting for reply (16/4)

https://ggus.eu/ws/ticket_info.php?ticket=93149 (5/4)</br> atlas were seeing jobs failing at RAL due to cvmfs errors whilst testing out a new version of cvmfs. Tests still failing, is there any feedback from the cvmfs devs? On hold (3/5)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Chris' request to have webdav support on the RAL LFC is still waiting on the improved webdav to be rolled into the production LFC. On hold (3/4)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> "correlated packet-loss on perfsonar host" at RAL. March's intervention changed the picture somewhat, Andrew has stated that he's waiting on the next intervention (scheduled for this month?) before continuing his investigation. On hold (19/3)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=93889 (7/5)</br> Hot off the ticket press, a user from the gocdb crowd would like svn access. Assigned (7/5)

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=93884 (6/5)</br> Ewan ticketed IC over their Top BDII being a bit shaky, Daniela has confirmed that IC had a power cut over the weekend, and is prodding. In progress (7/5)

https://ggus.eu/ws/ticket_info.php?ticket=93833 (3/5)</br> IC are ticketed about their site-BDII not being published as part of their site. Daniela explained that this is due to them using two load-balanced, aliased hosts for their bdii. Stephen B has explained that in this case each bdii should be a resource bdii for the other, to keep things consistent. At least that's how I understood his explanation. In progress (6/5)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=93850 (5/5)</br> Lancaster started failing atlas transfers with stale crl type errors. I gave it a kick; it looks like our fetch-crl crons are dodgy after an update. I'll also continue this with the storage group. In Progress (7/5)
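
For anyone seeing similar stale-CRL complaints, the obvious first checks are that the fetch-crl cron job is still installed and that a manual run succeeds (paths below are the usual EMI/EPEL defaults - adjust to taste):
 # Run fetch-crl by hand, verbosely, to see whether it still works
 /usr/sbin/fetch-crl -v
 # Check that the packaged cron job is still there
 cat /etc/cron.d/fetch-crl
 # And eyeball the age of the CRL files it maintains
 ls -l /etc/grid-security/certificates/*.r0 | head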

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=93493 (19/4)</br> Sam has been trying to identify why one disk server, which should be identical to all his others, isn't working right for biomed (and only biomed). In progress (3/5)

https://ggus.eu/ws/ticket_info.php?ticket=89804 (18/12/2012)</br> Glasgow are just waiting on the green light from atlas to delete the GROUPDISK token, then they can put this ticket to bed. Alessandra has directly involved Stephane to coax a reply. Waiting for reply (2/5)

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=93532 (22/4)</br> CMS were having trouble submitting to Oxford, primarily due to the latter being in the middle of an upgrade to SL6. After a bit of further tweaking things should be fixed, waiting for confirmation from the VO (which might take a while as there's no real work heading to Oxford to act as guinea pig jobs). Waiting for reply (3/5)

https://ggus.eu/ws/ticket_info.php?ticket=93817 (3/5)</br> An atlas user was having trouble getting at his files at Oxford (although Alastair could get to them okay). Ewan is on it. In progress (3/5)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=93791 (2/5)</br> LHCB pilots were becoming stale at QM, Chris had seen a problem and was onto it. In progress (2/5)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=92590 (18/3)</br> LHCB pilots were being aborted at Durham, but this was a while ago now. The problem might have fixed itself (they sometimes do that), or at least it should be looked at again - at last check Durham were doing quite well. On Hold (2/4)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=89751 (17/12/12) Chris' ticket to RHUL about MTU path discovery not working. There was a bit of hope that this was fixed, but it wasn't. It's now back to the RHUL network admins and JANET. On hold (1/5)

TICKETS SHOW AND TELL</br> Raul brought two of his tickets to our attention:</br> https://ggus.eu/tech/ticket_show.php?ticket=88280</br> Where Raul had some problems with his Cream that were solved by clearing the argus pepd cache. Might not be a problem for those moving to EMI3, but the commands to force a reload after a policy change seem handy:</br> /etc/init.d/argus-pdp reloadpolicy</br> /etc/init.d/argus-pepd clearcache

https://ggus.eu/tech/ticket_show.php?ticket=93452</br> Where Raul had certificate "flavour" problems on his new CE - possibly another case where doing something exactly as you did on SL5 doesn't always get the intended results on SL6?

Thanks Raul!

Monday 29th April 2013, 14.45 BST</br> Only 17 open tickets assigned to the UK NGI this week. Make that 16 open.

EMI UPGRADE SEASON.</br> No doubt this will be covered elsewhere in the meeting, but with the deadline imminent it doesn't hurt repeating ourselves over this.

RALPP: https://ggus.eu/ws/ticket_info.php?ticket=93676 (26/4)</br> The site got re-ticketed about this on Friday, and the chaps might not have noticed it yet. As Daniela pointed out, if this is the red herring that it looks to be we need to counter-ticket the EU Nagios by the end of the month to avoid the ban-hammer. Assigned (26/4) Update - Chris, Stephen and Daniela are in discussion about what to do - it could be the nagios caching things it shouldn't or related to cern bdii problems - https://ggus.eu/ws/ticket_info.php?ticket=93650

GLASGOW: https://ggus.eu/ws/ticket_info.php?ticket=93632 (24/4)</br> Glasgow closed their other tickets, but got a new one about their WMSii for their trouble (no rest for the wicked?). Gareth has stated their plan to take these EMI1 WMS down on the 30th, to be brought back when/if the rebuild troubles they've seen can be worked out.

MUNDANE TICKETS</br> gridpp.ac.uk</br> https://ggus.eu/ws/ticket_info.php?ticket=93337 (15/4)</br> This ticket still looks like it's solved, if no one objects I'll close it myself (I assume the solution was "updated certificates on the web server"?). In progress (can be closed) (23/4)

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=92306 (7/3)</br> Setting up the earthsci VO. Robert has asked for David and Gareth's e-mail addresses to use for the VO records. Waiting for reply (24/4)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=93654 (25/4)</br> Chris has put in a request to have the T2K LFC at RAL upgraded from a "local" to a "global" LFC. The RAL team are on it. In progress (26/4)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=93642 (24/4)</br> I've singled out this ticket for two reasons - one is that you should discourage VOs from piling extra, unrelated issues onto an existing ticket. The second is that sites should remember that tickets that are "re-opened" on you still need to have their statuses changed once they land back in your lap. (The ticket is also technically interesting as it codifies the problems Glasgow have been seeing with pile jobs on their many-core nodes, but this has been discussed in the atlas UK meetings). In progress (29/4) Update - the storm has passed and things have calmed down, Elena closed the ticket.

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=93532 (29/4)</br> I think this CMS ticket can be put to Waiting for Reply now that you have your SL6 nodes working, but I'm not sure enough to interfere with it myself. On hold (29/4)

No Tickets in the solved pile catch my eye.

Tickets of Interest.

https://ggus.eu/tech/ticket_show.php?ticket=93701</br> Chris ticketed the argus unit requesting a man page for pap-admin. The ticket was promptly closed (unsolved) with "not enough man power to produce a man page" - further stating that the help command should be sufficient. A little concerning.

https://ggus.eu/ws/ticket_info.php?ticket=92498</br> The ticket covering Chris and his EMI3 APEL migration. As I'm going to have to migrate to EMI3 soon for the improved LSF support this is very relevant to my interests.

In fact let's keep going with a few more of Chris' tickets...

https://ggus.eu/tech/ticket_show.php?ticket=91587</br> Memory Leak in BUpdaterSGE. Chris upgraded to EMI3 and still sees the issue.

https://ggus.eu/tech/ticket_show.php?ticket=88976</br> "glite-wn-info doesn't list any conf files" I think that this was supposed to be fixed in EMI3 WN, but there has been deathly silence from the WN devs (are there any now?).

Monday 22nd April 2013, 15.00 BST</br> Only 20 Open tickets this week.

gridpp.ac.uk</br> https://ggus.eu/ws/ticket_info.php?ticket=93337 (15/4)</br> The user's problem accessing the gridpp website has been solved, the ticket can be closed. In progress (can be closed) (16/4)

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=92306 (7/3)</br> The earthsci VO has been deployed to all the UK VOMS servers. Steve J has asked the VO a question about becoming an Approved VO (tm); not sure if this has been received by the VO/Mark Mitchell. Otherwise, if the voms is working for the VO this ticket can be closed. In progress (9/4)

EMI1</br> The argus EMI1 alerts have started showing up:</br> GLASGOW: https://ggus.eu/ws/ticket_info.php?ticket=93407</br> Gareth can't find any EMI1-ness about their argus box, it is running EMI2, and manually running the ldapsearch matching the test supports this. Waiting for reply.</br> BRUNEL: https://ggus.eu/ws/ticket_info.php?ticket=93406 </br> Raul has put down a comprehensive reply, although again the offending server is EMI2 (although it's being/has been shut down). A service which should fail the tests (which is scheduled for an upgrade) managed to sneak under the radar. Raul solved this ticket whilst I was typing.

And there's still the two DPM tickets:</br> GLASGOW: https://ggus.eu/ws/ticket_info.php?ticket=92805</br> DURHAM: https://ggus.eu/ws/ticket_info.php?ticket=92804</br> Both these tickets could really do with an update from their respective sites.

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)</br> The new myproxy service at the Tier 1 is up and ready for testing: myproxy.gridpp.rl.ac.uk. It's not in the gocdb yet, but feel free to give it a whirl. Waiting for reply (feedback) (16/4)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=93416 (17/3)</br> Biomed reported seeing nagios job failures at RALPP, it turned out that the CE was having problems due to a flood of biomed jobs. Biomed have asked for the user's DN so that they can have a word. In progress (18/3)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=93493 (19/4)</br> Another biomed ticket concerning their nagios jobs, which were failing on some Glasgow CEs. It looks like the problem is at their end, with their proxies expiring and their lfc not working, but they seem to have got confused. Waiting for reply (21/4)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=89751 (17/12/12)</br> Govind had been informed that the path MTU discovery to RHUL should work, but Chris has reported that he still sees the problem. It might be useful to find out where the external box that the RHUL network admins used for their test resides. In progress (17/4)


Monday 15th April 2013 14.30 BST</br> 26 Open UK tickets this week, most seem in hand. Here are the ones that jump out. I have an ill-timed appointment at the vet's so I might not make it to the meeting in time, but the important bits are the gridpp.ac.uk ticket, and the remaining 3 EMI1 upgrade tickets which are in need of updating by the corresponding sites (Glasgow, Durham, RALPP).

NGI/gridpp.ac.uk</br> https://ggus.eu/ws/ticket_info.php?ticket=93337 (15/4)</br> This one stumped me about where it should be sent to, the submitter is having cert problems with the gridpp.ac.uk website- possibly due to the CA certs being out of date. Assigned (15/4) Update- Andrew sorted this out, and the user reports problem solved. Looks like this can be closed.

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=93343 (15/4)</br> This ticket has been assigned to NGS-GLASGOW, which I'm almost certain is wrong - can one of the Glasgow chaps check and reassign to themselves if I'm right. Assigned (15/4) Update- Gareth solved this one.

EMI1 Upgrade.</br> Only the DPM tickets at Glasgow and Durham, and the dcache ticket at RALPP, remain. There are special circumstances around all of them (DPM and dcache versioning is quite separate from the EMI number) but all three have requests for updates on them.</br> GLASGOW: https://ggus.eu/ws/ticket_info.php?ticket=92805</br> DURHAM: https://ggus.eu/ws/ticket_info.php?ticket=92804</br> RALPP: https://ggus.eu/ws/ticket_info.php?ticket=91997 Update - Chris has solved the ticket; although there are still errors on the dashboard, everything is upgraded.

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=92969 (29/3)</br> Biomed reported seeing negative used space values for the RHUL dpm. Govind attempted to apply the old patch and failed, and has opened a new ticket with the DPM devs: https://ggus.eu/tech/ticket_show.php?ticket=93026 In Progress (might want to On Hold if a new patch looks slow in coming) (10/4)

Of interest:</br> https://ggus.eu/ws/ticket_info.php?ticket=92498</br> I overlooked this one last week, but QMUL's ticket charting their upgrade to EMI3 APEL might be of interest.

MONDAY 8th APRIL 15.00 BST</br> 27 Open UK tickets this week, and as it's the first working day of the month, we have the joy of looking at all of them.

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=93142 (5/4)</br> The UK ROD is being hauled over the coals for not handling recent tickets "according to escalation procedure". I suspect all the tickets referred to are EMI1 upgrade ones, so justifying ourselves should be straightforward. Assigned to ngi-ops. (8/4)

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=92306 (7/3)</br> Rolling out voms support for the new, Glasgow-based earthsci vo. After some discussion on domain naming it was decided to go with the vo name earthsci.vo.gridpp.ac.uk. It has been deployed at Manchester, Oxford and IC, so I assume the next step is testing it. In progress (4/4)

EMI 1 UPGRADE TICKETS:</br> RALPP https://ggus.eu/ws/ticket_info.php?ticket=91997 (On hold, extended 5/4)</br> Chris has put back the dcache upgrade a bit, but it seems in order. The last other EMI1 holdout was being drained for upgrade last week.

GLASGOW https://ggus.eu/ws/ticket_info.php?ticket=91992 (In progress, extended 5/4)</br> Not much word from the Glasgow lads in a while (since 11/3), but they only had a few holdouts left.</br> https://ggus.eu/ws/ticket_info.php?ticket=92805 (On hold)</br> Glasgow's DPM ticket (despite their DPM technically being up to date) - Sam hopes to "update" when DPM 1.8.7 comes out, but if that looks unlikely in the time frame Sam will reinstall the DPM rpms to simulate an upgrade.

SHEFFIELD https://ggus.eu/ws/ticket_info.php?ticket=91990 (On hold, extended 5/4)</br> Just some worker nodes left at Sheffield. Looking good. (But Elena has some publishing issues-see TB-SUPPORT).

BRUNEL https://ggus.eu/ws/ticket_info.php?ticket=91975 (On hold)</br> Raul upgraded his CE, only to find that the nagios tests hadn't picked up the upgrade! Daniela suggests a site BDII restart. Update - Raul seems to have figured out an arcane way of getting the publishing to work by yaiming twice then restarting the site BDII, as sketched below.
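For the record, the sequence Raul describes amounts to something like the following (purely illustrative - the node type and site-info.def path will vary by site, and whether the second YAIM run is really needed or just superstition is anyone's guess):

 # Reconfigure the CE with YAIM (twice, per Raul's workaround)...
 /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n creamCE
 /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n creamCE
 # ...then restart the site BDII so the refreshed values get picked up.
 service bdii restart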

DURHAM https://ggus.eu/ws/ticket_info.php?ticket=92804 (In progress, extended 5/4)</br> Not much news from Mike about this in the last few weeks- I think that he's in the same boat as Sam - technically up to date (just from the "wrong" repo).

COMMON OR GARDEN TICKETS:

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=92688 (20/3)</br> Brian asked for a data dump, Ewan provided two! Ewan has left the ticket open whilst atlas decide what to do with the information. Waiting for reply (2/4)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=89804 (18/12/2012)</br> Moving atlas data from the groupdisk token. Last word was from Stephene on the 3/3, asking for a dump of what remains. I think that the conversation has moved offline to expedite things. How goes it? On hold (3/3)

https://ggus.eu/ws/ticket_info.php?ticket=92691 (20/3)</br> Glasgow supplied Brian with a list of all the files on the SE, Brian has given back a list of all the "dark data" files that they couldn't delete remotely. In progress (8/4)

https://ggus.eu/ws/ticket_info.php?ticket=93036 (2/4)</br> Glasgow were being bitten by stage-in failures after disk server stress killed the xrootd service on a node. Measures have been put in place to stop this happening again, and Sam has said some wise words on this issue (as it was data-hungry production jobs that caused the deadly stress). Sam suggests that it would be beneficial to have these data-hungry production jobs flagged in some way, so that they can be treated similarly to how analysis jobs are (staggered starts, limiting the maximum number running etc.) In progress (5/4)

This raises the question, is it likely that suggestions put in a ticket like this would work their way up the chain to someone who could act on them?

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=92590 (18/3)</br> lhcb were having what looks like authorisation problems at Durham. Not much news on the ticket since then, does the problem persist? On hold (2/4)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=93179 (8/4)</br> atlas would like 5TB shuffled from localgroupdisk to datadisk. Assigned (8/4)

LIVERPOOL</br> https://ggus.eu/ws/ticket_info.php?ticket=93160 (7/4)</br> Atlas were suffering transfer failures, which puzzled the Liver lads as their logs showed the transfers succeeding. It could have been a problem with the University firewalls - the timing of the problems coincided with a change in the Uni firewall. These have been reverted so let's see if things go back to normal. In progress (8/4)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=91304 (8/2)</br> LHCB jobs were running in the tidgey home partition on the Lancaster shared cluster. I've tried to put in place a job wrapper that cds to $TMPDIR, but no joy - not sure what I'm doing wrong. On hold (27/3)
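For what it's worth, the wrapper itself is nothing more exotic than the sketch below (hypothetical - the names are illustrative); the hard part on LSF is getting the batch system to actually invoke it for the right jobs, e.g. as a JOB_STARTER in lsb.queues, which is the sort of place this can go wrong:

 #!/bin/bash
 # Hypothetical job starter: hop into the per-job scratch area before
 # running the real payload, so nothing gets written to the shared home.
 cd "${TMPDIR:-/tmp}" || exit 1
 exec "$@"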

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=89751 (17/12/12)</br> Path MTU discovery problems for RHUL. Passed to the networking chaps and Janet, this may be a long time in the solving. On hold (28/1)

https://ggus.eu/ws/ticket_info.php?ticket=92969 (29/3)</br> Biomed are reporting seeing negative space on the RHUL SE- an old bugbear resurrected. In progress (1/4)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=93180 (8/4)</br> QM got a nagios ticket for the recent APEL troubles, Dan rightfully cited the apel ticket. In progress (8/4)

https://ggus.eu/ws/ticket_info.php?ticket=92951 (29/3)</br> Atlas transfer failures, caused by a crash in a disk storage node. Reopened after the initial fix, it looks like a lustre bug is plaguing the QM chaps. Currently they're hoping for a bug fix, or else they'll need to roll back. In progress (8/4)

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Chris requesting webdav support on the RAL LFC. The RAL team are waiting on the next lfc version with better webdav support to come out in production. On hold (3/4)

https://ggus.eu/ws/ticket_info.php?ticket=91029 (30/1)</br> Long standing ticket concerning the srm troubles with certain robot DNs. No fix is likely in the near future. On hold (27/2)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/12/12)</br> Correlated packet loss on the RAL perfsonar. The picture looks improved after last month's intervention, but still needs understanding. Proposed to wait until after the May intervention before looking at this hard again. On hold (27/3)

https://ggus.eu/ws/ticket_info.php?ticket=93136 (5/4)</br> epic VO having trouble downloading output from the RAL WMS. Most likely related to known problem https://ggus.eu/ws/ticket_info.php?ticket=92288 (submitted by Jon from t2k). In progress (5/4)

https://ggus.eu/ws/ticket_info.php?ticket=93149 (5/4)</br> Obviously Friday was the day of tickets. atlas were seeing a large number of cvmfs related cmtside failures. These nodes were testing the latest cvmfs 2.1.8, and have been rolled back. Waiting for reply (8/4)

https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)</br> RAL were having problems with their myproxy aliases not matching up to their myproxy's certs. After trying a few fixes the RAL guys are setting up a new machine whose hostname and certificate match. Aim to have this done within a fortnight. In progress (28/3)

APEL</br> Just in case you guys haven't been reading TB-SUPPORT, the ticket tracking the current APEL problems:</br> https://ggus.eu/ws/ticket_info.php?ticket=93183


Monday 1st April 20:00 BST</br> 27 Open UK tickets, but we'll have to wait until next week for a full review of them all as Matt's on leave this week and sending his apologies for tomorrow's meeting - nothing's striking him as urgent although someone on the ROD/Ops team might want to look at https://ggus.eu/ws/ticket_info.php?ticket=92512 (Wahid has set it to waiting for reply, there might be some confusion over who needs to do the replying).

In the meantime, if you aren't on leave too then please have a gander at your site's tickets and see if there's aught that needs your attention: http://tinyurl.com/cblj3ab

Otherwise he'll catch y'all next week, by then hopefully he will have stopped referring to himself in the third person again.

In other news:

EMI-3 Storm is not production ready: https://ggus.eu/tech/ticket_show.php?ticket=92819


Monday 18th March 15.00 GMT</br> Only 26 open tickets this week, although a lot of them are "interesting".

NGI/ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=92512 (14/3)</br> ECDF have been accused of being in an UNKNOWN state for 22% of February. Wahid, pretty sure that ECDF's state through the month was fairly known, has questioned these results. Waiting for reply (18/3)

NGI/UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=92412 (11/3)</br> Jeremy has given a reply to the EGI COD giving the reasons why UCL shouldn't be suspended. Waiting for reply (18/3)

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=92306 (7/3)</br> Still no reply from the earthsci guys over the state of their corresponding domain name. As we might be playing Chinese whispers, with Mark asking for the creation on the proto-VO's behalf, things are likely to go slowly. Waiting for reply (11/3)

EPIC GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=91687 (21/2)</br> The epic VO testing problems have officially bounced to Glasgow, I know the chaps are investigating the UI oddness but they haven't accepted the reassigned ticket yet (and of course once this is fixed there still might be a problem). Assigned (18/3)

EMI 1 UPGRADE</br> Just a reminder of who has a ticket:</br> BIRMINGHAM, GLASGOW, SHEFFIELD, BRUNEL, RAL TIER 1, RALPP, BRISTOL and RHUL. Things are chugging along, with sites making various levels of progress. The only worry is RHUL, still no reply on their ticket: https://ggus.eu/ws/ticket_info.php?ticket=92111.

ATLAS DATA MOVEMENT</br> RALPP: https://ggus.eu/ws/ticket_info.php?ticket=90244</br> GLASGOW: https://ggus.eu/ws/ticket_info.php?ticket=89804</br> The last word on both these tickets was from atlas, but maybe the conversation has moved offline? Either way both are nearly done by the looks of it.

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)</br> Chris W ticketing the Tier 1 over the mismatch between the myproxy server's hostname and certificate. There was an attempt to switch over to a matching certificate today, but that caused a failure in retrieving existing credentials. Plan B is to create a new MyProxy; Chris asks if the CA can't issue a multi-alias certificate? In progress (18/3)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Webdaving the Tier 1 LFC. Catalin asked if EMI3 had the latest and greatest webdav support in it, Ricardo reports that sadly it does not. In progress (15/3)

https://ggus.eu/ws/ticket_info.php?ticket=91146 (4/2)</br> Atlas ticketing the Tier 1 over their network bandwidth. The picture is much improved, and as atlas are confident that this will continue being looked at they are happy to close the ticket. In progress (14/3)</br> (does this mean that the perfsonar problems in https://ggus.eu/ws/ticket_info.php?ticket=86152 are also likely to be fixed?).

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=92299 (7/3)</br> Stephen B has noticed that this biomed publishing problem seems to have evaporated. In progress (probably can be closed) (18/3)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=92444 (12/3)</br> LHCB problems at QM; some issues were found and fixed - waiting to see if the VO still sees failures (you guys forgot to set "Waiting for reply"). Implied "Waiting for reply" (13/3)

https://ggus.eu/ws/ticket_info.php?ticket=92158 (5/3)</br> Hone have given their blessing to close this ticket concerning hone job problems. In progress (15/3)

Tickets spotted by Stephen B</br> https://ggus.eu/tech/ticket_show.php?ticket=92585</br> Steve's (the other Steve) ticket to the argus chaps concerning their EMI3 argus problems and the long standing (but soon to be extinct?) emailaddress-in-the-cert problems.

https://ggus.eu/ws/ticket_info.php?ticket=90328</br> Winnie fixed the Storm publishing problems after getting in touch with the Storm developers - as Stephen pointed out it looks to be a Storm bug that only the devs know about. There's an important lesson to be learned here, and from EFDA-JET's similar ticket from a few weeks back - if the solution is non-obvious then don't be hesitant to ask the relevant devs!


Monday 11th March 2013 14.30 GMT.</br> 34 Open UK tickets this week. A quarter of them are EMI1 upgrade tickets, which are largely in hand.

NGI/UCL</br> https://ggus.eu/ws/ticket_info.php?ticket=92412 (11/3)</br> UCL is being threatened with suspension unless the NGI intervene within 10 days. I've assigned this to the ops team mailing list as NGI tickets can sneak under the radar. In Progress (11/3)

VOMS</br> https://ggus.eu/ws/ticket_info.php?ticket=92306 (7/3)</br> vo.earthsci.ac.uk would like to be added to the VOMS server. Robert has noticed that the domain earthsci.ac.uk hasn't been registered, so the name can't be used, and asks if the VO plans to register it. Waiting for reply (8/3)

EMI 1 RETIREMENT TICKETS</br> I won't go over them all; here are the two that stand out, as everyone else has laid out a plan.</br> https://ggus.eu/ws/ticket_info.php?ticket=92111 (RHUL) - Still just assigned.</br> https://ggus.eu/ws/ticket_info.php?ticket=91995 (Bristol) - I'm not sure how much Bristol need to upgrade (I think it's just their BDII), but no plans from Winnie yet.

For the sites needing to upgrade WMSii Daniela reported that it went quite smoothly for her using the instructions from http://www.eu-emi.eu/products/-/asset_publisher/1gkD/content/wms-1.

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)</br> Chris has ticketed the Tier 1 over their myproxy server's certificate. Stephen B references https://ggus.eu/ws/ticket_info.php?ticket=92065 (from Daniela), and the Tier 1 chaps aim to replace the host certificate on or about the 18th of March. In Progress (8/3)

https://ggus.eu/ws/ticket_info.php?ticket=91687 (21/2)</br> EPIC vo support on the RAL WMS. The VO has been enabled, but Tom is having problems. Could someone who admins the Scotgrid UI have a check that the EPIC vo gubbins are set up correctly please. In progress (7/3)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Request for Webdav support on the RAL LFC. Ricardo reports that the new version being waited on is in epel-testing and awaiting validation - if you're feeling brave it can be tested. In progress (5/3)

https://ggus.eu/ws/ticket_info.php?ticket=90528 (17/1)</br> Sno+ jobs not being assigned to Sheffield by one of the RAL WMSes. James from Sno+ has got back to us, they're okay with just using the working WMS. If this can be set up (if it hasn't already) the ticket can probably be closed. In progress (6/3)

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=92190 (6/3) LHCB saw cvmfs-related job failures, and Andrew and Alessandra identified the problem as cvmfs outgrowing its cache on several nodes. The caches have been increased and a savannah ticket filed with cvmfs. Looks like the ticket can be closed, but others might want to watch out for this. In progress (9/3)

SHEFFIELD</br> https://ggus.eu/ws/ticket_info.php?ticket=92299 (7/3)</br> Biomed are seeing invalid publishing for their VO at Sheffield. It could be that you're seeing the same problems that they saw at JET (https://ggus.eu/ws/ticket_info.php?ticket=88227) and fixed with an update. In progress (8/3)

Of interest to WMS admins:</br> Daniela brought this ticket to my attention:</br> https://ggus.eu/tech/ticket_show.php?ticket=92288</br> This problem sounds like it could be a right pain in the bum; from Daniela: "I would consider this a rather major bug, it wipes all done jobs from the LB every night, as a bonus leaving all the crud (sandboxes) on the WMS lying around without the users being able to retrieve them. (Though the fix is simple.)"

Monday 4th March 2013 14.45 GMT</br> 38 Open UK tickets today. All was going smoothly until the EMI1 tickets hit us, still the reply to them was swift from sites. It's the start of the month, so I need to take a break from Spring cleaning my desk (the horrors that I have seen) and take a look at all the tickets.


EMI 1 Tickets:</br> (I won't go into much detail as they're likely to be talked about elsewhere and they only came out this morning.)

RALPP https://ggus.eu/ws/ticket_info.php?ticket=91997 (In progress) - Plan in place

OXFORD https://ggus.eu/ws/ticket_info.php?ticket=91996 (In Progress) - Is the deadline to upgrade the end of April, or do we need to be sorted before then?

BRISTOL https://ggus.eu/ws/ticket_info.php?ticket=91995 (In progress) - Winnie has asked for clarification for what's going on.

BIRMINGHAM https://ggus.eu/ws/ticket_info.php?ticket=91994 (In progress) - Mark will get onto this as soon as Birmingham's AC starts behaving.

GLASGOW https://ggus.eu/ws/ticket_info.php?ticket=91992 (In progress) - There are some red herrings at Glasgow due to hanging CE bdiis. Just the WMSes and LB to go, these are being handled.

SHEFFIELD https://ggus.eu/ws/ticket_info.php?ticket=91990 (In progress) - Elena plans to upgrade this month.

RHUL https://ggus.eu/ws/ticket_info.php?ticket=91987 (Assigned)</br> https://ggus.eu/ws/ticket_info.php?ticket=91982 (Assigned)</br> https://ggus.eu/ws/ticket_info.php?ticket=91981 (Assigned)</br> (Poor RHUL getting 3 tickets - I assume this is the ROD dashboard being silly as Daniela mentioned)</br> The real ticket: https://ggus.eu/ws/ticket_info.php?ticket=92111

LIVERPOOL https://ggus.eu/ws/ticket_info.php?ticket=91984 (In progress) - The Liver lads are working on it.

QMUL https://ggus.eu/ws/ticket_info.php?ticket=91980 (In Progress) - Chris has updated his BDII, so hopefully things will be sorted.

IC https://ggus.eu/ws/ticket_info.php?ticket=91978 (In Progress) - wms updated, last CE has a scheduled downtime, um, scheduled.

BRUNEL https://ggus.eu/ws/ticket_info.php?ticket=91975 (In Progress) - Raul plans to upgrade things at the end of the month. He asks about dangers upgrading the CE from EMI1 to 2 - Daniela replies that the DB change means that it's recommended to drain your CE first.

TIER 1 https://ggus.eu/ws/ticket_info.php?ticket=91974 (In Progress) - The team plan to have all services updated by the end of March.


Atlas data moving tickets:</br> https://ggus.eu/ws/ticket_info.php?ticket=90242 (Lancaster)</br> https://ggus.eu/ws/ticket_info.php?ticket=90243 (Liverpool)</br> https://ggus.eu/ws/ticket_info.php?ticket=90244 (RALPP)</br> https://ggus.eu/ws/ticket_info.php?ticket=90245 (Oxford)</br> https://ggus.eu/ws/ticket_info.php?ticket=89804 (Glasgow)</br>

Nearing the end of these. Lancaster and Oxford are down to their last few files (which might need to be manually fixed at the site end- the one left at Lancaster is lost for good). RALPP similarly have dark data files that might need to be cleaned up locally. Liverpool are waiting on atlas after giving them a new list of files. Glasgow have been asked for a fresh file dump.


The Rest:

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=91687 (21/2)</br> Support for the epic VO on the RAL WMS. Request for pool accounts went out but no word since. In progress (21/2)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Request from Chris W for webdav redirection support on the RAL LFC. As reported last week waiting on the next release which has better, stronger, faster webdav support in it. In Progress (22/2)

https://ggus.eu/ws/ticket_info.php?ticket=91146 (4/2)</br> atlas tracking RAL bandwidth issues. The ticket was waiting on last week's downtime to hopefully sort things out. Did the picture improve? In progress (12/2)

https://ggus.eu/ws/ticket_info.php?ticket=91029 (30/1)</br> Again from atlas, this is the ticket about FTS queries failing for some jobs involving users with odd characters in the name. A fix either needs to be implemented by the srm developers or atlas need to work around it by changing their robot DNs. On hold (27/2)

https://ggus.eu/ws/ticket_info.php?ticket=90528 (17/1)</br> Sno+ Jobs weren't making their way to Sheffield, tracked to a problem with one wms. As the cause of the problem is unknown and completely unobvious it was suggested to restrict Sno+ jobs to the working WMS, but still no reply from Sno+. Waiting for reply (19/2)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> Correlated packet loss on the RAL Perfsonar host. Did last week's network intervention fix things? Or maybe the problem evaporated (I'm ever the optimist)? On hold (16/1)

IMPERIAL</br> https://ggus.eu/ws/ticket_info.php?ticket=91866 (28/2)</br> It looks like atlas jobs were running afoul of some cvmfs problems on some nodes. They've been given a kick, it's worth seeing if the problem has gone away. In progress (28/2)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=91792 (26/2)</br> Atlas thought that they had lost some files, but it turns out that they just had bad permissions on a pool node (root.root) - the problem's been fixed and Sam is investigating with his DPM hat on, whilst checking the filesystems for more possible bad files. In progress (4/3)

https://ggus.eu/ws/ticket_info.php?ticket=90362 (13/1)</br> All Glasgow's CEs have been switched over to use the GridPP voms server for ngs.ac.uk, they just need some testing. Solved (4/3).

SHEFFIELD https://ggus.eu/ws/ticket_info.php?ticket=91770 (25/2)</br> lhcb complaining about the default value being published for Max CPU time. No news from Sheffield beyond the acknowledgement of the ticket. In Progress (25/2)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=91745 (24/2)</br> enmr.eu having trouble with lcg-tagging things at Durham. Mike gave this a kick, and asked if the problem has gone away. Waiting for reply (25/2)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=91711 (21/2)</br> atlas having trouble copying files into RHUL. It's being looked at but PRODDISK and ANALY_RHUL have been put offline. In Progress (28/2)

https://ggus.eu/ws/ticket_info.php?ticket=89751 (17/12/12)</br> Path MTU discovery problems to RHUL. On hold since being handed over to the Network guys, who were following it up with Janet. On hold (28/1)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=91304 (8/2)</br> LHCB having trouble on one of Lancaster's clusters as they like to run their jobs in the home directory rather than $TMPDIR. Forcing this behaviour is harder than it should be in LSF, so it looks like we're going to have to relocate the lhcb home directories. In Progress (1/3)

https://ggus.eu/ws/ticket_info.php?ticket=90395 (14/1)</br> dteam jobs failed at Lancaster, due to our old CE being rubbish. It's since been reborn with new disks, but embarrassingly I haven't found the time to set a UI up for dteam and test it myself (which I intend to do as part of testing the UI tarball, but that's a whole other story). In progress (18/2)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=90878 (27/1)</br> lhcb were having problems with cvmfs at Edinburgh, but the fixes attempted can't be checked due to dirac problems at the site. In progress (could be knocked back to waiting for reply) (28/2)

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=90328 (11/1)</br> The Bristol SE is publishing some odd values - zero used space. Waiting on another, similar ticket (90325) to be resolved. On hold (11/2)

https://ggus.eu/ws/ticket_info.php?ticket=90275 (10/1)</br> The CVMFS taskforce have asked for Bristol's CVMFS plans. One Bristol CE has been migrated to using it, with one left to go. On hold (5/2)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=88227 (6/11/2012)</br> Jet have exhausted all options trying to fix this biomed job publishing problem. They're looking at reinstalling the CE to fix it, which seems like using a sledgehammer to crack a walnut (but I don't have any better ideas). On hold (25/2) Daniela suggests assigning the issue to the developers.


Monday 25th February 2013 15.00 GMT</br> 30 open tickets for the UK this week. As usual everyone's doing a good job of keeping things tidy.

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=91687 (21/2)</br> Of interest- the epic.vo.gridpp.ac.uk request for access to the RAL WMS. The RAL chaps are working on it. In progress (21/2)

https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)</br> Chris W has asked for webdav access to the RAL LFC (essentially a request for an upgrade). Wahid has commented that this might want to wait a couple of weeks for the next LFC release- which the Tier 1 will likely do. In progress (22/2)

https://ggus.eu/ws/ticket_info.php?ticket=91029 (30/1)</br> FTS problem when dealing with atlas robot certificates. A fix on the srm side isn't going to be coming anytime soon, and unless atlas want to switch to colon-less DNs for their robot names this ticket will probably need to be on-holded - but it might be worth asking atlas if they're willing to change the robot DNs. In progress (25/2)

https://ggus.eu/ws/ticket_info.php?ticket=90528 (17/1)</br> Sno+ jobs submitted to the RAL WMSes aren't being sent to Sheffield. Further investigation reveals that it works for lcgwms03, but not 02. As the digging turned up no earthly explanation for this behaviour, Catalin proposed limiting SNO+ to the working wms. Waiting for reply (19/2)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=91439 (12/2)</br> The atlas transfer errors have come back, although atlas forgot to switch the ticket back from "Waiting for reply" after they replied. Waiting for reply (24/2)

https://ggus.eu/ws/ticket_info.php?ticket=90362 (13/1)</br> Switching the ngs VO over to the GridPP VOMS server. The last Glasgow CE should be switched over now, Gareth has asked for a test. Waiting for reply (25/2)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=91377 (11/2)</br> This atlas transfer failure ticket is looking quite neglected, last reply was from atlas a while ago confirming that the errors still existed - it's likely worth asking again if the problem persists. In progress (13/2)

SOLVED?</br> Chris brought this t2k ticket back to my attention last week:</br> https://ggus.eu/ws/ticket_info.php?ticket=90235 (solved by the SYSTEM on 30/1, should it have been? edit - reading up the ticket I see that it was solved by Catalin, my eyeballs failed me)</br> and its "parent" ticket:</br> https://ggus.eu/ws/ticket_info.php?ticket=89105</br> It regards WMSs failing to renew proxies. I can't say I have a clue what's going on, but ticket 89105 has been reassigned to "Operations" and hasn't been picked up by anyone - it's been looking very neglected for the last month.</br> If the original problem is still going then we will need to make some noise about this.

Monday 18th February 15.00 GMT</br> 33 open tickets for UK sites this week. Only 1 is "green" - 2 are "yellow" and the rest are "red".

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=91146 (4/2)</br> Atlas RAL bandwidth issues.</br> https://ggus.eu/ws/ticket_info.php?ticket=91029 (30/1)</br> Atlas having problems querying the FTS.</br>

Both these tickets are waiting on something happening/being fixed in the (hopefully) not too distant future, so they probably should be On Holded.

https://ggus.eu/ws/ticket_info.php?ticket=90528 (17/1)</br> Sno+ jobs don't get sent to Sheffield from the Tier 1 WMSes. Matt M has provided additional information. In progress (14/2)

CAMBRIDGE</br> https://ggus.eu/ws/ticket_info.php?ticket=91582 (17/2)</br> Atlas have used up all their data disk at Cambridge, then ticketed the site about it. John has set the ticket to In Progress, confirming the situation (the space is used up, there's no accounting problem or dead disk server). At the very least I think this ticket should be set to "Waiting for Reply" (When are you going to clean up your data?), or even have the ticket bounced to the Atlas DDM people. In progress (17/2)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=91439 (12/2)</br> Atlas had transfer problems at Glasgow, fixed but errors continue due to a problem at FZK. Bear that in mind if you get an atlas ticket this week. In Progress, perhaps can be Waiting for Reply/FZK to fix themselves (18/2)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=91377 (11/2)</br> Atlas replied to say that they were still seeing transfer problems, although this was a couple of days ago. In progress (13/2)


ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=90878 (27/1)</br> This lhcb ticket concerning cvmfs problems (which weren't cvmfs problems after all) is looking a little neglected. LHCB replied to a question a while back now. In Progress (6/2)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=90395 (14/1)</br> Lancaster had problems with running dteam jobs (the CE in question had problems running anyone's jobs to be fair). The CE has been rejuvenated, but embarrassingly for me I still have yet to configure one of the Lancaster UIs for dteam. In Progress (11/2)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=90393</br> https://ggus.eu/ws/ticket_info.php?ticket=90340</br> https://ggus.eu/ws/ticket_info.php?ticket=90358</br> https://ggus.eu/ws/ticket_info.php?ticket=89825</br> https://ggus.eu/ws/ticket_info.php?ticket=75488</br>

As mentioned last week, after Mike's Heraclean efforts Durham is back on its feet. If Mike's back from his well-earned break it's worth asking in each of these tickets whether the problems persist (or have at least changed error message) and switching them all from "On hold" to "Waiting for Reply".

Any other tickets people want to go over - to or from the UK (or an issue which might affect us)?

Monday 11th February 2013 15.00 GMT</br> The UK has managed to batter the number of open tickets down to 35, nice one! Let's look at the interesting ones, everyone's doing a good job so there's not much to go over.

Possibly of interest to some, there's a ticket regarding the definition of "Waiting for reply": https://savannah.cern.ch/support/?135827

Atlas Movement Tickets:</br> https://ggus.eu/ws/ticket_info.php?ticket=90242 (Lancaster)</br> https://ggus.eu/ws/ticket_info.php?ticket=90244 (RalPP)</br> https://ggus.eu/ws/ticket_info.php?ticket=90245 (Oxford)</br> https://ggus.eu/ws/ticket_info.php?ticket=89804 (Glasgow)</br> https://ggus.eu/ws/ticket_info.php?ticket=90243 (Liverpool)</br> Stephene has started shepherding these tickets for Atlas whilst Brian's questing in Middle-earth, although there seems to be some confusion about them. They all seem to be progressing (at different paces); one possible issue (certainly for Lancaster) is that dpm-sql-spacetoken-list-files doesn't work on SL6 yet, but it's being worked on.

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=91029 (30/1)</br> This ticket concerning problems with the FTS querying jobs has taken an interesting turn - the SRM developers have been called in. In Progress (11/2)

https://ggus.eu/ws/ticket_info.php?ticket=91251 (7/2)</br> LHCB had problems with jobs at the Tier 1, related to mysterious batch system issues. The problem went away; if jobs ran successfully over the weekend then this ticket can be closed. In progress (7/2)

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=91266 (7/2)</br> Hone lost jobs sent through the IC WMS, but this was due to a bug on their old and crusty glite UI. This ticket can be closed, In Progress (8/2)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=89825 (19/12/2012)</br> enmr shared area problems at Durham; the last message was a positive one from the VO so I think this is another ticket that can be closed. On hold (6/2)

There are other Durham tickets that need looking at in a positive light, with a view to asking users if problems have gone and closing them.

Solved Cases:</br> https://ggus.eu/ws/ticket_info.php?ticket=90451 </br> The NGI_UK now has a service group in the GOCDB for our core services.

Otherwise business as usual in the closed ticket section.

Monday 4th February 2013, 14:30 GMT</br> We're hitting February with 43 open tickets - slowly whittling away at them! It's the first Monday of the month, so let's dive into them all.

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=90451 (15/1)</br> Grouping of the core services. Progress is being made, although no deadline for this work has been given. In Progress (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=91081 (1/2)</br> Our ROD team has been ticketed in order to make sure they're keeping track of out of date services after the 1st February deadline. We'll discuss this in the meeting. In progress (4/2)</br>

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> "Correlated packet-loss on perfsonar host". This ticket is being considered within the scope of wider scale networking issues at RAL, but other aspects of the investigation are coming first. On hold (16/1)

https://ggus.eu/ws/ticket_info.php?ticket=91029 (30/1)</br> atlas were having a problem querying the FTS jobs, which if I'm reading the ticket right might have been caused by some transfers between castor and QMUL's storm going awry. Chris has offered to upgrade his storm to EMI2 if it's thought that would help, and has asked atlas what they'd like him to do. In Progress (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=90151 (8/1)</br> neiss have been enabled on the RAL WMS, but some problems still need to be ironed out. In Progress (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=90528 (17/1)</br> The RAL WMS isn't assigning SNO+ jobs to Sheffield. Still being investigated. In Progress (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=91060 (31/1)</br> A CMS ticket (although it's not got CMS as its "Concerned VO"), about glexec problems on a few workers. There were a few days where identity switching didn't work. More pool accounts have been requested, and when that's done the issue should be solved. In progress (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=89733 (17/12/2012)</br> Chris' uncovering of a dodgy top-BDII node at RAL. A new BDII trinity went live today, hopefully that'll have solved the problems. We're now at the wait-and-see-if-it's-fixed stage. In progress (4/2)

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=90244 (10/1)</br> Atlas migration from groupdisk. Waiting on atlas to finish moving data. With Brian on the other side of the planet it might need someone else to keep an eye on this and similar tickets. Waiting for reply (29/1)

https://ggus.eu/ws/ticket_info.php?ticket=90863 (27/1)</br> Atlas FTS errors on intra-site transfers/deletions. Looked to be a load related problem, possibly caused by the deletions. Did it come back? In progress (28/1)

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=86106 (14/9/2012)</br> Ye olde "low atlas sonar rates to BNL" ticket for Oxford. Has there been any further investigation of this issue? Does the problem still exist? We don't want to leave these tickets to rot. On hold (30/11)

https://ggus.eu/ws/ticket_info.php?ticket=90245 (10/1)</br> Oxford's atlas group disk migration ticket. Oxford seem to be mostly drained. Waiting for reply (28/1)

https://ggus.eu/ws/ticket_info.php?ticket=91117 (3/2)</br> atlas FTS failures, the problem seemed to be caused by high load on a dpm disk pool. Things looked to have calmed down (did you read-only the server?), this ticket looks good for closing. In progress (4/2)

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=90275 (10/1)</br> cms (I think it's cms) have ticketed sites about their cvmfs status. Winnie is working on this, but has time constraints. On hold (29/1)

https://ggus.eu/ws/ticket_info.php?ticket=90328 (11/1)</br> Stephen ticketed Bristol over some strange values published by their SE. Waiting to track down how a similar problem was fixed. In progress (31/1)

https://ggus.eu/ws/ticket_info.php?ticket=90361 (13/1)</br> Enabling the GridPP VOMS server ticket for the ngs VO - the Bristol edition. Winnie's put the ticket on hold. On Hold (29/1)

BIRMINGHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=86105 (14/9/2012)</br> Birmingham's "low atlas sonar rate to BNL" ticket. The same comments to the Oxford version apply to this one. Maybe we're lucky and the problem's evaporated! On hold (30/11/12)

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=90862 (27/1)</br> Glasgow have a discrepancy between the used space advertised by the SRM and by their BDII. Under investigation - Stephen has asked that any findings get passed along to DPM support. In Progress (28/1)

https://ggus.eu/ws/ticket_info.php?ticket=89804 (18/12/12)</br> The Glaswegian atlas group disk migration ticket. After the initial changes this seems quiet, maybe too quiet. On hold (10/1)

https://ggus.eu/ws/ticket_info.php?ticket=91106 (2/2)</br> Atlas shifters noticed the Glasgow SE down. Things are settled now, so this ticket can probably be closed (remember that it's usually best NOT to leave it to a VO to close a ticket). In progress (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=90966 (28/1)</br> The Glasgow WMS doesn't seem to be working for the londongrid VO. In progress (29/1)

https://ggus.eu/ws/ticket_info.php?ticket=90386 (14/1)</br> enmr.eu report that they can't run jobs when they use proxies containing VOMS group information. Hopefully this will be fixed when Glasgow roll out their new argus server. In progress (21/1)

https://ggus.eu/ws/ticket_info.php?ticket=90362 (13/1)</br> Enabling the GridPP VOMS server ticket for the ngs VO - Glasgow style. Hopefully this will be fixed with their new argus server. In progress (21/1)

https://ggus.eu/ws/ticket_info.php?ticket=89753 (17/12/2012)</br> Path MTU discovery problems from QMUL to Glasgow. Discovered to be a problem within Clydenet, held until it's fixed. On hold (23/1)

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=90878 (27/1)</br> lhcb report cvmfs problems. Turned out to be a missing nfs mount on some workers causing jobs to have problems, things have been fixed and the bad jobs removed. Andy asks if LHCB jobs are doing better at their site. Waiting for reply (29/1)

https://ggus.eu/ws/ticket_info.php?ticket=86334 (24/9/2012)</br> Low atlas sonar rates to BNL ticket - Edinburgh edition. See my comments for Birmingham and Oxford. Wahid gave a brief update, things have been proceeding offline. On hold (16/1)

https://ggus.eu/ws/ticket_info.php?ticket=89356 (10/12/2012)</br> Wahid has given a statement about the need for the tarball to undergo more testing, and the ticket has been extended. On hold (31/1)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=91072 (1/2)</br> Durham are having cream nagios test failures- "teething troubles" for their updated services. In progress (1/2)

https://ggus.eu/ws/ticket_info.php?ticket=89825 (19/12/2012)</br> enmr.eu having trouble installing software on the Durham cluster. Ticket "On hold" but there seems to be some progress going on as Durham get their reinstalled services back up and running. On hold (2/2)

https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/2011)</br> Ancient Compchem ticket. Mike reports that the new CE is up but needs the VO software reinstalling. On hold (1/2)

https://ggus.eu/ws/ticket_info.php?ticket=90358 (13/1)</br> Durham's enabling the gridpp voms for the ngs VO ticket. On hold until the current batch of work is complete. On hold (30/1)

https://ggus.eu/ws/ticket_info.php?ticket=90340 (12/1)</br> lhcb pilots aborting at Durham. Let's see how the reinstalled services work for them, we might want to ask the VOs in these tickets directly how things are going. On hold (1/2)

https://ggus.eu/ws/ticket_info.php?ticket=90393 (14/1)</br> Helloworld dteam jobs failing at Durham. All that has been written previously for the Durham tickets probably applies here! On hold (1/2)

LIVERPOOL</br> https://ggus.eu/ws/ticket_info.php?ticket=90243 (10/1)</br> The scouser atlas groupdisk migration ticket. John has stated that they stand ready to move space on atlas' word, which has yet to come. Waiting for reply (11/1)

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=90242 (10/1)</br> The red-rose version of the atlas groupdisk migration ticket. The migration seems to have stalled atlas-side. On hold (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=90395 (14/1)</br> dteam helloworld jobs fail at Lancaster. Tracked down to a CE being rubbish rather than a configuration error, the offending CE is due for downtime this week to correct its poor behaviour. On hold (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)</br> t2k transfer failures to Lancaster. The problem has been greatly reduced, and the FTS channels have had their number of concurrent transfers turned down. Waiting to see how this goes. Waiting for reply (24/1)

https://ggus.eu/ws/ticket_info.php?ticket=85367 (20/8)</br> Pilot jobs for ilc failing at Lancaster, due to the same performance issues seen above. Hopefully it'll be no more after the reinstall. On hold (4/2)

https://ggus.eu/ws/ticket_info.php?ticket=88772 (22/11/2012)</br> One of Lancaster's clusters is giving out bad GlueCEPolicyMaxCPUTime, tracked to a bug in the dynamic publishing (https://ggus.eu/ws/ticket_info.php?ticket=88904). Waiting on a fix, which I don't think made it out in the last update. On hold (3/12)

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=89751 (17/12/2012)</br> Path MTU discovery problems to RHUL. The RHUL networking team are following up with Janet. On hold (28/1)

IMPERIAL</br> https://ggus.eu/ws/ticket_info.php?ticket=89750 (17/12/2012)</br> IC's Path MTU discovery ticket. Again the ball is in Janet's court. On hold (16/1)

BRUNEL</br> https://ggus.eu/ws/ticket_info.php?ticket=90359 (13/1)</br> Brunel's ticket to enable the GridPP voms server for the ngs VO. Raul had a go at fixing it but no joy. In progress (21/1)

EFDA-JET</br> https://ggus.eu/ws/ticket_info.php?ticket=88227 (6/11/2012)</br> No dynamic publishing at EFDA-JET for biomed. Ideas appear to have been exhausted. In progress (23/1)


Monday 28th January 2013 14.30 GMT</br> 49 Open UK tickets today. We're slowly whittling them down. Most are being kept well in hand though.

Time Frames:</br> There's been some discussion concerning the Time Frames for sites getting their tickets "poked" whilst in various states. The first pass of these looked like:

   * Ticket looks finished/closed [1 or maybe 2 days]
   * Ticket is assigned but not set to "in progress" by assignees [1 or maybe 2 days - send poke to them]
   * No progress on ticket for > 3 days - send poke

What we want to have is usefully timed reminders, not nagging, so I'll ask what you guys think of this tomorrow.

Security Tickets:</br> These are getting quite urgent, although tarball sites (ECDF) probably will get a reprieve. Durham and Glasgow are a little worrying, do you guys have a plan?

The ATLAS space token tickets seem to have stalled, and with Brian heading to the other side of the world for a month there might not be much progress on them.

  • NGI</br>

https://ggus.eu/ws/ticket_info.php?ticket=90451 (15/1)</br> Ticket for the grouping of core ngi services. JK has posed a question as to who it should be assigned to (I reckon gridpp-ops). In progress (28/1)

  • Tier 1</br>

https://ggus.eu/ws/ticket_info.php?ticket=90528 (17/1)</br> Sno+ have got back concerning this ticket about their jobs not being submitted to Sheffield, they see one WMS behaving (lcgwms03) and another not (lcgwms02). In progress (23/1)

https://ggus.eu/ws/ticket_info.php?ticket=90151 (8/1)</br> The neiss VO has been enabled on the RAL WMSes, Catalin has requested that they get tested. Waiting for reply (24/1)

  • Sussex</br>

https://ggus.eu/ws/ticket_info.php?ticket=90518 (17/1) One of a few tickets affecting Sussex, has there been a chance for the "big sort out" in the last week? Stuart has extended the ticket. In progress (17/1)

  • Glasgow</br>

https://ggus.eu/ws/ticket_info.php?ticket=90386 (14/1)</br> enmr.eu problems at Glasgow. If you chaps are waiting on your new Argus server could this ticket be On Held? Depends on the timeframe for the deployment. (similar for https://ggus.eu/ws/ticket_info.php?ticket=90362)

  • Durham</br>

https://ggus.eu/ws/ticket_info.php?ticket=90340 (12/1)</br> This lhcb ticket is looking very neglected. In progress (14/1)

Monday 21st January 14:45 GMT</br> 53 tickets this week. Despite being back to my usual perky self I might have turned down the resolution of my scan over these tickets a bit too much and missed something. Let me know if that's the case.

  • Ticket Bunches:</br>

DTEAM tickets:</br> Lancaster, Durham and ECDF have tickets concerning dteam working at their site, although all are in progress.

NGS tickets:</br> Glasgow, Bristol, Brunel, Durham and Manchester have tickets about the move of the ngs VO to our VOMS server.

Atlas Space Juggling:</br> Oxford, RalPP, Liverpool, Glasgow and Lancaster have open tickets for the atlas move from groupdisk to datadisk. Ewan has posted some legitimate concerns about being kept in the dark regarding dark data cleanup.

WN-Sec test:</br> Durham, Glasgow, Imperial, ECDF & Lancaster have tickets for their out-of-date worker nodes

  • NGI</br>

https://ggus.eu/ws/ticket_info.php?ticket=90451 (15/1)</br> Core services grouping ticket. We should tend to this ticket. Assigned (15/1)

  • RALPP</br>

https://ggus.eu/ws/ticket_info.php?ticket=90575 (18/1)</br> Nagios failures on one of your CEs, maybe slipped under the radar? Still Assigned (21/1)

https://ggus.eu/ws/ticket_info.php?ticket=90040 (2/1)</br> A CMS user hasn't replied to this ticket in a while, I'd be tempted to close it soon. Waiting for reply (4/1)

  • TIER 1</br>

https://ggus.eu/ws/ticket_info.php?ticket=90528 (17/1)</br> Sno+ jobs aren't being sent to Sheffield for some reason, Catalin has asked if they see the same with another WMS. Waiting for reply (17/1)

https://ggus.eu/ws/ticket_info.php?ticket=90151 (8/1)</br> A request for WMS enablement by Neiss might have been mistaken for a request for resources at the tier 1, but Chris has tried to clear things up. In progress (21/1)

  • SUSSEX</br>

https://ggus.eu/ws/ticket_info.php?ticket=90518 (17/1)</br> Nagios failures on a Sussex CE. Emyr has called on the help of Ewan and Chris today to conquer these glitches. In progress (17/1)

https://ggus.eu/ws/ticket_info.php?ticket=90239 (10/1)</br> Similar for their SE. Would be nice to hear how today went. In Progress (15/1)

https://ggus.eu/ws/ticket_info.php?ticket=90236 (10/1)</br> A ticket from atlas regarding problems at Sussex. In progress (21/1)

  • Bristol:</br>

https://ggus.eu/ws/ticket_info.php?ticket=90328 (11/1)</br> Bristol are publishing zero used space. Winnie's hard pressed to investigate with her time constraints. In progress (15/1)

  • RHUL:</br>

https://ggus.eu/ws/ticket_info.php?ticket=90219 (9/1)</br> RHUL are publishing negative space. As nothing seems out of place it's been suggested to reassign this ticket to the DPM chaps for support. In progress (11/1)

  • IC:</br>

https://ggus.eu/ws/ticket_info.php?ticket=89468 (11/12/12)</br> A fusion user was having proxy problems, but the un-reproducibility of the error and the user silence suggest that this ticket can be put to bed. In progress (8/1)

Monday 14th January 2013, 14.30 GMT</br> I'm sickly and grumpy and today I opened up GGUS to see 60 tickets. 60 Open UK tickets. And so I've given up even trying to do a proper review of them this week. Sorry, don't have it in me.

So for an in-depth review see here: http://tinyurl.com/a8jsjs3 and please check to see if your site has any tickets assigned to it that need tending.

A good number of sites have got tickets concerning having not enabled the gridpp voms server for the ngs.ac.uk VO- these should be easily fixed. A number more have tickets from atlas regarding spacetoken juggling, a lot of these are being kept open to aid sites in tracking the space shuffle. A few sites (Glasgow, ECDF, UCL) have been ticketed again by atlas for transfer performance, but the reasons seem unrelated.

DURHAM, LANCASTER, GLASGOW, IC, and ECDF have nagios security version tickets. I know for three of them this is all down to the tarball (it exists now, for SL5 at least. I might get an SL6 beta tarball out today). Congrats to UCL for upgrading their SE and getting off the out-of-date list.

Other than that there seems to have been a large mixed bag of tickets landing at sites over the last week.

Normal Service will resume next week.


Monday 7th January 2013 14:00 GMT</br> Happy New Year! We're kicking off 2013 with 39 Open Tickets. Let's go through them all. I see a lot of people still put in the hours over the winter holidays, I like to think that was due to dedication rather than us lot being a bunch of Scrooges!

  • RALPP</br>

https://ggus.eu/ws/ticket_info.php?ticket=90040 (2/1)</br> CMS users are having xrootd access problems, but Chris is on the case so nothing to worry about. Waiting for reply (4/1)

https://ggus.eu/ws/ticket_info.php?ticket=90122 (7/1)</br> Atlas would like 12TB moved from groupdisk to localgroupdisk. Assigned (7/1)

  • OXFORD</br>

https://ggus.eu/ws/ticket_info.php?ticket=86106 (14/9/2012)</br> Long standing low-sonar rate ticket. On hold (30/11/2012)

  • BIRMINGHAM</br>

https://ggus.eu/ws/ticket_info.php?ticket=89572 (12/12/2012)</br> Birmingham failing the nagios security test for their DPM, hopefully Mark's efforts over the last few days will see this cleared. On hold (17/12)

https://ggus.eu/ws/ticket_info.php?ticket=89291 (6/12/2012)</br> fusion having trouble submitting to epgr02.ph.bham.ac.uk. Worth checking if the problem still exists. In progress (17/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=86105 (14/9/2012)</br> Birmingham's low atlas sonar rate ticket. On hold (30/11/2012)

  • GLASGOW</br>

https://ggus.eu/ws/ticket_info.php?ticket=89221 (5/12/2012)</br> Unexpected accounting for enmr.eu. At last check Gareth was checking the enmr.eu settings. In progress (12/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=89369 (10/12/2012)</br> Failing the nagios security check for some workernodes. Plan in place to upgrade by the end of January. In progress (17/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=89753 (17/12/2012)</br> Path MTU to Glasgow seems off. On hold pending replies from Clydenet & the University network team, as it is a problem external to the site. On hold (18/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=89804 (18/12/2012)</br> atlas would like 30TB moved from groupdisk to datadisk, and a few other changes. The request landed in Glasgow's pre-xmas "hands off" period, so won't be looked at until nowish. On hold (3/1)

  • ECDF</br>

https://ggus.eu/ws/ticket_info.php?ticket=89356 (10/12/2012)</br> Failing the nagios WN security tests. Waiting for the SL6 tarball. On hold (10/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=86334 (24/9/2012)</br> Edinburgh's atlas sonar ticket. Let's not leave these network tickets to rot, but I appreciate we have bigger fish to fry. On hold (30/11/2012)

  • DURHAM</br>

https://ggus.eu/ws/ticket_info.php?ticket=89883 (21/12/2012)</br> Failing the nagios glue2 check. Mike has put a plan in place for an upgrade, but he only landed back today. On hold (21/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=89731 (17/12/2012)</br> enmr.eu jobs staying idle forever at Durham. Acknowledged but it doesn't look like any cause was discovered, always good to check if the problem still exists. In progress (17/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=89370 (10/12/2012)</br> Failing the nagios security workernode checks. On hold (17/12)

https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/2011)</br> Ancient compchem authentication ticket. Compchem have remembered about this ticket, and can confirm that they still see problems. On hold (3/1)

https://ggus.eu/ws/ticket_info.php?ticket=89825 (19/12/2012)</br> Missing enmr.eu software directory. It was repaired, but checking is difficult due to the problems in 89731. In progress (21/12/2012)

  • SHEFFIELD</br>

https://ggus.eu/ws/ticket_info.php?ticket=90064 (3/1)</br> Atlas request to shunt the groupdisk space to datadisk. In progress (3/1)

https://ggus.eu/ws/ticket_info.php?ticket=90095 (4/1)</br> Chris has ticketed Sheffield over odd VOs being published. It looks like atlas user groups have snuck into the vo publishing. In progress (4/1)
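As an aside, a quick way to see exactly which VO strings a CE is publishing is an ldapsearch along these lines (a sketch only - the hostname is illustrative, and it assumes the stray entries show up as GLUE 1 GlueCEAccessControlBaseRule values on the CE's resource BDII):

 # List the access control base rules each CE queue publishes; anything
 # beyond the expected VO:<vo> / VOMS:<fqan> entries is suspect.
 ldapsearch -x -LLL -H ldap://lcgce.example.ac.uk:2170 \
     -b 'mds-vo-name=resource,o=grid' '(objectClass=GlueCE)' \
     GlueCEUniqueID GlueCEAccessControlBaseRule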

https://ggus.eu/ws/ticket_info.php?ticket=89872 (21/12/2012)</br> lhcb pilots aborting at Sheffield. lhcb reopened the ticket, seeing problems with the site's publishing which Elena can't duplicate. Elena's asked for extra information. Waiting for reply (3/1)

  • MANCHESTER</br>

https://ggus.eu/ws/ticket_info.php?ticket=90084 (4/1)</br> Hone were having job submission problems, but these are fixed and the ticket can be closed. In progress, can be closed (7/1)

  • LIVERPOOL</br>

https://ggus.eu/ws/ticket_info.php?ticket=89374 (10/12/2012)</br> fusion were having trouble accessing files at Liverpool (although the ticket's not clear). Steve has asked if the problems persist as the ticket was filed during a bad period for the Liverpool network pipes, no reply yet. Waiting for reply (18/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=90087 (4/1)</br> hone jobs aborting on a Liverpudlian CE (just one of them). A kick of the machine after Christmas fixed things, so this ticket can be closed. In progress (4/1)

  • LANCASTER</br>

https://ggus.eu/ws/ticket_info.php?ticket=89476 (11/12/2012)</br> Failing ops WN tests on one of our CEs. Plan in place. On hold (11/12)

https://ggus.eu/ws/ticket_info.php?ticket=88628 (20/11/2012)</br> t2k software install aborts at Lancaster. We were working well with Jon P on this before Christmas; I need to not forget about it. In progress (20/12)

https://ggus.eu/ws/ticket_info.php?ticket=85367 (20/8/2012)</br> ilc jobs failing to submit to Lancaster. No cause was found, so the only option is to reinstall & reconfigure the underperforming CE. But this CE is the platform for the tarball testing... On hold (11/10/2012)

https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7/2012)</br> t2k transfer failures to Lancaster. The network upgrade seems to have improved matters; we've asked t2k if things look better from their end. Waiting for reply (7/1)

https://ggus.eu/ws/ticket_info.php?ticket=88772 (22/11/2012)</br> lhcb were being affected by incorrect MaxCPUTime publishing. There was a bug in the dynamic publisher that should be fixed in this month's release. On hold (3/12/2012)

  • UCL</br>

https://ggus.eu/ws/ticket_info.php?ticket=89477 (11/12/2012)</br> Failing the DPM security nagios check. The plan is to upgrade in January. On hold (19/12)

  • RHUL</br>

https://ggus.eu/ws/ticket_info.php?ticket=89751 (17/12/2012)</br> Path MTU discovery failing to RHUL. Govind was looking at it; it might be worth the sites ticketed for this putting their heads together. In progress (18/12)
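
(Not from the ticket, but in case it helps the sites comparing notes: a rough Python sketch of one way to see what MTU actually survives on a path, by binary-searching the largest ping that gets through with the Don't Fragment bit set. The hostname is a placeholder, and it assumes a Linux box with the iputils ping, which provides "-M do".)

<pre>
#!/usr/bin/env python
# Rough sketch (not from the ticket): estimate the path MTU towards a remote
# host by binary-searching the largest ICMP payload that survives with the
# Don't Fragment bit set. Assumes a Linux box with the iputils ping
# (which provides "-M do"); the hostname below is a placeholder.
import os
import subprocess

def df_ping_ok(host, payload):
    """True if one DF-flagged ping of `payload` bytes gets an echo reply."""
    with open(os.devnull, "wb") as null:
        return subprocess.call(
            ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host],
            stdout=null, stderr=null) == 0

def probe_path_mtu(host, lo=548, hi=1472):
    """Largest surviving payload plus 28 bytes of IP+ICMP header = path MTU."""
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if df_ping_ok(host, mid):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return None if best is None else best + 28

if __name__ == "__main__":
    target = "perfsonar.example.ac.uk"   # placeholder host
    print("Path MTU towards %s looks like: %s" % (target, probe_path_mtu(target)))
</pre>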

  • IMPERIAL</br>

https://ggus.eu/ws/ticket_info.php?ticket=89750 (17/12/2012)</br> Path MTU discovery ticket. Waiting on JANET to fix their upstream router. On hold (17/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=89105 (01/12/2012)</br> t2k having problems renewing proxies on the IC WMSes. Daniela put a fix in place, which worked for one WMS. She then rolled it out to t'other. No word back; this should probably be waiting for reply (so I made it so!). Waiting for reply (7/1)

https://ggus.eu/ws/ticket_info.php?ticket=89368 (10/12/2012)</br> Nagios security workernode check failure. Working on rolling out the tarball. On hold (10/12)

https://ggus.eu/ws/ticket_info.php?ticket=89468 (11/12/2012)</br> fusion having trouble downloading job outputs. The user is trying to recreate the error; I'm not convinced there was a problem (at IC) in the first place... In progress (11/12/2012)

https://ggus.eu/ws/ticket_info.php?ticket=89412 (10/12/2012)</br> The cvmfs taskforce want to know when/how IC will deploy cvmfs. Daniela has replied, and stated her interest in the NFS version of CVMFS. In progress (10/12)

  • BRUNEL</br>

https://ggus.eu/ws/ticket_info.php?ticket=90083 (4/1)</br> hone users having trouble getting results into the SE, with an apparent problem contacting the file catalogue. Raul's asked for more information to digest when he gets back. In progress (7/1)

  • TIER 1</br>

https://ggus.eu/ws/ticket_info.php?ticket=89733 (17/12/2012)</br> Chris W spotted a problem with the RAL top bdii. Despite a few kicks the problem persists, and appears to be affecting multiple users. In progress (3/1)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)</br> Correlated packet loss on the RAL perfsonar host. On hold (31/10/2012)

  • EFDA-JET</br>

https://ggus.eu/ws/ticket_info.php?ticket=88227 (6/11/2012)</br> Biomed report broken dynamic publishing at jet. They have investigated but not found any solutions yet. In progress (20/12)

Monday 17th December, 15:00 GMT</br>

33 tickets this week. Due to the GGUS outage, time was a little short for me to go over the tickets in as much depth as I'd have liked, or to put in any delightful holiday puns. Next year!

Please can everyone make an extra effort to put any tickets to bed this week.

  • NGI</br>

https://ggus.eu/ws/ticket_info.php?ticket=89350 (10/12)</br> User DN publishing (or lack thereof) at ECDF & Bristol. I believe both sites are publishing DNs now, and that neither destroyed APEL by trying to republish their data, so I think this ticket can be closed? In progress (12/12)

  • Out of Date middleware tickets:</br>

Sheffield: https://ggus.eu/ws/ticket_info.php?ticket=89478 - Can close their ticket by the looks of it.</br> WNs:</br> GLASGOW: https://ggus.eu/ws/ticket_info.php?ticket=89369 - Can you please try to post plans this week?</br> IC: https://ggus.eu/ws/ticket_info.php?ticket=89368 - Waiting on the tarball.</br> LANCASTER: https://ggus.eu/ws/ticket_info.php?ticket=89476 - Waiting on the tarball too. I better get my backside in gear!</br> DURHAM: https://ggus.eu/ws/ticket_info.php?ticket=89370 - Plan to upgrade during their overhaul.</br> ECDF: https://ggus.eu/ws/ticket_info.php?ticket=89356 - Waiting on the (SL6, just to be different) tarball.</br> DPM:</br> UCL: https://ggus.eu/ws/ticket_info.php?ticket=89477 - Ben will contact the storage group for help soon. Needs a plan.</br> BIRMINGHAM: No ticket, but Mark is battling bravely against whatever's causing his upgrade failures. At this point I'd suggest exorcising the machine room, to get rid of the Ghosts of Upgrades Past.</br> Update - Birmingham do have a ticket now: https://ggus.eu/ws/ticket_info.php?ticket=89572

  • TIER-1</br>

https://ggus.eu/ws/ticket_info.php?ticket=89733 (17/12)</br> Chris has spotted some bad information coming from the Tier-1 top bdii. Assigned as I was writing this (17/12)

  • DURHAM</br>

https://ggus.eu/ws/ticket_info.php?ticket=89731 (17/12)</br> enmr.eu are having jobs stall at Durham; can you please have a gander before the holidays? Assigned (17/12)

  • BRUNEL</br>

https://ggus.eu/ws/ticket_info.php?ticket=89415 (10/12)</br> Of interest - an hone user's jobs failed to send work back to a DESY SE (perhaps an SL6 compatibility problem with his jobs?). The user has asked to use a few hundred GB of space at Brunel to stage his output to, and Raul has permitted it. In progress (could be waiting for reply pending the user's experience) (14/12)

  • LIVERPOOL</br>

https://ggus.eu/ws/ticket_info.php?ticket=89374 (10/12)</br> Did you chaps have a chance to look at this fusion ticket? Is there actually a problem at Liverpool? In progress (11/12)

https://ggus.eu/ws/ticket_info.php?ticket=88761 (22/11)</br> The lhcb jobs that clogged the Liverpool network. Steps have been taken to stop this happening again, so I reckon we can put this one to bed. In progress (23/11)

  • BIRMINGHAM</br>

https://ggus.eu/ws/ticket_info.php?ticket=89129 (3/12)</br> atlas job failures seem to have abated thanks to Mark's efforts. This ticket can be closed by the looks of it. In progress (17/12)

  • IC</br>

https://ggus.eu/ws/ticket_info.php?ticket=89105 (1/12)</br> t2k were having problems renewing proxies via the IC WMSes. Daniela implemented a hopeful fix, and Stephen Burke suggested that the change of behaviour was due to the switch to EMI myproxy servers. In progress (can be waiting for reply to see how the t2k jobs fare) (15/12)

  • LANCASTER</br>

https://ggus.eu/ws/ticket_info.php?ticket=88628 (20/11)</br> t2k were having trouble running their SW jobs at Lancaster. The current problem is getting the WMS to submit to one CE; expect a plea for help to TB-SUPPORT soon. In progress (17/12)