Difference between revisions of "Past Ticket Bulletins 2016"

From GridPP Wiki

Revision as of 16:18, 26 May 2016

Monday 23rd May 2016, 15.00 BST
37 Open UK Tickets this week.

Concentrating on tickets that look like they can be closed (if not now then soon):

TIER 1: 120954
This LHCB ticket to clean up DNS aliases looked to have the hard parts done.

TIER 1: 121698
CMS failures over the weekend, solved by increasing the max file limit by a factor of 10. Looks like this sorted the problem.

RALPP: 118628
Daniela reports that (after their voms change) LZ jobs submitted to RALPP okay - so maybe this LZ ticket can be wrapped up?

SUSSEX: 120714
This ROD ticket looks sorted for Sussex, Gareth has asked the site to set it to solved. Update - solved

RHUL: 121231
An LHCB ticket, problems were found and solved, pilots are flowing once again. Mark gives the thumbs up to solve the ticket.

GLASGOW: 120973
Ticket tracking the retirement of the WMSii and LB. I suspect the Glasgow chaps know to close (and are looking forward to closing) this ticket once they've completed the last few steps.

QMUL: 121574
It looks like the alarm triggering this ROD BDII ticket disappeared on its own, so feel free to close the ticket (as Gareth suggested).

LHCB VOFEED tickets
ECDF: 121360
BRUNEL: 121388
Both these VOFEED tickets have asked for feedback from lhcb on how to proceed.

TIER 1 SNO+ TICKETS
120920
121322
Snoplus have two open tickets with the Tier 1 regarding file access - the first is regarding xrootd problems, the second accessing files from tape. Both tickets could do with an update, I believe both tickets have their root cause in Castor not playing ball.

Update - Birmingham
121125
Atlas dumps ticket for Birmingham - Matt reports he's trying to get the xroot to upload the dumps locally into the DPM. Did anyone have success with this?

Monday 16th May 2016, 15.00 BST
42 Open UK Tickets this week.

GOCDB/VOFEED mismatch tickets
There are 7 open tickets left from last week's campaign to clean up the VO tags featured in the gocdb. Only the Birmingham ticket is still in the "assigned" state, the rest are undergoing discussion or requesting feedback/clarification.
BIRMINGHAM 121450
RALPP 121464
LIVERPOOL 121394
BRUNEL 121388
BRISTOL 121386 Update - closed, thanks Winnie!
ECDF 121360
RHUL 121421

QUESTIONING ROD
121465(11/5)
This ECDF availability ticket is "on the mend", but Andy has asked how the numbers are calculated. Waiting for reply (16/5) (This goes hand in hand with Andy's question in ECDF's other ROD ticket 120004)

120714 (9/4)
I think this Sussex ROD ticket is solved, the link to the tests looks green (in a good way). I think it can be closed? In progress (28/4)

OXFORD 120019 (7/3)
Talking of tickets that probably can be closed, I think this CMS subscription change request issue is solved? Either way it could do with an update. In progress (29/4)

RHUL 121516 (12/5)
A biomed ticket, possibly the same networking problems affecting them that affected atlas jobs (121540). It looks like this ticket snuck past your sentries, and could do with acknowledgment. Assigned (12/5) Update- updated and in progress, hope the networking problems go away.

BIRMINGHAM
121125 (28/4)
Did you chaps have any luck getting your dumps working? Taking a peek myself I see that your dumps directories are still empty. Let us know if you need a hand. In progress (4/5)

Are there any other tickets or issues people want bringing up?

And finally, the Other VO Nagios...

Monday 9th May 2016, 13.00 BST
39 Open UK Tickets this month

So long and thanks for all the jobs - decommissioning tickets.
120973 (Glasgow, 2 WMSes and an LB).
121258 (Tier 1, just one WMS).
120664 (Tier 1, GenScratch disk pool).
Not much else to say, nothing to see here. Move along...

NGI
119995 (7/3)
Cleaning up old uncertified NGS sites. Any joy Jeremy? In Progress (18/4)

NEUGRID CVMFS STRATUM PROBLEMS
121179 (2/5)
The neugrid stratum at the Tier 1 isn't behaving - no site was notified with this ticket so it likely dodged people's notice. I sent it RAL's way- feel free to bounce elsewhere if it isn't a problem at the Tier 1. Assigned (9/5) Update - the submitter confirms things are fixed, it looks like the ticket can be closed.

SUSSEX
Ops tests woes:
121028 (25/4) -cream CE
120735 (11/4) -Availability
120714 (9/4) -CA distro.
Being handled as best Jeremy M can - it looks like the last two issues are on the mend. Not sure about the first one.

118289(10/12/15)
gridpp pilot role ticket. No news for a while, but hopefully a familiar face will sweep in and save the day soon. On Hold (25/1)

RALPP
120282 (18/3)
Atlas-centric HTTP support ticket. Chris is putting the site in downtime next week to upgrade the dcache hardware and version, and we'll see how this looks after. On hold (6/5)

118628 (5/1)
LZ pilot ticket. No news after testing the test version of ARC didn't go so well, and so Chris decided to wait until they have a newer umd4 CE to try it out on, or at least until the fix makes it into the proper repos. The reminder date has passed, any news? On Hold (22/3)

OXFORD
120019 (7/3)
CMS federation subscription change for Oxford. Kashif has worked on this and it looks like it might be fixed. Any news? In progress (29/4)

121139 (22/4)
Enabling skatelescope.eu on the Oxford VOMS. Kashif kicked it but Robert's tests didn't work, so debugging is ongoing. In progress (6/5)

BRISTOL
121024 (25/4)
CMS transfer problems. Phedex was upgraded, but a few more problems with some dodgy datasets came up - Lukasz seems to have it all in hand though. In progress (6/5)

120455 (29/3)
A spot of self-ticketing, here Lukasz asked CMS to validate their new HTCondor CE. A lot of conversation in ticket (some regarding CMS multicore), the last entry has Lukasz looking at the CERN Condor accounting daemon. Assigned (could do with being changed to a different status) (9/5)

BIRMINGHAM
121125 (28/4)
The atlas storage dump is missing at Birmingham - Matt is looking for it (I had more trouble than I should have setting up this cron job at Lancaster - I forgot my 'nix-admining basics! The shame!). In progress (4/5)

120948 (20/4)
Ops availability ticket, on hold whilst things recover - naught to see here. On Hold (20/4)

GLASGOW
120135 (11/3)
Another atlas-centric http TF ticket. The ticket could do with an update/on holding. In progress (7/4)

120351 (22/3)
Enabling LSST at Glasgow, on hold awaiting the new identity management system[1]. Alessandra posted a helpful link here - how goes things? (5/5) Update - I noticed that 117706 (enabling pilots for pheno and friends) is done so hopefully this is just a roundtuit?

[1] Robin's started working on a CentOS7 argus server build with ansible at Lancaster if that's relevant to your, or anyone else's, interests.

ECDF
121227 (4/5)
A crusty cream CE is causing ROD Ops test failures at ECDF - Andy and Marcus are deciding its fate. In progress (5/5) Update - the immediate issue was solved, and the ticket closed.

120004 (7/3)
The ARCHER facing test CE suffering ROD failures. Was a decision reached about whether or not to put the service in downtime or similar? I see the CE is in a short downtime at the moment. On Hold (25/4) Update - Andy is unsure what to do and has asked for some advice, or if perhaps a special case can be made for this CE in the monitoring/gocdb.

121285 (8/5)
Fleeting atlas transfer problems, caused by a network blip. The blip has passed, and Marcus asks if there are any more problems seen? Waiting for reply (9/5)

SHEFFIELD
121279 (7/5)
Atlas transfer failures - Elena noticed that the files don't actually exist at Sheffield and will declare them lost forthwith. In progress (8/5)

MANCHESTER

120998 (22/4)
skatelescope.eu VO creation ticket, nearly done. On Hold (4/5)

120430 (24/3)
Enabling Icecube VO at Manchester. It seems quite involved (gpu jobs sound quite exciting!), things look to be moving along nicely. In progress (5/5)

RHUL
121257 (6/5)
ROD ticket for multiple problems - a CE fell over and is being looked at (the CE problems might explain the BDII failures). In progress (6/5)

121231 (5/5)
LHCB pilots dying at RHUL. After finding a few problems and fixing them Govind wonders if problems persist. Waiting for reply (8/5)

QMUL
121245 (5/5)
Friday ROD issues - looks like multiple CEs were/are having a bad time of it. Assigned (5/5)

120352 (22/3)
Enabling LSST at QM. Alessandra posted the link to the information that Dan asked for. In Progress (5/5)

120204 (15/3)
The well-understood problem with lhcb jobs submitting to QM's dual-stack CEs. Waiting on 120586, where there has been no news for a month, although the last entry seemed positive. On Hold (25/4)

100IT (for 100% completeness)
121189 (2/5) - Being handled.
121271 (6/5) - Assigned
(interestingly this ticket asks for support for dteam as a child of 121262).

And Finally...

THE TIER 1
120810 (13/4)
Biomed asked that their castor storage pool that's being decommissioned (see 120664) be set to read-only prior to the decommissioning date. Gareth pointed out that this request is redundant, as the disk pool is set to be made read only as detailed in the decommissioning announcement. On Hold (27/4)

120350(22/3)
Enabling LSST at RAL. Andrew L reports good progress, still some work to go through. In progress (6/5)

120920 (19/4)
Sno+ having xrootd problems at RAL. A lot of back and forth going on, the issue is being worked on. In progress (6/5)

117683 (18/11/15)
Castor not publishing glue2. This is being worked on slowly in the background, requires no small amount of dev work. On Hold (5/4)

119841 (1/3)
HTTP support ticket from the HTTP TF. On Hold whilst the developers are consulted. On Hold (26/4)

120954 (21/4)
SRM endpoint simplification for LHCB. At last check it looked good to remove the old alias, with a thumbs up from LHCB. Waiting for reply (should be "In progress" I think) (3/5)

121147 (29/4)
CMS file reading failures at the Tier 1. Andrew L checked things and they looked okay, and asked for some clarification and extra information but no word back. Waiting for reply (29/4)


Tuesday 3rd May 2016, 10.00 BST
36 Open UK tickets this week.

The bank holiday threw me off, but here's what a brief dredge of the tickets this morning dragged up:

NGI (TIER 1?)
121179
I think this ticket about the neugrid.egi.eu cvmfs is meant for the Tier 1 Stratum-1 admins (citing a problem with cvmfs-egi.gridpp.rl.ac.uk).

GET YOUR SKATELESCOPE.eu ON
120998
skatelescope.eu was the name settled on for this VO, IC and OXFORD have got child tickets to roll out the new VO to the backup VOMSESeses.


RALPP
121155
CMS noted that the RALPP PheDex agents decided to take the bank holiday off too. Assigned (29/4)

OXFORD
121175
Oxford got a ticket due to ATLAS using up all their space - as discussed many a time this is not a site problem - thanks to Elena for defending the site's honour.

LIVERPOOL
121092
As seen on TB-SUPPORT when Steve put a call out for advice, Liverpool were/are seeing multicore atlas jobs fail due to a lost heartbeat. Alessandra's digging revealed batch system memory restrictions as the likely culprit, but we can chat about it if it doesn't get brought up elsewhere.

QMUL
120352
Enabling LSST at QMUL - Dan has asked for some LSST details: "What's the software directory? Is it available via cvmfs? Typically how many accounts have you set up at other sites (10 / 50 100) ? No production role needed?". Waiting for reply (29/4)

similarly:
TIER 1
120350
The Tier 1 LSST ticket, this may contain the answers that Dan seeks - as Alessandra notes some VO information seems to have once again disappeared from the Ops Portal.

ECDF
120004
ROD ticket for the Archer-fronting CE, which doesn't really work but needs to look like it's in production for atlas to send tests. How long before this becomes a problem for the ROD Dashboard? Could ATLAS jobs be easily forced to a service in downtime?

GLASGOW
120973
WMS and L&B decommissioning ticket. The Ticket Pedant is saddened by the unchanged default status of this ticket...


Monday 25th of April 2016, 15.30 BST
31 Open UK Tickets.

A NEW CHALLENGER APPEARS
120998 (22/4)
Squire McNab has ticketed himself (which always feels like a weird thing to do) to set up the skatelescope.eu VO on the Manchester VOMS. No doubt many of us will be interested in enabling a SKA VO. Assigned (22/4)

A FEW FEWER WMSes
120973 (21/4)
Glasgow have announced the retirement of their WMSes and Logging and Bookkeeping server at the end of next month, with the Downtime starting in a fortnight (9/5). Assigned (On Hold or In Progress it?) (21/4)

The Tier 1 has a few tickets that piqued my interest:
120954 (21/4)
LHCB would like to amalgamate their endpoints at the Tier 1 - bringing the tape and the disk behind the same name. Brian rounded it out with a question- I think for LHCB. In progress (should be waiting for reply?) (25/4)

119841 (1/3)
This HTTP support ticket almost certainly looks like it should be On Hold, possibly awaiting some development work. In progress (22/3)

Talking of On Hold:
120204 (15/3)
This LHCB ticket for QMUL looks like it should be put On Hold, as it is awaiting an external fix that's outside the site's control (see ticket https://ggus.eu/?mode=ticket_info&ticket_id=120586). In progress (14/4)

And finally:
120019 (7/3)
A CMS ticket asking for a change of federation subscription for Oxford. I know Kashif and Pete are looking at it, but do you need a hand from someone who knows the arcane CMS ways? In progress (5/4).


Monday 18th April 2016, 15.30 BST
33 Open UK Tickets this week.

RALPP having a bad time?
120872 (cms)
120879 (lhcb)
I hope everything's not too (or at all) melty at RALPP. Both tickets still just assigned.

Update - there were bad times, caused by the condor collector filling up its filesystem, but things should be sorted now and both tickets are solved.

BIRMINGHAM
120860 (15/4)
Biomed are once again finding that they're running out of room at Birmingham. It seems like they either are very unsure of what data their users may or may not be producing, or have (possibly unrealistic) views on what other user groups can and should be doing with their data. Assigned (15/4)

MANCHESTER
120706 (8/4)
This Biomed ticket looks like it took the Low Road whilst you were taking the High Road, missing each other along the way. Assigned (13/4) Update - In progress, Biomed have been purged from the Manchester information system and Alessandra has asked for the site to be removed from any static lists.

TIER 1
120664 (7/4)
The ticket tracking the retirement of one of the RAL disk volumes (this one supporting biomed, na62 and mice). All above board, but it could do with being set in progress or on hold. Assigned (7/4)

120810 (13/4)
I think related to the above ticket, Biomed have asked that write access be removed to their volume. In Progress (13/4)

120624 (5/4)
Atlas Consistency Checking Ticket - I don't think this should be in "waiting for reply" any more. Waiting for reply (13/4)

119841 (1/3)
HTTP Task force ticket. No news for a while, but it looked like the situation might be a complicated one to fix - perhaps the ticket needs on holding whilst it's sorted out? In Progress (22/3)

Monday 4th April 2016, 14.00 BST
26 Open UK Tickets this month.

NGI
119995 (7/3)
Uncertified site ticket for the UK - Jeremy is on the case, and there appears to be no need to rush. In progress (4/4)

120588 (4/4)
A fresh ticket, saying we have achieved insufficient "Quality of Support performance" - we had an average of a 1.4 day response time for very urgent tickets during March.

I've looked into this using the ggus report viewer and I believe we're being accused of a crime we only technically committed (if I'm looking at things right). We only had 2 "very urgent" tickets in this period, and one of them the site forgot to put In Progress, so had an erroneous response time of two and a half days. When averaged with the single other very urgent ticket this gave us an average response time > 1. Poor statistics is a right blimmer. I've updated the ticket - which was solved whilst I wrote the report.
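The arithmetic behind that average can be checked in a few lines. This is only a sketch: the 2.5 day figure and the 1.4 day mean come from the text above, while the response time of the second ticket is not stated anywhere and is back-calculated here, so treat it as an inferred value. The variable names are purely illustrative.

```python
# Sketch of the response-time arithmetic; all names are illustrative.
reported_mean_days = 1.4   # average quoted in ticket 120588
n_very_urgent = 2          # "very urgent" tickets in the period
erroneous_days = 2.5       # ticket that was never set In Progress

# Back-calculate the other ticket's response time (not stated in the bulletin).
other_days = reported_mean_days * n_very_urgent - erroneous_days

# Recompute the mean from the two individual values as a sanity check.
mean_days = (erroneous_days + other_days) / n_very_urgent

print(f"implied response time of the other ticket: {other_days:.1f} days")
print(f"mean over both tickets: {mean_days:.1f} days")
```

With only two tickets in the sample, the one stale "Assigned" status accounts for nearly all of the reported average.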

The take home from this - please remember to set your tickets In Progress! It does actually matter (kinda).

SUSSEX
118337 (14/12/15)
Sussex Storage down for Sno+ - I assume this is still the case? Jeremy M replied a while ago but no news since. On Hold (15/2)

117894 (23/11/15)
One of the last Atlas Consistency Checking tickets - in a similar state to the former. On Hold (25/1) Update - Solved by Alessandra, can make do without for Sussex

118289 (10/12/15)
gridpp pilots at Sussex- again no news. On Hold (25/1)

I was supposed to poke the Sussex tickets before Easter but local things came up - I will prod them after tomorrow's meeting if we don't get a chance to discuss them during.

RALPP
118628 (5/1)
LZ support at RALPP. Chris tried to roll out the LZ-friendly test version of ARC to a production server but hit a roadblock and had to rollback. Chris is waiting on the fix to go out into the proper repositories, and is interested to see how things fare on a test centos7/umd4 ArcCE he has brewing (no pun intended). On hold (22/3)

120282 (18/3) Atlas HTTP taskforce ticket. Chris has asked that the tests be re-aimed at another, less-loaded server. Waiting for reply (1/4)

OXFORD
120019 (7/3)
A CMS ticket asking the Oxford T3 to change its xrootd federation subscription. Ewan was the chap who first-responded to this ticket, quiet since - it needs some attention. In progress (7/3)

117892 (23/11/15)
The other holdout of the Atlas Storage Consistency Checking tickets, and again in a similar state. In progress (24/3)

120345 (22/3)
An atlas ticket asking Oxford to update their xroot monitoring settings. Kashif battled this issue with Ilija's help, and with luck it can be closed. In progress (31/3)

BIRMINGHAM
119957 (4/3)
A ROD availability ticket after their SE DB crisis, just waiting for the alarms to go green. On hold (31/3)

GLASGOW
117706 (19/11/15)
Pheno (and other?) pilots at Glasgow. Gareth reports that they should have their new identity management system up and running soon (if it arrived on time). On Hold (23/3)

118052 (30/11/15)
ATLAS HTTP Taskforce ticket. Reopened just before Easter after tests started failing with TLS issues. Reopened (24/3)

120351 (22/3)
The first of a few enable LSST tickets - On Hold until the new identity management system is up and running. On hold (23/3)

120135 (11/3)
I'm not entirely sure why you chaps got a second http TF ticket, but you have (for a slightly different issue). In progress (1/4)

EDINBURGH
120004 (7/3)
ROD ticket for the test ARC CE fronting ARCHER, where tests fail as expected. I remember years ago being among many who couldn't think of a good reason to keep the "Production=yes, Monitoring=no" option, so they got rid of it - but it would perfectly apply here. How long can the ROD keep this ticket on hold before the dashboard self-destructs? On hold (29/3)

SHEFFIELD
118764 (12/1)
Another HTTP TF ticket. Elena kicked the services a while ago, but no news since (and the tests are still not passing by the looks of things). In progress (24/2)

114460 (18/6/15)
gridpp pilots at Sheffield. Did you get round to having a look at this? In progress (29/2)

MANCHESTER
120430 (24/3)
Ticket tracking setting up Manchester for Icecube glideins (the coolest of VOs...). It opens with a request to the Manchester site admins to enable their user (looks like just the one pilot DN), but no reply (as the Mancunians might have missed that the ticket has turned on them). Assigned (24/3)

LANCASTER 120412 (24/3)
Atlas deletion errors at Lancaster - caused by a few files badly drained back in 2014. I'm trying to figure out a clever, database-y way of listing all the files on these long gone servers (the best I've got so far is `select * from Cns_file_replica where host like 'fal-pygrid-%';`, but of course the dpns mapping isn't that straightforward). Expect a cry for help soon! In progress (4/4)
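One possible way to turn that replica query into a list of full paths is to walk each file's parent chain in the name-server metadata. The sketch below is an assumption-heavy illustration, not a tested DPM recipe: it uses an in-memory SQLite database with a cut-down, hypothetical version of the tables (only `fileid`, `parent_fileid` and `name` in `Cns_file_metadata`, and `fileid`, `host` and `sfn` in `Cns_file_replica`), and the hostnames and file names are invented. A real DPM name server runs on MySQL and has many more columns, so check the actual schema before adapting any of this.

```python
import sqlite3


def build_demo_db():
    """Create a toy name-server database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Cns_file_metadata (
            fileid INTEGER PRIMARY KEY, parent_fileid INTEGER, name TEXT);
        CREATE TABLE Cns_file_replica (fileid INTEGER, host TEXT, sfn TEXT);
    """)
    # Invented directory tree: /dpm/atlas/{file_a,file_b}
    conn.executemany("INSERT INTO Cns_file_metadata VALUES (?, ?, ?)", [
        (1, 0, "/"), (2, 1, "dpm"), (3, 2, "atlas"),
        (4, 3, "file_a"), (5, 3, "file_b"),
    ])
    # Invented replicas: only file_a lives on a "long gone" server.
    conn.executemany("INSERT INTO Cns_file_replica VALUES (?, ?, ?)", [
        (4, "fal-pygrid-01.example", "fal-pygrid-01.example:/pool/file_a"),
        (5, "other-server.example", "other-server.example:/pool/file_b"),
    ])
    return conn


def full_path(conn, fileid):
    """Rebuild a path by following parent_fileid up to the root entry."""
    parts = []
    while fileid:
        row = conn.execute(
            "SELECT parent_fileid, name FROM Cns_file_metadata WHERE fileid = ?",
            (fileid,)).fetchone()
        if row is None:  # orphaned entry: stop rather than loop forever
            break
        parent, name = row
        if name != "/":
            parts.append(name)
        fileid = parent
    return "/" + "/".join(reversed(parts))


def files_on_hosts(conn, host_pattern):
    """List full paths of files with a replica on hosts matching the pattern."""
    rows = conn.execute(
        "SELECT fileid FROM Cns_file_replica WHERE host LIKE ?",
        (host_pattern,)).fetchall()
    return sorted(full_path(conn, fid) for (fid,) in rows)


conn = build_demo_db()
print(files_on_hosts(conn, "fal-pygrid-%"))
```

Walking the parent chain one row at a time is slow on a large name server; it is used here only because it keeps the logic easy to follow.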

RHUL
119509 (12/2)
Sno+ job directories being cleaned up prematurely. It looks like this problem could have been transient - Matt M submitted some test jobs and didn't see the problem, and is re-testing with some proper work. Hopefully those tests completed okay. In progress (22/3)

QMUL
120352 (22/3)
Request to enable LSST at QM. Dan has asked for a reminder after/during GRIDPP36. On hold (24/3)

120204 (15/3)
LHCB having issues with some of the QM CEs. The reasons for this are unclear - pilots stopped around the start of March and the problem persisted at last check. In progress (17/3)

THE TIER 1
117683 (18/11/15)
CASTOR not publishing GLUE2. It's being worked on in people's spare time - any recent news? If not, maybe progress is slow enough to warrant on-holding the ticket. In progress (17/2)

119841 (1/3)
HTTP TF ticket, this time for LHCB. Proxy functionality isn't working (although regular cert/key pair access is okay) - this functionality was never turned on and is being looked into. In progress (22/3)

120350 (22/3)
Request to enable LSST at the Tier 1. Daniela notes that the Tier 1 will likely hit the same problem as RALPP for LZ (118628), Andrew L concurs. Pool accounts have been requested, things chug along nicely. In progress (22/3)

Monday 21st March 2016, 15.15 GMT
29 Open UK Tickets this week.

After Ewan
Now that Ewan's living it up at his new job the Oxford tickets might need extra shepherding - let us know if you need help Kashif. The tickets are:

117892 (23/11/15)
Atlas consistency checking ticket. On Hold (16/3)

120019 (7/3)
CMS federation subscription change request. In progress (7/3)

120052 (8/3)
HTTP TF ticket. It appears to be looking hopeful though. In progress (14/3)

Whilst we're talking HTTP TF:
GLASGOW 120135 (11/3)
Looks like this ticket has snuck by, or maybe you chaps just never got roundtuit. Assigned (11/3)

SHEFFIELD
117886 (23/11/15) Atlas consistency check ticket - Elena's working on it, but the dump script fails as her DPM has run out of connections. Odd. In progress (21/3) - Update already - Elena ramped up the number of connections in my.cnf and things started working - just having trouble uploading the dumps now.

And I don't like to nag but the other two Sheffield tickets could do with an update:
118764 (http tf) and 114460 (pilot rollout)

QMUL
120204 (15/3)
A dearth of LHCB pilots at QM. Dan spotted that *something* broke at the start of March, and handily gave a list of suspects. Not sure which one is spoiling things though... In progress (17/3)

And that's all from me. The SUSSEX tickets will need chasing up again, I'll do that - plus the NGI ticket 119995 is a bit quiet. Finally, thanks to Alessandra for wrangling the Atlas Consistency Checking tickets.

Update - the RHUL Atlas Consistency Checking ticket looks on the verge of closure: 117881

Other VO Nagios looks clean. Nice one!

Monday 14th March 2016, 14.00 GMT

27 Open UK Tickets.

The Highlight(s):
The HTTP TF Tickets to DPM sites have mostly been reborn, seemingly changing tack from "http ain't working on your DPM" to "this ain't working all that well on your DPM - probably due to https".

The take home message from these tickets is:

"The DPM team strongly recommends disabling https on the disk servers. It is frequently a source of problems and has a significant performance penalty. Access is still authenticated and authorised on the head node which passes a token to the disk, so the setup is secure."

An example of one of these tickets (Manchester, by virtue of being the most recently updated): 120139

And um, that's it for interesting tickets AFAICS (over 50% of our tickets fall under atlas consistency checks, http TF tickets or rolling out pilot accounts). Let me know if I'm missing some excitement somewhere.

Looking at the other VO nagios... nope, that looks fine too (at time of writing). How peaceful...

Monday 7th March 2016, 14.30 GMT

28 Open UK Tickets this month.

NGI
119995 (7/3)
In some kind of clean up operation 5 old NGS sites that are uncertified have been identified for the "chopping block". Assigned (7/3)

ATLAS CONSISTENCY CHECKING SCRIPTS
SUSSEX 117894 On Hold (25/1)
OXFORD 117892 On Hold (12/1)
SHEFFIELD 117886 On Hold (29/1)
MANCHESTER 117885 On Hold (10/1)
RHUL 117881 On Hold (1/2)
QMUL 117880 Waiting for reply (25/2)

SUSSEX
119383 (5/2)
Low availability ticket - site recovering. On Hold (25/2)

118289 (10/12/15)
gridpp pilots, grounded after Matt RB left. Daniela has reiterated the need for this (as banning the site for the gridpp VO will ban it for snoplus too). On Hold (3/3)

118337 (14/12/15)
Sno+ having problems with the Sussex SE. The Sussex SE has been replaced, which will require some work with the Sno+ LFC (or aliasing magic). On Hold (15/2)

RALPP
118628 (5/1)
Getting LZ pilots working at RALPP. After trying out a patched version of ARC on a test CE there still appear to be a few problems with submission - no update for a few weeks though. In progress (15/2)

120006 (7/3)
A freshly squeezed ROD ticket. In progress (7/3) Update - dcache was restarted just in case, but not sure what's going wrong. Nagios error messages aren't helpful.

BRISTOL
119930 (3/3)
A CMS user having trouble getting a file - it appears GFAL worked where xrdcp didn't. I suspect this ticket can be closed, the user seemed happy (and very polite!). Assigned (can be closed) (4/3) Update - solved

BIRMINGHAM
118155 (4/12/15)
Biomed problems with the Birmingham SE, ending with them greenlighting the removal of all their dark data (which I believe is all the biomed data still left on the SE). Matt's started the purge. In progress (7/3)

GLASGOW
118052 (30/11/15)
HTTP TF ticket - things seem to be intermittently working, Georgios spotted some interesting issues - but at least right now the SE looks all green. In progress (16/2)

117706 (19/11/15)
A pilot ticket, this one pheno-centric. Waiting on some infrastructure work at Glasgow. On hold (15/1)

ECDF
120004 (7/3)
A ROD ticket to the ARCHER facing ARC CE. Andy knows this will be a problem child, and has asked if there's a way to pull it from the ROD monitoring in a way that will still allow it to look in-production to ATLAS? Waiting for reply (7/3)

SHEFFIELD
118764 (12/1)
HTTP TF ticket. Things look a little odd on the probe page, but there's a fair amount of green. Any news? In progress (25/1)

114460 (18/6/15)
Pilot ticket - Elena rolled out the pilots but things didn't seem to work as intended. Any luck with this last week? In progress (29/2)

LIVERPOOL
119983 (4/3)
Some hardware (RAID) faults on a few pool nodes have been causing problems for some atlas users, but the Liver-lads are fighting the good fight. In progress (7/3) Update - solved. But I personally would like to hear about what hardware was failing in the Storage meeting.

RHUL
119795 (28/2)
Atlas transfer error ticket - fallout from the files lost during RHUL's draining troubles. Being declared lost. In progress (28/2) Update - spawned a ticket to track the cleanup: 120009

119509 (12/2)
Sno+ jobs are occasionally failing at RHUL with what looks to be premature sandbox cleanup problems. Govind is back in the saddle, and asked that some jobs be sent his way for testing. In progress (3/3)

QMUL
119013 (21/1)
CMS enabling QM and Glasgow as T3s - although the buck seems to have stopped at QM. After a lot of work it looks like we're waiting on the production team to greenlight the two sites. We might want to chase them up sooner rather than later. Waiting for reply (29/2).

IMPERIAL
119617 (19/2)
The CMS multicore adventure at Imperial. The jobs have run, so that looks good - CMS have asked if there is any form of reservation at the site, to which Simon replied with a resounding "kind of". Waiting for reply (7/3)

100IT
116358 (22/9/15)
Ongoing problems with missing images - work is still continuing on this, but I won't go into it. In progress (2/3)

TIER 1
116864 (12/10/15)
CMS AAA test problems. CMS report that things seem to look better this week (EU redirector open and read tests are OK), and wonder if anything has changed? Has it? In progress (23/2) Update - Andrew L reports nothing changed. Maybe it was the nice Grid Pixies? We don't see them very often!

117683 (18/11/15)
CASTOR not publishing GLUE2. Jens reports that there's been slow progress due to lack of time and ongoing CASTOR upgrade work, but slow progress is better than no progress! In progress (17/2)

Monday 29th February 2016, 15.00 GMT
Link to the 31 Open UK Tickets

A light review this week, some notes:
Still nothing from atlas on the Storage Consistency Check tickets-the ball is firmly in atlas' court.

Sheffield has two tickets that need some love:
118764 (http support)
114460 (pilot rollout)

Plus this Birmingham Biomed ticket has been left hanging (after Biomed gave the go ahead for purging their dark data at the site):118155.
(although I appreciate that Matt has had bigger fish to fry recently! I don't envy having to restore your DPM DB).

Helios is expiring: The Helios VO has hit a spot of bother and asked the Manchester VOMS admins to do...something. Robert has asked for clarification: 119363

And that's all I'll go into.

Looking at the other VO nagios

I see some persistent failures for pheno and t2k with the Imperial SE - getTURLS failures (failing on the http protocol). I saw something like this at Lancaster but for the life of me can't remember what we fixed. Still I don't think this is a functional functional test!

Monday 22nd February 2016, 15.30 GMT
37 Open UK Tickets this week.

NGI
118930 (18/1)
This information system ticket really needs some attention. Assigned (19/1)

CMS Multicore
Brunel: 119618
Imperial: 119617
RALPP: 119616
CMS are to be rolling out multicore pilots soonish and requested some information to set up their test queues with. Brunel might have missed the ticket, the other two are chugging along nicely. Update - Brunel's updated their ticket, so all's good.

Whilst we're talking CMS
119013 (21/1)
This ticket (wrongly assigned to just QMUL at the moment) seems to have become an odd catchall for enabling Glasgow and QM as Tier 3s. The CMS guys seem to think jobs should be flowing/trickling now, so maybe this can be closed? Assigned (18/2)

RHUL
119509
Govind is away and when the admin isn't looking things start breaking - in the case of this ticket Sno+ have disabled submission to RHUL so the ticket should be On Holded (I didn't want to On Hold the ticket myself, as that's a recipe for the ticket getting forgotten about). Or perhaps someone has a suggestion to tackle the problem? Assigned (12/2)

100IT
119534 (15/2)
ROD ticket for 100IT, where they're accused of failing a test that they shouldn't be failing. David opened a ticket about this (https://ggus.eu/index.php?mode=ticket_info&ticket_id=119513) but it has not received any attention at all - was it submitted to the right group? In progress (22/2)

GLASGOW
118052 (30/11)
HTTP support on the Glasgow SE. You seem to have been "upgraded" to "failing intermittently" (a possible title for my autobiography). Did you change anything to upgrade your status? In progress (16/2)

TIER 1
119389 (5/2)
This LHCB data transfer ticket to the Tier 1 has been waiting for a reply for a while now. Any news from lhcb? Waiting for reply (15/2)

Those 8 Atlas Storage Consistency Check Tickets
A chat about this at the Thursday atlas UK cloud meeting revealed that the chap handling these has gone to Argentina. It was unclear whether this was business, pleasure or as a GGUS fugitive escaping the grumpiness of dozens of site admins.

Updates:

Unsolved but not Forgotten, the tarball glexec tickets
ECDF: 95303
Lancaster: 95299

Can be solved
Brunel: 119682 This ROD ticket looks like it's sorted now. Good stuff!


Monday 15th February 2016, 13.30 GMT

37 Open UK Tickets.
Link to them all: http://tinyurl.com/nwgrnys

A few highlights:

BRUNEL
118740 (10/1)
Atlas MCORE problems at Brunel. Raul has experimented with restricting MC jobs to nodes where the Condor Memory Checking is disabled, with promising results. Waiting for reply (13/2)

QMUL
119013 (21/1)
Enabling CMS T3 - this ticket has been reopened for QM. Dan has asked for some clarification and information with respect to xroot settings for CMS. The status could do with a tweak... Reopened (12/2)

RALPP
118628 (5/1)
The deployment of LZ pilots hitting an arc bug. Chris has managed to get ahold of and deploy the updated packages on his test CE (impressive turnaround!), and wonders if it works now. Waiting for reply (11/2)

And I think that's it - still a lot of atlas consistency checking tickets that I will mention in the Thursday atlas meeting - although I think Alastair and Brian are aware of them.

Other VO Nagios
I haven't looked at this for a while, the Imperial SE seems to have been seeing problems for pheno and t2k.org for nearly a fortnight.

Monday 8th February 2016, 13.30 GMT
43 Open UK Tickets this month. Going over all of them, in kinda-alphabetical order.

NGI
118930 (18/1)
That NGI information ticket, linked to the "wrong" (according to some) information being published by the UK arc CEs. This has haunted us for a while, the consensus was the ticket is a load of B-word and not really worth worrying over - but it does warrant a response (from someone other than Steve J). Assigned (19/1)

SUSSEX
With Matt RB off to pastures green Sussex is in limbo - I'll contact Jeremy M concerning this last week's fresh tickets.

117894 (23/11)
Atlas Consistency Checking. On hold (25/1)

118289 (10/12)
Gridpp Pilots. On hold (25/1)

118337 (14/12)
The Sussex SE was not working for Sno+ - the most serious of these older issues. On hold (25/1)

119383 (5/2)
ROD Availability ticket. Assigned (5/2)

119384 (5/2)
ROD CA distribution ticket. Maybe the two ROD tickets are correlated (i.e. if we fix this one the previous one will soothe itself?) Assigned (5/2)

RALPP
118945 (19/1)
Poor CMS SAM results for RALPP due to digi-reco work pummeling the RALPP storage - Chris has asked for the digi-reco workload to stop at RALPP, then asked for clarification as to why the site was still in unknown state. Waiting for reply (25/1) Solved - it was them, not RALPP - a restart of the SAM services looks to have cleared the issue.

118628 (5/1)
LZ Pilot deployment at RALPP. Chris has submitted a bug report to nordugrid to fix the issue (http://bugzilla.nordugrid.org/show_bug.cgi?id=3529), which was fixed and should be available in the next release. On Hold (26/1) Update - Chris is trying to get hold of a pre-release to test things.

OXFORD
119197 (29/1)
CMS has asked to change some CRAB site configs at T3s - Daniela has asked Chris B if he's the one looking after this for Oxford. Assigned (3/2)

117892 (23/11)
Atlas consistency checks. Ewan has firmly and clearly put this on the backburner. On hold (12/1)

BIRMINGHAM
118155 (4/12)
Biomed having a clear up of their stuff on the Brummie SE. Franck has given the nod for deleting the dark data left in the DPM after their cleanup efforts. It's on their heads now! In progress (2/2)

117890 (23/11)
Another Atlas Storage Consistency Checking ticket. Any chance to have a look at this again? On hold (15/12)

GLASGOW
117706 (19/11)
Another pilot ticket, this time for pheno. Glasgow were going to roll this into their overhaul of their identity management gubbins, but the Universe messed with their plans. How goes things? On hold (15/1)

118052 (30/11)
HTTP support on the Glasgow SE. I suspect progress here took a similar shoeing to the identity management plan - but the ticket could do with an update (and maybe on holding). In Progress (4/1)

ECDF
118787 (12/1)
Another HTTP ticket. Let us know if you need a hand Marcus and Andy. Or if you're too busy to make this a priority consider on-holding it. In progress (12/1)

95303 (1/7)
Tarball glexec ticket. On hold for a very long time.

An update on this - I managed to put in some good hours on trying to build a relocatable glexec last week, successfully building from source glexec and the lcas/lcmaps stack. *But* I still have rpath problems - short of attacking every lib file with patchelf I'm not sure how to proceed, and the process is such a mess that I'm not sure if I'll ever manage to make it into a proper recipe (much like my cocoa-butter shortbread).
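
For anyone curious what "attacking every lib file with patchelf" might look like, here is a minimal sketch in Python. It is not the recipe (there isn't one yet!): the directory layout, the helper names and the choice of an $ORIGIN-relative rpath are all assumptions made for illustration. The only external call is patchelf's --set-rpath option, and by default the script only lists the commands it would run.

```python
import os
import subprocess

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if the file starts with the ELF magic bytes."""
    try:
        with open(path, "rb") as handle:
            return handle.read(4) == ELF_MAGIC
    except OSError:
        return False


def find_elf_files(root):
    """Yield every regular (non-symlink) ELF file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path) and is_elf(path):
                yield path


def relative_rpath(path, lib_dir):
    """Build an $ORIGIN-relative rpath pointing from the file to lib_dir."""
    rel = os.path.relpath(lib_dir, os.path.dirname(path))
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


def patch_tree(root, lib_dir, dry_run=True):
    """Return the patchelf commands for a tree; run them only if dry_run is False."""
    commands = []
    for path in find_elf_files(root):
        cmd = ["patchelf", "--set-rpath", relative_rpath(path, lib_dir), path]
        commands.append(cmd)
        if not dry_run:
            # Needs patchelf on the PATH; stops at the first failure.
            subprocess.run(cmd, check=True)
    return commands
```

Running patch_tree on an unpacked tarball with dry_run left at True gives a list of commands to eyeball before letting it loose on the real files. Whether an $ORIGIN-relative rpath is enough for the lcas/lcmaps plugins, which are loaded at run time, is an open question that this sketch does not answer.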

SHEFFIELD
119374 (5/2)
A fresh ticket from Biomed, about incorrect/no dynamic information being published at Sheffield. In progress (5/2) Update - see Steve B's post to TB-SUPPORT for clues, Elena is retackling these problems today.

118789 (12/1)
ROD Information system ticket, almost certainly caused by the same underlying issue. Is the bdii service on your CEs silently dying or failing to update?

114460 (18/6)
Gridpp Pilots. Changes were implemented but at last check things weren't working right. How goes it now? In progress (20/1)

117886 (23/11)
Atlas Storage Consistency Check ticket - any luck with this? On hold (29/1)

118764 (12/1)
HTTP support ticket for the Sheffield SE. Have you had a chance to have a look at this? In progress (25/1)

The Storage list can lend a hand fixing either of these issues (which goes for everyone of course).

MANCHESTER
118679 (7/1)
HTTP support (atlas edition). Hit a problem due to there being no outside-a-space-token space at Manchester. On Hold (12/1)

118674 (7/1)
HTTP Support (lhcb edition). As above. On Hold (12/1)

117885 (23/11)
Atlas Storage Consistency Checks - hit the same problem as the previous 2 tickets. On hold (10/1)

118603 (4/1)
A VOMS ticket rather than a site ticket, removal of the nsccs.ac.uk VO. The VO has been removed from the other UK voms servers. In progress (5/2) Update - solved

LANCASTER
95299 (1/7)
Lancaster's glexec tarball ticket. See the entry above - although I really need to update the ticket properly! Practice what you preach, Matt! On hold.

RHUL
119380 (5/2)
ROD Low availability ticket - the site is in the green now, so it's the usual 30-day wait. On hold (8/2)

117881 (23/11)
Atlas SCC ticket. On hold until March. On hold (1/2)

QMUL
117723 (19/11)
Pilots at QM. Dan's been working on this, and asked Daniela for a picture of what should be enabled[1] - Any joy? In progress (27/1)

[1] http://www.hep.ph.ic.ac.uk/~dbauer/dirac/site_pilot_status.html

117880 (23/11)
Atlas SCC ticket (wish I had started using that acronym sooner). Just waiting for the nod from atlas that all is well. Dan included the script he uses that may be useful for other STORM sites. Waiting for reply (4/2)

118985 (21/1)
QM has banished biomed from their queues until QM have a cgroupy solution to the ill-behaved biomed user jobs. Biomed have asked that the ban be reconsidered and problem users be dealt with by the VO. QM are perfectly right to say no to this, but it'd be nice to not leave them hanging. On hold (1/2)

119348 (4/2)
LHCB have noticed cvmfs issues on some nodes, which Dan couldn't replicate. Dan ponders that perhaps this is caused by ephemeral memory issues on the nodes, noting more swap being used recently. Waiting for reply (4/2)

119409 (8/2)
Fresh ROD emi glexec ticket - things exploded at the weekend but the QM admins are fighting the good fight. In progress (8/2)

IMPERIAL
119294 - but this got solved by the time I got to it (it concerned a java update breaking md5).

BRUNEL
117878 (23/11)
Atlas SCC - Raul provided an example and is waiting on atlas to give a yay or nay before deploying. Waiting for reply (18/1)

118740 (10/1)
Atlas MCORE problems at Brunel, looks to be caused by some extreme Condor oddness, Raul reconfigured Condor to give a better view. Any joy? In progress (25/1)

100IT
119002 (Reopened)
116358 (In Progress)
Not going into detail with these as I'm not sure what the crack is with 100IT.

AND FINALLY...

THE TIER 1
118809 (12/1)
The Tier 1 provided feedback on configuring memory limits for batch jobs, the ticket was left open for follow up. On hold (13/1)

116864 (12/10)
CMS AAA tests failing. Andrew L reports that the CASTOR headnode has received what sounds like a big fix which will hopefully improve things. In progress (29/1)

119389 (5/2)
LHCB data transfer problem to RAL. Being looked at. In progress (5/2)

117683 (18/11)
Another publishing ticket. How we love those! This one about CASTOR not publishing GLUE 2. Code was written by Jens and Rob but not integrated, something that works might be a long way off. That was a month ago, any news since? In progress (5/2)

109358 (15/10) or (5/2)
This ticket is weird - it started in a "waiting for reply" state and was apparently issued in 2014! I can't find a ticket with this number in my records though. Sno+ are unable to use the RAL WMS - it's being looked at. In progress (5/2)


Monday 1st February 2016, 14.30 GMT
50 Open UK Tickets this week, no Ops meeting scheduled so postponing a full review.

org.bdii.GLUE2-Validate tickets
We have 8 sites with these tickets (7 as Bristol have slain theirs), these are being discussed on TB-SUPPORT. A lot of these are still just assigned though - even if the issue is not really our fault we still need to handle the ticket proper. Rising above it all and all that.

If someone has submitted or knows of a counter-ticket for this issue please let me know.

NGI
Talking about a pain in the Information System, the UK still has this ticket to close (which has a similar root problem): 118930

CMS Siteconf problems.
GLASGOW 119196
EDINBURGH 119195
OXFORD 119197

CMS have spotted a number of misconfigured T3s across the globe (on a Friday afternoon) - the fix seems to be straightforward enough and Glasgow look like they're done already. Proper job!

ATLAS CONSISTENCY CHECKS
We still have 8 tickets open on this issue, although a couple are waiting for feedback from atlas. I'll bring this up in the Thursday UK atlas meeting to see if we can't shimmy along the tickets waiting for atlas feedback.

PILOTS
117723
Whilst investigating pilot issues at QM Daniela reminds us of this page that tells us what Dirac things should be going on at your site. Might be handy to preempt problems:
http://www.hep.ph.ic.ac.uk/~dbauer/dirac/site_pilot_status.html

118628
Whilst rolling out similar changes for LZ at RALPP Chris stumbled upon a problem, for which he submitted a bug report to nordugrid: http://bugzilla.nordugrid.org/show_bug.cgi?id=3529

AND FINALLY

QMUL
118985 (21/1)
Biomed have got back to Dan, asking whether, rather than banning them altogether until he has a cgroup-corral to put their jobs in, he would be willing and able to supply a list of the problem users. Of course this requires that there be any non-problem users in the VO... On hold (1/2)

Monday 25th January 2016, 14.30 GMT

"OTHER VO" NAGIOS
Looks like hepgrid2.ph.liv.ac.uk at Liverpool is playing up for all VOs, and the Sheffield SE is misbehaving for the gridpp VO. Other than that it looks clean.

43 Open UK Tickets this week.

That ticket to the NGI...
118930 (18/1)
Steve J put in a comprehensive reply about what Liverpool do to get their publishing kinda right. The view on this ticket from last week was to close it with a <carefully|harshly> worded statement about why this is a bit of a pointless request. Who was formulating the reply? If it was me I dropped that ball! Assigned (19/1)

Pilots Problems.
BRUNEL: 117710 Pheno. On Hold (19/11/15)
QMUL: 117723 Pheno - hopefully sorted. Waiting for reply (25/1)
SHEFFIELD: 114460 gridpp et al. In Progress (20/1)
RALPP: 118628 LZ (and maybe LSST?). In progress (14/1)

We have a few pilot rollout tickets, the last two being worked on but proving problematic.

RHUL
119027 (22/1)
As seen on the gridpp-storage list, Sno+ have asked RHUL (and will no doubt ask others) for storage space (~20TB). In progress (22/1)

(for the interest of others, Govind's other thread on gridpp-storage was likely triggered by https://ggus.eu/?mode=ticket_info&ticket_id=118553)

QMUL
118985 (21/1)
QM have banished biomed from their cluster until they have a batch system that can put Biomed jobs in a c-group cage (looking at slurm). On Hold (21/1)

BIRMINGHAM
118155 (4/12)
Talking of Biomed, they've asked if they've successfully cleaned up all their files on the Birmingham SE - a cheeky uberftp onto your SE suggests the biomed directory is still full of cra.. I mean, files. In Progress (20/1)

HTTP TF Tickets
118787 (ECDF)
118764 (SHEFFIELD)
Feel free to poke the gridpp storage group for help with these. (I left out the 2 Manchester tickets as their immediate showstopper isn't their configs- but they can ask for help too!).

ATLAS CONSISTENCY CHECKS
Manchester, Oxford, Birmingham, Sussex, RHUL, Sheffield, Brunel and QMUL still open - a mix of chugging along nicely and being very much "On Hold".

Monday 18th January 2016, 14.00 GMT
49(!!) Open UK Tickets this week

NGI
118930 (18/1)
The NGI received a ticket concerning incorrect or missing glue information for the Tier 1, Brunel, Imperial, Liverpool, Durham, Glasgow, Bristol, Oxford and RALPP. The variables in question are GlueSubClusterPhysicalCPUs, GlueSubClusterLogicalCPUs and GlueHostProcessorOtherDescription. There are some extra instructions in the ticket - it would be nice if we didn't have to create child tickets (hint hint...).

ATLAS CONSISTENCY CHECKS (10 tickets)
Progress, or at least non-exciting but reassuring updates, on these. Birmingham and Glasgow tickets could do with an update (even if it's a "nothing to see here").

The QMUL ticket had an update providing feedback that might be useful to others too:
https://ggus.eu/?mode=ticket_info&ticket_id=117880

HTTP TF (5 tickets)
ECDF, Manchester, Sheffield and Glasgow are on the HTTP TF list - although no tickets are stale at the moment.

TIER 1 RECOMMENDATIONS
118809 (12/1) An interesting ticket asking T0 and T1s to fill in a questionnaire on configuring batch job memory limits - the Tier 1 have done their bit and the ticket is On Holded for feedback.

GLASGOW
118732 (9/1)
This ticket has got confusing - atlas want a dump for files "lost" at Glasgow that by the looks of it actually never made it to the site in the first place... Waiting for reply (15/1)

TIER 1 DUPLICATES
Are these three CMS tickets the same (or similar or related) issue - or am I just getting my wires crossed?
118494 (23/12/15)
116864 (12/10/15)
118722 (8/1)

CAN BE CLOSED (I THINK)
IC - 118162 (lfc ticket)
QM - 118839 (atlas mcore job failures - doesn't look like the problem persists).

NEARLY THERE:
Lancaster - 118637 (squid misconfiguration hammering stratum-0)
Birmingham - 118155 (biomed SE use - biomed now think they deleted all data at Birmingham).

Monday 11th January 2016, 14.30 GMT
48(!) Open UK Tickets this week

  • VOMS TWEAK

118603: nsccs.ac.uk has been requested to be removed from the gridpp voms servers. Just "Assigned" to the UK as a whole at the moment.

  • THE HTTP TASK FORCE STRIKES

Lancaster, RHUL and Manchester all had http TF tickets alongside Glasgow. Your site might be next! It'll be worth checking the monitoring pages and reviewing the documentation if you are:
atlas: http://cern.ch/go/h8Rr
lhcb: http://cern.ch/go/Bk8J
https://twiki.cern.ch/twiki/bin/view/LCG/HTTPTFSAMProbe

  • TRANSFER ODDITIES

118494: The Tier-1 have a CMS ticket where xrootd is expecting a file which phedex and DAS don't think is at RAL. Is this even a site problem?

118728: In a similar vein, QMUL have an atlas ticket where a single file is refusing to be transferred - Dan has noticed a number of write attempts followed by immediate deletion. Checksumming causing a problem?

  • LOW HANGING FRUIT- tickets that can probably be closed, or are close to it.

IMPERIAL 118162
A ticket for the Imperial LFC, which appeared to be working (for Janusz at least).

RALPP 117740
Atlas datadisk cleanup ticket. Elena confirmed that the step09 directory can go for the chop. Not sure if Brian has had a chance at looking at the users directory contents yet.

BRISTOL 118311
I suspect that this CMS SAM ticket can be closed as the CEs were all green.

  • ATLAS CONSISTENCY CHECKS

As requested at the Thursday atlas meeting here's the outstanding consistency check tickets.

IMPERIAL: 117879
Not much news, (understandably) low priority for the site.

SUSSEX: 117894
It doesn't look like Matt got round to this before he left.

SHEFFIELD: 117886
Set in progress but no news since.

OXFORD: 117892
A similar case here - I assume it's on Ewan's to-do list before he heads off to pastures green.

BIRMINGHAM: 117890
Matt was going to look at this again in the New Year. Any joy?

RHUL: 117881
Govind was going to try to get to this before Christmas. Any luck?

GLASGOW: 117889
Back in 2015 the dumps were run and Sam asked for some clarification. Considering Glasgow's current state any dump made using these tools might be full of lies, but I know that you chaps are working on this problem.

BRUNEL 117878
Raul asked some questions in his ticket, to which atlas only replied last week.

QMUL: 117880
Dan has created dumps and has asked for the all clear before he sets up the monthly cron.

TIER 1: 117846
Dumps have been created, but gfal and castor issues have slowed down the checking process (gfal-cat doesn't seem to work with castor).

MANCHESTER: 117885
This ticket was recently On-Holded, as currently Manchester has 0 free space outside of tokens whilst a few disk servers are down.
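
For those who haven't looked at these tickets yet, the idea behind a consistency check can be sketched in a few lines of Python. This is not the atlas tooling: it assumes two dumps with one file path per line (one from the storage, one from the catalogue) and a hypothetical site prefix to strip, and simply reports what is in one list but not the other.

```python
def load_paths(lines, prefix=""):
    """Read one path per line, skipping blanks and comments, stripping an optional prefix."""
    paths = set()
    for line in lines:
        path = line.strip()
        if not path or path.startswith("#"):
            continue
        if prefix and path.startswith(prefix):
            path = path[len(prefix):]
        paths.add(path)
    return paths


def compare_dumps(storage_paths, catalogue_paths):
    """Return (dark, lost): files only on the storage, and files only in the catalogue."""
    dark = sorted(storage_paths - catalogue_paths)
    lost = sorted(catalogue_paths - storage_paths)
    return dark, lost
```

Files that only appear in the storage dump are candidates for dark data, and files that only appear in the catalogue are candidates for lost files. Both lists are only as trustworthy as the dumps: files written or deleted between the two dumps being taken will show up as false positives.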

Monday 4th January 2016, 14.30 GMT
HAPPY NEW YEAR EVERYONE!

38 Open UK Tickets this year.

All-the-UK-tickets URL: http://tinyurl.com/nwgrnys

As Jeremy spotted, with Matt RB off to pastures new the Sussex tickets are looking a bit neglected, especially as one was reopened after his departure:
118337
118289

Finally in this Glasgow ticket the submitter gave two new links for the http taskforce monitoring: 118052

The links to the http tf monitoring pages are:
atlas: http://cern.ch/go/h8Rr
lhcb: http://cern.ch/go/Bk8J