Monday 3rd August 2015, 14.30 BST
23 Open tickets this month, full review time.
Newish this morning
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket 115449, solved and verified, was the flagship of these issues. Brunel (115445) and EFDA-JET (115448) also have tickets about this.
Sno+ "glite-wms-job-status warning" (3/8)
Glasgow: 115435
Tier 1: 115434
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at; it's definitely worth collaborating on this one.
Wiki-leak...
115399 (31/7)
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes, the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)
Spare the ROD...
LIVERPOOL
115433 (3/8)
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) Update - aaannd Solved by upping gridftp max connections
RALPP & UCL
114764
114851
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).
Sno+'d Under
115387 (30/7)
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)
First Tier Problems.
115417 (2/8)
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)
115290 (28/7)
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)
113836 (20/5)
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could do with either an update or on-holding. In progress (24/6)
114992 (10/7)
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)
108944 (1/10/2014)
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)
Oxford Squid is red.
115230 (24/7)
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, asking if this is something Oxford could do. Reopened (3/8) Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.
Mavericks and Gooses - GridPP Pilot roles.
114485 Bristol
114460 Sheffield
114442 RALPP
114441 RHUL
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.
Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.
My shame - tarball glexec
95303 ECDF
95299 Lancaster
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time.
Pot-luck tickets (or those I couldn't group).
Durham
114381 (16/6)
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)
Lancaster
100566 (27/1/2014)
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)
Liverpool
114248 (21/7)
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)
QMUL
114573 (23/6)
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM now but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)
That's all folks!