Monday 3rd of November 2014, 14.45 GMT
26 Open UK tickets this month.
Sussex
109539(22/10)
Sussex publishing "all the 4s" (bdii bingo!) for their waiting jobs - i.e. the 444444 placeholder value the BDII uses for "unknown". Matt RB has a ticket in with the developers over these problems (109263), although he has bravely said that he might try to tackle the problem himself... and it looks like lcg-infosites returns a sensible number now. On Hold (can be closed?) (23/10)
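For the record, the "all the 4s" can be spotted straight from the published GLUE values. A minimal sketch (the ldapsearch endpoint in the comment is illustrative; the sample ldif stands in for real BDII output):

```shell
# Real output would come from something like:
#   ldapsearch -x -LLL -H ldap://site-bdii.example.ac.uk:2170 -b o=grid GlueCEStateWaitingJobs
# 444444 is the GLUE placeholder for "value unknown", so any queue publishing
# it is bogus rather than genuinely busy.
printf 'GlueCEStateWaitingJobs: 444444\nGlueCEStateWaitingJobs: 12\n' |
  awk '/GlueCEStateWaitingJobs: 444444/ {print "bogus"; next} {print "ok"}'
```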
108765(24/9)
Cross-referenced with the above ticket; looking at the last few updates it looks like Matt RB released a spooky Hallowe'en patch, and now they look to be green. Another ticket that can be closed? On hold (31/10)
Bristol
106325(18/6)
CMS pilots losing connection at Bristol. No news for a while, it looks to me like Bristol are still in downtime though? This has been a tough issue to debug. On hold (14/10)
Glasgow
109807(1/11)
Someone at atlas was trying to raise the dead at Glasgow over Hallowe'en, although rather than zombies it was long-lost files. It appears that despite these files being declared lost last summer the deletion/recovery ritual hadn't been completed. UK cloud support are on the case. In progress (3/11)
Edinburgh
95303(1/7/13)
Tarball glexec ticket. On Hold (29/8)
Durham
108273(5/9)
Durham's perfsonar results going "proper weird" suddenly. The local networking team were on the case, but the perfsonar box was taken offline for fear of Shellshock and there has been no news since (is it alright to reinstall perfsonar yet?). On hold (6/10)
Sheffield
109207(8/10)
Sno+ asking for their VO_SW_DIR to point to cvmfs. Elena rolled this out, but sadly the ticket was reopened due to some job failures accessing cvmfs, and a few holdouts still with the wrong environment variable (Matt M threw in some CE errors he was seeing too, but he was very apologetic about it). Elena's investigating. In progress (30/10) Update - Catalin posted a reminder of the new cvmfs-keys release (1.5-1), and suggested moving snoplus' cvmfs area to the egi.eu domain - /cvmfs/snoplus.egi.eu
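A minimal sketch of what the fixed-up environment would look like under Catalin's suggestion (the exact variable name follows the usual VO_&lt;VO&gt;_SW_DIR convention and is an assumption, not taken from the ticket):

```shell
# Assumed variable name for the snoplus.snolab.ca VO; the path is the
# egi.eu-domain area Catalin suggested.
export VO_SNOPLUS_SNOLAB_CA_SW_DIR=/cvmfs/snoplus.egi.eu
echo "$VO_SNOPLUS_SNOLAB_CA_SW_DIR"
# On a worker node with the cvmfs client and the new cvmfs-keys (1.5-1)
# installed, the repository itself can then be probed with:
#   cvmfs_config probe snoplus.egi.eu
```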
Manchester
109272(11/10)
Atlas have been seeing transfer problems, although it looks like these failures have mutated since the ticket was opened (from checksum errors to srm-type errors by the looks of it). Alessandra is on the case. In progress (3/11)
Lancaster
108715(23/9)
Getting Sno+ jobs running at Lancaster. It looks like everything is in place, just waiting for Sno+ to confirm (or give us a list of errors!). Waiting for reply (30/10)
95299(1/7/13)
Tarball glexec ticket... no news other than that my last attempt a few weeks ago failed (not as simple as I hoped). On hold (8/9)
100566(27/1)
Poor Perfsonar Performance. Has hit a bit of a roadblock with both perfsonar boxes being switched off for the last month... have I missed an announcement saying that the latest perfsonar release is ready? On hold (31/10)
UCL
95298(1/7/13)
UCL's glexec ticket. Ben hit a snag installing this mid-October; no news since some feedback from Maarten. In progress (14/10)
Imperial
109526(22/10)
LHCB having cvmfs trouble at IC, which was likely caused by a batch of naughty CMS jobs ruining it for everyone else. LHCB re-enabled IC to see if things were back on track, no news since. Waiting for reply (24/10)
EFDA-JET
109571(23/10)
Ops "availability" test failures at Jet. The cause of the alarms is known (Jet had a certificate problem on a few hosts). Just waiting for alarm to clear now. On Hold (28/10)
97485(21/9/13)
The case of the mysterious lhcb failures at Jet. No progress, none expected really though. On hold (1/10)
100IT
108356(10/9)
AFAICS this ticket now distills down to "Getting vmcatcher working at 100IT". Things seem to be progressing well, although the 100IT chaps aren't very good at setting their ticket statuses correctly! In progress (28/10)
109573(23/10)
Ticket listing the requirements for a cloud site. All three actions have now been (or already were) completed, but there is a question over the state of the 100IT site BDII. In progress (30/10)
La Grada Uno
109712(29/10)
CMS are seeing glexec errors ("status 203") at the Tier 1. Looks to be caused by a lack of wildcard mapping, only just coming to light with the recent cms analysis jobs coming into the site. Andrew L is on it like a scotch bonnet. Or just on it. (29/10)
109694(28/10)
Matt M from Sno+ has noticed gfal-copy errors when trying to access the Tier 1 using those tools. He's not sure if this is a problem with the Tier 1 or the tools themselves (or even his setup); Duncan is already helping him out. In progress (3/11)
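A hedged sketch of the kind of check being debugged (the endpoint and path are placeholders, not taken from the ticket); -v makes gfal-copy print the transfer conversation, which usually shows whether the failure is client-side or server-side:

```shell
# Placeholders: substitute the real Tier 1 SRM endpoint and a snoplus file.
command -v gfal-copy >/dev/null &&
  gfal-copy -v "srm://<tier1-endpoint>/<snoplus-file>" file:///tmp/gfal-test ||
  echo "gfal2 tools not installed here"
```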
107880(26/8)
(possibly related) Sno+ "srmcp failures" for a bunch of SUSY users. Some great input on how to get the tools working from Duncan and Chris, but no word since. My suspicion is Matt is waiting to hear back from this user group. Maybe their mail clients don't work under SUSE either? In progress (21/10)
106324(18/6)
The Tier 1 version of the Bristol CMS pilots losing connection ticket. On hold after exhausting all ideas. On hold (13/10)
109276(11/10)
Submissions to the RAL FTS3 "REST" interface failing for some reason - AIUI thought to be a problem with the CRLs and apache. After some advice the system has been tweaked, and is in the waiting-to-see-if-that-fixed-it stage. On hold (3/11)
108944(1/10)
CMS AAA access tests failing at RAL. Reading down the ticket it looks to be a cms redirector problem at RAL... or something... Andrew has been working to fix things, adding another redirector and other tweaks. Andrew has asked the xrootd experts (cc'd?) why the behaviour they are seeing is occurring (and also notes some references to RALPP slipping into the Tier 1 discussion). Waiting for reply (27/10)
109608(24/10)
T2K noticed the LFC denying the existence of a new user. The problem seems to have gone away from the T2K side, but Catalin has spotted a potential problem and asked for some voms-proxy-info output. Waiting for reply (28/10)
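A hedged sketch of the sort of check Catalin is after: dump the proxy with voms-proxy-info -all and confirm a t2k.org VOMS attribute is actually present for the new user. The sample line below is illustrative, not taken from the ticket:

```shell
# Real output would come from:  voms-proxy-info -all
# The LFC maps users on these attribute lines, so a missing or wrong VO
# attribute would explain the "user does not exist" behaviour.
sample='attribute : /t2k.org/Role=NULL/Capability=NULL'
echo "$sample" | grep -q '/t2k.org' && echo "t2k.org attribute present"
```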
109814(3/11)
Atlas have noticed a lot of lost job heartbeats over the last day, the Tier One guys are on it. In progress (3/11)
107935(27/8)
Inconsistent BDII/SRM numbers. Looks to be a problem with how castor reports read-only disk servers, Brian has put in a request to the Castor team for information on this. On hold (3/11)