|
|
Line 436: |
Line 436: |
| ===== ===== | | ===== ===== |
| <!-- ******************Edit start********************* -----> | | <!-- ******************Edit start********************* -----> |
− | '''Monday 18th May 2015, 14.30 BST'''<br /> | + | '''Friday 22nd May 2015'''<br /> |
− | Full review this week.
| + | Matt's on leave until the 8th of June. But he's replaceable with handy links: |
| | | |
| [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /> | | [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /> |
− | At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).
| |
| | | |
− | 22 Open UK Tickets this week. Going site-by-site:
| + | [http://tinyurl.com/nwgrnys UK NGI GGUS tickets] |
− | | + | |
− | '''APEL/NGI'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /> | + | |
− | Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)
| + | |
− | | + | |
− | '''OXFORD'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br />
| + | |
− | Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br />
| + | |
− | CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)
| + | |
− | | + | |
− | '''GLASGOW'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br />
| + | |
− | ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)
| + | |
− | | + | |
− | '''ECDF'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br />
| + | |
− | Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)
| + | |
− | | + | |
− | '''SHEFFIELD'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br />
| + | |
− | LHCb see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)
| + | |
− | | + | |
− | '''MANCHESTER'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br />
| + | |
− | For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br />
| + | |
− | A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)
| + | |
− | | + | |
− | '''LIVERPOOL'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br />
| + | |
− | Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)
| + | |
− | | + | |
− | '''LANCASTER'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br />
| + | |
− | Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br />
| + | |
− | Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)
| + | |
− | | + | |
− | '''UCL'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br />
| + | |
− | The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)
| + | |
− | | + | |
− | '''IMPERIAL'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br />
| + | |
− | A ticket from Durham concerning the settings for their site in the Dirac instance at Imperial. Daniela hopes to get it fixed soon. In progress (15/5)
| + | |
− | | + | |
− | '''100IT'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br />
| + | |
− | CA certificate update at 100IT leading to a discussion of other authentication-based failures. David has asked for voms information after posting his configs. In progress (13/5)
| + | |
− | | + | |
− | '''TIER 1'''<br />
| + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br />
| + | |
− | Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br />
| + | |
− | Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br />
| + | |
− | CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br />
| + | |
− | Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br />
| + | |
− | Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br />
| + | |
− | Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br />
| + | |
− | Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depending on the time frame of a workaround. In progress (13/5)
| + | |
− | | + | |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br />
| + | |
− | CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)
| + | |
| | | |
| <!-- ******************Edit stop********************* -----> | | <!-- ******************Edit stop********************* -----> |