|
|
Line 412: |
Line 412: |
| ===== ===== | | ===== ===== |
| <!-- ******************Edit start********************* -----> | | <!-- ******************Edit start********************* -----> |
− | '''Monday 13th April 2015, 14.00 BST'''<br />
| |
− | 24 open tickets this week - going over all of them, site by site.
| |
| | | |
− | ''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''
| |
− |
| |
− | '''RALPP'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br />
| |
− | Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter) was asking about SELinux, but there has been no news since the 1st. In progress (1/4)
| |
− |
| |
− | '''BIRMINGHAM'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br />
| |
− | Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)
| |
− |
| |
− | '''GLASGOW'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br />
| |
− | Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE, so it might be a site problem after all. Gareth asks for clarification. Waiting for reply (13/4)
| |
− |
| |
− | '''EDINBURGH'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br />
| |
− | Tarball glexec ticket. No news (sorry). I believe the end of April was the "deadline" I set for having this made. On Hold (9/3)
| |
− |
| |
− | '''LANCASTER'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br />
| |
− | Lancaster's poor perfsonar performance. I don't quite believe what I saw in the tests I performed, so I'm aiming to rerun them. On hold (13/4)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br />
| |
− | Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)
| |
− |
| |
− | '''BRUNEL'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br />
| |
− | A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''
| |
− |
| |
− | '''100IT'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br />
| |
− | 100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br />
| |
− | Deploying vmcatcher at 100IT. After David's questions fell on deaf ears for a while, it has been advised that the ticket be closed, as this issue will be dealt with elsewhere. Whether it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)
| |
− |
| |
− | '''TIER 1'''<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br />
| |
− | CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It looks like a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br />
| |
− | CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br />
| |
− | The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br />
| |
− | CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)
| |
− |
| |
− | [https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 112896] (9/4)<br />
| |
− | CMS dataset access problems, caused by over a million access attempts on a single file over an 18-hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br />
| |
− | Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br />
| |
− | A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)
| |
− |
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br />
| |
− | An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence, nor much noise at all until today. Waiting for reply (13/4)
| |
− |
| |
− | '''UCL'''<br />
| |
− | UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, on hold)<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br />
| |
− | [https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)
| |
| | | |
| <!-- ******************Edit stop********************* -----> | | <!-- ******************Edit stop********************* -----> |