Monday 3rd December 13.45 GMT
32 open UK tickets this week. As it's the start of the month, all tickets, great or small, will get reviewed.
https://ggus.eu/ws/ticket_info.php?ticket=88546 (16/11)
Creation of epic.vo.gridpp.ac.uk. The name has been settled on, and the VO has been deployed on the master VOMS instance and rolled out to the backups, ready for whatever the next step will be. In progress (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=87813 (25/10)
Migration of the vo.helio-vo.eu VO to the UK. At last word everything was done on the VOMS side, and testing on grid resources still needed to be done. In progress (15/11)
https://ggus.eu/ws/ticket_info.php?ticket=89141 (3/12)
RAL are seeing a high atlas production job failure rate, and a possibly related high FTS failure rate. In progress (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=89081 (30/11)
Failed biomed SAM tests, tracked to a missing / in a .lsc file. This should now be fixed; waiting for confirmation (but we won't wait too long). An example of the expected .lsc layout is sketched below. Waiting for reply (3/12)
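For reference, an .lsc file (e.g. /etc/grid-security/vomsdir/biomed/&lt;voms-host&gt;.lsc) holds the subject DN of the VO's VOMS server certificate on the first line and its issuer (CA) DN on the second, each beginning with a /. The DNs below are placeholders, not biomed's real ones:

```
/C=XX/O=Example Grid/CN=voms-biomed.example.org
/C=XX/O=Example Grid/CN=Example Certification Authority
```

A missing leading / makes the DN unparsable, which is enough to fail the authentication step of the SAM tests.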
https://ggus.eu/ws/ticket_info.php?ticket=89063 (30/11)
The atlas frontier squids at RAL weren't working. The cause (a networking problem) has been fixed, but the ticket was reopened and placed on hold as the monitoring for these boxes needs updating. On hold (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88596 (19/11)
t2k.org jobs weren't being delegated to RAL. After some effort this has been fixed, and the ticket can be closed. In progress (1/12)
https://ggus.eu/ws/ticket_info.php?ticket=86690 (3/10)
"JPKEKCRC02 missing from FTS ganglia metrics" for t2k. This has been a pain to fix; at last word RAL were waiting for their ganglia expert to get back to them, but that was a while ago (though I suspect they had bigger fish to fry in November). In progress (6/11)
https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9)
Correlated packet loss on the RAL perfsonar. On hold pending a wider-scale investigation. On hold (31/10)
https://ggus.eu/ws/ticket_info.php?ticket=87468 (17/10)
The last unsupported gLite software ticket (until the next batch). Ben has put the remaining out-of-date CE into downtime after updating another. In progress (29/11)
https://ggus.eu/ws/ticket_info.php?ticket=89129 (3/12)
High atlas production failure rate, likely due to the migration to EMI. It could be a problem with the software area; Mark has involved Alessandro De Salvo. Waiting for reply (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=86105 (14/9)
Low atlas sonar rates to BNL from Birmingham. The atlas tag has been removed from the ticket to lower the noise. On hold (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=89105 (1/12)
t2k.org jobs failing on the I.C. WMSs due to proxy expiry. Daniela thinks it may be a problem with myproxy (the CERN myproxy servers appear to be having DNS alias trouble). In progress (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=89096 (30/11)
lhcb jobs to Sheffield that go through the WMS are seeing "BrokerHelper: no compatible resources", possibly because the published values for GlueCEStateFreeCPUs & GlueCEStateFreeJobSlots are 0. A sketch of how to check the published values follows. In progress (3/12)
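One quick way to see what a site BDII is actually publishing for these attributes is an anonymous LDAP query. A minimal sketch using the third-party python-ldap module; the BDII host and mds-vo-name below are placeholders, not Sheffield's real values:

```python
# Query a site BDII for the Glue CE attributes the WMS matchmaker uses.
import ldap

conn = ldap.initialize("ldap://site-bdii.example.ac.uk:2170")
conn.simple_bind_s()  # BDIIs accept anonymous binds

results = conn.search_s(
    "mds-vo-name=EXAMPLE-SITE,o=grid",  # placeholder site name
    ldap.SCOPE_SUBTREE,
    "(objectClass=GlueCE)",
    ["GlueCEUniqueID", "GlueCEStateFreeCPUs", "GlueCEStateFreeJobSlots"],
)

for dn, attrs in results:
    print(dn)
    for attr, values in sorted(attrs.items()):
        print("  %s: %s" % (attr, b", ".join(values).decode()))
```

If the free-slot attributes really do come back as 0, the problem is on the publishing side (the batch-system info provider) rather than in the WMS matchmaking itself.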
https://ggus.eu/ws/ticket_info.php?ticket=89066 (30/11)
biomed nagios tests failing on the Lancaster SE with "problem listing Storage Path(s)", which suggests to me that we have a publishing problem. I couldn't find any obvious bugbears though, so I'll keep digging. In progress (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=89084 (30/11)
The problem in 89066 is also affecting the biomed CE tests. On hold (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88628 (20/11)
Getting t2k working on our clusters. We've had some problems building root on one cluster, and even just submitting jobs to the other. In progress (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88772 (22/11)
One of Lancaster's clusters is reporting default values for "GlueCEPolicyMaxCPUTime", mucking up lhcb's job scheduling. Tracked to a problem in the scripts (https://ggus.eu/ws/ticket_info.php?ticket=88904); the fix will be out in January, so I've put this on hold until then. On hold (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=85367 (20/8)
ilc jobs always fail on a Lancaster CE, possibly due to the CE's poor performance. For the third time in a row I've had to put this work off for a month. On hold (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)
t2k transfer failures to Lancaster. We're having trouble getting a routing change put through with the RAL networking team, probably because they've had a lot on their plate over the past month. In progress (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=88761 (22/11)
Technically a ticket from Liverpool to lhcb: a complaint over the bandwidth used by lhcb jobs, probably due to a spike in lhcb jobs running during an atlas quiet period. Are all sides satisfied with the diagnosed cause and the steps taken to prevent it happening again? In progress (23/11)
https://ggus.eu/ws/ticket_info.php?ticket=88631 (20/11)
Looks like Emyr has fixed Sussex's not-publishing-UserDNs APEL problem, so this ticket can be closed. In progress (26/11)
https://ggus.eu/ws/ticket_info.php?ticket=88822 (23/11)
A ticket similar to 88772, also at Lancaster. It could be that the SGE scripts need updating too. In progress (26/11)
https://ggus.eu/ws/ticket_info.php?ticket=88987 (28/11)
t2k jobs are failing on ce05. In progress (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88887 (26/11)
lhcb pilots are also failing on ce05. In progress (28/11)
https://ggus.eu/ws/ticket_info.php?ticket=88878 (26/11)
hone are also having trouble on ce05... In progress (26/11)
https://ggus.eu/ws/ticket_info.php?ticket=86306 (22/9)
Redundant, hard-to-kill lhcb pilots at QMUL. Chris opened a ticket to the cream developers (https://ggus.eu/tech/ticket_show.php?ticket=87891), but requests from lhcb to purge the lists still come in. In progress (21/11).
https://ggus.eu/ws/ticket_info.php?ticket=88376 (8/11)
Biomed authorisation errors on CE svr026. Sam asked on the 9th if this was the only CE that has seen this problem. No reply since, so I've added the biomed e-mail address explicitly to the cc list to try to coax a response. Waiting for reply (9/11)
https://ggus.eu/ws/ticket_info.php?ticket=86334 (24/9)
Low atlas sonar rates to BNL. Apparently things went from bad to worse on the 23rd/24th of October. Duncan has removed the atlas VO tag from the ticket to lower the noise on the atlas daily summary. On hold (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88227 (6/11)
biomed complaining about 444444 waiting jobs & no running jobs being published by jet (444444 being the placeholder value the dynamic info provider publishes when it can't get real numbers out of the batch system). The guys there have had a go at fixing the problem (probably caused by their update to EMI2), but are likely out of ideas. I had a brainwave regarding user access in maui.cfg (sketched below), but if that's not the solution I'm sure they'd appreciate ideas. In progress (3/12).
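The thought behind the maui.cfg idea: MAUI only answers diagnose/showq queries from accounts it recognises as admins, so if the account running the dynamic info provider isn't listed, the provider can't get real numbers and falls back to its error values. A minimal sketch, assuming the info provider runs as an account like edginfo (the user names and path are placeholders for whatever jet actually uses):

```
# /var/spool/maui/maui.cfg (excerpt)
ADMIN1  root
# ADMIN3 is MAUI's read-only admin level; list the account(s)
# that run the info provider queries here.
ADMIN3  edginfo ldap
```

If that is the problem, then after adding the user and restarting maui the published waiting-job counts should drop back to reality.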
https://ggus.eu/ws/ticket_info.php?ticket=86106 (14/9)
Poor atlas sonar rates from Oxford to BNL. On hold because we've run out of fixes to try, and they get good rates elsewhere. The VO tag has been removed to reduce noise. On hold (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)
atlas production failures at Durham. Site still in "quarantine". On hold (20/11).
https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)
compchem authentication failures. As this ticket has been on hold at a low priority since January, it would seem worthwhile to contact the ticket originators to see what they want to do. On hold (8/10)