Monday 5th March 2018, 14.30 GMT
4 Open Tickets this month.
IPv6 Deployment Tickets
Sussex: 131617
Possibly on hold until mid-2018.
RALPP: 131616
Chris had an encouraging update back in January, but hit some snags with a new Perfsonar install. Any joy?
OXFORD: 131615
No update since stating you had dual-stacked Perfsonar boxes back in November. Anything to add? Update - thanks for the update.
CAMBRIDGE: 131614
No progress expected until the Summer of this year. Is this still the case?
BRISTOL: 131613
Last update hoped progress could happen by February; any news? Update - no recent news.
BIRMINGHAM: 131612
Some progress on the v6 infrastructure; hopefully the bugs Mark described a few weeks back can be ironed out.
GLASGOW: 131611
Gareth provided a recent, if not totally positive, update.
ECDF: 131610
There were some interesting times last week, when the first steps in dual-stacking the ECDF DPM broke things. They are sticking to dual-stacking their test DPM for now.
DURHAM: 131609
Last update at the end of January had no positive movement from central IT on v6 deployment.
SHEFFIELD: 131608
This ticket really could do with an update - even an unexciting one.
MANCHESTER: 131607
IIRC, reverse lookup works only for the Perfsonar boxes; the ticket could do with an update about this.
LIVERPOOL: 131606
Another ticket that could do with an update, even if it's a boring one. Update - John provided a brief update.
UCL: 131604
No news from central IT at last check back in January.
RHUL: 131603
Perfsonar dual-stacked, but DNS lookup not supported yet.
Common or Garden Tickets
SUSSEX
122772 (11/7/16)
Webdav/Xroot ticket. Some good-looking progress on getting this to work, although at last check Leo hit some more problems. In progress (7/2)
133325 (6/2)
Availability ticket. Hopefully given another week of smooth running this can be closed. In progress (12/2)
RALPP
133819 (4/3)
LHCB asked RALPP to provide details of nodes without any SSE4.2 support. As Chris instructed, the ticket was reopened by LHCB to request that LHCB jobs do not land on these nodes. Reopened (4/3) Update - solved; the nodes are being decommissioned very soon.
OXFORD
133809 (3/3)
Availability ticket, caused by the AC troubles. On hold (5/3)
BRISTOL
133762 (1/3)
CMS transfer problems, on hold until Friday. On hold (5/3)
133806 (2/3)
CMS asked sites to deploy Singularity by March 2018; this ticket is the follow-up. On hold (5/3)
BIRMINGHAM
129930 (4/8/17)
Atlas http SAM tests failing. Any luck with the puppet scripts Kashif shared with you? On hold (13/2)
GLASGOW
133667 (23/2)
LHCB data access problems at Glasgow. The ticket tailed off a bit; Andrew McNab has offered to help compare Glasgow and Manchester settings. In progress (5/3) Update - everything looks good now after Sam updated xroot across the Glasgow storage. Maarten pointed out the likely fix in the xroot changelog. I imagine this ticket can be closed now.
DURHAM
133338 (7/2)
Atlas jobs failing at Durham, with the problems likely related to the Arc Control Tower's handling of pilots. Adam rolled out some changes; have these fixed things? In progress (21/2)
SHEFFIELD
133019 (24/1)
Availability ticket. Ticking along. On hold (1/3)
133810 (3/3)
Sno+ jobs failing due to cvmfs errors on a node, which Elena has taken offline. I suspect that's this ticket done with. In progress (4/3)
133770 (2/3)
LHCB jobs failing due to problems on some WNs. Elena has been fixing them; hopefully it's all sorted now. In progress (3/3)
MANCHESTER
133716 (27/2)
Atlas deletion errors - it looks like this ticket has been missed. Assigned (27/2)
QMUL
133402 (9/2)
A good portion of Sno+ jobs failing at QM, due to stage-in/out errors. This is likely caused by atlas hogging the reduced network bandwidth. Hopefully this will be fixed soon (by restoring the 20Gb/s site connection). In progress (22/2)
132713 (4/1)
hyperk.org support ticket. Any news? In progress (6/2)
132929 (18/1)
CMS having problems due to APEL's problems parsing slurm logs (or something like that). APEL support have been called in, but no news yet. In progress (29/1)
IMPERIAL
133683 (24/2)
Atlas seeing a high job failure rate at Imperial, due to problems with their AGIS configs that they have no control over. Elena proposes closing the ticket and moving the conversation to JIRA. In progress (5/3) Update - atlas are waiting to see some running jobs before closing the ticket.
133818 (4/3)
Another LHCB ticket asking how many nodes do not have SSE4.2 support. Simon reports there are no plans to decommission these nodes yet. Waiting for reply (5/3)
133723 (27/2)
This is a ticket for the Cloud site, where Sno+ saw problems. Simon was investigating, and has offlined the cloud site in Dirac to prevent further failures. In progress (27/2) Update - Simon hasn't managed to reproduce any errors, and has suggested closing the ticket for now, reopening if needed.
132688 (3/1)
Another ticket that isn't really an Imperial one; I think this lost Pheno file ticket can be closed soon. In progress (29/1) Update - ticket closed.
TIER 1
133719 (27/2)
Atlas spotted transfers failing into Echo. It was being investigated; any news? In progress (27/2)
133752 (1/3)
Atlas noticed the FTS was broken. While investigating, Alastair noted that it appears to be an IPv6 issue. In progress (1/3)
133717 (27/2)
Likely related: a similar-sounding CMS ticket. Any news? In progress (27/2)
133619 (21/2)
Missing unmerged CMS files at RAL. Chris has been helping a lot, but has asked CMS to double-check his working. Waiting for reply (5/3)
133764 (1/3)
Sno+ ticket about the RAL BDII not having SFU information. It looks like the BDII information has recently changed (for the worse). Any news? In progress (2/3) Update - Karin has updated the ticket saying that things have got a lot worse for Sno+, upping the ticket's priority.
132589 (21/12/17)
LHCB killed pilots ticket. Some more investigations into this show that the problem is getting worse. Any luck with your investigation? In progress (23/2)
132708 (4/1)
WMS decommissioning ticket. Nothing to do here until next month, I don't think. In progress (18/1)
127597 (7/4/17)
CMS network performance ticket. No news since Chris' comprehensive update in January. On hold (29/1)
124876 (7/11/16)
ECHO gridftp ROD tests not working, due to problems with the tests. No news on the counter ticket, still. On hold (13/11/17)
117683 (18/11/15)
GLUE2 publishing for Castor. A quick update in January reports a prototype version is being tested. On hold (3/1)