Monday 6th February 2017, 14.30 GMT
21 Open UK tickets this month
FRESH IN THIS MORNING - BRISTOL
https://ggus.eu/?mode=ticket_info&ticket_id=126454 (7/6)
As seen on TB-SUPPORT, CMS are having test failures at Bristol and Winnie is left without CMS site support at the moment. I see some replies already on the list; I'll leave this slot here for hopefully helpful discussion. On Hold (7/6)
SUSSEX
125503 (9/12/16)
Sno+ file download failure ticket, due to the wrong SE name in the LFC for the files. Jeremy M reports that he is looking into creating a DNS alias and asking the CA sage (aka Jens) to shape the necessary certificate. In progress (30/1)
122772 (11/7/16)
Webdav/xroot deployment ticket from atlas. Jeremy M reports the appointment of their new admin, which is great stuff. This is one of the first things on his todo list. I'll repeat the usual "we're here to help" message. No point suffering in silence! On hold (26/1)
Fresh in last night - 126438 - atlas seeing srmPut failures, but the error is 'file already exists'. A problem with rucio?
RALPP
125743 (27/12/16)
An availability ticket. A few blips on the nagios page, but I don't think there's anything to see here really. On Hold (29/1)
125815 (5/1)
Atlas ticket regarding space not being released after deletion. Chris has beaten his dcache into shape, and asked for the deletions to be re-attempted. Waiting for reply (30/1)
OXFORD
126371 (4/2)
Atlas transfer failures. Kashif spotted that the dpm-gsiftp daemon had failed, and got it back up. I suspect this ticket can be closed if the daemon is stable? In progress (4/2)
121924 (2/6/16)
Perfsonar rate ticket? Any news? If not, is there likely to be any? On Hold (5/12/16)
125822 (5/1)
The Oxford edition of the "Space not released after deletion" issue. Kashif too has been tinkering with his SE, tweaking and (re-)starting httpd daemons, and asks for a fresh list of files to check. Waiting for reply (27/1)
BIRMINGHAM
126131 (24/1)
Availability ticket. The numbers are on the mend so the ticket is On Hold (30/1)
GLASGOW
125867 (9/1)
LHCB seeing cvmfs-related job failures on WNs at Glasgow. Gareth has updated cvmfs across the Glasgow nodes and asks if the issue has calmed down. Waiting for reply (31/1)
124052 (25/9/16)
Another LHCB ticket, about the arc publishing incorrect job numbers. Gareth provided an update regarding the Glasgow plans, rolling the fix for this into the Centos7 migration. Thanks Gareth! On Hold (31/1)
EDINBURGH
126349 (3/2)
Another availability ticket, although today's numbers look to be okay so hopefully the cause of the troubles has passed. Looks like this ticket hasn't been noticed yet though. Assigned (3/2) Andy noted that the argo numbers seem nonsensical with negative availability for a few days! But things are on the mend now. Looks like a simple case of On Holding the ticket for the next 26 days.
LIVERPOOL
124819 (3/11/16)
The last AFS ticket. John B reports that the university has stopped firewalling UDP port 7001 and asks if things are better now. Waiting for reply (3/2)
126167 (25/1)
Decommissioning ticket for the last CREAM CE at Liverpool (which will also see the end of torque at the site). Downtime for the service will be on the 14th (Happy Valentine's Day?) and the service will be switched off properly come the 28th. In progress (30/1)
QMUL
125627 (19/12/16)
Atlas transfers failing to the QM test SE. Dan increased the space to 10TB to soothe the last batch of failures, just waiting to hear if that worked. Waiting for reply (26/1)
126261 (30/1)
Biomed nagios tests not working for ce4 at QM. The problem persists. In progress (2/2)
126312 (1/2)
Atlas spotted QM's squid had fallen over. Dan has noticed problems since upgrading to v3 of frontier-squid, although the issues could also be related to IPv6 on the hosts (of the two squids at QM the one that fell over was also the one that has an IPv6 address in DNS). Keeping the ticket open to see if things stay up. In progress (1/2)
TIER 1
126296 (1/2)
CMS SAM tests failing against srm-cms-disk.gridpp.rl.ac.uk. All transfers "by hand" pass without trouble, and Gareth points out that this service is not in production in the GOCDB, so tests shouldn't even be running against it! Waiting for reply (6/2) Update - CMS got back to say that this is the endpoint specified in PhEDEx, which is why it was tested. If this is wrong it will need to be changed.
126376 (5/2)
Another batch of CMS SAM test failures. This includes the srm-cms-disk issue again. John K restarted the CMS xroot directors to try to clear the CE test errors that were being seen - things were looking up. In progress (6/2)
126184 (26/1)
Request from atlas for input on the new site monitoring schemes, linked in the ticket. The appropriate people were being chased. In progress (26/1)
124876 (7/11/16)
echo instance at RAL failing nagios tests due to the tests not using the right path. The ticket addressing this (125026) has had no progress since just before Christmas and so could do with a shake up. On Hold (1/1)
117683 (18/11/15)
Glue 2 publishing for Castor ticket. Did Jens and Rob have any luck tackling this in the pre-Christmas get-together? On Hold (7/12/16)