Difference between revisions of "Past Ticket Bulletins 2015"
Revision as of 14:05, 9 February 2015
Monday 2nd February 2015, 14.00 GMT
22 Open UK tickets this month.
SUSSEX
110389 (26/11/14)
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)
RALPP
110536 (2/12/14)
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)
BRISTOL
110365 (25/11/14)
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)
EDINBURGH
111118 (12/1)
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - 107689. On Hold (19/1)
95303 (1/7/13)
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)
MANCHESTER
110225 (18/11/14)
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)
111356 (23/1)
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)
LANCASTER
111547 (2/2)
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)
100566 (21/1/14)
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline; does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)
95299 (1/7/13)
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)
UCL
95299 (1/7/13)
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)
QMUL
110353 (25/11/14)
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)
IMPERIAL
111357 (23/1)
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)
EFDA-JET
97485 (21/9/13)
Jet's LHCb job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)
100IT
111333 (22/1)
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)
108356 (10/9/14)
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)
TIER 1
111477 (29/1)
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)
107935 (27/8/14)
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think); Brian is still on the case. Stephen B let us know that Maria, the ticket submitter, is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)
111120 (12/1)
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)
108944 (1/10/14)
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)
111347 (22/1)
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)
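For anyone unfamiliar with what a consistency check boils down to, here is a minimal sketch, assuming the storage dump and the catalogue are both available as plain lists of file paths. The function name and the example paths are made up for illustration and are not the actual CMS or RAL tooling.

```python
def find_orphans(storage_paths, catalogue_paths):
    """Return files that exist on storage but are unknown to the catalogue."""
    # A set difference leaves only the paths the catalogue does not list.
    return sorted(set(storage_paths) - set(catalogue_paths))


# Hypothetical example data, not real file names from RAL or CMS.
storage_dump = [
    "/store/data/run1/file_a.root",
    "/store/data/run1/file_b.root",
    "/store/data/run1/file_c.root",
]
catalogue = [
    "/store/data/run1/file_a.root",
    "/store/data/run1/file_c.root",
]

for path in find_orphans(storage_dump, catalogue):
    print("orphan:", path)
```

Swapping the arguments gives the opposite list: files the catalogue expects but the storage no longer holds.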
109694 (28/10/14)
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)
Monday 26th January 2015, 14.15 GMT
Back after being forgotten about by me:
Other VO Nagios Status:
At the time of writing I see:
Imperial: gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).
Brunel: gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).
Lancaster: pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).
Sussex: snoplus failures (but I think Sussex is in downtime).
RALPP: A number of failures across a number of CEs, all a few hours old. An SE problem?
Sheffield: gridpp VO job submission failure, but only 6 hours old.
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.
22 Open UK Tickets this week.
NGI/100IT
111333(22/1)
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)
TIER 1
108944(1/10/14)
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is building a replacement xrootd box which is currently being prepared. If that will take a while can the ticket be put on hold? In progress (19/1)
QMUL
110353(25/11/14)
An atlas ticket, asking for httpd access to storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)
RHUL
111355(23/1)
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)
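The date mix-up is easy to reproduce. A minimal sketch using Python's standard library, where the helper name is just for illustration, shows how the same string lands on two different days depending on which convention the reader assumes:

```python
from datetime import datetime


def parse_both_ways(text):
    """Parse a short numeric date as day-first and as month-first."""
    day_first = datetime.strptime(text, "%d/%m/%y").date()
    month_first = datetime.strptime(text, "%m/%d/%y").date()
    return day_first, month_first


day_first, month_first = parse_both_ways("5/1/15")
print(day_first.isoformat())    # 2015-01-05 (5th of January)
print(month_first.isoformat())  # 2015-05-01 (1st of May)
```

Quoting dates in an unambiguous form such as 2015-05-01 in tickets avoids this kind of confusion.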
BIOMED PROBLEMS:
Manchester: 111356(23/1)
Imperial: 111357(23/1)
Biomed are having job problems, looking to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a CREAM-side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!
Monday 19th January 2015, 14.30 GMT
23 Open UK Tickets this week.
TIER 1
108944(1/10/14)
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)
SHEFFIELD
111162(14/1)
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)
ECDF
111118(12/1)
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)
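Since most ticket numbers in these bulletins are quoted bare, a small helper can turn them back into links. This is a sketch that simply follows the URL pattern quoted above; the function and constant names are illustrative and not part of any GGUS client.

```python
from urllib.parse import urlencode

# Base address taken from the ticket link quoted in this bulletin.
GGUS_BASE = "https://ggus.eu/"


def ggus_ticket_url(ticket_id):
    """Build the ticket_info URL for a GGUS ticket number."""
    query = urlencode({"mode": "ticket_info", "ticket_id": ticket_id})
    return f"{GGUS_BASE}?{query}"


print(ggus_ticket_url(107689))
```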
100IT
108356(10/9/2014)
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configurations and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)
Perfsonar Tickets
110389(Sussex)
110382(TIER 1)
108273(Durham)
100566(Lancaster)
110365(Bristol)
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!