Previous Ticket Bulletins 1

From GridPP Wiki
Revision as of 15:45, 3 September 2012 by Matthew doidge


Tuesday 28th of August, 00:00 BST</br> I see 43 open UK tickets this week (the build-up is alarming), but I figure that bothering people about tickets at their site the morning after a long weekend would be an exercise in generating bad will! A perusal of them doesn't show any urgent tickets decaying unnoticed.

However, this build-up of tickets is worrying, so could everybody please scan the UK tickets for any pertaining to their site and take the chance to update any that need it? I'm guessing a lot of people are on holiday, though.

All open UK tickets can be seen here:</br> http://tinyurl.com/93vlp2e

During today's meeting we'll have a quick surgery, in case anyone wants to mention any ticket-related problems, rather than the usual round-up. This will hopefully free up some time for the site round-table we've been meaning to have for a while!

Monday 20th of August, 13:00 BST</br> 35 open UK tickets this week! My limit for reading tickets and maintaining the will to live stands at about 24. There doesn't seem to be much going horribly wrong; a group of glite 3.1 retirement tickets makes up the bulk of the additional open problems this week, and this seems to have combined with a build-up of crusty tickets. This could be largely down to holidays (a number of tickets are waiting for a user reply). I'll keep an eye on things and start prodding tickets after the bank holiday (although, due to the bank holiday, next week's ticket review will probably be a light one).

Seven tickets concern the retirement of glite 3.1 components before the end of September. Most seem to be in order; one that stands out is:</br> RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=85179 (14/8)</br> Daniela asks whether dgc-grid-42.brunel.ac.uk is also a glite 3.1 service (15/8).

I'll review the rest of these next week.

Stealth Tickets:</br> Another case of a ticket showing up unannounced, this time a CMS ticket to IC:</br> https://ggus.eu/ws/ticket_info.php?ticket=85131</br> A similar ticket to Bristol generated notifications as normal, though. Maybe it's IC's announcements that are broken? These are CMS Savannah-linked tickets, so there is an extra layer where things can go wrong.

John Gordon's ticket tracking the ROD dashboard announcement problems is here:</br> https://ggus.eu/ws/ticket_info.php?ticket=85190

UK</br> https://ggus.eu/ws/ticket_info.php?ticket=84408 (20/7)</br> Neurogrid WMS & LFC configuration. Catalin reports that the RAL WMS has been configured and asks for testing; he'll then move on to the LFC. Waiting for reply (13/8).</br> See also: https://ggus.eu/ws/ticket_info.php?ticket=80259 (14/3)

https://ggus.eu/ws/ticket_info.php?ticket=84381 (19/7)</br> COMET VO creation. Daniela spotted a typo in the VO name on the VOMS server ("comet.j-parc.ja" instead of "comet.j-parc.jp") and asks if it can be corrected. I'm unsure if Mike would have seen this request (14/8).

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=84492 (24/7)</br> SNO+: "Job time/memory requirements not provided". A lot of progress; waiting for reply (17/8).</br> https://ggus.eu/ws/ticket_info.php?ticket=85023 (9/8)</br> SNO+ were having trouble using the RAL WMS. The ticket has been waiting for a reply from the users for a while now (since 10/8). Are all the users on holiday?

There's a similar, probably related ticket at:</br> GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=85025 (9/8)</br> Stuart cites https://savannah.cern.ch/bugs/?59874 as a possible cause of the problem. Again, a request for more information from the users remains unanswered. Waiting for Reply (10/8).

SNO+ aren't having much luck with wmsii, it seems:</br> IC</br> https://ggus.eu/ws/ticket_info.php?ticket=85169 (14/8)</br> This seems to be unrelated to the above two problems, but it has been waiting for a reply from SNO+ for a while (14/8).

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=85325 (20/8)</br> ECDF have a decommissioned lcg-CE that's still receiving tests (and, of course, failing them). Andy asks what can be done to stop the node being probed. This will be relevant as other sites start to decommission their crustier services. In Progress (20/8). UPDATE: closed on the dashboard by setting the node to "not registered".

RALPP</br> https://ggus.eu/ws/ticket_info.php?ticket=85019 (9/8)</br> ilc are seeing "role not supported" errors when submitting to one of RALPP's CEs. Debugging has now become harder, as the user reports similar-sounding errors when submitting jobs through the RAL WMS. In Progress (19/8).

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7) and others.</br> Mike reports that he has received spare parts and should be able to proceed with fixing Durham's problems (15/8).

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=81784 (1/5)</br> Has Sussex been added to Rebus? (10/8)

Solved Cases:</br> No exciting solved cases again, but my eyes were melting from my head after reading through all the open tickets.

The same goes for tickets from the UK.

Monday 13th of August, 13:00 BST</br> 23 open UK tickets this week. I spotted a few ilc tickets this week; they seem to be suffering from the usual problems VOs face after a quiet spell. Otherwise, not much to report.

Question: do sites mind me "tidying tickets"? Essentially, when a ticket is obviously "In Progress" (work has started) or "Waiting for Reply" (the last post was a question from the site to a person of interest) but hasn't been set as such, I'd poke my nose in and set the status. I've done this a few times where people have started work without setting a ticket to "In Progress", but I don't want to stick my oar in where it's not wanted!

Quiet tickets?</br> https://ggus.eu/ws/ticket_info.php?ticket=85018 (Lancs)</br> https://ggus.eu/ws/ticket_info.php?ticket=85017 (QMUL)</br> Lancaster and QMUL received some Ops tickets on 9/8. Lancaster got one notification (which I admit I missed) but no follow-up e-mails. Queen Mary have been uncharacteristically quiet, which makes me think they might have missed their ticket too. Has anyone else had any missed tickets? Could this quietness be caused by issues similar to those discussed on TB-SUPPORT this week (concerning multiple e-mail addresses in GGUS & EGI broadcasts)?

UK Tickets</br> https://ggus.eu/ws/ticket_info.php?ticket=84381 (19/7)</br> Creation of the COMET VO. A voms instance is online and some formalities have been fulfilled. Daniela asks what's needed to host the VO on the gridpp voms server? (13/8)

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=84938 (7/8)</br> neiss.org.uk needed updated VOMS server information on QMUL's servers. Chris has added the "2B" to his .lsc files, which should have done it; he's waiting to see if this has fixed things on the CEs before rolling out to the SE (should be Waiting for Reply, 9/8).
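For anyone not familiar with .lsc files: on a typical gLite/EMI node each VO gets a directory under the vomsdir (conventionally /etc/grid-security/vomsdir, which can be overridden via X509_VOMS_DIR), holding one <voms-host>.lsc file per VOMS server. The file lists the DN chain of that server's certificate: the server's own subject DN first, then its issuing CA. Below is a minimal Python sketch of that layout. The hostname and DNs are hypothetical placeholders, not the real neiss.org.uk values, and this is not the tool Chris used.

```python
from pathlib import PurePosixPath
from typing import Sequence

# Conventional default location; sites may override it via X509_VOMS_DIR.
VOMSDIR = PurePosixPath("/etc/grid-security/vomsdir")


def lsc_path(vo: str, voms_host: str) -> PurePosixPath:
    """Return the path of the .lsc file for one VOMS server of one VO."""
    return VOMSDIR / vo / f"{voms_host}.lsc"


def lsc_contents(dn_chain: Sequence[str]) -> str:
    """Render an .lsc file: one DN per line, server subject first, then its CA(s)."""
    if len(dn_chain) < 2:
        raise ValueError("need at least the server DN and its issuing CA DN")
    return "\n".join(dn_chain) + "\n"


if __name__ == "__main__":
    # Hypothetical host and DNs, for illustration only.
    print(lsc_path("neiss.org.uk", "voms.example.ac.uk"))
    print(lsc_contents([
        "/C=UK/O=Example/CN=voms.example.ac.uk",
        "/C=UK/O=ExampleCA/CN=Example CA",
    ]), end="")
```

When a VOMS server's certificate is reissued under a different CA, it is the second line that changes, so every node that checks that VO's proxies needs its copy updated.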

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)</br> Atlas production problems at Durham. After much peril, Alastair reports that Mike has things more or less up and running, with a few service issues that still need to be understood. Durham remain in downtime till these are dealt with. In progress (9/8)</br>

  • Other Durham tickets are on hold except:</br>

https://ggus.eu/ws/ticket_info.php?ticket=68859 (Brian's request for DPM upgrade plans)</br> which probably should be on hold too.

BRISTOL </br> https://ggus.eu/ws/ticket_info.php?ticket=80155 (12/3)</br> Upgrade plans for the Bristol SE. Winnie outlined a plan (9/7), and the ticket was put on hold (18/7) until the end of August. It is now on hold till the end of September, as the upgrade can't be done this month (but Winnie guarantees that it will be done before the 30th of September). On hold (7/8).

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=83283 (14/6)</br> lhcb cvmfs problems. Dave cited https://savannah.cern.ch/bugs/index.php?95420 & https://savannah.cern.ch/support/?129468 (18/6). Have there been any plans to try out the newer versions of cvmfs (or does the problem even still exist)? Jeremy set the ticket to Waiting for Reply (31/7). On 8/8 Mark sparked a discussion with lhcb over whether or not the failures persisted (the site had in fact been banned for most of the lifetime of this ticket). It was confirmed that the problem still plagues lhcb jobs, and that the list of problem nodes corresponds to the new "high-density" workers. Glasgow are currently installing the cvmfs upgrade to fix the lhcb & atlas problems at the site, but it will take a few days. In progress (10/8)

Tickets from the UK</br> https://ggus.eu/ws/ticket_info.php?ticket=84993</br> Raul has sent out a batch of cloned tickets to VOs with data still on Brunel's soon-to-be-retired dgc-grid-50.brunel.ac.uk (ref https://ggus.eu/ws/ticket_info.php?ticket=84639).

https://ggus.eu/ws/ticket_info.php?ticket=85021 (9/8)</br> Emyr ticketed EGI over their improper signing of umd-release-1.8.0-1.el5.noarch.rpm. No word yet on the ticket.

Of Interest:</br> https://ggus.eu/ws/ticket_info.php?ticket=85029</br> Daniela pointed out on TB-SUPPORT a probable cause of some sites' intermittent Ops test failures.

No exciting solved cases this week.

Monday 6th of August, 13:00 BST</br>

24 Open UK Tickets this week. A couple of sites forgot to "In Progress" their tickets after starting work on them last week, so I stepped in and interfered. No sign of any sites not being notified about their tickets this week.

UK</br> https://ggus.eu/ws/ticket_info.php?ticket=80259 (14/4)</br> Creation of the neurogrid.incf.org VO. WMS & LFC request ticket submitted (84408 below) (20/7). GGUS registration ticket submitted (84848) (6/8). Update: Jeremy asks if the RAL LFC should be a central or a local LFC for the VO (see 84408) (6/8).

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=84655 (30/7)</br> SNO+ are having trouble using the RAL WMSs (WMSIi? WMSices?). The RAL team have been trying some config tweaks to hit the sweet spot for SNO+. One WMS is working (1/8); waiting for a reply from Matt M to see if the other is working now (3/8). Update: lcgwms appears to be working now for SNO+ (6/8).

https://ggus.eu/ws/ticket_info.php?ticket=84408 (20/7)</br> Request to enable neurogrid on WMS & LFC, some delays due to holidays. Catalin asked if the LFC is to be "local or central", Jeremy replied that it should probably be central (needs double checking). No news on WMS, which is the priority. (31/7)

https://ggus.eu/ws/ticket_info.php?ticket=84492 (24/7)</br> SNO+ jobs were not being matched to their queue at RAL. This seems to be a problem with jobs (submitted via Ganga to the WMS) matching against GlueHostMainMemoryVirtualSize (which was not set) rather than GlueHostMainMemoryRAMSize. GlueHostMainMemoryVirtualSize has now been set for the queue in question; waiting for reply from SNO+ (27/7).
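To illustrate why an unset attribute stops a job matching at all (rather than just matching badly), here is a toy Python model of the comparison. It assumes, as in ClassAd-style matchmaking, that a requirement referring to an attribute the queue never published evaluates to "undefined", which counts as no match. The memory figures are hypothetical; the ticket doesn't quote the actual requirement Ganga generated, and this is not the WMS matchmaker itself.

```python
from typing import Mapping, Optional


def at_least(published: Mapping[str, int], attr: str, minimum: int) -> Optional[bool]:
    """Compare one published GLUE attribute against a minimum.

    Returns None ("undefined") if the queue never published the attribute.
    """
    value = published.get(attr)
    if value is None:
        return None
    return value >= minimum


def matches(published: Mapping[str, int], attr: str, minimum: int) -> bool:
    """Only a definite True is a match; "undefined" behaves like False."""
    return at_least(published, attr, minimum) is True


if __name__ == "__main__":
    # Hypothetical queue that publishes RAM size (MB) but not virtual memory size.
    queue = {"GlueHostMainMemoryRAMSize": 4096}
    print(matches(queue, "GlueHostMainMemoryVirtualSize", 2000))  # False: attribute unset
    print(matches(queue, "GlueHostMainMemoryRAMSize", 2000))      # True

    # The fix in the ticket: publish GlueHostMainMemoryVirtualSize for the queue.
    queue["GlueHostMainMemoryVirtualSize"] = 8192
    print(matches(queue, "GlueHostMainMemoryVirtualSize", 2000))  # True
```

The point of the three-valued result is that a missing attribute is neither "big enough" nor "too small"; it simply never satisfies the requirement, which is why publishing the attribute was enough to get jobs matching.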

https://ggus.eu/ws/ticket_info.php?ticket=83927 (6/7)</br> SNO+ attempting to get FTS to work for them. This ticket has been around the houses before landing on the RAL FTS (24/7). After a few tweaks much progress seems to have been made; hopefully things will work now, waiting for reply (26/7). Some additional advice on how to check transfers was given to SNO+ (1/8). Update: seems to be working for SNO+ now, ticket closed (6/8).

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=84793 (3/8)</br> Hone were seeing job problems (originally thought to be one of their usual "in scheduled status for too long" tickets). Chris revealed that a problem with the batch system was actually the culprit. Jobs are flowing for hone again, looks like the ticket can be closed (6/8)

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=84760 (2/8)</br> Hone were seeing jobs being cancelled on IC queues. IC suffered from power problems, followed by the CE playing up. It *should* be fixed as of Friday; waiting for confirmation from hone (it should really be "Waiting for Reply") (3/8).

BRUNEL</br> https://ggus.eu/ws/ticket_info.php?ticket=84639 (30/7)</br> Brunel's DPM is below the WLCG recommended level. Request from Brian for upgrade plans. (related to 68853). Raul has replied stating that the SE is small, with only a few TB of non-LHC data on it. He confirms that it will be upgraded before the deadline (I'm not sure which deadline that is though). (31/7)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)</br> High atlas production failure rate. Mike offlined suspect nodes (19/7), but the UK cloud has set Durham to test mode (20/7). Power work went "badly" (smoking-PDU badly) (1/8). PDU stabilisation work was to take place on 2nd Aug, but there has been no update from the site since (1/8).

The other Durham tickets can and should be put On Hold if infrastructure problems continue (progress requires power):</br> https://ggus.eu/ws/ticket_info.php?ticket=83950 (7/7)</br> lhcb cvmfs problems. First attempts at triage failed, and recent attempts by lhcb to confirm the problem is fixed have been blocked by job submission problems (26/7).</br> https://ggus.eu/ws/ticket_info.php?ticket=68859 (22/3/11)</br> Brian's request for DPM upgrade plans. As of 19/1 still had disk servers to update. 30/7 Brian requested some more information.</br> https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)</br> Compchem were seeing authentication problems at a number of sites, including Durham. On Hold (19/1). On 18/7 Mark M will poke Mike to see if the ticket can be closed.

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=83283 (14/6)</br> lhcb cvmfs problems. Dave cited https://savannah.cern.ch/bugs/index.php?95420 & https://savannah.cern.ch/support/?129468 (18/6). Have there been any plans to try out the newer versions of cvmfs (or does the problem even still exist)? Jeremy set the ticket to Waiting for Reply (31/7).

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=81784 (1/5)</br> It looks like this journey is nearing its end. Emyr has experienced the joy that only "all green" site nagios pages can bring. Congratulations! Next steps will be discussed in this week's meeting. (6/8)


SOLVED CASES</br> https://ggus.eu/ws/ticket_info.php?ticket=84487</br> SNO+ were having curl problems at Oxford (although the same command worked at QMUL). It seems that SNO+ need a very up-to-date SL/RHEL CA bundle, one that includes the latest DigiCert CA they require. A mirror problem meant that this update was missed at Oxford. If everyone has the latest security updates installed (which we all should), then, as Ewan pointed out, this problem should never be seen again. Still worth noting, though.
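If you want to check whether a node's CA bundle already contains a given CA, without waiting for a user's curl to fail, something like the Python sketch below would do, using only the standard ssl module. The bundle path (/etc/pki/tls/certs/ca-bundle.crt is the usual SL/RHEL location) and the "DigiCert" name fragment are assumptions; the ticket doesn't say exactly which DigiCert CA SNO+ needed.

```python
import os
import ssl
from typing import Iterable, List, Mapping


def load_ca_certs(cafile: str) -> List[dict]:
    """Load a PEM CA bundle and return the CA certificates as parsed by ssl."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return ctx.get_ca_certs()


def common_names(cert: Mapping) -> List[str]:
    """Pull commonName values out of the nested 'subject' tuple ssl returns."""
    return [value
            for rdn in cert.get("subject", ())
            for key, value in rdn
            if key == "commonName"]


def bundle_has_ca(ca_certs: Iterable[Mapping], name_fragment: str) -> bool:
    """True if any CA in the bundle has a commonName containing name_fragment."""
    return any(name_fragment in cn for cert in ca_certs for cn in common_names(cert))


if __name__ == "__main__":
    bundle = "/etc/pki/tls/certs/ca-bundle.crt"  # typical SL/RHEL location (assumption)
    if os.path.exists(bundle):
        print(bundle_has_ca(load_ca_certs(bundle), "DigiCert"))
    else:
        print(f"no bundle at {bundle}")
```

Running it on an affected worker and on an up-to-date one is a quick way to spot a node that, like Oxford's, missed the update.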

https://ggus.eu/ws/ticket_info.php?ticket=83213</br> Chris W ticketed ngs.ac.uk concerning the decommissioning of ce03.esc.qmul.ac.uk. After a long while they got back to Chris saying that there's "Nothing to do here". It appears that the NGS don't need to be notified in this way when they're removed from a CE, as long as the corresponding entries are removed from the information system.

TICKETS FROM THE UK</br> No exciting happenings on this front that I can see.

Monday 30th of July, 13:00 BST</br>

22 Open UK Tickets this week.</br> I'll start with a quick reminder that the job of "In Progress"-ing tickets has fallen back to the site admins.

UK</br> https://ggus.eu/ws/ticket_info.php?ticket=84381 (19/7)</br> Ticket to track the creation of a new VO for the COMET experiment (possibly to be called comet.j-parc.jp). A request to the VOMS admins was submitted at the time of the ticket. The list of cc'd parties was extended (24/7).

https://ggus.eu/ws/ticket_info.php?ticket=80259 (14/4)</br> Creation of the neurogrid.incf.org VO. Requests to GGUS and for the enabling of the VO on the WMS & LFC going out (20/7).

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=84408 (20/7)</br> Request to enable neurogrid on WMS & LFC, in progress on the 20th but no word since.

https://ggus.eu/ws/ticket_info.php?ticket=83927 (6/7)</br> SNO+ attempting to get FTS to work for them. This ticket has been around the houses before landing on the RAL FTS (24/7). After a few tweaks much progress seems to have been made; hopefully things will work now, waiting for reply (26/7).

https://ggus.eu/ws/ticket_info.php?ticket=84503 (24/7)</br> SNO+ asked for the python-dev packages to be installed at RAL. RAL would rather not put them on their workers, so SNO+ have been asked if they can install them in their software area (25/7).

https://ggus.eu/ws/ticket_info.php?ticket=84492 (24/7)</br> SNO+ jobs were not being matched to their queue at RAL. This seems to be a problem with jobs (submitted via Ganga to the WMS) matching against GlueHostMainMemoryVirtualSize (which was not set) rather than GlueHostMainMemoryRAMSize. GlueHostMainMemoryVirtualSize has now been set for the queue in question; waiting for reply from SNO+ (27/7).

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=84487 (24/7)</br> SNO+ are having curl problems at Oxford (although the same command works at QMUL). This ticket seems to have got stuck; I kicked it into notifying the site, and Ewan promptly set it In Progress (30/7).

  • As a side note, all the above tickets seem to have been victims of a game of ticket tennis, or in the last case not assigned at all. Some problem with SNO+ tickets? Or was it simply that Matt wasn't notifying sites manually, as many more veteran submitters do?

LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)</br> t2k.org transfer errors. The disk server is okay, the problem is most likely "in the pipes" at the Lancaster end of the lightpath. Involving the local networking team, on hold till they get back to us with a solution (26/7).

https://ggus.eu/ws/ticket_info.php?ticket=84583 (26/7)</br> LHCB jobs aborting on one of the Lancaster CEs, reopened. The error message is "Transfer to CREAM failed due to exception: Failed to create a delegation id for job https://wms302.cern.ch:9000/9meup5GIEhvKFl6t1ogUhw: reason is Delegation ID '13432989762E590625wms3022Ecern2Ech' already exists for client". Google has failed me, and cleaning up the jobs didn't fix the problem. Has anyone else seen this error message? In progress (30/7).

BRUNEL</br> https://ggus.eu/ws/ticket_info.php?ticket=84639 (30/7)</br> Brunel's DPM is below the WLCG recommended level. Request from Brian for upgrade plans. (related to 68853).

MANCHESTER</br> https://ggus.eu/ws/ticket_info.php?ticket=84579 (26/7)</br> Hone had jobs in a scheduled status for a long time on one of Manchester's queues. There was also a transient SE problem mentioned in the ticket. Looks like the ticket can be closed as of Friday (27/7).

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)</br> High atlas production failure rate. Mike offlined suspect nodes (19/7) but the UK cloud has set Durham to test mode (20/7).

https://ggus.eu/ws/ticket_info.php?ticket=83950 (7/7)</br> lhcb cvmfs problems. First attempts at triage failed, and recent attempts by lhcb to confirm the problem is fixed have been blocked by job submission problems (26/7).

https://ggus.eu/ws/ticket_info.php?ticket=68859 (22/3/11)</br> Brian's request for DPM upgrade plans. As of 19/1 still had disk servers to update. 30/7 Brian requested some more information.

https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)</br> Compchem were seeing authentication problems at a number of sites, including Durham. On Hold (19/1). On 18/7 Mark M will poke Mike to see if the ticket can be closed.

RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=83627 (27/6)</br> Biomed seeing negative space published. Possibly related to ticket #81439. Despite database cleanup and extensive investigation the problem persists. Still in progress (20/7).

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=83283 (14/6)</br> lhcb cvmfs problems. Dave cited https://savannah.cern.ch/bugs/index.php?95420 & https://savannah.cern.ch/support/?129468 (18/6). Have there been any plans to try out the newer versions of cvmfs (or does the problem even still exist)?

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=81784 (1/5)</br> The Sussex saga continues. Emyr continues to battle bravely, with support from Ewan (ably filling in for Kashif) and Daniela. Tests have been moved to the Imperial WMS to work around some oddities that were seen, but weird, possibly BDII-related troubles still haunt the endeavour.

BRISTOL</br> https://ggus.eu/ws/ticket_info.php?ticket=80155 (12/3)</br> Upgrade plans for the Bristol SE. Winnie has outlined a plan (9/7), and the ticket has been put on hold (18/7) until the end of August.

"OTHER"</br> https://ggus.eu/ws/ticket_info.php?ticket=68853 (22/3/11)</br> The "master ticket" to Brian's Crusty SE Upgrade/Decommissioning queries. On hold (17/7), only Durham, Bristol & Brunel left.

https://ggus.eu/ws/ticket_info.php?ticket=83213 (12/6)</br> Chris W ticketed ngs.ac.uk concerning the decommissioning of ce03.esc.qmul.ac.uk. No reply from them. Did they even get the message?

https://ggus.eu/ws/ticket_info.php?ticket=82492 (24/5)</br> Chris' ticket concerning VOMS re-signing requests. On Hold until the voms handover back to GridPP is complete (24/7).

SOLVED CASES</br> No ground breaking cases have been solved over the last week.

THE UK'S TICKETS</br> I still don't have a good way of tracking tickets submitted by us. If you have a ticket that you think we'd all be interested in, please send me a link. I'll go over these tickets in detail next week.

https://ggus.eu/ws/ticket_info.php?ticket=84015</br> A ticket for Lancaster's LSF apel problems.

https://ggus.eu/ws/ticket_info.php?ticket=84641</br> Daniela spotted a CMS user running multicore jobs naughtily.


Monday 23rd of July, 23:00 BST by Jeremy</br>

19 tickets in the open state.

LANCASTER </br> https://ggus.eu/ws/ticket_info.php?ticket=84461 Failing transfers for t2k.org. (23/07)

RAL TIER-1 </br> https://ggus.eu/ws/ticket_info.php?ticket=84408 </br> Enable neurogrid.incf.org on WMS and LFC (in progress 20/07) </br> https://ggus.eu/ws/ticket_info.php?ticket=84270 </br> To confirm addition of lcgbdii.gridpp.rl.ac.uk to WLCG recommended Top BDII list: https://tomtools.cern.ch/confluence/display/IS/WLCG_Support_Proposal. </br> https://ggus.eu/ws/ticket_info.php?ticket=83927 </br> snoplus glite-transfer permissions issue. Looks like FTS channels were not configured. Suggestions sent back for endpoints (19/07). </br> https://ggus.eu/ws/ticket_info.php?ticket=68853 (22/03/2011) </br> Retirement of SL4 and 32-bit head nodes and servers. On hold but still valid 17/07.

RHUL </br> https://ggus.eu/ws/ticket_info.php?ticket=83627 (27/06)</br> Biomed – SE reporting invalid used space. Work in progress! (20/07)

GLASGOW </br> https://ggus.eu/ws/ticket_info.php?ticket=83283 (14/06)</br> LHCb job failures. Related to CVMFS timeouts? (https://savannah.cern.ch/bugs/index.php?95420) (09/07). Put on hold?

NGS </br> https://ggus.eu/ws/ticket_info.php?ticket=83213 (12/06)</br> Decommissioning of CE03. Ticket to ngs.ac.uk VO. Close?

IMPERIAL </br> https://ggus.eu/ws/ticket_info.php?ticket=82946 (07/06)</br> Possible CVMFS issue for ATLAS. Cache problem? (19/07)

MANCHESTER </br> https://ggus.eu/ws/ticket_info.php?ticket=82492 (24/05)</br> VOMS server AUP re-signing requests. Reopened 11/07 – multiple reminders should be possible. Put on hold?

NGI_UK </br> https://ggus.eu/ws/ticket_info.php?ticket=84381</br> New VO for the COMET experiment (proposed name comet.j-parc.jp) </br>

https://ggus.eu/ws/ticket_info.php?ticket=81784</br> Certification of UKI-SOUTHGRID-SUSX (01/05). Jobs stay running. (23/07) </br>

https://ggus.eu/ws/ticket_info.php?ticket=80259 (14/03)</br> Creation of neuroscience VO. Waiting on WMS enablement (20/07). Also adding to GGUS.

BRISTOL </br> https://ggus.eu/ws/ticket_info.php?ticket=80155 (12/03)</br> Timeline for SE upgrade/decommissioning. (on hold). Retire v1.3: “Ideally before end of August we hope” (09/07). Brian to comment…

ECDF </br> https://ggus.eu/ws/ticket_info.php?ticket=80152 (12/03)</br> Timeline for SE upgrade/decommissioning. On-hold. Waiting for release? (09/07)

DURHAM </br> https://ggus.eu/ws/ticket_info.php?ticket=84123 </br> Job failures at site (open 11/07). WNs put offline. Site forced into test mode. </br>

https://ggus.eu/ws/ticket_info.php?ticket=83950</br> CVMFS problem. Squid server had fallen over. Waiting for reply. </br> https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)</br> Authentication problems in some CEs (compchem VO). Lots of chasing on this ticket! Mark M checking as of 18/07.</br> https://ggus.eu/ws/ticket_info.php?ticket=68859 (22/03/11)</br> Retirement of SL4 and 32-bit DPM head node and servers. Lots of chasing on this ticket. Still valid 17/07.


Solved cases not reviewed.

Monday 9th of July, 13:30 BST</br>

21 Open UK GGUS tickets this week.</br>

NEW</br> https://ggus.eu/ws/ticket_info.php?ticket=84066 Durham have an availability/reliability ticket for June.

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=83794</br> There's been a request to update the latitude & longitude information for sites in the gocdb, although this is being handled "centrally" so sites shouldn't have to worry about it.

https://ggus.eu/ws/ticket_info.php?ticket=80259</br> Mark has set himself up as a temp VO manager for neurogrid.incf.org, and opened another ticket (https://ggus.eu/ws/ticket_info.php?ticket=83926) to cover the final VO registration steps in the CiC portal.

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=83773</br> atlas were being hit by cvmfs timeouts (https://savannah.cern.ch/support/?129468), Chris rolled out the new (beta) version of cvmfs (cvmfs-2.0.18-0.3.3574svn) which seems to be settling things. Similar to Glasgow's ticket https://ggus.eu/ws/ticket_info.php?ticket=83283 (although I believe Glasgow have bigger fish to fry at the moment).

https://ggus.eu/ws/ticket_info.php?ticket=83587</br> SNO+ are working on rolling out "git" in their software area and are currently working out a few kinks. Matt M asks what the situation is with a cvmfs host at RAL.

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=81784</br> Emyr is back from holiday, only to come home to expired certificates and troubles with Jeremy M losing his CA status! Then again, everyone's having a hard time with certs this week.

ECDF</br> https://ggus.eu/ws/ticket_info.php?ticket=80152</br> Wahid's very succinct reply to Matt H's query on this ticket's progress made me smile. But it raises the question: are these short "ticket proddings" useful? Is there an alternative?

SOLVED CASES</br> RHUL</br> https://ggus.eu/ws/ticket_info.php?ticket=83933</br> https://ggus.eu/ws/ticket_info.php?ticket=83912</br> RHUL suffered a DPM crash last Friday. Have you upgraded to a "patched" release of DPM? (Moving to glite 1.8.2-5 fixed things for Lancaster; the latest EMI releases should be immune to the common causes of these crashes.)

TICKETS FROM THE UK</br> https://savannah.cern.ch/support/?130203</br> T2K have requested to be added as a VO to GGUS.


Monday 2nd of July, 14:00 BST</br>

23 Open UK tickets this week.</br>

TIER 1</br> https://ggus.eu/ws/ticket_info.php?ticket=83672</br> This might be of interest to others: GGUS will be updating the certificate it uses to sign alerts next Monday (9th).

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=83711</br> Hone asked if something could be done to increase job throughput at Oxford, and Oxford kindly obliged. I've noticed a few sites being ticketed by hone, who don't seem to want more than for their jobs to spend less than 24 hours queuing. Do people find these requests reasonable?

RHUL/BIRMINGHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=83627</br> https://ggus.eu/ws/ticket_info.php?ticket=83628</br> Biomed are experiencing "negative space" values for lcg-infosite queries at these two sites.

QMUL/SNO+</br> https://ggus.eu/ws/ticket_info.php?ticket=83587</br> SNO+ have asked for git to be installed on the QMUL clusters, which could be a bit of a pain to implement. This request could find its way to other sites.

T2K.org</br> ref: https://ggus.eu/ws/ticket_info.php?ticket=83209</br> Having a poke around I don't think that t2k.org have ticketed ggus about getting a VO entry.

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=82946</br> This old ticket concerning some odd cvmfs behaviour experienced by atlas has been updated to include some interesting information about running the SW tests "by hand".</br>

NEW THIS MORNING</br> LANCASTER</br> https://ggus.eu/ws/ticket_info.php?ticket=83812</br> The "DPM crashing all the time" bug has struck us again, and struck hard. We're on glite 1.8.2-3.

No "Solved Cases" stand out this week, nor do any Tickets from the UK. YMMV, of course!

Monday 25th of June, 14:00 BST</br> 17 open UK tickets this week, nothing very exciting going down.</br> Java has broken on my machine so I might not be able to join the meeting in time, but there's not too much to report.

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=83473</br> Yet another in the trend of lhcb cvmfs-related tickets, along with these two open tickets from last week:</br> https://ggus.eu/ws/ticket_info.php?ticket=82946 (IC, also atlas rather than lhcb)</br> https://ggus.eu/ws/ticket_info.php?ticket=83283 (GLASGOW)</br> NEW https://ggus.eu/ws/ticket_info.php?ticket=83577 (Brunel)


NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=82492</br> Chris' ticket to the VOMS admins, possibly moot now (something to take into consideration for the new GridPP voms servers?).</br>

SOLVED CASES</br> https://ggus.eu/ws/ticket_info.php?ticket=83376</br> Oxford also saw some lhcb cvmfs troubles, although they were swiftly fixed.</br>

FROM THE UK</br> https://ggus.eu/ws/ticket_info.php?ticket=83133</br> na62.gridpp.ac.uk should (finally) have an fts instance up and running at CERN.</br>

NEW https://ggus.eu/ws/ticket_info.php?ticket=83562</br> Chris W has ticketed the APEL team concerning the default SGE APEL parser memory settings being too low. This is a known problem; the APEL team are on it.

Monday 18th of June, 13:00 BST</br> 22 Open UK tickets this week.

NGI</br> https://ggus.eu/ws/ticket_info.php?ticket=80259</br> A few finishing touches and neurogrid.incf.org will be ready for launch.

OXFORD</br> https://ggus.eu/ws/ticket_info.php?ticket=83330</br> Atlas FTS transfers to Oxford were suffering from timeout failures (which appeared to occur in batches). As I understand it, the Oxford-RAL timeout settings had been reduced from their original (very high) values; they've now been loosened somewhat.

GLASGOW</br> https://ggus.eu/ws/ticket_info.php?ticket=83283</br> LHCB have been having software set-up problems on some nodes. Dave expects this is due to the problems chronicled in https://savannah.cern.ch/bugs/index.php?95420 & https://savannah.cern.ch/support/?129468, compounded by local bandwidth problems to some subsets of their machines.

QMUL</br> https://ggus.eu/ws/ticket_info.php?ticket=83020</br> Chris is waiting on the availability/reliability site to fix their certificate chain (https://ggus.eu/ws/ticket_info.php?ticket=83237) before he can fully comment on their stats for May.

https://ggus.eu/ws/ticket_info.php?ticket=83198</br> Queen Mary are decommissioning one of their CEs (ce03.esc.qmul.ac.uk). Chris split this ticket into 15, one for each VO the CE supported. Which leads to...

T2K</br> reference: https://ggus.eu/ws/ticket_info.php?ticket=83209</br> As seen in this incarnation of Chris' ticket, t2k have requested that t2k.org get a VO entry in GGUS. Has anyone started the ball rolling on this?

(PS Chris, the pheno & camont tickets look like they can be closed; I suspect the ngs one will take some time...)

DURHAM</br> https://ggus.eu/ws/ticket_info.php?ticket=83006</br> Availability/Reliability for May ticket. Mike put in a good (in my eyes) answer last week, but no movement from elsewhere on this ticket.

https://ggus.eu/ws/ticket_info.php?ticket=82214</br> https://ggus.eu/ws/ticket_info.php?ticket=82818</br> Both these tickets are looking almost wrapped up, nice one!

IC</br> https://ggus.eu/ws/ticket_info.php?ticket=82946</br> Still watching this ticket on atlas troubles with cvmfs, no movement although Daniela is on the case.

SUSSEX</br> https://ggus.eu/ws/ticket_info.php?ticket=81784</br> The certification infrastructure at GRNET has started to cause problems (again), Jeremy ticketed them (https://ggus.eu/ws/ticket_info.php?ticket=83284).

SOLVED CASES</br> https://ggus.eu/ws/ticket_info.php?ticket=83326</br> Raul at Brunel was having cvmfs troubles on a few nodes, fixed by a forced clean-up & restart. Not very interesting on its own, but there seem to be a number of cvmfs tickets cropping up.

NEW https://ggus.eu/ws/ticket_info.php?ticket=82670</br> SNO+ ticket that Daniela brought back to my attention from last week, the apparent WMS problem was actually a CREAM side "misconfiguration", details in the ticket and e-mail Daniela sent to the list.

FROM THE UK:</br> (https://www.gridpp.ac.uk/wiki/Tickets_From_The_UK)</br> No significant change since last week on existing tickets.

https://ggus.eu/ws/ticket_info.php?ticket=83243</br> Daniela noticed that IC weren't updating in APEL, this looks to be caused by the Imperial CEs not being registered in the gocdb as APEL endpoints.

https://ggus.eu/ws/ticket_info.php?ticket=83352</br> Daniela's ticket to track problems seen in the SL6/EMI2 bdii.