Difference between revisions of "Operations Bulletin Latest"

From GridPP Wiki

Revision as of 15:53, 12 May 2014

Bulletin archive


Week commencing 12th May 2014
Task Areas
General updates

Tuesday 6th May

  • WLCG workshop - responses considered as part of a list. If you notified Jeremy last week, please now go ahead and submit a visit notice as usual and book early.
  • WLCG A/R reports for April are now available. ALICE; ATLAS; CMS and LHCb.
  • The notified outage of the RAL top-BDII last week impacted other services such as Nagios in a way that could have been prevented. We must improve our change control and impact procedures within core ops. The approach will also be invoked when other (inter)nationally facing services are going to cause wider impacts (e.g. VOMS, DIRAC, website...).
  • The EGI statistics for April 2014 are also available.


Tuesday 29th April

  • There is an LHCOPN/LHCONE meeting at CERN - yesterday and today.
  • A reminder that there is a GOCDB service OUTAGE today 06:00 to 13:00 UTC (07:00 to 14:00 BST). After that it will be at risk. During the outage a read-only failover service will be in use.
  • Planning for the May pre-GDB on Data Access and GDB is almost done.
  • Please email Jeremy if you have an interest in attending the WLCG workshop in July.
  • There was an EGI OMB meeting last Thursday. See the agenda here. Topics covered were:
    • Operations updates;
    • EGI Competence Centres Call;
    • Update on SAM migration;
    • Migration of 1st and 2nd level support;
    • Status of other core tasks;
    • Security updates;
    • New features of the accounting portal;
    • CVMFS task for update;
    • EMI-2 decommissioning update.
  • Pete Clarke circulated the final network forward look document update.


WLCG Operations Coordination - Agendas

Tuesday 6th May

  • There will be a WLCG ops coordination meeting this Thursday 8th May. Pre-meeting reports can be found in the twiki.

Tuesday 22nd April

  • There was a WLCG operations coordination planning meeting last Thursday. The minutes are now available. Also see the agenda.
  • There was a request to add xrootd endpoints of your site in GOCDB. Alessandra provided this status summary link.

Tuesday 15th April


Tier-1 - Status Page

Tuesday 6th May

  • Network intervention last Tuesday completed successfully.
  • New testing CVMFS client 2.1.19.
  • In process of scheduling Castor 2.1.14 upgrade.
  • The software server used by the small VOs will be withdrawn from service (aiming for June).

Storage & Data Management - Agendas/Minutes

Tuesday 6th May

  • There was a DPM collaboration meeting last Wednesday.
  • The following priorities were agreed for the next year:
    • YAIM->Puppet transition (YAIM support ends this year);
    • I/O Monitoring; GridFTP redirection - available now for testing;
    • Admin interface and improved HTTP file management;
    • Nightly testing of WAN HTTP access performance, Hammercloud;
    • Removal of legacy components where possible (eg RFIO);
    • System logging via dmlite;
    • Rebalancing utilities;
    • and move of web presence and docs to an indexable Drupal site.

Tuesday 22nd April

  • A DPM collaboration meeting is being planned for the coming week(s). Are there any site comments or feedback on DPM as a product (e.g. speed of new feature development) and the support it receives?


Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06

Tuesday 29th April

  • Glasgow looks slightly delayed with recent accounting data publishing.

Tuesday 15th April

  • The APEL accounting system has been undergoing database maintenance to improve performance and reliability. Networking problems at the RAL site have delayed completion of the operation. Sites may see nagios alerts warning them that they have not published accounting data for 7 days - these will stop after the maintenance work completes.

Documentation - KeyDocs

See the worst KeyDocs list for documents needing review now and the names of the responsible people.

Tuesday 6th May

  • KeyDocs are going to be reviewed (in the next 4 weeks) as the system is not working (or not adding anything) in some areas.

Tuesday 15th April

Tuesday 1st April

  • Keydocs action needed by Jens J; Rob H/Security T; Alessandra F; Wahid B; David C and Matt D.
  • We need to reassign Mark M's documents on Core Grid Services.


Tuesday 18th March

  • Keydocs action needed by: Mark M; Jens J; Rob H/Security T; Alessandra F; Wahid B; David C and Matt D.

Interoperation - EGI ops agendas

Tuesday 6th May

  • There was a meeting yesterday; the agenda is here: https://wiki.egi.eu/wiki/Agenda-05-05-2014
  • Two things to pull out:
    • UMD 3 EA list:
      • John Gordon was in touch with Cristina Aiftimiei and noted that in his opinion the UK sites listed as UMD-1/2 would probably still be taking part for UMD-3 (not least because UMD-1/2 are effectively no longer extant).
      • In the agenda is a list of a few UK sites that haven't confirmed their contacts with Joao Pina; please have a look and get back to him - all he's looking for is a note to make sure that the contact list is up to date.
    • EMI-2 decommissioning
  • Also discussed was the migration of Central SAM services & reconfiguration of NGIs SAM instances
  • Next meeting June 2


Monitoring - Links MyWLCG

Tuesday 6th May

On-duty - Dashboard ROD rota

Tuesday 6th May

  • One ticket expiry dealt with promptly.
  • A number of the "EMI-3" tickets have now been closed - there has been good progress. However, some do remain.
  • UCL ticket about low availability. The cause has been fixed. It is expected to stay open until their availability has risen to an acceptable level again.
  • Very slow refresh of the Nagios test results as seen on the ROD dashboard. In some cases the dashboard still showed test result states for the previous day. Using gridppnagios display to see the 'real' state of any given Nagios test.

Tuesday 29th April

  • Ongoing problems with the dashboard. Issue escalated to EGI.
  • There was an update of Dashboard last Thursday which solved a few issues. It no longer shows warning alarms which is good. There were some other improvements as well.
  • EMI-2 deadline is approaching. The following need attention and are about to be escalated:
    • RHUL (cream2.ppgrid1.rhul.ac.uk). Last update in ticket: 16/4
    • Sussex (grid-cream-01.hpc.susx.ac.uk, grid-bdii.hpc.susx.ac.uk; plus SHA-2 compliance for grid-cream-01.hpc.susx.ac.uk). Last update in ticket: 16/4
    • Bristol (lcgce03.phy.bris.ac.uk, lcgce04.phy.bris.ac.uk, lcgbdii.phy.bris.ac.uk). Last update in ticket: 23/4
    • ECDF (info2.glite.ecdf.ed.ac.uk, ce7.glite.ecdf.ed.ac.uk). Last update in ticket: 28/4
    • Durham (ce1.dur.scotgrid.ac.uk, se01.dur.scotgrid.ac.uk). Last update in ticket: 23/4

Tuesday 22nd April

  • There were ongoing problems with the dashboard last week. Several bugs, possibly including one related to the email function, have been fixed.

Rollout Status WLCG Baseline

Tuesday 18th March

Tuesday 11th February

  • 31st May has been set as the deadline for EMI-2 decommissioning. There may be an issue for dCache (related to 3rd party/enstore component).

References


Security - Incident Procedure Policies Rota

Tuesday 29th April

  • The changes to the regional dashboard make the on-duty task harder. Need to rely on Pakiti again.

Tuesday 15th April

  • Update on the OpenSSL status.
  • The discussion list members have been updated. Anyone missing?



Services - PerfSonar dashboard | GridPP VOMS

- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).

Tuesday 29th April

  • It was mentioned several weeks ago that the perfsonar meshes were being sorted by host name and that sorting by site name would be available soon. This is now the case. You can see the familiar GridPP site sorting here and the large WLCG mesh here. Note the square of GridPP sites towards the bottom right. Red squares represent throughput of less than 500 Mb/s.
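As a rough illustration of the colour rule described above, the sketch below picks out mesh entries that would show as red. Only the 500 Mb/s threshold comes from the text; the function names, the site labels and the sample throughput figures are made up for the example, and the real dashboard may well apply further colour bands that are not covered here.

```python
# Hypothetical sketch: only the 500 Mb/s "red" threshold comes from the bulletin.
RED_THRESHOLD_MBPS = 500.0


def is_red(throughput_mbps: float) -> bool:
    """Return True if a mesh square would be red (throughput below 500 Mb/s)."""
    return throughput_mbps < RED_THRESHOLD_MBPS


def red_links(mesh: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """List the (source, destination) pairs whose throughput is below the threshold."""
    return sorted(pair for pair, mbps in mesh.items() if is_red(mbps))


# Made-up sample measurements in Mb/s, for illustration only.
sample_mesh = {
    ("SITE-A", "SITE-B"): 920.0,
    ("SITE-A", "SITE-C"): 310.5,
    ("SITE-B", "SITE-C"): 500.0,
}
print(red_links(sample_mesh))
```

Note that a link at exactly 500 Mb/s is not counted as red here, since the text says "less than 500 Mb/s".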

Tuesday 15th April

  • New LiveCD and LiveUSB images are now available containing the latest openssl packages (see email of 11th April).

Tuesday 8th April

  • Some discrepancies found in VOMS ports and listings between VOMSsnooper and the dashboard for ops (15009 vs 15002).
  • Also noted WLCG VOMS changes. New VOMS servers are being introduced as notified in this broadcast.

Tickets

Monday 12th May 2014, 14.30 BST

A mere 27 open tickets for the UK today.

NGI
https://ggus.eu/index.php?mode=ticket_info&ticket_id=101502 (24/2)
The ILC cvmfs ticket. Only Durham is left (I actually missed out Durham the last few times I looked at this ticket). So it's all on you Durham chaps now. No pressure (except, there is a little bit). In Progress (7/5)

SUSSEX
https://ggus.eu/index.php?mode=ticket_info&ticket_id=102810 (28/3)
Sussex's EMI3 upgrade ticket. Matt's fighting the good fight, and hopes to have it all sorted soon. Let us know if you need a hand Matt! In Progress (8/5)

RALPP
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105290 (9/5)
The ROD has spotted Glue2 Validation errors on the RALPP bdii. Chris B spotted the ticket, but no news. In progress (9/5)

BRISTOL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=102205 (14/3)
Bristol's EMI3 ticket. Winnie has beaten the site-BDII into EMI3 shape and is visiting the same fate on their cream CEs and WNs, with one CE already converted and two more about to fall. Make sure you get the WNs too! On Hold (should really be In Progress) (12/5)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=105189 (6/5)
LHCB jobs having some trouble at Bristol; Winnie thinks it's some dodgy nodes at fault and is working on it. Waiting to see if failures continue. Waiting for Reply (7/5)

GLASGOW
https://ggus.eu/index.php?mode=ticket_info&ticket_id=101565 (26/2)
Publishing Max CPU time for LHCB. I believe that we've left it with LHCB asking that it be set to "a value that is obviously made up but isn't the default value" (although I could have the wrong end of the mace here). Been on hold for a while, so we probably want to make some kind of ruling. On Hold (8/4)

EDINBURGH
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 (1/7/2013)
glexec ticket. No news here - sorry. On Hold (27/1)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=102201 (14/3)
The ECDF EMI3 upgrade ticket. Had some problems with a lingering ghost of their previous site-BDII, but hopefully time has exorcised that gremlin and the new EMI3 CE will be seen too, which just leaves one straggler to be dealt with. On Hold (should probably be In Progress) (9/5)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=105267 (8/5)
The other ECDF EMI3 upgrade ticket. Actually this only got submitted by Daniela to satisfy the dashboard demons, probably as you can't physically lift the ROD dashboard to throw it out of the window and shut it up that way. On Hold (12/5)

DURHAM
https://ggus.eu/index.php?mode=ticket_info&ticket_id=103722 (14/4)
Durham's EMI3 upgrade ticket. Daniela has extended the ticket to the zeroth hour. Let us know if you chaps get stuck on anything, but it looks like you have the upper hand. In Progress (2/5)

SHEFFIELD
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105090 (2/5)
Sheffield had some CE nagios failures, but it looks like that storm has passed, with nothing but green as far as the eye can see on the nagios pages. Elena asks if she can close the ticket (i.e. has the alarm disappeared from the dashboard?). Waiting for reply (12/5)

LIVERPOOL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105299 (9/5)
Liverpool also have received a ROD ticket, this time of the Glue2 validation variety. Steve has set it in progress. In Progress (9/5)

LANCASTER
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 (1/7/2013)
Lancaster's glexec ticket. No news I'm afraid. On Hold (4/4)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=100566 (27/1)
Lancaster's PerfSonar sucking. Duncan has suggested a reinstall, and noticed spikes of goodness. A reinstall has been put on the todo list. On Hold (12/5)

UCL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=102193 (14/3)
UCL's EMI3 upgrade ticket. Quiet, but Ben had scheduled the date for the upgrade as the 13th. Hopefully we'll hear positive news from him shortly. On Hold (30/4).

https://ggus.eu/index.php?mode=ticket_info&ticket_id=101285 (16/2)
UCL's perfsonar host carking it. At last word Ben had brought it back from the great beyond and hoped to have a reinstall done on 30/4. No word since though. On Hold (28/4)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=95298 (1/7/13)
UCL's glexec ticket. Ben mentions a new chap being deputised, and that this will likely have to wait until then. On Hold (16/4)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=104824 (22/4)
Nagios ticket due to low site availability, caused by a period of outdated CA RPMs. Just waiting for the numbers to pick up again. In progress (6/5)

QMUL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=103028 (6/4)
A much talked about (and rightly so) atlas ticket, about job failures at QM essentially due to atlas jobs not requesting the right amount of RAM. There's a question from atlas "if all the questions have been answered". Have they? In Progress (8/5)

BRUNEL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105324 (12/5)
Brunel are having some bother with their APEL publishing, it looks like there's a lot of missing data. In progress (12/5)

TIER 1
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105161 (5/5)
Hone noticed their jobs in the ready status for a long time whilst submitted through the RAL WMSeses. Catalin has been engaging with Alexander to debug the issue. Waiting for reply (12/5)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=105100 (2/5)
CMS have embarked on their next Storage Consistency Check. Andrew closed the ticket after providing the desired information, but CMS have reopened (wanting to keep the ticket to track the SCC). Reopened (needs to be put In Progress or On Hold) (12/5)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=98249 (21/10/13)
cvmfs for Sno+. Things have picked up pace on this ticket, with Matt M ready to kick off uploading the Sno+ tarball. Catalin has tweaked the web access to allow him to do so. Waiting for reply (12/5)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=105308 (11/5)
Atlas MCORE jobs failing with "Failed to open shared memory object: Permission denied". RAL team are looking at it. In progress (12/5)

EFDA-JET
https://ggus.eu/index.php?mode=ticket_info&ticket_id=97485 (21/9/2013)
Longstanding LHCB authentication problem at JET. The Jet admins have exhausted all their ideas, and have asked for any help. As the problem survived the upgrades to SL6 and EMI3 it's probably something specific with their setup. On Hold (25/4)

Tools - MyEGI Nagios

Monday 17th March

Tuesday 26th November

  • Regional Nagios updated to release 22. It is a glite to UMD update and it required a fresh installation.
  • There have been some internal changes in SAM-Nagios. Test probes are now the responsibility of product team. Some test names have been changed as a result of this reorganization. For example the org.sam.CREAMCE-DirectJobSubmit test has become emi.cream.CREAMCE-DirectJobSubmit. This does not affect the operational activities.
  • Please could all site admins look at the services associated with their site and mail Kashif if anything odd is noticed. Site admins can reschedule tests for their sites and it would be helpful if most functionalities are tested.
  • Also, look at myegi which can be useful with links to the Dashboard, GSTAT, Accounting Portal and GGUS.
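For anyone with scripts or notes that still refer to the old test names, the renaming mentioned above could be handled with a small lookup table. This is only a sketch: the single pair in the table is the one quoted in this bulletin, the function and variable names are hypothetical, and any other renamed tests would have to be added by hand from an authoritative list.

```python
# Only this one old-to-new pair is documented in the bulletin; other renamed
# tests are not guessed at here and would need adding from an authoritative list.
RENAMED_TESTS = {
    "org.sam.CREAMCE-DirectJobSubmit": "emi.cream.CREAMCE-DirectJobSubmit",
}


def current_test_name(name: str) -> str:
    """Map an old SAM-Nagios test name to its new name, or return it unchanged."""
    return RENAMED_TESTS.get(name, name)


print(current_test_name("org.sam.CREAMCE-DirectJobSubmit"))
```

Names that are not in the table pass through untouched, so the helper is safe to apply to a mixed list of old and new test names.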
VOs - GridPP VOMS VO IDs Approved VO table

Tuesday 15th April

  • Is there interest in an FTS3 web front end? (more details)

Monday 17 February 2014

  • Proxy renewal
    • All RAL WMSs now renew proxies with 1024 bits. This looks like the end of this (at last).


Tuesday 11 February 2014

  • Proxy renewal
    • lcgwms06 at RAL has been upgraded and works
    • Both Imperial's WMSs work
    • Glasgow's will still need to be upgraded (unless they have been since Friday).

Site Updates


Meeting Summaries
Project Management Board - Members Minutes Quarterly Reports

Empty

GridPP ops meeting - Agendas Actions Core Tasks

Empty


RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) Agenda Meeting takes place on Vidyo.

Wednesday 7th May 2014

  • Operations report
  • Currently testing CVMFS client 2.1.19.
  • In process of scheduling Castor 2.1.14 upgrade. Proposed date for Nameserver upgrade: Wednesday 28th May.
  • We are proposing to turn off the CREAM CEs.
  • Reminder: The software server used by the small VOs will be withdrawn from service (aiming for June).

WLCG Grid Deployment Board - Agendas MB agendas

Empty



NGI UK - Homepage CA

Empty

Events
UK ATLAS - Shifter view News & Links

Empty

UK CMS

Empty

UK LHCb

Empty

UK OTHER
  • N/A

To note

  • N/A