https://www.gridpp.ac.uk/w/api.php?action=feedcontributions&user=Matthew+Doidge+1ac9bd3994&feedformat=atomGridPP Wiki - User contributions [en]2024-03-29T00:21:37ZUser contributionsMediaWiki 1.22.0https://www.gridpp.ac.uk/wiki/Batch_system_statusBatch system status2015-12-01T11:41:30Z<p>Matthew Doidge 1ac9bd3994: /* Sites batch system status */</p>
<hr />
<div>== Other links ==<br />
<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/BatchSystemComparison Batch System Comparison Table]<br />
<br />
== Sites batch system status == <br />
<br />
This page has been set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table seeks the following:<br />
<br />
# Current product (local/shared) - What is the current batch system at the site? Is it locally managed or shared with other groups?<br />
# Concerns - Has your site experienced any problems with the batch system in operation?<br />
# Interest/Investigating/Testing - Does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?<br />
# CE type(s) - What CE type (gLite, ARC...) do you currently run and do you plan to change this, perhaps in conjunction with a batch system move?<br />
# glExec/pilot support for all VOs - Do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? This is needed for the move to a DIRAC WMS.<br />
# Cloud interface(s)? - Does your site offer access to resources in ways other than via a CE? (See [[Cloud & VM status]] for more up-to-date / detailed information)<br />
# Multicore status for ATLAS and CMS<br />
## ATLAS [http://tinyurl.com/mclkmfq multicore jobs history for UK sites] <br />
# Notes - Any other information you wish to share on this topic.<br />
<br />
<br />
<br />
<br />
{|border="1" cellpadding="1"<br />
|+<br />
<br />
|-style="background:#7C8AAF;color:white"<br />
|Site<br />
|Current product (local/shared)<br />
|Concerns and observations<br />
|Interest/Investigating/Testing<br />
|CE type(s) & plans at site<br />
|Pilots for all<br />
|cgroups <br />
|Multicore Atlas/CMS<br />
|Cloud interface available/plans<br />
|Notes<br />
<br />
|-<br />
|RAL Tier-1<br />
|<span style="color:green">HTCondor (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">OpenNebula</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-Brunel<br />
|<span style="color:green">Torque/Maui, Arc/Condor</span><br />
|<span style="color:green">No support for Torque/Maui</span><br />
|<span style="color:green">Slurm and HTCondor in test</span><br />
|<span style="color:green">Arc in test</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">OpenVZ in production, Docker in test</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-IC-HEP<br />
|<span style="color:green">Gridengine (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">CREAM, ARC</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">GridPP Cloud Tests</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-LT2-QMUL<br />
|<span style="color:green">Gridengine (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">SLURM</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Deploy cloudstack, find scalable solution to get our storage usable in the cloud</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-RHUL<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-LT2-UCL-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:red">X</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-LANCS-HEP<br />
|<span style="color:green">Son of Gridengine (HEC)</span><br />
|<span style="color:green">Torque/Maui cluster decommissioned, for grid and local (tier 3)</span><br />
|<span style="color:green">Sticking with grid engine</span><br />
|<span style="color:green">CREAM, moving to ARC eventually</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">VMWare testing; Vac in production</span><br />
|<br />
<br />
|-<br />
|UKI-NORTHGRID-LIV-HEP <span style="color:blue">(Single core cluster)</span><br />
|<span style="color:green">Torque Maui (local)</span><br />
|<span style="color:green">Poor Support, Maui intrinsically broken</span><br />
|<span style="color:green"> </span><br />
|<span style="color:green">Cream</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">No</span><br />
|<span style="color:green">None</span><br />
|<br />
<br />
|-<br />
|UKI-NORTHGRID-LIV-HEP <span style="color:blue">(Multi core cluster)</span><br />
|<span style="color:green">HTCondor (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">ARC</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:orange">Looking into it</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">None</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-MAN-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Maui is unsupported. It had memory leaks. Robert wrote a patch and there was nowhere to feed it back into.</span><br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">Currently CREAM, investigating ARC-CE</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:orange">Looking into it</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Vac in production</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-SHEF-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">HTCondor is in testing mode</span><br />
|<span style="color:green">CREAM CE, ARC CE is in test</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-DURHAM<br />
|<span style="color:green">SLURM (local)</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">N/A</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-ECDF<br />
|<span style="color:green">Gridengine</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">Cream CE for standard production, ARC CE for exploratory HPC work</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-GLASGOW<br />
|<span style="color:green">HTCondor (local), Torque/Maui (local)</span><br />
|<span style="color:green">Becomes unresponsive at times of high load or nodes being un-contactable.</span><br />
|<span style="color:green">Investigating HTCondor/SoGE/SLURM as a replacement.</span><br />
|<span style="color:green">ARC, Cream</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">N/A</span><br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-BHAM-HEP<br />
|<span style="color:green">Torque/Maui</span><br />
|<span style="color:green">Maui sometimes fails to see new jobs and so nothing is scheduled</span><br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">No</span><br />
|<span style="color:green">Testing Vac setup</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-BRIS<br />
|<span style="color:green">HTCondor (shared), torque + maui (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC & CREAM CEs</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:red">X</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-CAM-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">None at present</span><br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-OX-HEP<br />
|<span style="color:green">HTCondor (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC CE in production</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">OpenStack in production. Testing VAC</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-RALPP<br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-SUSX<br />
|<span style="color:green">(Shared) Gridengine - (Univa Grid Engine)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:orange">Looking into it</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|}<br />
<br />
[[Category:Multicore]]<br />
[[Category:Batch System]]</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Batch_system_statusBatch system status2015-12-01T11:40:56Z<p>Matthew Doidge 1ac9bd3994: /* Sites batch system status */</p>
<hr />
<div>== Other links ==<br />
<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/BatchSystemComparison Batch System Comparison Table]<br />
<br />
== Sites batch system status == <br />
<br />
This page has been set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table seeks the following:<br />
<br />
# Current product (local/shared) - What is the current batch system at the site? Is it locally managed or shared with other groups?<br />
# Concerns - Has your site experienced any problems with the batch system in operation?<br />
# Interest/Investigating/Testing - Does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?<br />
# CE type(s) - What CE type (gLite, ARC...) do you currently run and do you plan to change this, perhaps in conjunction with a batch system move?<br />
# glExec/pilot support for all VOs - Do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? This is needed for the move to a DIRAC WMS.<br />
# Cloud interface(s)? - Does your site offer access to resources in ways other than via a CE? (See [[Cloud & VM status]] for more up-to-date / detailed information)<br />
# Multicore status for ATLAS and CMS<br />
## ATLAS [http://tinyurl.com/mclkmfq multicore jobs history for UK sites] <br />
# Notes - Any other information you wish to share on this topic.<br />
<br />
<br />
<br />
<br />
{|border="1" cellpadding="1"<br />
|+<br />
<br />
|-style="background:#7C8AAF;color:white"<br />
|Site<br />
|Current product (local/shared)<br />
|Concerns and observations<br />
|Interest/Investigating/Testing<br />
|CE type(s) & plans at site<br />
|Pilots for all<br />
|cgroups <br />
|Multicore Atlas/CMS<br />
|Cloud interface available/plans<br />
|Notes<br />
<br />
|-<br />
|RAL Tier-1<br />
|<span style="color:green">HTCondor (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">OpenNebula</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-Brunel<br />
|<span style="color:green">Torque/Maui, Arc/Condor</span><br />
|<span style="color:green">No support for Torque/Maui</span><br />
|<span style="color:green">Slurm and HTCondor in test</span><br />
|<span style="color:green">Arc in test</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">OpenVZ in production, Docker in test</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-IC-HEP<br />
|<span style="color:green">Gridengine (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">CREAM, ARC</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">GridPP Cloud Tests</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-LT2-QMUL<br />
|<span style="color:green">Gridengine (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">SLURM</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Deploy cloudstack, find scalable solution to get our storage usable in the cloud</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-RHUL<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-LT2-UCL-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:red">X</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-LANCS-HEP<br />
|<span style="color:green">Son of Gridengine (HEC)</span><br />
|<span style="color:green">Torque/Maui cluster decommissioned, for grid and local (tier 3)</span><br />
|<span style="color:green">Sticking with grid engine</span><br />
|<span style="color:green">CREAM, moving to ARC eventually</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">VMWare testing; Vac in production</span><br />
|<br />
<br />
|-<br />
|UKI-NORTHGRID-LIV-HEP <span style="color:blue">(Single core cluster)</span><br />
|<span style="color:green">Torque Maui (local)</span><br />
|<span style="color:green">Poor Support, Maui intrinsically broken</span><br />
|<span style="color:green"> </span><br />
|<span style="color:green">Cream</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">No</span><br />
|<span style="color:green">None</span><br />
|<br />
<br />
|-<br />
|UKI-NORTHGRID-LIV-HEP <span style="color:blue">(Multi core cluster)</span><br />
|<span style="color:green">HTCondor (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">ARC</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:orange">Looking into it</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">None</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-MAN-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Maui is unsupported. It had memory leaks. Robert wrote a patch and there was nowhere to feed it back into.</span><br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">Currently CREAM, investigating ARC-CE</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:orange">Looking into it</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Vac in production</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-SHEF-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">HTCondor is in testing mode</span><br />
|<span style="color:green">CREAM CE, ARC CE is in test</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-DURHAM<br />
|<span style="color:green">SLURM (local)</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">N/A</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-ECDF<br />
|<span style="color:green">Gridengine</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">Cream CE for standard production, ARC CE for exploratory HPC work</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-GLASGOW<br />
|<span style="color:green">HTCondor (local), Torque/Maui (local)</span><br />
|<span style="color:green">Becomes unresponsive at times of high load or nodes being un-contactable.</span><br />
|<span style="color:green">Investigating HTCondor/SoGE/SLURM as a replacement.</span><br />
|<span style="color:green">ARC, Cream</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">N/A</span><br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-BHAM-HEP<br />
|<span style="color:green">Torque/Maui</span><br />
|<span style="color:green">Maui sometimes fails to see new jobs and so nothing is scheduled</span><br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">No</span><br />
|<span style="color:green">Testing Vac setup</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-BRIS<br />
|<span style="color:green">HTCondor (shared), torque + maui (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC & CREAM CEs</span><br />
|<span style="color:green"></span><br />
|<span style="color:black">No</span><br />
|<span style="color:red">X</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-CAM-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:black">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">None at present</span><br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-OX-HEP<br />
|<span style="color:green">HTCondor (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC CE in production</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">OpenStack in production. Testing VAC</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-RALPP<br />
|<span style="color:green">HTCondor</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">ARC CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-SUSX<br />
|<span style="color:green">(Shared) Gridengine - (Univa Grid Engine)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green"></span><br />
|<span style="color:orange">Looking into it</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|}<br />
<br />
[[Category:Multicore]]<br />
[[Category:Batch System]]</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-12-01T10:25:09Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 30th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** ----->
'''Tuesday 1st December'''<br />
* Heavy ion running started last week.<br />
* Enabling pilots for local VOs - what is the status?<br />
* EGI is looking for document reviewers ([https://wiki.egi.eu/wiki/EGI-Engage:Deliverables_and_Milestones Further information]).<br />
* Note the following EGI-Engage deliverables this month:<br />
** D 4.2 [https://documents.egi.eu/document/2643 VM snapshot support: OCCI extension, final specification]<br />
** D 4.3 [https://documents.egi.eu/document/2644 Resource template changes: OCCI extension, final specification]<br />
** D 6.1 [https://documents.egi.eu/document/2647 Assisted pattern recognition tools integrated with EGI for citizen science]<br />
<br />
* Switch over of the GridPP website. Soft launch this week on Tuesday with some unavailability expected during the afternoon.<br />
* The next WLCG GDB is on Wednesday 9th December. [https://indico.cern.ch/event/319754/ Agenda].<br />
* Change of HEPSYSMAN dates due to Manchester room availability. Now anticipate 13th/14th.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL.]<br />
* Core ops members - please provide your task list for Q3 15!<br />
* EGI: How many apel-publishers on sl5 in your NGI? (Instances in GOC-DB registered as glite-APEL.)<br />
* EGI: What is your WMS usage?<br />
* The dilemma: foreman/puppet or ansible?<br />
* Release date for GOCDB v5.5 is Wed 2nd December.<br />
<br />
<br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Monday 30th November'''<br />
* [https://espace.cern.ch/WLCG-document-repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].<br />
<br />
'''Tuesday 24th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393620/ Agenda] | [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151119 Minutes].<br />
* Baselines: the dCache 2.6.x decommissioning deadline was the end of September.<br />
* News: Critical Vulnerability broadcast by SVG on [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Friday 06 affecting NSS].<br />
* T0 news: IPv6 enabled in MyProxy and VOMS for testing purposes, in dual-stack mode (IPv4 and IPv6). LSF 9 upgrade of the WNs is in QA testing. <br />
* ALICE: Good activity. Preparations for heavy ion reco jobs. At CERN ALICE jobs can request 2 cores and hence have twice the memory but with an impact on efficiency.<br />
* ATLAS: New record in parallel running slots: 250k. Some mc15b campaign jobs were requesting an excessive amount of conditions data. Deletion agents were switched off for a few days for operational reasons and, once back on, struggled to keep up. PRODDISK has been decommissioned on all the Tier-2s.<br />
* CMS: High load: ~120k parallel jobs. Multi-billion-event MC RECO campaign ahead.<br />
* LHCb: Very high activity. Several days of failures at SARA when the SRM was overloaded by a local user. Working on an interface to HTCondor-CE.<br />
* glexec TF: NTR<br />
* HTTP TF: [https://indico.cern.ch/event/459419 Meeting on 5th November]. Have a working Nagios probe, endpoint lists from the experiments, regular monitoring of the infrastructure (see links on agenda) and a GGUS support unit.<br />
* IS evolution: The first draft of the Future Use Cases document is now available for comments. Deadline to provide input is on 24th November. There was a [https://indico.cern.ch/event/454975/attachments/1188757/1724809/ISTF-minutes-12112015.pdf TF meeting on 12th November]. Looking at options for publishing a subset of the current GLUE schema that is useful for WLCG in JSON/HTTPS.<br />
* IPv6: NTR<br />
* MW readiness: Tests reported for EOS, DPM, dCache, BDII, MW readiness app. Next meeting 2nd December.<br />
* Multicore: NTR<br />
* Network and transfer metrics: perfSONAR collector, datastore, publisher and dashboard in production (stable operations). ALL sites are encouraged to enable auto-updates for perfSONAR. Pilots: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics. LHCb DIRAC bridge is now functional.<br />
* RFC proxies: NTR<br />
* Squid monitoring and proxy discovery: NTR.<br />
* AOB: Andrew McNab will take over the coordination of the Machine/Job Features task force.<br />
<br />
<br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st December'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-25 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well). A recent change has improved, but not fixed, this problem.<br />
* We are continuing with the detailed network changes needed to remove our old core switch from the network. We anticipate significant steps on the 8th and 9th December. <br />
* As reported last week, the tenders for the next round of Disk and CPU purchases are now out.<br />
* We have seen some increased data transfer rates to/from the Tier1. We have a plan to increase the bandwidth on the bypass link (to Tier2s) from 10 to 20 Gbit. <br />
* We are implementing a new algorithm for the draining of worker nodes to make space for multi-core jobs. The new version allows "pre-emptable" jobs (i.e. jobs that can be stopped at short notice) to run in the job slots until they are needed.<br />
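<br />
The draining idea in the last item can be illustrated with a minimal sketch. This is not the Tier-1 implementation: the class <code>DrainingNode</code>, its fields and its methods are hypothetical names invented for illustration, and the model assumes one core per job slot.<br />

```python
from dataclasses import dataclass


@dataclass
class DrainingNode:
    """Hypothetical model of a worker node being drained (one core per slot)."""
    total_slots: int
    normal: int = 0       # slots held by ordinary jobs that must finish on their own
    preemptable: int = 0  # slots held by jobs that can be stopped at short notice
    multicore: int = 0    # slots held by multi-core jobs

    @property
    def idle(self) -> int:
        return self.total_slots - self.normal - self.preemptable - self.multicore

    def finish_normal_job(self) -> None:
        """An ordinary job ends; backfill its slot with a pre-emptable job."""
        if self.normal == 0:
            raise ValueError("no ordinary job is running")
        self.normal -= 1
        self.preemptable += 1

    def try_start_multicore(self, cores: int) -> bool:
        """Start a multi-core job if idle plus pre-emptable slots suffice."""
        if self.idle + self.preemptable < cores:
            return False
        evict = max(0, cores - self.idle)  # stop only as many jobs as needed
        self.preemptable -= evict
        self.multicore += cores
        return True
```

In this sketch the slots freed during draining keep doing useful work, and pre-emptable jobs are stopped only at the moment the multi-core job can actually start.<br />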
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 18 Nov]'''<br />
* New member from Edinburgh!<br />
* Updates on PRODDISK cleanup at T2s, catalogue synchronisation (aka syncat), and what does "world readable" mean?<br />
* Summary of EGI community forum - interoperation and standards for moving data<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 11 Nov]'''<br />
* Puppet for configuration<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 04 Nov]'''<br />
* Brian presented GridPP achievements at the E2E workshop<br />
* Storage accounting with EGI<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24 Nov'''<br />
* Cloud Init demonstrated with modified version of (old) GridPP DIRAC VMs<br />
* Cloud Init support in Vac 0.20pre (GRIDPP-27)<br />
* Progress on VMs for new GridPP DIRAC service: multi-VO config currently preventing matching. (GRIDPP-9)<br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th November'''<br />
* Slight delay for Sheffield.<br />
<br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk]<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 1st December'''<br />
* Sixt and hone have been removed from the GridPP list.<br />
<br />
'''Tuesday 24th November'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, resolved. Approved VOs document updated with newest records for those VOs affected, CDF, DZERO, LSST. Also, note changes to CA_DN for PLANCK and CDF. https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs<br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify <br />
credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 30th November'''<br />
* A quiet week, mostly just closing alarms for not-in-production services. There are no open ROD tickets.<br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 2nd December'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9707 ADVISORY EGI-SVG-2015-9707] - updated 'Various Java CVE's with max CVSS score'<br />
* Updated IGTF distribution version 1.70 available - now available for download from the Repository (and mirrors) [https://dist.igtf.net/distribution/igtf/current/ https://dist.igtf.net/distribution/igtf/current/]<br />
<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 30th November 2015, 15.30 GMT'''<br />
<br />
33 Open Tickets this week. 14 are for the same Atlas request.<br />
<br />
'''GAZILLION ATLAS CONSISTENCY TICKETS'''<br /><br />
I won't go over all these, but ECDF, BIRMINGHAM and IMPERIAL are still just "assigned". As Ewan did with Oxford's ticket, feel free to lower the "Priority" for these tickets: these requests aren't urgent at all! A few sites are done already, and a few are holding their tickets until later.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117740 117740] (20/11)<br /><br />
Atlas request to delete proddisk data which digressed a bit (after the original issue was confirmed as solved) but then stalled. Last check Brian asked for a list of files in ./users. In progress (23/11)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS xrootd tests failing. No news since CMS replied on the 20th. Are things churning along in the background? In progress (20/11)<br />
<br />
'''''Forgot about this one:'''''<br /><br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=118052 118052] (30/11)<br /><br />
Atlas, for the "WLCG HTTP Deployment Task Force", have ticketed Glasgow after their SE showed up on this [https://etf-atlas-dev.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dallhosts http probe page]. I notice a few other UK sites in the red on that page. In progress (1/12)<br />
<br />
I think that's it for "exciting" tickets - as always let me know if that's not the case.<br />
<br />
Not much news on the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios] - it's looking pleasantly plain.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Monday 30th November'''<br />
* The SAM/ARGO team has created a [https://wiki.egi.eu/wiki/Service_Level_Target_-_Availability_Reliability document] describing the availability/reliability calculation in the ARGO tool.<br />
<br />
<br />
<br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but with no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it is running on a VM. VO-Nagios is using the Oxford SE for the replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
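<br />
The percentages quoted above are consistent with the YES count divided by the total number of sites listed (19 in each case), rounded to the nearest whole number. A minimal sketch of that arithmetic follows; the function name <code>percent_yes</code> is a hypothetical helper, not part of any GridPP tool.<br />

```python
def percent_yes(yes_count: int, no_count: int) -> int:
    """Share of sites answering YES, as a whole-number percentage."""
    total = yes_count + no_count
    if total == 0:
        raise ValueError("no sites listed")
    return round(100 * yes_count / total)


# Counts taken from the YES/NO lists above.
print(percent_yes(12, 7))   # multicore queues available
print(percent_yes(5, 14))   # cloud/VM interface
print(percent_yes(11, 8))   # GridPP DIRAC jobs successful
print(percent_yes(8, 11))   # IPv6 allocation
print(percent_yes(4, 15))   # dual stack nodes
```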
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''ATLAS S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to being finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring.<br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• WebDAV PanDA functional tests with HammerCloud are ongoing.<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-12-01T10:24:56Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 30th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 1st December'''<br />
* Heavy ion running started last week.<br />
* Enabling pilots for local VOs - what is the status?<br />
* EGI is looking for document reviewers ([https://wiki.egi.eu/wiki/EGI-Engage:Deliverables_and_Milestones Further information]).<br />
* Note the following EGI-Engage deliverables this month:<br />
** D 4.2 [https://documents.egi.eu/document/2643 VM snapshot support: OCCI extension, final specification]<br />
** D 4.3 [https://documents.egi.eu/document/2644 Resource template changes: OCCI extension, final specification]<br />
** D 6.1 [https://documents.egi.eu/document/2647 Assisted pattern recognition tools integrated with EGI for citizen science]<br />
<br />
* Switch over of the GridPP website. Soft launch this week on Tuesday with some unavailability expected during the afternoon.<br />
* The next WLCG GDB is on Wednesday 9th December. [https://indico.cern.ch/event/319754/ Agenda].<br />
* Change of HEPSYSMAN dates due to Manchester room availability. Now anticipate 13th/14th.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL.]<br />
* Core ops members - please provide your task list for Q3 15!<br />
* EGI: How many APEL publishers on SL5 in your NGI? (Instances in GOC-DB registered as glite-APEL.)<br />
* EGI: What is your WMS usage?<br />
* The dilemma: foreman/puppet or ansible?<br />
* Release date for GOCDB v5.5 is Wed 2nd December.<br />
<br />
<br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up.<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Monday 30th November'''<br />
* [https://espace.cern.ch/WLCG-document-repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].<br />
<br />
'''Tuesday 24th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393620/ Agenda] | [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151119 Minutes].<br />
* Baselines: the dCache 2.6.x decommissioning deadline was the end of September.<br />
* News: Critical Vulnerability broadcast by SVG on [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Friday 06 affecting NSS].<br />
* T0 news: IPv6 enabled in MyProxy and VOMS for testing purposes, in dual-stack mode (IPv4 and IPv6). LSF 9 upgrade of the WNs is in QA testing. <br />
* ALICE: Good activity. Preparations for heavy ion reco jobs. At CERN ALICE jobs can request 2 cores and hence have twice the memory but with an impact on efficiency.<br />
* ATLAS: New record in parallel running slots: 250k. Some mc15b campaign jobs were requesting an excessive amount of conditions data. Deletion agents were switched off for a few days for operational reasons but, once on again, struggled to keep up. PRODDISK has been decommissioned on all the Tier2s.<br />
* CMS: High loads: ~120k parallel jobs. Multi-billion-event MC RECO campaign ahead.<br />
* LHCb: Very high activity. Several days of failures at SARA when srm was overloaded by a local user. Working on interface to HTCondor-CE.<br />
* glexec TF: NTR<br />
* HTTP TF: [https://indico.cern.ch/event/459419 Meeting on 5th November]. Have a working Nagios probe, endpoint lists from the experiments, regular monitoring of the infrastructure (see links on agenda) and a GGUS support unit.<br />
* IS evolution: The first draft of the Future Use Cases document is now available for comments. Deadline to provide input is on 24th November. There was a [https://indico.cern.ch/event/454975/attachments/1188757/1724809/ISTF-minutes-12112015.pdf TF meeting on 12th November]. Looking at options for publishing a subset of the current GLUE schema that is useful for WLCG in JSON/HTTPS.<br />
* IPv6: NTR<br />
* MW readiness: Tests reported for EOS, DPM, dCache, BDII, MW readiness app. Next meeting 2nd December.<br />
* Multicore: NTR<br />
* Network and transfer metrics: perfSONAR collector, datastore, publisher and dashboard in production (stable operations). ALL sites are encouraged to enable auto-updates for perfSONAR. Pilots: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics. LHCb DIRAC bridge is now functional.<br />
* RFC proxies: NTR<br />
* Squid monitoring and proxy discovery: NTR.<br />
* AOB: Andrew McNab will take over the coordination of the Machine/Job Features task force.<br />
<br />
<br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st December'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting are [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-25 here].<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and sometimes fail to write remotely as well). A recent change has improved, but not fixed, this problem.<br />
* We are continuing with the detailed network changes needed to remove our old core switch from the network. We anticipate significant steps on the 8th and 9th December. <br />
* As reported last week, the tenders for the next round of Disk and CPU purchases are now out.<br />
* We have seen some increased data transfer rates to/from the Tier1. We have a plan to increase the bandwidth on the bypass link (to Tier2s) from 10 to 20 Gbit/s.<br />
* We are implementing a new algorithm for the draining of worker nodes to make space for multi-core jobs. The new version allows "pre-emptable" jobs (i.e. jobs that can be stopped at short notice) to run in the job slots until they are needed.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 18 Nov]'''<br />
* New member from Edinburgh!<br />
* Updates on PRODDISK cleanup at T2s, catalogue synchronisation (aka syncat), and what does "world readable" mean?<br />
* Summary of EGI community forum - interoperation and standards for moving data<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 11 Nov]'''<br />
* Puppet for configuration<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 04 Nov]'''<br />
* Brian presented GridPP achievements at the E2E workshop<br />
* Storage accounting with EGI<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24 Nov'''<br />
* Cloud Init demonstrated with modified version of (old) GridPP DIRAC VMs<br />
* Cloud Init support in Vac 0.20pre (GRIDPP-27)<br />
* Progress on VMs for new GridPP DIRAC service: multi-VO config currently preventing matching. (GRIDPP-9)<br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th November'''<br />
* Slight delay for Sheffield.<br />
<br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 1st December'''<br />
* Sixt and hone have been removed from the GridPP list.<br />
<br />
'''Tuesday 24th November'''<br />
Steve J: Problems with the VOMS server at FNAL, voms.fnal.gov, have been resolved. The Approved VOs document has been updated with the newest records for the VOs affected: CDF, DZERO and LSST. Also, note the changes to CA_DN for PLANCK and CDF. https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs<br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
Steve J: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: Problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 30th November'''<br />
* A quiet week, mostly just closing alarms for not-in-production services. There are no open ROD tickets.<br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There are also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 2nd December'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9707 ADVISORY EGI-SVG-2015-9707] - updated 'Various Java CVE's with max CVSS score'<br />
* Updated IGTF distribution version 1.70 is now available for download from the Repository (and mirrors) [https://dist.igtf.net/distribution/igtf/current/ https://dist.igtf.net/distribution/igtf/current/]<br />
<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 30th November 2015, 15.30 GMT'''<br />
<br />
33 Open Tickets this week. 14 are for the same Atlas request.<br />
<br />
'''GAZILLION ATLAS CONSISTENCY TICKETS'''<br /><br />
I won't go over all these - but ECDF, BIRMINGHAM and IMPERIAL are still just "assigned". As Ewan did with Oxford's ticket, feel free to lower the "Priority" for these tickets - these requests aren't urgent at all! A few sites are done already; a few are holding their tickets till later.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117740 117740] (20/11)<br /><br />
Atlas request to delete proddisk data which digressed a bit (after the original issue was confirmed as solved) but then stalled. Last check Brian asked for a list of files in ./users. In progress (23/11)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS xrootd tests failing. No news since CMS replied on the 20th. Are things churning along in the background? In progress (20/11)<br />
<br />
'''''Forgot about this one:'''''<br /><br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=118052 118052] (30/11)<br /><br />
Atlas, for the "WLCG HTTP Deployment Task Force", have ticketed Glasgow after their SE showed up on this [https://etf-atlas-dev.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dallhosts http probe page]. I notice a few other UK sites in the red on that page. In progress (1/12)<br />
<br />
I think that's it for "exciting" tickets - as always let me know if that's not the case.<br />
<br />
Not much news on the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios] - it's looking pleasantly plain.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Monday 30th November'''<br />
* The SAM/ARGO team has created a [https://wiki.egi.eu/wiki/Service_Level_Target_-_Availability_Reliability document] describing the availability and reliability calculation in the ARGO tool.<br />
<br />
<br />
<br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)<br />
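
The percentages quoted in the lists above follow from the YES counts over the 19 sites listed. A minimal sketch of that arithmetic, assuming the counts as written in this bulletin (the dictionary keys and the `percent` helper are illustrative names, not part of any GridPP tool):

```python
# Site counts copied from the YES lists above; 19 sites are listed in total.
TOTAL_SITES = 19

yes_counts = {
    "Multicore queues": 12,
    "Cloud/VM interface": 5,
    "GridPP DIRAC jobs": 11,
    "IPv6 allocation": 8,
    "Dual stack nodes": 4,
}


def percent(yes, total=TOTAL_SITES):
    """Return the share of sites answering YES, rounded to a whole percent."""
    return round(100 * yes / total)


# Hypothetical summary structure: one rounded percentage per status item.
percentages = {name: percent(count) for name, count in yes_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

If a site changes status, updating the relevant count and re-running gives the new figure for the bulletin.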
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-12-01T10:10:09Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 30th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** ----->
'''Tuesday 1st December'''<br />
* Heavy ion running started last week.<br />
* Enabling pilots for local VOs - what is the status?<br />
* EGI is looking for document reviewers ([https://wiki.egi.eu/wiki/EGI-Engage:Deliverables_and_Milestones Further information]).<br />
* Note the following EGI-Engage deliverables this month:<br />
** D 4.2 [https://documents.egi.eu/document/2643 VM snapshot support: OCCI extension, final specification]<br />
** D 4.3 [https://documents.egi.eu/document/2644 Resource template changes: OCCI extension, final specification]<br />
** D 6.1 [https://documents.egi.eu/document/2647 Assisted pattern recognition tools integrated with EGI for citizen science]<br />
<br />
* Switch over of the GridPP website. Soft launch this week on Tuesday with some unavailability expected during the afternoon.<br />
* The next WLCG GDB is on Wednesday 9th December. [https://indico.cern.ch/event/319754/ Agenda].<br />
* Change of HEPSYSMAN dates due to Manchester room availability. Now anticipate 13th/14th.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL.]<br />
* Core ops members - please provide your task list for Q3 15!<br />
* EGI: How many apel-publishers on sl5 in your NGI? (Instances in GOC-DB registered as glite-APEL.)<br />
* EGI: What is your WMS usage?<br />
* The dilemma: foreman/puppet or ansible?<br />
* Release date for GOCDB v5.5 is Wed 2nd December.<br />
<br />
<br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas][https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Monday 30th November'''<br />
* [https://espace.cern.ch/WLCG-document-repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].<br />
<br />
'''Tuesday 24th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393620/ Agenda] | [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151119 Minutes].<br />
* Baselines: the dCache 2.6.x decommissioning deadline was the end of September.<br />
* News: Critical Vulnerability broadcast by SVG on [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Friday 06 affecting NSS].<br />
* T0 news: IPv6 enabled in MyProxy and VOMS for testing purposes, in dual-stack mode (IPv4 and IPv6). LSF 9 upgrade of the WNs is in QA testing. <br />
* ALICE: Good activity. Preparations for heavy ion reco jobs. At CERN ALICE jobs can request 2 cores and hence have twice the memory but with an impact on efficiency.<br />
* ATLAS: new record in parallel running slots: 250k. Some mc15b campaign jobs were requesting an excessive amount of conditions data. Deletion agents: these were switched off for a few days for operational reasons but, once on again, struggled to keep up. PRODDISK has been decommissioned on all the Tier2s.<br />
* CMS: High loads: ~120k parallel jobs. Multi-billion events MC RECO campaign ahead.<br />
* LHCb: Very high activity. Several days of failures at SARA when srm was overloaded by a local user. Working on interface to HTCondor-CE.<br />
* glexec TF: NTR<br />
* HTTP TF: [https://indico.cern.ch/event/459419 Meeting on 5th November]. Have a working Nagios probe, endpoint lists from the experiments, regular monitoring of the infrastructure (see links on agenda) and a GGUS support unit.<br />
* IS evolution: The first draft of the Future Use Cases document is now available for comments. Deadline to provide input is on 24th November. There was a [https://indico.cern.ch/event/454975/attachments/1188757/1724809/ISTF-minutes-12112015.pdf TF meeting on 12th November]. Looking at options for publishing a subset of the current GLUE schema that is useful for WLCG in JSON/HTTPS.<br />
* IPv6: NTR<br />
* MW readiness: Tests reported for EOS, DPM, dCache, BDII, MW readiness app. Next meeting 2nd December.<br />
* Multicore: NTR<br />
* Network and transfer metrics: perfSONAR collector, datastore, publisher and dashboard in production (stable operations). ALL sites are encouraged to enable auto-updates for perfSONAR. Pilots: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics. LHCb DIRAC bridge is now functional.<br />
* RFC proxies: NTR<br />
* Squid monitoring and proxy discovery: NTR.<br />
* AOB: Andrew McNab will take over the coordination of the Machine/Job Features task force.<br />
<br />
<br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st December'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-25 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well). A recent change has improved, but not fixed, this problem.<br />
* We are continuing with the detailed network changes needed to remove our old core switch from the network. We anticipate significant steps on the 8th and 9th December. <br />
* As reported last week, the tenders for the next round of Disk and CPU purchases are now out.<br />
* We have seen some increased data transfer rates to/from the Tier1. We have a plan to increase the bandwidth on the bypass link (to Tier2s) from 10 to 20 Gbit. <br />
* We are implementing a new algorithm for the draining of worker nodes to make space for multi-core jobs. The new version allows "pre-emptable" jobs (i.e. jobs that can be stopped at short notice) to run in the job slots until they are needed.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 18 Nov]'''<br />
* New member from Edinburgh!<br />
* Updates on PRODDISK cleanup at T2s, catalogue synchronisation (aka syncat), and what does "world readable" mean?<br />
* Summary of EGI community forum - interoperation and standards for moving data<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 11 Nov]'''<br />
* Puppet for configuration<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 04 Nov]'''<br />
* Brian presented GridPP achievements at the E2E workshop<br />
* Storage accounting with EGI<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24 Nov'''<br />
* Cloud Init demonstrated with modified version of (old) GridPP DIRAC VMs<br />
* Cloud Init support in Vac 0.20pre (GRIDPP-27)<br />
* Progress on VMs for new GridPP DIRAC service: multi-VO config currently preventing matching. (GRIDPP-9)<br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th November'''<br />
* Slight delay for Sheffield.<br />
<br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 1st December'''<br />
* Sixt and hone have been removed from the GridPP list.<br />
<br />
'''Tuesday 24th November'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, resolved. Approved VOs document updated with newest records for those VOs affected, CDF, DZERO, LSST. Also, note changes to CA_DN for PLANCK and CDF. https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs<br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify <br />
credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 30th November'''<br />
* A quiet week, mostly just closing alarms for not-in-production services. There are no open ROD tickets.<br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 2nd December'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9707 ADVISORY EGI-SVG-2015-9707] - updated 'Various Java CVE's with max CVSS score'<br />
* Updated IGTF distribution version 1.70 is now available for download from the Repository (and mirrors) [https://dist.igtf.net/distribution/igtf/current/ https://dist.igtf.net/distribution/igtf/current/]<br />
<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 30th November 2015, 15.30 GMT'''<br />
<br />
33 Open Tickets this week. 14 are for the same Atlas request.<br />
<br />
GAZILLION ATLAS CONSISTENCY TICKETS<br /><br />
I won't go over all these - but ECDF, BIRMINGHAM and IMPERIAL are still just "assigned". As Ewan did with Oxford's ticket, feel free to lower the "Priority" for these tickets - these requests aren't urgent at all! A few sites are done already, a few are holding their tickets till later.<br />
<br />
RALPP<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117740 117740] (20/11)<br /><br />
Atlas request to delete proddisk data which digressed a bit (after the original issue was confirmed as solved) but then stalled. Last check Brian asked for a list of files in ./users. In progress (23/11)<br />
<br />
TIER 1<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS xrootd tests failing. No news since CMS replied on the 20th. Are things churning along in the background? In progress (20/11)<br />
<br />
'''''Forgot about this one:'''''<br /><br />
'''GLASGOW'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=118052 118052] (30/11)<br /><br />
Atlas, for the "WLCG HTTP Deployment Task Force", have ticketed Glasgow after their SE showed up on this [https://etf-atlas-dev.cern.ch/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fview_name%3Dallhosts http probe page]. I notice a few other UK sites in the red on that page. In progress (1/12)<br />
<br />
I think that's it for "exciting" tickets - as always let me know if that's not the case.<br />
<br />
Not much news on the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios] - it's looking pleasantly plain.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Monday 30th November'''<br />
* The SAM/ARGO team has created a [https://wiki.egi.eu/wiki/Service_Level_Target_-_Availability_Reliability document] describing the availability and reliability calculation in the ARGO tool.<br />
<br />
<br />
<br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status.<br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct and this may be extended depending on the situation.<br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for the replication test, so it is failing those tests. I am looking to change to some other SE.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run-2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring.<br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue a lost files declaration for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-30T16:47:24Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 30th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 24th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393620/ Agenda] | [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151119 Minutes].<br />
* Baselines: the dCache 2.6.x decommissioning deadline was the end of September.<br />
* News: Critical Vulnerability broadcast by SVG on [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Friday 06 affecting NSS].<br />
* T0 news: IPv6 enabled in MyProxy and VOMS for testing purposes, in dual-stack mode (IPv4 and IPv6). LSF 9 upgrade of the WNs is in QA testing. <br />
* ALICE: Good activity. Preparations for heavy ion reco jobs. At CERN ALICE jobs can request 2 cores and hence have twice the memory but with an impact on efficiency.<br />
* ATLAS: New record in parallel running slots: 250k. Some mc15b campaign jobs were requesting an excessive amount of conditions data. Deletion agents: these were switched off for a few days for operational reasons but, once on again, struggled to keep up. PRODDISK has been decommissioned on all the Tier2s.<br />
* CMS: High load: ~120k parallel jobs. Multi-billion-event MC RECO campaign ahead.<br />
* LHCb: Very high activity. Several days of failures at SARA when srm was overloaded by a local user. Working on interface to HTCondor-CE.<br />
* glexec TF: NTR<br />
* HTTP TF: [https://indico.cern.ch/event/459419 Meeting on 5th November]. Have a working Nagios probe, endpoint lists from the experiments, regular monitoring of the infrastructure (see links on agenda) and a GGUS support unit.<br />
* IS evolution: The first draft of the Future Use Cases document is now available for comments. Deadline to provide input is on 24th November. There was a [https://indico.cern.ch/event/454975/attachments/1188757/1724809/ISTF-minutes-12112015.pdf TF meeting on 12th November]. Looking at options for publishing a subset of the current GLUE schema that is useful for WLCG in JSON/HTTPS.<br />
* IPv6: NTR<br />
* MW readiness: Tests reported for EOS, DPM, dCache, BDII, MW readiness app. Next meeting 2nd December.<br />
* Multicore: NTR<br />
* Network and transfer metrics: perfSONAR collector, datastore, publisher and dashboard in production (stable operations). ALL sites are encouraged to enable auto-updates for perfSONAR. Pilots: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics. LHCb DIRAC bridge is now functional.<br />
* RFC proxies: NTR<br />
* Squid monitoring and proxy discovery: NTR.<br />
* AOB: Andrew McNab will take over the coordination of the Machine/Job Features task force.<br />
<br />
<br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st December'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-25 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and sometimes fail to write remotely as well). A recent change has improved, but not fixed, this problem.<br />
* We are continuing with the detailed network changes needed to remove our old core switch from the network. We anticipate significant steps on the 8th and 9th December. <br />
* As reported last week, the tenders for the next round of Disk and CPU purchases are now out.<br />
* We have seen some increased data transfer rates to/from the Tier1. We have a plan to increase the bandwidth on the bypass link (to Tier2s) from 10 to 20 Gbit/s.<br />
* We are implementing a new algorithm for the draining of worker nodes to make space for multi-core jobs. The new version allows "pre-emptable" jobs (i.e. jobs that can be stopped at short notice) to run in the draining job slots until those slots are needed.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 18 Nov]'''<br />
* New member from Edinburgh!<br />
* Updates on PRODDISK cleanup at T2s, catalogue synchronisation (aka syncat), and what does "world readable" mean?<br />
* Summary of EGI community forum - interoperation and standards for moving data<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 11 Nov]'''<br />
* Puppet for configuration<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 04 Nov]'''<br />
* Brian presented GridPP achievements at the E2E workshop<br />
* Storage accounting with EGI<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24 Nov'''<br />
* Cloud Init demonstrated with modified version of (old) GridPP DIRAC VMs<br />
* Cloud Init support in Vac 0.20pre (GRIDPP-27)<br />
* Progress on VMs for new GridPP DIRAC service: multi-VO config currently preventing matching. (GRIDPP-9)<br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th November'''<br />
* Slight delay for Sheffield.<br />
<br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk]<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24th November'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, resolved. Approved VOs document updated with newest records for those VOs affected, CDF, DZERO, LSST. Also, note changes to CA_DN for PLANCK and CDF. https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs<br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 2nd December'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9707 ADVISORY EGI-SVG-2015-9707] - updated 'Various Java CVE's with max CVSS score'<br />
* Updated IGTF distribution version 1.70 is now available for download from the Repository (and mirrors) [https://dist.igtf.net/distribution/igtf/current/ https://dist.igtf.net/distribution/igtf/current/]<br />
<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 30th November 2015, 15.30 GMT'''<br />
<br />
33 Open Tickets this week. 14 are for the same Atlas request.<br />
<br />
GAZILLION ATLAS CONSISTENCY TICKETS<br /><br />
I won't go over all these - but ECDF, BIRMINGHAM and IMPERIAL are still just "assigned". As Ewan did with Oxford's ticket, feel free to lower the "Priority" for these tickets - these requests aren't urgent at all! A few sites are done already, a few are holding their tickets till later.<br />
<br />
RALPP<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117740 117740] (20/11)<br /><br />
Atlas request to delete proddisk data which digressed a bit (after the original issue was confirmed as solved) but then stalled. Last check Brian asked for a list of files in ./users. In progress (23/11)<br />
<br />
TIER 1<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS xrootd tests failing. No news since CMS replied on the 20th. Are things churning along in the background? In progress (20/11)<br />
<br />
I think that's it for "exciting" tickets - as always let me know if that's not the case.<br />
<br />
Not much news on the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios] - it's looking pleasantly plain.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but with no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to another SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable; the Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the web server was not responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
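The percentages quoted in the lists above follow from the YES/NO site counts. As a rough check, a short sketch (the function name `share` and the dictionary labels are illustrative only; the IPv6 allocation NO count of 11 is taken by counting the sites listed) recomputes them:

```python
def share(yes: int, no: int) -> int:
    """Percentage of sites answering YES, rounded to the nearest whole number."""
    return round(100 * yes / (yes + no))


# (YES, NO) site counts as given in the lists above.
counts = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

for name, (yes, no) in counts.items():
    print(f"{name}: {share(yes, no)}% of {yes + no} sites")
```

Each category covers the same 19 sites, and the computed values agree with the 63%, 26%, 58%, 42% and 21% figures quoted.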
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''ATLAS S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again.<br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-30T14:40:58Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 30th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas][https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 24th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393620/ Agenda] | [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151119 Minutes].<br />
* Baselines: the dCache 2.6.x decommissioning deadline was the end of September.<br />
* News: Critical Vulnerability broadcast by SVG on [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Friday 06 affecting NSS].<br />
* T0 news: IPv6 enabled in MyProxy and VOMS for testing purposes, in dual-stack mode (IPv4 and IPv6). LSF 9 upgrade of the WNs is in QA testing. <br />
* ALICE: Good activity. Preparations for heavy ion reco jobs. At CERN ALICE jobs can request 2 cores and hence have twice the memory but with an impact on efficiency.<br />
* ATLAS: new record in parallel running slots: 250k. Some mc15b campaign jobs were requesting an excessive amount of conditions data. Deletion agents were switched off for a few days for operational reasons but, once on again, struggled to keep up. PRODDISK has been decommissioned on all the Tier2s.<br />
* CMS: High loads: ~120k parallel jobs. Multi-billion-event MC RECO campaign ahead.<br />
* LHCb: Very high activity. Several days of failures at SARA when srm was overloaded by a local user. Working on interface to HTCondor-CE.<br />
* glexec TF: NTR<br />
* HTTP TF: [https://indico.cern.ch/event/459419 Meeting on 5th November]. Have a working Nagios probe, endpoint lists from the experiments, regular monitoring of the infrastructure (see links on agenda) and a GGUS support unit.<br />
* IS evolution: The first draft of the Future Use Cases document is now available for comments. Deadline to provide input is on 24th November. There was a [https://indico.cern.ch/event/454975/attachments/1188757/1724809/ISTF-minutes-12112015.pdf TF meeting on 12th November]. Looking at options for publishing a subset of the current GLUE schema that is useful for WLCG in JSON/HTTPS.<br />
* IPv6: NTR<br />
* MW readiness: Tests reported for EOS, DPM, dCache, BDII, MW readiness app. Next meeting 2nd December.<br />
* Multicore: NTR<br />
* Network and transfer metrics: perfSONAR collector, datastore, publisher and dashboard in production (stable operations). ALL sites are encouraged to enable auto-updates for perfSONAR. Pilots: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics. LHCb DIRAC bridge is now functional.<br />
* RFC proxies: NTR<br />
* Squid monitoring and proxy discovery: NTR.<br />
* AOB: Andrew McNab will take over the coordination of the Machine/Job Features task force.<br />
<br />
<br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st December'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-25 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and sometimes fail to write remotely as well). A recent change has improved, but not fixed, this problem.<br />
* We are continuing with the detailed network changes needed to remove our old core switch from the network. We anticipate significant steps on the 8th and 9th December. <br />
* As reported last week, the tenders for the next round of Disk and CPU purchases are now out.<br />
* We have seen some increased data transfer rates to/from the Tier1. We have a plan to increase the bandwidth on the bypass link (to Tier2s) from 10 to 20 Gbit. <br />
* We are implementing a new algorithm for the draining of worker nodes to make space for multi-core jobs. The new version allows "pre-emptable" jobs (i.e. jobs that can be stopped at short notice) to run in the job slots until they are needed.<br />
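<br />
The draining idea in the last item can be illustrated with a minimal sketch. This is not the Tier-1's actual batch system configuration: the <code>Node</code> class, the function names <code>backfill</code> and <code>try_start_multicore</code>, and the 8-slot node size are all hypothetical, chosen only for illustration. It shows a draining node letting pre-emptable jobs occupy free slots, then evicting them once enough slots can be reclaimed for a multi-core job.<br />

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of a worker node draining for a multi-core job."""
    total_slots: int
    normal_jobs: int = 0        # ordinary single-core jobs still running
    preemptable_jobs: int = 0   # jobs that can be stopped at short notice
    multicore_slots: int = 0    # slots held by a running multi-core job

    @property
    def free_slots(self) -> int:
        used = self.normal_jobs + self.preemptable_jobs + self.multicore_slots
        return self.total_slots - used


def backfill(node: Node, queued_preemptable: int) -> int:
    """Start pre-emptable jobs in the free slots; return how many started."""
    started = min(node.free_slots, queued_preemptable)
    node.preemptable_jobs += started
    return started


def try_start_multicore(node: Node, cores_needed: int):
    """Start a multi-core job if free plus pre-emptable slots are enough.

    Returns the number of pre-emptable jobs evicted, or None if the node
    has to keep draining because normal jobs still hold too many slots.
    """
    reclaimable = node.free_slots + node.preemptable_jobs
    if reclaimable < cores_needed:
        return None
    evicted = max(0, cores_needed - node.free_slots)
    node.preemptable_jobs -= evicted
    node.multicore_slots += cores_needed
    return evicted


# Usage: an 8-slot node with 6 normal jobs still running.
node = Node(total_slots=8, normal_jobs=6)
backfill(node, queued_preemptable=5)          # fills the 2 free slots
blocked = try_start_multicore(node, 8)        # None: normal jobs still hold slots
node.normal_jobs = 0                          # the normal jobs finish
backfill(node, queued_preemptable=10)         # pre-emptable jobs use the 6 freed slots
evicted = try_start_multicore(node, 8)        # evicts the pre-emptable jobs
```

The point of the design is that slots on a draining node do useful work instead of sitting idle, at the cost of losing the pre-emptable jobs' progress when they are evicted.<br />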
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 18 Nov]'''<br />
* New member from Edinburgh!<br />
* Updates on PRODDISK cleanup at T2s, catalogue synchronisation (aka syncat), and what does "world readable" mean?<br />
* Summary of EGI community forum - interoperation and standards for moving data<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 11 Nov]'''<br />
* Puppet for configuration<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151104-minutes.txt Wednesday 04 Nov]'''<br />
* Brian presented GridPP achievements at the E2E workshop<br />
* Storage accounting with EGI<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24 Nov'''<br />
* Cloud Init demonstrated with modified version of (old) GridPP DIRAC VMs<br />
* Cloud Init support in Vac 0.20pre (GRIDPP-27)<br />
* Progress on VMs for new GridPP DIRAC service: multi-VO config currently preventing matching. (GRIDPP-9)<br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th November'''<br />
* Slight delay for Sheffield.<br />
<br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24th November'''<br />
Steve J: the problems with the VOMS server at FNAL (voms.fnal.gov) have been resolved. The Approved VOs document has been updated with the newest records for the VOs affected: CDF, DZERO and LSST. Also note the changes to CA_DN for PLANCK and CDF. https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs<br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
Steve J: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL (voms.fnal.gov) have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 2nd December'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9707 ADVISORY EGI-SVG-2015-9707] - updated 'Various Java CVE's with max CVSS score'<br />
* Updated IGTF distribution version 1.70 is now available for download from the Repository (and mirrors) [https://dist.igtf.net/distribution/igtf/current/ https://dist.igtf.net/distribution/igtf/current/]<br />
<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was kind of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it is running on a VM. VO-Nagios is using the Oxford SE for the replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFilesLost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-11-30T14:40:50Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 23rd November 2015, 15.00 GMT'''<br /><br />
There were just 18 Open UK Tickets - now we're up to 37...<br />
<br />
'''ATLASPRODDISK Deletion''' (20/11)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117740 117740] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117739 117739] (Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117737 117737] (Birmingham)<br />
<br />
Brian (as an agent of atlas) has asked sites to delete the data from proddisk and remove the tokens. Chris has sorted it already for RALPP and the ticket has mutated to a generic atlas cleanout. No news on the other two.<br />
<br />
'''PILOT LIGHT AT THE END OF THE TUNNEL'''<br /><br />
In another round of Pilot wrangling by Daniela, sites have been asked to implement pilots for the Pheno VO (or, where they are enabled, find out why they're not working!):<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117706 117706] (GLASGOW)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117710 117710] (Brunel)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117711 117711] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117723 117723] (QMUL)<br />
<br />
For completeness the older pilot tickets:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (Sheffield)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (TIER 1)<br />
The Tier 1 is at the "debugging" stage, no news from Sheffield though.<br />
<br />
'''HOT OFF THE TICKET PRESS'''<br /><br />
Atlas have ticketed everyone and their grandmother asking to implement the automated storage consistency check stuff. Which I think is a little unfair as I'm unconvinced of the maturity of the tools after spending a day with them last week.<br />
<br />
I've updated Lancaster's ticket to see if we've set this up right during our mucking about last week: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=117883 117883]<br /><br />
''Update'' - Atlas have replied to say things are looking good at Lancaster, so we just need a few more tweaks (such as using Alessandra's updated and polished version of the [https://gitlab.cern.ch/atlas-adc-ddm-support/dark_data/blob/master/dpm_dump.py dpm_dump.py] script and creating dumps that are only a day old) and we're done.<br />
<br />
'''EVEN HOTTER OFF THE PRESS'''<br /><br />
Brunel have been asked by CMS to upgrade their DPM to 1.8.10 in [https://ggus.eu/?mode=ticket_info&ticket_id=117922 117922] - however Raul reports that things aren't working all that well, suggesting CMS need to have a natter with the DPM devs.<br />
<br />
'''Monday 16th November 2015, 15.45 GMT'''<br /><br />
Only 17 Open UK tickets this week.<br />
<br />
'''LSST FRIDAY 13th TEETHING TROUBLES'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117585 117585] (Liverpool)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117586 117586] (Oxford)<br /><br />
Daniela has been testing out the LSST Dirac pilots - Ewan's fighting the good fight at Oxford getting his ARC to work, maybe Liverpool's problem has a similar root cause (we should be so lucky)?<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
In a similar vein (and thanks again to Daniela for the effort with preparing our pilots) is this ticket about getting "other" pilots enabled at the Tier 1 - particularly with a view of these tests: http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg Any news? On hold (6/11)<br />
<br />
'''ECDF(-RDF)'''<br /><br />
I think this new atlas ticket:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117606 117606] (15/11)<br /><br />
is a duplicate of this current atlas ticket:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (8/11)<br /><br />
So I suspect it can be closed!<br />
<br />
I see from [https://ggus.eu/?mode=ticket_info&ticket_id=117642 117642] that the other [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 VO nagios] has been recently rebooted, which might explain the many failures I see. I'll check again before the meeting.<br />
<br />
<br />
'''Tuesday 10th November 2015, 10.20 GMT'''<br /><br />
Down to 13 Open tickets this week!<br />
<br />
All the UK tickets can be found [http://tinyurl.com/nwgrnys '''here'''].<br />
<br />
Not very exciting - the ECDF-RDF ticket [https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (atlas complaining about stale file handle error messages - but it's for the "RDF" so not urgent?) might have slipped past Andy's sentries yesterday, and the two remaining pilot role tickets for Sheffield and the Tier 1 ([https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] and [https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866]) could really do with updates - with the latter Daniela notes that the gridpp pilot roles are needed for testing purposes.<br />
<br />
Anything I've missed?<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - no persistent trouble anywhere (that matters to us) - a few ARC CE submit troubles over the last hour or so with some RALPP CEs but I don't think that's ought to worry about.<br />
<br />
'''Monday 2nd November 2015, 13.30 GMT'''<br /><br />
22 Open UK Tickets this week. First Monday of the Month, so all the tickets get looked at, however run of the mill they are.<br />
<br />
First, the link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Low availability Ops ticket. On holded whilst the numbers soothe themselves. On Hold (23/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
Sno+ job submission failures. Not much on this ticket since it was set In Progress. Looks like an argus problem. How goes things at Sussex before Matt RB moves on? (We'll miss you Matt!). In progress (20/10)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117261 117261] (28/10)<br /><br />
Atlas jobs failing with stage out failures. Federico notices that the failures are due to odd errors - "file already existing", and that things seem to be calming themselves. He's at a loss as to what RALPP can do. Checking the panda link suggests the errors are still there today. Waiting for reply (29/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
Bristol's CMS glexec ticket. It looks like the solution is to have more cms pool accounts (which of course requires time to deploy). In progress (28/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117303 117303] (30/10)<br /><br />
CMS, not Highlander fans, don't seem to believe that There can be only One (glexec ticket). Poor old Bristol seem to be playing whack-a-mole with duplicate tickets. Is there a note that can be left somewhere to stop this happening? Assigned (30/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303] (Long long ago)<br /><br />
Edinburgh's (and indeed Scotgrid's) only ticket is this tarball glexec ticket. A bit more on this later. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp (and other) VO pilot roles at Sheffield. No news for a while, snoplus are trying to use pilot roles now for dirac so this is becoming very relevant. In progress (9/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs failing, likely due to too many being submitted to the 10 slots that Sno+ has. Maybe a WMS scheduling problem - Stephen B has given advice. Elena asked if the problem persisted a few weeks ago. Waiting for reply (12/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116967 116967] (17/10)<br /><br />
A ROD availability ticket, on hold as per SOP. On hold (20/10)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket. Autumn was not kind to many of us! On hold (8/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116882 116882] (13/10)<br /><br />
Enabling pilot snoplus users at Lancaster. Shouldn't have been a problem, but turned into a bit of a comedy/tragedy of errors by yours truly mucking up. Hopefully fixed now- thanks to Daniela for her patience. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299] (Far far away)<br /><br />
glexec tarball ticket. There's been a lot of communication with the glexec devs about this - the hopefully last hurdle is sorting out the RPATHs for the libraries. It's not a small hurdle though... On hold (2/11)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
A ticket about jumbo frame problems, submitted to QM. After Dan provided some education the user replied that he only sees this problem at two atlas sites, but he is contacting the network admins at his institution to see if it is their end. On hold (29/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 117011] (19/10)<br /><br />
ROD ticket for glue-validate errors. Went away for a while after Dan re-yaimed his site bdii, but possibly back again. Daniela suggests re-running the glue-validate test. Reopened (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116689 116689] (6/10)<br /><br />
Another ROD ticket, where Ops glexec test jobs are seemingly timing out for QM (this is the ticket Daniela mentioned on the ops mailing list). Dan noted that with the cluster half full tests were passing, suggesting some kind of load correlation (but as he also notes - what's getting loaded and causing the problem - Batch, CE or WNs?). Kashif reckons the argus server, and suggests a handy glexec time test which he posted. In progress (2/11)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117324 117324] (2/11)<br /><br />
A fresh looking ROD ticket - Raul had to restart the BDII and hopefully that got it. In progress (2/11)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Missing Image at 100IT. 100IT have asked for more details, no news since. Waiting for reply (19/10)<br />
<br />
'''THE TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Snoplus pilot enablement (not actually a word) at the Tier 1. New accounts were being requested after some internal discussion. On hold (19/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS AAA tests failing (the submitter notes "again..."). There are some oddities with other sites, which might be remote problems, but Andrew notes that previous manual fixes have been overwritten which likely explains why problems came back. In progress (does it need to be waiting for a reply?) (26/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117171 117171] (24/10)<br /><br />
LHCB had problems with an arc CE that was misbehaving for everyone. Things were fixed, and this ticket can now be closed. Waiting for reply (can be closed) (27/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117277 117277] (30/10)<br /><br />
Atlas have spotted "bring online timeout has been exceeded" errors. This appears to be a mixture of problems adding up, such as a number of borken disk nodes and heavy write access by atlas. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117248 117248] (28/10)<br /><br />
I believe related to the discussion on tb-support, this ticket requests that new SRM host certs that meet the requirements specified be requested for the RAL SRMs. Jens was on it, and the new certs are ready to be deployed. In progress (30/10)<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - some badness at Sussex, but they have a ticket open for that.<br />
<br />
'''Monday 26th October 2015, 15.30 GMT'''<br /><br />
26 Open UK Tickets this week. Not many seem all that exciting though.<br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
The few (two) tickets that really caught my eye are:<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
This ticket is quite interesting, mainly for Dan schooling the submitter. QM received a ticket complaining that their jumbo frames were breaking stuff - it doesn't look like the problem is at QM though. Naught wrong with the ticket handling. Waiting for reply (23/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065] (20/10)<br /><br />
Bristol have a CMS glexec error ticket that looks very similar to an existing one ([https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683]), which is in turn spookily similar to a ticket being worked on by the Bristol admins ([https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775]). At the very least I would say that if the two problems are different one would likely obfuscate the other. Is this a case of over-keen shifters submitting tickets without checking? I'd be tempted to close one or both of [https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] and this one ([https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065]). Tell them I said it was okay to[1]. In progress (21/10)<br />
<br />
[1] It probably is okay to.<br />
<br />
There are also 4 availability tickets, all On Hold waiting for 30 days or so to pass.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting this sort of ticket? Assigned (15/10) ''Update - Alessandra has updated and On-Holded the ticket. Thanks!''<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue), I think this one has snuck past the Liver-Lad's watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket. Dan looks like he's tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed, as Gareth confirms that the test is green. Waiting for reply (19/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing an 'export CMTEXTRATAGS="host-slc5"' line. Naught wrong with the ticket, but I liked how lhcb did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asks if the problem persists, the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652] (1/10)<br /><br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651] (1/10)<br /><br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (Long, long ago)<br /><br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576] (1/10)<br /><br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585] (1/10)<br /><br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082] (7/9)<br /><br />
A classic ROD availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br /><br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL- and asks if this is the cause of the problem?) Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting. Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
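To make the difference between the two counts concrete, here is a minimal Python sketch. The (path, error) tuple layout and the sample paths are made up for illustration and are not taken from any real Atlas monitoring output.<br />

```python
from collections import Counter


def summarise_deletion_errors(errors):
    """Summarise failed deletions.

    errors: iterable of (file_path, error_message) tuples, one per failed
    transaction (a hypothetical layout chosen for this example).
    """
    per_file = Counter(path for path, _message in errors)
    return {
        "transaction_errors": sum(per_file.values()),  # every retry counts
        "unique_files": len(per_file),                 # each file counts once
        "worst_offenders": per_file.most_common(3),
    }


# Hypothetical sample: one file retried three times, another failing once.
sample_errors = [
    ("/dpm/example/atlas/file_a", "timeout"),
    ("/dpm/example/atlas/file_a", "timeout"),
    ("/dpm/example/atlas/file_a", "timeout"),
    ("/dpm/example/atlas/file_b", "timeout"),
]
summary = summarise_deletion_errors(sample_errors)
```

In this made-up sample there are four transaction errors but only two affected files, which is the sort of gap that makes the raw error total look worse than the situation on disk.<br />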
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at- definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
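As a rough illustration of the point above, here is a minimal sketch of checking for leftover service entries after retiring a CE. It assumes you have the site's registered services as a list of (hostname, service type) pairs; the host names and the list itself are made up for illustration, and nothing here talks to the real GOCDB.<br />

```python
def leftover_services(registered, retired_hosts):
    """Return the (host, service type) entries still registered for retired hosts."""
    return sorted(
        (host, service_type)
        for host, service_type in registered
        if host in retired_hosts
    )


# Hypothetical state after a partial clean-up: the CE entry for the old host
# was removed, but its accounting service entry was forgotten.
registered = [
    ("ce-old.example.org", "gLite-APEL"),
    ("ce-new.example.org", "ARC-CE"),
]
retired_hosts = {"ce-old.example.org"}

still_monitored = leftover_services(registered, retired_hosts)
for host, service_type in still_monitored:
    print(f"still registered: {host} ({service_type})")
```

Anything this prints is an entry that would keep the retired host in monitoring.<br />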
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seemed to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S filed this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to raise is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or more precisely, having submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
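<br />
For anyone wanting to run the same kind of check against their own numbers, here is a minimal sketch of picking out submission hosts that report "0" core jobs. It assumes the accounting summary has been exported as (submission host, cores, job count) rows; that row layout and the job counts are invented for illustration and are not the real APEL summary format. The two real host names come from the list above, the third is a placeholder.<br />

```python
def zero_core_hosts(rows):
    """Return the sorted submission hosts with at least one non-empty 0-core record."""
    return sorted({host for host, cores, jobs in rows if cores == 0 and jobs > 0})


# Illustrative rows only: the job counts are made up.
rows = [
    ("ce3.dur.scotgrid.ac.uk", 0, 12),
    ("ce3.dur.scotgrid.ac.uk", 1, 4000),
    ("t2ce06.physics.ox.ac.uk", 0, 7),
    ("ce-ok.example.org", 8, 250),
]

rogues = zero_core_hosts(rows)
print("\n".join(rogues))
```

Note that a host shows up even if most of its records carry a core count, which matches the "tiny fraction of jobs" cases mentioned in other tickets.<br />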
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed afaics. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
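<br />
To show why a shared SubmitHost loses jobs, here is a minimal sketch. It assumes sync records are stored one per (site, SubmitHost, month), so a second CE publishing under the same SubmitHost replaces the first CE's record rather than adding to it. The record fields, host names and job counts are hypothetical and only illustrate the overwrite, they are not the actual APEL schema or Brunel's numbers.<br />

```python
from collections import defaultdict


def store_sync_records(records):
    """Keep one job count per (site, submit_host, month)."""
    table = {}
    for rec in records:
        key = (rec["site"], rec["submit_host"], rec["month"])
        table[key] = rec["jobs"]  # a later record with the same key replaces the earlier one
    return table


def shared_submit_hosts(ce_config):
    """Return each SubmitHost value that is configured on more than one CE."""
    by_host = defaultdict(list)
    for ce, submit_host in ce_config.items():
        by_host[submit_host].append(ce)
    return {host: sorted(ces) for host, ces in by_host.items() if len(ces) > 1}


# Hypothetical site: ce1 and ce2 both publish under the same SubmitHost.
ce_config = {
    "ce1.example.org": "shared.example.org",
    "ce2.example.org": "shared.example.org",
    "ce3.example.org": "ce3.example.org",
}
records = [
    {"site": "EXAMPLE-SITE", "submit_host": "shared.example.org", "month": "2015-05", "jobs": 1200},
    {"site": "EXAMPLE-SITE", "submit_host": "shared.example.org", "month": "2015-05", "jobs": 300},
    {"site": "EXAMPLE-SITE", "submit_host": "ce3.example.org", "month": "2015-05", "jobs": 50},
]

stored = store_sync_records(records)
submitted = sum(rec["jobs"] for rec in records)
kept = sum(stored.values())
print(shared_submit_hosts(ce_config))
print(f"jobs published: {submitted}, jobs surviving in sync table: {kept}")
```

Running a check like shared_submit_hosts over a site's CE configuration is one cheap way to catch this before the accounting numbers drift.<br />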
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem and told the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, and mitigation is being looked into. Perhaps a candidate for On Holding, depending on the time frame of a workaround. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions falling on deaf ears for a while it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18-hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] - 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] - 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] - 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856] (19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011] (25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) has been done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that isn't suffering from an understood problem was RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAIKS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me all wisdom donations are gratefully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCb job failure tickets. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br /><br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is building a replacement xrootd box which is currently being prepared. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for https access to the storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, looking to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a CREAM-side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although there are still some questions that the 100IT chaps have for the configurations and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get those in the know's attention. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-24T10:54:07Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 23rd November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week. "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas][https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 24th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393620/ Agenda] | [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151119 Minutes].<br />
* Baselines: dCache 2.6.x decommissioning deadline was end of September.<br />
* News: Critical Vulnerability broadcast by SVG on [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Friday 06 affecting NSS].<br />
* T0 news: IPv6 enabled in MyProxy and VOMS for testing purposes, in dual-stack mode (IPv4 and IPv6). LSF 9 upgrade of the WNs is in QA testing. <br />
* ALICE: Good activity. Preparations for heavy ion reco jobs. At CERN ALICE jobs can request 2 cores and hence have twice the memory but with an impact on efficiency.<br />
* ATLAS: new record in parallel running slots: 250k. Some mc15b campaign jobs were requesting an excessive amount of conditions data. Deletion agents: these were switched off for a few days for operational reasons but once on again struggled to keep up. PRODDISK has been decommissioned on all the Tier2s.<br />
* CMS: High loads: ~120k parallel jobs. Multi-billion-event MC RECO campaign ahead.<br />
* LHCb: Very high activity. Several days of failures at SARA when srm was overloaded by a local user. Working on interface to HTCondor-CE.<br />
* glexec TF: NTR<br />
* HTTP TF: [https://indico.cern.ch/event/459419 Meeting on 5th November]. Have a working Nagios probe, endpoint lists from the experiments, regular monitoring of the infrastructure (see links on agenda) and a GGUS support unit.<br />
* IS evolution: The first draft of the Future Use Cases document is now available for comments. Deadline to provide input is on 24th November. There was a [https://indico.cern.ch/event/454975/attachments/1188757/1724809/ISTF-minutes-12112015.pdf TF meeting on 12th November]. Looking at options for publishing a subset of the current GLUE schema that is useful for WLCG in JSON/HTTPS.<br />
* IPv6: NTR<br />
* MW readiness: Tests reported for EOS, DPM, dCache, BDII, MW readiness app. Next meeting 2nd December.<br />
* Multicore: NTR<br />
* Network and transfer metrics: perfSONAR collector, datastore, publisher and dashboard in production (stable operations). ALL sites are encouraged to enable auto-updates for perfSONAR. Pilots: ATLAS Panda, perfSONAR stream now in ATLAS Network Analytics. LHCb DIRAC bridge is now functional.<br />
* RFC proxies: NTR<br />
* Squid monitoring and proxy discovery: NTR.<br />
* AOB: Andrew McNab will take over the coordination of the Machine/Job Features task force.<br />
<br />
<br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 24th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-18 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network. On Wednesday morning (18th) the link to the Atlas building (R26) was successfully moved. (There was a site 'warning' in the GOC DB for this.)<br />
* A week or so ago we saw some Atlas Hammercloud tests failing (loss of heartbeat) - although the problem seems to have gone away now. This is not understood yet.<br />
* We have found a problem on some disk servers of one particular batch that have been updated to SL6. The servers can run slowly and individual commands hang (until a timeout) while making name look-ups. So far a total of three servers have been affected spread over a couple of weeks. We can easily fix the problem but do not yet know why it occurs.<br />
* The tenders for the next round of Disk and CPU purchases are now out.<br />
* All of the Tier1's Castor tape servers are now running Castor version 2.1.15. (The rest of Castor is still at 2.1.14 - with the aim of upgrading early-ish in 2016).<br />
* A week ago we saw some saturation of the 'bypass' link that transfers data to non-Tier1 sites. This is currently a 10Gbit link, which was very busy for about three days. The amount of traffic using the OPN link has also been rising.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24 Nov'''<br />
* Cloud Init version of (old) GridPP DIRAC VMs and support in Vac 0.20pre<br />
* Progress on VMs for new GridPP DIRAC service: some multi-VO config currently preventing matching? Asking for developer help. (GRIDPP-9)<br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th November'''<br />
* Slight delay for Sheffield.<br />
<br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task force led by Helge Meinhard; see [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 24th November'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, resolved. Approved VOs document updated with newest records for those VOs affected, CDF, DZERO, LSST. Also, note changes to CA_DN for PLANCK and CDF. https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs<br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
Steve J: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 23rd November 2015, 15.00 GMT'''<br /><br />
There were just 18 Open UK Tickets - now we're up to 37...<br />
<br />
'''ATLASPRODDISK Deletion''' (20/11)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117740 117740] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117739 117739] (Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117737 117737] (Birmingham)<br />
<br />
Brian (as an agent of atlas) has asked sites to delete the data from proddisk and remove the tokens. Chris has sorted it already for RALPP and the ticket has mutated to a generic atlas cleanout. No news on the other two.<br />
<br />
'''PILOT LIGHT AT THE END OF THE TUNNEL'''<br /><br />
In another round of Pilot wrangling by Daniela, sites have been asked to implement pilots for the Pheno VO (or, where they are enabled, find out why they're not working!):<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117706 117706] (GLASGOW)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117710 117710] (Brunel)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117711 117711] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117723 117723] (QMUL)<br />
<br />
For completeness the older pilot tickets:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (Sheffield)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (TIER 1)<br />
The Tier 1 is at the "debugging" stage, no news from Sheffield though.<br />
<br />
'''HOT OFF THE TICKET PRESS'''<br /><br />
Atlas have ticketed everyone and their grandmother asking to implement the automated storage consistency check stuff. Which I think is a little unfair as I'm unconvinced of the maturity of the tools after spending a day with them last week.<br />
<br />
I've updated Lancaster's ticket to see if we've set this up right during our mucking about last week: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=117883 117883]<br /><br />
''Update''- Atlas have replied to say things are looking good at Lancaster so we just need a few more tweaks (such as using Alessandra's updated and polished version of the [https://gitlab.cern.ch/atlas-adc-ddm-support/dark_data/blob/master/dpm_dump.py dpm_dump.py] script and creating dumps that are only a day old) and we're done.<br />
<br />
'''EVEN HOTTER OFF THE PRESS'''<br /><br />
Brunel have been asked by CMS to upgrade their DPM to 1.8.10 in [https://ggus.eu/?mode=ticket_info&ticket_id=117922 117922] - however Raul reports that things aren't working all that well, suggesting CMS need to have a natter with the DPM devs.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was kind of a double whammy as both sites went down together. Fortunately the Oxford site was partially working so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days but we started it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. It started on 5th June, with one of the IP addresses behind the webserver not responding. The problem went away in approximately 3 hours. The same problem started again on 6th June. Finally it was fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
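<br />
The failover behaviour that was expected here can be illustrated with a minimal sketch. It is not how the SAM/Nagios tooling selects a broker; it only shows the idea of trying an ordered broker list until one accepts a TCP connection. The two URLs are those quoted above, while the function name <code>first_reachable</code>, the <code>connect</code> parameter and the 5 second timeout are assumptions made for the example.<br />

```python
import socket
from urllib.parse import urlsplit

# Broker endpoints quoted in the note above, in preference order.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def first_reachable(brokers, connect=socket.create_connection, timeout=5.0):
    """Return the first broker URL that accepts a TCP connection, else None.

    `connect` is injectable (hypothetical parameter) so the logic can be
    exercised without touching the network.
    """
    for url in brokers:
        parts = urlsplit(url)
        try:
            conn = connect((parts.hostname, parts.port), timeout)
        except OSError:
            # Broker down or name not resolvable: fall through to the next one.
            continue
        conn.close()
        return url
    return None
```

A stale cached broker list defeats this kind of check, which would be consistent with the suspected BDII caching: the client never re-evaluates which endpoint to use.<br />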
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
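<br />
The percentages quoted in this entry appear to be the YES count as a share of all 19 sites. A short sketch of that arithmetic follows, assuming rounding to the nearest whole percent; the helper name <code>share_yes</code> is made up for the example, the IPv6 allocation NO figure is obtained by counting the eleven sites listed, and the DIRAC NO figure adds the two groups of four.<br />

```python
def share_yes(yes, no):
    """Percentage of sites answering YES, rounded to the nearest integer."""
    return round(100 * yes / (yes + no))


# (YES, NO) site counts taken from the lists in this entry.
counts = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 4 + 4),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

for name, (yes, no) in counts.items():
    print(f"{name}: {share_yes(yes, no)}% of {yes + no} sites")
```

Each row sums to 19 sites, and the computed values match the 63%, 26%, 58%, 42% and 21% given above.<br />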
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to being finalised. Simulation is expected to be broadly similar to MC14, and no blockers are expected. <br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue a lost files declaration for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/list Group space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months, the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-23T16:10:17Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 23rd November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it at least to version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 24th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-18 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network. On Wednesday morning (18th) the link to the Atlas building (R26) was successfully moved. (There was a site 'warning' in the GOC DB for this.)<br />
* A week or so ago we saw some Atlas Hammercloud tests failing (loss of heartbeat) - although the problem seems to have gone away now. This is not understood yet.<br />
* We have found a problem on some disk servers of one particular batch that have been updated to SL6. The servers can run slowly and individual commands hang (until a timeout) while making name look-ups. So far a total of three servers have been affected spread over a couple of weeks. We can easily fix the problem but do not yet know why it occurs.<br />
* The tenders for the next round of Disk and CPU purchases are now out.<br />
* All of the Tier1's Castor tape servers are now running Castor version 2.1.15. (The rest of Castor is still at 2.1.14 - with the aim of upgrading early-ish in 2016).<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There are also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13. <br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVEs with max CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 23rd November 2015, 15.00 GMT'''<br /><br />
There were just 18 Open UK Tickets - now we're up to 37...<br />
<br />
'''ATLASPRODDISK Deletion''' (20/11)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117740 117740] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117739 117739] (Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117737 117737] (Birmingham)<br />
<br />
Brian (as an agent of atlas) has asked sites to delete the data from proddisk and remove the tokens. Chris has sorted it already for RALPP and the ticket has mutated to a generic atlas cleanout. No news on the other two.<br />
<br />
'''PILOT LIGHT AT THE END OF THE TUNNEL'''<br /><br />
In another round of Pilot wrangling by Daniela sites have been asked to implement pilots for the Pheno VO (or where they are enabled find out why they're not working!):<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117706 117706] (GLASGOW)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117710 117710] (Brunel)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117711 117711] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117723 117723] (QMUL)<br />
<br />
For completeness the older pilot tickets:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (Sheffield)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (TIER 1)<br />
The Tier 1 is at the "debugging" stage, no news from Sheffield though.<br />
<br />
'''HOT OFF THE TICKET PRESS'''<br /><br />
Atlas have ticketed everyone and their grandmother asking to implement the automated storage consistency check stuff. Which I think is a little unfair as I'm unconvinced of the maturity of the tools after spending a day with them last week.<br />
<br />
I've updated Lancaster's ticket to see if we've set this up right during our mucking about last week: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=117883 117883]<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
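<br />
The failover behaviour suspected above can be sketched as follows. This is only an illustration, not the monitoring code itself: the broker URLs are the two quoted in the note, while the function names, the ordering of the list and the timeout are assumptions made for the example.<br />

```python
import socket
from urllib.parse import urlparse

# Broker URLs as quoted in the note above; the ordering and timeout are assumptions.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_broker(urls, is_up=tcp_reachable):
    """Return the first broker URL whose endpoint accepts connections, else None."""
    for url in urls:
        parsed = urlparse(url)
        if is_up(parsed.hostname, parsed.port):
            return url
    return None
```

A loop like this only helps if the list it walks is current: if a client keeps using a cached list that still points at the failed broker, the alternate is never tried, which is consistent with the delayed failover described.<br />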
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
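<br />
The percentages quoted in the lists above follow from the YES and NO site counts. A minimal sketch of the arithmetic, assuming each percentage is YES / (YES + NO) rounded to a whole number; the counts are taken from the lists, with the IPv6 allocation NO count (11) obtained by counting the sites named, and the function and dictionary names are illustrative only:<br />

```python
def percent_yes(yes: int, no: int) -> int:
    """Return the share of YES sites as a whole-number percentage."""
    return round(100 * yes / (yes + no))


# (YES, NO) site counts copied from the lists above.
surveys = {
    "Multicore queues": (12, 7),
    "Cloud/VM": (5, 14),
    "GridPP DIRAC jobs": (11, 8),  # NO is listed as 4 + 4
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

for name, (yes, no) in surveys.items():
    print(f"{name}: {percent_yes(yes, no)}%")
```

Each category totals 19 sites, so the figures are directly comparable.<br />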
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-23T15:02:23Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 23rd November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 24th November'''<br />
* Some ATLAS sites (as of Monday) still had PRODDISK listed in the BDII.<br />
* ATLAS DPM 'dump' - a list of files?<br />
* An agenda for December's GDB is developing - [https://indico.cern.ch/event/319754/ see here]. Tier-2 representation has waned recently. Perhaps someone attending the DPM meeting could attend the GDB too?<br />
* [https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_BDII_Job_Count_Breakdown Andrew's BDII patch] used in relation to LSST job breakdown issues.<br />
* [https://www.gridpp.ac.uk/wiki/User_Interface_%28UI%29_to_support_approved_VOs Installing a UI on SL6].<br />
<br />
'''Tuesday 17th November'''<br />
* The [https://vm36.tier2.hep.manchester.ac.uk/users/case-studies/galdyn/ GalDyn case study] has been written up<br />
* There is now a job adverts section on the new GridPP website. Now we need a process to populate the area....<br />
* There is a summary available of the [https://indico.cern.ch/event/401680 LHCOPN-LHCONE meeting in Amsterdam on the 28-29 of October 2015].<br />
* Ewan: Pilot accounts for everyone<br />
* Welcome to Markus Ebert who has started at Edinburgh (designated GridPP/LSST DAC person).<br />
<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 17th November'''<br />
* There is a [https://indico.cern.ch/event/393620/ WLCG ops coordination meeting this week] on Thursday.<br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it at least to version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 24th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting are [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-18 here].<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network. On Wednesday morning (18th) the link to the Atlas building (R26) was successfully moved. (There was a site 'warning' in the GOC DB for this.)<br />
* A week or so ago we saw some ATLAS HammerCloud tests failing (loss of heartbeat) - although the problem seems to have gone away now. This is not understood yet.<br />
* We have found a problem on some disk servers of one particular batch that have been updated to SL6. The servers can run slowly and individual commands hang (until a timeout) while making name look-ups. So far a total of three servers have been affected spread over a couple of weeks. We can easily fix the problem but do not yet know why it occurs.<br />
* The tenders for the next round of Disk and CPU purchases are now out.<br />
* All of the Tier1's Castor tape servers are now running Castor version 2.1.15. (The rest of Castor is still at 2.1.14 - with the aim of upgrading early-ish in 2016).<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 17 Nov'''<br />
* New depo.gridpp.ac.uk service for uploading files to via HTTPS<br />
* ATLAS VMs now upload log files to depo.gridpp.ac.uk for debugging (GRIDPP-24)<br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
Steve J: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: Problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 23rd November'''<br />
* Still seeing the alarms for systems at QMUL and RHUL that are not in production.<br />
* Closed some tickets for the rolling availabilities. Confused by the rolling availability plots. It seems the "rolling average" period was cut back to 20 days. <br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 24th November'''<br />
* Call on NGIs to participate in "Security Threat Risk Assessment - with Cloud Focus" work.<br />
* Check Pakiti for CVE-2015-7183 issues.<br />
<br />
'''Tuesday 17th November'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183] issued 06/11/2015: a few UK sites show as unpatched in [https://operations-portal.egi.eu/csiDashboard/ngi/NGI_UK/tab/sites/filter/operators/page/sites EGI monitoring]. WNs, as tested by the monitoring, may be less vulnerable than affected middleware services but they could be taken as an indication of general site readiness and sites are encouraged to check their status. <br />
<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13.<br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVE's with max CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for the replication test so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/list Group space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-11-23T15:02:16Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 16th November 2015, 15.45 GMT'''<br /><br />
Only 17 Open UK tickets this week.<br />
<br />
'''LSST FRIDAY 13th TEETHING TROUBLES'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117585 117585] (Liverpool)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117586 117586] (Oxford)<br /><br />
Daniela has been testing out the LSST Dirac pilots - Ewan's fighting the good fight at Oxford getting his ARC to work, maybe Liverpool's problem has a similar root cause (we should be so lucky)?<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
In a similar vein (and thanks again to Daniela for the effort with preparing our pilots) is this ticket about getting "other" pilots enabled at the Tier 1 - particularly with a view to these tests: http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg Any news? On hold (6/11)<br />
<br />
'''ECDF(-RDF)'''<br /><br />
I think this new atlas ticket:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117606 117606] (15/11)<br /><br />
is a duplicate of this current atlas ticket:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (8/11)<br /><br />
So I suspect it can be closed!<br />
<br />
I see from [https://ggus.eu/?mode=ticket_info&ticket_id=117642 117642] that the other [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 VO nagios] has recently been rebooted, which might explain the many failures I see. I'll check again before the meeting.<br />
<br />
<br />
'''Tuesday 10th November 2015, 10.20 GMT'''<br /><br />
Down to 13 Open tickets this week!<br />
<br />
All the UK tickets can be found [http://tinyurl.com/nwgrnys '''here'''].<br />
<br />
Not very exciting - the ECDF-RDF ticket [https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (atlas complaining about stale file handle error messages - but it's for the "RDF" so not urgent?) might have slipped past Andy's sentries yesterday, and the two remaining pilot role tickets for Sheffield and the Tier 1 ([https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] and [https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866]) could really do with updates - with the latter Daniela notes that the gridpp pilots roles are needed for testing purposes.<br />
<br />
Anything I've missed?<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - no persistent trouble anywhere (that matters to us) - a few ARC CE submit troubles over the last hour or so with some RALPP CEs but I don't think that's anything to worry about.<br />
<br />
'''Monday 2nd November 2015, 13.30 GMT'''<br /><br />
22 Open UK Tickets this week. First Monday of the Month, so all the tickets get looked at, however run of the mill they are.<br />
<br />
First, the link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Low availability Ops ticket. Put on hold whilst the numbers soothe themselves. On Hold (23/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
Sno+ job submission failures. Not much on this ticket since it was set In Progress. Looks like an argus problem. How goes things at Sussex before Matt RB moves on? (We'll miss you Matt!). In progress (20/10)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117261 117261] (28/10)<br /><br />
Atlas jobs failing with stage out failures. Federico notices that the failures are due to odd errors - "file already existing" - and that things seem to be calming themselves. He's at a loss as to what RALPP can do. Checking the panda link suggests the errors are still there today. Waiting for reply (29/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
Bristol's CMS glexec ticket. It looks like the solution is to have more cms pool accounts (which of course requires time to deploy). In progress (28/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117303 117303] (30/10)<br /><br />
CMS, not Highlander fans, don't seem to believe that There can be only One (glexec ticket). Poor old Bristol seem to be playing whack-a-mole with duplicate tickets. Is there a note that can be left somewhere to stop this happening? Assigned (30/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303] (Long long ago)<br /><br />
Edinburgh's (and indeed Scotgrid's) only ticket is this tarball glexec ticket. A bit more on this later. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp (and other) VO pilot roles at Sheffield. No news for a while, snoplus are trying to use pilot roles now for dirac so this is becoming very relevant. In progress (9/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs failing, likely due to too many being submitted to the 10 slots that Sno+ has. Maybe a WMS scheduling problem - Stephen B has given advice. Elena asked if the problem persisted a few weeks ago. Waiting for reply (12/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116967 116967] (17/10)<br /><br />
A ROD availability ticket, on hold as per SOP. On hold (20/10)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket. Autumn was not kind to many of us! On hold (8/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116882 116882] (13/10)<br /><br />
Enabling pilot snoplus users at Lancaster. Shouldn't have been a problem, but turned into a bit of a comedy/tragedy of errors by yours truly mucking up. Hopefully fixed now - thanks to Daniela for her patience. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299] (Far far away)<br /><br />
glexec tarball ticket. There's been a lot of communication with the glexec devs about this - the hopefully last hurdle is sorting out the RPATHs for the libraries. It's not a small hurdle though... On hold (2/11)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
A ticket about jumbo frame problems, submitted to QM. After Dan provided some education the user replied that he only sees this problem at two atlas sites, but he is contacting the network admins at his institution to see if it is their end. On hold (29/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 117011] (19/10)<br /><br />
ROD ticket for glue-validate errors. Went away for a while after Dan re-yaimed his site bdii, but possibly back again. Daniela suggests re-running the glue-validate test. Reopened (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116689 116689] (6/10)<br /><br />
Another ROD ticket, where Ops glexec test jobs are seemingly timing out for QM (this is the ticket Daniela mentioned on the ops mailing list). Dan noted that with the cluster half full tests were passing, suggesting some kind of load correlation (but as he also notes - what's getting loaded and causing the problem - Batch, CE or WNs?). Kashif reckons the argus server, and suggests a handy glexec time test which he posted. In progress (2/11)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117324 117324] (2/11)<br /><br />
A fresh looking ROD ticket - Raul had to restart the BDII and hopefully that got it. In progress (2/11)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Missing Image at 100IT. 100IT have asked for more details, no news since. Waiting for reply (19/10)<br />
<br />
'''THE TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Snoplus pilot enablement (not actually a word) at the Tier 1. New accounts were being requested after some internal discussion. On hold (19/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS AAA tests failing (the submitter notes "again..."). There are some oddities with other sites, which might be remote problems, but Andrew notes that previous manual fixes have been overwritten which likely explains why problems came back. In progress (does it need to be waiting for a reply?) (26/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117171 117171] (24/10)<br /><br />
LHCB had problems with an arc CE that was misbehaving for everyone. Things were fixed, and this ticket can now be closed. Waiting for reply (can be closed) (27/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117277 117277] (30/10)<br /><br />
Atlas have spotted "bring online timeout has been exceeded" errors. This appears to be a mixture of problems adding up, such as a number of broken disk nodes and heavy write access by atlas. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117248 117248] (28/10)<br /><br />
Related, I believe, to the discussion on tb-support, this ticket asks for new SRM host certs that meet the specified requirements to be requested for the RAL SRMs. Jens was on it, and the new certs are ready to be deployed. In progress (30/10)<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - some badness at Sussex, but they have a ticket open for that.<br />
<br />
'''Monday 26th October 2015, 15.30 GMT'''<br /><br />
26 Open UK Tickets this week. Not many seem all that exciting though.<br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
The few (two) tickets that really caught my eye are:<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
This ticket is quite interesting, mainly for Dan schooling the submitter. QM received a ticket complaining that their jumbo frames were breaking stuff - it doesn't look like the problem is at QM though. Naught wrong with the ticket handling. Waiting for reply (23/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065] (20/10)<br /><br />
Bristol have a CMS glexec error ticket that looks very similar to an existing one ([https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683]), which is in turn spookily similar to a ticket being worked on by the Bristol admins ([https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775]). At the very least I would say that if the two problems are different one would likely obfuscate the other. Is this a case of over-keen shifters submitting tickets without checking? I'd be tempted to close one or both of [https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] and this one ([https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065]). Tell them I said it was okay to[1]. In progress (21/10)<br />
<br />
[1] It probably is okay to.<br />
<br />
There are also 4 availability tickets, all On Hold waiting for 30 days or so to pass.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting this sort of ticket? Assigned (15/10) ''Update - Alessandra has updated and On-Holded the ticket. Thanks!''<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue), I think this one has snuck past the Liver-Lads' watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket. Dan looks like he's tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed, Gareth confirms that the test is green. Waiting for reply (19/10)<br />
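
For context, sshd's MaxStartups can take the form start:rate:full: once "start" unauthenticated connections are pending, new ones are refused with a probability of "rate" percent, rising linearly to 100 percent at "full". A rough Python sketch of that behaviour is below; the function name and the 10:30:100 values are illustrative assumptions, not the settings from the ticket.

```python
def refusal_probability(pending, start, rate, full):
    """Approximate percentage chance that sshd refuses a new connection.

    pending: number of unauthenticated connections currently open.
    start, rate, full: the three fields of MaxStartups start:rate:full.
    """
    if pending < start:
        return 0.0
    if pending >= full:
        return 100.0
    # Linear ramp from `rate` percent at `start` up to 100 percent at `full`.
    return rate + (100.0 - rate) * (pending - start) / (full - start)


# Example values only (assumed, not taken from the ticket).
for pending in (5, 10, 55, 100):
    print(pending, refusal_probability(pending, 10, 30, 100))
```

A host that receives bursts of simultaneous ssh connections can reach the "start" threshold quickly, at which point some connection attempts are dropped at random.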
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of pool accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing an "export CMTEXTRATAGS="host-slc5"" line. Naught wrong with the ticket, but I liked how lhcb did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asks if the problem persists, the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652] (1/10)<br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651] (1/10)<br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (Long, long ago)<br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576] (1/10)<br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585] (1/10)<br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082] (7/9)<br />
A classic Rod Availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL - and asks if this is the cause of the problem?) Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting. Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
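
To make that distinction concrete, here is a minimal Python sketch. The error records and file paths are invented for illustration and do not come from the ticket. It contrasts the raw count of failed deletion attempts with the number of distinct files behind them.

```python
# Hypothetical deletion-error log: one (file path, error message) pair per
# failed attempt. Retries mean the same file can appear many times.
errors = [
    ("/atlas/datadisk/file_a", "timeout"),
    ("/atlas/datadisk/file_a", "timeout"),
    ("/atlas/datadisk/file_a", "timeout"),
    ("/atlas/datadisk/file_b", "timeout"),
]

# Total number of failed transactions, retries included.
total_errors = len(errors)

# Number of distinct files that actually failed to delete.
unique_files = len({path for path, _ in errors})

print("failed attempts:", total_errors)
print("affected files:", unique_files)
```

A handful of files retried many times can inflate the first number while the second stays small, which is the point being made about how these tickets are framed.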
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at- definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (31/7)<br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AAA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM now, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]). Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seemed to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br /><br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S opened this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to note is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or, more precisely, having submission hosts at those sites not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
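<br />
For anyone wanting to sort a list like this by site, here is a minimal Python sketch. It assumes you already have (submission host, core count) pairs from somewhere; that record format and the <code>example-ce.example.ac.uk</code> entry are made up for illustration and say nothing about how APEL actually stores or exposes its records.<br />

```python
from collections import defaultdict

# Hypothetical input: (submission host, core count) pairs.
# The first three hosts come from the list above; the last is invented.
records = [
    ("ce3.dur.scotgrid.ac.uk", 0),
    ("ce4.dur.scotgrid.ac.uk", 0),
    ("t2ce06.physics.ox.ac.uk", 0),
    ("example-ce.example.ac.uk", 8),
]


def zero_core_hosts_by_domain(records):
    """Group hosts reporting 0-core jobs by domain (text after the first dot)."""
    grouped = defaultdict(set)
    for host, cores in records:
        if cores == 0:
            # partition() never raises, even for a host with no dot in it
            _, _, domain = host.partition(".")
            grouped[domain or host].add(host)
    return {domain: sorted(hosts) for domain, hosts in grouped.items()}


if __name__ == "__main__":
    for domain, hosts in sorted(zero_core_hosts_by_domain(records).items()):
        print(domain, "->", ", ".join(hosts))
```

Grouping by domain is only a rough stand-in for "site", but it is enough to see at a glance who still has rogue hosts to chase.<br />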
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed AFAICS. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
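<br />
As a quick way of checking for this sort of clash, here is a minimal Python sketch. It assumes you can pull together a mapping of each CE to the SubmitHost value it publishes with; the CE names and the mapping itself are hypothetical, and nothing here reads real APEL or ARC configuration.<br />

```python
from collections import defaultdict


def find_shared_submit_hosts(ce_to_submit_host):
    """Return {submit_host: [CEs]} for every SubmitHost used by more than one CE."""
    by_submit_host = defaultdict(list)
    for ce, submit_host in ce_to_submit_host.items():
        by_submit_host[submit_host].append(ce)
    return {
        submit_host: sorted(ces)
        for submit_host, ces in by_submit_host.items()
        if len(ces) > 1
    }


if __name__ == "__main__":
    # Hypothetical site layout: two CEs accidentally share one SubmitHost.
    example = {
        "ce-a.example.org": "ce-a.example.org",
        "ce-b.example.org": "ce-a.example.org",
        "ce-c.example.org": "ce-c.example.org",
    }
    for submit_host, ces in find_shared_submit_hosts(example).items():
        print(submit_host, "is shared by", ", ".join(ces))
```

Any SubmitHost that comes back with more than one CE against it is a candidate for the overwriting problem described in the ticket.<br />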
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to being due to not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. Perhaps a candidate for On Holding, depending on the time frame of a workaround. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions fell on deaf ears for a while, it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas gLexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) was done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that wasn't suffering an understood problem was RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAIKS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me, all wisdom donations are thankfully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE if this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline. Does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCb job failure tickets. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is building a replacement xrootd box which is currently being prepared. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for httpd access at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there is still some underlying problem here with intermittent test failures. Also this ticket raises the question of in what context these tests are being conducted. Anyone know, or shall we ask the submitter? In progress (26/1)<br />
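As an aside, the date mix-up above is easy to reproduce. A minimal sketch, assuming the expiry date arrives as the bare string "5/1/15" (the variable names are purely illustrative and not taken from the ticket or the test code), showing how the same string parses under the two conventions:<br />

```python
from datetime import datetime

# Expiry date exactly as quoted in the ticket (assumed to be a bare string).
raw = "5/1/15"

# Day-first reading (UK convention): 5th of January 2015.
day_first = datetime.strptime(raw, "%d/%m/%y").date()

# Month-first reading (US convention): 1st of May 2015.
month_first = datetime.strptime(raw, "%m/%d/%y").date()

print(day_first.isoformat())    # 2015-01-05
print(month_first.isoformat())  # 2015-05-01
```

Quoting dates in ISO 8601 form (2015-05-01) in tickets would sidestep the ambiguity.<br />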
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, looking to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a CREAM-side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configurations and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-16T16:42:08Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 16th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** ----->
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
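To illustrate the ATLAS brokering change noted above, here is a rough sketch of the difference between matching a job against maxrss alone and against maxrss+maxswap. The function name, parameters and figures are hypothetical and only meant to show the arithmetic; this is not the actual PanDA brokering code.<br />

```python
def fits_queue(job_mem_mb, maxrss_mb, maxswap_mb=0, include_swap=False):
    """Return True if a job's memory request fits within a queue's limit.

    Hypothetical illustration only: with include_swap=True the limit is
    maxrss + maxswap (the old 'maxmemory' style comparison); with
    include_swap=False it is maxrss alone (the new comparison).
    """
    limit = maxrss_mb + (maxswap_mb if include_swap else 0)
    return job_mem_mb <= limit


# Made-up example: a queue advertising 2000 MB RSS plus 1000 MB swap,
# and a job asking for 2500 MB.
old_match = fits_queue(2500, maxrss_mb=2000, maxswap_mb=1000, include_swap=True)
new_match = fits_queue(2500, maxrss_mb=2000, maxswap_mb=1000, include_swap=False)

print(old_match)  # True: 2500 <= 3000
print(new_match)  # False: 2500 > 2000
```

Under these assumptions a job that previously matched only thanks to the swap allowance would no longer be brokered to that queue, so it is worth checking that the advertised maxrss value is sensible.<br />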
<br />
'''Tuesday 3rd November'''<br />
* There is a GDB this week: [https://indico.cern.ch/event/319753/ Agenda].<br />
* (re)introduction of the STRICT_RFC2818 mechanism in Globus.... See Jens's comments on TB-SUPPORT.<br />
* Re-allocation of space for ATLAS as a result of cleanup and removal of the ATLASPRODDISK space token.<br />
* Time out of Nagios glexec probe.<br />
* The GridPP hardware allocations are just about final so expect figures very soon. Purchases are this financial year.<br />
* DPM workshop 2015 7th-8th Dec at CERN - [https://indico.cern.ch/event/432642/ Registration is open].<br />
* The [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/ October WLCG A/R] figures are now available<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ALICE_Oct2015.pdf ALICE]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ATLAS_Oct2015.pdf ATLAS]. <br />
*** QMUL: 85%:85%<br />
*** Lancaster: N/a:N/a?<br />
*** Liverpool: 85%:100%<br />
*** Sheffield: 78%:78%<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_CMS_Oct2015.pdf CMS]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_LHCB_Oct2015.pdf LHCb] <br />
*** QMUL: 79%:79%<br />
*** Liverpool: 86%:100%<br />
*** Sheffield: 86%:86%<br />
*** RAL PPD: 79%:79%<br />
* Exploring options for VM based sites (in respect of the monitoring within EGI): Perhaps setup a 'community platform'.<br />
* RCUK Cloud Working Group - a first [http://bit.ly/cloudwgdec15 workshop on the 1st December] at Imperial College.<br />
* From the MB last week: <br />
** Memory Requirements: LHC experiments all basically agreed that 2GB/core was the baseline but that some (advertised) resources with up to 4GB/core would be valuable for some workflows.<br />
** February as kick-off for technical evolution groups.<br />
** PCP - Pre-commercial-procurement and HNSciCloud. This is approved and starts January-16. UK has small involvement. The plan is to build on the hybrid cloud service that results, in order to deploy a European Open Science Cloud funded from the INFRADEV-04 (2016) call <br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it at least to version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 10th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-04 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify <br />
credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB-SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13.<br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVEs with the maximum CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 16th November 2015, 15.45 GMT'''<br /><br />
Only 17 Open UK tickets this week.<br />
<br />
'''LSST FRIDAY 13th TEETHING TROUBLES'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117585 117585] (Liverpool)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117586 117586] (Oxford)<br /><br />
Daniela has been testing out the LSST Dirac pilots - Ewan's fighting the good fight at Oxford getting his ARC to work, maybe Liverpool's problem has a similar root cause (we should be so lucky)?<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
In a similar vein (and thanks again to Daniela for the effort with preparing our pilots) is this ticket about getting "other" pilots enabled at the Tier 1 - particularly with a view of these tests: http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg Any news? On hold (6/11)<br />
<br />
'''ECDF(-RDF)'''<br /><br />
I think this new atlas ticket:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117606 117606] (15/11)<br /><br />
is a duplicate of this current atlas ticket:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (8/11)<br /><br />
So I suspect it can be closed!<br />
<br />
I see from [https://ggus.eu/?mode=ticket_info&ticket_id=117642 117642] that the other [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 VO nagios] has recently been rebooted, which might explain the many failures I see. I'll check again before the meeting.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it is running on a VM. VO-Nagios is using the Oxford SE for the replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because of the non-availability of the EGI repository. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
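<br />
The failover problem described above can be illustrated with a minimal sketch, assuming the client probes each broker directly instead of relying on a cached list. This is not how the SAM/Nagios tooling selects its broker; the names <code>tcp_reachable</code> and <code>pick_broker</code> are hypothetical, and only the two broker URLs come from the entry above.<br />

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit
import socket

# Broker endpoints quoted in the bulletin entry above, in order of preference.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_broker(
    urls: Iterable[str],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first broker URL whose endpoint answers the probe, else None."""
    for url in urls:
        parts = urlsplit(url)
        # Skip malformed entries that lack a host or a port.
        if parts.hostname and parts.port and probe(parts.hostname, parts.port):
            return url
    return None
```

Calling <code>pick_broker(BROKERS)</code> performs a live TCP check against each endpoint in turn. The <code>probe</code> argument is injectable so that the selection logic can be exercised without network access.<br />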
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
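<br />
The percentages quoted above appear to be the YES count as a share of all 19 sites, rounded to the nearest whole number. A minimal sketch of that arithmetic, assuming the counts are simply the lengths of the YES and NO lists (the labels and the helper <code>percent_yes</code> are illustrative, not part of any GridPP tool):<br />

```python
# Site counts (YES, NO) transcribed from the lists above.
counts = {
    "Multicore queues": (12, 7),
    "Cloud/VM interface": (5, 14),
    "GridPP DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}


def percent_yes(yes: int, no: int) -> int:
    """Return the YES share of all sites as a whole-number percentage."""
    return round(100 * yes / (yes + no))


summary = {name: percent_yes(yes, no) for name, (yes, no) in counts.items()}

for name, pct in summary.items():
    print(f"{name}: {pct}%")
```

The DIRAC row treats both NO groups (4 + 4) as a single total of 8.<br />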
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved, and the grid is filled again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-16T13:40:12Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 16th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team is currently operating on a best-efforts basis - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
'''Tuesday 3rd November'''<br />
* There is a GDB this week: [https://indico.cern.ch/event/319753/ Agenda].<br />
* (re)introduction of the STRICT_RFC2818 mechanism in Globus.... See Jens's comments on TB-SUPPORT.<br />
* Re-allocation of space for ATLAS as a result of cleanup and removal of the ATLASPRODDISK space token.<br />
* Time out of Nagios glexec probe.<br />
* The GridPP hardware allocations are just about final so expect figures very soon. Purchases are this financial year.<br />
* DPM workshop 2015 7th-8th Dec at CERN - [https://indico.cern.ch/event/432642/ Registration is open].<br />
* The [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/ October WLCG A/R] figures are now available<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ALICE_Oct2015.pdf ALICE]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ATLAS_Oct2015.pdf ATLAS]. <br />
*** QMUL: 85%:85%<br />
*** Lancaster: N/a:N/a?<br />
*** Liverpool: 85%:100%<br />
*** Sheffield: 78%:78%<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_CMS_Oct2015.pdf CMS]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_LHCB_Oct2015.pdf LHCb] <br />
*** QMUL: 79%:79%<br />
*** Liverpool: 86%:100%<br />
*** Sheffield: 86%:86%<br />
*** RAL PPD: 79%:79%<br />
* Exploring options for VM-based sites (with respect to the monitoring within EGI): perhaps set up a 'community platform'.<br />
* RCUK Cloud Working Group - a first [http://bit.ly/cloudwgdec15 workshop on the 1st December] at Imperial College.<br />
* From the MB last week: <br />
** Memory Requirements: LHC experiments all basically agreed that 2GB/core was the baseline but that some (advertised) resources with up to 4GB/core would be valuable for some workflows.<br />
** February as kick-off for technical evolution groups.<br />
** PCP - Pre-commercial-procurement and HNSciCloud. This is approved and starts January-16. UK has small involvement. The plan is to build on the hybrid cloud service that results, in order to deploy a European Open Science Cloud funded from the INFRADEV-04 (2016) call <br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas][https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest SRM fix, which resolves a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 10th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-04 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB-SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 10th November'''<br />
<br />
Meeting yesterday was cancelled in favour of OMB; next meeting scheduled for 14/12.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMUL.<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There are also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2) and the page listing the deployed versions. The information is extracted from the BDII, so they should all be reasonably up to date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13.<br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVEs with the maximum CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status.<br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation.<br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to switch to another SE.<br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver stopped responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage.<br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
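The failover mentioned above can be illustrated with a minimal sketch. It assumes a client holds an ordered list of broker URLs (the two endpoints quoted above) and picks the first one that answers a probe at the time of use, rather than relying on a cached choice. The names `parse_broker`, `tcp_probe` and `first_reachable` are hypothetical and are not part of any EGI, BDII or Nagios tooling.

```python
import socket
from urllib.parse import urlparse

# Broker endpoints as quoted in the bulletin entry, in preference order.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]

def parse_broker(url):
    """Split a stomp:// URL into a (host, port) pair."""
    parts = urlparse(url)
    return parts.hostname, parts.port

def tcp_probe(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def first_reachable(urls, probe=tcp_probe):
    """Return the first URL whose broker answers the probe, or None."""
    for url in urls:
        host, port = parse_broker(url)
        if probe(host, port):
            return url
    return None
```

Passing a different `probe` callable makes the selection logic testable without touching the network; a TCP connect is only a rough liveness check and says nothing about whether the broker is accepting STOMP sessions.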
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)<br />
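The percentages quoted in the status lists above appear to be the YES count divided by the number of sites surveyed. A minimal sketch of that arithmetic follows, assuming a total of 19 sites (the YES and NO counts in each list sum to 19) and rounding to the nearest whole percent; the dictionary labels are just shorthand for the list headings.

```python
# Assumption: every category covers the same 19 sites (YES + NO = 19).
TOTAL_SITES = 19

# YES counts taken from the lists above.
yes_counts = {
    "Multicore queues": 12,
    "Cloud/VMs": 5,
    "GridPP DIRAC jobs": 11,
    "IPv6 allocation": 8,
    "Dual stack nodes": 4,
}

def percent(yes, total=TOTAL_SITES):
    """Share of sites answering YES, rounded to a whole percent."""
    return round(100 * yes / total)

for name, yes in yes_counts.items():
    print(f"{name}: {yes}/{TOTAL_SITES} = {percent(yes)}%")
```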
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/list Group space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-11-16T13:40:04Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Tuesday 10th November 2015, 10.20 GMT'''<br /><br />
Down to 13 Open tickets this week!<br />
<br />
All the UK tickets can be found [http://tinyurl.com/nwgrnys '''here'''].<br />
<br />
Not very exciting - the ECDF-RDF ticket [https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (atlas complaining about stale file handle error messages - but it's for the "RDF" so not urgent?) might have slipped past Andy's sentries yesterday, and the two remaining pilot role tickets for Sheffield and the Tier 1 ([https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] and [https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866]) could really do with updates - on the latter, Daniela notes that the gridpp pilot roles are needed for testing purposes.<br />
<br />
Anything I've missed?<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - no persistent trouble anywhere (that matters to us) - a few ARC CE submit troubles over the last hour or so with some RALPP CEs, but I don't think that's anything to worry about.<br />
<br />
'''Monday 2nd November 2015, 13.30 GMT'''<br /><br />
22 Open UK Tickets this week. First Monday of the Month, so all the tickets get looked at, however run of the mill they are.<br />
<br />
First, the link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Low availability Ops ticket. On hold whilst the numbers soothe themselves. On Hold (23/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
Sno+ job submission failures. Not much on this ticket since it was set In Progress. Looks like an argus problem. How goes things at Sussex before Matt RB moves on? (We'll miss you Matt!). In progress (20/10)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117261 117261] (28/10)<br /><br />
Atlas jobs failing with stage out failures. Federico notices that the failures are due to odd errors - "file already existing", and that things seem to be calming themselves. He's at a loss of what RALPP can do. Checking the panda link suggests the errors are still there today. Waiting for reply (29/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
Bristol's CMS glexec ticket. It looks like the solution is to have more cms pool accounts (which of course requires time to deploy). In progress (28/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117303 117303] (30/10)<br /><br />
CMS, not Highlander fans, don't seem to believe that There can be only One (glexec ticket). Poor old Bristol seem to be playing whack-a-mole with duplicate tickets. Is there a note that can be left somewhere to stop this happening? Assigned (30/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303] (Long long ago)<br /><br />
Edinburgh's (and indeed Scotgrid's) only ticket is this tarball glexec ticket. A bit more on this later. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp (and other) VO pilot roles at Sheffield. No news for a while, snoplus are trying to use pilot roles now for dirac so this is becoming very relevant. In progress (9/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs failing, likely due to too many being submitted to the 10 slots that Sno+ has. Maybe a WMS scheduling problem - Stephen B has given advice. Elena asked if the problem persisted a few weeks ago. Waiting for reply (12/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116967 116967] (17/10)<br /><br />
A ROD availability ticket, on hold as per SOP. On hold (20/10)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket. Autumn was not kind to many of us! On hold (8/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116882 116882] (13/10)<br /><br />
Enabling pilot snoplus users at Lancaster. Shouldn't have been a problem, but turned into a bit of a comedy/tragedy of errors by yours truly mucking up. Hopefully fixed now- thanks to Daniela for her patience. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299] (Far far away)<br /><br />
glexec tarball ticket. There's been a lot of communication with the glexec devs about this - the hopefully last hurdle is sorting out the RPATHs for the libraries. It's not a small hurdle though... On hold (2/11)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
A ticket about jumbo frame problems, submitted to QM. After Dan provided some education, the user replied that he only sees this problem at two atlas sites, but he is contacting the network admins at his institution to see if it is their end. On hold (29/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 117011] (19/10)<br /><br />
ROD ticket for glue-validate errors. Went away for a while after Dan re-yaimed his site bdii, but possibly back again. Daniela suggests re-running the glue-validate test. Reopened (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116689 116689] (6/10)<br /><br />
Another ROD ticket, where Ops glexec test jobs are seemingly timing out for QM (this is the ticket Daniela mentioned on the ops mailing list). Dan noted that with the cluster half full tests were passing, suggesting some kind of load correlation (but as he also notes - what's getting loaded and causing the problem - Batch, CE or WNs?). Kashif reckons the argus server, and suggests a handy glexec time test which he posted. In progress (2/11)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117324 117324] (2/11)<br /><br />
A fresh looking ROD ticket - Raul had to restart the BDII and hopefully that got it. In progress (2/11)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Missing Image at 100IT. 100IT have asked for more details, no news since. Waiting for reply (19/10)<br />
<br />
'''THE TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Snoplus pilot enablement (not actually a word) at the Tier 1. New accounts were being requested after some internal discussion. On hold (19/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS AAA tests failing (the submitter notes "again..."). There are some oddities with other sites, which might be remote problems, but Andrew notes that previous manual fixes have been overwritten which likely explains why problems came back. In progress (does it need to be waiting for a reply?) (26/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117171 117171] (24/10)<br /><br />
LHCB had problems with an arc CE that was misbehaving for everyone. Things were fixed, and this ticket can now be closed. Waiting for reply (can be closed) (27/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117277 117277] (30/10)<br /><br />
Atlas have spotted "bring online timeout has been exceeded" errors. This appears to be a mixture of problems adding up, such as a number of broken disk nodes and heavy write access by atlas. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117248 117248] (28/10)<br /><br />
I believe related to the discussion on tb-support, this ticket requests that new SRM host certs that meet the requirements specified be requested for the RAL SRMs. Jens was on it, and the new certs are ready to be deployed. In progress (30/10)<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - some badness at Sussex, but they have a ticket open for that.<br />
<br />
'''Monday 26th October 2015, 15.30 GMT'''<br /><br />
26 Open UK Tickets this week. Not many seem all that exciting though.<br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
The few (two) tickets that really caught my eye are:<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
This ticket is quite interesting, mainly for Dan schooling the submitter. QM received a ticket complaining that their jumbo frames were breaking stuff - it doesn't look like the problem is at QM though. Naught wrong with the ticket handling. Waiting for reply (23/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065] (20/10)<br /><br />
Bristol have a CMS glexec error ticket that looks very similar to an existing one ([https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683]), which is in turn spookily similar to a ticket being worked on by the Bristol admins ([https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775]). At the very least I would say that if the two problems are different one would likely obfuscate the other. Is this a case of over-keen shifters submitting tickets without checking? I'd be tempted to close one or both of [https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] and this one ([https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065]). Tell them I said it was okay to[1]. In progress (21/10)<br />
<br />
[1] It probably is okay to.<br />
<br />
There are also 4 availability tickets, all On Hold waiting for 30 days or so to pass.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting this sort of ticket? Assigned (15/10) ''Update - Alessandra has updated and On-Holded the ticket. Thanks!''<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue), I think this one has snuck past the Liver-Lad's watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket, Dan looks like he's tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed, Gareth confirms that the test is green. Waiting for reply (19/10)<br />
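
For anyone who hasn't met it, sshd's MaxStartups takes a start:rate:full triple and starts randomly refusing unauthenticated connections once there are too many in flight, which can look like a flaky CE from the outside. The sketch below is only an illustration of that documented behaviour: the function name is made up, and the 10:30:100 values are OpenSSH's documented defaults, not necessarily what QM were running.

```python
def drop_probability(pending, start=10, rate=30, full=100):
    """Percentage chance that sshd refuses a new connection.

    pending: number of unauthenticated connections currently open.
    start, rate, full: the three fields of MaxStartups (start:rate:full).
    The defaults here are the OpenSSH defaults, used as an assumption.
    """
    if pending < start:
        return 0.0      # below the threshold nothing is dropped
    if pending >= full:
        return 100.0    # at or above "full" everything is dropped
    # Between start and full the chance rises linearly from rate% to 100%.
    return rate + (100 - rate) * (pending - start) / (full - start)


if __name__ == "__main__":
    for pending in (5, 10, 55, 100):
        print(pending, drop_probability(pending))
```

If a burst of test or pilot connections pushes the pending count past the start value, some of them are refused at random, so raising the limits is the usual cure.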
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of pool accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing a "export CMTEXTRATAGS="host-slc5"" line. Naught wrong with the ticket, but I liked how lhcb did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asks if the problem persists, the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652](1/10)<br /><br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651](1/10)<br /><br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303](Long, long ago)<br /><br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576](1/10)<br /><br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560](30/9)<br /><br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460](18/6)<br /><br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585](1/10)<br /><br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082](7/9)<br /><br />
A classic ROD Availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br /><br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL- and asks if this is the cause of the problem?) Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting. Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
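
To illustrate why the two numbers can tell very different stories, here is a minimal sketch that counts both from a list of failed deletion attempts. The (file path, error message) record format, the function name and the sample paths are all made up for the example, and are not how the ATLAS deletion service actually reports things.

```python
from collections import Counter


def summarise_deletion_errors(error_log):
    """Summarise failed deletions.

    error_log: iterable of (file_path, error_message) tuples.
    This record layout is a hypothetical one chosen for illustration.
    """
    # Count how many failed attempts each file accounts for.
    attempts_per_file = Counter(path for path, _ in error_log)
    return {
        "total_errors": sum(attempts_per_file.values()),
        "unique_files": len(attempts_per_file),
    }


if __name__ == "__main__":
    # Invented sample: three files on a dead disk server, retried repeatedly.
    sample = (
        [("/dpm/example/atlas/file_a", "Connection refused")] * 400
        + [("/dpm/example/atlas/file_b", "Connection refused")] * 350
        + [("/dpm/example/atlas/file_c", "Connection refused")] * 250
    )
    print(summarise_deletion_errors(sample))
```

With retries in play a thousand errors can boil down to a handful of files on one broken disk server, which is a much less alarming headline.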
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at- definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399](31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is under way on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AAA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket, [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seems to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance, the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright, but Durham look like they need to update their CA rpms - although that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br /><br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if the problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S opened this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or more precisely have submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list, I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
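<br />
As a rough cross-check of the list above against the five sites named in the ticket, the hosts can be grouped by their domain. This is only an illustrative sketch: the <code>site_of</code> helper and its rule of keeping four DNS labels for scotgrid hosts and three otherwise are assumptions made for this example, not anything taken from APEL.<br />

```python
from collections import defaultdict

# Submission hosts reporting "0" core jobs, copied from the list above.
ZERO_CORE_HOSTS = [
    "ce3.dur.scotgrid.ac.uk",
    "ce4.dur.scotgrid.ac.uk",
    "cetest02.grid.hep.ph.ic.ac.uk",
    "hepgrid5.ph.liv.ac.uk",
    "hepgrid6.ph.liv.ac.uk",
    "hepgrid97.ph.liv.ac.uk",
    "svr009.gla.scotgrid.ac.uk",
    "t2ce06.physics.ox.ac.uk",
]


def site_of(hostname):
    """Guess a site domain from a hostname (assumed rule, for illustration only)."""
    labels = hostname.split(".")
    # scotgrid hosts carry the site as a sub-domain (dur, gla), so keep four labels.
    keep = 4 if "scotgrid" in labels else 3
    return ".".join(labels[-keep:])


def group_by_site(hosts):
    """Return a dict mapping each guessed site domain to its hosts."""
    grouped = defaultdict(list)
    for host in hosts:
        grouped[site_of(host)].append(host)
    return dict(grouped)


if __name__ == "__main__":
    for site, hosts in sorted(group_by_site(ZERO_CORE_HOSTS).items()):
        print(f"{site}: {', '.join(hosts)}")
```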
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed AFAICS. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. Attempts to replicate the issue have failed, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
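<br />
The failure mode John spotted can be sketched roughly as follows. This is a toy model and not APEL code: the record layout, the CE and host names, and the idea of keying records on a (submit host, month) pair are all assumptions made purely for illustration. It shows how two CEs sharing one submit host end up with a single surviving record, plus a simple check for the duplication.<br />

```python
from collections import Counter


def store_sync_records(records):
    """Keep one job count per (submit_host, month); a later record replaces an earlier one."""
    store = {}
    for rec in records:
        store[(rec["submit_host"], rec["month"])] = rec["jobs"]
    return store


def shared_submit_hosts(ce_to_submit_host):
    """Return the submit hosts that more than one CE is configured to use."""
    counts = Counter(ce_to_submit_host.values())
    return sorted(host for host, n in counts.items() if n > 1)


# Hypothetical site: both CEs publish under the same submit host name.
CE_CONFIG = {
    "ce-a.example.org": "shared-host.example.org",
    "ce-b.example.org": "shared-host.example.org",
}

RECORDS = [
    {"submit_host": CE_CONFIG["ce-a.example.org"], "month": "2015-05", "jobs": 100},
    {"submit_host": CE_CONFIG["ce-b.example.org"], "month": "2015-05", "jobs": 250},
]

if __name__ == "__main__":
    stored = store_sync_records(RECORDS)
    print("jobs run:", sum(r["jobs"] for r in RECORDS))
    print("jobs recorded:", sum(stored.values()))
    print("shared submit hosts:", shared_submit_hosts(CE_CONFIG))
```

In this toy setup the second CE's record replaces the first, so the stored total undercounts the jobs actually run, which is the same flavour of under-reporting the APEL team noticed.<br />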
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem and told the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depends on the time frame of a work around. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or being put On Hold.<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/4)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions falling on deaf ears for a while it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/4)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/4)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 112896] (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112550 112550] (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112495 112495] (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112350 112350] (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Does anyone know what other channel to pass David's query on to? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] - 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] - 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] - 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856] (19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011] (25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) has been done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking whether 100IT has ticketed apel support (I assume contacting them via other means would work too), as otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that wasn't suffering from an understood problem was RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
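<br />
Since the error above is a plain "Failed to connect to XXXX(IPv4):443", one quick thing to rule out is basic TCP reachability of the CE over IPv4 from a UI or test box. The following is only a rough sketch of such a check: the function name <code>can_connect_ipv4</code> is made up for illustration, it has nothing to do with how arcsub itself connects, and a successful TCP connect says nothing about whether the service behind the port is healthy.<br />

```python
import socket


def can_connect_ipv4(host, port=443, timeout=5.0):
    """Try a plain TCP connect to host:port over IPv4 only.

    Returns (True, "") on success, or (False, reason) on failure.
    This is a hypothetical helper for illustration, not part of any ARC tool.
    """
    try:
        # Restrict the lookup to IPv4 (AF_INET), matching the "(IPv4)" in the error.
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS lookup failed: {exc}"

    last_error = "no IPv4 addresses returned"
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return True, ""
        except OSError as exc:
            # Remember why this address failed and try the next one, if any.
            last_error = f"{sockaddr[0]}:{sockaddr[1]} -> {exc}"
        finally:
            sock.close()
    return False, last_error
```

Calling <code>can_connect_ipv4("ce.example.org")</code> (a placeholder hostname) from a few different networks would at least show whether the failures are a connectivity problem or something further up the stack.<br />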
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAICS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me, all wisdom donations are gratefully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
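<br />
On the "window sizes" point: the usual rule of thumb is that a single TCP stream can only fill a path if its window is at least the bandwidth-delay product (link speed multiplied by round-trip time). A rough sketch of that arithmetic follows. The function names and the figures in the comments (a 10 Gbit/s link, 10 ms RTT, a 64 KiB window) are purely illustrative assumptions, not measurements from the Lancaster tests.<br />

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_throughput_bits_per_s(window_bytes, rtt_s):
    """Upper bound on single-stream throughput for a given window and RTT."""
    return window_bytes * 8 / rtt_s


# Assumed example figures only: a 10 Gbit/s path with a 10 ms round trip
# needs roughly 12.5 MB in flight to be kept full.
example_window = bdp_bytes(10e9, 0.010)

# Conversely, a 64 KiB window over the same 10 ms path caps a single
# stream at roughly 52 Mbit/s, however fast the link is.
example_cap = max_throughput_bits_per_s(64 * 1024, 0.010)
```

If the window an iperf test actually gets is far below the bandwidth-delay product for the path, a low single-stream result is expected rather than mysterious.<br />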
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCB job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br /><br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is building a replacement xrootd box which is currently being prepared. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for httpd access to the storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there is still some underlying problem here with intermittent test failures. This ticket also raises the question of the context in which these tests are being conducted. Anyone know, or shall we ask the submitter? In progress (26/1)<br />
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, which look to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a cream side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configuration and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-10T10:42:23Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 9th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
'''Tuesday 3rd November'''<br />
* There is a GDB this week: [https://indico.cern.ch/event/319753/ Agenda].<br />
* (re)introduction of the STRICT_RFC2818 mechanism in Globus.... See Jens's comments on TB-SUPPORT.<br />
* Re-allocation of space for ATLAS as a result of cleanup and removal of the ATLASPRODDISK space token.<br />
* Time out of Nagios glexec probe.<br />
* The GridPP hardware allocations are just about final so expect figures very soon. Purchases are this financial year.<br />
* DPM workshop 2015 7th-8th Dec at CERN - [https://indico.cern.ch/event/432642/ Registration is open].<br />
* The [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/ October WLCG A/R] figures are now available<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ALICE_Oct2015.pdf ALICE]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ATLAS_Oct2015.pdf ATLAS]. <br />
*** QMUL: 85%:85%<br />
*** Lancaster: N/a:N/a?<br />
*** Liverpool: 85%:100%<br />
*** Sheffield: 78%:78%<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_CMS_Oct2015.pdf CMS]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_LHCB_Oct2015.pdf LHCb] <br />
*** QMUL: 79%:79%<br />
*** Liverpool: 86%:100%<br />
*** Sheffield: 86%:86%<br />
*** RAL PPD: 79%:79%<br />
* Exploring options for VM based sites (in respect of the monitoring within EGI): Perhaps setup a 'community platform'.<br />
* RCUK Cloud Working Group - a first [http://bit.ly/cloudwgdec15 workshop on the 1st December] at Imperial College.<br />
* From the MB last week: <br />
** Memory Requirements: LHC experiments all basically agreed that 2GB/core was the baseline but that some (advertised) resources with up to 4GB/core would be valuable for some workflows.<br />
** February as kick-off for technical evolution groups.<br />
** PCP - Pre-commercial-procurement and HNSciCloud. This is approved and starts January-16. UK has small involvement. The plan is to build on the hybrid cloud service that results, in order to deploy a European Open Science Cloud funded from the INFRADEV-04 (2016) call.<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it at least to version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 10th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-04 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB-SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2) and the page listing the deployed versions are extracted from the BDII, so they should all be reasonably up to date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13.<br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVEs with max CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 10th November 2015, 10.20 GMT'''<br /><br />
Down to 13 Open tickets this week!<br />
<br />
All the UK tickets can be found [http://tinyurl.com/nwgrnys '''here'''].<br />
<br />
Not very exciting - the ECDF-RDF ticket [https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (ATLAS complaining about stale file handle error messages - but it's for the "RDF" so not urgent?) might have slipped past Andy's sentries yesterday, and the two remaining pilot role tickets for Sheffield and the Tier 1 ([https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] and [https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866]) could really do with updates - on the latter, Daniela notes that the gridpp pilot roles are needed for testing purposes.<br />
<br />
Anything I've missed?<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - no persistent trouble anywhere (that matters to us) - a few ARC CE submit troubles over the last hour or so with some RALPP CEs but I don't think that's anything to worry about.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status.<br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct and this may be extended depending on the situation.<br />
VO-Nagios was also unavailable for two days, but we started it again yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE.<br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing Nagios tests because the EGI repository was unavailable; the Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage.<br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)<br />
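The percentages quoted in the lists above are consistent with a total of 19 sites. That total is not stated in the bulletin; it is inferred here from the YES and NO counts. A minimal sketch to reproduce the figures under that assumption:

```python
# YES counts copied from the lists above. TOTAL_SITES = 19 is an assumption
# inferred from the YES + NO counts; the bulletin does not state it directly.
TOTAL_SITES = 19

yes_counts = {
    "Multicore queues": 12,
    "Cloud/VM interface": 5,
    "GridPP DIRAC jobs": 11,
    "IPv6 allocation": 8,
    "Dual stack nodes": 4,
}


def percent(yes, total=TOTAL_SITES):
    """Return the share of sites answering YES, rounded to a whole percent."""
    return round(100 * yes / total)


percentages = {name: percent(n) for name, n in yes_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

If a site moves from NO to YES, updating the relevant entry in `yes_counts` is enough to refresh the figure.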
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-10T10:39:46Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 9th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. This led to a reminder that the CA team currently operates on a best-efforts basis - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
'''Tuesday 3rd November'''<br />
* There is a GDB this week: [https://indico.cern.ch/event/319753/ Agenda].<br />
* (re)introduction of the STRICT_RFC2818 mechanism in Globus.... See Jens's comments on TB-SUPPORT.<br />
* Re-allocation of space for ATLAS as a result of cleanup and removal of the ATLASPRODDISK space token.<br />
* Time out of Nagios glexec probe.<br />
* The GridPP hardware allocations are just about final so expect figures very soon. Purchases are this financial year.<br />
* DPM workshop 2015 7th-8th Dec at CERN - [https://indico.cern.ch/event/432642/ Registration is open].<br />
* The [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/ October WLCG A/R] figures are now available<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ALICE_Oct2015.pdf ALICE]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ATLAS_Oct2015.pdf ATLAS]. <br />
*** QMUL: 85%:85%<br />
*** Lancaster: N/a:N/a?<br />
*** Liverpool: 85%:100%<br />
*** Sheffield: 78%:78%<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_CMS_Oct2015.pdf CMS]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_LHCB_Oct2015.pdf LHCb] <br />
*** QMUL: 79%:79%<br />
*** Liverpool: 86%:100%<br />
*** Sheffield: 86%:86%<br />
*** RAL PPD: 79%:79%<br />
* Exploring options for VM-based sites (in respect of the monitoring within EGI): Perhaps set up a 'community platform'.<br />
* RCUK Cloud Working Group - a first [http://bit.ly/cloudwgdec15 workshop on the 1st December] at Imperial College.<br />
* From the MB last week: <br />
** Memory Requirements: LHC experiments all basically agreed that 2GB/core was the baseline but that some (advertised) resources with up to 4GB/core would be valuable for some workflows.<br />
** February as kick-off for technical evolution groups.<br />
** PCP - Pre-commercial procurement and HNSciCloud. This is approved and starts in January 2016. The UK has a small involvement. The plan is to build on the hybrid cloud service that results, in order to deploy a European Open Science Cloud funded from the INFRADEV-04 (2016) call.<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest SRM fix, which resolves a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 10th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-04 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, "globus_gsi_callback_module: Could not verify credential", etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB-SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. It would be good if a few site representatives joined.<br />
* Machine job features solution testing. We fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13.<br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVEs with the maximum CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 10th November 2015, 10.20 GMT'''<br /><br />
Down to 13 Open tickets this week!<br />
<br />
All the UK tickets can be found [http://tinyurl.com/nwgrnys '''here'''].<br />
<br />
Not very exciting - the ECDF-RDF ticket [https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (ATLAS complaining about stale file handle error messages - but it's for the "RDF" so not urgent?) might have slipped past Andy's sentries yesterday, and the two remaining pilot role tickets for Sheffield and the Tier 1 ([https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] and [https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866]) could really do with updates - on the latter, Daniela notes that the GridPP pilot roles are needed for testing purposes.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios is using the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because of non-availability of the EGI repository; the Nagios test compares the CA version against the EGI repo. It started on 5th June, when one of the IP addresses behind the webserver was not responding. The problem went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
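<br />
The percentages quoted in this 2nd December summary appear to be the YES count divided by the 19 sites listed, rounded to the nearest whole number. A minimal sketch that reproduces them is given here; the function name and the dictionary layout are illustrative assumptions, and the IPv6 allocation NO count of 11 was obtained by counting the sites named in that list.<br />

```python
def yes_percentage(yes_count, no_count):
    """Return the share of YES sites as a whole-number percentage."""
    total = yes_count + no_count
    return round(100 * yes_count / total)


# (YES, NO) counts taken from the lists above.
# The DIRAC NO figure is the sum of the two groups given as (4) + (4).
# The IPv6 allocation NO figure (11) is a count of the sites named in that list.
summary = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

for name, (yes, no) in summary.items():
    print(f"{name}: {yes_percentage(yes, no)}%")
```

Each category sums to 19 sites, so the same denominator applies throughout.<br />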
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFilesLost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-10T10:39:23Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 9th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. Led to a reminder that the CA team are currently best efforts - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
'''Tuesday 3rd November'''<br />
* There is a GDB this week: [https://indico.cern.ch/event/319753/ Agenda].<br />
* (re)introduction of the STRICT_RFC2818 mechanism in Globus.... See Jens's comments on TB-SUPPORT.<br />
* Re-allocation of space for ATLAS as a result of cleanup and removal of the ATLASPRODDISK space token.<br />
* Time out of Nagios glexec probe.<br />
* The GridPP hardware allocations are just about final so expect figures very soon. Purchases are this financial year.<br />
* DPM workshop 2015 7th-8th Dec at CERN - [https://indico.cern.ch/event/432642/ Registration is open].<br />
* The [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/ October WLCG A/R] figures are now available<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ALICE_Oct2015.pdf ALICE]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ATLAS_Oct2015.pdf ATLAS]. <br />
*** QMUL: 85%:85%<br />
*** Lancaster: N/a:N/a?<br />
*** Liverpool: 85%:100%<br />
*** Sheffield: 78%:78%<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_CMS_Oct2015.pdf CMS]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_LHCB_Oct2015.pdf LHCb] <br />
*** QMUL: 79%:79%<br />
*** Liverpool: 86%:100%<br />
*** Sheffield: 86%:86%<br />
*** RAL PPD: 79%:79%<br />
* Exploring options for VM based sites (in respect of the monitoring within EGI): Perhaps set up a 'community platform'.<br />
* RCUK Cloud Working Group - a first [http://bit.ly/cloudwgdec15 workshop on the 1st December] at Imperial College.<br />
* From the MB last week: <br />
** Memory Requirements: LHC experiments all basically agreed that 2GB/core was the baseline but that some (advertised) resources with up to 4GB/core would be valuable for some workflows.<br />
** February as kick-off for technical evolution groups.<br />
** PCP - Pre-commercial-procurement and HNSciCloud. This is approved and starts January-16. UK has small involvement. The plan is to build on the hybrid cloud service that results, in order to deploy a European Open Science Cloud funded from the INFRADEV-04 (2016) call <br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 10th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-04 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, "globus_gsi_callback_module: Could not verify credential" etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13. <br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVEs with max CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 10th November 2015, 10.20 GMT'''<br /><br />
Down to 13 Open tickets this week!<br />
<br />
All the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
Not very exciting - the ECDF-RDF ticket [https://ggus.eu/?mode=ticket_info&ticket_id=117447 117447] (ATLAS complaining about stale file handle error messages - but it's for the "RDF" so not urgent?) might have slipped past Andy's sentries yesterday, and the two remaining pilot role tickets for Sheffield and the Tier 1 ([https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] and [https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866]) could really do with updates - on the latter, Daniela notes that the gridpp pilot roles are needed for testing purposes.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but with no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable; the test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
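The failover behaviour expected in the 27th January note can be sketched roughly as below. This is only an illustration, not how SAM/Nagios actually selects a broker: the helper names <code>tcp_probe</code> and <code>first_reachable</code> are hypothetical, and the only values taken from the note are the two broker URLs.<br />

```python
import socket
from urllib.parse import urlsplit

# Broker endpoints quoted in the note above, in order of preference.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def tcp_probe(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(broker_urls, probe=tcp_probe):
    """Return the first broker URL whose endpoint answers the probe, else None.

    The probe is injectable so the selection logic can be exercised
    without touching the network.
    """
    for url in broker_urls:
        parts = urlsplit(url)
        if probe(parts.hostname, parts.port):
            return url
    return None
```

Calling <code>first_reachable(BROKERS)</code> on every attempt, rather than reusing a cached answer, is the point of the sketch: a stale cached broker list is what the note suspects prevented an immediate failover.<br />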
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
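For anyone updating these figures, the percentages quoted above appear to be the YES count as a share of all 19 sites, rounded to the nearest whole percent. A minimal sketch of that arithmetic follows; the function name <code>yes_share</code> is made up for illustration, and the counts are read off the lists above (the DIRAC NO count is taken as 4 + 4).<br />

```python
def yes_share(yes_count, no_count):
    """Percentage of sites answering YES, rounded to the nearest integer."""
    total = yes_count + no_count
    return round(100 * yes_count / total)


# (YES, NO) counts read from the 2nd December site lists.
counts = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

for label, (yes, no) in counts.items():
    print(f"{label}: {yes_share(yes, no)}%")
```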
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''ATLAS S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring.<br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFilesLost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-10T10:25:56Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 9th November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 10th November'''<br />
* There was a WLCG GDB last week: [http://indico.cern.ch/event/319753/ Agenda]. [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20151104 Minutes].<br />
* ATLAS is going to move the brokering to use maxrss value rather than maxrss+maxswap (called maxmemory in the ATLAS panda queues). [https://twiki.cern.ch/twiki/bin/view/LCG/BSPassingParameters Reminder notes].<br />
* There is an EGI meeting in Bari this week. The OMB agenda/slides can be seen [https://indico.egi.eu/indico/contributionDisplay.py?contribId=1&confId=2675 here]. <br />
* An update from John Gordon on the CPU efficiency accounting discussion of last week: "Stuart has the database merge under way now. We had hoped it would be done by the end of October but it isn’t complete yet. When done and we send an integrated dataset to the portal they will put the current dev view in the production portal alongside the current one. They are currently developing a major rewrite of the portal and don’t want to mess about with it too much."<br />
* J Perkin: Weird user error when reading LFC. This led to a reminder that the CA team is currently working on a best-efforts basis - please be patient.<br />
* J Hill: ATLAS consistency checks at Tier2s require sites to upload storage dumps - please can we have a clear statement of what parameters ATLAS wants us to use for these dumps?<br />
* Simon G: Asked about Tier3 access to Tier2 storage.<br />
* LCG-ROLLOUT: TOP BDII issues with CentOS 6.7 (openldap-servers-2.4.40-5.el6.x86_64). It basically breaks but is being followed up with RH.<br />
* For those interested in the ARGUS working group meeting last week please see their [https://indico.cern.ch/event/459898/ summary].<br />
* There was a GridPP Technical Meeting last Friday: [https://indico.cern.ch/e/460437 Agenda].<br />
<br />
'''Tuesday 3rd November'''<br />
* There is a GDB this week: [https://indico.cern.ch/event/319753/ Agenda].<br />
* (re)introduction of the STRICT_RFC2818 mechanism in Globus.... See Jens's comments on TB-SUPPORT.<br />
* Re-allocation of space for ATLAS as a result of cleanup and removal of the ATLASPRODDISK space token.<br />
* Time out of Nagios glexec probe.<br />
* The GridPP hardware allocations are just about final so expect figures very soon. Purchases are this financial year.<br />
* DPM workshop 2015 7th-8th Dec at CERN - [https://indico.cern.ch/event/432642/ Registration is open].<br />
* The [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/ October WLCG A/R] figures are now available<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ALICE_Oct2015.pdf ALICE]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_ATLAS_Oct2015.pdf ATLAS]. <br />
*** QMUL: 85%:85%<br />
*** Lancaster: N/a:N/a?<br />
*** Liverpool: 85%:100%<br />
*** Sheffield: 78%:78%<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_CMS_Oct2015.pdf CMS]. All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201510/wlcg/WLCG_All_Sites_LHCB_Oct2015.pdf LHCb] <br />
*** QMUL: 79%:79%<br />
*** Liverpool: 86%:100%<br />
*** Sheffield: 86%:86%<br />
*** RAL PPD: 79%:79%<br />
* Exploring options for VM based sites (in respect of the monitoring within EGI): Perhaps set up a 'community platform'.<br />
* RCUK Cloud Working Group - a first [http://bit.ly/cloudwgdec15 workshop on the 1st December] at Imperial College.<br />
* From the MB last week: <br />
** Memory Requirements: LHC experiments all basically agreed that 2GB/core was the baseline but that some (advertised) resources with up to 4GB/core would be valuable for some workflows.<br />
** February as kick-off for technical evolution groups.<br />
** PCP - Pre-commercial-procurement and HNSciCloud. This is approved and starts January-16. UK has small involvement. The plan is to build on the hybrid cloud service that results, in order to deploy a European Open Science Cloud funded from the INFRADEV-04 (2016) call <br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 10th November'''<br />
* There was a WLCG ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151105 Minutes]. (Alessandra chaired and might be able to talk over the main items at our ops meeting).<br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 10th November'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-11-04 here]<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and sometimes fail to write remotely as well).<br />
* We are continuing with some detailed network changes needed to remove our old core switch from the network.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20151028-minutes.txt Wednesday 28 Oct]'''<br />
* Summary of "UK T0" workshop - GridPP well represented<br />
* Sites should not upgrade to DPM 1.8.10 just yet<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 9 Nov'''<br />
* Vac 0.19.0 released last week: new VacQuery protocol<br />
* Prototyping Vcycle(/Vac) GLUE2 publishing: talk at WLCG InfoSys Evolution TF on Thursday<br />
* Please now use https://repo.gridpp.ac.uk/vacproject/ URLs for user_data and cernvm3.iso files (in preparation for move to new www.gridpp.ac.uk website.)<br />
* See Laurence's GDB talk "Helix Nebula Price Enquiry Results" for more use of Vcycle<br />
<br />
'''Tuesday 3 Nov'''<br />
* LHCb prototype of GOCDB pointers to resource BDII done<br />
* T2C tests at Oxford ongoing<br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 3rd November'''<br />
* APEL delay (normal state) for Lancaster and Sheffield.<br />
<br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Friday 6th Nov, 2015'''<br />
<br />
SteveJ: Advice to admins published about a common GSS error, globus_gsi_callback_module: Could not verify <br />
credential etc.<br />
<br />
https://www.gridpp.ac.uk/wiki/Security_system_errors_and_workarounds<br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 3rd November'''<br />
* The long-standing UCL availability alarm went green yesterday on 29th October. We are not sure why!<br />
* Quite a lot of activity on the dashboard this week, but only one or two new tickets. <br />
* Tickets: Five for availability / reliability: Sussex, Sheffield, Liverpool, Lancaster and UCL. Two for GLUE2 validation: Liverpool and QMUL. One for the CEs at QMU<br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 9th November'''<br />
* EGI SVG Advisory - 'Critical' risk. Remote arbitrary code execution vulnerabilities in the core crypto library used by Red Hat - [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183 Advisory-SVG-2015-CVE-2015-7183]. All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13. <br />
<br />
'''Tuesday 3rd November'''<br />
* EGI SVG Advisory - Various Java CVEs with max CVSS score.<br />
<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://psmad.grid.iu.edu/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 10th November 2015, 10.20 GMT'''<br /><br />
Down to 13 Open tickets this week - a busy week for everyone but me it seems!<br />
<br />
All the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)<br />
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-11-10T10:18:16Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 2nd November 2015, 13.30 GMT'''<br /><br />
22 Open UK Tickets this week. First Monday of the Month, so all the tickets get looked at, however run of the mill they are.<br />
<br />
First, the link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Low availability Ops ticket. On hold whilst the numbers soothe themselves. On Hold (23/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
Sno+ job submission failures. Not much on this ticket since it was set In Progress. Looks like an argus problem. How goes things at Sussex before Matt RB moves on? (We'll miss you Matt!). In progress (20/10)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117261 117261] (28/10)<br /><br />
Atlas jobs failing with stage out failures. Federico notices that the failures are due to odd errors - "file already existing", and that things seem to be calming themselves. He's at a loss as to what RALPP can do. Checking the panda link suggests the errors are still there today. Waiting for reply (29/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
Bristol's CMS glexec ticket. It looks like the solution is to have more cms pool accounts (which of course requires time to deploy). In progress (28/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117303 117303] (30/10)<br /><br />
CMS, not Highlander fans, don't seem to believe that There can be only One (glexec ticket). Poor old Bristol seem to be playing whack-a-mole with duplicate tickets. Is there a note that can be left somewhere to stop this happening? Assigned (30/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303] (Long long ago)<br /><br />
Edinburgh's (and indeed Scotgrid's) only ticket is this tarball glexec ticket. A bit more on this later. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp (and other) VO pilot roles at Sheffield. No news for a while, but snoplus are now trying to use pilot roles for dirac, so this is becoming very relevant. In progress (9/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs failing, likely due to too many being submitted to the 10 slots that Sno+ has. Maybe a WMS scheduling problem - Stephen B has given advice. Elena asked if the problem persisted a few weeks ago. Waiting for reply (12/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116967 116967] (17/10)<br /><br />
A ROD availability ticket, on hold as per SOP. On hold (20/10)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket. Autumn was not kind to many of us! On hold (8/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116882 116882] (13/10)<br /><br />
Enabling pilot snoplus users at Lancaster. Shouldn't have been a problem, but turned into a bit of a comedy/tragedy of errors by yours truly mucking up. Hopefully fixed now - thanks to Daniela for her patience. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299] (Far far away)<br /><br />
glexec tarball ticket. There's been a lot of communication with the glexec devs about this - the hopefully last hurdle is sorting out the RPATHs for the libraries. It's not a small hurdle though... On hold (2/11)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117151 117151] (23/10)<br />
A ticket about jumbo frame problems, submitted to QM. After Dan provided some education the user replied that he only sees this problem at two atlas sites, but he is contacting the network admins at his institution to see if it is at their end. On hold (29/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 117011] (19/10)<br /><br />
ROD ticket for glue-validate errors. Went away for a while after Dan re-yaimed his site bdii, but possibly back again. Daniela suggests re-running the glue-validate test. Reopened (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116689 116689] (6/10)<br /><br />
Another ROD ticket, where Ops glexec test jobs are seemingly timing out for QM (this is the ticket Daniela mentioned on the ops mailing list). Dan noted that with the cluster half full tests were passing, suggesting some kind of load correlation (but as he also notes - what's getting loaded and causing the problem - Batch, CE or WNs?). Kashif reckons the argus server, and suggests a handy glexec time test which he posted. In progress (2/11)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117324 117324] (2/11)<br /><br />
A fresh looking ROD ticket - Raul had to restart the BDII and hopefully that got it. In progress (2/11)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Missing Image at 100IT. 100IT have asked for more details, no news since. Waiting for reply (19/10)<br />
<br />
'''THE TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Snoplus pilot enablement (not actually a word) at the Tier 1. New accounts were being requested after some internal discussion. On hold (19/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS AAA tests failing (the submitter notes "again..."). There are some oddities with other sites, which might be remote problems, but Andrew notes that previous manual fixes have been overwritten which likely explains why problems came back. In progress (does it need to be waiting for a reply?) (26/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117171 117171] (24/10)<br /><br />
LHCB had problems with an arc CE that was misbehaving for everyone. Things were fixed, and this ticket can now be closed. Waiting for reply (can be closed) (27/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117277 117277] (30/10)<br /><br />
Atlas have spotted "bring online timeout has been exceeded" errors. This appears to be a mixture of problems adding up, such as a number of broken disk nodes and heavy write access by atlas. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117248 117248] (28/10)<br /><br />
I believe related to the discussion on tb-support, this ticket requests that new SRM host certs that meet the requirements specified be requested for the RAL SRMs. Jens was on it, and the new certs are ready to be deployed. In progress (30/10)<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - some badness at Sussex, but they have a ticket open for that.<br />
<br />
'''Monday 26th October 2015, 15.30 GMT'''<br /><br />
26 Open UK Tickets this week. Not many seem all that exciting though.<br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
The few (two) tickets that really caught my eye are:<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
This ticket is quite interesting, mainly for Dan schooling the submitter. QM received a ticket complaining that their jumbo frames were breaking stuff - it doesn't look like the problem is at QM though. Naught wrong with the ticket handling. Waiting for reply (23/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065] (20/10)<br /><br />
Bristol have a CMS glexec error ticket that looks very similar to an existing one ([https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683]), which is in turn spookily similar to a ticket being worked on by the Bristol admins ([https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775]). At the very least I would say that if the two problems are different one would likely obfuscate the other. Is this a case of over-keen shifters submitting tickets without checking? I'd be tempted to close one or both of [https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] and this one ([https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065]). Tell them I said it was okay to[1]. In progress (21/10)<br />
<br />
[1] It probably is okay to.<br />
<br />
There are also 4 availability tickets, all On Hold waiting for 30 days or so to pass.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting this sort of ticket. Assigned (15/10) ''Update - Alessandra has updated and On-Holded the ticket. Thanks!''<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue). I think this one has snuck past the Liver-lads' watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket. It looks like Dan has tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed; Gareth confirms that the test is green. Waiting for reply (19/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of pool accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing an "export CMTEXTRATAGS="host-slc5"" line. Naught wrong with the ticket, but I liked how lhcb did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asks if the problem persists, the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652] (1/10)<br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651] (1/10)<br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (Long, long ago)<br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576] (1/10)<br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585] (1/10)<br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out are due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082] (7/9)<br />
A classic Rod Availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking, only that the CE wasn't publishing to APEL - and he asks if this is the cause of the problem. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September 2015, 15.00 BST'''<br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting. Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me: the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at - definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (31/7)<br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could do with either an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AAA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and she asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM now, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seems to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S opened this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to note is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or, more precisely, having submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
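<br />
For anyone wanting to hunt for their own rogues, here is a minimal sketch of how a list like the one above could be produced from job records. The record layout (dicts with <code>submit_host</code> and <code>cores</code> keys) and the example hostnames are assumptions made up for illustration, not the actual APEL record format or real sites.<br />

```python
from collections import Counter


def zero_core_hosts(records):
    """Count jobs published with 0 cores, grouped by submission host.

    `records` is an iterable of dicts with (hypothetical) keys
    "submit_host" and "cores". Hosts with no zero-core jobs are omitted.
    """
    counts = Counter(
        rec["submit_host"] for rec in records if rec.get("cores", 0) == 0
    )
    return dict(counts)


# Made-up example records; the hostnames are placeholders, not real CEs.
example_records = [
    {"submit_host": "ce-a.example.org", "cores": 0},
    {"submit_host": "ce-a.example.org", "cores": 8},
    {"submit_host": "ce-b.example.org", "cores": 0},
    {"submit_host": "ce-b.example.org", "cores": 0},
    {"submit_host": "ce-c.example.org", "cores": 1},
]

rogues = zero_core_hosts(example_records)
for host in sorted(rogues):
    print(host, rogues[host])
```

A host shows up as soon as a single zero-core job is seen, which matches the point above that a recently fixed host would still be listed until the old records age out.<br />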
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed afaics. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
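<br />
To illustrate why a shared SubmitHost bites, here is a minimal sketch. It assumes sync records are keyed on site, SubmitHost and month, which is my reading of the ticket rather than a statement of how APEL actually stores them, and the CE names, field names and job counts are all made up.<br />

```python
from collections import defaultdict


def store_sync_records(records):
    """Store job counts keyed on (site, submit_host, month).

    Assumed keying for illustration only: a later record with the same
    key replaces the earlier one instead of being added to it.
    """
    store = {}
    for rec in records:
        key = (rec["site"], rec["submit_host"], rec["month"])
        store[key] = rec["njobs"]
    return store


def shared_submit_hosts(ce_to_submit_host):
    """Return {submit_host: [CEs]} for hosts used by more than one CE."""
    by_host = defaultdict(list)
    for ce, host in ce_to_submit_host.items():
        by_host[host].append(ce)
    return {h: sorted(ces) for h, ces in by_host.items() if len(ces) > 1}


# Two hypothetical CEs misconfigured to publish under the same SubmitHost.
config = {
    "ce1.example.org": "ce1.example.org",
    "ce2.example.org": "ce1.example.org",  # should have been ce2
}
sync = [
    {"site": "EXAMPLE-SITE", "submit_host": config["ce1.example.org"],
     "month": "2015-05", "njobs": 1000},
    {"site": "EXAMPLE-SITE", "submit_host": config["ce2.example.org"],
     "month": "2015-05", "njobs": 400},
]

print(shared_submit_hosts(config))
print(store_sync_records(sync))
```

Under these assumptions the second CE's 400 jobs replace the first CE's 1000 rather than the two summing to 1400, so the site appears to under-report. Checking the configs for duplicate SubmitHost values, as in <code>shared_submit_hosts</code>, is a cheap way to spot it.<br />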
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd redirector configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksum not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depending on the time frame of a workaround. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
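<br />
As a rough sketch of how the "no update yet this month" check above could be automated: the ticket numbers and last-update dates are copied from the list, while the function name and the dictionary layout are made up for illustration (nothing here talks to GGUS).<br />

```python
from datetime import date

# Review date and last-update dates taken from the list above.
REVIEW_DATE = date(2015, 5, 11)
LAST_UPDATES = {
    108944: date(2015, 4, 30),  # CMS AAA tests
    112721: date(2015, 4, 16),  # Atlas transfer problems
    109694: date(2015, 4, 15),  # SNO+ gfal copy trouble
    112866: date(2015, 4, 7),   # CMS job failures
    112819: date(2015, 4, 20),  # SNO+ arcsync troubles
}


def stale_tickets(last_updates, review_date):
    """Return ticket ids not updated since the start of the review month, oldest first."""
    month_start = review_date.replace(day=1)
    stale = [tid for tid, updated in last_updates.items() if updated < month_start]
    # Sort so the ticket that has been quiet longest comes first.
    return sorted(stale, key=lambda tid: last_updates[tid])


for tid in stale_tickets(LAST_UPDATES, REVIEW_DATE):
    print(tid, LAST_UPDATES[tid].isoformat())
```

Sorting by age puts the quietest ticket at the top, which is the order you'd probably want to chase them in.<br />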
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor can be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions falling on deaf ears for a while it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
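<br />
To put that number in perspective, a quick back-of-the-envelope sketch: taking one million attempts as a lower bound and spreading them evenly over 18 hours (an assumption, the real pattern was presumably burstier) gives the mean rate on that one file. The function name is purely illustrative.<br />

```python
def mean_access_rate(attempts, hours):
    """Mean accesses per second, assuming attempts are spread evenly over the period."""
    return attempts / (hours * 3600.0)


# "Over a million" attempts in 18 hours, so this is a lower bound on the mean rate.
rate = mean_access_rate(1_000_000, 18)
print(f"at least {rate:.1f} accesses per second on a single file")
```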
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) was done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that wasn't suffering an understood problem was RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
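<br />
For anyone wanting to rule out basic connectivity before digging into ARC itself, here is a minimal sketch of a TCP connect test, assuming the failure is simply port 443 being unreachable over IPv4 (which the error message suggests but does not prove). The function name and the example hostname in the comment are hypothetical; this is not what the Nagios probe or arcsub actually does.<br />

```python
import socket


def can_connect(host, port, family=socket.AF_INET, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds over the given address family."""
    try:
        # Resolve only addresses of the requested family (AF_INET = IPv4, AF_INET6 = IPv6).
        candidates = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # name did not resolve for this address family
    for fam, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # refused, timed out or unreachable: try the next address
    return False


# Example call (hypothetical hostname):
#   can_connect("ce.example.org", 443)
# Passing family=socket.AF_INET6 runs the same check over IPv6.
```

A True result only shows that the port is open from where you ran it, so it's best run from the Nagios box (or somewhere on the same network) rather than from the site itself.<br />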
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAICS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me all wisdom donations are thankfully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCb job failure tickets. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is building a replacement xrootd box which is currently being prepared. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for httpd access to the storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, which look to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a cream side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days quite a few of my pool nodes have had their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configurations and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are still haunted by problems they had pre-reinstall. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-02T17:56:43Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 2nd November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* HEPiX took place last week: [https://indico.cern.ch/event/384358/timetable/#all.detailed Agenda].<br />
* pheno: VO ID card for SW_DIR. For contacts check [http://operations-portal.egi.eu/vo/help this link].<br />
* [https://indico.cern.ch/e/453272 GridPP Technical Meeting] on Friday "virtual only". <br />
* The latest WLCG [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek151026#Monday ops meeting minutes from Monday are available]. <br />
* There was a UK-T0 meeting last week. The other community talks may be of interest, they are linked from the [https://eventbooking.stfc.ac.uk/news-events/uk-t0-workshop-296?agenda=1 agenda].<br />
* Current GridPP vacancies (+ putting them on the GridPP website).<br />
* David Crooks is organising a "SOC" meeting. <br />
<br />
* Raja raised: "ARC CE publishing" and querying the BDII. <br />
* Luke asked about HTC CE documentation links (for European installation).<br />
* John H asked for comments on a dmlite message after update to dpm 1.8.10. <br />
<br />
'''Tuesday 13th October'''<br />
* Krishan mentions lcg-tags errors at EFDA-JET. <br />
* BDII installations affected by slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, from openldap-servers-2.4.40-5 onwards an issue has been introduced which causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware the jobreport_options of "/var/run" which clears records on reboot.<br />
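For the slapd note above, a minimal sketch of how a site might flag BDII hosts whose installed openldap-servers build is at or after 2.4.40-5. The function names are hypothetical, the comparison is a simplified numeric one rather than full RPM version ordering, and the bulletin does not say whether any later build fixes the issue, so treat a positive result only as "worth checking".

```python
import re

# First build that the bulletin item above reports as carrying the slapd issue.
FIRST_AFFECTED = ((2, 4, 40), 5)


def parse_version_release(vr):
    """Split an RPM 'version-release' string such as '2.4.40-6.el6_7'
    into ((2, 4, 40), 6). Only the leading integer of the release is kept."""
    version, _, release = vr.partition("-")
    version_tuple = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+", release)
    if match is None:
        raise ValueError("cannot parse release from %r" % vr)
    return version_tuple, int(match.group())


def possibly_affected(vr):
    """True if the build is at or after the first reportedly affected one."""
    return parse_version_release(vr) >= FIRST_AFFECTED


if __name__ == "__main__":
    # Example strings only; substitute the value reported on the host.
    for candidate in ("2.4.39-8.el6", "2.4.40-5.el6", "2.4.40-6.el6_7"):
        print(candidate, possibly_affected(candidate))
```

The version-release string can be obtained on the host with something like `rpm -q --qf '%{VERSION}-%{RELEASE}\n' openldap-servers`.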
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%: 100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47% ; Lancaster: 75%:80% & ECDF: 89%<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas][https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it at least to version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 27th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-21 here]<br />
* There is an issue on the WMS machines where a user was creating very large output files and filling up the disk partition.<br />
* We had two disk server failures over the weekend (Both part of Atlas Tape).<br />
* Also over the weekend, there was a problem with one of the arc-ces which we believe was due to problems with the hypervisor it was running on. <br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and sometimes fail to write remotely as well).<br />
* As reported this last couple of weeks - we did have a problem with glexec for the worker nodes over a weekend. We are trying to understand why this problem was not seen during the testing and roll-out of the new worker node configuration.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, an H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems are open at several sites. QMUL's are notable for having gone on for a while, probably due to some kind of configuration problem that has not yet been identified.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Tuesday 20th October'''<br />
* The IGTF has released a regular update to the trust anchor repository ([https://rt.egi.eu/rt/Ticket/Display.html?id=9668 1.69]) - for distribution ON OR AFTER October 26th<br />
<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [https://maddash.aglt2.org/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 2nd November 2015, 13.30 GMT'''<br /><br />
22 Open UK Tickets this week. First Monday of the Month, so all the tickets get looked at, however run of the mill they are.<br />
<br />
First, the link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Low availability Ops ticket. Put on hold whilst the numbers soothe themselves. On Hold (23/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
Sno+ job submission failures. Not much on this ticket since it was set In Progress. Looks like an argus problem. How goes things at Sussex before Matt RB moves on? (We'll miss you Matt!). In progress (20/10)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117261 117261] (28/10)<br /><br />
Atlas jobs failing with stage out failures. Federico notices that the failures are due to odd errors - "file already existing" - and that things seem to be calming themselves. He's at a loss as to what RALPP can do. Checking the panda link suggests the errors are still there today. Waiting for reply (29/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
Bristol's CMS glexec ticket. It looks like the solution is to have more cms pool accounts (which of course requires time to deploy). In progress (28/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117303 117303] (30/10)<br /><br />
CMS, not Highlander fans, don't seem to believe that There can be only One (glexec ticket). Poor old Bristol seem to be playing whack-a-mole with duplicate tickets. Is there a note that can be left somewhere to stop this happening? Assigned (30/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303] (Long long ago)<br /><br />
Edinburgh's (and indeed Scotgrid's) only ticket is this tarball glexec ticket. A bit more on this later. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp (and other) VO pilot roles at Sheffield. No news for a while, snoplus are trying to use pilot roles now for dirac so this is becoming very relevant. In progress (9/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs failing, likely due to too many being submitted to the 10 slots that Sno+ has. Maybe a WMS scheduling problem - Stephen B has given advice. Elena asked if the problem persisted a few weeks ago. Waiting for reply (12/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116967 116967] (17/10)<br /><br />
A ROD availability ticket, on hold as per SOP. On hold (20/10)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket. Autumn was not kind to many of us! On hold (8/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116882 116882] (13/10)<br /><br />
Enabling pilot snoplus users at Lancaster. Shouldn't have been a problem, but turned into a bit of a comedy/tragedy of errors by yours truly mucking up. Hopefully fixed now- thanks to Daniela for her patience. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299] (Far far away)<br /><br />
glexec tarball ticket. There's been a lot of communication with the glexec devs about this - the hopefully last hurdle is sorting out the RPATHs for the libraries. It's not a small hurdle though... On hold (2/11)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
A ticket about jumbo frame problems, submitted to QM. After Dan provided some education, the user replied that he only sees this problem at two atlas sites, but he is contacting the network admins at his institution to see if it is their end. On hold (29/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 117011] (19/10)<br /><br />
ROD ticket for glue-validate errors. Went away for a while after Dan re-yaimed his site bdii, but possibly back again. Daniela suggests re-running the glue-validate test. Reopened (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116689 116689] (6/10)<br /><br />
Another ROD ticket, where Ops glexec test jobs are seemingly timing out for QM (this is the ticket Daniela mentioned on the ops mailing list). Dan noted that with the cluster half full tests were passing, suggesting some kind of load correlation (but as he also notes - what's getting loaded and causing the problem - Batch, CE or WNs?). Kashif reckons the argus server, and suggests a handy glexec time test which he posted. In progress (2/11)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117324 117324] (2/11)<br /><br />
A fresh looking ROD ticket - Raul had to restart the BDII and hopefully that got it. In progress (2/11)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Missing Image at 100IT. 100IT have asked for more details, no news since. Waiting for reply (19/10)<br />
<br />
'''THE TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Snoplus pilot enablement (not actually a word) at the Tier 1. New accounts were being requested after some internal discussion. On hold (19/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS AAA tests failing (the submitter notes "again..."). There are some oddities with other sites, which might be remote problems, but Andrew notes that previous manual fixes have been overwritten which likely explains why problems came back. In progress (does it need to be waiting for a reply?) (26/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117171 117171] (24/10)<br /><br />
LHCB had problems with an arc CE that was misbehaving for everyone. Things were fixed, and this ticket can now be closed. Waiting for reply (can be closed) (27/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117277 117277] (30/10)<br /><br />
Atlas have spotted "bring online timeout has been exceeded". This appears to be a mixture of problems adding up, such as a number of broken disk nodes and heavy write access by atlas. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117248 117248] (28/10)<br /><br />
I believe related to the discussion on tb-support, this ticket requests that new SRM host certs that meet the requirements specified be requested for the RAL SRMs. Jens was on it, and the new certs are ready to be deployed. In progress (30/10)<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - some badness at Sussex, but they have a ticket open for that.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing the Nagios test because the EGI repository was unavailable; the Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
<br />
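The percentages quoted above appear to be the YES count over the 19 sites listed in each category (YES plus NO), rounded to the nearest whole number. A minimal sketch of that arithmetic, assuming a total of 19 sites; the helper name `percent_yes` and the dictionary labels are illustrative only and not part of any GridPP tool:

```python
# Illustrative only: reproduces the rounded percentages in the status lists above.
# TOTAL_SITES = 19 is an assumption inferred from the YES + NO counts in each list.
TOTAL_SITES = 19


def percent_yes(yes_count, total=TOTAL_SITES):
    """Return the share of sites answering YES as a whole-number percentage."""
    return round(100 * yes_count / total)


# YES counts as given in the lists above.
yes_counts = {
    "Multicore queues": 12,
    "Cloud/VM": 5,
    "GridPP DIRAC jobs": 11,
    "IPv6 allocation": 8,
    "Dual stack nodes": 4,
}

percentages = {name: percent_yes(n) for name, n in yes_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under that assumption the results match the figures quoted in the lists (63%, 26%, 58%, 42% and 21%).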
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring.<br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue a lost files declaration for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board].<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-02T17:56:01Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 2nd November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* HEPiX took place last week: [https://indico.cern.ch/event/384358/timetable/#all.detailed Agenda].<br />
* pheno: VO ID card for SW_DIR. For contacts check [http://operations-portal.egi.eu/vo/help this link].<br />
* [https://indico.cern.ch/e/453272 GridPP Technical Meeting] on Friday "virtual only". <br />
* The latest WLCG [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek151026#Monday ops meeting minutes from Monday are available]. <br />
* There was a UK-T0 meeting last week. The other community talks may be of interest; they are linked from the [https://eventbooking.stfc.ac.uk/news-events/uk-t0-workshop-296?agenda=1 agenda].<br />
* Current GridPP vacancies (+ putting them on the GridPP website).<br />
* David Crooks is organising a "SOC" meeting. <br />
<br />
* Raja raised: "ARC CE publishing" and querying the BDII. <br />
* Luke asked about HTC CE documentation links (for European installation).<br />
* John H asked for comments on a dmlite message after update to dpm 1.8.10. <br />
<br />
'''Tuesday 13th October'''<br />
* Krishan mentions lcg-tags errors at EFDA-JET.<br />
* BDII installations are affected by a slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, an issue present since openldap-servers-2.4.40-5 causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware the jobreport_options of "/var/run" which clears records on reboot.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%: 100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47% ; Lancaster: 75%: 80% & ECDF: 89%<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 27th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-21 here]<br />
* There is an issue on the WMS machines where a user was creating very large output files and filling up the disk partition.<br />
* We had two disk server failures over the weekend (Both part of Atlas Tape).<br />
* Also over the weekend, there was a problem with one of the ARC CEs, which we believe was due to problems with the hypervisor it was running on.<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* As reported this last couple of weeks - we did have a problem with glexec for the worker nodes over a weekend. We are trying to understand why this problem was not seen during the testing and roll-out of the new worker node configuration.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Tuesday 20th October'''<br />
* The IGTF has released a regular update to the trust anchor repository ([https://rt.egi.eu/rt/Ticket/Display.html?id=9668 1.69]) - for distribution ON OR AFTER October 26th<br />
<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [https://maddash.aglt2.org/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 2nd November 2015, 13.30 GMT'''<br /><br />
22 Open UK Tickets this week. First Monday of the Month, so all the tickets get looked at, however run of the mill they are.<br />
<br />
First, the link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Low availability Ops ticket. On hold whilst the numbers soothe themselves. On Hold (23/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
Sno+ job submission failures. Not much on this ticket since it was set In Progress. Looks like an argus problem. How goes things at Sussex before Matt RB moves on? (We'll miss you Matt!). In progress (20/10)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117261 117261] (28/10)<br /><br />
Atlas jobs failing with stage out failures. Federico notices that the failures are due to odd errors - "file already existing", and that things seem to be calming themselves. He's at a loss as to what RALPP can do. Checking the panda link suggests the errors are still there today. Waiting for reply (29/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
Bristol's CMS glexec ticket. It looks like the solution is to have more cms pool accounts (which of course requires time to deploy). In progress (28/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117303 117303] (30/10)<br /><br />
CMS, not Highlander fans, don't seem to believe that There can be only One (glexec ticket). Poor old Bristol seem to be playing whack-a-mole with duplicate tickets. Is there a note that can be left somewhere to stop this happening? Assigned (30/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303] (Long long ago)<br /><br />
Edinburgh's (and indeed Scotgrid's) only ticket is this tarball glexec ticket. A bit more on this later. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp (and other) VO pilot roles at Sheffield. No news for a while, snoplus are trying to use pilot roles now for dirac so this is becoming very relevant. In progress (9/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs failing, likely due to too many being submitted to the 10 slots that Sno+ has. Maybe a WMS scheduling problem - Stephen B has given advice. Elena asked if the problem persisted a few weeks ago. Waiting for reply (12/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116967 116967] (17/10)<br /><br />
A ROD availability ticket, on hold as per SOP. On hold (20/10)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket. Autumn was not kind to many of us! On hold (8/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116882 116882] (13/10)<br /><br />
Enabling pilot snoplus users at Lancaster. Shouldn't have been a problem, but turned into a bit of a comedy/tragedy of errors by yours truly mucking up. Hopefully fixed now- thanks to Daniela for her patience. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299] (Far far away)<br /><br />
glexec tarball ticket. There's been a lot of communication with the glexec devs about this - the hopefully last hurdle is sorting out the RPATHs for the libraries. It's not a small hurdle though... On hold (2/11)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
A ticket about jumbo frame problems, submitted to QM. After Dan provided some education, the user replied that he only sees this problem at two atlas sites, but he is contacting the network admins at his institution to see if it is their end. On hold (29/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 117011] (19/10)<br /><br />
ROD ticket for glue-validate errors. Went away for a while after Dan re-yaimed his site bdii, but possibly back again. Daniela suggests re-running the glue-validate test. Reopened (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116689 116689] (6/10)<br /><br />
Another ROD ticket, where Ops glexec test jobs are seemingly timing out for QM (this is the ticket Daniela mentioned on the ops mailing list). Dan noted that with the cluster half full tests were passing, suggesting some kind of load correlation (but as he also notes - what's getting loaded and causing the problem - Batch, CE or WNs?). Kashif reckons the argus server, and suggests a handy glexec time test which he posted. In progress (2/11)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117324 117324] (2/11)<br /><br />
A fresh looking ROD ticket - Raul had to restart the BDII and hopefully that got it. In progress (2/11)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Missing Image at 100IT. 100IT have asked for more details, no news since. Waiting for reply (19/10)<br />
<br />
'''THE TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Snoplus pilot enablement (not actually a word) at the Tier 1. New accounts were being requested after some internal discussion. On hold (19/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
CMS AAA tests failing (the submitter notes "again..."). There are some oddities with other sites, which might be remote problems, but Andrew notes that previous manual fixes have been overwritten which likely explains why problems came back. In progress (does it need to be waiting for a reply?) (26/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117171 117171] (24/10)<br /><br />
LHCB had problems with an arc CE that was misbehaving for everyone. Things were fixed, and this ticket can now be closed. Waiting for reply (can be closed) (27/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117277 117277] (30/10)<br /><br />
Atlas have spotted "bring online timeout has been exceeded" errors. This appears to be a mixture of problems adding up, such as a number of broken disk nodes and heavy write access by atlas. In progress (2/11)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117248 117248] (28/10)<br /><br />
I believe related to the discussion on tb-support, this ticket requests that new SRM host certs that meet the requirements specified be requested for the RAL SRMs. Jens was on it, and the new certs are ready to be deployed. In progress (30/10)<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] - some badness at Sussex, but they have a ticket open for that.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was a kind of double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it again yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, with one of the IP addresses behind the webserver not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
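<br />
As an illustration of the failover behaviour hoped for above, here is a minimal sketch of picking the first reachable broker from an ordered list. It is not how the SAM/Nagios tooling actually selects brokers; the function names (<code>tcp_reachable</code>, <code>first_reachable_broker</code>) are hypothetical, and a plain TCP connect is assumed to be a good enough liveness probe.<br />

```python
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Hypothetical probe: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable_broker(
    urls: Iterable[str],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Optional[str]:
    """Return the first broker URL whose host:port passes the probe, else None."""
    for url in urls:
        parts = urlsplit(url)
        if parts.hostname is None or parts.port is None:
            continue  # skip URLs without an explicit host and port
        if probe(parts.hostname, parts.port):
            return url
    return None


# Broker URLs as quoted in the note above, in preference order.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]
```

Calling <code>first_reachable_broker(BROKERS)</code> probes each broker live on every call, so a stale cached broker list entry would not delay the switch; whether that is appropriate depends on how the real monitoring discovers its brokers.<br />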
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
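<br />
The percentages quoted above are the YES count divided by the 19 sites listed, rounded to the nearest whole number. A small sketch to reproduce them follows; the helper name <code>percent_yes</code> is illustrative, and the NO count for IPv6 allocation (11) is counted from the list because no total is printed there.<br />

```python
def percent_yes(yes: int, no: int) -> int:
    """Return the share of YES sites as a whole-number percentage."""
    total = yes + no
    if total == 0:
        raise ValueError("no sites counted")
    return round(100 * yes / total)


# (YES, NO) counts taken from the lists above.
status_counts = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 8),  # NO is listed as 4 + 4
    "IPv6 allocation": (8, 11),    # NO counted from the list (assumption)
    "Dual stack nodes": (4, 15),
}

for name, (yes, no) in status_counts.items():
    print(f"{name}: {percent_yes(yes, no)}% of {yes + no} sites")
```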
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment was not transparent, but many issues have been solved and the grid is full again<br />
<br />
• MC15 is expected to start soon and is waiting for physics validation; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFilesLost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/RelocatableGlexecRelocatableGlexec2015-11-02T14:59:14Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Note:''' The title is misleading: for security reasons glexec can't be truly relocatable, but it can be built to use a binary and config path different from the defaults, which allows glexec to be exported and used in a tarball environment.<br />
<br />
=Building GLEXEC to suit your site's tarball needs=<br />
(with reference to [[EMITarball]])<br />
<br />
'''Work in Progress'''<br />
<br />
Please note that we are unable to support glexec directly within the tarball, for many reasons. Listed below is a possible method (still being tested) for a site to build their own relocatable glexec. A group of sites using the same convention for tarball mount points could share the same glexec build to lower the total workload.<br />
<br />
We welcome all feedback on the tickets listed below, or to the tarball support e-mail ( tarball-grid-support atSPAMNOT cern.ch ).<br />
<br />
==Acknowledgements and Further Reading==<br />
Please refer to the glexec web pages for more information:<br /><br />
https://wiki.nikhef.nl/grid/GLExec<br />
<br />
with particular thanks to the writers of:<br><br />
https://wiki.nikhef.nl/grid/Building_gLExec_from_src_rpm<br />
<br />
(the script I use is an updated version of the example given).<br />
<br />
and:<br />
https://wiki.nikhef.nl/grid/GLExec_Argus_Quick_Installation_Guide<br />
<br />
==Requirements==<br />
<br />
* A clean SL6 system, similar to the nodes that you will run on. It will need network connectivity.<br />
* gcc and rpm-build packages installed, as well as the glexec user that you will use on your cluster.<br />
* The script below, or one like it:<br />
<br />
#!/bin/sh<br />
<br />
# SET CUSTOM BUILD ARGUMENTS HERE<br />
<br />
<br />
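# EMI and EPEL directories<br />
glexec_pfx=/opt/gridapps/glexec<br />
glexec_etc=/opt/gridapps/glexec/etc<br />
glexec_doc=/opt/gridapps/glexec/doc<br />
<br />
# LCAS and LCMAPS module directory suffixes, used by the sed patch below<br />
# (assumed values, matching the moduledir_sfx settings in glexec.conf; adjust for your site)<br />
lcmaps_moddir_sfx=/lcmaps<br />
lcas_moddir_sfx=/lcas<br />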
<br />
<br />
# END OF BUILD ARGUMENTS<br />
<br />
# Setup build infrastructure<br />
export TOPDIR=`pwd`<br />
mkdir -p $TOPDIR/{SRPMS,SOURCES,SPECS,BUILD,RPMS/x86_64,RPMS/i386}<br />
<br />
# Download and install lcmaps-interface and glexec src<br />
rpm2cpio http://software.nikhef.nl/dist/mwsec/rpm/epel6/x86_64/lcmaps-basic-interface-1.6.1-1.el6.noarch.rpm | cpio -id<br />
rpm --define "_topdir $TOPDIR" -i http://software.nikhef.nl/dist/mwsec/rpm/epel6/SRPMS/glexec-0.9.11-1.el6.src.rpm<br />
<br />
# Patch spec file to match module directories for LCAS and LCMAPS<br />
sed -i "s+^\(%configure\).*+\1 --with-lcmaps-moduledir-sfx=$lcmaps_moddir_sfx --with-lcas-moduledir-sfx=$lcas_moddir_sfx+" $TOPDIR/SPECS/glexec.spec<br />
<br />
# Build the RPM<br />
CFLAGS=-I$TOPDIR/usr/include rpmbuild \<br />
--nodeps \<br />
-ba --define "_topdir $TOPDIR" \<br />
--define "_prefix $glexec_pfx" \<br />
--define "_sysconfdir $glexec_etc" \<br />
--define "_defaultdocdir $glexec_doc" \<br />
$TOPDIR/SPECS/glexec.spec<br />
<br />
<br />
The important site variables are glexec_pfx, which should be your tarball mount point (the glexec binary will end up in $glexec_pfx/sbin), and glexec_etc, which should point to the directory where the glexec.conf file will be kept. Check the two RPM URLs before building to make sure they are current and point to the latest release.<br />
<br />
Once run, the script leaves an RPM under RPMS for you to unpack. You can do this with:<br />
<br />
rpm2cpio RPMS/x86_64/$glexec_rpm | cpio -dim<br />
<br />
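The $glexec_rpm name above depends on the version you built. A minimal sketch for finding it, assuming the binary package follows the usual glexec-VERSION naming (the helper name find_glexec_rpm is hypothetical and not part of glexec):<br />

```shell
# Print the path of the first glexec binary RPM in a directory.
# The glexec-[0-9]* pattern is an assumption: it skips any
# glexec-debuginfo package that rpmbuild may also produce.
find_glexec_rpm() {
    for f in "${1:-RPMS/x86_64}"/glexec-[0-9]*.rpm; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Usage, from $TOPDIR:
#   rpm2cpio "$(find_glexec_rpm)" | cpio -dim
```

The function returns non-zero when nothing matches, so a failed build is noticed before cpio is run on an empty name.<br />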
You will then probably need to do some directory pruning before you have something you can load into your shared area. The glexec.conf file will need to have its ownership and permissions changed, probably to glexec.glexec, 0400. The glexec/sbin directory will likely need to be put into your $PATH environment variable.<br />
<br />
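The ownership, permission and $PATH steps above could be sketched as follows. The function names and the GLEXEC_PFX variable are hypothetical, and the prefix is assumed to match glexec_pfx from the build script:<br />

```shell
# Assumed prefix; should match glexec_pfx from the build script.
GLEXEC_PFX=${GLEXEC_PFX:-/opt/gridapps/glexec}

# Make the config file readable by its owner only (0400) and,
# where possible, owned by glexec:glexec.
secure_glexec_conf() {
    conf=$1
    # chown needs root and an existing glexec account; skipped otherwise.
    if [ "$(id -u)" -eq 0 ] && id glexec >/dev/null 2>&1; then
        chown glexec:glexec "$conf" || return 1
    fi
    chmod 0400 "$conf"
}

# Prepend a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Usage:
#   secure_glexec_conf "$GLEXEC_PFX/etc/glexec.conf"
#   add_to_path "$GLEXEC_PFX/sbin"
```

Treat this as a starting point only; the ownership and mode that glexec actually accepts should be checked against the glexec documentation for your version.<br />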
'''If you plan to use the (recommended) setuid mode, you will need to export and mount your tarballs so that glexec's suid properties aren't squashed. To this end we recommend exporting glexec on its own mount, in parallel to the "regular" tarball rather than on the same mount.'''<br />
<br />
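One way to check on a worker node that the suid properties survived the export is sketched below. The helper names, the mount point and the binary path are assumptions for illustration:<br />

```shell
# Succeed if a comma-separated mount option string does not contain nosuid.
mount_allows_suid() {
    case ",$1," in
        *,nosuid,*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeed if the file exists and still carries its setuid bit.
has_setuid_bit() {
    [ -u "$1" ]
}

# Usage on a worker node (mount point and binary path are assumptions):
#   opts=$(awk -v m=/opt/gridapps/glexec '$2 == m { print $4 }' /proc/mounts)
#   mount_allows_suid "$opts" && has_setuid_bit /opt/gridapps/glexec/sbin/glexec
```

The mount options are read from /proc/mounts, where the fourth field holds the option list for each mount point.<br />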
==Other Dependencies==<br />
Currently this isn't sufficient to get glexec working - one needs to have additional lcas dependencies "installed" for glexec to work. The list is currently thought to be:<br />
lcmaps<br />
lcmaps-plugins-basic<br />
lcmaps-plugins-c-pep<br />
lcmaps-plugins-tracking-groupid<br />
lcmaps-plugins-verify-proxy<br />
lcmaps-plugins-voms<br />
<br />
We suggest installing these within the glexec path and pointing glexec at them by editing the "lcas_libdir" and "lcmaps_libdir" variables, as well as possibly the "lcas_moduledir_sfx" and "lcmaps_moduledir_sfx" settings, in the ''glexec.conf''.<br />
<br />
Update 15 October 2015:<br />
<br />
The list of needed libraries is expanding, requiring globus gsi libraries on top of lcas/lcmaps. We are attempting to keep the number of packages needed down.<br />
<br />
==Library Path Problems==<br />
''Update 2nd Nov 2015:'' For obvious reasons glexec does not respect the LD_LIBRARY_PATH environment variable. This leads to errors in execution when using glexec outside of the normal paths (as libraries fail to dynamically link).<br />
<br />
An easy fix is to add a file called glexec.conf to /etc/ld.so.conf.d/ that contains the full path to the usr/lib64 directory. This is however not a very "tarball-y" or relocatable solution. With a lot of help from the glexec developers (to whom we are indebted) we are working on a method to modify the RPATHs of the necessary libraries.<br />
<br />
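The ld.so.conf.d workaround could be scripted roughly as below. The function name is hypothetical, and the library path is taken from the glexec.conf example on this page:<br />

```shell
# Write an ld.so.conf.d drop-in naming the glexec library directory.
# $1 = library directory, $2 = conf directory (defaults to /etc/ld.so.conf.d)
write_glexec_ldconf() {
    libdir=$1
    confdir=${2:-/etc/ld.so.conf.d}
    printf '%s\n' "$libdir" > "$confdir/glexec.conf"
}

# As root on each worker node, then refresh the linker cache:
#   write_glexec_ldconf /opt/gridapps/glexec/usr/lib64 && ldconfig
```

The second argument exists only so the function can be tried against a scratch directory before touching /etc.<br />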
==Notes on glexec.conf settings==<br />
The lcas_libdir and lcmaps_libdir variables are very particular: each must be an absolute directory path, for example:<br />
lcas_libdir = /opt/gridapps/glexec/usr/lib64<br />
lcas_moduledir_sfx = /lcas/<br />
lcmaps_libdir = /opt/gridapps/glexec/usr/lib64<br />
lcmaps_moduledir_sfx = /lcmaps/<br />
<br />
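A quick sanity check for the absolute-path requirement might look like this. It is a sketch that assumes the "key = value" layout shown above; check_libdirs is a hypothetical helper, not a glexec tool:<br />

```shell
# Succeed only if every lcas_libdir / lcmaps_libdir value in the given
# glexec.conf is an absolute path; offending lines are printed.
check_libdirs() {
    awk -F= '
        BEGIN { bad = 0 }
        /^[ \t]*lc(as|maps)_libdir[ \t]*=/ {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding whitespace
            if (v !~ /^\//) { print; bad = 1 }
        }
        END { exit bad }
    ' "$1"
}

# Usage:
#   check_libdirs /opt/gridapps/glexec/etc/glexec.conf || echo "fix libdir paths"
```

It only tests that each value starts with "/"; it does not verify that the directory exists or contains the expected libraries.<br />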
==glexec in cvmfs==<br />
<br />
With reference to the ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=116154 116154] we are investigating making glexec available through cvmfs - although it is early days yet, and we '''cannot''' at this juncture recommend sites mounting cvmfs with suid enabled.<br />
<br />
==ggus ticket==<br />
Please also see the original glexec tarball ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=95832 95832] (submitted by the tarball devs to themselves).</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-11-02T13:39:11Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 2nd November 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* HEPiX took place last week: [https://indico.cern.ch/event/384358/timetable/#all.detailed Agenda].<br />
* pheno: VO ID card for SW_DIR. For contacts check [http://operations-portal.egi.eu/vo/help this link].<br />
* [https://indico.cern.ch/e/453272 GridPP Technical Meeting] on Friday "virtual only". <br />
* The latest WLCG [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek151026#Monday ops meeting minutes from Monday are available]. <br />
* There was a UK-T0 meeting last week. The other community talks may be of interest, they are linked from the [https://eventbooking.stfc.ac.uk/news-events/uk-t0-workshop-296?agenda=1 agenda].<br />
* Current GridPP vacancies (+ putting them on the GridPP website).<br />
* David Crooks is organising a "SOC" meeting. <br />
<br />
* Raja raised: "ARC CE publishing" and querying the BDII. <br />
* Luke asked about HTC CE documentation links (for European installation).<br />
* John H asked for comments on a dmlite message after update to dpm 1.8.10. <br />
<br />
'''Tuesday 13th October'''<br />
* Krishan mentions lcg-tags errors at EFDA-JET. <br />
* BDII installations are affected by a slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, from openldap-servers-2.4.40-5 onwards an issue has been introduced that causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware the jobreport_options of "/var/run" which clears records on reboot.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%: 100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47% ; Lancaster: 75%: 80% & ECDF: 89%<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas][https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
* There was a WLCG ops coordination meeting last Thursday: [https://indico.cern.ch/event/393618/ Agenda] - [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151022 Minutes]. <br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 27th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-21 here]<br />
* There is an issue on the WMS machines where a user was creating very large output files and filling up the disk partition.<br />
* We had two disk server failures over the weekend (Both part of Atlas Tape).<br />
* Also over the weekend, there was a problem with one of the ARC CEs, which we believe was due to problems with the hypervisor it was running on. <br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* As reported this last couple of weeks - we did have a problem with glexec for the worker nodes over a weekend. We are trying to understand why this problem was not seen during the testing and roll-out of the new worker node configuration.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There are also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Tuesday 20th October'''<br />
* The IGTF has released a regular update to the trust anchor repository ([https://rt.egi.eu/rt/Ticket/Display.html?id=9668 1.69]) - for distribution ON OR AFTER October 26th<br />
<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [https://maddash.aglt2.org/maddash-webui/ PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 2nd November 2015, 13.30 GMT'''<br /><br />
23 Open UK Tickets this week. <br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status.<br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation.<br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE.<br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing the Nagios test because the EGI repository was unavailable; the Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage.<br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
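One way to avoid relying on a cached broker list is to probe the endpoints directly and use the first one that answers. The following is only a minimal sketch, assuming a plain TCP connection is an adequate liveness check; the helper names <code>tcp_reachable</code> and <code>pick_broker</code> are hypothetical and are not part of the SAM/Nagios tooling.<br />

```python
import socket
from urllib.parse import urlparse


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_broker(urls, is_reachable=tcp_reachable):
    """Return the first broker URL whose endpoint is reachable, else None."""
    for url in urls:
        parsed = urlparse(url)
        if is_reachable(parsed.hostname, parsed.port):
            return url
    return None


# Broker endpoints quoted in the note above, in preference order.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]
```

Calling <code>pick_broker(BROKERS)</code> performs the real network probe; passing a different <code>is_reachable</code> function lets the selection logic be exercised without touching the network.<br />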
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)<br />
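<br />
The percentages quoted above appear to be the YES count divided by the 19 sites listed (YES plus NO in each case), rounded to the nearest whole number. A small sketch of that arithmetic, with the YES counts copied from the lists above and the short category labels chosen here purely for illustration:<br />

```python
# Total taken from the YES + NO counts in each list above.
TOTAL_SITES = 19

# Number of sites answering YES in each category (from the lists above).
yes_counts = {
    "Multicore queues": 12,
    "Cloud/VM interface": 5,
    "GridPP DIRAC jobs": 11,
    "IPv6 allocation": 8,
    "Dual stack nodes": 4,
}


def percent_yes(yes, total=TOTAL_SITES):
    """Percentage of sites answering YES, rounded to the nearest integer."""
    return round(100 * yes / total)


summary = {name: percent_yes(n) for name, n in yes_counts.items()}
for name, pct in summary.items():
    print(f"{name}: {pct}%")
```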
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14; no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board].<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-11-02T13:38:48Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 26th October 2015, 15.30 GMT'''<br /><br />
26 Open UK Tickets this week. Not many seem all that exciting though.<br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
The few (two) tickets that really caught my eye are:<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
This ticket is quite interesting, mainly for Dan schooling the submitter. QM received a ticket complaining that their jumbo frames were breaking stuff - it doesn't look like the problem is at QM though. Naught wrong with the ticket handling. Waiting for reply (23/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065] (20/10)<br /><br />
Bristol have a CMS glexec error ticket that looks very similar to an existing one ([https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683]), which is in turn spookily similar to a ticket being worked on by the Bristol admins ([https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775]). At the very least I would say that if the two problems are different one would likely obfuscate the other. Is this a case of over-keen shifters submitting tickets without checking? I'd be tempted to close one or both of [https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] and this one ([https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065]). Tell them I said it was okay to[1]. In progress (21/10)<br />
<br />
[1] It probably is okay to.<br />
<br />
There are also 4 availability tickets, all On Hold waiting for 30 days or so to pass.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting this sort of ticket? Assigned (15/10) ''Update - Alessandra has updated and On-Holded the ticket. Thanks!''<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue), I think this one has snuck past the Liver-Lad's watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket. Dan looks like he's tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed, as Gareth confirms that the test is green. Waiting for reply (19/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of pool accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing an "export CMTEXTRATAGS="host-slc5"" line. Naught wrong with the ticket, but I liked how LHCB did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asks if the problem persists, the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652] (1/10)<br /><br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651] (1/10)<br /><br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (Long, long ago)<br /><br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576] (1/10)<br /><br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585] (1/10)<br /><br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082] (7/9)<br /><br />
A classic ROD availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br /><br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL) and asks if this is the cause of the problem. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting. Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
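The difference between the two counts is easy to see in a small sketch. The input format below (one path and error message per failed attempt) and the file names are made up for illustration; this is not the format of the ATLAS deletion dashboard.

```python
from collections import Counter


def summarise_errors(attempts):
    """Summarise failed deletion attempts.

    `attempts` is an iterable of (file_path, error_message) tuples.
    This layout is an assumption for the example, not a real export format.
    Returns (total_failed_attempts, unique_affected_files, per_file_counts).
    """
    per_file = Counter(path for path, _error in attempts)
    total = sum(per_file.values())
    return total, len(per_file), per_file


# Hypothetical data: three files on a dead disk server, each retried several times.
failed = (
    [("/datadisk/file_a", "connection refused")] * 40
    + [("/datadisk/file_b", "connection refused")] * 35
    + [("/datadisk/file_c", "connection refused")] * 25
)

total, unique_files, per_file = summarise_errors(failed)
print(f"{total} failed attempts affecting {unique_files} unique files")
```

With retries in play the raw error count grows with every pass over the same files, while the unique file count stays put, which is why the second number is the better measure of how much is actually broken.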
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at - definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399](31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AAA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM now, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seemed to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance, the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br /><br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S opened this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to raise is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or more precisely have submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list, I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
<br />
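For anyone wanting to hunt down their own rogues, the check behind a list like the one above can be sketched in a few lines of Python. This is only an illustration: the record layout (submission host, core count, number of jobs), the function name and the hostnames are all made up for the example, and it is not the real APEL summary format or tooling.<br />

```python
from collections import defaultdict


def zero_core_hosts(records):
    """Return {submit_host: job_count} for hosts with any 0-core records.

    `records` is a hypothetical iterable of (submit_host, cores, njobs)
    tuples; it is NOT the real APEL summary schema.
    """
    counts = defaultdict(int)
    for host, cores, njobs in records:
        if cores == 0:
            # Only jobs published with a core count of zero are of interest.
            counts[host] += njobs
    return dict(counts)


# Invented sample data with placeholder hostnames.
sample = [
    ("ce-a.example.org", 0, 12),
    ("ce-a.example.org", 8, 40),
    ("ce-b.example.org", 1, 100),
    ("ce-c.example.org", 0, 5),
]

rogues = zero_core_hosts(sample)
for host in sorted(rogues):
    print(host, rogues[host])
```

A host that publishes a mix of zero-core and correct records (like the first placeholder above) still shows up, which matches the note that recently fixed sites can linger on the list.<br />
<br />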
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed AFAICS. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
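<br />
As a rough way of watching out for this kind of clash, the Python sketch below flags any submit host value shared by more than one CE. The CE names, the submit host strings and the idea of holding them in a plain dictionary are assumptions made for the example; how each site actually stores these settings will differ.<br />

```python
from collections import defaultdict


def shared_submit_hosts(ce_to_submit_host):
    """Return {submit_host: [CEs]} for submit host values used by 2+ CEs.

    `ce_to_submit_host` is a hypothetical mapping of CE hostname to the
    submit host string that CE publishes under.
    """
    by_host = defaultdict(list)
    for ce, submit_host in ce_to_submit_host.items():
        by_host[submit_host].append(ce)
    # Keep only the values that collide, with the CE list sorted for stable output.
    return {host: sorted(ces) for host, ces in by_host.items() if len(ces) > 1}


# Invented configuration: two CEs accidentally share one submit host value.
config = {
    "ce1.example.org": "ce1.example.org/queue",
    "ce2.example.org": "ce1.example.org/queue",
    "ce3.example.org": "ce3.example.org/queue",
}

for host, ces in shared_submit_hosts(config).items():
    print(host, "is shared by", ", ".join(ces))
```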
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem and told the submitter to put in a JIRA ticket instead. Waiting for reply, but can probably just be closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, and mitigation is being looked into. Perhaps a candidate for On Holding, depending on the time frame of a workaround. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE, so it might be a site problem after all. Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions falling on deaf ears for a while it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas gLexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) was done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that wasn't suffering from an understood problem was RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAIKS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me, all wisdom donations are gratefully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline - does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCb job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is preparing a replacement xrootd box. If that will take a while, can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for httpd access to the storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
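
The date mix-up above is easy to reproduce: the same string parses to two different expiry dates depending on whether day or month is assumed to come first. This is only an illustration using Python's standard library; it says nothing about how the actual test tooling reports or parses certificate dates.

```python
from datetime import datetime

reported = "5/1/15"  # the expiry date as quoted in the ticket

# UK-style reading: day/month/year
day_first = datetime.strptime(reported, "%d/%m/%y").date()

# US-style reading: month/day/year
month_first = datetime.strptime(reported, "%m/%d/%y").date()

print("day first:  ", day_first.isoformat())
print("month first:", month_first.isoformat())
```

Quoting dates in ISO 8601 form (year-month-day), as `isoformat()` prints them, avoids the ambiguity.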
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, looking to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a CREAM-side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configuration and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-10-26T16:22:53Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 26th October 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 20th October'''<br />
* HEPiX took place last week: [https://indico.cern.ch/event/384358/timetable/#all.detailed Agenda].<br />
* pheno: VO ID card for SW_DIR. For contacts check [http://operations-portal.egi.eu/vo/help this link].<br />
* [https://indico.cern.ch/e/453272 GridPP Technical Meeting] on Friday "virtual only". <br />
<br />
* Raja raised: "ARC CE publishing" and querying the BDII. <br />
* Luke asks about HTC CE documentation links.<br />
<br />
'''Tuesday 13th October'''<br />
* Krishan mentions lcg-tags error problems at EFDA-JET. <br />
* BDII installations affected by slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, from openldap-servers-2.4.40-5 onwards an issue has been introduced which causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware the jobreport_options of "/var/run" which clears records on reboot.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%:100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47%; Lancaster: 75%:80% & ECDF: 89%.<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 20th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-14 here]<br />
* The final step in the upgrade of the Castor Oracle databases to version 11.2.0.4 took place successfully last Wednesday.<br />
* We had seen very high load on Atlas Tape. Last week the number of disk servers in the cache in front of Atlas Tape was doubled. Load (and performance) has been better since, although Atlas also throttled back writes at that time.<br />
* We had three disk server problems Friday/Saturday/Sunday (two Atlas disk servers, one LHCb).<br />
* A configuration change has been made on both our "production" and "test" FTS3 services to fix the ATLAS/rucio stalled connections problem.<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and sometimes fail to write remotely as well).<br />
* As reported this last couple of weeks - we did have a problem with glexec for the worker nodes over a weekend. We are trying to understand why this problem was not seen during the testing and roll-out of the new worker node configuration.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Tuesday 20th October'''<br />
* The IGTF has released a regular update to the trust anchor repository ([https://rt.egi.eu/rt/Ticket/Display.html?id=9668 1.69]) - for distribution ON OR AFTER October 26th<br />
<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 26th October 2015, 15.30 GMT'''<br /><br />
27 Open UK Tickets this week. Not many seem all that exciting though.<br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
The few (two) tickets that really caught my eye are:<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
This ticket is quite interesting, mainly for Dan schooling the submitter. QM received a ticket complaining that their jumbo frames were breaking stuff - it doesn't look like the problem is at QM though. Naught wrong with the ticket handling. Waiting for reply (23/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065] (20/10)<br /><br />
Bristol have a CMS glexec error ticket that looks very similar to an existing one ([https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683]), which is in turn spookily similar to a ticket being worked on by the Bristol admins ([https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775]). At the very least I would say that if the two problems are different one would likely obfuscate the other. Is this a case of over-keen shifters submitting tickets without checking? I'd be tempted to close one or both of [https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] and this one ([https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065]). Tell them I said it was okay to[1]. In progress (21/10)<br />
<br />
[1] It probably is okay to.<br />
<br />
There are also 4 availability tickets, all On Hold waiting for 30 days or so to pass.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can look at https://mon.egi.eu/myegi/ss/ for A/R status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to another SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing Nagios tests because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the web server stopped responding, and went away after approximately 3 hours. It started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
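The failover that was expected in the entry above can be illustrated with a minimal sketch. It assumes a simple ordered list of brokers and a plain TCP reachability probe; the names <code>BROKERS</code>, <code>tcp_reachable</code> and <code>first_reachable</code> are hypothetical, and this is not how the SAM/Nagios tooling or the BDII actually selects a broker. Only the two broker URLs come from the entry itself.

```python
import socket
from urllib.parse import urlsplit

# Broker endpoints quoted in the bulletin entry, in assumed preference order.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(brokers, probe=tcp_reachable):
    """Return the first broker URL whose endpoint answers the probe, else None."""
    for url in brokers:
        parts = urlsplit(url)
        if probe(parts.hostname, parts.port):
            return url
    return None
```

Calling <code>first_reachable(BROKERS)</code> walks the list in order, so a stale cached list (as suspected with the BDII) would keep returning the first entry until it is refreshed. The probe is passed in as a parameter so the selection logic can be exercised without touching the network.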
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)<br />
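The percentages quoted in the 2nd December lists appear to be the share of YES sites out of all 19 sites listed. A minimal sketch of that arithmetic follows, assuming rounding to the nearest whole percent; the names <code>STATUS_COUNTS</code> and <code>percent_yes</code> are illustrative only, and the counts are read off the lists above (the IPv6 allocation NO count of 11 is counted from the list rather than stated in it).

```python
# (YES, NO) site counts read off the 2nd December status lists.
STATUS_COUNTS = {
    "Multicore queues": (12, 7),
    "Cloud/VM": (5, 14),
    "GridPP DIRAC jobs": (11, 8),  # NO is listed as 4 + 4
    "IPv6 allocation": (8, 11),    # NO count taken by counting the list
    "Dual stack nodes": (4, 15),
}


def percent_yes(yes, no):
    """Share of sites answering YES, rounded to the nearest whole percent."""
    return round(100 * yes / (yes + no))


PERCENTAGES = {
    name: percent_yes(yes, no) for name, (yes, no) in STATUS_COUNTS.items()
}

for name, pct in PERCENTAGES.items():
    print(f"{name}: {pct}%")
```

Under these assumptions the results match the figures quoted in the lists (63%, 26%, 58%, 42% and 21%).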
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-10-26T16:22:41Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 26th October 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 20th October'''<br />
* HEPiX took place last week: [https://indico.cern.ch/event/384358/timetable/#all.detailed Agenda].<br />
* pheno: VO ID card for SW_DIR. For contacts check [http://operations-portal.egi.eu/vo/help this link].<br />
* [https://indico.cern.ch/e/453272 GridPP Technical Meeting] on Friday "virtual only". <br />
<br />
* Raja raised: "ARC CE publishing" and querying the BDII. <br />
* Luke asks about HTC CE documentation links.<br />
<br />
'''Tuesday 13th October'''<br />
* Krishan mentions lcg-tags errors at EFDA-JET.<br />
* BDII installations affected by slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, since openldap-servers-2.4.40-5 an issue has been present that causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware the jobreport_options of "/var/run" which clears records on reboot.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%: 100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47% ; Lancaster: 75%: 80% & ECDF: 89%<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
'''Tuesday 27th October'''<br />
* 13th MW Readiness WG meeting THIS Wed 28/10 @ 4pm CET in CERN room 28-S-023 or via vidyo.<br />
<br />
'''Next Meeting is scheduled for Thursday 22nd October'''<br />
<br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 20th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-14 here]<br />
* The final step in the upgrade of the Castor Oracle databases to version 11.2.0.4 took place successfully last Wednesday.<br />
* We had seen very high load on Atlas Tape. Last week the number of disk servers in the cache in front of Atlas Tape was doubled. Load (and performance) has been better since, although Atlas also throttled back writes at that time.<br />
* We had three disk server problems Friday/Saturday/Sunday (Two Atlas disk servers, One LHCb).<br />
* A configuration change has been made on both our "production" and "test" FTS3 services to fix the ATLAS/rucio stalled connections problem.<br />
* We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail to write remotely as well).<br />
* As reported this last couple of weeks - we did have a problem with glexec for the worker nodes over a weekend. We are trying to understand why this problem was not seen during the testing and roll-out of the new worker node configuration.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20 Oct'''<br />
* LHCb multipilot VMs now in production<br />
* Support for APEL-Sync records in Vac, but need to co-ordinate with APEL team to validate it. This is to allow pure-VM sites like UCL to pass APEL SAM tests (GRIDPP-10)<br />
* Last GridPP Technical Meeting decided to test disk-less operation at Oxford for CMS (GRIDPP-20) and LHCb (GRIDPP-21).<br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th October'''<br />
The WLCG MB decided to create a Benchmarking Task Force led by Helge Meinhard; see the [https://indico.cern.ch/event/433303/contribution/9/attachments/1154494/1658853/2015-09-15-LCGMB-Benchmarking.pdf talk].<br />
<br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October, 2015'''<br />
<br />
Approved VOs document updated with temporary section for [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#Approved_VOs_in_the_process_of_being_established LZ]<br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with voms server at fnal, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Meeting last week: https://wiki.egi.eu/wiki/Agenda-12-10-2015<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th October'''<br />
<br />
Lots of bits here and there, but no big pattern. Tickets about CE and storage problems open at several sites. QMUL notable as going on for a while, probably with some kind of configuration problem they're not identifying.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 26th October'''<br />
* Updated IGTF distribution version [https://dist.igtf.net/distribution/igtf/current/ 1.69 available].<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Tuesday 20th October'''<br />
* The IGTF has released a regular update to the trust anchor repository ([https://rt.egi.eu/rt/Ticket/Display.html?id=9668 1.69]) - for distribution ON OR AFTER October 26th<br />
<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 26th October 2015, 15.30 GMT'''<br /><br />
27 Open UK Tickets this week. Not many seem all that exciting though.<br />
<br />
The link to all the UK [http://tinyurl.com/nwgrnys '''tickets'''].<br />
<br />
The few (two) tickets that really caught my eye are:<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117151 117151] (23/10)<br /><br />
This ticket is quite interesting, mainly for Dan schooling the submitter. QM received a ticket complaining that their jumbo frames were breaking stuff - it doesn't look like the problem is at QM though. Naught wrong with the ticket handling. Waiting for reply (23/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065] (20/10)<br /><br />
Bristol have a CMS glexec error ticket that looks very similar to an existing one ([https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683]), which is in turn spookily similar to a ticket being worked on by the Bristol admins ([https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775]). At the very least I would say that if the two problems are different one would likely obfuscate the other. Is this a case of over-keen shifters submitting tickets without checking? I'd be tempted to close one or both of [https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] and this one ([https://ggus.eu/?mode=ticket_info&ticket_id=117065 117065]). Tell them I said it was okay to[1]. In progress (21/10)<br />
<br />
[1] It probably is okay to.<br />
<br />
There are also 4 availability tickets, all On Hold waiting for 30 days or so to pass.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios'''] looks clean at time of writing.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, which may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to another SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing the Nagios test because the EGI repository was unavailable; the test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
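A minimal sketch of the failover the note above expected to happen: try the listed brokers in order and use the first one that responds. The two broker URLs come from the entry above; <code>pick_broker</code> and the <code>is_alive</code> callable are hypothetical names introduced here for illustration, and are not part of the SAM/Nagios tooling.

```python
from urllib.parse import urlparse

# Broker URLs quoted in the bulletin, in the preferred order.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def parse_broker(url):
    """Split a stomp:// URL into (host, port)."""
    parts = urlparse(url)
    return parts.hostname, parts.port


def pick_broker(urls, is_alive):
    """Return the first URL whose broker passes is_alive(host, port).

    is_alive is a hypothetical check supplied by the caller (for example a
    TCP connect test); returns None if no broker passes.
    """
    for url in urls:
        host, port = parse_broker(url)
        if is_alive(host, port):
            return url
    return None


if __name__ == "__main__":
    # Simulate the GRNET broker being down.
    down = {"mq.afroditi.hellasgrid.gr"}
    print(pick_broker(BROKERS, lambda host, port: host not in down))
```

If the broker list is cached (as suspected for the BDII above), a check like this only helps when it runs against a fresh list.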
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
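As a rough illustration of the port change above, the sketch below rewrites the port field of a vomses-style line (five quoted fields: alias, host, port, server DN, VO name). The DN in the example is a placeholder, not the real server DN, and <code>update_vomses_port</code> is a hypothetical helper; check your own vomses files before changing anything.

```python
import shlex


def update_vomses_port(line, hosts, old_port, new_port):
    """Return the line with the port swapped if host and old port match."""
    fields = shlex.split(line)
    if len(fields) >= 3 and fields[1] in hosts and fields[2] == str(old_port):
        fields[2] = str(new_port)
    # Re-quote every field, as in the usual vomses layout.
    return " ".join('"%s"' % f for f in fields)


if __name__ == "__main__":
    # Placeholder DN: substitute the real one from your site configuration.
    example = ('"snoplus.snolab.ca" "voms02.gridpp.ac.uk" "15003" '
               '"/EXAMPLE/CN=voms02.gridpp.ac.uk" "snoplus.snolab.ca"')
    hosts = {"voms02.gridpp.ac.uk", "voms03.gridpp.ac.uk"}
    print(update_vomses_port(example, hosts, 15003, 15503))
```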
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
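The percentages quoted above appear to be the YES count over the 19 sites listed in each breakdown, rounded to the nearest whole number. A small check, assuming that is how they were derived (the dictionary keys are labels chosen here, and the IPv6 allocation NO count of 11 is taken from counting the sites named in that list):

```python
def percent(yes, no):
    """Share of YES sites as a whole-number percentage."""
    return round(100 * yes / (yes + no))


# (YES, NO) counts taken from the lists above.
counts = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 8),   # NO is listed as 4 + 4
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

if __name__ == "__main__":
    for name, (yes, no) in counts.items():
        print(f"{name}: {percent(yes, no)}%")
```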
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved, and the grid is full again. <br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC Run-2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-10-26T16:15:08Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting this sort of ticket. Assigned (15/10) ''Update - Alessandra has updated and On-Holded the ticket. Thanks!''<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue), I think this one has snuck past the Liver-Lad's watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket. Dan looks like he's tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed, as Gareth confirms that the test is green. Waiting for reply (19/10)<br />
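For context on the MaxStartups setting mentioned above: sshd takes it as <code>start:rate:full</code> and begins refusing unauthenticated connections at random once <code>start</code> are pending, with the refusal chance rising linearly from <code>rate</code> percent to 100 percent at <code>full</code>. The sketch below models that behaviour using the OpenSSH default of 10:30:100; the ticket does not say which values QMUL used or changed, so the numbers here are illustrative only.

```python
def drop_probability(unauthenticated, start=10, rate=30, full=100):
    """Chance that sshd refuses a new connection, per MaxStartups start:rate:full.

    unauthenticated is the number of connections that have not yet logged in.
    """
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    base = rate / 100
    # Linear ramp from rate% at `start` up to 100% at `full`.
    return base + (1 - base) * (unauthenticated - start) / (full - start)


if __name__ == "__main__":
    for pending in (5, 10, 55, 100):
        print(pending, drop_probability(pending))
```

A CE that receives many simultaneous ssh connections can hit this threshold, which would show up as intermittent connection failures.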
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of pool accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing an "export CMTEXTRATAGS="host-slc5"" line. Naught wrong with the ticket, but I liked how LHCB did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asks if the problem persists, the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652](1/10)<br /><br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651](1/10)<br /><br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303](Long, long ago)<br /><br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576](1/10)<br /><br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560](30/9)<br /><br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460](18/6)<br /><br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585](1/10)<br /><br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082](7/9)<br /><br />
A classic Rod Availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br /><br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL - and asks if this is the cause of the problem?) Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting: Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but I thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at - definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket, [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seems to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114442 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance, the last batch of tests didn't show improved results. No news since. In progress (17/6)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.

'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' 
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S formed this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or more precisely, having submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) 
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed afaics. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd redirector configured. New problems have cropped up, and are under investigation. In progress (11/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksum not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depending on the time frame of a workaround. In progress (13/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor can be added to this ticket, for the ROD team's information. On Hold (7/4)
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181] (21/4)<br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (11/2)<br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875] (7/4)<br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967] (10/4)<br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I don't quite believe what I was seeing with the tests I performed, so I'm aiming to rerun them. On hold (13/4)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions fell on deaf ears for a while it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 112896] (9/4)<br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br/ ><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br/ ><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br/ ><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br/ ><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br/ ><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br/ ><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas gLexec Hammercloud failures (RALPP and Tier 1)<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br/ ><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br/ ><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br/ ><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br/ ><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) has been done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of these remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This CMS glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that isn't suffering an understood problem was RALPP with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
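A "Failed to connect to XXXX(IPv4):443" message suggests a first sanity check of whether the port is reachable at all from the submitting host. The following is a minimal sketch, not part of the nagios tests: the function name <code>can_connect</code> is made up for illustration, and it only tests plain TCP reachability, saying nothing about TLS, certificates or authorisation.

```python
import socket

def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it checks reachability only.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling <code>can_connect("ce.example.org")</code> (a placeholder host name, to be replaced with the CE named in the failing test) from the machine that runs the tests would show whether the failure is at the network level or further up the stack.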
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAIKS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me all wisdom donations are gratefully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
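On window sizes, the usual rule of thumb is that a single TCP stream needs a window of at least the bandwidth-delay product (link bandwidth times round-trip time) to fill the path. A small sketch of the arithmetic follows; the 10 Gbit/s and 10 ms figures are assumed purely for illustration and are not measurements from the Lancaster tests.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_seconds):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return bandwidth_bits_per_s * rtt_seconds / 8

# ASSUMED figures for illustration only: a 10 Gbit/s link and a 10 ms RTT.
window = bdp_bytes(10e9, 0.010)
print(f"{window / 1e6:.1f} MB")
```

If the window in use is well below this figure, a single stream cannot reach line rate however healthy the network is.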
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCB job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is building a replacement xrootd box which is currently being prepared. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for httpd access at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
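The date mix-up is easy to reproduce. As a small illustration (not anything taken from the ticket or the tests themselves), the same string parses cleanly under both the day-first and month-first conventions, which is why nothing flags the ambiguity:

```python
from datetime import datetime

def parse_both_ways(text):
    """Return (day-first, month-first) readings of a d/m/yy style date."""
    day_first = datetime.strptime(text, "%d/%m/%y").date()
    month_first = datetime.strptime(text, "%m/%d/%y").date()
    return day_first, month_first

uk_reading, us_reading = parse_both_ways("5/1/15")
print(uk_reading.isoformat())  # 2015-01-05
print(us_reading.isoformat())  # 2015-05-01
```

Quoting expiry dates in ISO form (2015-05-01) avoids the problem.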
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, looking to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a cream side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the mean time this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although there are still some questions that the 100IT chaps have for the configurations and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get those in the know's attention. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-10-19T15:43:36Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 19th October 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
* Krishan mentions lcg-tags errors at EFDA-JET. <br />
* BDII installations affected by slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, an issue introduced from openldap-servers-2.4.40-5 onwards causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware pointing jobreport_options at "/var/run", which is cleared on reboot, taking the records with it.<br />
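
Daniela's warning above can be turned into a quick check. The following is a minimal sketch only: it assumes that jobreport_options is a comma-separated list of key:value items and that the archive directory is given by an "archiving:" entry. The example option string and the list of volatile directories are illustrative assumptions, not taken from any site's configuration.

```python
import posixpath

# Directories that are typically cleared at boot (an assumption; adjust per site).
VOLATILE_PREFIXES = ("/var/run", "/run", "/tmp")


def archiving_dir(jobreport_options):
    """Return the path from the 'archiving:' entry, or None if there is none."""
    for item in jobreport_options.split(","):
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = item.strip().partition(":")
        if sep and key == "archiving":
            return posixpath.normpath(value)
    return None


def is_volatile(path):
    """True if path is one of the volatile directories or lives below one."""
    return any(path == p or path.startswith(p + "/") for p in VOLATILE_PREFIXES)


if __name__ == "__main__":
    # Hypothetical option string, for illustration only.
    options = "urbatch:50,archiving:/var/run/arc/urs,topic:/queue/example"
    path = archiving_dir(options)
    if path is not None and is_volatile(path):
        print("WARNING: archived records in %s will be lost on reboot" % path)
```

Moving the archive to a directory on persistent disk would make the check pass.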
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%:100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47% ; Lancaster: 75%:80% & ECDF: 89%<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<br />
'''Tuesday 29th September'''<br />
* Nagios (and therefore the regional dashboard) was affected by a weekend A/C outage at Oxford.<br />
* Steve J reports on: Condor libglobus_common problems<br />
* There was an EGI OMB on 24th September. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2380 Agenda]. <br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150928#Monday Notes from the Monday biweekly WLCG ops meeting] are available for anyone who is interested in the latest ops news.<br />
* On the topic 'Perfsonar Bandwidth checks not running' Duncan reported a move to a [https://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=Latency%20tests%20between%20all%20WLCG%20hosts full WLCG mesh].<br />
* Tom would appreciate feedback on the [https://vm36.tier2.hep.manchester.ac.uk/ GridPP website v2].<br />
* Steve Lloyd has set up a [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics2.html new metrics page] as a basis for allocating T2 hardware funding. This just uses total Disk and total Elapsed and/or CPU time. In the PMB yesterday it was agreed that Elapsed time would be used, but the results of various combinations will be watched and assessed over the coming months. One overriding reason for using Elapsed time is that CPU is not provided by all cloud implementations.<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it at least to version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 13th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-07 here]<br />
* The final step in the upgrade of the Castor Oracle databases to version 11.2.0.4 is taking place '''today'''. At the time of the meeting Castor is down.<br />
* There was a successful UPS/Generator load test last Wednesday (7th).<br />
* As reported last week - we did have a problem with glexec for the worker nodes over the weekend before last caused by a configuration error. We are trying to understand why this problem was not seen during the testing and roll-out of the new configuration.<br />
* We have seen very high load on Atlas Tape. We are increasing the size of the disk buffer in front of this in order to improve its performance.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action for the NGIs: please start tracking which sites are still using SL5 services (how many services, and for each service whether it is still needed on SL5 and whether upgrades are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It would also be interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting these sorts of tickets. Assigned (15/10) ''Update - Alessandra has updated the ticket and put it On Hold. Thanks!''<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue), I think this one has snuck past the Liver-Lad's watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket. It looks like Dan has tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed, as Gareth confirms that the test is green. Waiting for reply (19/10)<br />
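For anyone unfamiliar with the setting, here is a minimal sketch of how sshd's MaxStartups throttling behaves, following the start:rate:full semantics described in the sshd_config man page. The values 10:30:100 are illustrative assumptions only; the ticket does not say what QMUL had configured or what it was changed to, and the function name is hypothetical.

```python
def drop_probability(unauthenticated, start=10, rate=30, full=100):
    """Percent chance that sshd refuses a new connection attempt.

    Models the documented MaxStartups "start:rate:full" behaviour.
    The default arguments are example values, not QMUL's settings.
    """
    if unauthenticated < start:
        # Below the threshold nothing is refused.
        return 0.0
    if unauthenticated >= full:
        # At or above the hard limit everything is refused.
        return 100.0
    # Linear ramp from `rate` percent at `start` to 100 percent at `full`.
    span = full - start
    return rate + (100 - rate) * (unauthenticated - start) / span


if __name__ == "__main__":
    for pending in (5, 10, 55, 100):
        print(pending, drop_probability(pending))
```

A CE that receives bursts of simultaneous connections can cross the start threshold and begin refusing some of them at random, which would show up as intermittent test failures.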
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy, as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to another SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable; the test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the web server was not responding, and went away after approximately 3 hours. It started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable was seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
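The percentages in the 2nd December status lists follow from the YES/NO site counts out of 19 sites. A small sketch of the arithmetic is below; the category labels and the `share` helper are made up for illustration, the DIRAC NO figure is the sum of the two groups of 4, and the IPv6 allocation NO figure of 11 is counted from the list of sites.

```python
def share(yes, no):
    """Percentage of sites answering YES, rounded to the nearest integer."""
    return round(100 * yes / (yes + no))


# (YES, NO) counts taken from the lists in the bulletin.
counts = {
    "Multicore queues": (12, 7),
    "Cloud/VM": (5, 14),
    "DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

percentages = {name: share(yes, no) for name, (yes, no) in counts.items()}

if __name__ == "__main__":
    for name, pct in percentages.items():
        print(f"{name}: {pct}%")
```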
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved, and the grid is full again. <br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to being finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months, the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-10-19T15:05:57Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 19th October 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
* Krishan mentions lcg-tags errors at EFDA-JET. <br />
* BDII installations are affected by a slapd crash. openldap-servers-2.4.40-6 was released as a security update for SL6/CentOS6; unfortunately, from openldap-servers-2.4.40-5 onwards an issue was introduced that causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware a jobreport_options archiving location of "/var/run", where records are cleared on reboot.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%:100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47%; Lancaster: 75%:80% & ECDF: 89%<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<br />
'''Tuesday 29th September'''<br />
* Nagios (and therefore the regional dashboard) was affected by a weekend A/C outage at Oxford.<br />
* Steve J reports on: Condor libglobus_common problems<br />
* There was an EGI OMB on 24th September. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2380 Agenda]. <br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150928#Monday Notes from the Monday biweekly WLCG ops meeting] are available for anyone who is interested in the latest ops news.<br />
* On the topic 'Perfsonar Bandwidth checks not running' Duncan reported a move to a [https://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=Latency%20tests%20between%20all%20WLCG%20hosts full WLCG mesh].<br />
* Tom would appreciate feedback on the [https://vm36.tier2.hep.manchester.ac.uk/ GridPP website v2].<br />
* Steve Lloyd has set up a [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics2.html new metrics page] as a basis for allocating T2 hardware funding. This just uses total Disk and total Elapsed and/or CPU time. In the PMB yesterday it was agreed that Elapsed time would be used, but the results of various combinations will be watched and assessed over the coming months. One overriding reason for using Elapsed time is that CPU is not provided by all cloud implementations.<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 13th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-07 here]<br />
* The final step in the upgrade of the Castor Oracle databases to version 11.2.0.4 is taking place '''today'''. At the time of the meeting Castor is down.<br />
* There was a successful UPS/Generator load test last Wednesday (7th).<br />
* As reported last week, a configuration error caused a problem with glexec on the worker nodes over the weekend before last. We are trying to understand why this problem was not seen during the testing and roll-out of the new configuration.<br />
* We have seen very high load on Atlas Tape. We are increasing the size of the disk buffer in front of this in order to improve its performance.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the INDIGO-DataCloud project, an H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. To browse issues you need to be added manually or be on the gridpp-ops@cern.ch mailing list.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action for the NGIs: please start tracking which sites are still using SL5 services (how many services and, for each service, whether it is still needed on SL5 and whether upgrades on SL5 are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It would also be interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 19th October 2015, 14.30 BST'''<br /><br />
28 Open UK Tickets today.<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116920 116920] (14/10)<br /><br />
UCL have an availability ticket, and Andrew M wonders what can be done for them as a VAC site to stop getting this sort of ticket. Assigned (15/10)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116865 116865] (12/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116915 116915] (14/10)<br /><br />
Sussex have a Sno+ ticket and a ROD ticket that don't seem to have been looked at since they were submitted last week. Both just Assigned.<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116918 116918] (14/10)<br /><br />
Another ROD ticket (Invalid glue), I think this one has snuck past the Liver-Lad's watch. Assigned (14/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116782 116782] (7/10)<br /><br />
Another ROD ticket. It looks like Dan has tracked down why one of his CEs is misbehaving (MaxStartups in sshd_config). Looks like this ticket at least can be closed, as Gareth confirms that the test is green. Waiting for reply (19/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560] (30/9)<br /><br />
Stephen B has added some more information to try to figure out why Sno+ jobs are flooding Sheffield's 10 Sno+ slots. Elena is not sure if the problem persists though. Waiting for reply (12/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116864 116864] (12/10)<br /><br />
It looks like this CMS AAA test problem has resolved itself - Federica asks if you chaps at RAL changed anything? Looks like the ticket can be closed. In progress (15/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116866 116866] (12/10)<br /><br />
Enabling Sno+ pilots at the Tier 1. Only the LHC VOs had pilot roles enabled at the Tier 1, Andrew was going to discuss how best to make these changes. As with a similar issue at Lancaster - probably best to do it for all the VOs that will be using Dirac. In progress (13/10)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status.<br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation.<br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to another SE.<br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing a Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage.<br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
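<br />
The percentages quoted in this entry are consistent with dividing the number of YES sites by 19 sites in total (the YES and NO sites in each list add up to 19) and rounding to the nearest whole percent. A minimal Python sketch of that arithmetic follows; the function name and dictionary labels are illustrative only and are not taken from any GridPP tool:<br />

```python
# Assumption: every list in this entry covers the same 19 sites
# (YES count + NO count = 19 in each case).
TOTAL_SITES = 19


def percent_yes(yes_count, total=TOTAL_SITES):
    """Return the share of YES sites as a whole-number percentage."""
    return round(100 * yes_count / total)


# YES counts as given in the lists above.
yes_counts = {
    "multicore queues": 12,
    "cloud/VMs": 5,
    "GridPP DIRAC jobs": 11,
    "IPv6 allocation": 8,
    "dual stack nodes": 4,
}

for label, count in yes_counts.items():
    print(f"{label}: {percent_yes(count)}%")
```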
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to being finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring.<br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board].<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-10-19T09:03:00Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 12th October 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
* Krishan mentions lcg-tags errors at EFDA-JET.<br />
* BDII installations are affected by a slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, an issue present since openldap-servers-2.4.40-5 causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware the jobreport_options of "/var/run" which clears records on reboot.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* During the HPC DIRAC work, RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%:100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47%; Lancaster: 75%:80% & ECDF: 89%.<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<br />
'''Tuesday 29th September'''<br />
* Nagios (and therefore the regional dashboard) was affected by a weekend A/C outage at Oxford.<br />
* Steve J reports on: Condor libglobus_common problems<br />
* There was an EGI OMB on 24th September. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2380 Agenda]. <br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150928#Monday Notes from the Monday biweekly WLCG ops meeting] are available for anyone who is interested in the latest ops news.<br />
* On the topic 'Perfsonar Bandwidth checks not running' Duncan reported a move to a [https://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=Latency%20tests%20between%20all%20WLCG%20hosts full WLCG mesh].<br />
* Tom would appreciate feedback on the [https://vm36.tier2.hep.manchester.ac.uk/ GridPP website v2].<br />
* Steve Lloyd has set up a [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics2.html new metrics page] as a basis for allocating T2 hardware funding. This just uses total Disk and total Elapsed and/or CPU time. In the PMB yesterday it was agreed that Elapsed time would be used, but the results of various combinations will be watched and assessed over the coming months. One overriding reason for using Elapsed time is that CPU is not provided by all cloud implementations.<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM, which resolves a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it to at least version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1s supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to make it easy to find out whether there are already downtimes on a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 13th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-07 here]<br />
* The final step in the upgrade of the Castor Oracle databases to version 11.2.0.4 is taking place '''today'''. At the time of the meeting Castor is down.<br />
* There was a successful UPS/Generator load test last Wednesday (7th).<br />
* As reported last week - we did have a problem with glexec for the worker nodes over the weekend before last caused by a configuration error. We are trying to understand why this problem was not seen during the testing and roll-out of the new configuration.<br />
* We have seen very high load on Atlas Tape. We are increasing the size of the disk buffer in front of this in order to improve its performance.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, an H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites via TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action for the NGIs: please start tracking which sites are still using SL5 services (how many services and, for each service, whether it is still needed on SL5 and whether upgrades of the SL5 services are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It would also be interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 13th October'''<br />
* Nothing to report<br />
* Next UK Security Team meeting scheduled for 28th Oct.<br />
<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the gridppnagios instance back to Oxford from Lancaster. It was something of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for A/R status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing Nagios tests because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
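<br />
The failover mentioned above can be sketched roughly as follows. This is only an illustration in Python, not how the SAM/Nagios tooling actually selects a broker: the helper names <code>tcp_probe</code> and <code>first_reachable</code> are hypothetical, and a plain TCP connect is assumed to be a good enough health check (it does not speak STOMP). The two broker URLs are the ones quoted in the note.<br />

```python
import socket
from urllib.parse import urlsplit

# Broker endpoints quoted in the note above, in order of preference.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout.

    Assumption: an open TCP port is treated as "broker is up". This does not
    check that the STOMP service behind it is actually healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(broker_urls, probe=tcp_probe):
    """Return the first broker URL whose endpoint answers the probe, else None."""
    for url in broker_urls:
        parts = urlsplit(url)
        if parts.hostname and parts.port and probe(parts.hostname, parts.port):
            return url
    return None
```

Calling <code>first_reachable(BROKERS)</code> on each attempt, rather than reusing a cached answer, is the point of the sketch: a stale cached endpoint is what the note suspects prevented an immediate failover.<br />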
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
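<br />
The percentages quoted in this status review appear to be the YES count divided by the 19 sites listed, rounded to a whole number. A minimal Python sketch of that arithmetic, using the dual stack lists above (the function name <code>share_yes</code> is made up for illustration):<br />

```python
def share_yes(yes_sites, no_sites):
    """Return the percentage of listed sites answering YES, as a whole number."""
    total = len(yes_sites) + len(no_sites)
    if total == 0:
        raise ValueError("no sites listed")
    return round(100 * len(yes_sites) / total)


# Site lists copied from the "Dual stack nodes" entry above.
dual_stack_yes = ["Brunel", "IC", "QMUL", "Oxford"]
dual_stack_no = [
    "RHUL", "UCL", "Lancaster", "Glasgow", "Liverpool", "Manchester",
    "Sheffield", "Durham", "ECDF", "Birmingham", "Bristol", "Cambridge",
    "RALPP", "Sussex", "RAL T1",
]

dual_stack_pct = share_yes(dual_stack_yes, dual_stack_no)
print(f"Dual stack nodes: {dual_stack_pct}%")  # 4 of 19 sites
```

The same function gives 63% for the multicore lists (12 YES, 7 NO) and 42% for IPv6 allocation (8 YES, 11 NO), matching the figures quoted.<br />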
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-10-19T09:02:55Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing a "export CMTEXTRATAGS="host-slc5"" line. Naught wrong with the ticket, but I liked how lhcb did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asked if the problem persists; the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652](1/10)<br /><br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651](1/10)<br /><br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303](Long, long ago)<br /><br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576](1/10)<br /><br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560](30/9)<br /><br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460](18/6)<br /><br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585](1/10)<br /><br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082](7/9)<br /><br />
A classic Rod Availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br /><br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL- and asks if this is the cause of the problem?) Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting. Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at- definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399](31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, asking if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM now, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]). Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted that the broken CEs were dual-stacked while the working one wasn't. The ticket seems to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance, the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br /><br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if the problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S formed this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to make is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or, more precisely, having submission hosts at those sites not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela cleared up the atlas dark data and expanded their Space Tokens using the space freed up, there still seems to be some confusion in Rucio, which disagrees with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed AFAICS. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. The issue couldn't be replicated, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. These are checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem and told the submitter to put in a JIRA ticket instead. Waiting for reply, but can probably just be closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, and mitigation is being looked into. A candidate for perhaps On Holding, depending on the time frame of a workaround. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor can be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181] (21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or being put On Hold.<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter) was asking about SELinux - but no news since the 1st. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions fell on deaf ears for a while, it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/4)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/4)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 112896] (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112550 112550] (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112495 112495] (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112350 112350] (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) has been done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that wasn't suffering an understood problem was RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAIKS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me all wisdom donations are thankfully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCB job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is preparing a replacement xrootd box. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for httpd access at QMUL. The QM chaps were waiting on a production-ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, which look to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a CREAM-side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some unanswered questions about the configuration and what they should be using. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/RelocatableGlexecRelocatableGlexec2015-10-16T15:01:56Z<p>Matthew Doidge 1ac9bd3994: /* Other Dependencies */</p>
<hr />
<div>The title is misleading - due to security concerns glexec can't be truly relocatable, but it can be built to use a different binary and config path from the defaults, allowing glexec to be exported and used in a tarball environment.<br />
<br />
=Building GLEXEC to suit your site's tarball needs=<br />
(with reference to [[EMITarball]])<br />
<br />
'''Work in Progress'''<br />
<br />
Please note that we are unable to support glexec directly within the tarball, for many reasons. Listed below is a possible method (still being tested) for a site to build their own relocatable glexec. A group of sites using the same convention for tarball mount points could share the same glexec build to lower the total workload.<br />
<br />
We welcome all feedback on the tickets listed below, or to the tarball support e-mail ( tarball-grid-support atSPAMNOT cern.ch ).<br />
<br />
==Acknowledgements and Further Reading==<br />
Please refer to the glexec web pages for more information:<br /><br />
https://wiki.nikhef.nl/grid/GLExec<br />
<br />
with particular thanks to the writers of:<br />
https://wiki.nikhef.nl/grid/Building_gLExec_from_src_rpm<br />
<br />
(the script I use is an updated version of the example given).<br />
<br />
and:<br />
https://wiki.nikhef.nl/grid/GLExec_Argus_Quick_Installation_Guide<br />
<br />
==Requirements==<br />
<br />
* A clean SL6 system, similar to the nodes that you will run on. It will need network connectivity.<br />
* gcc and rpm-build packages installed, as well as the glexec user that you will use on your cluster.<br />
* The script below, or one like it:<br />
<br />
#!/bin/sh<br />
<br />
# SET CUSTOM BUILD ARGUMENTS HERE<br />
<br />
<br />
# EMI and EPEL directories<br />
glexec_pfx=/opt/gridapps/glexec<br />
glexec_etc=/opt/gridapps/glexec/etc<br />
glexec_doc=/opt/gridapps/glexec/doc<br />
 # Module directory suffixes used by the sed patch below (assumed values; adjust to match your LCAS/LCMAPS layout)<br />
 lcmaps_moddir_sfx=/lcmaps<br />
 lcas_moddir_sfx=/lcas<br />
<br />
<br />
# END OF BUILD ARGUMENTS<br />
<br />
# Setup build infrastructure<br />
export TOPDIR=`pwd`<br />
mkdir -p $TOPDIR/{SRPMS,SOURCES,SPECS,BUILD,RPMS/x86_64,RPMS/i386}<br />
<br />
# Download and install lcmaps-interface and glexec src<br />
rpm2cpio http://software.nikhef.nl/dist/mwsec/rpm/epel6/x86_64/lcmaps-basic-interface-1.6.1-1.el6.noarch.rpm | cpio -id<br />
rpm --define "_topdir $TOPDIR" -i http://software.nikhef.nl/dist/mwsec/rpm/epel6/SRPMS/glexec-0.9.11-1.el6.src.rpm<br />
<br />
# Patch spec file to match module directories for LCAS and LCMAPS<br />
sed -i "s+^\(%configure\).*+\1 --with-lcmaps-moduledir-sfx=$lcmaps_moddir_sfx --with-lcas-moduledir-sfx=$lcas_moddir_sfx+" $TOPDIR/SPECS/glexec.spec<br />
<br />
# Build the RPM<br />
CFLAGS=-I$TOPDIR/usr/include rpmbuild \<br />
--nodeps \<br />
-ba --define "_topdir $TOPDIR" \<br />
--define "_prefix $glexec_pfx" \<br />
--define "_sysconfdir $glexec_etc" \<br />
--define "_defaultdocdir $glexec_doc" \<br />
 $TOPDIR/SPECS/glexec.spec<br />
<br />
<br />
The important site variables are glexec_pfx, which should be your tarball mount point (the glexec binary will be in $glexec_pfx/sbin), and glexec_etc, which should point to where the glexec.conf file will be kept. Check the two rpm URLs before building to make sure they are current and point to the latest release.<br />
<br />
Once run, this will give you an rpm in RPMS to unpack. You can do this with:<br />
<br />
rpm2cpio RPMS/x86_64/$glexec_rpm | cpio -dim<br />
<br />
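The $glexec_rpm name in the command above is not set by the build script. A minimal sketch of one way to pick it up, assuming the binary rpm lands in RPMS/x86_64 under a name starting with "glexec-" followed by the version number (the helper name find_glexec_rpm is ours and is not part of glexec):<br />

```shell
# find_glexec_rpm [TOPDIR]
# Print the path of the first built glexec binary rpm under TOPDIR
# (default: current directory). The "glexec-[0-9]" pattern skips
# companion packages such as glexec-debuginfo.
find_glexec_rpm() {
    topdir=${1:-.}
    for rpm in "$topdir"/RPMS/x86_64/glexec-[0-9]*.rpm; do
        # An unmatched glob stays literal, so check the file exists.
        [ -e "$rpm" ] || continue
        echo "$rpm"
        return 0
    done
    return 1
}
```

It could then be used as: rpm2cpio "$(find_glexec_rpm "$TOPDIR")" | cpio -dim<br />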
You will then probably need to do some directory pruning before you have something you can load into your shared area. The glexec.conf file will need to have its ownership and permissions changed, probably to glexec.glexec, 0400. The glexec/sbin directory will likely need to be put into your $PATH environment variable.<br />
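As a sanity check after fixing up glexec.conf, something like the following could be run. It is only a sketch: the function name conf_is_locked_down is hypothetical, the defaults simply mirror the "probably glexec.glexec, 0400" suggestion above, and it relies on GNU stat as found on SL6:<br />

```shell
# conf_is_locked_down FILE [OWNER:GROUP] [MODE]
# Succeed only if FILE has the expected owner:group and octal mode.
# Defaults follow the suggestion in the text (glexec:glexec, 400).
conf_is_locked_down() {
    file=$1
    want_owner=${2:-glexec:glexec}
    want_mode=${3:-400}
    # GNU stat: %U:%G is owner:group by name, %a is the octal mode.
    [ "$(stat -c '%U:%G' "$file")" = "$want_owner" ] || return 1
    [ "$(stat -c '%a' "$file")" = "$want_mode" ] || return 1
}
```

With the build script's glexec_etc setting, the call would be conf_is_locked_down /opt/gridapps/glexec/etc/glexec.conf, run after the chown and chmod have been done as root.<br />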
<br />
'''If planning on using the (recommended) setuid mode you will need to export and mount your tarballs so that glexec's suid properties aren't squashed. To this end it is recommended that you consider exporting glexec on its own mount, in parallel to the "regular" tarball, instead of on the same mount.'''<br />
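One way the suid bit gets lost is a nosuid option on the client-side mount. A rough check, assuming a Linux worker node where /proc/mounts lists the mount options in the fourth field (mount_allows_suid is a hypothetical helper, and it says nothing about server-side export options):<br />

```shell
# mount_allows_suid MOUNTPOINT [MOUNTS_FILE]
# Returns 0 if MOUNTPOINT is mounted without nosuid, 1 if nosuid is set,
# 2 if MOUNTPOINT is not listed. MOUNTS_FILE defaults to /proc/mounts.
mount_allows_suid() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 is the mount point, field 4 the option list; the last
    # matching entry is the mount currently visible at that path.
    opts=$(awk -v mp="$mountpoint" '$2 == mp { o = $4 } END { print o }' "$mounts_file")
    [ -n "$opts" ] || return 2
    case ",$opts," in
        *,nosuid,*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For the example prefix used on this page, mount_allows_suid /opt/gridapps/glexec should succeed on every worker node before setuid mode is attempted.<br />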
<br />
==Other Dependencies==<br />
Currently this isn't sufficient to get glexec working - one needs to have additional lcas dependencies "installed" for glexec to work. The list is currently thought to be:<br />
lcmaps<br />
lcmaps-plugins-basic<br />
lcmaps-plugins-c-pep<br />
lcmaps-plugins-tracking-groupid<br />
lcmaps-plugins-verify-proxy<br />
lcmaps-plugins-voms<br />
<br />
Our suggested place to install these is within the glexec path, pointing glexec at them by editing the "lcas_libdir" and "lcmaps_libdir" variables, as well as possibly the "lcas_moduledir_sfx" and "lcmaps_moduledir_sfx" settings, in the ''glexec.conf''.<br />
<br />
Update 15 October 2015:<br />
<br />
The list of needed libraries is expanding, requiring globus gsi libraries on top of lcas/lcmaps. We are attempting to keep the number of packages needed down.<br />
<br />
==Notes on glexec.conf settings==<br />
The lcas and lcmaps _libdir variables are very particular: each needs to be an absolute directory path, e.g.:<br />
lcas_libdir = /opt/gridapps/glexec/usr/lib64<br />
lcas_moduledir_sfx = /lcas/<br />
lcmaps_libdir = /opt/gridapps/glexec/usr/lib64<br />
lcmaps_moduledir_sfx = /lcmaps/<br />
<br />
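Since a relative libdir is an easy mistake to make, a small check along these lines may help. This is a sketch only, assuming glexec.conf uses plain "key = value" lines as in the example above; libdirs_are_absolute is a hypothetical name:<br />

```shell
# libdirs_are_absolute CONF
# Succeed only if both lcas_libdir and lcmaps_libdir are set in CONF
# and their values start with "/". Problems are reported on stderr.
libdirs_are_absolute() {
    conf=$1
    bad=0
    for key in lcas_libdir lcmaps_libdir; do
        # Strip "key =" and keep the value; the last setting wins.
        value=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$conf" | tail -n 1)
        case "$value" in
            /*) ;;
            *) echo "$key is not an absolute path: '$value'" >&2; bad=1 ;;
        esac
    done
    return $bad
}
```

For example: libdirs_are_absolute /opt/gridapps/glexec/etc/glexec.conf<br />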
==glexec in cvmfs==<br />
<br />
With reference to the ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=116154 116154] we are investigating making glexec available through cvmfs - although it is early days yet, and we '''cannot''' at this juncture recommend sites mounting cvmfs with suid enabled.<br />
<br />
==ggus ticket==<br />
Please also see the original glexec tarball ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=95832 95832] (submitted by the tarball devs to themselves).<br />
<br />
-Matt, 17th September 2015</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/RelocatableGlexecRelocatableGlexec2015-10-16T14:31:20Z<p>Matthew Doidge 1ac9bd3994: /* Acknowledgements and Further Reading */</p>
<hr />
<div>The title is misleading - due to security concerns glexec can't be truly relocatable, but it can be built to use a different binary and config path from the defaults, allowing glexec to be exported and used in a tarball environment.<br />
<br />
=Building GLEXEC to suit your site's tarball needs=<br />
(with reference to [[EMITarball]])<br />
<br />
'''Work in Progress'''<br />
<br />
Please note that we are unable to support glexec directly within the tarball, for many reasons. Listed below is a possible method (still being tested) for a site to build their own relocatable glexec. A group of sites using the same convention for tarball mount points could share the same glexec build to lower the total workload.<br />
<br />
We welcome all feedback on the tickets listed below, or to the tarball support e-mail ( tarball-grid-support atSPAMNOT cern.ch ).<br />
<br />
==Acknowledgements and Further Reading==<br />
Please refer to the glexec web pages for more information:<br /><br />
https://wiki.nikhef.nl/grid/GLExec<br />
<br />
with particular thanks to the writers of:<br />
https://wiki.nikhef.nl/grid/Building_gLExec_from_src_rpm<br />
<br />
(the script I use is an updated version of the example given).<br />
<br />
and:<br />
https://wiki.nikhef.nl/grid/GLExec_Argus_Quick_Installation_Guide<br />
<br />
==Requirements==<br />
<br />
* A clean SL6 system, similar to the nodes that you will run on. It will need network connectivity.<br />
* gcc and rpm-build packages installed, as well as the glexec user that you will use on your cluster.<br />
* The script below, or one like it:<br />
<br />
#!/bin/sh<br />
<br />
# SET CUSTOM BUILD ARGUMENTS HERE<br />
<br />
<br />
# EMI and EPEL directories<br />
glexec_pfx=/opt/gridapps/glexec<br />
glexec_etc=/opt/gridapps/glexec/etc<br />
glexec_doc=/opt/gridapps/glexec/doc<br />
<br />
<br />
# END OF BUILD ARGUMENTS<br />
<br />
# Setup build infrastructure<br />
export TOPDIR=`pwd`<br />
mkdir -p $TOPDIR/{SRPMS,SOURCES,SPECS,BUILD,RPMS/x86_64,RPMS/i386}<br />
<br />
# Download and install lcmaps-interface and glexec src<br />
rpm2cpio http://software.nikhef.nl/dist/mwsec/rpm/epel6/x86_64/lcmaps-basic-interface-1.6.1-1.el6.noarch.rpm | cpio -id<br />
rpm --define "_topdir $TOPDIR" -i http://software.nikhef.nl/dist/mwsec/rpm/epel6/SRPMS/glexec-0.9.11-1.el6.src.rpm<br />
<br />
# Patch spec file to match module directories for LCAS and LCMAPS<br />
sed -i "s+^\(%configure\).*+\1 --with-lcmaps-moduledir-sfx=$lcmaps_moddir_sfx --with-lcas-moduledir-sfx=$lcas_moddir_sfx+" $TOPDIR/SPECS/glexec.spec<br />
<br />
# Build the RPM<br />
CFLAGS=-I$TOPDIR/usr/include rpmbuild \<br />
--nodeps \<br />
-ba --define "_topdir $TOPDIR" \<br />
--define "_prefix $glexec_pfx" \<br />
--define "_sysconfdir $glexec_etc" \<br />
--define "_defaultdocdir $glexec_doc" \<br />
 $TOPDIR/SPECS/glexec.spec<br />
<br />
<br />
The important site variables are glexec_pfx, which should be your tarball mount point (the glexec binary will be in $glexec_pfx/sbin), and glexec_etc, which should point to where the glexec.conf file will be kept. Check the two rpm URLs before building to make sure they are current and point to the latest release.<br />
<br />
Once run, this will give you an rpm in RPMS to unpack. You can do this with:<br />
<br />
rpm2cpio RPMS/x86_64/$glexec_rpm | cpio -dim<br />
<br />
You will then probably need to do some directory pruning before you have something you can load into your shared area. The glexec.conf file will need to have its ownership and permissions changed, probably to glexec.glexec, 0400. The glexec/sbin directory will likely need to be put into your $PATH environment variable.<br />
<br />
'''If planning on using the (recommended) setuid mode you will need to export and mount your tarballs so that glexec's suid properties aren't squashed. To this end it is recommended that you consider exporting glexec on its own mount, in parallel to the "regular" tarball, instead of on the same mount.'''<br />
<br />
==Other Dependencies==<br />
Currently this isn't sufficient to get glexec working - one needs to have additional lcas dependencies "installed" for glexec to work. The list is currently thought to be:<br />
lcmaps<br />
lcmaps-plugins-basic<br />
lcmaps-plugins-c-pep<br />
lcmaps-plugins-tracking-groupid<br />
lcmaps-plugins-verify-proxy<br />
lcmaps-plugins-voms<br />
<br />
Our suggested place to install these is within the glexec path, pointing glexec at them by editing the "lcas_libdir" and "lcmaps_libdir" variables, as well as possibly the "lcas_moduledir_sfx" and "lcmaps_moduledir_sfx" settings, in the ''glexec.conf''.<br />
<br />
==glexec in cvmfs==<br />
<br />
With reference to the ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=116154 116154] we are investigating making glexec available through cvmfs - although it is early days yet, and we '''cannot''' at this juncture recommend sites mounting cvmfs with suid enabled.<br />
<br />
==ggus ticket==<br />
Please also see the original glexec tarball ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=95832 95832] (submitted by the tarball devs to themselves).<br />
<br />
-Matt, 17th September 2015</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/RelocatableGlexecRelocatableGlexec2015-10-16T14:12:10Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>The title is misleading - due to security concerns glexec can't be truly relocatable, but it can be built to use a different binary and config path from the defaults, allowing glexec to be exported and used in a tarball environment.<br />
<br />
=Building GLEXEC to suit your site's tarball needs=<br />
(with reference to [[EMITarball]])<br />
<br />
'''Work in Progress'''<br />
<br />
Please note that we are unable to support glexec directly within the tarball, for many reasons. Listed below is a possible method (still being tested) for a site to build their own relocatable glexec. A group of sites using the same convention for tarball mount points could share the same glexec build to lower the total workload.<br />
<br />
We welcome all feedback on the tickets listed below, or to the tarball support e-mail ( tarball-grid-support atSPAMNOT cern.ch ).<br />
<br />
==Acknowledgements and Further Reading==<br />
Please refer to the glexec web pages for more information:<br /><br />
https://wiki.nikhef.nl/grid/GLExec<br />
<br />
with particular thanks to the writers of:<br />
https://wiki.nikhef.nl/grid/Building_gLExec_from_src_rpm<br />
<br />
(the script I use is an updated version of the example given).<br />
<br />
==Requirements==<br />
<br />
* A clean SL6 system, similar to the nodes that you will run on. It will need network connectivity.<br />
* gcc and rpm-build packages installed, as well as the glexec user that you will use on your cluster.<br />
* The script below, or one like it:<br />
<br />
#!/bin/sh<br />
<br />
# SET CUSTOM BUILD ARGUMENTS HERE<br />
<br />
<br />
# EMI and EPEL directories<br />
glexec_pfx=/opt/gridapps/glexec<br />
glexec_etc=/opt/gridapps/glexec/etc<br />
glexec_doc=/opt/gridapps/glexec/doc<br />
<br />
<br />
# END OF BUILD ARGUMENTS<br />
<br />
# Setup build infrastructure<br />
export TOPDIR=`pwd`<br />
mkdir -p $TOPDIR/{SRPMS,SOURCES,SPECS,BUILD,RPMS/x86_64,RPMS/i386}<br />
<br />
# Download and install lcmaps-interface and glexec src<br />
rpm2cpio http://software.nikhef.nl/dist/mwsec/rpm/epel6/x86_64/lcmaps-basic-interface-1.6.1-1.el6.noarch.rpm | cpio -id<br />
rpm --define "_topdir $TOPDIR" -i http://software.nikhef.nl/dist/mwsec/rpm/epel6/SRPMS/glexec-0.9.11-1.el6.src.rpm<br />
<br />
# Patch spec file to match module directories for LCAS and LCMAPS<br />
sed -i "s+^\(%configure\).*+\1 --with-lcmaps-moduledir-sfx=$lcmaps_moddir_sfx --with-lcas-moduledir-sfx=$lcas_moddir_sfx+" $TOPDIR/SPECS/glexec.spec<br />
<br />
# Build the RPM<br />
CFLAGS=-I$TOPDIR/usr/include rpmbuild \<br />
--nodeps \<br />
-ba --define "_topdir $TOPDIR" \<br />
--define "_prefix $glexec_pfx" \<br />
--define "_sysconfdir $glexec_etc" \<br />
--define "_defaultdocdir $glexec_doc" \<br />
 $TOPDIR/SPECS/glexec.spec<br />
<br />
<br />
The important site variables are glexec_pfx, which should be your tarball mount point (the glexec binary will be in $glexec_pfx/sbin), and glexec_etc, which should point to where the glexec.conf file will be kept. Check the two rpm URLs before building to make sure they are current and point to the latest release.<br />
<br />
Once run, this will give you an rpm in RPMS to unpack. You can do this with:<br />
<br />
rpm2cpio RPMS/x86_64/$glexec_rpm | cpio -dim<br />
<br />
You will then probably need to do some directory pruning before you have something you can load into your shared area. The glexec.conf file will need to have its ownership and permissions changed, probably to glexec.glexec, 0400. The glexec/sbin directory will likely need to be put into your $PATH environment variable.<br />
<br />
'''If planning on using the (recommended) setuid mode you will need to export and mount your tarballs so that glexec's suid properties aren't squashed. To this end it is recommended that you consider exporting glexec on its own mount, in parallel to the "regular" tarball, instead of on the same mount.'''<br />
<br />
==Other Dependencies==<br />
Currently this isn't sufficient to get glexec working - one needs to have additional lcas dependencies "installed" for glexec to work. The list is currently thought to be:<br />
lcmaps<br />
lcmaps-plugins-basic<br />
lcmaps-plugins-c-pep<br />
lcmaps-plugins-tracking-groupid<br />
lcmaps-plugins-verify-proxy<br />
lcmaps-plugins-voms<br />
<br />
Our suggested place to install these is within the glexec path, pointing glexec at them by editing the "lcas_libdir" and "lcmaps_libdir" variables, as well as possibly the "lcas_moduledir_sfx" and "lcmaps_moduledir_sfx" settings, in the ''glexec.conf''.<br />
<br />
==glexec in cvmfs==<br />
<br />
With reference to the ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=116154 116154] we are investigating making glexec available through cvmfs - although it is early days yet, and we '''cannot''' at this juncture recommend sites mounting cvmfs with suid enabled.<br />
<br />
==ggus ticket==<br />
Please also see the original glexec tarball ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=95832 95832] (submitted by the tarball devs to themselves).<br />
<br />
-Matt, 17th September 2015</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-10-12T14:29:14Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 5th October 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
* Krishan mentions lcg-tags errors at EFDA-JET. <br />
* BDII installations are affected by a slapd crash. openldap-servers-2.4.40-6 was released for SL6/CentOS6 as a security update; unfortunately, from openldap-servers-2.4.40-5 onwards an issue has been introduced which causes the slapd process to crash under certain conditions.<br />
* Daniela notes: Location of archiving records on ARC-CEs... Beware the jobreport_options of "/var/run" which clears records on reboot.<br />
<br />
<br />
'''Tuesday 6th October'''<br />
* GOCDB has received a new service type request for ‘uk.ac.gridpp.vcycle’.<br />
* John H noted a "CVMFS problem at RAL". Apparently this was due to a misconfiguration at CERN.<br />
* With HPC DIRAC work RAL discovered a bug in how the re-use connection flag is used with the fts commands.<br />
* Here is a summary for the [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/ September reports]:<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ALICE_Sep2015.pdf ALICE]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_ATLAS_Sep2015.pdf ATLAS]: Lancaster: 38%:42% & Liverpool: 81%: 100%.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_CMS_Sep2015.pdf CMS]: All okay.<br />
** [http://wlcg-sam.cern.ch/reports/2015/201509/wlcg/WLCG_All_Sites_LHCB_Sep2015.pdf LHCb]: QMUL: 47%:47% ; Lancaster: 75%: 80% & ECDF: 89%<br />
* The UCL storage servers are no longer used by ATLAS.<br />
* The November GDB has been moved to 4th November.<br />
<br />
<br />
'''Tuesday 29th September'''<br />
* Nagios (and therefore the regional dashboard) was affected by a weekend A/C outage at Oxford.<br />
* Steve J reports on: Condor libglobus_common problems<br />
* There was an EGI OMB on 24th September. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2380 Agenda]. <br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150928#Monday Notes from the Monday biweekly WLCG ops meeting] are available for anyone who is interested in the latest ops news.<br />
* On the topic 'Perfsonar Bandwidth checks not running' Duncan reported a move to a [https://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=Latency%20tests%20between%20all%20WLCG%20hosts full WLCG mesh].<br />
* Tom would appreciate feedback on the [https://vm36.tier2.hep.manchester.ac.uk/ GridPP website v2].<br />
* Steve Lloyd has set up a [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics2.html new metrics page] as a basis for allocating T2 hardware funding. This just uses total Disk and total Elapsed and/or CPU time. In the PMB yesterday it was agreed that Elapsed time would be used, but the results of various combinations will be watched and assessed over the coming months. One overriding reason for using Elapsed time is that CPU is not provided by all cloud implementations.<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 6th October'''<br />
* There was a WLCG ops coordination meeting last week. [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes151001 Minutes]. [https://indico.cern.ch/event/393617/ Agenda] (which has John Gordon's accounting slides).<br />
* The highlights:<br />
** dCache sites should install the latest fix for SRM solving a vulnerability<br />
** All sites hosting a regional or local site xrootd should upgrade it at least to version 4.1.1<br />
** CMS DPM sites should consider upgrading dpm-xrootd to version 3.5.5 now (from epel-testing) or after mid October (from epel-stable) to fix a problem affecting AAA<br />
** Tier-1 sites should do their best to avoid scheduling OUTAGE downtimes at the same time as other Tier-1's supporting common LHC VOs. A calendar will be linked in the minutes of the 3 o'clock operations meeting to easily find out if there are already downtimes at a given date<br />
** The multicore accounting for WLCG is now correct for 99.5% of the CPU time, with the few remaining issues being addressed. Corrected historical accounting data is expected to be available from the production portal by the end of the month<br />
** All LHCb sites will soon be asked to deploy the "machine features" functionality<br />
<br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 6th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-10-07 here]<br />
* The problems with the production FTS service have been resolved. A workaround to the memory leak introduced with the new version has been supplied. This, along with a reduction in the numbers of transfers queued, has enabled the service to return to normal operation.<br />
* The next step in the upgrade of the Castor Oracle databases to version 11.2.0.4 is taking place '''today'''. At the time of the meeting Castor is down. This is the upgrade of the "Pluto" database which hosts the Nameserver as well as the CMS & LHCb stager databases. The previous step took place successfully last Tuesday.<br />
* The upgrading of the Tier1's link into the RAL core network to 40Gb took place successfully on the morning of Wednesday 30th September.<br />
* There is an 'At Risk' on the Tier1 tomorrow morning for a UPS/generator load test that will take place from 10:00 to 11:00.<br />
* There was a problem with glexec for the worker nodes over the weekend caused by a configuration error. This affected our CMS availabilities badly. The problem was fixed yesterday (Monday).<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, an H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6 Oct'''<br />
* UCL Vac site now running LHCb test of two payloads per dual processor VM. Total of dual processor VMs at UCL now 120.<br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually, or be on the gridpp-ops@cern.ch mailing list, to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action: for the NGIs: please start tracking which sites are still using SL5 services (how many services and, for each service, whether it is still needed on SL5 and whether upgrades are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. Also interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 6th October'''<br />
* With the exception of the dashboard getting really confused early in the week as the Nagios instances at Oxford and Lancaster came and went, it's been a fairly quiet week. There are four outstanding tickets:<br />
** Three for availability / reliability (Sussex, Liverpool and Lancaster).<br />
** One at Bristol for a GridFTP transfer problem.<br />
<br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 5th October'''<br />
* Updated IGTF distribution version 1.68 available - https://dist.igtf.net/distribution/igtf/current/<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 6th October'''<br />
* A reminder, the next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 12th October 2015, 14.30 BST'''<br /><br />
23 Open UK Tickets this week. Just a light review.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116752 116752] (1/10)<br /><br />
Oxford's CMS Phedex renewal ritual ticket. Chris B kindly offered in his ticket for RALPP to officiate this (un)holy task for Oxford, so I advise some communication between you chaps and him - which may well be going on, but it ain't in the ticket! Assigned (6/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116683 116683] (5/10)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116775 116775] (6/10)<br /><br />
CMS have by the looks of it thrown a pair of duplicate tickets Bristol's way. How rude! Lukasz has rightfully suggested closing one of them (I suggest 116683. Winnie's put a good reply in t'other one). The underlying problem appears to be a shortage of pool accounts - what is the recommended number of pool accounts for VOs these days? In progress.<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield pilot role tickets. Daniela has pointed out that as Sno+ have started using Sheffield in earnest they really could do with pilot roles enabled. In progress (9/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116812 116812] (8/10)<br /><br />
LHCB asked Andy to clean up the $LOGIN_POST_SCRIPT for LHCB at his site, removing a "export CMTEXTRATAGS="host-slc5"" line. Naught wrong with the ticket, but I liked how lhcb did some good debugging, and there's something about a profile script problem that reminds me of a simpler time... In progress (9/10)<br />
<br />
[http://tinyurl.com/nwgrnys '''A link to the rest of the tickets for completeness.''']<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''The other VO nagios.''']<br /><br />
Looks okay, some (known I think) errors at QM, and Sheffield seems to have a few SRM errors going on since Saturday.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 6 Oct 2015'''<br />
<br />
Moved the Gridppnagios instance back to Oxford from Lancaster. It was kind of a double whammy as both sites went down together. Fortunately the Oxford site was partially working, so we managed to start SAM Nagios at Oxford. SAM tests were unavailable for a few hours, but there was no effect on EGI availability/reliability. Sites can have a look at https://mon.egi.eu/myegi/ss/ for a/r status. <br />
<br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we started it yesterday as it is running on a VM. VO-Nagios uses the Oxford SE for the replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, and one of the IP addresses behind the webserver was not responding. It went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RAL T1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run-2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFilesLost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-10-12T14:24:10Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asks if the problem persists; the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652](1/10)<br /><br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651](1/10)<br /><br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303](Long, long ago)<br /><br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576](1/10)<br /><br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560](30/9)<br /><br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460](18/6)<br /><br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585](1/10)<br /><br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082](7/9)<br /><br />
A classic Rod Availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br /><br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL, and he asks if this is the cause of the problem). Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting: Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported that enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at- definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399](31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work continues on perfecting cluster publishing for ARC CEs, but the ticket could do with either an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, asking if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now: things look okay at QM, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, with no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br />
I suggest that, whilst the work described chugs along in the background, we consider on-holding this ticket. In progress (24/6)
<br />
'''UCL'''<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted that the broken CEs were dual-stacked and the working one wasn't. The ticket seems to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance, the last batch of tests didn't show improved results. No news since. In progress (17/6)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />

'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S opened this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to note is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start to using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or more precisely have submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list, I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
<br />
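For anyone wanting to see at a glance which sites the hosts above belong to, here is a minimal Python sketch that groups them by domain suffix. The suffix-to-site mapping is my own assumption read off the hostnames and the site list in the ticket summary, not something taken from APEL or the GOCDB.

```python
# Submission hosts reported as publishing "0" core jobs (copied from the list above).
ZERO_CORE_HOSTS = [
    "ce3.dur.scotgrid.ac.uk",
    "ce4.dur.scotgrid.ac.uk",
    "cetest02.grid.hep.ph.ic.ac.uk",
    "hepgrid5.ph.liv.ac.uk",
    "hepgrid6.ph.liv.ac.uk",
    "hepgrid97.ph.liv.ac.uk",
    "svr009.gla.scotgrid.ac.uk",
    "t2ce06.physics.ox.ac.uk",
]

# Assumed mapping from domain suffix to site name (illustrative only).
SITE_SUFFIXES = {
    "dur.scotgrid.ac.uk": "Durham",
    "gla.scotgrid.ac.uk": "Glasgow",
    "ic.ac.uk": "Imperial",
    "liv.ac.uk": "Liverpool",
    "ox.ac.uk": "Oxford",
}


def site_for(host):
    """Return the site whose suffix matches the host, or 'unknown'."""
    for suffix, site in SITE_SUFFIXES.items():
        if host.endswith("." + suffix):
            return site
    return "unknown"


def group_by_site(hosts):
    """Group hostnames into a {site: [host, ...]} dictionary."""
    grouped = {}
    for host in hosts:
        grouped.setdefault(site_for(host), []).append(host)
    return grouped


hosts_by_site = group_by_site(ZERO_CORE_HOSTS)

if __name__ == "__main__":
    for site in sorted(hosts_by_site):
        print(site, "-", ", ".join(hosts_by_site[site]))
```

Any host that lands under "unknown" would need its suffix adding to the mapping by hand.
<br />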
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed afaics. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)
<br />
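As a way of watching out for the clash described in the Brunel ticket, here is a minimal Python sketch that flags any SubmitHost value shared by more than one CE. It assumes you have already gathered a CE-to-SubmitHost mapping from your own configs; the hostnames and the SubmitHost strings below are made up for illustration and are not real APEL output.

```python
from collections import defaultdict


def find_shared_submit_hosts(ce_to_submit_host):
    """Return {submit_host: [ce, ...]} for each SubmitHost used by more than one CE."""
    ces_by_submit_host = defaultdict(list)
    for ce, submit_host in ce_to_submit_host.items():
        ces_by_submit_host[submit_host].append(ce)
    return {
        submit_host: sorted(ces)
        for submit_host, ces in ces_by_submit_host.items()
        if len(ces) > 1
    }


# Hypothetical example data: two CEs configured with the same SubmitHost.
EXAMPLE_CONFIG = {
    "ce-a.example.org": "ce-a.example.org:8443/queue",
    "ce-b.example.org": "ce-a.example.org:8443/queue",
    "ce-c.example.org": "ce-c.example.org:8443/queue",
}

shared = find_shared_submit_hosts(EXAMPLE_CONFIG)

if __name__ == "__main__":
    for submit_host, ces in shared.items():
        print(submit_host, "is shared by", ", ".join(ces))
```

An empty result means every CE in the mapping has its own SubmitHost, so (under these assumptions) their sync records shouldn't be overwriting each other.
<br />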
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, and John G is currently looking at the records. In progress (14/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an egi "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd redirector configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depends on the time frame of a work around. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford still need to republish. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I don't quite believe what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions fell on deaf ears for a while, it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) has been done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that isn't suffering an understood problem is RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
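<br />
The "Failed to connect to XXXX(IPv4):443" part of that error can be checked by hand. A minimal Python sketch, assuming all that is wanted is a plain TCP connection test over IPv4 to the port in the message. The function name can_connect and the host ce.example.org in the note below are made up for illustration, and this only tests reachability, not whatever arcsub does once connected:<br />

```python
import socket


def can_connect(host, port=443, timeout=5.0, family=socket.AF_INET):
    """Try a plain TCP connection; return (ok, detail).

    family defaults to IPv4 to mirror the "(IPv4)" in the quoted error.
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"name resolution failed: {exc}"

    last_error = "no addresses returned"
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True, f"connected to {sockaddr[0]}:{sockaddr[1]}"
        except OSError as exc:
            # Remember the most recent failure and try the next address.
            last_error = f"{sockaddr[0]}:{sockaddr[1]}: {exc}"
    return False, last_error
```

Calling can_connect("ce.example.org") (hypothetical host) from the Nagios box and from somewhere else would at least show whether the failures depend on where the test is run from.<br />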
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAIKS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice my way, all wisdom donations are gratefully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
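<br />
On window sizes: the usual rule of thumb is that a single TCP stream can have at most one window of data in flight per round trip, so the window needed to fill a path is the bandwidth-delay product. A small Python sketch of that arithmetic; the function names are made up for illustration, and the 10 Gbit/s, 10 ms and 64 KiB figures in the note below are assumed numbers, not measurements from Lancaster:<br />

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    # bits/s * ms / 1000 gives bits per round trip; / 8 converts to bytes.
    return bandwidth_bits_per_s * rtt_ms / 8000


def window_limited_bits_per_s(window_bytes, rtt_ms):
    """Upper bound on single-stream throughput for a fixed TCP window."""
    # One window of data per round trip, converted to bits per second.
    return window_bytes * 8000 / rtt_ms
```

With the assumed figures, bdp_bytes(10_000_000_000, 10) comes to 12.5 million bytes, while a 64 KiB window over the same 10 ms path caps a single stream at roughly 52 Mbit/s.<br />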
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCb job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is preparing a replacement xrootd box. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for https access to the storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
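<br />
The 5/1/15 mix-up is easy to reproduce. A short Python sketch that reads a slash-separated date both ways and flags when the two readings disagree; the function names are made up for illustration, and two-digit years are assumed to follow Python's usual %y rule:<br />

```python
from datetime import date, datetime


def read_both_ways(text):
    """Parse text as day/month/year and as month/day/year.

    A reading that is not a valid date is returned as None.
    """
    formats = {"day_first": "%d/%m/%y", "month_first": "%m/%d/%y"}
    result = {}
    for name, fmt in formats.items():
        try:
            result[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            result[name] = None
    return result


def is_ambiguous(text):
    """True when both readings are valid dates and they differ."""
    both = read_both_ways(text)
    return None not in both.values() and both["day_first"] != both["month_first"]
```

Here read_both_ways("5/1/15") gives 5th January 2015 day-first and 1st May 2015 month-first, so is_ambiguous("5/1/15") is true, whereas a date like 26/1/15 can only be read one way.<br />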
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, looking to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a cream side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configuration and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-10-05T14:57:19Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 5th October 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 29th September'''<br />
* Nagios (and therefore the regional dashboard) was affected by a weekend A/C outage at Oxford.<br />
* Steve J reports on: Condor libglobus_common problems<br />
* There was an EGI OMB on 24th September. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2380 Agenda]. <br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150928#Monday Notes from the Monday biweekly WLCG ops meeting] are available for anyone who is interested in the latest ops news.<br />
* On the topic 'Perfsonar Bandwidth checks not running' Duncan reported a move to a [https://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=Latency%20tests%20between%20all%20WLCG%20hosts full WLCG mesh].<br />
* Tom would appreciate feedback on the [https://vm36.tier2.hep.manchester.ac.uk/ GridPP website v2].<br />
* Steve Lloyd has set up a [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics2.html new metrics page] as a basis for allocating T2 hardware funding. This just uses total Disk and total Elapsed and/or CPU time. In the PMB yesterday it was agreed that Elapsed time would be used, but the results of various combinations will be watched and assessed over the coming months. One overriding reason for using Elapsed time is that CPU is not provided by all cloud implementations.<br />
<br />
'''Tuesday 22nd September'''<br />
* Monday's WLCG weekly ops meeting [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150921#Monday minutes] are available.<br />
* There is an EGI Operations Management Board this Thursday. Do we have any items to raise?<br />
* Several observations recently of FTS3 at RAL being overloaded.<br />
* Federico raised: anomalous CPU usage for DIRAC ilc jobs.<br />
* Looking at supporting DEAP3600 (RHUL, RAL and Sussex).<br />
<br />
<br />
'''Tuesday 15th September'''<br />
* As mentioned last week, GridPP is creating a Tier-2 Evolution working group.<br />
* LCG-ROLLOUT thread "'glexec' missing in /cvmfs/grid.cern.ch/emi3wn-latest?". Matt is making progress.<br />
* Cambridge machine room relocation Wednesday 16th September.<br />
* Registration for the [http://cf2015.egi.eu/ EGI Community Forum 2015 in Bari] is open. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2544 Agenda].<br />
* There was a GDB at CERN last week: [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20150909 Minutes]; [http://indico.cern.ch/event/319751/ Agenda].<br />
* Simon noted this (DIRAC doc link) was broken: https://github.com/gridpp/user-guides/blob/master/DIRAC-getting-started.md. <br />
* Articles for the [https://wlcg-ops.web.cern.ch/ WLCG ops portal].<br />
<br />
<br />
'''Tuesday 1st September'''<br />
* From October/November, the EGI ops VO monitoring will be performed using RFC proxies, as opposed to legacy proxies.<br />
* There will be a new filter for the critical profile for ATLAS WLCG SAM tests so that only production endpoints will be tested and taken into account for site availability metrics. This will be available from the SAM3 interface.<br />
* CNAF: Due to a fire last Thursday that caused problems with one electrical supply line, the computing centre is running at reduced capacity (around 30% below the pledged capacity).<br />
* Machine job features testing has hit a bug according to Raul!<br />
* The HNSciCloud PCP pilot proposal submission was successful (refer to Andrew Sansum's talk at GridPP34 if you forget what this means!). The project intends to procure commercial cloud resources for FY17 and FY18. We will contribute 75K euro towards this activity and the EU will then top up to 250K euro.<br />
* There was an EGI Operations Management Board last Thursday. There are no summary notes yet, but please take a look at the [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2379 agenda] and linked talks (may be worth skimming them at the ops meeting).<br />
* There was a quick request/reminder for sites to please update their [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 entries in the GridPP wiki].<br />
* RAL is closed on Monday and Tuesday of this week!<br />
<br />
<br />
<br />
'''Tuesday 24th August'''<br />
* A [https://indico.cern.ch/event/319751/ draft agenda for the September GDB] is taking shape. Any Tier-2 rep volunteers this month please email Jeremy.<br />
* Interest in HTCondor CE. (Ops thread 20/08).<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters and were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
* There was a middleware readiness meeting on 16th September.<br />
* The new DPM version is being tested via the ATLAS workflow by the Edinburgh Volunteer site.<br />
* Many new sites showed interest in participating in MW Readiness testing with CentOS7. It is useful to anticipate the MW behaviour in the event of new HW purchase. DPM validation on CentOS/SL7 is already ongoing at Glasgow.<br />
* ATLAS and CMS are asked to declare whether the xrootd 4 monitoring plugin is important for them or not. As it is now, it doesn't work with dCache v. 2.13.8.<br />
* Despite the fact that FTS3 runs at very few sites we decided to test it for Readiness. In this context, ATLAS and CMS are asked to use the FTS3 pilot in their transfer test workflows.<br />
* PIC successfully tested dCache v.2.13.8 for CMS.<br />
* CNAF has obtained Indigo-DataCloud effort to strengthen the ARGUS development team. The ARGUS collaboration will meet again early October. The problems faced at CERN with a CMS VOBOX are being investigated in ticket GGUS:116092.<br />
* The next MW Readiness WG vidyo meeting will take place on Wednesday 28 October at 4pm CET.<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 6th October'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-09-30 here]<br />
* The problems with the production FTS service have been resolved. A workaround to the memory leak introduced with the new version has been supplied. This, along with a reduction in the numbers of transfers queued, has enabled the service to return to normal operation.<br />
* The second step in the upgrade of the Castor Oracle databases to version 11.2.0.4 took place last Tuesday. This was the upgrade of the "Neptune" standby database and the re-establishment of the Dataguard link. ("Neptune" hosts the Atlas and GEN instance stagers.) Note: The next step in this upgrade is the upgrade of the "Pluto" database which hosts the Nameserver as well as the CMS & LHCb stager databases. '''This will require all of Castor to be down for the day and is scheduled for the 6th October.'''<br />
* The upgrading of the Tier1's link into the RAL core network to 40Gb took place successfully on the morning of Wednesday 30th September.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 29 Sep'''<br />
* UCL Vac site updated with most recent version of Vac-in-a-Box. Now running ~216 jobs: LHCb MC and ATLAS certification jobs.<br />
* Drawing up list of tasks needed to be able to run a site for GridPP-supported VOs purely using VMs (e.g. VM certification by experiments etc.)<br />
* Discussion at GridPP Technical Meeting on storage options, including xrootd-based sites (i.e. xrootd not DPM/dCache)<br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or be on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 29th September'''<br />
Steve J: problems with the VOMS server at FNAL, voms.fnal.gov, have been detected; I will resolve them soon and may issue an update to Approved VOs, alerting sites with TB_SUPPORT should that occur. Approved VOs potentially affected are CDF, DZERO, LSST. Please do not act yet.<br />
<br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action for the NGIs: please start tracking which sites are still using SL5 services (how many services, whether each service is still needed on SL5, and whether upgrades of SL5 services are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It is also interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<br />
'''Tuesday 1st September'''<br />
* Gordon was on shift.<br />
* Another very quiet week with no new tickets. We have two open ROD tickets, both of which are for A/R; one is against Cambridge, and the other is the now 53-day-old ticket at UCL.<br />
* Next up.... Kashif again (thanks Kashif!)<br />
<br />
'''Monday 24th August'''<br />
* Kashif on shift.<br />
* There were quite a few alarms throughout the week and many tickets were opened. All of the tickets were fixed within the time limit. <br />
* The certificate of the Argus server at Sheffield expired but Elena got a new certificate quickly. <br />
* Cambridge and UCL have low-availability tickets and not much can be done about them except waiting for availability to reach 90%.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 5th October'''<br />
* Update on incident broadcast EGI-20150925-01 relating to compromised systems in China. - The EGI, WLCG and VO security teams are continuing their investigations. Affected sites and users have been contacted and there is no present indication of further action needed by any site in the UK. However, as more information comes to light, additional updates may be made in the near future and sites are asked as always to read any updates carefully, taking actions as recommended.<br />
<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 18th August'''<br />
* Next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 5th October 2015, 14.15 BST'''<br />
<br />
22 Open UK Tickets this month, all of them, Site by Site:<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136] (9/9)<br /><br />
Sussex got a snoplus ticket for a high number of job failures, although simple test jobs ran okay. Matt asked if the problem persists; the reply was a resounding "not sure". In progress (think about closing) (21/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116652 116652](1/10)<br /><br />
A ticket from CMS, about some important Phedex ritual that must occur on the 3rd of November, when the stars are right. The ticket needs some confirmation and feedback, plus the nomination of one site acolyte to receive the DBParam secrets from CMS - but the ticket only got assigned to sites this morning. Assigned (5/10)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116651 116651](1/10)<br /><br />
Same as the RALPP ticket, Winnie has volunteered Dr Kreczko for the task. In progress (5/10)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303](Long, long ago)<br /><br />
glexec ticket. On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116576 116576](1/10)<br /><br />
Atlas ticket asking Durham to delete all files outside of the datadisk path. Oliver asks what this means for the other tokens (I think they can be sacrificed to feed datadisk, but Brian et al can confirm that). Waiting for reply (5/10)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116560 116560](30/9)<br /><br />
Sno+ jobs having trouble at Sheffield. Looks like a proxy going stale problem as only 10 Sno+ jobs at a time can run at Sheffield. Matt M asks if/how the WMS can be notified to stop sending jobs in such a case. In progress (30/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460](18/6)<br /><br />
Gridpp Pilot roles. No news on this for a while, after the last attempt seemed to not quite work. In progress (30/7)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116585 116585](1/10)<br /><br />
Biomed ticketed Manchester with problems from their VO nagios box - which Alessandra points out is due to there being no spare cycles for biomed to run on. Assigned (can be put on hold or closed?) (1/10)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116082 116082](7/9)<br /><br />
A classic Rod Availability ticket. On Hold (7/9)<br />
<br />
'''LANCASTER''' (a little embarrassing that my own site has the most tickets)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116478 116478] (28/9)<br /><br />
Another availability ticket, this time for Lancaster (which has been through the wars in September). Still trying to dig our way out, but even the Admin's broke. On hold (5/10)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=116676 116676] (5/10)<br /><br />
Another ROD ticket, Lancaster's not quite out of the woods. We think WMS access is somewhat broken. We have no idea about the sha2 error. In progress (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116366 116366] (22/9)<br /><br />
Sno+ spotted malloc errors at Lancaster. The problems seemed to survive one batch of fixes, but I asked again if they still see problems after running a good number of jobs over the weekend. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (In a galaxy far, far away)<br /><br />
glexec ticket. This was supposed to be done last week, after I had figured out [https://www.gridpp.ac.uk/wiki/RelocatableGlexec "the formula"] - but then last week happened. On hold (5/10)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] (31/8)<br /><br />
LHCB job errors at QM, with a 70% pilot failure rate on ce05. Dan couldn't see where things are breaking (only that the CE wasn't publishing to APEL) and asks if this is the cause of the problem. Waiting for reply (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116662 116662] (5/10)<br /><br />
LHCB job failures on ce05 - almost certainly a duplicate of [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959], but it might have some useful information in it. Assigned (probably can be closed as a duplicate) (5/10)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116650 116650] (1/10)<br /><br />
Imperial's invitation to the CMS Phedex DBParam ritual. Daniela's on it, as well as the other CMS sites. On hold (5/10)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116649 116649] (1/10)<br /><br />
Brunel's ticket for the great DBParam alignment of 2015. On hold (5/10)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116455 116455] (28/9)<br /><br />
A CMS request to change the xrootd monitoring configs. Did you get round to doing this last week Raul? In progress (29/9)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (3/8)<br /><br />
Biomed having trouble tagging the jet CE. The Jet admins think this is the same underlying issue as their other ticket, [https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496]. In progress (25/9)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (5/8)<br /><br />
Biomed unable to remove files from the jet SE. There are clues that suggest that some dns oddness is the cause, but it's not clear. In progress (18/9)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116358 116358] (22/9)<br /><br />
Ticket complaining about a missing image at the site. Some to and fro, the ball is back in the site's court. In progress (2/10)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=116618 116618] (1/10)<br /><br />
The Tier 1's CMS DBParam ritual ticket. In progress (5/10)<br />
<br />
Let me know if I missed ought.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''T'OTHER VO NAGIOS''']<br /><br />
At time of writing things look a bit rough at QM, Liverpool (just getting over their downtime) and for Sno+ at Sheffield (likely related to their ticket).<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st Oct, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but we restarted it yesterday as it runs on a VM. VO-Nagios uses the Oxford SE for its replication test, so it is failing those tests. I am looking to change to some other SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable; the test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver stopped responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RAL T1, and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring.<br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue a lost files declaration for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/list Group space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-10-05T13:04:21Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div><br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting. Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported that enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at - definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, asking if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now: things look okay at QM, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)
<br />
'''UCL'''<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing. Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seems to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance, the last batch of tests didn't show improved results. No news since. In progress (17/6)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.

'''Monday 22nd June 2015, 14.00 BST'''
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.''
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S opened this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to note is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or, more precisely, having submission hosts at those sites not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6)
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed AFAICS. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depends on the time frame of a work around. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions fell on deaf ears for a while, it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
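To put the "over a million access attempts in 18 hours" figure in perspective, here is a small back-of-the-envelope sketch of the mean access rate. It uses exactly one million as a lower bound and assumes the attempts were spread evenly, which the ticket does not claim; the function name is just for this example.

```python
def mean_rate_per_second(events: int, hours: float) -> float:
    """Average number of events per second over a period given in hours."""
    return events / (hours * 3600)


# Lower bound: the ticket says "over a million" attempts in 18 hours.
rate = mean_rate_per_second(1_000_000, 18)
print(f"at least {rate:.1f} access attempts per second on one file")
```

That works out to a sustained rate of at least fifteen or so opens per second against a single file, before allowing for any bursts.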
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] - 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] - 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] - 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856] (19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011] (25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) has been done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking whether 100IT had ticketed apel support (I assume contacting them via other means would work too), as otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that wasn't suffering from an understood problem was RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
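For anyone wanting to rule out plain network trouble behind a "Failed to connect to XXXX(IPv4):443" error, a minimal sketch of an IPv4-only TCP connection check is below. It is not what arcsub does internally, just a basic reachability test; the function name is hypothetical and the host you pass in is up to you.

```python
import socket


def can_connect_ipv4(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds over IPv4."""
    try:
        # Restrict resolution to IPv4 (AF_INET), mirroring the "(IPv4)" in the error.
        candidates = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the name did not resolve to any IPv4 address
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # refused, timed out or unreachable: try the next address
    return False
```

Calling `can_connect_ipv4("ce.example.org")` (a placeholder hostname) from the machine running the tests would show whether port 443 is reachable at all. A successful connection only proves the port is open, not that job submission will work.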
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAIKS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me all wisdom donations are thankfully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
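Since window sizes are next on the list, here is a rough sketch of the bandwidth-delay product arithmetic that usually drives them. The link speed and round-trip time are made-up illustrative values, not measurements from Lancaster, and the function names are just for this example.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a path of this speed and RTT full."""
    return bandwidth_bits_per_s * rtt_ms / 1000 / 8


def max_throughput_bits_per_s(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on single-stream TCP throughput for a given window and RTT."""
    return window_bytes * 8 / (rtt_ms / 1000)


# Illustrative assumptions only: a 10 Gbit/s path with a 10 ms round trip.
needed = bdp_bytes(10e9, 10)
# A 64 KiB window over the same 10 ms path.
capped = max_throughput_bits_per_s(64 * 1024, 10)

print(f"window needed to fill the path: {needed / 1e6:.1f} MB")
print(f"throughput ceiling with a 64 KiB window: {capped / 1e6:.1f} Mbit/s")
```

The point of the second function is that a small window caps a single stream well below line rate no matter how fast the link is, which is one common reason for disappointing iperf numbers.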
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE if this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline; does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, the same as the ECDF one. On Hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
JET's LHCb job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not being entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is preparing a replacement xrootd box. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for https access to the storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there is still some underlying problem here with intermittent test failures. This ticket also raises the question of the context in which these tests are being conducted. Anyone know, or shall we ask the submitter? In progress (26/1)<br />
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, which look to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a CREAM-side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configuration and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-09-18T10:08:49Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 14th September 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 1st September'''<br />
* From October/November, the EGI ops VO monitoring will be performed using RFC proxies, as opposed to legacy proxies.<br />
* There will be a new filter for the critical profile for ATLAS WLCG SAM tests so that only production endpoints will be tested and taken into account for site availability metrics. This will be available from the SAM3 interface.<br />
* CNAF: Following a fire last Thursday that caused problems with one electrical supply line, the computing centre is running at reduced capacity (around 30% below the pledged capacity).<br />
* Machine job features testing has hit a bug according to Raul!<br />
* The HNSciCloud PCP pilot proposal was successfully submitted (refer to Andrew Sansum's talk at GridPP34 if you forget what this means!). The project intends to procure commercial cloud resources for FY17 and FY18. We will contribute 75K euro towards this activity and the EU will then top up to 250K euro.<br />
* There was an EGI Operations Management Board last Thursday. There are no summary notes yet, but please take a look at the [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2379 agenda] and linked talks (may be worth skimming them at the ops meeting).<br />
* There was a quick request/reminder for sites to please update their [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 entries in the GridPP wiki].<br />
* RAL is closed on Monday and Tuesday of this week!<br />
<br />
<br />
<br />
'''Tuesday 24th August'''<br />
* A [https://indico.cern.ch/event/319751/ draft agenda for the September GDB] is taking shape. Any Tier-2 rep volunteers this month please email Jeremy.<br />
* Interest in HTCondor CE. (Ops thread 20/08).<br />
<br />
<br />
'''Tuesday 18th August'''<br />
* Root CA - Jens will provide a quick incident report of what went wrong!<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150817#Monday Notes from Monday's WLCG ops meeting] are available. <br />
* DPM tuning (raised by Matt D - 17/08)<br />
* [http://cds.cern.ch/journal/CERNBulletin/2015/32/News%20Articles/2038500?ln=en Developers@CERN Forum] is an initiative of a group of developers targeting all software developers at CERN.<br />
* Multicore inefficiencies: RAL - CMS allocated slots not used. Glasgow - the concern for us is CPU that was previously being used by ATLAS is no longer being used because of multicore size mismatches. <br />
* [https://labs.ripe.net/atlas/user-experiences RIPE ATLAS probe experience pages]. <br />
* Automating ATLAS consistency checking (See Alastair's email 12/08)<br />
* For generating docs out of markdown Steve T can recommend gitbook. This is what CERN use for the [http://cern.ch/config Config guide].<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
* There was a middleware readiness meeting on 16th September.<br />
* The new DPM version is being tested via the ATLAS workflow by the Edinburgh Volunteer site.<br />
* Many new sites expressed interest in participating in MW Readiness testing with CentOS7. It is useful to anticipate the MW behaviour in the event of new HW purchase. DPM validation on CentOS/SL7 is already ongoing at Glasgow.<br />
* ATLAS and CMS are asked to declare whether the xrootd 4 monitoring plugin is important for them or not. As it is now, it doesn't work with dCache v. 2.13.8<br />
* Despite the fact that FTS3 runs at very few sites we decided to test it for Readiness. In this context, ATLAS and CMS are asked to use the FTS3 pilot in their transfer test workflows<br />
* PIC successfully tested dCache v.2.13.8 for CMS.<br />
* CNAF has obtained Indigo-DataCloud effort to strengthen the ARGUS development team. The ARGUS collaboration will meet again early October. The problems faced at CERN with a CMS VOBOX are being investigated in ticket GGUS:116092.<br />
* The next MW Readiness WG vidyo meeting will take place on Wednesday 28 October at 4pm CET.<br />
<br />
<br />
'''Tuesday 25th August'''<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150820 Minutes from the meeting last Thursday] are available.<br />
* Highlights: <br />
* Calling for volunteers to write articles on interesting topics for WLCG Operations. There is a [http://wlcg-ops.web.cern.ch/articles section in the portal] for this. In case you are interested, please send a mail to wlcg-ops-coord-chair people for details.<br />
* In a few weeks CERN will disable write operations via RFIO v2 to Castor, as part of the RFIO access decommissioning.<br />
* A downtime for the Argus service @ CERN is expected due to the pending filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h-2h.<br />
* Issues with VMs in Tier-0 infra also reported by CMS<br />
* LHCb is trying to integrate submission to the HTCondorCE instance @ CERN<br />
* A message broker pre-prod infra has been set up @ CERN to enable distribution of perfSONAR data to the experiments. OSG is enabling the data publication from the ITB collector service<br />
* Please observe the Action List in the minutes and on the WLCG Operations portal by clicking on the Task List relevant to: [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]: [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 18th August'''<br />
* Pretty quiet! [https://indico.cern.ch/event/393614/ Next ops coordination meeting is on 20th].<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 22nd September'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting are [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-09-16 here].<br />
* The production FTS was updated (to version 3.3.1) on Wed. morning (16th). The production FTS server has been under high load and has been showing performance problems.<br />
* The first step of the update of the Oracle databases behind Castor was made on Tuesday 15th. There are further steps to do - as announced in the GOC DB.<br />
* There will be an 'at risk' on the morning of Wednesday 30th Sep. as the Tier1's link into the RAL core network is upgraded to 40Gb.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually or on the gridpp-ops@cern.ch mailing list to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action for the NGIs: please start tracking which sites are still using SL5 services (how many services; for each service, whether it is still needed on SL5; and whether upgrades of SL5 services are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It would also be interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st September'''<br />
* Gordon was on shift.<br />
* Another very quiet week with no new tickets. We have two open ROD tickets, both of which are for A/R; one is against Cambridge, and the other is the now 53-day-old ticket at UCL.<br />
* Next up.... Kashif again (thanks Kashif!)<br />
<br />
'''Monday 24th August'''<br />
* Kashif on shift.<br />
* There were quite a few alarms throughout the week and many tickets were opened. All of the tickets were fixed within the time limit.<br />
* The certificate of the Argus server at Sheffield expired, but Elena got a new certificate quickly.<br />
* Cambridge and UCL have low-availability tickets, and not much can be done about them except wait for availability to reach 90%.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing: we fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM; there are also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 1st September'''<br />
* The IGTF has released an [https://rt.egi.eu/rt/Ticket/Display.html?id=9406 urgent update to the trust anchor repository (1.67)]<br />
* Linda is working on a revision to the EGI Technology Questionnaire.<br />
<br />
'''Tuesday 24th August'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9323 EGI SVG Advisory "Moderate" RISK - dCache EGI-SVG-2015-9323]<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2014-7159 EGI SVG Advisory 'Low' RISK - VOMS Potential DoS EGI-SVG-2014-7159]<br />
* EGI IGTF CA update, version 1.66-1 ticket created [EGI #9351], due August 31st.<br />
<br />
'''Tuesday 18th August'''<br />
* CVE-2015-3245 (libuser) - EGI-CSIRT processing ~50 sites remaining. None in UK.<br />
* glexec for small VOs - summary doc in progress by IanN. Circulated to sec. team for sanity check. More work needed. <br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 18th August'''<br />
* Next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Friday 18th September'''<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable; the test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the web server stopped responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage.<br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
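A minimal sketch of the failover idea, assuming a client holds the two broker URLs above as an ordered list and probes them at use time rather than trusting a cached entry. This is not how the Nagios/SAM tooling actually selects a broker; the names <code>BROKERS</code>, <code>tcp_reachable</code> and <code>pick_broker</code> are hypothetical.

```python
import socket
from urllib.parse import urlsplit

# Ordered by preference; both URLs are the ones quoted in the entry above.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]


def tcp_reachable(url, timeout=5.0):
    """Return True if a TCP connection to the broker's host:port succeeds."""
    parts = urlsplit(url)
    try:
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_broker(brokers, is_up=tcp_reachable):
    """Return the first broker for which is_up(url) is true, else None."""
    for url in brokers:
        if is_up(url):
            return url
    return None
```

Calling <code>pick_broker(BROKERS)</code> just before publishing only tests TCP reachability; a real client would also need to confirm that the STOMP handshake succeeds.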
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
*** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
*** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
** Dual stack nodes - 21%<br />
*** YES: Brunel; IC; QMUL; Oxford (4)<br />
*** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
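The percentages in this entry are consistent with a denominator of 19 sites (YES plus NO counts), rounded to the nearest whole number. A small sketch for re-deriving them when the lists change; the helper <code>share</code> and the dictionary <code>COUNTS</code> are hypothetical names, and the counts are taken from the lists above.

```python
def share(yes, no):
    """Percentage of sites answering YES, rounded to the nearest integer."""
    return round(100 * yes / (yes + no))


# (YES, NO) site counts as listed in the 2nd December entry.
COUNTS = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

for topic, (yes, no) in COUNTS.items():
    print(f"{topic}: {share(yes, no)}% of {yes + no} sites")
```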
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is full again.<br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to being finalised. Simulation is expected to be broadly similar to MC14, and no blockers are expected.<br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months, the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-09-18T10:08:30Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 14th September 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 1st September'''<br />
* From October/November, the EGI ops VO monitoring will be performed using RFC proxies, as opposed to legacy proxies.<br />
* There will be a new filter for the critical profile for ATLAS WLCG SAM tests so that only production endpoints will be tested and taken into account for site availability metrics. This will be available from the SAM3 interface.<br />
* CNAF: a fire last Thursday caused problems with one electrical supply line, so the computing centre is running at reduced capacity (around 30% below the pledged capacity).<br />
* Machine job features testing has hit a bug, according to Raul!<br />
* The HNSciCloud PCP pilot proposal was successfully submitted (refer to Andrew Sansum's talk at GridPP34 if you forget what this means!). The project intends to procure commercial cloud resources for FY17 and FY18. We will contribute 75K euro towards this activity and the EU will then top up to 250K euro.<br />
* There was an EGI Operations Management Board last Thursday. There are no summary notes yet, but please take a look at the [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2379 agenda] and linked talks (may be worth skimming them at the ops meeting).<br />
* There was a quick request/reminder for sites to please update their [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 entries in the GridPP wiki].<br />
* RAL is closed on Monday and Tuesday of this week!<br />
<br />
<br />
<br />
'''Tuesday 24th August'''<br />
* A [https://indico.cern.ch/event/319751/ draft agenda for the September GDB] is taking shape. Any Tier-2 rep volunteers this month please email Jeremy.<br />
* Interest in HTCondor CE. (Ops thread 20/08).<br />
<br />
<br />
'''Tuesday 18th August'''<br />
* Root CA - Jens will provide a quick incident report of what went wrong!<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150817#Monday Notes from Monday's WLCG ops meeting] are available. <br />
* DPM tuning (raised by Matt D - 17/08)<br />
* [http://cds.cern.ch/journal/CERNBulletin/2015/32/News%20Articles/2038500?ln=en Developers@CERN Forum] is an initiative of a group of developers targeting all software developers at CERN.<br />
* Multicore inefficiencies: RAL - CMS allocated slots not used. Glasgow - the concern for us is CPU that was previously being used by ATLAS is no longer being used because of multicore size mismatches. <br />
* [https://labs.ripe.net/atlas/user-experiences RIPE ATLAS probe experience pages]. <br />
* Automating ATLAS consistency checking (See Alastair's email 12/08)<br />
* For generating docs out of markdown Steve T can recommend gitbook. This is what CERN use for the [http://cern.ch/config Config guide].<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
<br />
* There was a middleware readiness meeting on 16th September.<br />
* The new DPM version is being tested via the ATLAS workflow by the Edinburgh Volunteer site.<br />
* Many new sites expressed interest in participating in MW Readiness testing with CentOS7. It is useful to anticipate the MW behaviour in the event of new HW purchase. DPM validation on CentOS/SL7 is already ongoing at Glasgow.<br />
* ATLAS and CMS are asked to declare whether the xrootd 4 monitoring plugin is important for them or not. As it is now, it doesn't work with dCache v. 2.13.8.<br />
* Despite the fact that FTS3 runs at very few sites we decided to test it for Readiness. In this context, ATLAS and CMS are asked to use the FTS3 pilot in their transfer test workflows<br />
* PIC successfully tested dCache v.2.13.8 for CMS.<br />
* CNAF has obtained Indigo-DataCloud effort to strengthen the ARGUS development team. The ARGUS collaboration will meet again early October. The problems faced at CERN with a CMS VOBOX are being investigated in ticket GGUS:116092.<br />
* The next MW Readiness WG vidyo meeting will take place on Wednesday 28 October at 4pm CET.<br />
<br />
<br />
'''Tuesday 25th August'''<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150820 Minutes from the meeting last Thursday] are available.<br />
* Highlights: <br />
* Calling for volunteers to write articles on interesting topics for WLCG Operations. There is a [http://wlcg-ops.web.cern.ch/articles section in the portal] for this. In case you are interested, please send a mail to wlcg-ops-coord-chair people for details.<br />
* In a few weeks CERN is going to disable write operations via RFIO v2 to Castor as part of the RFIO access decommissioning.<br />
* A downtime for the Argus service @ CERN is expected due to the pending filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h-2h.<br />
* Issues with VMs in Tier-0 infra also reported by CMS<br />
* LHCb is trying to integrate submission to the HTCondorCE instance @ CERN<br />
* A message broker pre-prod infra has been set up @ CERN to enable distribution of perfSONAR data to the experiments. OSG is enabling the data publication from the ITB collector service.<br />
* Please observe the Action List in the minutes and on the WLCG Operations portal by clicking on the Task List relevant to: [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]: [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 18th August'''<br />
* Pretty quiet! [https://indico.cern.ch/event/393614/ Next ops coordination meeting is on 20th].<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 22nd September'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-09-16 here]<br />
* The production FTS was updated (to version 3.3.1) on Wed. morning (16th). The production FTS server has been under high load and has been showing performance problems.<br />
* The first step of the update of the Oracle databases behind Castor was made on Tuesday 15th. There are further steps to do - as announced in the GOC DB.<br />
* There will be an 'at risk' on the morning of Wednesday 30th Sep. as the Tier1's link into the RAL core network is upgraded to 40Gb.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, an H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. You need to be added manually, or be on the gridpp-ops@cern.ch mailing list, to browse issues.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action: for the NGIs: please start tracking which sites are still using SL5 services (how many services, and for each service whether it is still needed on SL5 and whether upgrades on SL5 are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. Also interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st September'''<br />
* Gordon was on shift.<br />
* Another very quiet week with no new tickets. We have two open ROD tickets, both of which are for A/R; one is against Cambridge, and the other is the now 53-day-old ticket at UCL.<br />
* Next up.... Kashif again (thanks Kashif!)<br />
<br />
'''Monday 24th August'''<br />
* Kashif on shift.<br />
* There were quite a few alarms throughout the week and many tickets were opened. All of the tickets were fixed within the time limit. <br />
* The certificate of the Argus server at Sheffield expired but Elena got a new certificate quickly. <br />
* Cambridge and UCL have low-availability tickets, and not much can be done about them except wait for availability to reach 90%.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 1st September'''<br />
* The IGTF has released an [https://rt.egi.eu/rt/Ticket/Display.html?id=9406 urgent update to the trust anchor repository (1.67)]<br />
* Linda is working on a revision to the EGI Technology Questionnaire.<br />
<br />
'''Tuesday 24th August'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9323 EGI SVG Advisory "Moderate" RISK - dCache EGI-SVG-2015-9323]<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2014-7159 EGI SVG Advisory 'Low' RISK - VOMs Potential DoS EGI-SVG-2014-7159]<br />
* EGI IGTF CA update, version 1.66-1 ticket created [EGI #9351], due August 31st.<br />
<br />
'''Tuesday 18th August'''<br />
* CVE-2015-3245 (libuser) - EGI-CSIRT processing ~50 sites remaining. None in UK.<br />
* glexec for small vos - summary doc in progress by IanN. Circulated to sec. team for sanity check. More work needed. <br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 18th August'''<br />
* Next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Friday 18th September'''<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE]<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the web server was not responding, and went away in approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable was seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
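<br />
The percentages quoted in this update appear to be the YES count divided by all 19 sites, rounded to the nearest whole number. A minimal sketch of that arithmetic follows, assuming the site counts given in the lists above; the names <code>SURVEYS</code>, <code>percent_yes</code> and <code>summarise</code> are illustrative only and not part of any GridPP tooling.<br />

```python
# Site counts taken from the YES/NO lists in this update (19 sites in total).
# The DIRAC "NO" figure combines the two groups listed (4 + 4), and the IPv6
# allocation "NO" figure (11) was obtained by counting the listed sites.
SURVEYS = {
    "Multicore queues": (12, 7),
    "Cloud/VM interface": (5, 14),
    "GridPP DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}


def percent_yes(yes: int, no: int) -> int:
    """Return the share of YES sites as a whole-number percentage."""
    return round(100 * yes / (yes + no))


def summarise(surveys: dict) -> dict:
    """Map each survey name to its rounded YES percentage."""
    return {name: percent_yes(yes, no) for name, (yes, no) in surveys.items()}


if __name__ == "__main__":
    for name, pct in summarise(SURVEYS).items():
        print(f"{name}: {pct}%")
```

Keeping the counts as (yes, no) pairs makes it easy to re-run the figures when a site changes status.<br />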
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RAL T1, and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again.<br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.<br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run-2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring.<br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-09-18T10:03:41Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting: Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at - definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. We have been waiting for input from LHCB for a while now: things look okay at QM, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, with no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider putting this ticket on hold. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing; Govind discovered a possible bug in Argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted that the broken CEs were dual-stacked and the working one wasn't. The ticket seems to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - could probably do with being put on hold. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance, the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright, but Durham look like they need to update their CA rpms - that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br /><br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if the problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.''<br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S opened this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to raise is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or more precisely, having submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6)<br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
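<br />
The hosts listed above can be grouped by site domain to see at a glance which sites still have rogues to clear up. A minimal Python sketch, assuming nothing beyond the hostnames in the list (the function name is purely illustrative and not part of any APEL tooling):<br />

```python
from collections import defaultdict

# Submission hosts reported above as publishing "0" core jobs.
ZERO_CORE_HOSTS = [
    "ce3.dur.scotgrid.ac.uk",
    "ce4.dur.scotgrid.ac.uk",
    "cetest02.grid.hep.ph.ic.ac.uk",
    "hepgrid5.ph.liv.ac.uk",
    "hepgrid6.ph.liv.ac.uk",
    "hepgrid97.ph.liv.ac.uk",
    "svr009.gla.scotgrid.ac.uk",
    "t2ce06.physics.ox.ac.uk",
]


def group_by_domain(hosts):
    """Map each domain (everything after the first dot) to its short hostnames."""
    grouped = defaultdict(list)
    for host in hosts:
        # partition splits on the first dot only: "ce3" / "dur.scotgrid.ac.uk"
        short_name, _, domain = host.partition(".")
        grouped[domain].append(short_name)
    return dict(grouped)


if __name__ == "__main__":
    for domain, names in sorted(group_by_domain(ZERO_CORE_HOSTS).items()):
        print(f"{domain}: {', '.join(names)}")
```

Grouping on the domain rather than a hand-written site name keeps the sketch free of any assumed host-to-site mapping.<br />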
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed AFAICS. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. Attempts to replicate the issue have failed, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
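<br />
For anyone wanting to check their own site for the same clash, here is a minimal Python sketch. It assumes you can collect, by whatever means suits your setup, a mapping from each CE hostname to the SubmitHost value it publishes under; the hostnames and values below are made up for illustration and the function is not part of any APEL tool.<br />

```python
from collections import defaultdict


def find_shared_submit_hosts(ce_to_submit_host):
    """Return {submit_host: [ce, ...]} for every SubmitHost used by more than one CE."""
    by_submit_host = defaultdict(list)
    for ce, submit_host in ce_to_submit_host.items():
        by_submit_host[submit_host].append(ce)
    # Only SubmitHost values shared by two or more CEs are a problem.
    return {
        submit_host: sorted(ces)
        for submit_host, ces in by_submit_host.items()
        if len(ces) > 1
    }


# Hypothetical example data: two CEs accidentally publish under the same value.
EXAMPLE = {
    "ce-a.example.org": "ce-a.example.org:2811/queue",
    "ce-b.example.org": "ce-a.example.org:2811/queue",
    "ce-c.example.org": "ce-c.example.org:2811/queue",
}

if __name__ == "__main__":
    for submit_host, ces in find_shared_submit_hosts(EXAMPLE).items():
        print(f"{submit_host} is shared by: {', '.join(ces)}")
```

An empty result means every CE has its own SubmitHost, so no CE should be overwriting another's sync records for this particular reason.<br />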
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, and John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the Dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd director configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depends on the time frame of a work around. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter), was asking about SELinux - but no news since the first. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions falling on deaf ears for a while it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on the time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up, I put a reminder in to the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) has been done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This Atlas glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that wasn't suffering an understood problem was RALPP with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAICS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me all wisdom donations are gratefully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (21/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCb job failure tickets. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is building a replacement xrootd box which is currently being prepared. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for https access to the storage at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, which look to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a CREAM-side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configuration and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/RelocatableGlexecRelocatableGlexec2015-09-17T13:15:58Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>The title is misleading - due to security concerns glexec can't be truly relocatable, but it can be built to use a different binary and config path from the defaults, allowing glexec to be exported and used in a tarball environment.<br />
<br />
=Building GLEXEC to suit your site's tarball needs=<br />
(with reference to [[EMITarball]])<br />
<br />
'''Work in Progress'''<br />
<br />
Please note that we are unable to support glexec directly within the tarball, for many reasons. Listed below is a possible method (still being tested) for a site to build their own relocatable glexec. A group of sites using the same convention for tarball mount points could share the same glexec build to lower the total workload.<br />
<br />
We welcome all feedback on the tickets listed below, or to the tarball support e-mail ( tarball-grid-support atSPAMNOT cern.ch ).<br />
<br />
==Acknowledgements and Further Reading==<br />
Please refer to the glexec web pages for more information:<br /><br />
https://wiki.nikhef.nl/grid/GLExec<br />
<br />
with particular thanks to the writers of:<br><br />
https://wiki.nikhef.nl/grid/Building_gLExec_from_src_rpm<br />
<br />
(the script I use is an updated version of the example given).<br />
<br />
==Requirements==<br />
<br />
* A clean SL6 system, similar to the nodes that you will run on. It will need network connectivity.<br />
* gcc and rpm-build packages installed, as well as the glexec user that you will use on your cluster.<br />
* The script below, or one like it:<br />
<br />
#!/bin/sh<br />
<br />
# SET CUSTOM BUILD ARGUMENTS HERE<br />
<br />
<br />
# EMI and EPEL directories<br />
glexec_pfx=/opt/gridapps/glexec<br />
glexec_etc=/opt/gridapps/glexec/etc<br />
glexec_doc=/opt/gridapps/glexec/doc<br />
<br />
<br />
# END OF BUILD ARGUMENTS<br />
<br />
# Setup build infrastructure<br />
export TOPDIR=`pwd`<br />
mkdir -p $TOPDIR/{SRPMS,SOURCES,SPECS,BUILD,RPMS/x86_64,RPMS/i386}<br />
<br />
# Download and install lcmaps-interface and glexec src<br />
rpm2cpio http://software.nikhef.nl/dist/mwsec/rpm/epel6/x86_64/lcmaps-basic-interface-1.6.1-1.el6.noarch.rpm | cpio -id<br />
rpm --define "_topdir $TOPDIR" -i http://software.nikhef.nl/dist/mwsec/rpm/epel6/SRPMS/glexec-0.9.11-1.el6.src.rpm<br />
<br />
# Patch spec file to match module directories for LCAS and LCMAPS<br />
sed -i "s+^\(%configure\).*+\1 --with-lcmaps-moduledir-sfx=$lcmaps_moddir_sfx --with-lcas-moduledir-sfx=$lcas_moddir_sfx+" $TOPDIR/SPECS/glexec.spec<br />
<br />
# Build the RPM<br />
CFLAGS=-I$TOPDIR/usr/include rpmbuild \<br />
--nodeps \<br />
-ba --define "_topdir $TOPDIR" \<br />
--define "_prefix $glexec_pfx" \<br />
--define "_sysconfdir $glexec_etc" \<br />
--define "_defaultdocdir $glexec_doc" \<br />
$TOPDIR/SPECS/glexec.spec<br />
<br />
<br />
The important site variable is glexec_pfx, which should be your tarball mount point (the glexec binary will be in $glexec_pfx/sbin). The glexec_etc variable should point to where the glexec.conf file will be kept. The two RPM URLs should be checked before building to make sure they are current and point to the latest release.<br />
<br />
Once run, this will give you an rpm in RPMS to unpack. You can do this with:<br />
<br />
rpm2cpio RPMS/x86_64/$glexec_rpm | cpio -dim<br />
<br />
You will then probably need to do some directory pruning before you have something you can load into your shared area. The glexec.conf file will need to have its ownership and permissions changed, probably to glexec.glexec, 0400. The glexec/sbin directory will likely need to be put into your $PATH environment variable.<br />
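<br />
The ownership, permission and PATH changes described above could be sketched in shell as follows. This is an illustration rather than a tested recipe: the function names are hypothetical helpers, and the owner (glexec:glexec) and install prefix simply follow the values suggested on this page.<br />

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical helpers, not part of glexec.

# Give glexec.conf the suggested owner and read-only (0400) permissions.
# $1 = path to glexec.conf, $2 = owner as user:group (e.g. glexec:glexec)
secure_glexec_conf() {
    chown "$2" "$1" || return 1
    chmod 0400 "$1"
}

# Prepend <prefix>/sbin to PATH unless it is already there.
# $1 = glexec prefix (glexec_pfx in the build script)
add_glexec_to_path() {
    case ":$PATH:" in
        *":$1/sbin:"*) ;;               # already present, nothing to do
        *) PATH="$1/sbin:$PATH" ;;
    esac
    export PATH
}

# Example (assumed paths, run as root where the shared area is writable):
#   secure_glexec_conf /opt/gridapps/glexec/etc/glexec.conf glexec:glexec
#   add_glexec_to_path /opt/gridapps/glexec
```

Keeping the PATH change idempotent means the function can live in a profile script that is sourced more than once without growing the PATH.<br />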
<br />
'''If planning on using the (recommended) setuid mode you will need to export and mount your tarballs so that glexec's suid properties aren't squashed. To this end it is recommended that you consider exporting glexec on its own mount, in parallel to the "regular" tarball rather than on the same mount.'''<br />
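<br />
One way to check whether a client mount will honour glexec's setuid bit is to look for the nosuid option in the mount table. The sketch below assumes a Linux client with /proc/mounts; the function names are hypothetical and the mount point in the example is only the prefix used on this page.<br />

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Succeed if a comma-separated mount-option string contains "nosuid".
has_nosuid() {
    case ",$1," in
        *,nosuid,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the options of the filesystem mounted at exactly $1.
# $2 = mounts table to read (defaults to /proc/mounts).
# If the mount point is listed more than once, the last entry wins.
mount_options() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# Example:
#   if has_nosuid "$(mount_options /opt/gridapps/glexec)"; then
#       echo "glexec mount has nosuid set: setuid mode will not work" >&2
#   fi
```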
<br />
==glexec in cvmfs==<br />
<br />
With reference to the ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=116154 116154] we are investigating making glexec available through cvmfs - although it is early days yet, and we '''cannot''' at this juncture recommend sites mounting cvmfs with suid enabled.<br />
<br />
==ggus ticket==<br />
Please also see the original glexec tarball ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=95832 95832] (submitted by the tarball devs to themselves).<br />
<br />
-Matt, 17th September 2015</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/RelocatableGlexecRelocatableGlexec2015-09-17T11:19:07Z<p>Matthew Doidge 1ac9bd3994: /* Building GLEXEC to suit your site's tarball needs */</p>
<hr />
<div>=Building GLEXEC to suit your site's tarball needs=<br />
(with reference to [[EMITarball]])<br />
<br />
'''Work in Progress'''<br />
<br />
Please note that we are unable to support glexec directly within the tarball, for many reasons. Listed below is a possible method (still being tested) for a site to build their own relocatable glexec. A group of sites using the same convention for tarball mount points could share the same glexec build to lower the total workload.<br />
<br />
We welcome all feedback on the tickets listed below, or to the tarball support e-mail ( tarball-grid-support atSPAMNOT cern.ch ).<br />
<br />
==Acknowledgements and Further Reading==<br />
Please refer to the glexec web pages for more information:<br /><br />
https://wiki.nikhef.nl/grid/GLExec<br />
<br />
with particular thanks to the writers of:<br><br />
https://wiki.nikhef.nl/grid/Building_gLExec_from_src_rpm<br />
<br />
(the script I use is an updated version of the example given).<br />
<br />
==Requirements==<br />
<br />
* A clean SL6 system, similar to the nodes that you will run on. It will need network connectivity.<br />
* gcc and rpm-build packages installed, as well as the glexec user that you will use on your cluster.<br />
* The script below, or one like it:<br />
<br />
#!/bin/sh<br />
<br />
# SET CUSTOM BUILD ARGUMENTS HERE<br />
<br />
<br />
# EMI and EPEL directories<br />
glexec_pfx=/opt/gridapps/glexec<br />
glexec_etc=/opt/gridapps/glexec/etc<br />
glexec_doc=/opt/gridapps/glexec/doc<br />
<br />
<br />
# END OF BUILD ARGUMENTS<br />
<br />
# Setup build infrastructure<br />
export TOPDIR=`pwd`<br />
mkdir -p $TOPDIR/{SRPMS,SOURCES,SPECS,BUILD,RPMS/x86_64,RPMS/i386}<br />
<br />
# Download and install lcmaps-interface and glexec src<br />
rpm2cpio http://software.nikhef.nl/dist/mwsec/rpm/epel6/x86_64/lcmaps-basic-interface-1.6.1-1.el6.noarch.rpm | cpio -id<br />
rpm --define "_topdir $TOPDIR" -i http://software.nikhef.nl/dist/mwsec/rpm/epel6/SRPMS/glexec-0.9.11-1.el6.src.rpm<br />
<br />
# Patch spec file to match module directories for LCAS and LCMAPS<br />
sed -i "s+^\(%configure\).*+\1 --with-lcmaps-moduledir-sfx=$lcmaps_moddir_sfx --with-lcas-moduledir-sfx=$lcas_moddir_sfx+" $TOPDIR/SPECS/glexec.spec<br />
<br />
# Build the RPM<br />
CFLAGS=-I$TOPDIR/usr/include rpmbuild \<br />
--nodeps \<br />
-ba --define "_topdir $TOPDIR" \<br />
--define "_prefix $glexec_pfx" \<br />
--define "_sysconfdir $glexec_etc" \<br />
--define "_defaultdocdir $glexec_doc" \<br />
$TOPDIR/SPECS/glexec.spec<br />
<br />
<br />
The important site variable is glexec_pfx, which should be your tarball mount point (the glexec binary will be in $glexec_pfx/sbin). The glexec_etc variable should point to where the glexec.conf file will be kept. The two RPM URLs should be checked before building to make sure they are current and point to the latest release.<br />
<br />
Once run, this will give you an rpm in RPMS to unpack. You can do this with:<br />
<br />
rpm2cpio RPMS/x86_64/$glexec_rpm | cpio -dim<br />
<br />
You will then probably need to do some directory pruning before you have something you can load into your shared area. The glexec.conf file will need to have its ownership and permissions changed, probably to glexec.glexec, 0400. The glexec/sbin directory will likely need to be put into your $PATH environment variable.<br />
<br />
'''If planning on using the (recommended) setuid mode you will need to export and mount your tarballs so that glexec's suid properties aren't squashed. To this end it is recommended that you consider exporting glexec on its own mount, in parallel to the "regular" tarball rather than on the same mount.'''<br />
<br />
==glexec in cvmfs==<br />
<br />
With reference to the ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=116154 116154] we are investigating making glexec available through cvmfs - although it is early days yet, and we '''cannot''' at this juncture recommend sites mounting cvmfs with suid enabled.<br />
<br />
==ggus ticket==<br />
Please also see the original glexec tarball ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=95832 95832] (submitted by the tarball devs to themselves).<br />
<br />
-Matt, 17th September 2015</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/RelocatableGlexecRelocatableGlexec2015-09-17T11:17:18Z<p>Matthew Doidge 1ac9bd3994: Created page with "=Building GLEXEC to suit your site's tarball needs= (with reference to EMITarball) '''Work in Progress''' Please note that we are unable to support glexec directly within..."</p>
<hr />
<div>=Building GLEXEC to suit your site's tarball needs=<br />
(with reference to [[EMITarball]])<br />
'''Work in Progress'''<br />
<br />
Please note that we are unable to support glexec directly within the tarball, for many reasons. Listed below is a possible method (still being tested) for a site to build their own relocatable glexec. A group of sites using the same convention for tarball mount points could share the same glexec build to lower the total workload.<br />
<br />
We welcome all feedback on the tickets listed below, or to the tarball support e-mail ( tarball-grid-support atSPAMNOT cern.ch ).<br />
<br />
==Acknowledgements and Further Reading==<br />
Please refer to the glexec web pages for more information:<br /><br />
https://wiki.nikhef.nl/grid/GLExec<br />
<br />
with particular thanks to the writers of:<br><br />
https://wiki.nikhef.nl/grid/Building_gLExec_from_src_rpm<br />
<br />
(the script I use is an updated version of the example given).<br />
<br />
==Requirements==<br />
<br />
* A clean SL6 system, similar to the nodes that you will run on. It will need network connectivity.<br />
* gcc and rpm-build packages installed, as well as the glexec user that you will use on your cluster.<br />
* The script below, or one like it:<br />
<br />
#!/bin/sh<br />
<br />
# SET CUSTOM BUILD ARGUMENTS HERE<br />
<br />
<br />
# EMI and EPEL directories<br />
glexec_pfx=/opt/gridapps/glexec<br />
glexec_etc=/opt/gridapps/glexec/etc<br />
glexec_doc=/opt/gridapps/glexec/doc<br />
<br />
<br />
# END OF BUILD ARGUMENTS<br />
<br />
# Setup build infrastructure<br />
export TOPDIR=`pwd`<br />
mkdir -p $TOPDIR/{SRPMS,SOURCES,SPECS,BUILD,RPMS/x86_64,RPMS/i386}<br />
<br />
# Download and install lcmaps-interface and glexec src<br />
rpm2cpio http://software.nikhef.nl/dist/mwsec/rpm/epel6/x86_64/lcmaps-basic-interface-1.6.1-1.el6.noarch.rpm | cpio -id<br />
rpm --define "_topdir $TOPDIR" -i http://software.nikhef.nl/dist/mwsec/rpm/epel6/SRPMS/glexec-0.9.11-1.el6.src.rpm<br />
<br />
# Patch spec file to match module directories for LCAS and LCMAPS<br />
sed -i "s+^\(%configure\).*+\1 --with-lcmaps-moduledir-sfx=$lcmaps_moddir_sfx --with-lcas-moduledir-sfx=$lcas_moddir_sfx+" $TOPDIR/SPECS/glexec.spec<br />
<br />
# Build the RPM<br />
CFLAGS=-I$TOPDIR/usr/include rpmbuild \<br />
--nodeps \<br />
-ba --define "_topdir $TOPDIR" \<br />
--define "_prefix $glexec_pfx" \<br />
--define "_sysconfdir $glexec_etc" \<br />
--define "_defaultdocdir $glexec_doc" \<br />
$TOPDIR/SPECS/glexec.spec<br />
<br />
<br />
The important site variable is glexec_pfx, which should be your tarball mount point (the glexec binary will be in $glexec_pfx/sbin). The glexec_etc variable should point to where the glexec.conf file will be kept. The two RPM URLs should be checked before building to make sure they are current and point to the latest release.<br />
<br />
Once run, this will give you an rpm in RPMS to unpack. You can do this with:<br />
<br />
rpm2cpio RPMS/x86_64/$glexec_rpm | cpio -dim<br />
<br />
You will then probably need to do some directory pruning before you have something you can load into your shared area. The glexec.conf file will need to have its ownership and permissions changed, probably to glexec.glexec, 0400. The glexec/sbin directory will likely need to be put into your $PATH environment variable.<br />
<br />
'''If planning on using the (recommended) setuid mode you will need to export and mount your tarballs so that glexec's suid properties aren't squashed. To this end it is recommended that you consider exporting glexec on its own mount, in parallel to the "regular" tarball rather than on the same mount.'''<br />
<br />
==glexec in cvmfs==<br />
<br />
With reference to the ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=116154 116154] we are investigating making glexec available through cvmfs - although it is early days yet, and we '''cannot''' at this juncture recommend sites mounting cvmfs with suid enabled.<br />
<br />
==ggus ticket==<br />
Please also see the original glexec tarball ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=95832 95832] (submitted by the tarball devs to themselves).<br />
<br />
-Matt, 17th September 2015</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/EMITarballEMITarball2015-09-17T10:35:06Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div><br />
<br />
The EMI WN UI Tarball project is ongoing, headed by members of GridPP - particularly Matt Doidge at Lancaster University. Sadly we lost many of our old wiki pages; hopefully this page will answer most questions. Please pardon our mess, and contact us if you have any questions.<br />
<br />
==Overview==<br />
The EMI3 Worker Node and User Interface tarball is produced for SL6 using tools originally written by David Smith (CERN). The goal of the tarball is to allow the EMI WN or UI to be served over a remote export (such as NFS) on a SL6 machine, without installation of any extra packages or rpms. The tarballs can also be exported over [http://cernvm.cern.ch/portal/filesystem cvmfs], and the latest versions of the tarball are available in the grid.cern.ch repository.<br />
<br />
Currently only the SL6 versions of the EMI WN and UI tarballs are being developed. Work on an SL7 version will start later in 2015.<br />
<br />
==Where to Download==<br />
The latest versions of the EMI3 WN and UI Tarballs can be found here:<br />
<br />
http://repository.egi.eu/mirrors/EMI/tarball/test/sl6/emi3-emi-wn/<br />
<br />
(latest emi-wn-3.15.3-1_sl6v1, June 2015 - the first WN tarball with the gfal2 tools in it)<br />
<br />
http://repository.egi.eu/mirrors/EMI/tarball/test/sl6/emi3-emi-ui/<br />
<br />
(latest emi-ui-3.15.0-1_v1, July 2015 - this version requires a minor "patch", see below. This is also the first of the "lighter-weight" tarballs.)<br />
<br />
The latest version of both can be found in the grid.cern.ch cvmfs repository.<br />
<br />
/cvmfs/grid.cern.ch/emi3ui-latest<br />
/cvmfs/grid.cern.ch/emi3wn-latest<br />
<br />
Environments for these can be set up using /cvmfs/grid.cern.ch/emi3XX-latest/etc/profile.d/setup-XX-example.sh<br />
<br />
===emi-ui-3.15.0-1_v1 patch===<br />
It was discovered that the tool glite-ce-job-output has a hardcoded default path for the uberftp client. To overcome this one needs to create a config file like:<br />
<br />
cat $EMI_TARBALL_BASE/etc/emitar-cream-client.conf <br />
[<br />
UBERFTP_CLIENT=uberftp<br />
]<br />
<br />
And export the variable GLITE_CREAM_CLIENT_CONFIG pointing at this:<br />
<br />
export GLITE_CREAM_CLIENT_CONFIG=$EMI_TARBALL_BASE/etc/emitar-cream-client.conf <br />
<br />
You only need to do this if using the glite-ce-job-* tools. The UI in cvmfs has this patch installed.<br />
<br />
<br />
===gsiscp===<br />
gsiscp, within the gsi ssh tools included in the tarball, has a default hardcoded path for gsissh in it. This can be overcome by specifying which ssh protocol to use (the "-S" option).<br />
<br />
gsiscp -S gsissh ..... <br />
<br />
One can make things a bit easier by aliasing this in your shell (this alias is ''not'' included in the example environment setup scripts).<br />
<br />
alias gsiscp='gsiscp -S gsissh'<br />
<br />
==Tarball Structure==<br />
<br />
Each tarball currently comes in two parts:<br />
* The core tarball, containing the unpacked packages from the EMI repository.<br />
* The "os-extras" tarball, built from packages from the SL and epel repositories.<br />
<br />
The rpms that went into building each tarball are listed in "rpmlist.txt" and "os-extras.txt" respectively. A "halfway" approach to the tarball, installing packages from the "os-extras" list and only using the core tarball, is supported and does work.<br />
<br />
This structure is currently under review and may change.<br />
<br />
===Tarball Versions===<br />
<br />
The tarball versions listed may look convoluted, but there is a system to them! The first part denotes what middleware was used to build the tarball (emi-ui or emi-wn), the second is the version of that middleware built as denoted by the rpm. The _vX is native to the tarball version, denoting the iteration of the tarball for that particular middleware release (things don't always go right first time). <br />
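<br />
As an illustration of the naming scheme, the shell sketch below splits a tarball version string into its three parts. The function name is hypothetical, and it assumes the middleware name is always two dash-separated words (emi-ui or emi-wn) and that the tarball iteration follows the last underscore.<br />

```shell
#!/bin/sh
# Sketch only: split e.g. "emi-wn-3.15.3-1_sl6v1" into
#   middleware (emi-wn), middleware version (3.15.3-1), tarball iteration (sl6v1).
parse_tarball_version() {
    name=$1
    iteration=${name##*_}                                  # text after the last underscore
    rest=${name%_*}                                        # everything before it
    middleware=$(printf '%s\n' "$rest" | cut -d- -f1-2)    # first two dash-separated words
    version=${rest#"$middleware"-}                         # what remains is the rpm version
    printf '%s %s %s\n' "$middleware" "$version" "$iteration"
}

# Example:
#   parse_tarball_version emi-ui-3.15.0-1_v1
```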
<br />
==How to install==<br />
<br />
1. Download the tarball.<br />
2. Unpack the tarball (tar -xzf ....) to an exported volume or onto the node itself. If using the os-extras tarball you will need to download and unpack it in the same directory.<br />
3. Write or edit a script that points such variables as PATH, LD_LIBRARY_PATH, VOMS_USERCONF etc at the tarball. An example setup script is placed in etc/profile.d/ in each tarball.<br />
4. That's it - you should have a working set of the UI and WN middleware. Some extra work is needed for the vomsdir, vomses, CA and CRLs. For a WN you will have to set up the users and batch system yourself.<br />
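<br />
Step 3 might look something like the sketch below. It is only a starting point under stated assumptions: the function name is hypothetical, and the usr/bin, usr/sbin, usr/lib64 and usr/lib layout under the unpack directory is assumed here, so check it against the example script shipped in etc/profile.d/ of your tarball. Other variables, such as VOMS_USERCONF, would be set following the same pattern.<br />

```shell
#!/bin/sh
# Sketch only: point the shell environment at an unpacked tarball.
# $1 = directory the tarball was unpacked into (assumed layout: usr/bin, usr/lib64, ...)
setup_tarball_env() {
    EMI_TARBALL_BASE=$1
    PATH="$EMI_TARBALL_BASE/usr/bin:$EMI_TARBALL_BASE/usr/sbin:$PATH"
    # Only append the old LD_LIBRARY_PATH if it was set, to avoid a trailing ":".
    LD_LIBRARY_PATH="$EMI_TARBALL_BASE/usr/lib64:$EMI_TARBALL_BASE/usr/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export EMI_TARBALL_BASE PATH LD_LIBRARY_PATH
}

# Example (assumed unpack location):
#   setup_tarball_env /opt/gridapps/emi-wn
```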
<br />
If wanting to access the tarball in the grid.cern.ch cvmfs repo, simply replace the unpacking of the tarball with setting up cvmfs, enable the grid.cern.ch repository, and have your scripts point there instead. Examples are stored in the repository, which is maintained by the tarball team.<br />
<br />
===Notes for tarballs containing the gfal2 tools===<br />
<br />
In order for the GFAL2 tools to work from the tarball there need to be some additions to the environment:<br />
<br />
* You need to include the 32-bit site-python in your PYTHONPATH as well as the 64-bit, e.g.:<br />
<br />
PYTHONPATH=$EMI_TARBALL_BASE/usr/lib64/python2.6/site-packages:$EMI_TARBALL_BASE/usr/lib/python2.6/site-packages:$PYTHONPATH<br />
<br />
* You need to include these two GFAL specific variables, e.g.:<br />
<br />
GFAL_PLUGIN_DIR=$EMI_TARBALL_BASE/usr/lib64/gfal2-plugins/<br />
GFAL_CONFIG_DIR=$EMI_TARBALL_BASE/etc/gfal2.d/<br />
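<br />
In a profile script these additions need to be exported so that the gfal2 tools see them. A minimal sketch, using the paths given above and a hypothetical function name, which also warns if the plugin or config directory is missing from the tarball:<br />

```shell
#!/bin/sh
# Sketch only: export the gfal2-related environment for a tarball unpacked at $1.
setup_gfal2_env() {
    base=$1
    # 64-bit site-packages first, then 32-bit, then any existing PYTHONPATH.
    PYTHONPATH="$base/usr/lib64/python2.6/site-packages:$base/usr/lib/python2.6/site-packages${PYTHONPATH:+:$PYTHONPATH}"
    GFAL_PLUGIN_DIR="$base/usr/lib64/gfal2-plugins/"
    GFAL_CONFIG_DIR="$base/etc/gfal2.d/"
    export PYTHONPATH GFAL_PLUGIN_DIR GFAL_CONFIG_DIR
    # Warn, but carry on, if either directory is absent.
    for d in "$GFAL_PLUGIN_DIR" "$GFAL_CONFIG_DIR"; do
        [ -d "$d" ] || echo "warning: $d not found" >&2
    done
    return 0
}

# Example:
#   setup_gfal2_env "$EMI_TARBALL_BASE"
```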
<br />
===CVMFS notes===<br />
<br />
We advise that you maintain your own tarball profile scripts, but a functional UI can be obtained on a node with the grid.cern.ch repo enabled simply by sourcing /cvmfs/grid.cern.ch/emi3ui-latest/etc/profile.d/setup-ui-example.sh. This will use the certificates, vomsdir and vomses in /cvmfs/grid.cern.ch/etc/grid-security - which are not configured for all VOs. Please contact the tarball support team if you would like a VO added or an entry updated.<br />
<br />
==GLEXEC and the Tarball==<br />
<br />
Due to the delicate, secure and highly customisable nature of glexec we are unable to supply a proper "relocatable" distribution of the glexec tools. Sites will have to build their own. Please see the page [[ RelocatableGlexec ]] (still under construction) for information on how to do this.<br />
<br />
==Future Plans==<br />
<br />
'''As of the latest emi-ui tarball (3.15) we have moved to creating the UI tarball for a more up to date platform, greatly reducing the size and number of packages included. If this causes problems for you please let us know.'''<br />
<br />
<br />
===Planned new structure===<br />
The tarballs are currently produced on the same basic-server install SL6 VM that they have been for the last few years - kept up to date but otherwise untouched. However this has left some problems with some fairly low-level libraries being rolled into them. There is also a problem with the fact that the WN tarball ideally requires the HEPOSLIBS metapackage(s) installed - which can at the same time compound the previous problem whilst simultaneously working against the idea of a "complete tarball" . Finally, feedback has been given that some sites would like the epel and SL repo rpms to be separated from within the "os-extras" tarballs.<br />
<br />
These factors have led to us considering a change in the tarball production infrastructure and methodology:<br />
*Tarballs will be produced on a platform of a node in which the HEPOSlibs are already installed to try to reduce the number of "low-level" libraries appearing in it.<br />
*The "os-extras" tarball will be split to "sl-extras" and "epel-extras".<br />
*A single "full" version of the tarball, made from the base, extras ''and'' heposlibs rpms, will be produced on a separate (but cloned) node. This full tarball will mainly be intended for use in cvmfs (to aim for a paradigm where all you need is cvmfs installed).<br />
<br />
===SL7 tarball===<br />
Currently it is looking like the SL7 tarball will start off being an "ad-hoc" affair, consisting of a list of (to-be-identified) utilities pulled into a relocatable distribution.<br />
<br />
==How to contact us==<br />
<br />
The EMI tarball has its own GGUS support group. This is one of the better ways of getting in touch, and of course the place to submit tickets, either about the regular tarball or the tarballs within grid.cern.ch.<br />
<br />
https://wiki.egi.eu/wiki/GGUS:UI_WN_Tarball_FAQ<br />
<br />
There is a tarball email address:<br />
<br />
tarball-grid-support atSPAMNOT cern.ch<br />
<br />
==Old docs==<br />
The old docs have been secreted [https://www.gridpp.ac.uk/wiki/OldEMITarball here]. We hope to improve the documentation over time, when we have time!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-09-14T15:19:05Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 14th September 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** ----->
'''Tuesday 1st September'''<br />
* From October/November, the EGI ops VO monitoring will be performed using RFC proxies, as opposed to legacy proxies.<br />
* There will be a new filter for the critical profile for ATLAS WLCG SAM tests so that only production endpoints will be tested and taken into account for site availability metrics. This will be available from the SAM3 interface.<br />
* CNAF: Due to a fire last Thursday that caused problems with one electrical supply line, the computing centre is running at reduced capacity (around 30% below the pledged capacity).<br />
* Machine job features testing has hit a bug according to Raul!<br />
* The HNSciCloud PCP pilot proposal was successfully submitted (refer to Andrew Sansum's talk at GridPP34 if you forget what this means!). The project intends to procure commercial cloud resources for FY17 and FY18. We will contribute 75K euro towards this activity and the EU will then top up to 250K euro.<br />
* There was an EGI Operations Management Board last Thursday. There are no summary notes yet, but please take a look at the [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2379 agenda] and linked talks (may be worth skimming them at the ops meeting).<br />
* There was a quick request/reminder for sites to please update their [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 entries in the GridPP wiki].<br />
* RAL is closed on Monday and Tuesday of this week!<br />
<br />
<br />
<br />
'''Tuesday 24th August'''<br />
* A [https://indico.cern.ch/event/319751/ draft agenda for the September GDB] is taking shape. Any Tier-2 rep volunteers this month please email Jeremy.<br />
* Interest in HTCondor CE. (Ops thread 20/08).<br />
<br />
<br />
'''Tuesday 18th August'''<br />
* Root CA - Jens will provide a quick incident report of what went wrong!<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150817#Monday Notes from Monday's WLCG ops meeting] are available. <br />
* DPM tuning (raised by Matt D - 17/08)<br />
* [http://cds.cern.ch/journal/CERNBulletin/2015/32/News%20Articles/2038500?ln=en Developers@CERN Forum] is an initiative of a group of developers targeting all software developers at CERN.<br />
* Multicore inefficiencies: RAL - CMS allocated slots not used. Glasgow - the concern for us is CPU that was previously being used by ATLAS is no longer being used because of multicore size mismatches. <br />
* [https://labs.ripe.net/atlas/user-experiences RIPE ATLAS probe experience pages]. <br />
* Automating ATLAS consistency checking (See Alastair's email 12/08)<br />
* For generating docs out of markdown Steve T can recommend gitbook. This is what CERN use for the [http://cern.ch/config Config guide].<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 25th August'''<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150820 Minutes from the meeting last Thursday] are available.<br />
* Highlights: <br />
* Calling for volunteers to write articles on interesting topics for WLCG Operations. There is a [http://wlcg-ops.web.cern.ch/articles section in the portal] for this. In case you are interested, please send a mail to wlcg-ops-coord-chair people for details.<br />
 * In a few weeks CERN is going to disable write operations via RFIO v2 to Castor, as part of decommissioning RFIO access.<br />
* A downtime for the Argus service @ CERN is expected due to the pending filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h-2h.<br />
* Issues with VMs in Tier-0 infra also reported by CMS<br />
* LHCb is trying to integrate submission to the HTCondorCE instance @ CERN<br />
 * A message broker pre-prod infra has been set up @ CERN to enable distribution of perfSONAR data to the experiments. OSG is enabling the data publication from the ITB collector service<br />
* Please observe the Action List in the minutes and on the WLCG Operations portal by clicking on the Task List relevant to: [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]: [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 18th August'''<br />
* Pretty quiet! [https://indico.cern.ch/event/393614/ Next ops coordination meeting is on 20th].<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st September'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-08-26 here]<br />
* Note RAL closed both Monday and Tuesday 31st Aug. and 1st Sep.<br />
* The post-mortem review of the network incident on the 8th April has been finalised. See [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Incident_20150408_network_intervention_preceding_Castor_upgrade here].<br />
* The Castor disk servers are being upgraded to SL6 (26/27 Aug).<br />
* Entries declared in GOC DB for a brief network interruption (Thursday 3rd Sep) and a series of Castor interventions as the back-end Oracle database is upgraded to a later version.<br />
* The MICE experiment will be taking data daily from the 8th September, when the ISIS neutron source restarts operation. This data will be stored at the Tier1.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
'''Tuesday 16th June'''<br />
* Region not publishing accounting by number of cores.<br />
** "0" core submission hosts:<br />
** ce3.dur.scotgrid.ac.uk<br />
** ce4.dur.scotgrid.ac.uk<br />
** cetest02.grid.hep.ph.ic.ac.uk<br />
** hepgrid5.ph.liv.ac.uk<br />
** hepgrid6.ph.liv.ac.uk<br />
** hepgrid97.ph.liv.ac.uk<br />
** svr009.gla.scotgrid.ac.uk<br />
** t2ce06.physics.ox.ac.uk<br />
<br />
'''Tuesday 9th June'''<br />
* Delay noted for Sheffield<br />
<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New action for the NGIs: please start tracking which sites are still using SL5 services (how many services and, for each service, whether it is still needed on SL5 and whether upgrades are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It would also be interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st September'''<br />
* Gordon was on shift.<br />
* Another very quiet week with no new tickets. We have two open ROD tickets, both of which are for A/R; one is against Cambridge, and the other is the now 53-day-old ticket at UCL.<br />
* Next up.... Kashif again (thanks Kashif!)<br />
<br />
'''Monday 24th August'''<br />
* Kashif on shift.<br />
* There were quite a few alarms throughout the week and many tickets were opened. All of the tickets were fixed within the time limit. <br />
* The certificate of the Argus server at Sheffield expired but Elena got a new certificate quickly. <br />
* Cambridge and UCL have low availability tickets and not much can be done about them except waiting for availability to reach 90%.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 1st September'''<br />
* The IGTF has released an [https://rt.egi.eu/rt/Ticket/Display.html?id=9406 urgent update to the trust anchor repository (1.67)]<br />
* Linda is working on a revision to the EGI Technology Questionnaire.<br />
<br />
'''Tuesday 24th August'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9323 EGI SVG Advisory "Moderate" RISK - dCache EGI-SVG-2015-9323]<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2014-7159 EGI SVG Advisory 'Low' RISK - VOMS Potential DoS EGI-SVG-2014-7159]<br />
* EGI IGTF CA update, version 1.66-1 ticket created [EGI #9351], due August 31st.<br />
<br />
'''Tuesday 18th August'''<br />
* CVE-2015-3245 (libuser) - EGI-CSIRT processing ~50 sites remaining. None in UK.<br />
* glexec for small vos - summary doc in progress by IanN. Circulated to sec. team for sanity check. More work needed. <br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 18th August'''<br />
* Next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th September, 15.00 BST'''<br /><br />
<br />
Yet another brief ticket review - it's been a tad busy! Hopefully normal service will resume, err, next month. My apologies.<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] - This RAL ticket from Sno+ looks like it can be closed, with there not actually being a problem with the site, or a feature that RAL would really want to implement. <br />
<br />
Keeping on the Tier 1, does this ticket concerning the FTS3 certificates ([https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290]) have anyone looking into it yet?<br />
<br />
Are these two QMUL LHCB tickets duplicates, or related? [https://ggus.eu/?mode=ticket_info&ticket_id=115959 115959] & the more recent [https://ggus.eu/?mode=ticket_info&ticket_id=116153 116153]<br />
<br />
This Sussex ticket looks like it could do with someone taking a look - still just "assigned" since the 9th - [https://ggus.eu/?mode=ticket_info&ticket_id=116136 116136]<br />
<br />
Finally, an interesting VAC ticket for Oxford - "no space left on device" for atlas jobs - [https://ggus.eu/?mode=ticket_info&ticket_id=116123 116123]. Bit of an odd one!<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 09 June 2015'''<br />
*ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, with one of the IP addresses behind the webserver not responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
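<br />
The failover behaviour discussed above can be illustrated with a minimal sketch. It assumes only that the two broker endpoints named in the note accept plain TCP connections on port 6163; the helper name <code>first_reachable</code> and the injectable <code>connect</code> parameter are hypothetical, and this is not how the Nagios or BDII tooling actually selects a broker.<br />

```python
import socket

# Broker endpoints taken from the note above, in order of preference.
BROKERS = [("mq.afroditi.hellasgrid.gr", 6163), ("mq.cro-ngi.hr", 6163)]


def first_reachable(brokers, timeout=5.0, connect=socket.create_connection):
    """Return the first (host, port) that accepts a TCP connection, else None.

    This only checks TCP reachability; it does not perform a STOMP handshake.
    The `connect` argument is a hypothetical hook so the logic can be tested
    without touching the network.
    """
    for host, port in brokers:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            continue  # broker unreachable, try the next one in the list
        conn.close()
        return host, port
    return None
```

Checking reachability at connection time, rather than relying on a cached broker list, is one way to avoid the delayed failover suspected in the note.<br />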
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
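<br />
The percentages quoted above follow from the YES/NO site counts. A minimal sketch of the arithmetic, assuming 19 sites per survey; the IPv6 allocation NO count of 11 is counted from the list rather than stated, and the names <code>surveys</code> and <code>percent_yes</code> are illustrative only:<br />

```python
# (YES, NO) site counts copied from the lists above.
surveys = {
    "Multicore queues": (12, 7),
    "Cloud/VM interface": (5, 14),
    "GridPP DIRAC jobs": (11, 8),   # NO is 4 + 4 in the list
    "IPv6 allocation": (8, 11),     # NO count of 11 is counted, not stated
    "Dual stack nodes": (4, 15),
}


def percent_yes(yes, no):
    """Return the share of YES sites as a whole-number percentage."""
    return round(100 * yes / (yes + no))


for name, (yes, no) in surveys.items():
    print(f"{name}: {percent_yes(yes, no)}%")
```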
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 8th July 2015'''<br />
[https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-07-08 Operations report]<br />
* Lots of preparation for the RAL Open Days. These start today (8th) and culminate in the public day on Saturday (11th).<br />
* Intervention on faulty router being prepared for 4th August.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent, but many issues have been solved and the grid is filled again.<br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio has been in production since Dec 1st and is ready for LHC Run 2. Some areas need improvement, including the transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFilesLost files declaration]. Only DDM ops can issue a lost files declaration for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2015-09-14T14:27:59Z<p>Matthew Doidge 1ac9bd3994: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 14th September 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Tuesday 1st September'''<br />
* From October/November, the EGI ops VO monitoring will be performed using RFC proxies, as opposed to legacy proxies.<br />
* There will be a new filter for the critical profile for ATLAS WLCG SAM tests so that only production endpoints will be tested and taken into account for site availability metrics. This will be available from the SAM3 interface.<br />
* CNAF: Due to a fire causing problems with one electrical supply line that happened last Thursday, the computing centre is running at lower capacity (around 30% less of the pledged capacity).<br />
* Machine job features testing has hit a bug according to Raul!<br />
* The HNSciCloud PCP pilot proposal was submitted successfully (refer to Andrew Sansum's talk at GridPP34 if you forget what this means!). The project intends to procure commercial cloud resources for FY17 and FY18. We will contribute 75K euro towards this activity and the EU will then top up to 250K euro.<br />
* There was an EGI Operations Management Board last Thursday. There are no summary notes yet, but please take a look at the [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2379 agenda] and linked talks (may be worth skimming them at the ops meeting).<br />
* There was a quick request/reminder for sites to please update their [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 entries in the GridPP wiki].<br />
* RAL is closed on Monday and Tuesday of this week!<br />
<br />
<br />
<br />
'''Tuesday 25th August'''<br />
* A [https://indico.cern.ch/event/319751/ draft agenda for the September GDB] is taking shape. Any Tier-2 rep volunteers this month please email Jeremy.<br />
* Interest in HTCondor CE. (Ops thread 20/08).<br />
<br />
<br />
'''Tuesday 18th August'''<br />
* Root CA - Jens will provide a quick incident report of what went wrong!<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150817#Monday Notes from Monday's WLCG ops meeting] are available. <br />
* DPM tuning (raised by Matt D - 17/08)<br />
* [http://cds.cern.ch/journal/CERNBulletin/2015/32/News%20Articles/2038500?ln=en Developers@CERN Forum] is an initiative of a group of developers targeting all software developers at CERN.<br />
* Multicore inefficiencies: RAL - CMS allocated slots not used. Glasgow - the concern for us is CPU that was previously being used by ATLAS is no longer being used because of multicore size mismatches. <br />
* [https://labs.ripe.net/atlas/user-experiences RIPE ATLAS probe experience pages]. <br />
* Automating ATLAS consistency checking (See Alastair's email 12/08)<br />
* For generating docs out of markdown Steve T can recommend gitbook. This is what CERN use for the [http://cern.ch/config Config guide].<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 25th August'''<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150820 Minutes from the meeting last Thursday] are available.<br />
* Highlights: <br />
* Calling for volunteers to write articles on interesting topics for WLCG Operations. There is a [http://wlcg-ops.web.cern.ch/articles section in the portal] for this. In case you are interested, please send a mail to wlcg-ops-coord-chair people for details.<br />
* In a few weeks CERN is going to disable write operations to Castor via RFIO v2, as part of decommissioning RFIO access.<br />
* A downtime for the Argus service @ CERN is expected due to the pending filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h-2h.<br />
* Issues with VMs in Tier-0 infra also reported by CMS<br />
* LHCb is trying to integrate submission to the HTCondorCE instance @ CERN<br />
* A message broker pre-prod infra has been set up @ CERN to enable distribution of perfSONAR data to the experiments. OSG is enabling the data publication from the ITB collector service.<br />
* Please observe the Action List in the minutes and on the WLCG Operations portal by clicking on the Task List relevant to: [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]: [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 18th August'''<br />
* Pretty quiet! [https://indico.cern.ch/event/393614/ Next ops coordination meeting is on 20th].<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st September'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting are [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-08-26 here].<br />
* Note RAL closed both Monday and Tuesday 31st Aug. and 1st Sep.<br />
* The post-mortem review of the network incident on the 8th April has been finalised. See [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Incident_20150408_network_intervention_preceding_Castor_upgrade here].<br />
* The Castor disk servers are being upgraded to SL6 (26/27 Aug).<br />
* Entries declared in GOC DB for a brief network interruption (Thursday 3rd Sep) and a series of Castor interventions as the back-end Oracle database is upgraded to a later version.<br />
* The MICE experiment will be taking data daily from the 8th September, when the ISIS neutron source restarts operation. This data will be stored at the Tier1.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, an H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
'''Tuesday 16th June'''<br />
* Region not publishing accounting by number of cores.<br />
** "0" core submission hosts:<br />
** ce3.dur.scotgrid.ac.uk<br />
** ce4.dur.scotgrid.ac.uk<br />
** cetest02.grid.hep.ph.ic.ac.uk<br />
** hepgrid5.ph.liv.ac.uk<br />
** hepgrid6.ph.liv.ac.uk<br />
** hepgrid97.ph.liv.ac.uk<br />
** svr009.gla.scotgrid.ac.uk<br />
** t2ce06.physics.ox.ac.uk<br />
<br />
'''Tuesday 9th June'''<br />
* Delay noted for Sheffield<br />
<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date: March 2016. <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New action for the NGIs: please start tracking which sites are still using SL5 services (how many services and, for each service, whether it is still needed on SL5 and whether upgrades are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It would also be interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st September'''<br />
* Gordon was on shift.<br />
* Another very quiet week with no new tickets. We have two open ROD tickets, both of which are for A/R; one is against Cambridge, and the other is the now 53-day-old ticket at UCL.<br />
* Next up.... Kashif again (thanks Kashif!)<br />
<br />
'''Monday 24th August'''<br />
* Kashif on shift.<br />
* There were quite a few alarms throughout the week and many tickets were opened. All of the tickets were fixed within the time limit. <br />
* The certificate of the Argus server at Sheffield expired but Elena got a new certificate quickly. <br />
* Cambridge and UCL have low availability tickets and not much can be done about them except waiting for availability to reach 90%.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There are also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 1st September'''<br />
* The IGTF has released an [https://rt.egi.eu/rt/Ticket/Display.html?id=9406 urgent update to the trust anchor repository (1.67)]<br />
* Linda is working on a revision to the EGI Technology Questionnaire.<br />
<br />
'''Tuesday 24th August'''<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-9323 EGI SVG Advisory "Moderate" RISK - dCache EGI-SVG-2015-9323]<br />
* [https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2014-7159 EGI SVG Advisory 'Low' RISK - VOMS Potential DoS EGI-SVG-2014-7159]<br />
* EGI IGTF CA update, version 1.66-1 ticket created [EGI #9351], due August 31st.<br />
<br />
'''Tuesday 18th August'''<br />
* CVE-2015-3245 (libuser) - EGI-CSIRT processing ~50 sites remaining. None in UK.<br />
* glexec for small vos - summary doc in progress by IanN. Circulated to sec. team for sanity check. More work needed. <br />
<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 18th August'''<br />
* Next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL), 28-29 October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable. The Nagios test compares the CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the webserver was not responding, and went away after approximately 3 hours. The same problem started again on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened regarding this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
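The failover suspected to be missing above can be illustrated with a minimal Python sketch. It is not how the Nagios or BDII tooling is implemented; it only shows the idea of trying each broker endpoint in turn rather than relying on a cached one. The function name and the injectable <code>connect</code> parameter are assumptions made for illustration; the two broker URLs are the ones quoted above.

```python
import socket
from urllib.parse import urlparse


def first_reachable_broker(broker_urls, timeout=5.0, connect=socket.create_connection):
    """Return the first broker URL that accepts a TCP connection, or None.

    `connect` is injectable so the logic can be exercised without a network;
    by default it is the standard library's socket.create_connection.
    """
    for url in broker_urls:
        parsed = urlparse(url)
        try:
            conn = connect((parsed.hostname, parsed.port), timeout)
        except OSError:
            # Broker unreachable: fall through to the next candidate.
            continue
        conn.close()
        return url
    return None


# Broker endpoints quoted in the note above, in order of preference.
BROKERS = [
    "stomp://mq.afroditi.hellasgrid.gr:6163/",
    "stomp://mq.cro-ngi.hr:6163/",
]
```

Calling <code>first_reachable_broker(BROKERS)</code> on each run, instead of reusing a cached endpoint, would pick the second broker whenever the first refuses connections. A TCP connect only shows that the port is open, not that the STOMP service behind it is healthy.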
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex (11)<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex; RAL T1 (15)<br />
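The percentages quoted in the lists above are consistent with the YES count divided by all 19 listed sites, rounded to the nearest whole number. A minimal Python sketch of that arithmetic follows; the counts are copied from the lists above, and the function and dictionary names are illustrative only.

```python
def yes_percentage(yes_count, no_count):
    """Return the share of YES sites as a whole-number percentage."""
    total = yes_count + no_count
    if total == 0:
        raise ValueError("no sites listed")
    return round(100 * yes_count / total)


# (YES, NO) site counts taken from the status lists above.
STATUS_COUNTS = {
    "Multicore queues": (12, 7),
    "Cloud/VM": (5, 14),
    "GridPP DIRAC jobs": (11, 8),
    "IPv6 allocation": (8, 11),
    "Dual stack nodes": (4, 15),
}

for name, (yes, no) in STATUS_COUNTS.items():
    print(f"{name}: {yes_percentage(yes, no)}%")
```

Recomputing the figures this way whenever a site changes status keeps the headline percentage and the YES/NO lists in step.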
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 8th July 2015'''<br />
[https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-07-08 Operations report]<br />
* Lots of preparation for the RAL Open Days. These start today (8th) and culminate in the public day on Saturday (11th).<br />
* Intervention on faulty router being prepared for 4th August.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again. <br />
<br />
• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue lost files declarations for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing below 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/Past_Ticket_Bulletins_2015Past Ticket Bulletins 20152015-09-14T14:27:02Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>'''Monday 7th September 2015, 15.00 BST'''<br /><br />
20 Open UK Tickets this week.<br />
Due to Interesting Downtimes (apologies for reusing that pun!) yet another fairly light review. But not much is going on in ticketland.<br />
<br />
[http://tinyurl.com/nwgrnys THE GGUS TICKETS IN ALL THEIR GLORY]<br /><br />
The Sno+ ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115805 115805] is interesting: Sno+ are looking at monitoring jobs submitted via WMS through the port 9000 URLs, but the RAL WMS behaves differently from the Glasgow one and doesn't let others with the same roles look at the links. Sno+ are still "developing" their grid infrastructure with the WMS in mind by the looks of it.<br />
<br />
The pilot role at Sheffield ticket [https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] could still do with some attention, or an update.<br />
<br />
Bristol had a ticket from CMS that looks interesting - [https://ggus.eu/?mode=ticket_info&ticket_id=115883 115883]. The CMS SAM3 tests are confused by Bristol having an SRM-less SE endpoint. Waiting for reply after things were clarified.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
A few "The CREAM service cannot accept jobs at the moment" style errors at QM, but they're only a few hours old. Otherwise looking alright beyond the usual noise.<br />
<br />
Of course with these light reviews I could well be missing something, so feel free to let me know - sites or VO representatives.<br />
<br />
'''Monday 24th August 2015, 15.45 BST'''<br /><br />
<br />
21 Open UK tickets this week, most of which are being looked at or are understood. There will be no ticket update from Matt next week (1st September) either, as he will be flapping about during a local downtime.<br />
<br />
[http://tinyurl.com/nwgrnys YE OLDE GGUS TICKET LINK]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (gridpp pilots at Sheffield) could do with an update. The Liverpool ticket discussed last week ([https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248]) has received an update from the user saying that the ticket can be closed. As Steve mentioned last week the underlying issue is still very much there, but I don't think this ticket is a suitable banner for us to fight that battle under.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios Page]<br /><br />
As usual nothing to see here at time of writing, sites are doing a grand job of working for the monitored VOs.<br />
<br />
And that's all folks! See you in Liverpool.<br />
<br />
<br />
'''Monday 17th August 2015, 15.00 BST'''<br /><br />
29 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
It looks like this CMS transfer problem ticket can be closed after a user update last week, which reported that enabling multiple streams solved the initial failures. In progress (13/8)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115613 115613] (assigned) ''Update - looks like this was a temporary problem, and the ticket can be closed.''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448] (in progress, but empty)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115496 115496] (in progress, but might not be a site problem).<br /><br />
Jet have 3 biomed tickets, 2 of which are looking a little neglected.<br />
<br />
'''CAMBRIDGE'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115655 115655] (12/8)<br /><br />
John rightfully asks why a lengthy but very scheduled downtime sets off a ROD alarm. Other than that it'll be worth "on-holding" this ticket whilst the unrighteous red flag fades. In progress (17/8) ''Update - On holded''<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
This Sno+ ticket looks like it needs some chasing up, no news for nearly 2 months. In progress (21/7) ''Update - Steve commented on this on TB-SUPPORT''<br />
<br />
'''Monday 10th August 2015, 15.00 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 T'Other VO Nagios] looks alright at time of writing.<br />
<br />
24 Open UK tickets this week.<br />
<br />
''Lots of activity on GGUS since yesterday. Most positive, but I see the number of tickets at EFDA-JET increasing - all three from biomed.''<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115512 115512] (5/8)<br /><br />
An interesting ticket - where a banned user is still banned after moving to LHCB from Biomed (no, he's not called Heinz). In Progress (6/8) ''Update - Andrew has asked that the ticket be reassigned to the argus devs as the pepd logs are showing oddness. Waiting for reply now.''<br />
<br />
'''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115565 115565] (7/8)<br /><br />
Bristol's phedex agents are down, and have been for a few days. I might have dreamt this, but thought that the Bristol Phedex service might not be hosted at Bristol, especially after RALPP had a similar ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=115566 115566]) at the same time. Assigned (7/8) ''Update - solved''<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115504 115504] (ROD Ticket) ''Solved'' <br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (Wiki Ticket) ''Still Open'' <br /> <br />
Both these tickets look like they can be closed.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115525 115525] (5/8)<br /><br />
Atlas deletion errors after a disk server fell over - nothing wrong with the ticket handling, but Alessandra brings up a point that always niggles me - the emphasis on the total number of transaction errors and not the number of affected unique files. In progress (8/8)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket sparked by those IPv6 problems. Still no word from Vladimir; Raja - could you comment? I suspect there's been plenty of room for LHCB jobs in QM's (and everyone else who mainlines atlas jobs) queues this weekend. Waiting for reply (21/7)<br />
<br />
'''Monday 3rd August 2015, 14.30 BST'''<br /><br />
<br />
23 Open tickets this month, full review time.<br />
<br />
'''''Newish this morning'''''<br /><br />
As discussed on TB-SUPPORT, a few sites have been getting "I can't lcg-tag your CE" tickets from Biomed. The Liverpool ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115449 115449], solved and verified, was the flagship of these issues. Brunel ([https://ggus.eu/?mode=ticket_info&ticket_id=115445 115445]) and EFDA-JET ([https://ggus.eu/?mode=ticket_info&ticket_id=115448 115448]) also have tickets about this.<br />
<br />
<br />
'''Sno+ "glite-wms-job-status warning"''' (3/8)<br /><br />
Glasgow: [https://ggus.eu/?mode=ticket_info&ticket_id=115435 115435]<br /><br />
Tier 1: [https://ggus.eu/?mode=ticket_info&ticket_id=115434 115434]<br /><br />
Matt M submitted these tickets to Glasgow and the Tier 1 after having trouble with a proportion of Sno+ jobs. Both are being looked at - definitely worth collaborating on this one.<br />
<br />
'''Wiki-leak...'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115399 115399] (31/7)<br /><br />
Jeremy noticed that the wiki didn't work for him on Friday - but it seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the ticket can be closed, but out of interest did anyone else spot any problems? In progress (3/8)<br />
<br />
'''Spare the ROD...'''<br /><br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115433 115433] (3/8)<br /><br />
Some CE problems noticed on the dashboard for the Liver-lads - who might be in mourning. Assigned (3/8) ''Update - aaannd Solved by upping gridftp max connections''<br />
<br />
'''RALPP & UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851]<br /><br />
Both of these are "availability" alarm tickets, on-holding until they clear. I hope RALPP managed to get a re-computation for their unfair failures (IGTF-1.65 problems on ARC).<br />
<br />
'''Sno+'d Under'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115387 115387] (30/7)<br /><br />
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's recent thread on TB-SUPPORT and concerning xrootd access, is meant for the Tier 1 or RALPP. Assigned (3/8)<br />
<br />
'''First Tier Problems.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115417 115417] (2/8)<br /><br />
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which the RAL team had already jumped on and repaired this morning. They wonder if the problem persists. Waiting for reply (3/8)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115290 115290] (28/7)<br /><br />
An FTS problem requiring some special CA magic to solve, but the current CA-wizard isn't about. On hold (29/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue 1 vs Glue 2 queue mismatches. Work is ongoing on perfecting cluster publishing for ARC CEs, but the ticket could either do with an update or on-holding. In progress (24/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114992 114992] (10/7)<br /><br />
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to RAL, where Brian has been investigating and Andrew has posed a good question, asking if the user has considered managing the transfers with FTS. Quiet on the user side. In progress (21/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/2014)<br /><br />
One of the tickets from the before times, about CMS AA access tests. It has become a long and confusing saga, but Gareth rescued it with a handy summary of the issue in his last update. How goes the battle? In progress (17/7)<br />
<br />
'''Oxford Squid is red.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115230 115230] (24/7)<br /><br />
Which might be the colour Ewan's seeing right now! The ticket is reopened, with a comment from Alessandra that the current recommendation is to allow all CERN addresses, and asks if this is something Oxford could do. Reopened (3/8) ''Update - solved after a squid restart. It's not the end of this ordeal, but Ewan would like to tackle it in a different arena than an Oxford ticket.''<br />
<br />
'''Mavericks and Gooses - GridPP Pilot roles.'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] '''Bristol'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] '''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] '''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] '''RHUL'''<br /><br />
Daniela hopped right back into the pilot seat after getting back from her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are still in the Danger Zone - RHUL in particular were having troubles with argus and could do with some working configs from elsewhere to compare and contrast with their own.<br /><br />
''Update - RHUL are looking better, just a few queue permission tweaks to go by the looks of it.''<br />
<br />
<br />
'''My shame - tarball glexec'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] '''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] '''Lancaster'''<br /><br />
The tarball glexec tickets. Actually this is likely to become a defunct (or at least different) problem at Edinburgh with their SL7 move. Lancaster has a plan - we plan to deploy *something* (amazing plan there Matt) during our next big reinstall in September. Between now and then I have a test CE, cluster and most importantly some time. <br />
<br />
'''Pot-luck tickets (or those I couldn't group).'''<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has fixed things. Fingers crossed! In progress (29/7)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/2014)<br /><br />
My other shame - Lancaster's poor perfsonar performance. It's being worked on. On Hold (should be back in progress soon) (30/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (21/7)<br /><br />
Sno+ production problems at Liverpool, probably due to a lack of space in the shared area. Things are back in Sno+'s court, with the submitter consulting the Sno+ gurus (I think). In progress (21/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB job submission problems due to the known dual-stacking problems. Waiting for input from LHCB for a while now, as things look okay at QM now, but at last check LHCB jobs still weren't running for some reason. Waiting for reply (21/7)<br />
<br />
That's all folks!<br />
<br />
'''Monday 27th July 2015, 16.10 BST'''<br /><br />
<br />
Only 20 UK Tickets this week, and many are on hold for summer holidays. I pruned a few tickets, but none are striking me as needing urgent action, so this will be brief.<br />
<br />
[http://tinyurl.com/nwgrnys Mandatory UK GGUS Link]<br /><br />
Nothing to see here really, but maybe I'm missing something? Nit-picking I see:<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (publishing problems at Durham) could still do with an update - not sure if work is progressing offline on the issue.<br />
<br />
The Snoplus ticket [https://ggus.eu/?mode=ticket_info&ticket_id=115165 115165] looks like it might be of interest for others - in it Matt M asks about tape-functionality in gfal2 tools. Brian has updated the ticket clueing us in about gfal-xattr.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 UK T'other VO Nagios]<br /><br />
A few failures here at time of writing - although only one at Brunel seems to be more than a few hours old (dc2-grid-66.brunel.ac.uk is failing pheno CE tests).<br />
<br />
Let me know if I missed ought!<br />
<br />
'''Monday 20th July 2015, 14.30 BST'''<br /><br />
27 Open UK Tickets this week.<br />
<br />
'''Brunel'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115113 115113] (17/7)<br /><br />
This Brunel Ops ticket (otherwise okay) is a good reminder that when removing CEs from the GOCDB for your site make sure to get all the services, or else you just might end up still monitored (and thus end up ticketed!). Reopened (can probably be closed soon) (20/7) ''Update - ticket solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
This ticket about problems with Brunel's accounting figures was looking promisingly close to being solved (at least to my layman's eyes) 3 weeks ago. Any word offline? In progress (30/6)<br />
<br />
'''Lancaster'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB jobs were failing at Lancaster a fortnight ago, but things should have been fixed quite promptly. Are they still broken? Waiting for reply (14/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
A similar case for this LHCB ticket for QM (part of their ongoing dual-stack saga). Waiting for reply (13/7)<br />
<br />
Note: The related issue [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017] has been "solved" pending further testing. <br />
<br />
'''Sheffield'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
This Sno+ ticket has been in "Waiting for Reply" for a little while, no word from the user (who isn't Matt M). Could we poke Sno+ through another channel about this? Waiting for reply (6/7)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
We could do with finding out from Sno+ the state of play at Liverpool too, although we might have to wait until Steve is back from his hols later this week to field any replies. In progress (17/6)<br />
<br />
'''Durham'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
Ticket concerning the small percentage of Durham jobs that aren't publishing their core count (probably a slurm oddity). Now that Oliver's back from holiday has he had time to look at this? On Hold (19/6)<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
I suspect that whilst the work described chugs along in the background we should consider on-holding this ticket. In progress (24/6)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
Ben has been battling getting his DPM working, and spotted an interesting problem where SELinux was blocking the httpd from accessing mysql. It's nice to see someone not just switching SELinux off (like I have a habit of doing...). In progress (20/7)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=115003 115003] (12/7)<br /><br />
Andy is having some problems on a test SE at ECDF - he seems to be suffering a series of unfortunate errors. Maybe the storage group could help? In progress (17/7)<br />
<br />
'''GridPP Pilot Roles.'''<br /><br />
Bristol are ready for testing, Govind discovered a possible bug in argus that needs a bit more testing at RHUL. Things seem quiet at RALPP and Sheffield, and Brunel too.<br />
<br />
''Supplemental - In the region multicore publishing ticket ([https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233]) only Oxford have a CE still appearing to publish 0 cores - but I thought this CE was Old-Yeller'ed?''<br />
<br />
'''Tuesday 14th July 2015, 9.30 BST'''<br />
<br />
Lazy update today, due to some fun and games at Lancaster yesterday. <br />
<br />
[http://tinyurl.com/nwgrnys UK GGUS Tickets]<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios]<br />
<br />
'''Tickets that pop:'''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114952 114952] and [https://ggus.eu/?mode=ticket_info&ticket_id=114951 114951] are both atlas frontier tickets at RALPP and Oxford, both have been reopened - although the underlying issues seem different. The Oxford ticket is similar to one at RAL ([https://ggus.eu/?mode=ticket_info&ticket_id=114957 114957]), which looks to be caused by an unannounced change in IP for some important atlas squids (AIUI - speed reading the tickets this morning).<br />
<br />
'''QM IPv6 Woes'''<br /><br />
Followers of the atlas uk lists will have noticed some heroic attempts to diagnose and repair problems at QM which appear to be someone else's fault.<br />
LHCB's ticket to QM on the matter: [https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573]<br /><br />
Dan's ticket concerning the "rogue routes": [https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 115017]<br />
<br />
'''GridPP Pilot Roles'''<br /><br />
Durham, Bristol, Sheffield, Brunel and RHUL still have open tickets about this. Bristol are working on it, as are Durham - Oliver's ready for their setup to be tested again (Puppet overwrote his last changes!). Not much recent news from the other three.<br />
<br />
That's all from me folks, let me know if I missed ought!<br />
<br />
'''Monday 6th July 2015, 14.00 BST''' <br /><br />
30 Open UK Tickets this month. Looking at them all!<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
The UK not publishing core counts at all sites. Some progress, but at last check John G couldn't see a change for Oxford or Glasgow. In progress (30/6) ''Update - Glasgow seems to be okay after de-creaming, checking the July list we have t2ce6 at Oxford, ce3 and ce4 at Durham (see their ticket) and cetest02 at IC (but that node has test in its hostname!).''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114442 114442] (18/6)<br /><br />
Gridpp Pilot role ticket. Accounts need to be created, but no word for a few weeks. In progress (19/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114764 114764] (1/7)<br /><br />
Ticket tracking (false) availability issues, created to appease COD - the problem was caused by a broken CA rpm release for Arc CEs. Kashif has created a counter-ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114742 114742]. Gordon's sage advice is to submit a recalculation request once the issue is fixed. Assigned (1/7)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114485 114485] (19/6)<br /><br />
Bristol's gridpp pilot role ticket. No news, could do with an update really. In progress (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114426 114426] (18/6)<br /><br />
CMS AAA reading test problems. The Bristol admins have transferred data to their new shiny SE and have asked CMS to test again. No word since. Waiting for reply (30/6)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13...)<br /><br />
Tarball glexec ticket, now 2 years old. After a really promising burst the last 6 weeks haven't seen any progress, due to a lot of other "normal" tarball work taking up the time. Sorry! On hold (18/5)<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114536 114536] (22/6)<br /><br />
Durham's gridpp pilot role ticket. Not acknowledged yet, is Oliver back yet? Assigned (22/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114765 114765] (1/7)<br /><br />
See RALPP ticket 114764. Assigned (1/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114727 114727] (30/6)<br /><br />
Catalin ticketed that a number of SW_DIR variables at Durham are still pointing to the old school .gridpp.ac.uk cvmfs space. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br /><br />
John G ticketed Durham over a small percentage of jobs being published as "zero core". Looks like a SLURM timeout problem, although a fix isn't obvious. Put on the back burner whilst Oliver is on holiday. On Hold (19/6)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114649 114649] (26/6)<br /><br />
A ticket from a Sno+ user about not being able to access software using the Sheffield CEs. Acknowledged but no news. In progress (26/6) ''Update - Elena can't find anything wrong, cvmfs seems to be working fine. Perhaps a problem with the environment?''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114460 114460] (18/6)<br /><br />
Sheffield's gridpp pilot role ticket. Did you get round to rolling them out? In progress (19/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444] (18/6)<br /><br />
LHCB ticket concerning the DPM's SRM not returning checksum information. On hold whilst a related ticket is being looked at ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]). On Hold (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket, about grid production jobs failing at Liverpool. AIUI caused by Sno+ running out of space on the shared pool. At last check Steve posted the usage information for Sno+ but no word since (and Steve's off on his hols). In progress (17/6)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114845 114845] (6/7)<br /><br />
LHCB pilots failing at Lancaster. Looks like a simple node misconfiguration, hopefully fixed, waiting to see if it is. On hold (6/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/2013)<br /><br />
glexec ticket - see Edinburgh description. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1)<br /><br />
Bad bandwidth performance at Lancaster. Hoping that IPv6 will shake things up a bit so pushing that. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114746 114746] (30/6)<br /><br />
SRM-put failures ROD ticket. No news at all. Assigned (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114851 114851] (6/7)<br /><br />
Low availability ROD ticket, related to above. Assigned (6/7)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114441 114441] (18/6)<br /><br />
Another GridPP pilot role ticket. Pilots rolled out, but something isn't quite right and they're not working - Govind is looking again. In progress (6/7)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB ticket about two out of three QM CEs not responding for them. Dan spotted the broken CEs were dual-stacked, the working one wasn't. The ticket seemed to have trailed off into some confusion over who needs to do some testing where. I agree with Dan that the "who" needs to be someone with LHCB credentials! The waters still seem muddied. In progress (1/7)<br />
<br />
'''IC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114737 114737] (30/6)<br /><br />
The IC voms wasn't updating properly, due to what I infer from the ticket as "SSL/mysql madness". Simon and Robert have been heroically battling this one - it's a good read. On hold (3/7)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket about SE support in Dirac. Sam will shortly try testing things out on the new Dirac to see how it fares. In progress (6/7)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114447 114447] (18/6)<br /><br />
Brunel's gridpp pilot ticket. Being worked on, with one CE with the pilots enabled. In progress (26/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
A ticket from APEL, about Brunel under-reporting the number of jobs they are doing. Turned out to be a problem with Arc, which Raul upgraded to the fixed 5.0 version. The APEL team deleted the sync records, but no word since. In progress (30/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114850 114850] (6/7)<br /><br />
Another APEL ticket, likely the fallout of the previous one - it looks like GAP publishing has been left on for the Brunel CREAM CEs. Assigned (6/7) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114786 114786] (2/7)<br /><br />
Low availability ticket - see RALPP ticket 114764 - probably could do with On holding. In progress (2/7) ''Update - Onholded''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Sno+ data staging problems. Brian gave some advice on how the large VOs do data staging from tape, and has asked if Sno+ still has problems. Matt M might still be on leave though. Waiting for reply (23/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA problems, which eventually brought to light a problem with super-hot datasets, which was alleviated (I think). Despite an update to castor that improved performance the last batch of tests didn't show improved results. No news since. In progress (17/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113836 113836] (20/5)<br /><br />
Glue mismatch problems at RAL. Working on getting "many-Arcs" to correctly publish. In progress (24/6)<br />
<br />
'''Monday 29th June 2015, 14.30 BST'''<br /><br />
<br />
Looking at the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''"Other VO" Nagios''']. <br /><br />
Things look generally alright - but Durham look like they need to update their CA rpms - but that might have to wait until Oliver is back from leave.<br />
<br />
'''Tarballs:'''<br /><br />
I don't think this affects many, but there was a ticket to produce a new version of the WN tarball (which is done): [https://ggus.eu/index.php?mode=ticket_info&ticket_id=114574 114574]<br /><br />
Although AFAICS there is no urgent need to upgrade tarball WNs.<br />
<br />
26 UK Tickets, although not many stand out.<br />
<br />
'''Gridpp VO Pilot tickets:'''<br /><br />
Largely doing alright. With Oliver away the Durham ticket hasn't been looked at yet. Sheffield and Bristol's tickets could do with an update (or on-holding if there's going to be a delay). The RHUL ticket has been reopened as their deployment of the pilot roles hasn't quite worked out.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114573 114573] (23/6)<br /><br />
LHCB having trouble with two out of three QM CEs. Dan notes that the two "broken" CEs have been recently dual-stacked, and asks if this could be the problem. The answer is a resounding "maybe", and Raja asks if problems could be duplicated by others using lxplus. Waiting for reply (24/6)<br />
<br />
'''IMPERIAL/DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam's ticket trying to get SE support with Dirac...er, spruced up. Daniela has asked if the tests can be redone with the "new" dirac. Waiting for reply (22/6)<br />
<br />
Let me know if I missed any tickets.<br />
<br />
'''Monday 22nd June 2015, 14.00 BST'''<br />
<br />
35 Open UK Tickets this week (!!!)<br />
<br />
'''GridPP Pilot Role'''<br /><br />
A dozen of them are from Daniela (who painstakingly submitted them all) concerning getting the gridpp (and other) pilot role enabled on the site in question's CEs.<br />
<br />
An example of one of these tickets is:<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=114440 114440] (Lancaster's smug solved ticket).<br />
<br />
Ticketed sites are: Durham, Bristol, Cambridge, Glasgow, ECDF (who are also having general gridpp VO support problems), EFDA-JET (looking solved), Oxford, Liverpool, Sheffield, Brunel, RALPP and RHUL. Most tickets are being worked on fine, but the Bristol and Liverpool ones were still just in an "assigned" state at time of writing. <br /><br />
''Update - good progress on this, just one ticket left "assigned". Cambridge are done, as are JET (ticket needs to be closed). Oxford and Manchester are ready to have their new setups tried out, with Oxford kindly road-testing glexec for the pilot roles. Good stuff.'' <br />
<br />
'''Core Count Publishing'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
Of the sites mentioned in this ticket (Durham[1], IC, Liverpool, Glasgow, Oxford) who *hasn't* had a go at changing their core count publishing? I know Oxford have. Daniela had a pertinent question about publishing for VMs, which John answered. In progress (17/6)<br />
<br />
[1] Durham have another ticket on this which may explain their lack of core count publishing: <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114381 114381] (16/6)<br />
<br />
'''DIRAC'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114379 114379] (16/6)<br /><br />
Sam S formed this ticket over having trouble accessing the majority of SEs over Dirac, after some discussion around this last week. Sam acknowledges that this could be a site problem, not a DIRAC problem, but you gotta start somewhere (he worded that point more eloquently). Daniela has posted her latest and greatest DIRAC setup gubbins for Sam to try out. Another, unrelated, point to make is the names missing from Sam's list - for example I'm pretty sure Lancaster should support gridpp VO storage but I've forgotten to roll it out! Waiting for reply (22/6)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Final ticket today, and another one discussed last week in the storage meeting. Steve's explanation of why (and how) Sno+ would need to start using space tokens was fantastically well worded in a way to not spook easily startled users. David is digesting the information, but it will likely need to wait for Matt M's return before we'll see progress. In progress (16/6)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114444 114444](18/6)<br /><br />
I told a pork pie when I said that was the last ticket - this one caught my eye. A ticket from lhcb over files not having their checksums stored on Manchester's DPM. A link was given to another ticket at CBPF for a similar issue which got the DPM devs involved ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=111403 111403]) - although Andrew McNab was already subscribed to the ticket. In progress (19/6)<br />
<br />
'''Monday 15th June 2015, 14.15 BST'''<br /><br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO nagios:]<br /><br />
ce05.esc.qmul.ac.uk and hepgrid11.ph.liv.ac.uk seem to be having a spot of bother for multiple VOs, and hepgrid2.ph.liv.ac.uk seems to be starting to have trouble too.<br />
<br />
19 Open UK Tickets this week.<br />
<br />
'''NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114233 114233] (10/6)<br /><br />
John Gordon ticketed the NGI (as well as others) about some sites in the UK not publishing core counts with their APEL numbers (or more precisely, having submission hosts at that site not publishing). Following the link it looks like Imperial, Liverpool, Durham, Glasgow and Oxford are on this list. I've listed the submission hosts reporting "0" core jobs below to help people clear up their rogues! If you've only fixed things in the last fortnight you'd still show up on this list. In progress (15/6) <br />
<br />
"0" core job submission hosts:<br /><br />
ce3.dur.scotgrid.ac.uk<br /><br />
ce4.dur.scotgrid.ac.uk<br /><br />
cetest02.grid.hep.ph.ic.ac.uk<br /><br />
hepgrid5.ph.liv.ac.uk<br /><br />
hepgrid6.ph.liv.ac.uk<br /><br />
hepgrid97.ph.liv.ac.uk<br /><br />
svr009.gla.scotgrid.ac.uk<br /><br />
t2ce06.physics.ox.ac.uk<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ file copying ticket. Matt M is away on his hols, but Dave Auty has taken over his duties and reports that this problem seems to have gone away - it can probably be closed. In progress (9/6)<br />
<br />
'''Liverpool'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114248 114248] (10/6)<br /><br />
Another Sno+ ticket here from David concerning job failures. Nothing wrong with the ticket handling, but I thought that David's errors in submitting test jobs are worth documenting, as they were very understandable. David has since asked if the Sno+ job failures are linked to Sno+ nagios test failures at Liverpool. In progress (12/6)<br />
<br />
'''Imperial'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
After Simon and Daniela have cleared up the atlas dark data and expanded their Space Tokens using the space freed up there still seems to be some confusion in Rucio, disagreeing with the SRM numbers. In progress (10/6)<br />
<br />
'''Oxford'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114208 114208] (9/6)<br /><br />
Oxford being ticketed for UKI-SOUTHGRID-OX-HEP_IPV6TEST failing connection tests, tests for which Oxford should not be getting ticketed afaics. I remember this being mentioned in the Thursday cloud meeting, but I'm ashamed to say I wasn't paying attention. Were any conclusions drawn/decisions made? In progress (10/6)<br />
<br />
'''Manchester'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114153 114153] (7/6)<br /><br />
Atlas transfer failures from Manchester. Errors are still occurring as of yesterday, any news? In progress (14/6)<br />
<br />
<br />
'''Monday 8th June 2015, 15.00 BST'''<br /><br />
21 Open Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113914 113914] (26/5)<br /><br />
Sno+ had problems at the Tier 1 where jobs failed whilst uploading data, believed to be due to an incorrect VOInfoPath. There's been a failure at replicating the issue, and the VOInfoPath advertised is correct. Very confusing, as I assume it all worked at some point before! In progress (2/6)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113910 113910] (26/5)<br /><br />
Another Sno+ ticket, concerning lcg-cp timeouts whilst data-staging from tape. Matt M has asked for advice on the best practice for doing this, or if Sno+ would be better off just upping their timeouts. Brian has given some advice on using the "bringonline" command, but is himself unsure of the best way of seeing what files are currently in a VO's cache. Not much news since. In progress (28/5) <br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114004 114004] (31/5)<br /><br />
Atlas transfers fail due to the "bring-online" timeout being exceeded. Brian spotted a problem with file timestamps mismatching, but no news on this ticket since. In progress (1/6)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114006 114006] (31/5)<br /><br />
APEL accounting oddness at Brunel, noticed by the APEL team. After much to-and-fro-ing John noticed that multiple CEs were using the same SubmitHost, and thus overwriting each other's sync records. Something to watch out for. In progress (7/6)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=114157 114157] (8/6)<br /><br />
There's been some debate on the atlas lists about this ticket, a classic "not enough space at the site" ticket. Rising above the indignation over being ticketed for this, Daniela has offered a couple of TB to give some space, and pointed out that IC have some atlas data outside space tokens, and that this could be used to expand the tokens if cleaned up. Waiting for reply (8/6)<br />
<br />
<br />
'''Tuesday 26th May 2015'''<br /><br />
Matt's on leave until the 8th of June. But he's replaceable with handy links... 23 tickets today:<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
<br />
[http://tinyurl.com/nwgrnys '''UK NGI GGUS tickets''']<br />
<br />
'''Monday 18th May 2015, 14.30 BST'''<br /><br />
Full review this week.<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios''']<br /><br />
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).<br />
<br />
22 Open UK Tickets this week. Going site-by-site:<br />
<br />
'''APEL/NGI'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] (4/5)<br /><br />
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482] (26/4)<br /><br />
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion with the resulting numbers. Discussion is ongoing, John G is currently looking at the records. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113650 113650] (11/5)<br /><br />
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (31/7/13)<br /><br />
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113769 113769] (18/5)<br /><br />
LHCB see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted the sssd), just waiting to see if it all pans out. In progress (18/5)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113744 113744] (15/5)<br /><br />
For the VOMS rather than the site, Jens' request for the creation of the dirac VO, vo.dirac.ac.uk. In progress (18/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113692 113692] (13/5)<br /><br />
A request from pheno to add support for their new cvmfs area at Manchester, and as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113742 113742] (15/5)<br /><br />
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13!)<br /><br />
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly didn't get roundtuit on this one - we're pushing getting these nodes dual stacked as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113721 113721] (14/5)<br /><br />
The only UCL ticket, this is an egi "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it but otherwise not much to be done. In progress (14/5)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113743 113743] (15/5)<br /><br />
A ticket from Durham concerning the Dirac instance at Imperial's settings for their site. Daniela hopes to get it fixed soon. In progress (15/5)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, this ticket can soon be closed. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying then trying to get the new xrootd redirector configured. New problems have cropped up, and are under investigation. In progress (11/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113705 113705] (13/5)<br /><br />
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113748 113748] (16/5)<br /><br />
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem, telling the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can be just closed (16/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
Lots of cms job failures at RAL. This has been traced to some super-hot files, mitigation is being looked into. A candidate for perhaps On Holding, depends on the time frame of a work around. In progress (13/5)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)<br />
<br />
'''Monday 11th May 2015, 14.10 BST'''<br /><br />
22 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
There are a few tickets at the Tier 1 that are set "In Progress" but haven't received an update yet this month:<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (CMS AAA Tests, 30/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (Atlas Transfer problems, 16/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (SNO+ gfal copy trouble, 15/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (CMS job failures, 7/4)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (SNO+ arcsync troubles, 20/4)<br />
<br />
'''Other Tier 1 Tickets''' (sorry to be picking on you guys!)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Atlas glexec hammercloud test jobs at the Tier 1. It appears to be working, but a batch of test jobs failed because they couldn't find the "mkgltempdir" utility on some nodes ("slot1_5@lcg1742.gridpp.rl.ac.uk" and "slot1_4@lcg1739.gridpp.rl.ac.uk"). In progress (4/5)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=113320 113320] (27/4)<br /><br />
Maybe repeating what Daniela is going to say in the CMS update - trouble with CMS data transfers within RAL. It's under investigation, but it looks like the files in question will need to be invalidated - even if it's just to paint a clearer picture. In progress (10/5)<br />
<br />
'''APEL REPUBLISHING'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113473 113473] <br /><br />
At last update Brunel, Liverpool, Edinburgh, Birmingham and Oxford need to republish still. Oxford have their own ticket about it due to complications ([https://ggus.eu/?mode=ticket_info&ticket_id=113482 113482]).<br />
<br />
'''UCL Tickets''' - Ben is starting to move to close these, some are going to be "unsolved".<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
Andrew asks if the timeframe for the move to Condor could be added to this ticket, for the ROD team's information. On Hold (7/4)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
No news on this 100IT ticket for a while. In progress (27/4)<br />
<br />
'''Friday 1st May'''<br /><br />
The Bank Holiday weekend might muck up plans for a Ticket review this week. Just in case, some links!<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios page.]<br /><br />
<br />
[http://tinyurl.com/l784bbg UK GGUS Tickets]<br />
<br />
Hope you all have a nice weekend!<br />
<br />
A quick check of the [https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios] page.<br />
<br />
26 Open UK tickets this week.<br />
<br />
'''ITWO Decommissioning'''<br /><br />
Three of the tickets are to the VOMS sites (Manchester, Oxford, IC), concerning the decommissioning of the ITWO VO. Just an FYI to y'all.<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113293 113293] (26/4)<br /><br />
There was an APEL problem last month where a lot of sites needed to republish their data for the month. I think Edinburgh are the only UK site that suffered this problem, but another FYI ticket. Assigned (26/4) '' And solved''<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113181 113181](21/4)<br /><br />
Atlas production jobs not running at ECDF. Andy noticed that analysis jobs were running fine, and believes that this might be a problem scheduling pilots in time. Perhaps a multicore issue if this is only affecting production jobs. In progress (22/4) ''Update - solved''<br />
<br />
'''ATLAS GLEXEC HAMMERCLOUD PROBLEMS'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
It was discovered that there was a problem in the test code, so the ball is very much in atlas' court for this one. The problem has been fixed and the tests are being rebuilt and resubmitted.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
ATLAS FTS failures to RAL. A rucio issue causing double-transfers has been discovered ([https://its.cern.ch/jira/browse/ATLDDMOPS-4939 here]), which would explain the behaviour seen. No news since this revelation. In progress (16/4)<br />
<br />
There are a number of other Tier 1 tickets that could do with either an update or On Holding<br />
<br />
'''Monday 20th April 2015, 14.30 BST'''<br /><br />
24 Open UK tickets this week, only a light review. ''Update - down to 20 open tickets as of this morning''<br />
<br />
'''NGI/TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113150 113150] (20/4)<br /><br />
Fresh in - the NGI has been ticketed to change the regional VO from emi.argus to ngi.argus in the gocdb. Seems a bit pedantic, but hey! I assigned it to the NGI ops, and notified RAL as keepers of the regional argus. Assigned (20/4) ''Update - solved''<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113035 113035] (14/4)<br /><br />
Just for people's interest, the ticket tracking the decommissioning of the last of the RAL CREAM CEs. In progress (14/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112819 112819] (2/4)<br /><br />
A SNO+ ticket I must have somehow missed last week, concerning SNO+'s manual renewing of proxies on ARC machines. Matt M has noticed that ArcSync occasionally hangs rather than timing out smoothly (although he later notes that he doesn't see the initial problems working from a different network). I'm thinking that this should be redirected at the arc devs, but I don't think they have a GGUS support group (I could be wrong, I'm well behind on the ARC curve). In progress (7/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113110 113110] (17/4)<br /><br />
Looks like this atlas low transfer efficiency ticket can be closed. Waiting for reply (20/4) ''Update - solved''<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=113095 113095] (17/4)<br /><br />
ROD ticket for some BDII misreporting at Glasgow. The botheration seems to be ephemeral in nature, the blunders passing with the abating of their batch system's burden. This ticket can probably be solved. In progress (17/4)<br />
<br />
<br />
'''Monday 13th April 2015, 14.00 BST'''<br /><br />
24 Open tickets this week - going over all of them this week, site by site.<br />
<br />
''Fresh in this morning - [https://ggus.eu/?mode=ticket_info&ticket_id=113010 113010] and [https://ggus.eu/?mode=ticket_info&ticket_id=113011 113011] - Sno+ tickets concerning the RAL and Glasgow WMSes not updating job statuses.''<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas glexec hammercloud tests failing. There's been a lot of waiting on atlas to build new HC jobs. The most recent exchange (delayed due to Easter) was asking about SELinux - but no news since the 1st. In progress (1/4)<br />
<br />
'''BIRMINGHAM'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112875 112875](7/4)<br /><br />
Low availability ROD ticket. Availability is crawling back up, just need it to go green. On hold (13/4)<br />
<br />
'''GLASGOW'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112967 112967](10/4)<br /><br />
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing everything right investigating this. Kashif recommended ticketing the midmon unit, but Gareth has spotted that the errors correspond to high load on their ARC CE - so it might be a site problem after all - Gareth asks for clarification. Waiting for reply (13/4)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
Tarball glexec ticket. No news (sorry). End of April I believe was the "deadline" I set for having this made. On Hold (9/3)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's poor perfsonar performance. I'm not believing quite what I was seeing with the tests I performed so I'm aiming to rerun them. On hold (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)<br />
<br />
'''BRUNEL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (13/3)<br /><br />
A ROD cream job submit ticket, freshly assigned this afternoon. It's a bit mean of me to bring notice to it. Assigned (13/4) ''And POW, Raul closed this after kicking torque into shape - solved''<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112948 112948] (10/4)<br /><br />
100IT needed to upgrade to the latest CA release. They've done this, but there are still authentication problems. In progress (13/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] (10/9/14)<br /><br />
Deploying vmcatcher at 100IT. After David's questions falling on deaf ears for a while it has been advised that the ticket be closed as this issue will be dealt with elsewhere. Whether or not it is to be "solved" or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA tests failing at RAL. After a lot of work and new xrootd redirectors, problems persist. It's looking to be a problem that needs the CASTOR and/or xrootd devs to look at. In progress (30/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112713 112713] (27/3)<br /><br />
CMS asking to clean up the "unmerged area". Andrew conjured up a list of files and asked if they could be deleted - CMS responded with a "yes please then close the ticket". Has the deed been done? In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] (28/10/14)<br /><br />
The Sno+ gfal copy ticket. Matt M still sees gfal-copy hang for files at RAL when he uses the GUID (SURL works). A Castor oddity perhaps? Matt asks a question about what problems like this (coupled with the move away from lcg tools) will mean for VOs that rely on the LFC. In progress (31/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112977 112977] (10/3)<br /><br />
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs all want that file! In progress (13/3)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 112896] (9/4)<br /><br />
CMS Dataset access problems - caused by over a million access attempts on a single file over an 18 hour period. Andrew L comments that CMS needs to have a think about how they access pileup datasets. In progress (9/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (10/2)<br /><br />
Tier 1 counterpart to 111703. A new HC stress test was submitted near the end of March, but no news on how it did. In progress (23/3)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112866 112866] (7/4)<br /><br />
A different "lots of CMS job failures" ticket. Again a "hot file" seems to be the root cause. In progress (7/4)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112721 112721] (28/3)<br /><br />
An atlas file access ticket, seemingly caused by some odd FTS behaviour. No answers to Shaun's question about this odd occurrence or much noise at all till today. Waiting for reply (13/4)<br />
<br />
'''UCL'''<br /><br />
UCL has 6 tickets - 4 just "assigned". I'll just list them in the interests of brevity.<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112371 112371] (ROD low availability, On Hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112841 112841] (atlas 0% transfer efficiency, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112873 112873] (ROD srm put failures, assigned)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (glexec ticket, on hold)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112722 112722] (atlas checksum timeouts, in progress)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112966 112966] (ROD job submit failures, assigned)<br />
<br />
'''Tuesday 7th April'''<br />
* 20 open tickets. [http://tinyurl.com/l784bbg Link to GGUS].<br />
<br />
'''Monday 23rd March 2015, 15.30 GMT'''<br /><br />
19 Open tickets this week. <br />
<br />
'''BIRMINGHAM'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112550 (23/3)<br /><br />
A ticket fresh off the ROD dashboard - the Birmingham CREAMs aren't being matched ("BrokerHelper: no compatible resources"). Matt W has double checked their setup and can't spot anything wrong - they've been running "normal" atlas/lhcb etc jobs fine over the last few weeks. Any advice appreciated. In progress (23/3)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=112495 (21/3)<br /><br />
MICE problems running jobs at RAL, which Andrew L discovered coincided with WMS problems that he fixed. Probably should be in "Waiting for Reply/Seeing if the problem's evaporated". In progress (23/3)<br />
<br />
https://ggus.eu/?mode=ticket_info&ticket_id=112350 (14/3)<br /><br />
The cause of this Sno+ ticket, about a recent user not being able to access files due to not being in the gridmap, has been discovered. As Robert F sagely pointed out, the latest version of the mkgridmap rpm is required to talk to the voms server. Just waiting on time for it to be updated now. In progress (17/3)<br />
<br />
'''100IT'''<br /><br />
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)<br /><br />
This 100IT ticket is still waiting for a reply (since mid-January). The question needs to be answered by someone familiar with the technologies and terminologies. Is anyone up on vmcatcher? Anyone know what other channel to pass David's query onto? Waiting for reply (15/1)<br />
<br />
'''Monday 16th March 2015, 15.30 GMT'''<br /><br />
<br />
16 Open UK tickets this week. Half Red, Half Green.<br />
<br />
'''RALPP and TIER 1 glexec HC tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (TIER 1)<br /><br />
No news on these tickets since it was expected that a new stable HC job would be released last Tuesday. All very quiet.<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
Similar for this CMS glexec ticket - no news after Kashif asked for some more information way back. Waiting for reply (27/2)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
The Sno+ gfal copying ticket. A lot of people are working on this, and attempts to recreate the problems seem to occasionally be devolving into "did we get this complicated command right?". At some point it might be necessary to get the gfal devs involved (are there gfal devs?). Waiting for reply (11/3)<br />
<br />
Also there's the poor 100IT ticket, still waiting for a reply. The JET ticket also needs wrapping up; I put a reminder at the end of it.<br />
<br />
<br />
'''Monday 9th March 2015, 15.00 GMT'''<br /><br />
From last week's crusty ticket round up:<br />
<br />
'''Tier 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694] - 28/10/14<br /><br />
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] - 1/10/14<br /><br />
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] - 10/9/14<br /><br />
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence for our 100IT colleagues. <br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br /><br />
The Jet LHCB ticket is in the state of being wrapped up. LHCB have been removed from the local configs, and the site has been removed from LHCB's. I believe that this ticket can be terminated.<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br /><br />
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).<br />
<br />
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.<br />
<br />
'''The "Normal" tickets:'''<br />
<br />
Atlas glexec Hammercloud failures (RALPP and Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br /><br />
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br /><br />
This LHCB ticket about stalled jobs looks like it can be closed (LHCB no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br /><br />
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)<br />
<br />
'''Tuesday 3rd March'''<br />
* [http://tinyurl.com/nwgrnys A link to GGUS open tickets] for checking status directly.<br />
<br />
Concentrating on pre-2015 tickets this week in an attempt to Spring Clean the UK ggus presence. I will review these again next week - can people please take a look at these tickets if they're owned by them (or if they think they can help!).<br />
<br />
'''SUSSEX - 26/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389]<br /><br />
This is a perfsonar ticket - the initial request (reinstalling the perfsonar node) was done a while ago but things weren't quite right. Matt RB did some soothing to this last week and asks if he's missed anything - I put it to waiting for reply this morning. Waiting for reply (24/2)<br />
<br />
'''QMUL - 25/11/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353]<br /><br />
Atlas wanting https access on QM's SE. Dan's been working on this nicely, carefully testing each stage of his rollout. The end is in sight here. In progress (17/2)<br />
<br />
'''TIER 1 - 28/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]<br /><br />
This is a SNO+ ticket about getting gfal tools working for the Tier 1 - with the new version out Brian has tested it correctly (and I saw a related thread on lcg-rollout) - but no word from Sno+. Who is wrangling the other VOs in these post-Walker times? Waiting for reply (24/2)<br />
<br />
'''TIER 1 - 1/10/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]<br /><br />
A CMS ticket about AAA tests at the Tier 1. This ticket is being actively worked on, with a new xrootd redirector at RAL and problems with the EU redirectors mucking things up. No problems that I can see. Waiting for reply (2/3)<br />
<br />
'''100IT - 10/9/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
Getting VMcatcher stuff to work at 100IT. This ticket seems to keep stalling due to lack of documentation or replies from the submitters. Waiting for reply (19/1)<br />
<br />
'''LANCASTER - 27/1/14'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566]<br /><br />
Lancaster's poor perfsonar performance. Being poked and prodded on and off over the last year, but the problems remain a mystery - Ewan's lending of an iperf endpoint has helped out greatly though, waiting on yet another network tweak. On Hold (23/2)<br />
<br />
'''EFDA-JET - 21/9/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485]<br /><br />
LHCB job failures at EFDA-JET. The causes of this remain a mystery, and this is the first ticket on my "to be set to unsolved" list.<br />
<br />
'''UCL - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298]<br /><br />
UCL's glexec ticket. Ben's been working hard at this recently, but keeps hitting show stoppers - the latest being a performance problem possibly due to the VM he's running argus on. On hold (19/2)<br />
<br />
'''ECDF and LANCASTER - 1/7/13'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (ECDF)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (Lancaster)<br /><br />
glexec for the tarball. This is *still* waiting on the tarball glexec, which is again waiting on me, which is waiting on me magicking some extra tarball development time. Will be reviewed by the end of March.<br />
<br />
<br />
'''Monday 23rd February 2015, 15.00 GMT'''<br /><br />
Only 15 Open UK tickets this week. Feel free to bring up any ticket-based issues of your own on this quiet week.<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
This CMS glexec hammercloud ticket is looking a little quiet - no update for a while. If it's continuing offline or waiting on input could it at least be put On Hold? In progress (11/2)<br />
(The Tier-1 version of this ticket, 111699, seems to be chugging along fine - there might be useful snippets in there).<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The Cloud accounting probe ticket was reopened, asking if 100IT ticketed apel support (I assume contacting them via other means would work too) otherwise the new cloud accounting won't be properly republished. Reopened (20/2)<br />
<br />
<br />
'''IMPERIAL''' (but not really their issue)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111872 111872](20/2)<br /><br />
Tom opened a ticket after another cern@school user had troubles using the IC SE - there have been some problems with the newer versions of the dirac UI. Sometimes it's better to go Vintage! Although, after trying, Simon and Daniela haven't been able to reproduce the failure - perhaps something's up with the user's UI? Waiting for reply (23/2)<br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](26/11/14)<br /><br />
Sussex's Perfsonar ticket. I know Matt RB has put the ticket On Hold and is very busy - is there any news/anything we can do to help? On Hold (21/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ gfal copy problems at RAL. Brian informs us that the latest version of the gfal tools works for him and has asked if they work for Matt M. and Co. Did you get these packages out of epel/epel-testing or somewhere else Brian? Waiting for reply (18/2)<br />
<br />
'''Monday 16th February 2015, 14.30 GMT'''<br /><br />
Only 19 open UK tickets today.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket concerning transfer failures between RAL and BNL. Brian mentioned last week that the lack of recent failures is due to atlas not attempting to transfer any older data recently. Perhaps this could do with being put into the ticket (and the ticket being put On Hold, or prodded some more)? Waiting for reply (29/1)<br />
<br />
Also<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=111800 111800] (17/2)<br /><br />
ARC CE issues at RAL detected. <br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703](11/2)<br /><br />
Atlas running glexec hammerclouds - having trouble at RALPP (and RAL, see 111699). The glexec experts have gotten involved on this one, and asked to take a peek at a proxy - not sure about anyone else, but I'd feel a tad uncomfortable sharing proxies, even with known and trusted experts as in this case. Am I being overly paranoid? Either way the ticket has gone a bit quiet. In progress (11/2) <br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both the 100IT tickets are in Waiting for reply - the oldest one for quite a while - David asked a question a while back and no answer. The newest one asks if the 100IT logs made it to apel safely - I think what David has to do is submit a ticket with this question to the apel support team - have I got the right end of the stick? <br />
<br />
'''Biomed Tickets at Manchester and Imperial'''<br /> <br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] & [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357]<br /><br />
FYI There's a note at the bottom of both of these tickets that the version of CREAM that should fix this has been delayed until the end of February(ish).<br /><br />
''Update - I read those updates wrong - the cream update has been released and these tickets have been (perhaps erroneously) closed.'' <br />
<br />
<br />
<br />
'''Monday 9th February 2015, 15.00 GMT'''<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 Other VO Nagios Results]<br /><br />
At the time of writing the only site showing red that isn't suffering from an understood problem is RALPP, with org.nordugrid.ARC-CE-submit and SRM-submit test failures for gridpp, pheno, t2k and southgrid for both its CEs and its SE. The failures are between 1 and 12 hours old, so it doesn't seem to be a persistent failure, but it seems to be quite consistent. They all seem to be failing with "Job submission failed... arcsub exited with code 256: ...ERROR: Failed to connect to XXXX(IPv4):443 .... Job submission failed, no more possible targets". Anyone seen something like this before?<br />
<br />
Only 20 Open UK tickets this week.<br />
<br />
'''Biomed tickets:'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (Manchester)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357] (Imperial)<br /><br />
Biomed have linked both these tickets as children of 110636, being worked on by the cream blah team. AFAICS no sign of Cream 1.16.5 just yet.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347] (22/1)<br /><br />
CMS consistency checks for January 2015. It looks like everything that was asked of RAL has been done by RAL, so hopefully this can be successfully closed. In progress (3/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120] (12/1)<br /><br />
Another ticket, this time concerning a period of Atlas transfer failures between RAL and BNL, that looks like it can be closed as the failures seem to have stopped (and might well have been at the BNL end). Waiting for reply (22/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944] (1/10/14)<br /><br />
CMS AAA test failures at RAL. Federica can't connect to the new xrootd service according to the error messages. No news for a while. In progress (29/1)<br />
<br />
'''100IT'''<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333]<br /><br />
Both of these 100IT tickets are looking a bit crusty - the first is waiting for advice, the second was just put "In progress".<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br />
Dan has set up se02.esc.qmul.ac.uk to test out the latest https-accessible version of storm for dteam and atlas. As a cherry on top this node is also IPv6 enabled. I'm not sure if Dan wants others in the UK to "give it a go"? In progress (6/2)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
(Blatantly scrounging for advice) Trying to figure out why Lancaster's perfsonar is under-performing. Ewan kindly gave us access to an iperf endpoint and it's been very useful in characterising some of the weirdness - although I'm still confused. Ewan also gave us a bunch of suggestions for testing that have been useful - next stop, window sizes. If anyone else wants to throw advice to me all wisdom donations are gratefully accepted. My advice for others is to be careful trying to connect to the default iperf port on a working DPM pool node.... In Progress (9/2)<br />
<br />
'''Monday 2nd February 2015, 14.00 GMT'''<br /><br />
22 Open UK tickets this month. <br />
<br />
'''SUSSEX'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389] (26/11/14)<br /><br />
A perfsonar ticket for Sussex. Their perfsonar has been reinstalled, but needs soothing. Matt has informed us that this might have to wait a few weeks due to other issues. On Hold (21/1)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110536 110536] (2/12/14)<br /><br />
MICE job failures at RALPP - it looked like they were dying due to running out of memory. The queues have been tweaked to give MICE more, but no word from MICE on whether this has solved the problem. Waiting for reply (12/1)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365] (25/11/14)<br /><br />
Another perfsonar ticket. Again the node is reinstalled, just not quite working right. Winnie is waiting for news from the other sites in a similar boat. In progress (maybe On Hold it?) (20/1)<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118] (12/1)<br /><br />
ECDF "low availability" ticket - just waiting for the silly alarm to clear. Daniela submitted a ticket about this foolish alarm a while ago - [https://ggus.eu/?mode=ticket_info&ticket_id=107689 107689]. On Hold (19/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95303 95303] (1/7/13)<br /><br />
glexec tarball ticket. With my tarball hat on - still no positive news on this front - it's beginning to look like this can't be done but we're having one last go. Sorry! On Hold (19/12)<br />
<br />
'''MANCHESTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110225 110225] (18/11/14)<br /><br />
Change of VO Manager for helios-vo.eu. It looks like this ticket is being held up at the user end a lot. I'm not sure there's anything we can do as it involves outside CAs. On Hold (20/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356] (23/1)<br /><br />
One of Manchester's CEs not working for biomed, due to problems with the new CREAM/old WMS communication. Alessandra gave biomed some sage advice, but I suspect this ticket will need to be prodded soon to get a response from biomed (who I agree should use a newer WMS and close it). On Hold (26/1)<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111547 111547] (2/2)<br /><br />
I'm reporting on a ticket that I submitted to myself today. I'm not sure what that says about the world. Anyway - a ticket to track the decommissioning of one of Lancaster's CEs, as we try to do it all proper like. On Hold (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566] (27/1/14)<br /><br />
Lancaster's perfsonar ticket, which I sadly let reach its first birthday. I've been prodding this offline, does anyone have the address for a regular, open iperf endpoint I could borrow? On Hold (9/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95299 95299] (1/7/13)<br /><br />
Lancaster's tarball glexec ticket, as the ECDF one. On hold (26/1)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=95298 95298] (1/7/13)<br /><br />
UCL's glexec ticket. They've been having trouble getting it to behave, and at last check Ben was off ill - probably due to dealing with glexec :-) On Hold (20/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] (25/11/14)<br /><br />
Atlas asking for QM's storage to be made available via https. Waiting on a production ready STORM that can provide this - Dan is trying it out on his testbed se02.esc.qmul.ac.uk, which still needs tweaking. In progress (28/1)<br />
<br />
'''IMPERIAL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
One of the IC CEs not working for biomed. Similar to the Manchester ticket, Daniela points to ticket 110635 and is waiting on an EMI release to fix it (due out imminently AIUI). On Hold (28/1)<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] (21/9/13)<br /><br />
Jet's LHCB job failure ticket. I'm afraid I haven't been able to chase this up (partly due to only ever remembering on the first Monday of the month) - there's been no news for a while. On Hold (1/10/14)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333] (22/1)<br /><br />
A ticket to 100IT and the NGI to get the cloud accounting probe upgraded. I notified 100IT, but forgot to reassign the ticket - thanks to Jeremy for doing it. Assigned (2/2)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/14)<br /><br />
Getting VMcatcher working at 100IT. David from 100IT has asked for some answers on which "glancepush" to use, but no reply for a while. Waiting for reply (19/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111477 111477](29/1)<br /><br />
CMS would like to run some staging tests to warm up for Run2. The Tier 1 warned CMS of today's outage and they're happy to proceed tomorrow (the 3rd) - I think they'd like a response. In progress (30/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=107935 107935](27/8/14)<br /><br />
A ticket regarding inconsistent BDII and SRM storage numbers. Waiting on a fix from the developers regarding read-only disk accounting (I think), Brian is still on the case. Stephen B let us know that Maria the ticket submitter is on maternity leave, and asks in her stead if the numbers are expected to align now. On hold (28/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111120 111120](12/1)<br /><br />
An atlas ticket about a large number of data transfer errors seen between RAL and BNL. Brian reckoned that this was due to shallow checksums on the old data being transferred, but had trouble looking at the BNL FTS. Regardless, the ADCoS shifter hadn't seen any errors for a week and suggests the ticket can be closed. Waiting for reply (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test problems at RAL. After setting up a new xrootd box the test failures have changed in nature, but sadly they're still failures. In progress (29/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111347 111347](22/1)<br /><br />
CMS Consistency Check for RAL, January 2015 edition. Filelists were generated, orphan files were identified, then purged. Just need to know what CMS want to do next. Waiting for reply (26/1)<br />
<br />
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694](28/10/14)<br /><br />
Sno+ ticket concerning gfal tool problems, waiting on the new release to come out (middle of this month I believe). If you don't want to wait that long then I believe the 2.8 gfal2 tools can be found in the fts3 repo at last check. On hold (20/1)<br />
<br />
<br />
'''Monday 26th January 2015, 14.15 GMT'''<br /><br />
''Back after being forgotten about by me:''<br /><br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''Other VO Nagios Status:''']<br />
<br />
At the time of writing I see:<br /><br />
'''Imperial:''' gridpp VO job submission errors (but only 34 minutes old so probably naught to worry about).<br /><br />
'''Brunel:''' gridpp VO jobs aborted (one of these is 94 days old, so might be something to worry about).<br /><br />
'''Lancaster:''' pheno failures (I can't see what's wrong, but this CE only has 10 days left to live).<br /><br />
'''Sussex:''' snoplus failures (but I think Sussex is in downtime).<br /><br />
'''RALPP:''' A number of failures across a number of CEs, all a few hours old. An SE problem?<br /><br />
'''Sheffield:''' gridpp VO job submission failure, but only 6 hours old.<br />
And of course the srm-$VONAME failures at the Tier 1, which are caused by incompatibility between the tests and Castor AIUI. Things are generally looking good.<br />
<br />
22 Open UK Tickets this week. <br /><br />
'''NGI/100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111333 111333](22/1)<br /><br />
The NGI has been asked to upgrade the cloud accounting probe, and then notify our (only at the moment) cloud site to republish their accounting. Not entirely sure what this entails or who this falls on, I assigned it to NGI-OPERATIONS (and also noticed that 100IT isn't on the "notify site" list - odd). Assigned (22/1)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS AAA test failures. Andrew Lahiff reported last week that the Tier 1 is currently preparing a replacement xrootd box. If that will take a while can the ticket be put on hold? In progress (19/1)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353](25/11/14)<br /><br />
An atlas ticket, asking for https access to the SE at QMUL. The QM chaps were waiting on a production ready Storm that could handle this, and are preparing to test one out. This is another ticket that looks like it might need to be put On Hold (will leave that up to you chaps - there's a big difference between "slow and steady" progress and "no progress for a while"). In progress (21/1)<br />
<br />
'''RHUL'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111355 111355](23/1)<br /><br />
A dteam ticket - concerning http access to RHUL's SE. Although the initial observation about the SE certificate being expired was incorrect (the expiry date was reported as 5/1/15, which to be fair I would read as the 5th of January and not the 1st of May!) there still is some underlying problem here with intermittent test failures. Also this ticket raises the question of under what context are these tests being conducted? Anyone know, or shall we ask the submitter? In progress (26/1)<br />
<br />
'''BIOMED PROBLEMS:'''<br /><br />
'''Manchester:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111356 111356](23/1)<br /><br />
'''Imperial:''' [https://ggus.eu/?mode=ticket_info&ticket_id=111357 111357](23/1)<br /><br />
Biomed are having job problems, looking to be caused by using crusty old WMSes to communicate with these sites' shiny up-to-date CEs. According to ticket 110635 a cream side fix should be out by the end of January (CREAM 1.16.5), although Alessandra suggests that Biomed should try to use newer, working WMSes - or Dirac instead!<br />
<br />
<br />
'''Monday 19th January 2015, 14.30 GMT'''<br /><br />
23 Open UK Tickets this week.<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944](1/10/14)<br /><br />
CMS seeing AAA test failures at RAL. The tests have been restarted recently and now seem to be having some suspicious looking authentication failures. In Progress (13/1)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111162 111162](14/1)<br /><br />
Atlas complaining about httpd doors not working on Sheffield's SE. After schooling the submitter in how to submit more useful information Elena is working on it. I bring this up as in the last few days I've had quite a few of my pool nodes have their httpd daemons crash on them (they're up to date, but still SL5), which may or may not be related. In Progress (19/1)<br />
<br />
'''ECDF'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=111118 111118](12/1)<br /><br />
ECDF "low availability" ticket after a few days of argus trouble, which Wahid fixed. Now the ticket will languish for a few weeks as the alarm clears. Daniela has reminded us of her ticket against these fairly silly alarms: https://ggus.eu/?mode=ticket_info&ticket_id=107689 . In the meantime this ticket could do with being put On Hold whilst the alarm clears. In progress (19/1)<br />
<br />
'''100IT'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356](10/9/2014)<br /><br />
Setting up VMCatcher at 100IT. After some troubles things seem to be looking up, although the 100IT chaps still have some questions about the configurations and what they should be using that aren't getting answers. I set the ticket to "Waiting for Reply" hoping that this will help get the attention of those in the know. Waiting for reply (15/1)<br />
<br />
'''Perfsonar Tickets'''<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110389 110389](Sussex)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110382 110382](TIER 1)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=108273 108273](Durham)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=100566 100566](Lancaster)<br /><br />
[https://ggus.eu/?mode=ticket_info&ticket_id=110365 110365](Bristol)<br /><br />
Everyone seems to have updated their perfsonar hosts, so we're all good on that front, but a number of sites are either having trouble with their reinstalled hosts, or are having problems that they had pre-reinstall still haunt them. I'm afraid I have no suggestions of what to do about the growing number of these tickets though!</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/IPv6_site_statusIPv6 site status2015-09-07T15:00:33Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>Testing, testing, 1,2,3.<br />
<br />
{|border="1" cellpadding="1"<br />
|+<br />
<br />
|-style="background:#7C8AAF;color:white"<br />
|Site<br />
|Discussed with local networking team<br />
|Asked for some IPv6 addresses<br />
|Has IPv6 addresses<br />
|IPv6 allocation<br />
|IPv6 enabled hosts (1)<br />
|IPv6 hostnames resolvable via IPv6 (2)<br />
|Joined HEPIX gridftp testbed<br />
|Joined HEPIX phedex testbed<br />
|Dual-stack perfSONAR host<br />
|Dual-stack worker nodes <br />
|Dual-stack grid services (e.g. xrootd, SRM, gridftp)<br />
|Notes<br />
|Date last updated<br />
<br />
|-<br />
|RAL Tier-1<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|Working with vendors to fix problems on the Tier 1 core routers, so the rollout of IPv6 to the production network is on hold. The testbed is functional.<br />
|2015-07-22<br />
<br />
|-<br />
|UKI-LT2-Brunel<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Not anymore</span><br />
|<span style="color:green">Yes (DPM)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|IPv6 services in production: Storage, CEs (Cream and Arc/HTC) and all worker nodes<br />
|2015-04-21<br />
<br />
|-<br />
|UKI-LT2-IC-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes (but since left)</span><br />
|<span style="color:green">Yes (Storm & DPM)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|Most services dual-stack including dCache <br />
|2014-10-14<br />
<br />
|-<br />
|UKI-LT2-QMUL<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2a01:56c0:4033::/48 (for grid cluster)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes (Storm)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|Dual-stack VLAN which also does jumbo frames. RIPE Atlas probe. DEV: cream (ce04, ce08), storm (se01, se02), xrootd (xrootd02). Prod: storm (se03, se04)<br />
|2015-07-21<br />
<br />
|-<br />
|UKI-LT2-RHUL<br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">Not yet</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|IT will look into this after the 10Gb link commissioning in September 2014<br />
|2014-08-12<br />
<br />
|-<br />
|UKI-LT2-UCL-HEP<br />
|<span style="color:red">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|Central IT IPv6 project still in early stages.<br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-LANCS-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|Enthusiasm for IPv6 has increased among members of the Technical Infrastructure Group, who plan to roll IPv6 routing out to the core in the near future (after term starts). In light of this we've made some specific requests for addresses and are hopeful for a leap in progress soon.<br />
|2015-09-07<br />
<br />
|-<br />
|UKI-NORTHGRID-LIV-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|Central Services have a new core network with IPv6 addresses. An old piece of kit needs bypassing to give us access to the new core. Once connected we should be able to start testing dual-stack on perfsonar. Target Autumn/Winter 2015.<br />
|2015-07-28<br />
<br />
|-<br />
|UKI-NORTHGRID-MAN-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2001:630:22:1004::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes (but no reverse lookup yet)</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|The IPv6 subnet may or may not change in the future. Services on IPv6: testbed CREAM CE, IPv6-only GridPP VOMS daemon (voms6.gridpp.ac.uk)<br />
|2015-08-27<br />
<br />
|-<br />
|UKI-NORTHGRID-SHEF-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2001:630:63:3::41c5:101</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|<br />
|<br />
| <br />
|2015-02-24<br />
<br />
|-<br />
|UKI-SCOTGRID-DURHAM<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2001:630:a5:1200::0/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes (but no reverse yet)</span><br />
|<br />
|<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<br />
|Waiting for delegation of IPv6 reverse DNS before we can enable services<br />
|2015-07-28<br />
<br />
|-<br />
|UKI-SCOTGRID-ECDF<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">Testing</span><br />
|<span style="color:red">Testing</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">Testing</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
| <br />
- Met with university network services in August to discuss plans, address allocation and testing strategy <br />
- Got the go-ahead from ECDF to test IPv6 on our service VLAN (which is insulated from their cluster) <br />
- Have allocated IPv6 addresses to our two perfSONAR hosts to check bidirectional resolvability before moving on to other middleware services<br />
| 2015-09-01<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-GLASGOW<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes (left to join Phedex)</span><br />
|<span style="color:green">Yes (DPM)</span><br />
|<br />
|<br />
|<br />
|Building a dual stack test cluster.<br />
|2014-08-13<br />
<br />
|-<br />
|UKI-SOUTHGRID-BHAM-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|Have requested IPv6 addresses from University central IT. Waiting to hear back.<br />
|2014-08-14<br />
<br />
|-<br />
|UKI-SOUTHGRID-BRIS<br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|University ready and willing - limited manpower at site<br />
|2014-08-13<br />
<br />
|-<br />
|UKI-SOUTHGRID-CAM-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:magenta">2001:630:212:e10::/64</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|University ready and willing - limited manpower at site<br />
|2014-08-12<br />
<br />
|-<br />
|UKI-SOUTHGRID-OX-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:magenta">2001:630:441:900::/56</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<span style="color:green">Yes</span><br />
|<span style="color:magenta"> Some </span><br />
|<br />
|<span style="color:black"> The IPv6 service from the University runs on somewhat bandwidth-limited and non-redundant kit, so we're only deploying test systems; no production work is being run over IPv6. We hope to change that rapidly once the University [https://www.it.ox.ac.uk/news/oxford-network-evolution-project-update network upgrade] completes. </span><br />
|2015-07-28<br />
<br />
|-<br />
|UKI-SOUTHGRID-RALPP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
| IPv6 now enabled on Campus routers - waiting for Central testing to finish before they make address spaces available to departments.<br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-SUSX<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
| New central University core routers were installed in early summer 2015, which will enable IPv6 work to progress now. I need to chase them again as this was previously the hold-up.<br />
| 2015-07-28<br />
<br />
|}<br />
<br />
<br />
(1) Hosts with AAAA records exist in the public DNS, e.g.<br />
<br />
dig AAAA netmon00.grid.hep.ph.ic.ac.uk<br />
<br />
ANSWER SECTION:<br />
<br />
netmon00.grid.hep.ph.ic.ac.uk. 300 IN AAAA 2001:630:12:580:207:43ff:fe11:ffb0<br />
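When checking many hosts for column (1), the AAAA lines can be picked out of saved dig output automatically. The following is only a minimal sketch: the function name <code>parse_aaaa</code> is hypothetical, and it assumes the five-field record layout (name, TTL, class, type, address) seen in the example above.<br />

```python
import ipaddress


def parse_aaaa(dig_output):
    """Return (name, ttl, address) tuples for the AAAA lines in dig-style output."""
    records = []
    for line in dig_output.splitlines():
        fields = line.split()
        # Assumed layout: name, TTL, class, type, address
        if len(fields) == 5 and fields[2] == "IN" and fields[3] == "AAAA":
            # IPv6Address raises ValueError if the address is malformed
            address = ipaddress.IPv6Address(fields[4])
            records.append((fields[0], int(fields[1]), address))
    return records


# Sample records copied from the dig examples on this page
sample = """\
ns0.ic.ac.uk. 86400 IN A 155.198.142.80
ns0.ic.ac.uk. 300 IN AAAA 2001:630:12:600:1::80
netmon00.grid.hep.ph.ic.ac.uk. 300 IN AAAA 2001:630:12:580:207:43ff:fe11:ffb0
"""

for name, ttl, address in parse_aaaa(sample):
    print(name, ttl, address)
```

The A record is skipped, so only hosts that really publish an AAAA record are reported.<br />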
<br />
(2) Hostnames can be resolved from an external IPv6 only host, i.e. local DNS server has AAAA record, e.g.<br />
<br />
dig netmon00.grid.hep.ph.ic.ac.uk<br />
<br />
ADDITIONAL SECTION:<br />
<br />
ns0.ic.ac.uk. 86400 IN A 155.198.142.80<br />
<br />
ns0.ic.ac.uk. 300 IN AAAA 2001:630:12:600:1::80</div>Matthew Doidge 1ac9bd3994https://www.gridpp.ac.uk/wiki/IPv6_site_statusIPv6 site status2015-09-07T15:00:10Z<p>Matthew Doidge 1ac9bd3994: </p>
<hr />
<div>Testing, testing, 1,2,3.<br />
<br />
{|border="1" cellpadding="1"<br />
|+<br />
<br />
|-style="background:#7C8AAF;color:white"<br />
|Site<br />
|Discussed with local networking team<br />
|Asked for some IPv6 addresses<br />
|Has IPv6 addresses<br />
|IPv6 allocation<br />
|IPv6 enabled hosts (1)<br />
|IPv6 hostnames resolvable via IPv6 (2)<br />
|Joined HEPIX gridftp testbed<br />
|Joined HEPIX phedex testbed<br />
|Dual-stack perfSONAR host<br />
|Dual-stack worker nodes <br />
|Dual-stack grid services (e.g. xrootd, SRM, gridftp)<br />
|Notes<br />
|Date last updated<br />
<br />
|-<br />
|RAL Tier-1<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|Working with vendors to fix problems on the Tier1 core routers, so the rollout of IPv6 to the production network is on hold. The testbed is functional.<br />
|2015-07-22<br />
<br />
|-<br />
|UKI-LT2-Brunel<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Not anymore</span><br />
|<span style="color:green">Yes (DPM)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|IPv6 services in production: storage, CEs (CREAM and ARC/HTC) and all worker nodes<br />
|2015-04-21<br />
<br />
|-<br />
|UKI-LT2-IC-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes (but since left)</span><br />
|<span style="color:green">Yes (Storm & DPM)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|Most services dual-stack including dCache <br />
|2014-10-14<br />
<br />
|-<br />
|UKI-LT2-QMUL<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2a01:56c0:4033::/48 (for grid cluster)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes (Storm)</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|Dual-stack VLAN which also does jumbo frames. RIPE Atlas probe. DEV: cream (ce04, ce08), storm (se01, se02), xrootd (xrootd02). Prod: storm (se03, se04)<br />
|2015-07-21<br />
<br />
|-<br />
|UKI-LT2-RHUL<br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">Not yet</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|IT will look into this after the 10Gb link commissioning in September 2014<br />
|2014-08-12<br />
<br />
|-<br />
|UKI-LT2-UCL-HEP<br />
|<span style="color:red">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|Central IT IPv6 project still in early stages.<br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-LANCS-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|Enthusiasm for IPv6 has increased among members of the Technical Infrastructure Group, who plan to roll IPv6 routing out to the core in the near future (after term starts). In light of this we've made some specific requests for addresses and are hopeful.<br />
|2015-09-07<br />
<br />
|-<br />
|UKI-NORTHGRID-LIV-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|Central Services have a new core network with IPv6 addresses. An old piece of kit needs bypassing to give us access to the new core. Once connected we should be able to start testing dual-stack on perfsonar. Target Autumn/Winter 2015.<br />
|2015-07-28<br />
<br />
|-<br />
|UKI-NORTHGRID-MAN-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2001:630:22:1004::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes (but no reverse lookup yet)</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|The IPv6 subnet may or may not change in the future. Services on IPv6: testbed CREAM CE, IPv6-only GridPP VOMS daemon (voms6.gridpp.ac.uk)<br />
|2015-08-27<br />
<br />
|-<br />
|UKI-NORTHGRID-SHEF-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2001:630:63:3::41c5:101</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes</span><br />
|<br />
|<br />
| <br />
|2015-02-24<br />
<br />
|-<br />
|UKI-SCOTGRID-DURHAM<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">2001:630:a5:1200::0/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes (but no reverse yet)</span><br />
|<br />
|<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<br />
|Waiting for delegation of IPv6 reverse DNS before we can enable services<br />
|2015-07-28<br />
<br />
|-<br />
|UKI-SCOTGRID-ECDF<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">Testing</span><br />
|<span style="color:red">Testing</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">Testing</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
| <br />
- Met with university network services in August to discuss plans, address allocation and testing strategy <br />
- Got the go-ahead from ECDF to test IPv6 on our service VLAN (which is insulated from their cluster) <br />
- Have allocated IPv6 addresses to our two perfSONAR hosts to check bidirectional resolvability before moving on to other middleware services<br />
| 2015-09-01<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-GLASGOW<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">::/64</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:green">Yes (left to join Phedex)</span><br />
|<span style="color:green">Yes (DPM)</span><br />
|<br />
|<br />
|<br />
|Building a dual stack test cluster.<br />
|2014-08-13<br />
<br />
|-<br />
|UKI-SOUTHGRID-BHAM-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|Have requested IPv6 addresses from University central IT. Waiting to hear back.<br />
|2014-08-14<br />
<br />
|-<br />
|UKI-SOUTHGRID-BRIS<br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
|University ready and willing - limited manpower at site<br />
|2014-08-13<br />
<br />
|-<br />
|UKI-SOUTHGRID-CAM-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:magenta">2001:630:212:e10::/64</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|University ready and willing - limited manpower at site<br />
|2014-08-12<br />
<br />
|-<br />
|UKI-SOUTHGRID-OX-HEP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:magenta">2001:630:441:900::/56</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<span style="color:green">Yes</span><br />
|<span style="color:magenta"> Some </span><br />
|<br />
|<span style="color:black"> The IPv6 service from the University runs on somewhat bandwidth-limited and non-redundant kit, so we're only deploying test systems; no production work is being run over IPv6. We hope to change that rapidly once the University [https://www.it.ox.ac.uk/news/oxford-network-evolution-project-update network upgrade] completes. </span><br />
|2015-07-28<br />
<br />
|-<br />
|UKI-SOUTHGRID-RALPP<br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
| IPv6 now enabled on Campus routers - waiting for Central testing to finish before they make address spaces available to departments.<br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-SUSX<br />
|<span style="color:green">Yes</span><br />
|<span style="color:green">Yes</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red"></span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<span style="color:red">No</span><br />
|<br />
|<br />
|<br />
|<br />
| New central University core routers were installed in early summer 2015, which will enable IPv6 work to progress now. I need to chase them again as this was previously the hold-up.<br />
| 2015-07-28<br />
<br />
|}<br />
<br />
<br />
(1) Hosts with AAAA records exist in the public DNS, e.g.<br />
<br />
dig AAAA netmon00.grid.hep.ph.ic.ac.uk<br />
<br />
ANSWER SECTION:<br />
<br />
netmon00.grid.hep.ph.ic.ac.uk. 300 IN AAAA 2001:630:12:580:207:43ff:fe11:ffb0<br />
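A similar check can be scripted without dig. This is a rough sketch only: the helper name <code>ipv6_addresses</code> and its <code>resolver</code> parameter are hypothetical, and it relies on the system resolver via Python's <code>socket.getaddrinfo</code>, so an entry in a local hosts file could also satisfy it. It does not cover check (2), which needs to be run from an IPv6-only vantage point.<br />

```python
import socket


def ipv6_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv6 addresses the resolver reports for hostname."""
    try:
        # Restrict the lookup to IPv6 results only
        results = resolver(hostname, None, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        # Name does not resolve, or has no IPv6 address
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})
```

For example, <code>ipv6_addresses("netmon00.grid.hep.ph.ic.ac.uk")</code> returns an empty list if the name has no IPv6 address visible from the machine running it. The <code>resolver</code> argument exists only so the function can be exercised without network access.<br />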
<br />
(2) Hostnames can be resolved from an external IPv6 only host, i.e. local DNS server has AAAA record, e.g.<br />
<br />
dig netmon00.grid.hep.ph.ic.ac.uk<br />
<br />
ADDITIONAL SECTION:<br />
<br />
ns0.ic.ac.uk. 86400 IN A 155.198.142.80<br />
<br />
ns0.ic.ac.uk. 300 IN AAAA 2001:630:12:600:1::80</div>Matthew Doidge 1ac9bd3994