<p>Operations Bulletin Latest (revision of 2015-09-29T09:46:51Z by Kashif Mohammad 6ae08fa8ff)</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 29th September 2015<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Tuesday 29th September'''<br />
* Nagios (and therefore the regional dashboard) was affected by a weekend A/C outage at Oxford.<br />
* Steve J reports on: Condor libglobus_common problems<br />
* There was an EGI OMB on 24th September. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2380 Agenda]. <br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150928#Monday Notes from the Monday biweekly WLCG ops meeting] are available for anyone who is interested in the latest ops news.<br />
* On the topic 'Perfsonar Bandwidth checks not running' Duncan reported a move to a [https://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=Latency%20tests%20between%20all%20WLCG%20hosts full WLCG mesh].<br />
* Tom would appreciate feedback on the [https://vm36.tier2.hep.manchester.ac.uk/ GridPP website v2].<br />
* Steve Lloyd has set up a [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics2.html new metrics page] as a basis for allocating T2 hardware funding. This just uses total Disk and total Elapsed and/or CPU time. In the PMB yesterday it was agreed that Elapsed time would be used, but the results of various combinations will be watched and assessed over the coming months. One overriding reason for using Elapsed time is that CPU is not provided by all cloud implementations.<br />
<br />
'''Tuesday 22nd September'''<br />
* Monday's WLCG weekly ops meeting [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek150921#Monday minutes] are available.<br />
* There is an EGI Operations Management Board this Thursday. Do we have any items to raise?<br />
* Several observations recently of FTS3 at RAL being overloaded.<br />
* Federico raised: anomalous CPU usage for DIRAC ilc jobs.<br />
* Looking at supporting DEAP3600 (RHUL, RAL and Sussex).<br />
<br />
<br />
'''Tuesday 15th September'''<br />
* As mentioned last week, GridPP is creating a Tier-2 Evolution working group.<br />
* LCG-ROLLOUT: "'glexec' missing in /cvmfs/grid.cern.ch/emi3wn-latest?". Matt is making progress.<br />
* Cambridge machine room relocation Wednesday 16th September.<br />
* Registration for the [http://cf2015.egi.eu/ EGI Community Forum 2015 in Bari] is open. [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2544 Agenda].<br />
* There was a GDB at CERN last week: [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20150909 Minutes]; [http://indico.cern.ch/event/319751/ Agenda].<br />
* Simon noted this (DIRAC doc link) was broken: https://github.com/gridpp/user-guides/blob/master/DIRAC-getting-started.md. <br />
* Articles for the [https://wlcg-ops.web.cern.ch/ WLCG ops portal].<br />
<br />
<br />
'''Tuesday 1st September'''<br />
* From October/November, the EGI ops VO monitoring will be performed using RFC proxies, as opposed to legacy proxies.<br />
* There will be a new filter for the critical profile for ATLAS WLCG SAM tests so that only production endpoints will be tested and taken into account for site availability metrics. This will be available from the SAM3 interface.<br />
* CNAF: Following a fire last Thursday that affected one electrical supply line, the computing centre is running at reduced capacity (around 30% below the pledged capacity).<br />
* Machine job features testing has hit a bug, according to Raul!<br />
* The HNSciCloud PCP pilot proposal was successfully submitted (refer to Andrew Sansum's talk at GridPP34 if you forget what this means!). The project intends to procure commercial cloud resources for FY17 and FY18. We will contribute 75K euro towards this activity and the EU will then top up to 250K euro.<br />
* There was an EGI Operations Management Board last Thursday. There are no summary notes yet, but please take a look at the [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2379 agenda] and linked talks (may be worth skimming them at the ops meeting).<br />
* There was a quick request/reminder for sites to please update their [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 entries in the GridPP wiki].<br />
* RAL is closed on Monday and Tuesday of this week!<br />
<br />
<br />
<br />
'''Tuesday 24th August'''<br />
* A [https://indico.cern.ch/event/319751/ draft agenda for the September GDB] is taking shape. Any Tier-2 rep volunteers this month please email Jeremy.<br />
* Interest in HTCondor CE. (Ops thread 20/08).<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 22nd September'''<br />
* There was an ops coordination meeting last Thursday: [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 Minutes].<br />
* Highlights:<br />
** All 4 experiments now have an agreed workflow with the T0 for tickets that should be handled by the experiment supporters but were accidentally assigned to the T0 service managers.<br />
** A new FTS3 bug fixing release 3.3.1 is now available.<br />
** A globus lib issue is causing problems with FTS3 for sites running IPv6.<br />
** The rogue Glasgow configuration management tool replacing the current configuration for VOMS with the old one was picked up and unfortunately discussed as though sites had not got the message about using the new VOMS.<br />
** No network problems experienced with the transatlantic link despite 3 out of 4 cables being unavailable.<br />
** T0 experts are investigating the slow WN performance reported by LHCb and others.<br />
** A group of experts at CERN and CMS is investigating ARGUS authentication problems affecting CMS VOBOXes.<br />
** T1 & T2 sites please observe the actions requested by ATLAS and CMS (also on the WLCG Operations portal).<br />
* Actions for [https://wlcg-ops.web.cern.ch/sys-admins/tasks Sites]; [https://wlcg-ops.web.cern.ch/experiments/tasks Experiments].<br />
<br />
'''Tuesday 15th September'''<br />
* The [https://indico.cern.ch/event/393616/ next WLCG ops coordination meeting is this Thursday 17th September]. Are there any Tier-2 issues we wish to raise? Minutes will appear [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes150917 here]. <br />
<br />
<br />
* There was a middleware readiness meeting on 16th September.<br />
* The new DPM version is being tested via the ATLAS workflow by the Edinburgh Volunteer site.<br />
* Many new sites expressed interest in participating in MW Readiness testing with CentOS7. It is useful to anticipate the MW behaviour ahead of new HW purchases. DPM validation on CentOS/SL7 is already ongoing at Glasgow.<br />
* ATLAS and CMS are asked to declare whether the xrootd 4 monitoring plugin is important for them or not. As it is now, it doesn't work with dCache v. 2.13.8.<br />
* Although FTS3 runs at very few sites, we decided to test it for Readiness. In this context, ATLAS and CMS are asked to use the FTS3 pilot in their transfer test workflows.<br />
* PIC successfully tested dCache v.2.13.8 for CMS.<br />
* CNAF has obtained Indigo-DataCloud effort to strengthen the ARGUS development team. The ARGUS collaboration will meet again early October. The problems faced at CERN with a CMS VOBOX are being investigated in ticket GGUS:116092.<br />
* The next MW Readiness WG vidyo meeting will take place on Wednesday 28 October at 4pm CET.<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 29th September'''<br />
A reminder that there is a weekly [http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Tier-1 experiment liaison meeting]. Notes from the last meeting [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-09-23 here]<br />
* The problems with the production FTS service have been resolved. A workaround to the memory leak introduced with the new version has been supplied. This, along with a reduction in the numbers of transfers queued, has enabled the service to return to normal operation.<br />
* The second step in the upgrade of the Castor Oracle databases to version 11.2.0.4 took place last Tuesday. This was the upgrade of the "Neptune" standby database and the re-establishment of the Dataguard link. ("Neptune" hosts the Atlas and GEN instance stagers.) The next step is the upgrade of the "Pluto" database, which hosts the Nameserver as well as the CMS & LHCb stager databases. This will require all of Castor to be down for the day and is scheduled for the 6th October.<br />
* We have an 'at risk' for tomorrow morning (Wednesday 30th Sep.) as the Tier1's link into the RAL core network is upgraded to 40Gb. This will take place between 07:00 and 08:30.<br />
* We have put a new Atlas xroot (FAX) re-director into service.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150902-minutes.txt Wednesday 02 Sep]'''<br />
* Catch up with [http://www3.imperial.ac.uk/highenergyphysics/research/experiments/mice MICE]<br />
* How to do transfers of '''lots''' of files with FTS3 without the proxy timing out (in particular if you need it vomsified)<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150812-minutes.txt Wednesday 12 Aug]'''<br />
* sort of housekeeping: data cleanups, catalogue synchronisation - in particular namespace dumps for VOs<br />
* GridPP storage/data at future events; GridPP35 and Hepix and Cloud data events<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150708-minutes.txt Wednesday 08 July]'''<br />
* Huge backlog of ATLAS data from Glasgow waiting to go to RAL, and oddly varying performance numbers - investigating<br />
* How physics data is like your Windows 95 games<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150701-minutes.txt Wednesday 01 July]'''<br />
* Feedback on CMS's proposal for listing contents of storage<br />
* Simple storage on expensive raided disks vs complicated storage on el cheapo or archive drives?<br />
<br />
'''[http://storage.esc.rl.ac.uk/weekly/20150624-minutes.txt Wednesday 24 June]'''<br />
* Heard about the Indigo datacloud project, a H2020 project in which STFC is participating<br />
* Data transfers, theory and practice<br />
** Somewhat clunky tools to set up but perform well when they run<br />
** Will continue to work on recommendations/overview document<br />
** Worth having recommendations/experiences for different audiences - (potential) users, decision makers, techies<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start T2EVO****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 22 Sep'''<br />
* Fortnightly [https://indico.cern.ch/category/4454/ GridPP Technical Meetings] on Fridays will have Tier-2 Evolution discussions, starting on Fri 25 Sept.<br />
<br />
'''Thursday 17 Sep'''<br />
* Task force to start developing advice for sites to simplify their operation in line with "6.2.5 Evolution of Tier-2 sites" in the GridPP5 proposal.<br />
* Mailing list for Tier-2 evolution activities: gridpp-t2evo@cern.ch - anyone welcome to join<br />
* Also [https://its.cern.ch/jira/browse/GRIDPP/ GridPP project] on the CERN JIRA service for tracking actions. Can be used with a full or lightweight CERN account. To browse issues you need to be added manually or be on the gridpp-ops@cern.ch mailing list.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End T2EVO****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 22nd September'''<br />
* Slight delay for Sheffield but overall okay - although there is a gap between today's date and the most recent update for all sites. Perhaps an APEL delay.<br />
<br />
'''Monday 20th July'''<br />
* Oxford publishing 0 cores from Cream today. Maybe they forgot to switch one off. [http://goc-accounting.grid-support.ac.uk/apel/jobs2_withsubmithost.html Check here]. <br />
<br />
'''Tuesday 14th July'''<br />
* QMUL and Sheffield appear to be lagging with publishing by a week. <br />
* Please check your multicore publishing status (especially those sites mentioned in June).<br />
<br />
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].<br />
<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 22nd September'''<br />
* Steve J is going to undertake some GridPP/documentation usability testing. <br />
<br />
'''Tuesday 18th August'''<br />
* Lydia's document - Setup a system to do data archiving using FTS3<br />
<br />
'''Tuesday 28th July'''<br />
* Ewan: /cvmfs/gridpp-vo help ... there's a lot of historical stuff on the GridPP wiki that makes it look a lot more complicated than it is now. We really should have a bit of a clear out at some point.<br />
<br />
'''Tuesday 23rd June'''<br />
* Reminder that documents need reviewing!<br />
<br />
<br />
'''General note'''<br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Monday 13th July'''<br />
<br />
* There was an EGI Ops meeting today: [https://wiki.egi.eu/wiki/Agenda-13-07-2015 agenda]<br />
* URT/UMD updates:<br />
** UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/<br />
*** APEL 1.4.1<br />
*** Argus PAP v. 1.6.2<br />
*** gLExec-wn - v. 1.2.3 (lcmaps and mkdir)<br />
*** storm 1.11.8<br />
*** fetch-crl 3.0.16<br />
*** cream 1.16.5<br />
*** dpm-xroot 3.5.2<br />
*** Xroot 4.1.1.<br />
*** Frontier Squid 2.7.24<br />
*** CVMFS 2.1.20<br />
*** GFAL2 2.8.4<br />
*** GFAL2-PYTHON 1.7.1<br />
** UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)<br />
*** ARC Nagios probes 1.8.3<br />
<br />
* SR updates (small because it's summer):<br />
*** gfal2 2.9.1<br />
*** storm 1.11.9<br />
*** srm-ifce 1.23.1....<br />
*** gfal2-python 1.8.1<br />
** In Verification<br />
*** gfal2-plugin-xrootd 0.3.4<br />
<br />
* Accounting<br />
** [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"<br />
** New list of sites needing to update multicore accounting being prepared this evening (Monday) by Vincenzo<br />
<br />
* SL5 decommissioning date March 2016; <br />
* Next meeting 10th August<br />
<br />
'''Monday 15th June'''<br />
* There was an EGI operations meeting today: [https://wiki.egi.eu/wiki/Agenda-15-06-2015 agenda]. <br />
* New Action for the NGIs: please start tracking which sites are still using SL5 services (how many services, and for each service whether it is still needed on SL5 and whether upgrades on SL5 are expected). [https://wiki.egi.eu/wiki/SL5_retirement A wiki has been provided to record updates]. It is also interesting to understand who is using Debian.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th June'''<br />
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.<br />
* Feedback welcome. <br />
<br />
<br />
<br />
'''Tuesday 31st March'''<br />
<br />
* Glasgow Graphite/Grafana documentation: http://www.scotgrid.ac.uk/graphite/<br />
<br />
'''Monday 7th December'''<br />
<br />
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf<br />
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* Generally quiet. QMUL have some grumblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool and UCL.<br />
<br />
<br />
'''Tuesday 1st September'''<br />
* Gordon was on shift.<br />
* Another very quiet week with no new tickets. We have two open ROD tickets, both of which are for A/R; one is against Cambridge, and the other is the now 53-day-old ticket at UCL.<br />
* Next up.... Kashif again (thanks Kashif!)<br />
<br />
'''Monday 24th August'''<br />
* Kashif on shift.<br />
* There were quite a few alarms throughout the week and many tickets were opened. All of the tickets were fixed within the time limit. <br />
* The certificate of the Argus server at Sheffield expired but Elena got a new certificate quickly. <br />
* Cambridge and UCL have low-availability tickets and not much can be done about them except waiting for availability to reach 90%.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th September'''<br />
* [http://repository.egi.eu/2015/09/10/release-umd-3-13-3/ UMD 3.13.3 is available].<br />
<br />
'''Tuesday 12th May'''<br />
* MW Readiness WG meeting Wed May 6th at 4pm. Attended by Raul, Matt, Sam and Jeremy.<br />
<br />
'''Tuesday 17th March'''<br />
* Daniela has updated the [https://www.gridpp.ac.uk/wiki/Staged_rollout_emi3 EMI-3 testing table]. Please check it is correct for your site. We want a clear view of where we are contributing.<br />
* There is a middleware readiness meeting this Wednesday. Would be good if a few site representatives joined.<br />
* Machine job features solution testing. Fed back that we will only commence tests if more documentation is made available. This stops the HTC solution until after CHEP. Is there interest in testing other batch systems? Raul mentioned SLURM. There is also SGE and Torque.<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Rollout****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 29th September'''<br />
* Incident broadcast EGI-20150925-01 relating to compromised systems in China.<br />
* UK security team meeting scheduled for 30th Sept. <br />
<br />
'''Monday 29th September'''<br />
* IGTF has released a regular update to the trust anchor repository (1.68) - for distribution ON OR AFTER October 5th<br />
<br />
'''Tuesday 15th September'''<br />
* The security team meeting this week is cancelled. The next will be on 30th. <br />
<br />
'''Tuesday 1st September'''<br />
* The IGTF has released an [https://rt.egi.eu/rt/Ticket/Display.html?id=9406 urgent update to the trust anchor repository (1.67)].<br />
* Linda is working on a revision to the EGI Technology Questionnaire.<br />
<br />
The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 18th August'''<br />
* Next [https://indico.cern.ch/event/401680/ LHCOPN and LHCONE joint meeting]: Science Park Amsterdam (NL) 28-29 of October 2015<br />
<br />
'''Tuesday 14th July'''<br />
* GridPP35 in September will have a part focus on networking and IPv6. This will include a review of where sites are with their deployment. Please try to firm up dates for your IPv6 availability between now and September. Please update the [https://gridpp.ac.uk/wiki/IPv6_site_status GridPP IPv6 status table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Friday 18th September'''<br />
<br />
Matt's on holiday until the 29th, so he's being replaced with links or any update Jeremy is kind enough to provide.<br />
<br />
[http://tinyurl.com/nwgrnys '''CAST YOUR GAZE UPON THE UK'S GGUS TICKETS HERE'''].<br />
<br />
[https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15 '''FEAST YOUR EYES ON THE T'OTHER VO NAGIOS STATUS HERE''']<br />
<br />
"Normal" service will resume in October. I'll leave y'all anticipating that lovely review of all the tickets.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 29 Sep 2015'''<br />
<br />
Following an air-conditioning problem in the machine room at the Oxford Tier-2 site on 26 September, gridppnagios(OX) was shut down and gridppnagios(Lancs) became the active instance. The Oxford site is in downtime until 1st October, and this may be extended depending on the situation. <br />
VO-Nagios was also unavailable for two days, but it runs on a VM and was restarted yesterday. It uses the Oxford SE for its replication test, so those tests are currently failing; I am looking to switch to a different SE. <br />
<br />
'''Tuesday 09 June 2015'''<br />
* ARC CEs were failing the Nagios test because the EGI repository was unavailable; the test compares the installed CA version against the EGI repo. The problem started on 5th June, when one of the IP addresses behind the web server stopped responding, and went away after approximately 3 hours. It recurred on 6th June and was finally fixed on 8th June. No reason was given in any of the tickets opened about this outage. <br />
<br />
'''Tuesday 17th February'''<br />
* Another period where message brokers were temporarily unavailable was seen yesterday. Any news on the last follow-up?<br />
<br />
'''Tuesday 27th January'''<br />
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/<br />
<br />
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]<br />
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 19th May'''<br />
* There is a current priority for enabling/supporting our joining communities. <br />
<br />
'''Tuesday 5th May'''<br />
* We have a number of VOs to be removed. Dedicated follow-up meeting proposed.<br />
<br />
'''Tuesday 28th April'''<br />
* For SNOPLUS.SNOLAB.CA, the port numbers for voms02.gridpp.ac.uk and voms03.gridpp.ac.uk have both been updated from 15003 to 15503.<br />
<br />
'''Tuesday 31st March'''<br />
* LIGO are in need of additional support for debugging some tests.<br />
* LSST now enabled on 3 sites. No 'own' CVMFS yet.<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th February'''<br />
* Next review of status today.<br />
<br />
'''Tuesday 27th January'''<br />
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster<br />
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.<br />
<br />
'''Tuesday 2nd December''' <br />
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)<br />
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)<br />
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)<br />
<br />
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)<br />
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)<br />
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)<br />
<br />
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful (58%)<br />
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)<br />
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)<br />
<br />
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]<br />
** Allocation - 42%<br />
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)<br />
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex<br />
<br />
* Dual stack nodes - 21%<br />
** YES: Brunel; IC; QMUL; Oxford (4)<br />
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)<br />
<br />
<br />
<br />
'''Tuesday 21st October'''<br />
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).<br />
<br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
Highlights from this meeting are now included in the Tier1 report farther up this page.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Atlas S&C week 2-6 Feb 2015'''<br />
<br />
Production<br />
<br />
• Prodsys-2 in production since Dec 1st<br />
<br />
• Deployment has not been transparent; many issues have been solved and the grid is filled again <br />
<br />
• MC15 is expected to start soon, pending physics validation; evgen testing is underway and close to being finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected. <br />
<br />
Rucio<br />
<br />
• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring. <br />
<br />
• [https://rucio-ui.cern.ch/dumps Rucio dumps available].<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMDarkDataCleaning Dark data cleaning]<br />
<br />
• [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/DDMLostFiles Lost files declaration]. Only DDM ops can issue a lost files declaration for now; cloud support needs to file a ticket.<br />
<br />
• Webdav panda functional tests with Hammercloud are ongoing<br />
<br />
Monitoring<br />
<br />
• [http://adc-monitoring.cern.ch/ Main page]<br />
<br />
• [http://dashb-atlas-ddm-acc.cern.ch/dashboard/request.py/ddmaccounting DDM Accounting]<br />
<br />
• [http://atlas-agis.cern.ch/agis/ddmblacklisting/listGroup space]<br />
<br />
• [http://dashb-atlas-ddm.cern.ch/ddm2/ Deletion]<br />
<br />
ASAP<br />
<br />
• ASAP (ATLAS Site Availability Performance) is in place. Every 3 months the T2 sites performing BELOW 80% are [https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/ATLASSiteCategorization reported to the International Computing Board.]<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/ROD_rotaROD rota2015-08-25T13:31:15Z<p>Kashif Mohammad 6ae08fa8ff: </p>
<hr />
<div>[[Rota History]]<br />
<br />
October 2013 <br /><br />
7th Andrew <br /><br />
14th Daniela <br /><br />
21st Kashif <br /><br />
28th Andrew <br /><br />
<br />
November 2013 <br /><br />
4th Daniela <br /><br />
11th Kashif <br /><br />
18th Andrew <br /><br />
25th Kashif/Gareth <br /><br />
<br />
December 2013 <br /><br />
2nd Daniela <br /><br />
9th Andrew <br /><br />
16th Gareth <br /> <br />
23rd Kashif <br /><br />
30th Andrew <br /><br />
<br />
January 2014 <br /><br />
6th Daniela <br /><br />
13th Gareth <br /><br />
20th Kashif <br /><br />
27th Andrew <br /><br />
<br />
February 2014 <br /><br />
3rd Daniela <br /><br />
10th Gareth <br /><br />
17th Kashif <br /><br />
24th Andrew <br /><br />
<br />
March 2014 <br /><br />
3rd Daniela <br /><br />
10th Gareth <br /><br />
17th Andrew <br /><br />
24th Kashif <br /><br />
31st Daniela <br /><br />
<br />
April 2014 <br /><br />
7th Gareth <br /><br />
14th Kashif <br /><br />
21st Andrew <br /><br />
28th Gareth <br /><br />
<br />
May 2014 <br /><br />
5th Daniela <br /><br />
12th Andrew <br /><br />
19th Gareth <br /><br />
26th Kashif <br /><br />
<br />
June 2014 <br /><br />
2nd Daniela <br /><br />
9th Gareth <br /><br />
16th Andrew <br /><br />
23rd Kashif <br /><br />
30th Gareth <br /><br />
<br />
July 2014 <br /><br />
7th Daniela <br /><br />
14th Andrew <br /><br />
21st Kashif <br /><br />
28th Gareth <br /><br />
<br />
August 2014 <br /><br />
4th Andrew <br /><br />
11th Daniela <br /><br />
18th Kashif <br /><br />
25th Daniela <br /><br />
<br />
September 2014 <br /><br />
1st Andrew <br /><br />
8th Gareth <br /><br />
15th Kashif <br /><br />
22nd Daniela <br /><br />
29th Andrew <br /><br />
<br />
October 2014 <br /><br />
6th Kashif <br /><br />
13th Gareth <br /><br />
20th Andrew <br /><br />
27th Daniela <br /><br />
<br />
November 2014 <br /><br />
3rd Kashif <br /><br />
10th Gareth <br /><br />
17th Andrew <br /><br />
24th Daniela <br /><br />
<br />
December 2014 <br /><br />
1st Kashif <br /><br />
8th Gareth <br /><br />
15th Daniela <br /><br />
22nd Andrew <br /><br />
29th Kashif <br /><br />
<br />
January 2015 <br /><br />
5th Gareth <br /><br />
12th Daniela <br /><br />
19th Andrew <br /><br />
26th Kashif + Gordon <br /><br />
<br />
February 2015 <br /><br />
2nd Gareth <br /><br />
9th Daniela <br /><br />
16th Gordon <br /><br />
23rd Andrew <br /><br />
<br />
March 2015 <br /><br />
2nd Gareth <br /><br />
9th Kashif <br /><br />
16th Daniela <br /><br />
23rd Gordon <br /><br />
30th Andrew <br /><br />
<br />
April 2015 <br /><br />
6th Kashif <br /><br />
13th Gareth <br /><br />
20th Gordon <br /><br />
27th Daniela <br /><br />
<br />
May 2015 <br /><br />
4th Andrew <br /><br />
11th Kashif <br /><br />
18th Gareth <br /><br />
25th Gordon <br /><br />
<br />
June 2015 <br /><br />
1st Daniela <br /><br />
8th Andrew <br /><br />
15th Gareth <br /><br />
22nd Kashif <br /><br />
29th Gordon <br /><br />
<br />
July 2015 <br /><br />
6th Daniela <br /><br />
13th Andrew <br /><br />
20th Gordon (best efforts) <br /><br />
27th Daniela <br /><br />
<br />
August 2015 <br /><br />
3rd Gareth <br /><br />
10th Andrew <br /><br />
17th Kashif <br /><br />
24th Gordon <br /><br />
31st Kashif <br /><br />
<br />
September 2015 <br /><br />
7th Gareth <br /><br />
14th Andrew <br /><br />
21st Daniela <br /><br />
28th Gordon <br /><br />
<br />
... Meet-o-matic request in progress.<br />
<br />
<br />
<br />
<br />
{{KeyDocs|responsible=Jeremy Coles|reviewdate=2015-10-01|accuratedate=2015-08-17|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2015-07-09T14:56:49Z<p>Kashif Mohammad 6ae08fa8ff: </p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
''' My EGI portal :''' https://mon.egi.eu/myegi/<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios is the replacement for the old centralized SAM system used to monitor the WLCG grid infrastructure. It enables regional entities, such as a ROC or NGI, to deploy and maintain their own regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems to carry out tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring ([https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM]) approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios itself, the main components of WLCG Nagios are:<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology information from various information providers such as GOCDB, the CIC Portal, BDII and the different VO feeds. It is the single authoritative source of information for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : The configuration tool that generates the Nagios configuration from the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology appears in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so that any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
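<br />
The publish/subscribe idea can be illustrated with a toy, in-memory stand-in for the message bus. This is only a sketch: it does not talk to ActiveMQ, and the class name, topic name, host name and message fields are all hypothetical.<br />

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBus:
    """Toy in-memory topic bus; an illustration only, not ActiveMQ."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a handler that is called for every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver a message to all subscribers and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()

# A consumer such as a dashboard simply collects whatever arrives on "alarms".
dashboard_alarms: List[dict] = []
bus.subscribe("alarms", dashboard_alarms.append)

# A test result is published once; every subscriber of the topic sees it.
delivered = bus.publish(
    "alarms",
    {"host": "ce.example.org", "metric": "org.sam.CREAMCE-JobState", "status": "CRITICAL"},
)
```

The publisher does not need to know who consumes the result, which is why new tools can be added just by subscribing to the relevant topic or queue.<br />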
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through a WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, as well as for anyone registered as a site admin or regional staff in GOCDB. <br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to others on request. Please send an email to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAM/SAM+Architecture.html here]. Explaining every test is out of scope for this wiki, so this is a brief overview of the CE and SE tests, since these are the main components of a grid site. <br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through a WMS, and the result is sent to the message bus directly from the WN. Nagios receives this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly comprises the following steps:<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the versions of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI this is storage-monit.physics.ox.ac.uk, but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script launched from Nagios tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are:<br />
#Get the full SRM endpoint and storage area from BDII<br />
#Copy a test file to the SRM and list it<br />
#Get the transport URL of the file, download it to tmp space, and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so this reschedules all tests for your SE. You cannot reschedule the other SE tests individually as they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests; do not try to reschedule any passive test, because doing so throws an error that then persists. An extra complication for the CE test is that you can only reschedule org.sam.CREAMCE-JobState if its status is either OK or failure. You cannot reschedule it if the status is waiting, pending or running.<br />
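<br />
The CE rescheduling rule can be written down as a small check. This is a sketch only: the status names are taken from the description above and are an assumption about how the portal labels them, and the function name is hypothetical.<br />

```python
# Statuses in which org.sam.CREAMCE-JobState may be rescheduled, per the text above.
RESCHEDULABLE_STATUSES = {"ok", "failure"}

# Statuses in which a reschedule must not be attempted, per the text above.
BLOCKED_STATUSES = {"waiting", "pending", "running"}


def can_reschedule_ce_test(status: str) -> bool:
    """Return True only when the wrapper test has finished (OK or failure)."""
    return status.strip().lower() in RESCHEDULABLE_STATUSES
```

Any status outside the two finished states, including ones not listed here, is treated as not reschedulable, which errs on the side of leaving a job in flight alone.<br />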
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of:<br />
* Regional GridMap<br />
* Service View : Shows all services with their current status<br />
* Metric Status View : Shows the status of all tests per service and flavour<br />
* History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it doesn't send alarms to the Dashboard.<br />
<br />
<br />
== Maintenance and Troubleshooting ==<br />
<br />
=== Switch Active Nagios between Oxford and Lancaster: ===<br />
<br />
1. Uncomment #NCG_BACKUP_INSTANCE=true in site-info.def on the active Nagios and run yaim<br />
<br />
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS<br />
<br />
<br />
This will turn the active Nagios into the backup Nagios.<br />
<br />
<br />
2. Now comment out NCG_BACKUP_INSTANCE in site-info.def on the backup Nagios and run yaim in the same way. It will then become the active one.<br />
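<br />
The two edits to site-info.def can be sketched as a small shell helper. This is only an illustration, not part of the deployed setup: the function name set_backup_mode is hypothetical, it assumes the variable sits at the start of a line as NCG_BACKUP_INSTANCE=true (commented or not), and it relies on GNU sed for in-place editing. yaim still has to be run afterwards on each host as shown above.<br />

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SAM/yaim): switch a site-info.def
# between backup and active mode by (un)commenting NCG_BACKUP_INSTANCE.
# Assumes GNU sed (-i) and that the variable starts at the beginning of a line.
set_backup_mode() {
    file=$1   # path to site-info.def
    mode=$2   # "backup" or "active"
    case "$mode" in
        backup)
            # Uncomment the variable: the instance becomes the backup Nagios.
            sed -i 's/^#[[:space:]]*\(NCG_BACKUP_INSTANCE=\)/\1/' "$file"
            ;;
        active)
            # Comment the variable out: the instance becomes the active Nagios.
            sed -i 's/^\(NCG_BACKUP_INSTANCE=\)/#\1/' "$file"
            ;;
        *)
            echo "usage: set_backup_mode FILE backup|active" >&2
            return 1
            ;;
    esac
}
```

For example, set_backup_mode /etc/yaim/site-info.def backup on the currently active host, followed by the yaim command, and then set_backup_mode /etc/yaim/site-info.def active plus yaim on the other host.<br />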
<br />
=== Changing WMS for SAM Nagios: ===<br />
<br />
VO_OPS_WMS_HOSTS in site-info.def lists the WMS hosts to be used for submitting jobs to service nodes. To remove a misbehaving WMS from the list, there are two options.<br />
<br />
<br />
1. Change VO_OPS_WMS_HOSTS in site-info.def and run yaim as above.<br />
<br />
<br />
2. Edit /etc/glite-wms/ops/glite_wms.conf and glite_wmsui.conf directly and then restart nagios<br />
<br />
/etc/init.d/nagios restart<br />
<br />
<br />
=== Default Top Bdii: ===<br />
<br />
The emi.cream.CREAMCE-DirectJobState test uses ldap://sam-bdii.cern.ch:2170 as its default top BDII. I have overridden this configuration at the Oxford SAM instance. The override is managed by Puppet, which changes the /etc/ncg/ncg-localdb.d/creamcedjs.conf file<br />
<br />
MODIFY_PARAMETRIC_PARAMETER!emi.cream.CREAMCE-DirectJobState!--ldap-uri!lcgbdii.gridpp.rl.ac.uk<br />
<br />
So if the RAL top BDII is going into an extended downtime, change this file. The Lancaster SAM Nagios instance uses the default value.<br />
<br />
The same applies to the org.sam.SRM-All-/ops/Role=lcgadmin test. Its top BDII has also been overridden and is managed by Puppet; changing it requires editing /etc/ncg/ncg-localdb.d/srm.conf <br />
<br />
MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!lcgbdii.gridpp.rl.ac.uk<br />
<br />
There is no mechanism for failover to a different top BDII.<br />
<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to subscribe you to email alerts.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases this error is thrown when the job completed but <br />
   the Simple-MTA process could not be launched. The main things to check are<br />
    Check that openldap-clients is installed on the WN<br />
    Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to see that the BDII defined on the WN is publishing <br />
    information about the message broker<br />
    Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
    Check that the firewall allows outgoing connections to port TCP:6163<br />
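<br />
To repeat the message broker query against every BDII in BDII_LIST, a small helper along the following lines may be useful. It is a sketch under assumptions: the function name bdii_check_cmds is hypothetical, it uses the ldapsearch -H URI form instead of the -h/-p options quoted above, and the GlueServiceEndpoint attribute is requested only to shorten the output. The function just prints one command per BDII so the commands can be reviewed before being run.<br />

```shell
#!/bin/sh
# Hypothetical helper: print one ldapsearch command per entry of a
# comma-separated host:port list such as BDII_LIST.
# The body runs in a subshell so the IFS change does not leak to the caller.
bdii_check_cmds() (
    IFS=,
    for entry in $1; do
        # -x: simple bind, -LLL: terse LDIF output, -H: LDAP URI of the BDII
        printf 'ldapsearch -x -LLL -H ldap://%s -b o=grid "(GlueServiceType=msg.broker.stomp)" GlueServiceEndpoint\n' "$entry"
    done
)
```

On a WN this could be used as bdii_check_cmds "$BDII_LIST" to list the commands, or bdii_check_cmds "$BDII_LIST" | sh to run them; a BDII that returns no entries is not publishing any message broker.<br />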
<br />
Q. org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; they may be one of the reasons.<br />
<br />
Q. How to recalculate availability and reliability of sites in case of a problem?<br />
A. https://wiki.egi.eu/wiki/PROC10 <br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI Myegi Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Entry point for all SAM related information<br />
https://wiki.egi.eu/wiki/SAM<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use Dashboard and Nagios web interface : second part of presentation is good for Nagios<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2015-07-09|accuratedate=2015-07-09|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2015-07-09T14:39:11Z<p>Kashif Mohammad 6ae08fa8ff: </p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
''' My EGI portal :''' https://mon.egi.eu/myegi/<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios is the replacement for the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or NGI, to deploy and maintain a regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems to carry out tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard also uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring [https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM] approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios, the main components of WLCG Nagios are<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology-related information from various information providers such as GOCDB, the CIC Portal, BDII and different VO feeds. It is the single authoritative information source for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : The configuration tool that generates the Nagios configuration based on the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology is reflected in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through a WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to persons registered as site admin or regional staff in GOCDB. <br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other persons on request. Please send a mail to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAM/SAM+Architecture.html here]. Explaining all tests is out of the scope of this wiki, so I am giving a brief overview of the CE and SE tests, as these are the main components of a grid site. <br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through a WMS, and the result is sent to the message bus directly from the WN. Nagios receives this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly comprises the following steps<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the version of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI, it is storage-monit.physics.ox.ac.uk but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script is launched from Nagios which tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are<br />
#Get the full SRM endpoint and storage area from the BDII<br />
#Copy a test file to the SRM and list it<br />
#Get the transport URL of the file, download it to tmp space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so rescheduling it reschedules all tests for your SE. You cannot reschedule the other SE tests individually because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests. Do not try to reschedule any passive test, because doing so throws an error that then persists. An extra complication with the CE test is that you can only reschedule org.sam.CREAMCE-JobState when its status is either OK or failure; you cannot reschedule it while the status is waiting, pending or running.<br />
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of<br />
* Regional GridMap<br />
* Service View : Shows all services with their current status<br />
* Metric Status View : Shows the status of all tests per service and flavour<br />
* History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it does not send alarms to the Dashboard.<br />
<br />
<br />
== Maintenance and Troubleshooting ==<br />
<br />
=== Switch Active Nagios between Oxford and Lancaster: ===<br />
<br />
1. Uncomment #NCG_BACKUP_INSTANCE=true in site-info.def on the active Nagios and run yaim<br />
<br />
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS<br />
<br />
<br />
This will turn the active Nagios into the backup Nagios.<br />
<br />
<br />
2. Now comment out NCG_BACKUP_INSTANCE in site-info.def on the backup Nagios and run yaim in the same way. It will then become the active one.<br />
<br />
=== Changing WMS for SAM Nagios: ===<br />
<br />
VO_OPS_WMS_HOSTS in site-info.def lists the WMS hosts to be used for submitting jobs to service nodes. To remove a misbehaving WMS from the list, there are two options.<br />
<br />
<br />
1. Change VO_OPS_WMS_HOSTS in site-info.def and run yaim as above.<br />
<br />
<br />
2. Edit /etc/glite-wms/ops/glite_wms.conf and glite_wmsui.conf directly and then restart nagios<br />
<br />
/etc/init.d/nagios restart<br />
<br />
<br />
=== Default Top Bdii: ===<br />
<br />
The emi.cream.CREAMCE-DirectJobState test uses ldap://sam-bdii.cern.ch:2170 as its default top BDII. I have overridden this configuration at the Oxford SAM instance. The override is managed by Puppet, which changes the /etc/ncg/ncg-localdb.d/creamcedjs.conf file<br />
<br />
MODIFY_PARAMETRIC_PARAMETER!emi.cream.CREAMCE-DirectJobState!--ldap-uri!lcgbdii.gridpp.rl.ac.uk<br />
<br />
So if the RAL top BDII is going into an extended downtime, change this file. The Lancaster SAM Nagios instance uses the default value.<br />
<br />
The same applies to the org.sam.SRM-All-/ops/Role=lcgadmin test. Its top BDII has also been overridden and is managed by Puppet; changing it requires editing /etc/ncg/ncg-localdb.d/srm.conf <br />
<br />
MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!lcgbdii.gridpp.rl.ac.uk<br />
<br />
There is no mechanism for failover to a different top BDII.<br />
<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to subscribe you to email alerts.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases this error is thrown when the job completed but <br />
   the Simple-MTA process could not be launched. The main things to check are<br />
    Check that openldap-clients is installed on the WN<br />
    Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to see that the BDII defined on the WN is publishing <br />
    information about the message broker<br />
    Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
    Check that the firewall allows outgoing connections to port TCP:6163<br />
<br />
Q. org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; they may be one of the reasons.<br />
<br />
Q. How to recalculate availability and reliability of sites in case of a problem?<br />
A. https://wiki.egi.eu/wiki/PROC10 <br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI Myegi Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Entry point for all SAM related information<br />
https://wiki.egi.eu/wiki/SAM<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use Dashboard and Nagios web interface : second part of presentation is good for Nagios<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2015-03-27|accuratedate=2015-03-27|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2015-07-09T14:38:12Z<p>Kashif Mohammad 6ae08fa8ff: </p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
''' My EGI portal :''' https://mon.egi.eu/myegi/<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios is the replacement for the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or NGI, to deploy and maintain a regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems to carry out tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard also uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring [https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM] approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios, the main components of WLCG Nagios are<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology-related information from various information providers such as GOCDB, the CIC Portal, BDII and different VO feeds. It is the single authoritative information source for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : The configuration tool that generates the Nagios configuration based on the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology is reflected in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through a WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to persons registered as site admin or regional staff in GOCDB. <br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other persons on request. Please send a mail to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAM/SAM+Architecture.html here]. Explaining all tests is out of the scope of this wiki, so I am giving a brief overview of the CE and SE tests, as these are the main components of a grid site. <br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through a WMS, and the result is sent to the message bus directly from the WN. Nagios receives this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly comprises the following steps<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the version of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI, it is storage-monit.physics.ox.ac.uk but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script is launched from Nagios which tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are<br />
#Get the full SRM endpoint and storage area from the BDII<br />
#Copy a test file to the SRM and list it<br />
#Get the transport URL of the file, download it to tmp space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so rescheduling it reschedules all tests for your SE. You cannot reschedule the other SE tests individually because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests. Do not try to reschedule any passive test, because doing so throws an error that then persists. An extra complication with the CE test is that you can only reschedule org.sam.CREAMCE-JobState when its status is either OK or failure; you cannot reschedule it while the status is waiting, pending or running.<br />
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of<br />
* Regional GridMap<br />
* Service View : Shows all services with their current status<br />
* Metric Status View : Shows the status of all tests per service and flavour<br />
* History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it does not send alarms to the Dashboard.<br />
<br />
<br />
== Maintenance and Troubleshooting ==<br />
<br />
=== Switch Active Nagios between Oxford and Lancaster: ===<br />
<br />
1. Uncomment #NCG_BACKUP_INSTANCE=true in site-info.def on the active Nagios and run yaim<br />
<br />
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS<br />
<br />
<br />
This will turn the active Nagios into the backup Nagios.<br />
<br />
<br />
2. Now comment out NCG_BACKUP_INSTANCE in site-info.def on the backup Nagios and run yaim in the same way. It will then become the active one.<br />
<br />
=== Changing WMS for SAM Nagios ===<br />
<br />
VO_OPS_WMS_HOSTS in site-info.def lists the WMS hosts to be used for submitting jobs to service nodes. To remove a misbehaving WMS from the list, there are two options.<br />
<br />
<br />
1. Change VO_OPS_WMS_HOSTS in site-info.def and run yaim as above.<br />
<br />
<br />
2. Edit /etc/glite-wms/ops/glite_wms.conf and glite_wmsui.conf directly and then restart nagios<br />
<br />
/etc/init.d/nagios restart<br />
<br />
<br />
=== Default Top Bdii ===<br />
<br />
The emi.cream.CREAMCE-DirectJobState test uses ldap://sam-bdii.cern.ch:2170 as its default top BDII. I have overridden this configuration at the Oxford SAM instance. The override is managed by Puppet, which changes the /etc/ncg/ncg-localdb.d/creamcedjs.conf file<br />
<br />
MODIFY_PARAMETRIC_PARAMETER!emi.cream.CREAMCE-DirectJobState!--ldap-uri!lcgbdii.gridpp.rl.ac.uk<br />
<br />
So if the RAL top BDII is going into an extended downtime, change this file. The Lancaster SAM Nagios instance uses the default value.<br />
<br />
The same applies to the org.sam.SRM-All-/ops/Role=lcgadmin test. Its top BDII has also been overridden and is managed by Puppet; changing it requires editing /etc/ncg/ncg-localdb.d/srm.conf <br />
<br />
MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!lcgbdii.gridpp.rl.ac.uk<br />
<br />
There is no mechanism for failover to a different top BDII.<br />
<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to subscribe you to email alerts.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases this error is thrown when the job completed but <br />
   the Simple-MTA process could not be launched. The main things to check are<br />
    Check that openldap-clients is installed on the WN<br />
    Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to see that the BDII defined on the WN is publishing <br />
    information about the message broker<br />
    Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
    Check that the firewall allows outgoing connections to port TCP:6163<br />
<br />
Q. org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; they may be one of the reasons.<br />
<br />
Q. How to recalculate availability and reliability of sites in case of a problem?<br />
A. https://wiki.egi.eu/wiki/PROC10 <br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI Myegi Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Entry point for all SAM related information<br />
https://wiki.egi.eu/wiki/SAM<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use Dashboard and Nagios web interface : second part of presentation is good for Nagios<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2015-03-27|accuratedate=2015-03-27|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2015-07-09T14:08:52Z<p>Kashif Mohammad 6ae08fa8ff: </p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
''' My EGI portal :''' https://mon.egi.eu/myegi/<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios is the replacement for the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or NGI, to deploy and maintain a regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems to carry out tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard also uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring [https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM] approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios, the main components of WLCG Nagios are<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology-related information from various information providers such as GOCDB, the CIC Portal, BDII and different VO feeds. It is the single authoritative information source for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : The configuration tool that generates the Nagios configuration based on the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology is reflected in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through a WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to persons registered as site admin or regional staff in GOCDB. <br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other persons on request. Please send a mail to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAM/SAM+Architecture.html here]. Explaining all tests is out of the scope of this wiki, so I am giving a brief overview of the CE and SE tests, as these are the main components of a grid site. <br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through a WMS, and the result is sent to the message bus directly from the WN. Nagios receives this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly comprises the following steps<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the version of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI, it is storage-monit.physics.ox.ac.uk but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script is launched from Nagios which tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are<br />
#Get the full SRM endpoint and storage area from the BDII<br />
#Copy a test file to the SRM and list it<br />
#Get the transport URL of the file, download it to tmp space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so rescheduling it reschedules all tests for your SE. You cannot reschedule the other SE tests individually because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests. Do not try to reschedule any passive test, because doing so throws an error that then persists. An extra complication with the CE test is that you can only reschedule org.sam.CREAMCE-JobState when its status is either OK or failure; you cannot reschedule it while the status is waiting, pending or running.<br />
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of:<br />
* Regional GridMap<br />
*Service View : Shows all services with their current status<br />
*Metric Status View : Shows the status of all tests per service and flavour.<br />
*History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it does not send alarms to the Dashboard.<br />
<br />
<br />
== Maintenance and Troubleshooting ==<br />
<br />
=== Switch Active Nagios between Oxford and Lancaster: ===<br />
<br />
1. Uncomment #NCG_BACKUP_INSTANCE=true in site-info.def on the active Nagios and run yaim:<br />
<br />
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n NAGIOS -n SAM_NAGIOS<br />
<br />
<br />
This will turn the active Nagios into the backup Nagios.<br />
<br />
<br />
2. Now comment out NCG_BACKUP_INSTANCE in site-info.def on the backup Nagios and run yaim in the same way. It will become the active one.<br />
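<br />
The edit to site-info.def in both steps is the same line being commented or uncommented. A minimal sketch of that toggle is shown below; the function name is hypothetical, it only rewrites text passed to it, and yaim still has to be run afterwards as described above.<br />

```python
import re

# Matches an NCG_BACKUP_INSTANCE assignment, with or without a leading '#'.
_BACKUP_LINE = re.compile(r"^\s*#?\s*(NCG_BACKUP_INSTANCE=.*)$")


def set_backup_mode(site_info_text: str, backup: bool) -> str:
    """Return site-info.def text with NCG_BACKUP_INSTANCE active (backup=True)
    or commented out (backup=False). Other lines are left untouched."""
    out = []
    for line in site_info_text.splitlines():
        match = _BACKUP_LINE.match(line)
        if match:
            # Uncomment to make this host the backup, comment to make it active.
            line = match.group(1) if backup else "#" + match.group(1)
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, calling set_backup_mode(text, True) on the active host's file content and set_backup_mode(text, False) on the backup host's corresponds to steps 1 and 2.<br />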
<br />
=== Changing WMS for SAM Nagios ===<br />
<br />
VO_OPS_WMS_HOSTS in site-info.def lists the WMS hosts to be used for submitting jobs to service nodes. To remove a misbehaving WMS from the list, there are two options.<br />
<br />
<br />
1. Change VO_OPS_WMS_HOSTS in site-info.def and run yaim as above.<br />
<br />
<br />
2. Edit /etc/glite-wms/ops/glite_wms.conf and glite_wmsui.conf directly and then restart Nagios:<br />
<br />
/etc/init.d/nagios restart<br />
<br />
<br />
<br />
<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to subscribe you to email alerts.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases this error is thrown when the job completed but the Simple-MTA process could not be launched. The main things to check are:<br />
* openldap-clients is not installed on the WN<br />
* Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to check that the BDII defined on the WN is publishing information about the message broker<br />
* Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170<br />
* Check that the firewall allows outgoing connections on port TCP:6163<br />
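<br />
The BDII_LIST and firewall checks above can be scripted from a WN. The following is a rough sketch under stated assumptions: the function names are hypothetical, BDII_LIST is assumed to be a comma-separated list of host:port entries as shown above, and the message broker host has to be supplied by you (it is not given on this page).<br />

```python
import socket


def parse_bdii_list(value: str):
    """Split a BDII_LIST value into (host, port) pairs.

    Assumes every entry has the form host:port.
    """
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if an outgoing TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as can_connect(broker_host, 6163), with broker_host taken from the ldapsearch output, then tells you whether the outgoing port is blocked; the same function can be applied to each pair returned by parse_bdii_list.<br />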
<br />
Q. org.sam.WN-RepCr test failing with "send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; they may be one of the reasons.<br />
<br />
Q. How do I get the availability and reliability of a site recalculated in case of a problem?<br />
A. See the following pages:<br />
* https://wiki.egi.eu/wiki/PROC10<br />
* https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI MyEGI page<br />
 https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
 https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios tests for one site<br />
 https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios tests for one service, e.g. glexec<br />
 https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Entry point for all SAM-related information<br />
 https://wiki.egi.eu/wiki/SAM<br />
<br />
<br />
Operational Dashboard workflow<br />
 https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard<br />
<br />
<br />
How to use the Dashboard and the Nagios web interface (the second part of the presentation is useful for Nagios)<br />
 https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
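<br />
The per-site and per-service status links above follow the same pattern, so the link for another site or service group can be built the same way. This is a small sketch; the function name is hypothetical and it assumes other hostgroups are named "site-" followed by the site name, as in the QMUL example.<br />

```python
from urllib.parse import urlencode

BASE = "https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi"


def status_url(group_type: str, name: str) -> str:
    """Build a detail-view status link for a hostgroup or a servicegroup."""
    if group_type not in ("hostgroup", "servicegroup"):
        raise ValueError("group_type must be 'hostgroup' or 'servicegroup'")
    return BASE + "?" + urlencode({group_type: name, "style": "detail"})


if __name__ == "__main__":
    print(status_url("hostgroup", "site-UKI-LT2-QMUL"))
    print(status_url("servicegroup", "org.sam.glexec.CE"))
```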
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2015-03-27|accuratedate=2015-03-27|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/GitHub_RepositoriesGitHub Repositories2015-06-04T08:53:04Z<p>Kashif Mohammad 6ae08fa8ff: </p>
<hr />
<div>A list of useful GitHub repositories. Please feel free to add a new category or repository.<br />
== Catch-all repositories ==<br />
https://github.com/gridpp<br />
<br />
https://github.com/stfc<br />
<br />
== Puppet Modules ==<br />
https://github.com/cernops<br />
<br />
https://github.com/HEP-puppet<br />
<br />
https://github.com/oxford-physics<br />
<br />
== Monitoring ==<br />
<br />
https://github.com/alahiff/ral-htcondor-nagios-plugins<br />
<br />
<br />
== Logstash ==</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2015-03-27T10:30:35Z<p>Kashif Mohammad 6ae08fa8ff: /* Useful Links */</p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
''' My EGI portal :''' https://mon.egi.eu/myegi/<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios is the replacement for the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or an NGI, to deploy and maintain a regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems to carry out tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard also uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring [https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM] approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios itself, the main components of WLCG Nagios are:<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology-related information from various information providers such as GOCDB, the CIC Portal, BDII and different VO feeds. It is the single authoritative information source for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : The configuration tool that generates the Nagios configuration based on the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology is reflected in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
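<br />
The publish/subscribe idea can be illustrated with a toy in-memory bus. This is only a sketch of the pattern: the class and destination names are made up, it is not the ActiveMQ API, and it delivers every message to all subscribers (topic-style), whereas a real queue hands each message to a single consumer.<br />

```python
from collections import defaultdict


class MessageBus:
    """Toy in-memory stand-in for a message bus (not ActiveMQ)."""

    def __init__(self):
        # destination name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, destination, callback):
        """Register a callback to be called for each message on destination."""
        self._subscribers[destination].append(callback)

    def publish(self, destination, message):
        """Deliver a message to every subscriber of destination."""
        for callback in self._subscribers.get(destination, ()):
            callback(message)


if __name__ == "__main__":
    bus = MessageBus()
    # A dashboard-like consumer that only cares about the alarms destination.
    bus.subscribe("alarms", lambda msg: print("dashboard got:", msg))
    bus.publish("alarms", {"service": "example-ce", "status": "CRITICAL"})
```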
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through a WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to people registered as site admin or regional staff in GOCDB.<br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other people on request. Please send an email to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services, and they are explained in detail [https://tomtools.cern.ch/confluence/display/SAM/SAM+Architecture.html here]. Explaining all the tests is outside the scope of this wiki, so I give only a brief overview of the CE and SE tests, since these are the main components of a grid site.<br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through a WMS, and the result is sent to the message bus directly from the WN. Nagios subscribes to this result on the message bus and publishes it as a passive result in the Nagios portal. The CE test consists mainly of the following steps:<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the versions of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it with the original<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI this is storage-monit.physics.ox.ac.uk, but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script launched from Nagios tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are:<br />
#Get the full SRM endpoint and storage area from the BDII<br />
#Copy a test file to the SRM and list it<br />
#Get the transport URL of the file, download it to temporary space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is quite straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so it reschedules all tests for your SE. You cannot reschedule the other SE tests individually because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests, and you should not try to reschedule any passive test because it throws an error that then persists. An extra complication for the CE test is that you can only reschedule org.sam.CREAMCE-JobState if the status of this test is either OK or failure. You cannot reschedule it if the status is waiting, pending or running.<br />
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of:<br />
* Regional GridMap<br />
*Service View : Shows all services with their current status<br />
*Metric Status View : Shows the status of all tests per service and flavour.<br />
*History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it does not send alarms to the Dashboard.<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to subscribe you to email alerts.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases this error is thrown when the job completed but the Simple-MTA process could not be launched. The main things to check are:<br />
* openldap-clients is not installed on the WN<br />
* Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to check that the BDII defined on the WN is publishing information about the message broker<br />
* Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170<br />
* Check that the firewall allows outgoing connections on port TCP:6163<br />
<br />
Q. org.sam.WN-RepCr test failing with "send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; they may be one of the reasons.<br />
<br />
Q. How do I get the availability and reliability of a site recalculated in case of a problem?<br />
A. See the following pages:<br />
* https://wiki.egi.eu/wiki/PROC10<br />
* https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI MyEGI page<br />
 https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
 https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios tests for one site<br />
 https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios tests for one service, e.g. glexec<br />
 https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Entry point for all SAM-related information<br />
 https://wiki.egi.eu/wiki/SAM<br />
<br />
<br />
Operational Dashboard workflow<br />
 https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard<br />
<br />
<br />
How to use the Dashboard and the Nagios web interface (the second part of the presentation is useful for Nagios)<br />
 https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2015-03-27|accuratedate=2015-03-27|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2015-03-27T10:30:00Z<p>Kashif Mohammad 6ae08fa8ff: /* Useful Links */</p>
<hr />
<div>{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-10-03|accuratedate=2015-03-27|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2015-03-27T10:28:37Z<p>Kashif Mohammad 6ae08fa8ff: /* Nagios Status */</p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
''' My EGI portal :''' https://mon.egi.eu/myegi/<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios is replacement of old centralized SAM system to monitor WLCG grid infrastructure. It enabled regional entities, such as ROC or NGI to deploy and maintain regional monitoring infrastructure. Data produced by regional nagios is also used by project level system to carry out task such as Service Level Agreement(SLA) calculations etc. Regional Dashboard also use the same data to show problems at site and subsequently tickets are created based on those alarms.<br />
WLCG-Nagios was developed to integrate nagios for WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring [https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM] approach. Now it is maintained under EGI and current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios, main component of WLCG nagios are<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of ROC/NGI WLCG Nagios package and it collects and aggregate topology related information from various information provider like GOCDB, CIC Portal, BDII and different VO feeds. It is the single authoritative information source for current wlcg grid topology.<br />
<br />
'''Nagios Configuration Generator(NCG)''' : It is the configuration tool which generates nagios configuration based on current topology provided by ATP. A cron job runs NCG every six hour so any changes in topology is included into nagios portal with in six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ based messaging infrastructure is used to publish all test results. The idea is that every test result should be published to a Topic or Queue on message bus so any tool can subscribe to that topic or queue and get latest result. Like Regional Dashboard subscribes to alarms queue and get all results directly from message bus<br />
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford is hosting and maintaining Regional Nagios([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for UK.<br />
WLCG nagios is running on a Dell 610 machine with 16GB of RAM. Jobs are submitted through WMS using a proxy generated by robot certificate. We are not using dedicated WMS any more and Nagios submission system picks up a random WMS from a configured list. A dedicated SE(storage-monit.physics.ox.ac.uk) hosted at Oxford is used for se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to WLCG nagios portal is enabled for all members of ops and dteam VO apart from persons registered as site admin or regional staff in GOCDB. <br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule test for all sites in NGI.<br />
<br />
Access can also be provided to other persons on request. Please send a mail to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and it is explained in detail [https://tomtools.cern.ch/confluence/display/SAM/SAM+Architecture.html here] . Explaining all tests are out of scope of this wiki so I am giving a brief overview of CE and SE which is the main component of a grid site. <br />
<br />
===CE Test=== <br />
Nagios submits a job to CE through WMS and the result is sent to message bus directly from WN. Nagios subscribe this result from message bus and publish it as passive result in Nagios portal. CE test comprise of mainly fallowing steps<br />
# Check env variable like, LCG_GFAL_INFOSYS<br />
# Check the version of lcg-CA and glite middleware installed at WN<br />
# Copy and register a file to close SE and then download it to WN and compare it<br />
# Replicate the same file to a central SE define in regional nagios.For UKI, it is storage-monit.physics.ox.ac.uk but it can be any SE.<br />
# Delete all tests file from SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script is launched from Nagios which test different metrics and publish result as passive result in Nagios Portal. Main metrics are<br />
#Get full SRM endpoint and storage area from BDII<br />
#Copy and list a test file to SRM<br />
#Get transport URL of file and download it to tmp space and the delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Scheduling SE test is quite straight forward. click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE and then click on "Re-schedule the next check of this service" in service command option and then "commit". Point to note here is that org.sam.SRM-All is a wrapper metric and it will reschedule all test for your SE. You can not reschedule other test in SE as they are passive test.<br />
<br />
===Rescheduling CE test===<br />
<br />
For CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" as same as above. Here also org.sam.CREAMCE-jobState is wrapper test for all other CE tests and you should not try to reschedule any passive test because it throws an error and it persist there. There is an extra complication in CE test that you can only reschedule org.sam.CREAMCE-JobState if status of this test is either OK or failure. You can not reschedule if status is waiting, pending or running.<br />
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of WLCG Nagios package. It is the replacement for SAMDB Portal. It consists of<br />
* Regional GridMap<br />
*Service View : Shows all services with current status<br />
*Metric Status View : Show the status of all tests per service and flavour.<br />
*History View : Show a graph of the current status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
Backup Nagios functions exactly same way as main regional Nagios. The only difference is that it doesn't send alarms to Dashboard.<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service, how much time nagios will take to update configuration ?<br />
A. Nagios reconfigure itself every six hour, So nagios will update with in 6 hour of information being publish in top bdii. No manual intervention is required<br />
<br />
Q. How to subscribe for nagios alerts ?<br />
A. There is no facility where user can subscribe alerts for himself. You have to ask Regional Nagios admin for subscription of email alerts.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. Nagios script launches a Simple-MTA process to send result to message bus. In most of the cases if job completed but <br />
Simple-MTA process could not be launch then it throw this error. Main reasons are<br />
openldap-clients is not installed on WN<br />
ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to see that BDII defined in WN is publishing <br />
information about message broker<br />
Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
Check firewall outgoing for TCP:6163 port<br />
<br />
Q.org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check crl at WN, it may be one of the reason<br />
<br />
Q How to recalculate availability and reliability of sites in case of problem<br />
A https://wiki.egi.eu/wiki/PROC10 <br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI Myegi Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service i.e glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Entry point for all SAM related information<br />
https://wiki.egi.eu/wiki/SAM<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use the Dashboard and Nagios web interface (the second part of the presentation covers Nagios)<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-10-03|accuratedate=2014-10-03|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Operations_Team_Action_itemsOperations Team Action items2015-01-27T10:57:18Z<p>Kashif Mohammad 6ae08fa8ff: /* Count list for minute taking */</p>
<hr />
<div>=== Count list for minute taking === <br />
<br />
Works left -> right. EVO meeting = 1; F2F meeting = 5<br />
<br />
IN: Duncan=9 Sam=10 Stephen=10 Chris=9 Kashif=11 Wahid=10 Ewan=10 Pete=10 Jeremy=10 Daniela=10 Andrew=10 Matt=11 David=11 Alessandra=11 Brian=11<br />
OUT: Mark=6<br />
Minutes can be found at: [http://indico.cern.ch/categoryDisplay.py?categId=338 UKI-ROC agenda]<br />
<br />
=== Action list ===<br />
<br />
{|border="1" cellpadding="1"<br />
|+<br />
<br />
|-style="background:#7C8AAF;color:white"<br />
!Action ID<br />
!Action description<br />
!Owner<br />
!Target date<br />
!Status<br />
!Date closed<br />
!Notes<br />
<br />
|-<br />
|O-120410-01<br />
| Some Action<br />
| Some person<br />
| 25-012-2012<br />
| Some status<br />
| 25-012-2024<br />
| Template<br />
<br />
<br />
<br />
<br />
|-<br />
|O-140506-02<br />
|Puppet config to make cvmfs changes once for all relevant VOs <br />
|Kashif<br />
|<br />
|Open ?<br />
|<br />
|Currently, when updating cvmfs information, it has to be done per VO rather than doing it once which then propagates as appropriate. CAN SOMEONE CHECK IF THIS HAS ACTUALLY BEEN DONE?<br />
<br />
<br />
|-<br />
| O-140812-01<br />
| RIPE probe deployment<br />
| Jeremy (?)<br />
| 2014-08-20<br />
| Open<br />
|<br />
| Discuss deployment of RIPE probes with sites at GridPP 33 (Consider changing to "chase the sites which still need to install RIPE probes that they were given at GridPP33").<br />
|<br />
<br />
<br />
|-<br />
| O-141216-01<br />
| Discuss regional VO cvmfs semantics, put forward a recommendation.<br />
| Ewan, Tom, Others.<br />
| 2014-12-16<br />
| Open<br />
|<br />
| With the rollout of regional VO cvmfs areas, Ewan pointed out that a one-cvmfs-repo-per-VO model might not be the best solution. We could either use one cvmfs area for multiple VOs or do the opposite: a cvmfs area for each VO's subgroup.<br />
|<br />
<br />
|-<br />
| O-141216-02<br />
| <br />
| Tom, Jeremy, Lancaster lads<br />
| 2014-12-16<br />
| Open<br />
|<br />
| Engage with new northgrid VO members from UCLAN.<br />
|<br />
<br />
|<br />
|}<br />
<br />
See also: [[Operations Team Completed Actions]] and [[Operations Issues ]].<br />
<br />
See also the archived: [[Deployment Team Completed Actions]] and [[Deployment Issues ]].<br />
<br />
[[Category:GridPP Operations]]</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2014-10-03T14:58:39Z<p>Kashif Mohammad 6ae08fa8ff: /* Useful Links */</p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios replaces the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or NGI, to deploy and maintain their own regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems for tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring ([https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM]) approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios itself, the main components of WLCG Nagios are:<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology information from various information providers such as GOCDB, the CIC Portal, BDII and different VO feeds. It is the single authoritative information source for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : The configuration tool that generates the Nagios configuration from the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology appears in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
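<br />
The publish and subscribe idea can be illustrated with a toy in-memory bus. This is only a sketch of the pattern and not the ActiveMQ or STOMP API; the class name, destination names and message fields are all hypothetical.<br />

```python
from collections import defaultdict


class MessageBus:
    """Toy in-memory stand-in for a message broker (not ActiveMQ or STOMP)."""

    def __init__(self):
        # Maps a destination name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, destination, callback):
        """Register a callback that receives every message sent to destination."""
        self._subscribers[destination].append(callback)

    def publish(self, destination, message):
        """Deliver message to all current subscribers of destination."""
        for callback in self._subscribers.get(destination, []):
            callback(message)


if __name__ == "__main__":
    dashboard_inbox = []
    bus = MessageBus()
    # A dashboard-like consumer only listens to the hypothetical "alarms" destination.
    bus.subscribe("alarms", dashboard_inbox.append)
    bus.publish("alarms", {"metric": "org.sam.SRM-All", "status": "CRITICAL"})
    bus.publish("results", {"metric": "org.sam.SRM-All", "status": "OK"})
    print(dashboard_inbox)
```

The publisher does not need to know who consumes a result, which is why further tools can be attached to the same results without changing the tests.<br />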
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to people registered as site admin or regional staff in GOCDB. <br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other people on request. Please send an email to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAM/SAM+Architecture.html here]. Explaining every test is out of scope for this wiki, so this section gives a brief overview of the CE and SE tests, since these services are the main components of a grid site. <br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through WMS, and the result is sent to the message bus directly from the WN. Nagios consumes this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly consists of the following steps:<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the versions of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI it is storage-monit.physics.ox.ac.uk, but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
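<br />
A sequence of checks like the one above can be sketched as a small runner that stops at the first failing step and reports a result using the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is an illustrative sketch only; the real probes are not necessarily structured this way, and the step names and stub checks are hypothetical.<br />

```python
import os

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_steps(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Each callable returns True on success. Returns (status, message).
    """
    for name, step in steps:
        try:
            passed = step()
        except Exception as exc:
            # An unexpected error means the state could not be determined.
            return UNKNOWN, f"{name}: unexpected error: {exc}"
        if not passed:
            return CRITICAL, f"{name}: failed"
    return OK, "all steps passed"


if __name__ == "__main__":
    steps = [
        # Only the first step is a real check; the second is a stub.
        ("env variables", lambda: "LCG_GFAL_INFOSYS" in os.environ),
        ("middleware version", lambda: True),
    ]
    status, message = run_steps(steps)
    print(status, message)
```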
<br />
===SE test===<br />
<br />
A wrapper script launched from Nagios tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are:<br />
#Get the full SRM endpoint and storage area from BDII<br />
#Copy a test file to the SRM and list it<br />
#Get the transport URL of the file, download it to tmp space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling an SE test is straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so this reschedules all tests for your SE. You cannot reschedule the other SE tests individually because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is a wrapper test for all other CE tests, and you should not try to reschedule any passive test because it throws an error that then persists. An extra complication for the CE test is that you can only reschedule org.sam.CREAMCE-JobState if its status is either OK or failure. You cannot reschedule it while the status is waiting, pending or running.<br />
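<br />
The CE rescheduling rule can be written down as a one-line check. This is a sketch under the assumption that the status is available as a plain string; the exact status labels shown by the portal may differ, and the function name is hypothetical.<br />

```python
# States in which org.sam.CREAMCE-JobState may be rescheduled, per the rule above.
ALLOWED_STATES = {"ok", "failure"}


def can_reschedule(status):
    """Return True if the CE wrapper test may be rescheduled in this state."""
    return status.strip().lower() in ALLOWED_STATES


if __name__ == "__main__":
    for state in ("OK", "failure", "waiting", "pending", "running"):
        print(state, can_reschedule(state))
```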
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of:<br />
* Regional GridMap<br />
*Service View : Shows all services with their current status<br />
*Metric Status View : Shows the status of all tests per service and flavour.<br />
*History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it doesn't send alarms to the Dashboard.<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service; how long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so the change appears within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. Users cannot subscribe themselves to alerts. Ask the Regional Nagios admin to set up email alerts for you.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases this error means that the job completed but the Simple-MTA process could not be launched. The main causes and checks are:<br />
 openldap-clients is not installed on the WN<br />
 Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to check that the BDII defined on the WN is publishing information about the message broker<br />
 Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
 Check that the firewall allows outgoing connections on TCP port 6163<br />
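<br />
The firewall check can be approximated from the WN by attempting an outgoing TCP connection to the broker port. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the broker hostname has to be taken from the ldapsearch output because it is not given here.<br />

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example (the hostname is a placeholder, not a real broker):
#   can_connect("broker.example.org", 6163)
```

A result of False only shows that the connection could not be made from this host; it does not distinguish a local firewall rule from a broker that is down.<br />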
<br />
Q. org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; they may be one of the causes.<br />
<br />
Q. How do I recalculate the availability and reliability of a site after a problem?<br />
A. See https://wiki.egi.eu/wiki/PROC10 <br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI MyEGI Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Entry point for all SAM related information<br />
https://wiki.egi.eu/wiki/SAM<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use the Dashboard and Nagios web interface (the second part of the presentation covers Nagios)<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
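<br />
The per-site and per-service status links in this section follow one pattern, so they can be generated for any site or service group. The sketch below assumes that other groups use the same query parameters as the two links listed, and that site host groups are always named with a "site-" prefix as in the QMUL example; the function names are hypothetical.<br />

```python
from urllib.parse import urlencode

BASE = "https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi"


def status_url(group_type, name, style="detail"):
    """Build a status.cgi link for a host group or a service group."""
    if group_type not in ("hostgroup", "servicegroup"):
        raise ValueError("group_type must be 'hostgroup' or 'servicegroup'")
    return f"{BASE}?{urlencode({group_type: name, 'style': style})}"


def site_status_url(site):
    """Link for all tests of one site, assuming the 'site-' host group prefix."""
    return status_url("hostgroup", f"site-{site}")


if __name__ == "__main__":
    print(site_status_url("UKI-LT2-QMUL"))
    print(status_url("servicegroup", "org.sam.glexec.CE"))
```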
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-10-03|accuratedate=2014-10-03|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2014-10-03T14:57:41Z<p>Kashif Mohammad 6ae08fa8ff: /* Useful Links */</p>
<hr />
<div>
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-01-14|accuratedate=2014-01-14|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2014-10-03T14:55:13Z<p>Kashif Mohammad 6ae08fa8ff: /* Understanding WLCG Nagios Tests */</p>
<hr />
<div>
== Useful Links ==<br />
UKI MyEGI Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Generic WLCG nagios Documentation<br />
https://tomtools.cern.ch/confluence/display/SAMDOC/SAM+Documentation<br />
<br />
Critical tests which trigger alarms in Dashboard<br />
https://wiki.egi.eu/wiki/Operations_SAM_tests<br />
<br />
WLCG Nagios Probe description<br />
https://tomtools.cern.ch/confluence/display/SAM/Probes<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use the Dashboard and Nagios web interface (the second part of the presentation covers Nagios)<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-01-14|accuratedate=2014-01-14|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2014-10-03T14:53:41Z<p>Kashif Mohammad 6ae08fa8ff: /* Introduction of WLCG Nagios */</p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios replaces the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or NGI, to deploy and maintain their own regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems for tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring ([https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM]) approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://wiki.egi.eu/wiki/SAM here]. Apart from Nagios itself, the main components of WLCG Nagios are:<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of ROC/NGI WLCG Nagios package and it collects and aggregate topology related information from various information provider like GOCDB, CIC Portal, BDII and different VO feeds. It is the single authoritative information source for current wlcg grid topology.<br />
<br />
'''Nagios Configuration Generator(NCG)''' : It is the configuration tool which generates nagios configuration based on current topology provided by ATP. A cron job runs NCG every six hour so any changes in topology is included into nagios portal with in six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ based messaging infrastructure is used to publish all test results. The idea is that every test result should be published to a Topic or Queue on message bus so any tool can subscribe to that topic or queue and get latest result. Like Regional Dashboard subscribes to alarms queue and get all results directly from message bus<br />
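<br />
The publish/subscribe idea behind the message bus can be sketched with a toy in-memory model. This is only an illustration and not the ActiveMQ client API: the class <code>MessageBus</code>, the topic name "alarms" and the message fields are all hypothetical.<br />

```python
from collections import defaultdict


class MessageBus:
    """Toy in-memory stand-in for a topic-based message bus (hypothetical)."""

    def __init__(self):
        # Maps a topic name to the list of subscriber callbacks.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback that receives every message sent to topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic; return how many got it."""
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = MessageBus()
dashboard_inbox = []  # stands in for the Regional Dashboard
archive_inbox = []    # stands in for any other interested tool
bus.subscribe("alarms", dashboard_inbox.append)
bus.subscribe("alarms", archive_inbox.append)

# One published result reaches every subscriber of the topic.
delivered = bus.publish("alarms", {"metric": "org.sam.SRM-All", "status": "CRITICAL"})
```

The publisher does not need to know who consumes the results, which is why new tools can be added without changing the tests themselves.<br />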
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to people registered as site admin or regional staff in GOCDB.<br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other people on request. Please send an email to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAMDOC/Grid+probes here]. Explaining all tests is out of scope for this wiki, so this section gives a brief overview of the CE and SE tests, since the CE and SE are the main components of a grid site.<br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through WMS, and the result is sent to the message bus directly from the WN. Nagios receives this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly comprises the following steps:<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the versions of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI, it is storage-monit.physics.ox.ac.uk, but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
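<br />
The copy, download, compare and clean-up steps above can be sketched as follows. This is a minimal illustration under stated assumptions: local directories stand in for the SE and the WN, and the helper names <code>sha256_of</code> and <code>round_trip_matches</code> are hypothetical. The real probe uses grid data-management tools rather than local file copies.<br />

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def round_trip_matches(source, storage_dir, download_dir):
    """Copy source to 'storage', fetch it back, compare, then clean up.

    storage_dir and download_dir are local directories standing in for
    the close SE and the WN scratch area (an assumption for illustration).
    """
    stored = Path(shutil.copy(source, storage_dir))        # "copy to SE"
    downloaded = Path(shutil.copy(stored, download_dir))   # "download to WN"
    try:
        # The test passes only if the downloaded copy is byte-identical.
        return sha256_of(source) == sha256_of(downloaded)
    finally:
        # Mirror the last step of the test: remove all test files.
        stored.unlink()
        downloaded.unlink()
```

Comparing digests rather than file sizes catches silent corruption during either transfer.<br />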
<br />
===SE test===<br />
<br />
A wrapper script launched from Nagios tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are:<br />
#Get the full SRM endpoint and storage area from BDII<br />
#Copy a test file to SRM and list it<br />
#Get the transport URL of the file, download it to tmp space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is quite straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so it reschedules all tests for your SE. You cannot reschedule the other SE tests because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests, and you should not try to reschedule any passive test because that throws an error which then persists. There is an extra complication with the CE test: you can only reschedule org.sam.CREAMCE-JobState if the status of this test is either OK or failure. You cannot reschedule it if the status is waiting, pending or running.<br />
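<br />
The status rule for rescheduling the CE wrapper test can be written down as a small check. This is only a sketch of the rule as described above: the function name <code>can_reschedule</code> and the lower-case status strings are assumptions, not part of the Nagios interface.<br />

```python
# Statuses in which the wrapper test may be rescheduled, per the rule above.
RESCHEDULABLE = {"ok", "failure"}
# Statuses in which a job is still in flight, so rescheduling is not allowed.
BLOCKED = {"waiting", "pending", "running"}


def can_reschedule(status):
    """Return True if the CE wrapper test may be rescheduled in this status."""
    normalized = status.strip().lower()
    if normalized in RESCHEDULABLE:
        return True
    if normalized in BLOCKED:
        return False
    # Refuse to guess for statuses the rule does not mention.
    raise ValueError(f"unknown status: {status!r}")
```

Raising on unknown statuses keeps the sketch from silently allowing a reschedule that the rule does not cover.<br />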
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of:<br />
* Regional GridMap<br />
*Service View : Shows all services with their current status<br />
*Metric Status View : Shows the status of all tests per service and flavour.<br />
*History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it doesn't send alarms to the Dashboard.<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to set up email alerts for you.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases, this error is thrown when the job completed but <br />
the Simple-MTA process could not be launched. The main things to check are:<br />
Check that openldap-clients is installed on the WN<br />
Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to see that the BDII defined on the WN is publishing <br />
information about the message broker<br />
Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
Check that the firewall allows outgoing connections on TCP port 6163<br />
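<br />
As a rough aid for the checks above, the following sketch splits a BDII_LIST value into host/port pairs and builds the ldapsearch command quoted in the answer for each one. The helper names <code>parse_bdii_list</code> and <code>broker_query_command</code> are hypothetical. The sketch only builds the command line and does not run it, so it needs no network access.<br />

```python
def parse_bdii_list(value):
    """Split a BDII_LIST value such as 'host1:2170,host2:2170' into pairs."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


def broker_query_command(host, port):
    """Build the ldapsearch argument list for the message-broker query."""
    return [
        "ldapsearch", "-b", "o=grid",
        "-h", host, "-p", str(port), "-x",
        "(GlueServiceType=msg.broker.stomp)",
    ]


bdii_list = "lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170"
commands = [broker_query_command(host, port) for host, port in parse_bdii_list(bdii_list)]
```

Each entry in <code>commands</code> could be passed to <code>subprocess.run</code> on a WN where openldap-clients is installed.<br />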
<br />
Q. org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; this may be one of the causes.<br />
<br />
Q. How can the availability and reliability of sites be recalculated in case of a problem?<br />
A. https://wiki.egi.eu/wiki/PROC10<br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI Myegi Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Generic WLCG nagios Documentation<br />
https://tomtools.cern.ch/confluence/display/SAMDOC/SAM+Documentation<br />
<br />
Critical tests which trigger alarms in Dashboard<br />
https://wiki.egi.eu/wiki/Operations_SAM_tests<br />
<br />
WLCG Nagios Probe description<br />
https://tomtools.cern.ch/confluence/display/SAM/Probes<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use Dashboard and Nagios web interface : second part of presentation is good for Nagios<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-01-14|accuratedate=2014-01-14|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2014-10-03T14:52:23Z<p>Kashif Mohammad 6ae08fa8ff: /* UKI Regional Nagios Monitoring Infrastructure */</p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios replaces the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or an NGI, to deploy and maintain their own regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems for tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG-Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring ([https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM]) approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://tomtools.cern.ch/confluence/display/SAM/Home here]. Apart from Nagios itself, the main components of WLCG Nagios are:<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology information from various information providers such as GOCDB, the CIC Portal, BDII and the different VO feeds. It is the single authoritative information source for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : NCG is the configuration tool that generates the Nagios configuration from the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology appears in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a configured list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to people registered as site admin or regional staff in GOCDB.<br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other people on request. Please send an email to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAMDOC/Grid+probes here]. Explaining all tests is out of scope for this wiki, so this section gives a brief overview of the CE and SE tests, since the CE and SE are the main components of a grid site.<br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through WMS, and the result is sent to the message bus directly from the WN. Nagios receives this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly comprises the following steps:<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the versions of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI, it is storage-monit.physics.ox.ac.uk, but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script launched from Nagios tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are:<br />
#Get the full SRM endpoint and storage area from BDII<br />
#Copy a test file to SRM and list it<br />
#Get the transport URL of the file, download it to tmp space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is quite straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so it reschedules all tests for your SE. You cannot reschedule the other SE tests because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests, and you should not try to reschedule any passive test because that throws an error which then persists. There is an extra complication with the CE test: you can only reschedule org.sam.CREAMCE-JobState if the status of this test is either OK or failure. You cannot reschedule it if the status is waiting, pending or running.<br />
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of:<br />
* Regional GridMap<br />
*Service View : Shows all services with their current status<br />
*Metric Status View : Shows the status of all tests per service and flavour.<br />
*History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it doesn't send alarms to the Dashboard.<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to set up email alerts for you.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases, this error is thrown when the job completed but <br />
the Simple-MTA process could not be launched. The main things to check are:<br />
Check that openldap-clients is installed on the WN<br />
Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to see that the BDII defined on the WN is publishing <br />
information about the message broker<br />
Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
Check that the firewall allows outgoing connections on TCP port 6163<br />
<br />
Q. org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; this may be one of the causes.<br />
<br />
Q. How can the availability and reliability of sites be recalculated in case of a problem?<br />
A. https://wiki.egi.eu/wiki/PROC10<br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI Myegi Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Generic WLCG nagios Documentation<br />
https://tomtools.cern.ch/confluence/display/SAMDOC/SAM+Documentation<br />
<br />
Critical tests which trigger alarms in Dashboard<br />
https://wiki.egi.eu/wiki/Operations_SAM_tests<br />
<br />
WLCG Nagios Probe description<br />
https://tomtools.cern.ch/confluence/display/SAM/Probes<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use Dashboard and Nagios web interface : second part of presentation is good for Nagios<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-01-14|accuratedate=2014-01-14|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/UKI_WLCG_Regional_NagiosUKI WLCG Regional Nagios2014-10-03T14:51:24Z<p>Kashif Mohammad 6ae08fa8ff: /* UKI Regional Nagios Monitoring Infrastructure */</p>
<hr />
<div>== Nagios Status ==<br />
'''Current Active Nagios :''' https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
''' Standby :''' https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
== Introduction of WLCG Nagios ==<br />
<br />
WLCG Nagios replaces the old centralized SAM system for monitoring the WLCG grid infrastructure. It enables regional entities, such as a ROC or an NGI, to deploy and maintain their own regional monitoring infrastructure. Data produced by the regional Nagios is also used by project-level systems for tasks such as Service Level Agreement (SLA) calculations. The Regional Dashboard uses the same data to show problems at sites, and tickets are subsequently created based on those alarms.<br />
WLCG-Nagios was developed to integrate Nagios into WLCG monitoring as part of the EGEE-SA1 Multi Level Monitoring ([https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview MLM]) approach. It is now maintained under EGI, and the current home page of WLCG Nagios is [https://tomtools.cern.ch/confluence/display/SAM/Home here]. Apart from Nagios itself, the main components of WLCG Nagios are:<br />
<br />
<br />
'''Aggregated Topology Provider (ATP)''' : ATP is installed as part of the ROC/NGI WLCG Nagios package. It collects and aggregates topology information from various information providers such as GOCDB, the CIC Portal, BDII and the different VO feeds. It is the single authoritative information source for the current WLCG grid topology.<br />
<br />
'''Nagios Configuration Generator (NCG)''' : NCG is the configuration tool that generates the Nagios configuration from the current topology provided by ATP. A cron job runs NCG every six hours, so any change in topology appears in the Nagios portal within six hours.<br />
<br />
'''Messaging Infrastructure''' : An ActiveMQ-based messaging infrastructure is used to publish all test results. The idea is that every test result is published to a topic or queue on the message bus, so any tool can subscribe to that topic or queue and get the latest results. For example, the Regional Dashboard subscribes to the alarms queue and gets all results directly from the message bus.<br />
<br />
== UKI Regional Nagios Monitoring Infrastructure ==<br />
<br />
Oxford hosts and maintains the Regional Nagios ([https://gridppnagios.physics.ox.ac.uk/nagios Gridppnagios]) for the UK.<br />
WLCG Nagios runs on a Dell 610 machine with 16GB of RAM. Jobs are submitted through WMS using a proxy generated from a robot certificate. We no longer use a dedicated WMS; the Nagios submission system picks a random WMS from a list. A dedicated SE (storage-monit.physics.ox.ac.uk) hosted at Oxford is used for the se-replication test of ARC-CE.<br />
<br />
==Access to WLCG Nagios Portal ==<br />
<br />
Access to the WLCG Nagios portal is enabled for all members of the ops and dteam VOs, in addition to people registered as site admin or regional staff in GOCDB.<br />
Site admins are also authorized to schedule tests for their respective sites. Regional staff members can schedule tests for all sites in the NGI.<br />
<br />
Access can also be provided to other people on request. Please send an email to lcg_manager@physics.ox.ac.uk.<br />
<br />
== Understanding WLCG Nagios Tests ==<br />
<br />
Nagios plugins are available for almost all grid services and are explained in detail [https://tomtools.cern.ch/confluence/display/SAMDOC/Grid+probes here]. Explaining all tests is out of scope for this wiki, so this section gives a brief overview of the CE and SE tests, since the CE and SE are the main components of a grid site.<br />
<br />
===CE Test=== <br />
Nagios submits a job to the CE through WMS, and the result is sent to the message bus directly from the WN. Nagios receives this result from the message bus and publishes it as a passive result in the Nagios portal. The CE test mainly comprises the following steps:<br />
# Check environment variables such as LCG_GFAL_INFOSYS<br />
# Check the versions of lcg-CA and the gLite middleware installed on the WN<br />
# Copy and register a file to the close SE, then download it to the WN and compare it<br />
# Replicate the same file to a central SE defined in the regional Nagios. For UKI, it is storage-monit.physics.ox.ac.uk, but it can be any SE.<br />
# Delete all test files from the SE(s).<br />
<br />
===SE test===<br />
<br />
A wrapper script launched from Nagios tests different metrics and publishes the results as passive results in the Nagios portal. The main metrics are:<br />
#Get the full SRM endpoint and storage area from BDII<br />
#Copy a test file to SRM and list it<br />
#Get the transport URL of the file, download it to tmp space and then delete all test files.<br />
<br />
==Rescheduling Tests==<br />
<br />
Site admins registered in GOCDB can reschedule tests for their respective sites for troubleshooting. <br />
===Rescheduling SE test===<br />
<br />
Rescheduling the SE test is quite straightforward. Click on "org.sam.SRM-All-/ops/Role=lcgadmin" for your SE, then click on "Re-schedule the next check of this service" in the service command options, and then "commit". Note that org.sam.SRM-All is a wrapper metric, so it reschedules all tests for your SE. You cannot reschedule the other SE tests because they are passive tests.<br />
<br />
===Rescheduling CE test===<br />
<br />
For a CE, reschedule "org.sam.CREAMCE-JobState-/ops/Role=lcgadmin" in the same way as above. Here too, org.sam.CREAMCE-JobState is the wrapper test for all other CE tests, and you should not try to reschedule any passive test because that throws an error which then persists. There is an extra complication with the CE test: you can only reschedule org.sam.CREAMCE-JobState if the status of this test is either OK or failure. You cannot reschedule it if the status is waiting, pending or running.<br />
==MyEGI Portal==<br />
<br />
[https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] is the visualization tool of the WLCG Nagios package. It replaces the SAMDB Portal. It consists of:<br />
* Regional GridMap<br />
*Service View : Shows all services with their current status<br />
*Metric Status View : Shows the status of all tests per service and flavour.<br />
*History View : Shows a graph of the status of a service over time<br />
<br />
==Backup Regional Nagios at Lancaster==<br />
<br />
https://gridppnagios.lancs.ac.uk/nagios<br />
<br />
The backup Nagios functions in exactly the same way as the main regional Nagios. The only difference is that it doesn't send alarms to the Dashboard.<br />
<br />
==FAQ and Error Messages== <br />
Q. I have added or removed a service. How long will Nagios take to update its configuration?<br />
A. Nagios reconfigures itself every six hours, so it will update within six hours of the information being published in the top BDII. No manual intervention is required.<br />
<br />
Q. How do I subscribe to Nagios alerts?<br />
A. There is no facility for users to subscribe to alerts themselves. You have to ask the Regional Nagios admin to set up email alerts for you.<br />
<br />
Q. Nagios job failing with Exit Code!=0<br />
A. The Nagios script launches a Simple-MTA process to send the result to the message bus. In most cases, this error is thrown when the job completed but <br />
the Simple-MTA process could not be launched. The main things to check are:<br />
Check that openldap-clients is installed on the WN<br />
Run ldapsearch -b o=grid -h "TOP-BDII" -p 2170 -x "(GlueServiceType=msg.broker.stomp)" to see that the BDII defined on the WN is publishing <br />
information about the message broker<br />
Check BDII_LIST=lcgbdii.gridpp.rl.ac.uk:2170,lcg-bdii.gridpp.ac.uk:2170,lcg-bdii.cern.ch:2170 <br />
Check that the firewall allows outgoing connections on TCP port 6163<br />
<br />
Q. org.sam.WN-RepCr test failing with " send2nsd: NS002 - send error : Bad credentials cannot create"<br />
A. Check the CRLs on the WN; this may be one of the causes.<br />
<br />
Q. How can the availability and reliability of sites be recalculated in case of a problem?<br />
A. https://wiki.egi.eu/wiki/PROC10<br />
https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy<br />
<br />
== Useful Links ==<br />
UKI Myegi Page<br />
https://gridppnagios.physics.ox.ac.uk/myegi<br />
<br />
UKI WLCG Nagios<br />
https://gridppnagios.physics.ox.ac.uk/nagios<br />
<br />
Nagios Test for one site <br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-LT2-QMUL&style=detail<br />
<br />
Nagios Test for one service, e.g. glexec<br />
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=org.sam.glexec.CE&style=detail<br />
<br />
Generic WLCG nagios Documentation<br />
https://tomtools.cern.ch/confluence/display/SAMDOC/SAM+Documentation<br />
<br />
Critical tests which trigger alarms in Dashboard<br />
https://wiki.egi.eu/wiki/Operations_SAM_tests<br />
<br />
WLCG Nagios Probe description<br />
https://tomtools.cern.ch/confluence/display/SAM/Probes<br />
<br />
<br />
Operational Dashboard workflow<br />
https://forge.in2p3.fr/projects/opsportaluser/wiki/Operations_Dashboard <br />
<br />
<br />
How to use Dashboard and Nagios web interface : second part of presentation is good for Nagios<br />
https://documents.egi.eu/public/RetrieveFile?docid=301&version=6&filename=Training_guide_general_v1.pdf<br />
<br />
<br />
{{KeyDocs|responsible=Kashif Mohammad|reviewdate=2014-01-14|accuratedate=2014-01-14|percentage=100}}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Operations_Team_Action_itemsOperations Team Action items2014-10-01T11:04:24Z<p>Kashif Mohammad 6ae08fa8ff: /* This is a Wiki area to track GridPP operations team actions ===== Count list for minute taking = */</p>
<hr />
<div>== This is a Wiki area to track GridPP operations team actions ==<br />
=== Count list for minute taking ===<br />
<br />
Works left -> right. EVO meeting = 1: F2F meeting = 5<br />
<br />
IN: Pete=8 David=9 Jeremy=9 Daniela=10 David=9 Alessandra=9 Ewan=9 Duncan=9 Sam=9 Stephen=9 Andrew=9 Chris=9 Kashif=10 Wahid=10 Matt=10 Brian=11<br />
OUT: Mark=6<br />
Minutes can be found at: [http://indico.cern.ch/categoryDisplay.py?categId=338 UKI-ROC agenda]<br />
<br />
=== Action list ===<br />
<br />
{|border="1" cellpadding="1"<br />
|+<br />
<br />
|-style="background:#7C8AAF;color:white"<br />
!Action ID<br />
!Action description<br />
!Owner<br />
!Target date<br />
!Status<br />
!Date closed<br />
!Notes<br />
<br />
|-<br />
|O-120410-01<br />
| Some Action<br />
| Some person<br />
| 25-012-2012<br />
| Some status<br />
| 25-012-2024<br />
| Template<br />
<br />
<br />
<br />
<br />
|-<br />
|O-140506-02<br />
|Puppet config to make cvmfs changes once for all relevant VOs <br />
|Kashif<br />
|<br />
|Open ?<br />
|<br />
|Currently, when updating cvmfs information, it has to be done per VO rather than doing it once which then propagates as appropriate. <br />
<br />
<br />
|-<br />
| O-140812-01<br />
| RIPE probe deployment<br />
| Jeremy (?)<br />
| 2014-08-20<br />
| Open<br />
|<br />
| Discuss deployment of RIPE probes with sites at GridPP 33<br />
|<br />
<br />
|-<br />
|O-140812-03<br />
| Prepare to talk to JANET at GridPP33<br />
| All<br />
| 2014-08-21<br />
|Open<br />
|<br />
| Sites should have questions, queries and suggestions ready to make most of the meeting with JANET reps. Perhaps local network admins would like a question asked? The setup of a new IPv6 SIG was suggested as one.<br />
| <br />
<br />
|<br />
|}<br />
<br />
See also: [[Operations Team Completed Actions]] and [[Operations Issues ]].<br />
<br />
See also the archived: [[Deployment Team Completed Actions]] and [[Deployment Issues ]].<br />
<br />
[[Category:GridPP Operations]]</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2014-09-16T09:27:32Z<p>Kashif Mohammad 6ae08fa8ff: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 15th September 2014<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** ----->'''<br />
'''Monday 15th September'''<br />
* Steve has set up an [http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_v6.html IPv6 network test].<br />
* Duncan asked whether the gfal replacement is not on WNs by default.<br />
* The official [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20140910 GDB summary notes of the 10th September meeting] are now available.<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/20140910PreGDB Notes] from the [http://indico.cern.ch/event/272791/ pre-GDB on Clouds] are also available.<br />
* Do we still have sites suffering from ARGUS instabilities? CERN noticed ongoing problems ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=105666 GGUS 105666]).<br />
* A reminder of this [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOperationsWeb top-level WLCG page].<br />
* All VOMS update tickets closed. Tests passing. Thank you!<br />
* [https://indico.cern.ch/event/340943/ 3rd interim Foundation Board (iFB) of the HEP Software community meeting] this Wednesday, 17th September at 15.00 Geneva time. Plans to identify people to lead the activities.<br />
<br />
<br />
<br />
'''Monday 8th September'''<br />
* Be ready for the new CERN and ops VOMS. Compare the prod and preprod instances for:<br />
** ALICE: [http://cern.ch/go/6fbH preprod] and [http://cern.ch/go/Cb7F prod]. Birmingham both.<br />
** ATLAS: [http://cern.ch/go/R9vf preprod] and [http://cern.ch/go/z8q9 prod]. UCL pre only. QMUL both.<br />
** CMS: [http://cern.ch/go/GQ6h preprod] and [http://cern.ch/go/l9x6 prod]. QMUL both. RHUL both. UCL??. ECDF intermittent.<br />
** LHCb: [http://cern.ch/go/Zj9z preprod] and [http://cern.ch/go/6qX7 prod]. Fine.<br />
* An EMI3 WN tarball update has been done by Matt (see also [https://ggus.eu/index.php?mode=ticket_info&ticket_id=107869 GGUS 107869]).<br />
* There is an LHCONE/LHCOPN meeting next week on 16th and 17th ([https://indico.cern.ch/event/318811/ agenda]). It would be good to have some remote participation.<br />
* Website redesign - please complete [https://docs.google.com/forms/d/1REl4Utss1RZB7yxX0jAJ7aYDIhdsOXEPedGVZU0J27c/viewform?usp=send_form this survey].<br />
* For multicore - a reminder for sites running multicore and CREAM that there is an option in APEL to account multicore/multicpu. By default it is off.<br />
* There is a [http://indico.cern.ch/event/272791/ pre-GDB this afternoon on Clouds].<br />
* There is a [http://indico.cern.ch/event/272777/ GDB this week]. Any input?<br />
* Storage placement - survey TBC.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 16th September'''<br />
* The next core ops meeting is on 18th September.<br />
* The next multi-core meeting is today at 14:30 CERN time. It is on dynamic partitioning with LSF at CNAF.<br />
<br />
'''Monday 8th September'''<br />
* There will be a multi-core meeting on Tuesday 9th at 14:30 (CERN time). It will cover reviews of the UGE setup for multicore jobs at CCIN2P3 and of the method for passing job requirement arguments to batch systems via the CE. ([https://indico.cern.ch/event/339461/ Agenda])<br />
* A review of last week's ops meeting ([https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140904 minutes]) follows:<br />
* No operations news<br />
* The [http://linuxsoft.cern.ch/wlcg/ WLCG repository] will become signed soon.<br />
* Baselines: No new EMI/UMD releases since last meeting.<br />
* MW issues: Missing key usage extension in delegated proxy. Fix for CREAM UI in October. Impacts ATLAS-Rucio integration.<br />
* T1: FTS2 decommissioning done at 3 sites and in progress at another 3. NDGF-T1 is testing FAX using native xrootd and nfs4-mount from dCache.<br />
* OSG following up on how to discover HTCondor CEs in the information system.<br />
* Oracle: GoldenGate migration fine for IN2P3.<br />
* T0: AFS UI still used. lxplus5 target close of 14th Sept 2014. ARGUS - the unresponsive CAs problem is believed to have been seen again.<br />
* T2: NTR<br />
* ALICE: Low activity. Job efficiencies issue still open.<br />
* ATLAS: Rucio test and normal DQ2 production activity are producing a slightly higher load on the storage of the sites.<br />
* CMS: Reminders - Target for CVMFS 2.1.19; update xrootd fallback configuration; add "Phedex Node Name" to site configuration.<br />
* LHCb: Mainly simulation work. SHA2 certificate testing started.<br />
* Network & transfer metrics: [https://indico.cern.ch/event/336520/ Meeting Monday 8th Sept]. [https://indico.cern.ch/event/336520/material/slides/0.pdf Slides]. Pythia Network Diagnosis Infrastructure funded by NSF - perfSONAR-PS data to identify and localize network problems using the Pythia algorithms.<br />
* Tracking tools: NTR<br />
* FTS3 deployment TF: Done - FTS3 now in production. New releases every 3-4 months. There are lists for feature requests and also support. Some improvements to FTS dashboard.<br />
* glexec TF: NTR.<br />
* Machine/job features: New lead for condor part Marian Zvada.<br />
* MW readiness: T0 pre-prod to install package reporter. The latest CREAM-CE and BDII updates have been installed at LNL-T2. Next [https://indico.cern.ch/event/332224/ meeting 1st October].<br />
* "MW software for the verification activity" aggregates the package reporter results per software component, is used to tag good/bad versions, and publishes the results in a dashboard.<br />
* Multicore: ATLAS 11 T1s and 35 T2s; CMS at T1s and some US T2s. Decided the TF would take on board the standardization of the BLAH scripts (and other CE scripts if needed) for the scheduling parameters.<br />
* SHA-2: Compliance being tested. [https://operations-portal.egi.eu/broadcast/archive/id/1190 Broadcast sent]. Deadline 15th September. Switch SAM. Then expt. job and data systems. 88->55 tickets. <br />
* WMS decommissioning: Condor-based SAM probes due 1st October.<br />
* IPv6: NTR.<br />
* Squid mon & HTTP proxy discovery: Working on [http://wlcg-squid-monitor.cern.ch/snmpstats/all.html automated MRTG monitor]. Working on documentation. <br />
<br />
<br />
'''Tuesday 2nd September'''<br />
* The next WLCG ops coordination meeting is this [https://indico.cern.ch/event/326087/ Thursday 4th September].<br />
* There will be a Tier-1/2 feedback section in the agenda IF there is feedback/input. Do we have any items to raise?<br />
<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 8th September'''<br />
* There was a brief network interruption yesterday (Tuesday 8th Sep) to the Tier1 network at around 5pm local time. This lasted a few minutes and the cause is being investigated.<br />
* We are planning to stop access for all VOs apart from ALICE to our CREAM CEs. The proposed date is 23rd September.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
'''Wedn 10 Sept.'''<br />
* High load at L'pool causing low throughput - how to throttle xroot transfers (and is the load necessary or a bug?)<br />
* Still testing WebFTS<br />
* Prep for DPM workshop<br />
<br />
'''Monday 1st September'''<br />
* FAX sites to update the C++ N2N rpms.<br />
* There is interest regarding issues/performance when placing storage outside firewalls. JC will shortly start a (closed) discussion/survey.<br />
<br />
'''Monday 11th August'''<br />
* Pool nodes at RHUL have received test errors.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 9th September'''<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<br />
'''Tuesday 2nd September'''<br />
* Please check [http://wlcg-rebus.cern.ch/apps/capacities/sites/ REBUS figures] for your site.<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: Okay.<br />
<br />
'''Tuesday 26th August'''<br />
* Sheffield has stopped publishing.<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
'''Tuesday 9th September'''<br />
* Looking a bit better. Will review in more detail at the core ops meeting (next Thursday 18th at 11:30am unless there is a clash).<br />
<br />
'''Tuesday 2nd September'''<br />
* This work needs a kick-start! Reminders should now be being received.<br />
* Tom/Andrew in discussion about options for the main site - the main candidates are WordPress and Drupal.<br />
<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 9th September'''<br />
<br />
* Meeting minutes from [https://indico.egi.eu/indico/materialDisplay.py?materialId=minutes&confId=2310 yesterday].<br />
<br />
** Mostly a short meeting to give updates on product updates over the summer.<br />
** Please read the agenda/minutes for the full set, but to pull out a couple of things:<br />
<br />
** Note that as per http://dmc.web.cern.ch, gfal and lcg-util are in end-of-life mode and support will end for both on 1st November. <br />
<br />
** FTS3, SQUID and CVMFS will soon be included in UMD; early adopters are requested.<br />
<br />
* Next meeting planned for October 6th.<br />
<br />
'''Monday 8th September'''<br />
* There is an [https://wiki.egi.eu/wiki/Agenda-08-09-2014 EGI ops meeting today].<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
''' Tuesday 16 September '''<br />
<br />
* SAM Nagios probes refactoring TF meeting<br />
<br />
We had the first SAM Nagios probe refactoring TF meeting on 12 September. Some of the identified issues are listed on the TF wiki:<br />
<br />
https://wiki.egi.eu/wiki/SAM_Nagios_probes_refactoring_TF <br />
<br />
We need the operations team's opinion on some of the issues:<br />
<br />
** Removing LFC tests: everyone at the meeting agreed that they should be removed<br />
** Use of WMS for SAM test <br />
** Testing SRM client tools from WN <br />
<br />
<br />
''' Tuesday 2nd September '''<br />
<br />
* Monitoring consolidation meeting last Friday<br />
<br />
** Validation of SAM2/3 results: https://twiki.cern.ch/twiki/bin/view/LCG/ValidationStatus<br />
** 4 UK sites (ECDF, Brunel, Durham, Oxford) had slight discrepancies. This looks to be either because more metrics are now being represented (blue = unknown, site not penalised) or because an older service is still in the VO feed (AGIS).<br />
** The next step is to compare the August availabilities from SAM2 and SAM3 for each site<br />
** If sites see any discrepancies between http://dashb-atlas-sum.cern.ch/dashboard/request.py/historicalsmryview-sum and http://wlcg-sam-atlas.cern.ch/dashboard/request.py/historicalsmry for their site, please let me know<br />
<br />
* Squid monitoring TF meeting last Thursday<br />
<br />
** Cosmin presented the ALICE CVMFS proposal to the revived TF<br />
** Notes from meeting: https://twiki.cern.ch/twiki/bin/view/LCG/SquidMonitoringTF20140828MeetingNotes<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 2nd September'''<br />
* Sussex is back in business - kept closing their low availability alarm wrt the GGUS ticket.<br />
* The UCL ticket is now finally receiving some attention.<br />
* Ongoing problems at RAL.<br />
<br />
'''Tuesday 26th August'''<br />
* RAL : Nagios jobs staying in queue for long time - to be investigated.<br />
* Sussex : Matt needs help probably from some SGE experts.<br />
* UCL : No acknowledgement from the site (ticket escalated to second level).<br />
* 100IT : There is an alarm from EGI federated cloud - this needs discussion.<br />
* Durham : Availability alarms - require constant closing with some comments. Ticket with devs is open.<br />
<br />
'''Tuesday 12th August'''<br />
* Last week was quiet.<br />
* Still one or two responses needed for the next rota allocations.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 26th August'''<br />
* EMI3 WN tarball update needed soon ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=107869 GGUS 107869])<br />
<br />
'''Monday 28th July'''<br />
* [https://operations-portal.egi.eu/broadcast/archive/id/1180 UMD v.3.8.0] was released on 24th July.<br />
<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2) and the page listing the deployed versions. The information is extracted from the BDII, so they should all be reasonably up to date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<br />
* FAX update<br />
<br />
'''Monday 8th September'''<br />
* There was a security team meeting last Wednesday.<br />
* There was a CA TAG meeting also last Wednesday.<br />
<br />
'''Monday 11th August'''<br />
* Topics as mentioned during the last GridPP technical meeting.<br />
<br />
* There is an issue at the moment in the evaluation of vulnerabilities causing everything rated 'High' by Pakiti to display as 'Critical' in the Dashboard.<br />
<br />
<br />
* The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 16th September'''<br />
* Another reminder of the [http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_v6.html IPv6 network tests].<br />
* The [https://indico.cern.ch/event/318811/ LHCONE/LHCOPN meeting] started yesterday and continues today.<br />
* RIPE A at Glasgow now live (but tagged ...). Hope to see others soon.<br />
<br />
<br />
'''Tuesday 9th September'''<br />
* RIPE probes now hosted: Cambridge, Sheffield, Liverpool, Lancaster (& Oxford and QMUL). Glasgow connected but no data.<br />
* RIPE probes not yet hosted: 6 sites.<br />
<br />
'''Tuesday 2nd September'''<br />
* Only a few of the RIPE probes went live last week - any issues at the other sites to be discussed?<br />
* JANET is going to deploy a perfSONAR instance on one of the exchange points in London. They hope it will help raise awareness of issues with local systems affecting their transfer performance.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 8th September 2014, 15.00 BST'''<br /><br />
25 Open UK tickets this week.<br />
<br />
'''NO SITE IN PARTICULAR'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108182 108182](3/9)<br /><br />
As seen on TB-SUPPORT, the NGI has a ticket telling it to get sites to have the new voms servers configured for the switch over. Jeremy has kindly offered to field the ticket. I think we all have this in hand, but as I type this I realise I may have forgotten to set things up for the ops VO. I encourage everyone to double check their readiness ahead of next Monday's switchover. Assigned (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106615 106615](2/7)<br /><br />
The RAL FTS2 service has been shut down for nearly a week now, so I suspect this ticket tracking the switch-off can be closed. In progress (3/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108306 108306](8/9)<br /><br />
CMS having trouble running a "locateall" AAA test at RALPP (TBH I don't know what that is) - Chris has let them know that this is due to their xrootd reverse proxy being down, and it should be up and running in a day or two after it's reinstalled. In progress (8/9)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107911 107911](27/8)<br /><br />
As mentioned last week, Sno+ have been having trouble as they can't assign software tags on ARC CEs, and they use these tags to do stuff like black/white listing. There was some discussion on this in the ticket, but it fizzled out - I suspect due to the topic moving offline. Can it have an update please? In progress (27/8)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554 106554](29/6)<br /><br />
CMS transfer problems to Bristol. Winnie posted an update mentioning that she has applied a fix to their StoRM that might have fixed the problem. Maybe. She's asked if the problem still persists, as the monitoring links provided have all gone stale. Lukasz is on leave - can anyone CMS-savvy help her? Waiting for reply (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 106325](18/6)<br /><br />
CMS Pilots losing contact with home base. No progress since Winnie noticed that the problem only seems to affect one of the Bristol clusters, but none expected due to leave. On Hold (8/9)<br />
<br />
''Update - Bristol have another, possibly related, CMS ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=108317 108317]''<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108100 108100](1/9)<br /><br />
Maarten ticketed ECDF about their CEs not having the new voms servers configured. Andy is working on it. There's a reminder that, on top of adding the right configs, services do need restarting. In progress (5/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303](1/7/2013)<br /><br />
glexec tarball ticket. There's a bit more movement on getting this done, but it's all on me to get the tarball glexec working still - naught the Edinburgh chaps can do.<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108273 108273](5/9)<br /><br />
Duncan noticed some interesting goings on on the Durham perfsonar page. The Durham chaps are talking to their networking team to figure out what the flip is going on. In progress (8/9)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107886 107886](26/8)<br /><br />
Duncan's unwavering gaze also noticed a problem on Sheffield's perfsonar. Elena was tweaking it when it broke, and it looks like it's still broken. Any luck fixing it, Elena? In progress (26/8)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108288 108288](8/9)<br /><br />
Liverpool got a ROD ticket when their CREAM CE got poorly. Steve worked his magic and things were fixed, but Gareth asks about the persisting BDII tests still failing. Solved (8/9) ''Update - the problem seems to have disappeared, so it was probably just an artifact of BDII lag.''<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=100566 100566](27/1)<br /><br />
My personal shame number 1. Lancaster's poor perfsonar performance. Despite a reinstall of the box, and no sign of a bottleneck in transfers or in manual tests, we still have really poor perfsonar results. No problems with the network have been found. Duncan helped formulate a plan at GridPP, but I haven't had the time to test it out yet. On hold (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299](1/7/13)<br /><br />
My personal shame number 2 - Lancaster's glexec deployment ticket. Some news in that I have something I'd like to test now - I just need to find time to test it, then see if I can package it somehow. On hold (8/9)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95298 95298](1/7/13)<br /><br />
UCL's glexec deployment ticket. This work was pushed back to the end of August - any news on it? On Hold (29/7)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107711 107711](15/8)<br /><br />
A ROD ticket for UCL APEL publishing errors. The apel admins got involved and things are looking better now - although Gareth points out that there is some missing data in the Spring. In progress (8/9)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107799 107799](21/8)<br /><br />
Pointing VO_SNOPLUS_SNOLAB_CA_SW_DIR to /cvmfs/snoplus.gridpp.ac.uk. No news for a while on this after it was acknowledged - has the job fallen to the bottom of the stack? In progress (22/8) ''Solved now, issue was dealt with last week but the ticket wasn't updated.''<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108217 108217](3/9)<br /><br />
Duncan ticketed QM about one of their perfsonar boxen - which Dan pointed out is their IPv6 perfsonar. So does that mean this ticket can be closed? In progress (4/9) ''Update - Duncan would like the ticket kept open to track this node's assimilation into the mesh.''<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=97485 97485](21/9/13)<br /><br />
Longstanding LHCb ticket with JET. No movement on this, but none was expected. Still, if anyone wants to heroically interject with some ideas I'm sure it would be appreciated. On hold (29/7)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107880 107880](26/8)<br /><br />
As mentioned last week, Matt M of Sno+ fame has a user who only has access to srm tools and is having trouble accessing files at RAL. Brian has suggested using the webfts, but Matt doesn't think this will work for the user's limited abilities. Any thoughts? In progress (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107935 107935](27/8)<br /><br />
Inconsistency between BDII and SRM reported storage capacity...hang on, haven't we been here before (105571)? It's not quite the same problem, but it's close. Brian has confirmed the mismatch, Maria has asked for an explanation for it (and how it only really affects ATLASHOTDISK). In progress (3/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=105405 105405](14/5)<br /><br />
Checking the site firewall configuration for RAL's Vidyo router. Last update was in July, is the dialogue between the Vidyo team and the RAL networking chaps ongoing? On hold (1/7)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324 106324](18/6)<br /><br />
The Tier 1's version of 106325 - CMS pilots losing contact. This was waiting on the firewall expert getting back from hols to compare the settings between the Tier 1 and Tier 2 (who don't see this issue). Are they back yet? On Hold (14/8)<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th Sep'''<br />
<br />
The multi-VO nagios maintained at Oxford has been upgraded to add ARC CE tests. <br />
<br />
https://vo-nagios.physics.ox.ac.uk/nagios/<br />
<br />
It is currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk<br />
<br />
Should we start monitoring it more actively and open tickets for sites failing tests? <br />
<br />
'''Monday 14th July'''<br />
<br />
Winnie reported on Saturday 12th July that most of the UK sites were failing nagios tests. The problem started with an unscheduled power cut at a Greek site hosting an EGI message broker (mq.afroditi.hellasgrid.gr) at around 2PM on 11th July. The message broker was put in downtime, but the top BDIIs continued to publish it for quite a long time. Stephen Burke mentioned in the TB support thread that the default caching time is now 4 days. When I checked on Monday morning only Manchester was still publishing mq.afroditi, and it went away after Alessandra manually restarted the top BDII. It seems that Imperial is configured with a much shorter cache time.<br />
Only Oxford and Imperial were almost unaffected, and the reason may be that Oxford WNs have the Imperial top BDII as the first option in BDII_LIST. Other NGIs have reported the same problem, and this outage is likely to be considered when calculating availability/reliability. All Nagios tests have now returned to normal. <br />
<br />
Emir reported this on the tools-admin mailing list:<br />
"We were planning to raise this issue at the next Operations meeting. In these extreme cases 24h cache rule in Top BDII has to be somehow circumvented." <br />
<br />
'''Tuesday 1st July'''<br />
* There was a monitoring problem on 26th June. All ARC CEs were using storage-monit.physics.ox.ac.uk for replicating files as part of the nagios testing. storage-monit was updated but not re-yaimed until later, so it was broken for the morning, leading to all ARC SRM tests failing.<br />
<br />
'''Tuesday 24th June'''<br />
* An update from Janusz on DIRAC:<br />
* We had a stupid bug in Dirac which affected the gridpp VO and storage. Now it is fixed and I was able to successfully upload a test file to Liverpool and register the file with the DFC<br />
* The async FTS is still under study; there are some issues with this.<br />
* I have a link to software to sync the user database from a VOMS server; haven’t looked into this in detail yet.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 11th August'''<br />
* Steve J sent an email to hyperk on 7th regarding "software directory for Hyperk (CVMFS)" and entries in the VO ID card.<br />
<br />
'''Monday 14th July 2014'''<br />
* HyperK.org will initially use remote storage (irods at QMUL) - so CPU resources would be appreciated. <br />
<br />
'''Monday 30 June 2014'''<br />
* HyperK.org request for support from other sites<br />
** 2TB storage requested.<br />
** CVMFS required<br />
<br />
* Cernatschool.org<br />
** WebDAV access to storage - world read works at QMUL.<br />
** ideally will configure federated access with DFC as LFC allows.<br />
<br />
<br />
'''Monday 16 June 2014'''<br />
* CVMFS<br />
** Snoplus almost ready to move to CVMFS - waiting on two sites. Will use symlinks in existing software<br />
<br />
* VOMS server: Snoplus has problems with some of the VOMS servers - see ggus 106243 - may be related to update. <br />
<br />
<br />
'''Tuesday 15th April'''<br />
* Is there interest in an FTS3 web front end? ([http://indico.cern.ch/event/272620/contribution/11/material/slides/1.pdf more details])<br />
<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
'''Tuesday 20th May'''<br />
* Various sites but notably Oxford have ARGUS problems. 100s of requests seen per minute. Performance issues have been noted after initial installation at RAL, QMUL and others.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members][http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 10th September 2014'''<br />
* [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2014-09-10 Operations report]<br />
* CMS are now writing to the newer T10KD tapes and migration of CMS data from 'B' to 'D' tapes is underway. <br />
* Access to the Cream CEs will be withdrawn apart from leaving access for ALICE. This has been announced for Tuesday 30th September. <br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2014-09-16T09:10:27Z<p>Kashif Mohammad 6ae08fa8ff: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 15th September 2014<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Monday 15th September'''<br />
* Steve has set up an [http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_v6.html IPv6 network test].<br />
* Duncan asked whether the gfal replacement is not on WNs by default.<br />
* The official [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20140910 GDB summary notes of the 10th September meeting] are now available.<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/20140910PreGDB Notes] from the [http://indico.cern.ch/event/272791/ pre-GDB on Clouds] are also available.<br />
* Do we still have sites suffering from ARGUS instabilities? CERN noticed ongoing problems ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=105666 GGUS 105666]).<br />
* A reminder of this [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOperationsWeb top-level WLCG page].<br />
* All VOMS update tickets closed. Tests passing. Thank you!<br />
* [https://indico.cern.ch/event/340943/ 3rd interim Foundation Board (iFB) of the HEP Software community meeting] this Wednesday, 17th September at 15.00 Geneva time. Plans to identify people to lead the activities.<br />
<br />
<br />
<br />
'''Monday 8th September'''<br />
* Be ready for the new CERN and ops VOMS. Compare the prod and preprod instances for:<br />
** ALICE:[http://cern.ch/go/6fbH preprod] and [http://cern.ch/go/Cb7F prod]. Birmingham both.<br />
** ATLAS:[http://cern.ch/go/R9vf preprod] and [http://cern.ch/go/z8q9 prod]. UCL pre only. QMUL both.<br />
** CMS:[http://cern.ch/go/GQ6h preprod] and [http://cern.ch/go/l9x6 prod]. QMUL both. RHUL both. UCL??. ECDF intermittent.<br />
** LHCb:[http://cern.ch/go/Zj9z preprod] and [http://cern.ch/go/6qX7 prod]. Fine.<br />
* An EMI3 WN tarball update has been done by Matt (see also [https://ggus.eu/index.php?mode=ticket_info&ticket_id=107869 GGUS 107869]).<br />
* There is an LHCONE/LHCOPN meeting next week on 16th and 17th ([https://indico.cern.ch/event/318811/ agenda]). It would be good to have some remote participation.<br />
* Website redesign - please complete [https://docs.google.com/forms/d/1REl4Utss1RZB7yxX0jAJ7aYDIhdsOXEPedGVZU0J27c/viewform?usp=send_form this survey].<br />
* For multicore - a reminder for sites running multicore and CREAM that there is an option in APEL to account multicore/multicpu. By default it is off.<br />
* There is a [http://indico.cern.ch/event/272791/ pre-GDB this afternoon on Clouds].<br />
* There is a [http://indico.cern.ch/event/272777/ GDB this week]. Any input?<br />
* Storage placement - survey TBC.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 16th September'''<br />
* The next core ops meeting is on 18th September.<br />
* The next multi-core meeting is today at 14:30 CERN time. It is on dynamic partitioning with LSF at CNAF.<br />
<br />
'''Monday 8th September'''<br />
* There will be a multi-core meeting on Tuesday 9th at 14:30 (CERN time). It will cover reviews of the UGE setup for multicore jobs at CCIN2P3 and of the method for passing job requirement arguments to batch systems via the CE. ([https://indico.cern.ch/event/339461/ Agenda])<br />
* A review of last week's ops meeting ([https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140904 minutes]) follows:<br />
* No operations news<br />
* The [http://linuxsoft.cern.ch/wlcg/ WLCG repository] will become signed soon.<br />
* Baselines: No new EMI/UMD releases since last meeting.<br />
* MW issues: Missing key usage extension in delegated proxy. Fix for CREAM UI in October. Impacts ATLAS-Rucio integration.<br />
* T1: FTS2 decommissioning done at 3 sites and in process at another 3. NDGF-T1 is testing FAX using native xrootd and nfs4-mount from dCache.<br />
* OSG following up on how to discover HTCondor CEs in the information system.<br />
* Oracle: GoldenGate migration fine for IN2P3.<br />
* T0: AFS UI still used. lxplus5 target close of 14th Sept 2014. ARGUS - the unresponsive CAs problem is believed to have been seen again.<br />
* T2: NTR<br />
* ALICE: Low activity. Job efficiencies issue still open.<br />
* ATLAS: Rucio test and normal DQ2 production activity are producing a slightly higher load on the storage of the sites.<br />
* CMS: Reminders - Target for CVMFS 2.1.19; update xrootd fallback configuration; add "Phedex Node Name" to site configuration.<br />
* LHCb: Mainly simulation work. SHA2 certificate testing started.<br />
* Network & transfer metrics: [https://indico.cern.ch/event/336520/ Meeting Monday 8th Sept]. [https://indico.cern.ch/event/336520/material/slides/0.pdf Slides]. Pythia Network Diagnosis Infrastructure funded by NSF - perfSONAR-PS data to identify and localize network problems using the Pythia algorithms.<br />
* Tracking tools: NTR<br />
* FTS3 deployment TF: Done - FTS3 now in production. New releases every 3-4 months. There are lists for feature requests and also support. Some improvements to FTS dashboard.<br />
* glexec TF: NTR.<br />
* Machine/job features: New lead for condor part Marian Zvada.<br />
* MW readiness: T0 pre-prod to install package reporter. Latest CREAM CE and BDII updates have been installed at LNL-T2. Next [https://indico.cern.ch/event/332224/ meeting 1st October].<br />
* "MW software for the verification activity" uses the package reporter results to aggregate per software component, is used to tag good/bad versions, publishes the results in a dashboard.<br />
* Multicore: ATLAS 11 T1s and 35 T2s; CMS at T1s and some US T2s. Decided the TF would take on board the standardization of the blah scripts (and other CEs scripts if needed) for the scheduling parameters<br />
* SHA-2: Compliance being tested. [https://operations-portal.egi.eu/broadcast/archive/id/1190 Broadcast sent]. Deadline 15th September. Switch SAM. Then expt. job and data systems. 88->55 tickets. <br />
* WMS decommissioning: Condor-based SAM probes due 1st October.<br />
* IPv6: NTR.<br />
* Squid mon & HTTP proxy discovery: Working on [http://wlcg-squid-monitor.cern.ch/snmpstats/all.html automated MRTG monitor]. Working on documentation. <br />
<br />
<br />
'''Tuesday 2nd September'''<br />
* The next WLCG ops coordination meeting is this [https://indico.cern.ch/event/326087/ Thursday 4th September].<br />
* There will be a Tier-1/2 feedback section in the agenda IF there is feedback/input. Do we have any items to raise?<br />
<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 8th September'''<br />
* There was a brief network interruption yesterday (Tuesday 8th Sep) to the Tier1 network at around 5pm local time. This lasted a few minutes and the cause is being investigated.<br />
* We are planning to stop access for all VOs apart from ALICE to our CREAM CEs. The proposed date is 23rd September.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
'''Wedn 10 Sept.'''<br />
* High load at L'pool causing low throughput - how to throttle xroot transfers (and is the load necessary or a bug?)<br />
* Still testing WebFTS<br />
* Prep for DPM workshop<br />
<br />
'''Monday 1st September'''<br />
* FAX sites to update the C++ N2N rpms.<br />
* There is interest regarding issues/performance when placing storage outside firewalls. JC will shortly start a (closed) discussion/survey.<br />
<br />
'''Monday 11th August'''<br />
* Pool nodes at RHUL have received test errors.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 9th September'''<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?<br />
<br />
<br />
'''Tuesday 2nd September'''<br />
* Please check [http://wlcg-rebus.cern.ch/apps/capacities/sites/ REBUS figures] for your site.<br />
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: Okay.<br />
<br />
'''Tuesday 26th August'''<br />
* Sheffield has stopped publishing.<br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
'''Tuesday 9th September'''<br />
* Looking a bit better. Will review in more detail at the core ops meeting (next Thursday 18th at 11:30am, unless there is a clash).<br />
<br />
'''Tuesday 2nd September'''<br />
* This work needs a kick-start! Reminders should now be being received.<br />
* Tom/Andrew are in discussion about options for the main site - the main candidates are WordPress and Drupal.<br />
<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 9th September'''<br />
<br />
* Meeting minutes from [https://indico.egi.eu/indico/materialDisplay.py?materialId=minutes&confId=2310 yesterday].<br />
<br />
** Mostly a short meeting to report on product updates over the summer.<br />
** Please read the agenda/minutes for a full set but to pull out a couple of things:<br />
<br />
** Note that as per http://dmc.web.cern.ch, gfal and lcg-util are in end-of-life mode and support will end for both on 1st November. <br />
<br />
** FTS3, SQUID and CVMFS will soon be included in UMD; early adopters are requested.<br />
<br />
* Next meeting planned for October 6th.<br />
<br />
'''Monday 8th September'''<br />
* There is an [https://wiki.egi.eu/wiki/Agenda-08-09-2014 EGI ops meeting today].<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 2nd September'''<br />
<br />
* Monitoring consolidation meeting last Friday<br />
<br />
** Validation of SAM2/3 results: https://twiki.cern.ch/twiki/bin/view/LCG/ValidationStatus<br />
** 4 UK sites (ECDF, Brunel, Durham, Oxford) had slight discrepancies. This looks to be either because more metrics are now being represented (blue = unknown, site not penalised), or because an older service is still in the VO feed (AGIS).<br />
** Next step is to compare the August availabilities from SAM2 and SAM3 for each site.<br />
** If sites see any discrepancies between http://dashb-atlas-sum.cern.ch/dashboard/request.py/historicalsmryview-sum and http://wlcg-sam-atlas.cern.ch/dashboard/request.py/historicalsmry for their site, please let me know<br />
<br />
* Squid monitoring TF meeting last Thursday<br />
<br />
** Cosmin presented the ALICE CVMFS proposal to the revived TF<br />
** Notes from meeting: https://twiki.cern.ch/twiki/bin/view/LCG/SquidMonitoringTF20140828MeetingNotes<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 2nd September'''<br />
* Sussex is back in business - kept closing their low availability alarm wrt the GGUS ticket.<br />
* The UCL ticket is now finally receiving some attention.<br />
* Ongoing problems at RAL.<br />
<br />
'''Tuesday 26th August'''<br />
* RAL : Nagios jobs staying in queue for long time - to be investigated.<br />
* Sussex : Matt needs help probably from some SGE experts.<br />
* UCL : No acknowledgement from the site (ticket escalated to second level).<br />
* 100IT : There is an alarm from EGI federated cloud - this needs discussion.<br />
* Durham : Availability alarms - require constant closing with some comments. Ticket with devs is open.<br />
<br />
'''Tuesday 12th August'''<br />
* Last week was quiet.<br />
* Still one or two responses needed for the next rota allocations.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 26th August'''<br />
* EMI3 WN tarball update needed soon ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=107869 GGUS 107869])<br />
<br />
'''Monday 28th July'''<br />
* [https://operations-portal.egi.eu/broadcast/archive/id/1180 UMD v.3.8.0] was released on 24th July.<br />
<br />
<br />
'''References'''<br />
<br />
* The Staged Rollout pages (now separated into EMI1 & 2) and the page listing the deployed versions are extracted from the BDII, so they should all be reasonably up to date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<br />
* FAX update<br />
<br />
'''Monday 8th August'''<br />
* There was a security team meeting last Wednesday.<br />
* There was a CA TAG meeting also last Wednesday.<br />
<br />
'''Monday 11th August'''<br />
* Topics as mentioned during the last GridPP technical meeting.<br />
<br />
* There is an issue at the moment in the evaluation of vulnerabilities causing everything rated 'High' by Pakiti to display as 'Critical' in the Dashboard.<br />
<br />
<br />
* The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 16th September'''<br />
* Another reminder of the [http://pprc.qmul.ac.uk/~lloyd/gridpp/nettest_v6.html IPv6 network tests].<br />
* An [https://indico.cern.ch/event/318811/ LHCONE/LHCOPN meeting] is taking place (yesterday and today).<br />
* RIPE A at Glasgow now live (but tagged ...). Hope to see others soon.<br />
<br />
<br />
'''Tuesday 9th September'''<br />
* RIPE probes now hosted: Cambridge, Sheffield, Liverpool, Lancaster (& Oxford and QMUL). Glasgow connected but no data.<br />
* RIPE probes not yet hosted: 6 sites.<br />
<br />
'''Tuesday 2nd September'''<br />
* Only a few of the RIPE probes went live last week - any issues at the other sites to be discussed?<br />
* JANET is going to deploy a perfSONAR instance on one of the exchange points in London. They hope it will help raise awareness of issues with local systems affecting their transfer performance.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 8th September 2014, 15.00 BST'''<br /><br />
25 Open UK tickets this week.<br />
<br />
'''NO SITE IN PARTICULAR'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108182 108182](3/9)<br /><br />
As seen on TB-SUPPORT, the NGI has a ticket telling it to get sites to have the new voms servers configured for the switch over. Jeremy has kindly offered to field the ticket. I think we all have this in hand, but as I type this I realise I may have forgotten to set things up for the ops VO. I encourage everyone to double check their readiness ahead of next Monday's switchover. Assigned (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106615 106615](2/7)<br /><br />
The RAL FTS2 service has been shut down for nearly a week now, so I suspect this ticket tracking the switch-off can be closed. In progress (3/9)<br />
<br />
'''RALPP'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108306 108306](8/9)<br /><br />
CMS having trouble running a "locateall" AAA test at RALPP (TBH I don't know what that is) - Chris has let them know that this is due to their xrootd reverse proxy being down, and it should be up and running in a day or two after it's reinstalled. In progress (8/9)<br />
<br />
'''OXFORD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107911 107911](27/8)<br /><br />
As mentioned last week, Sno+ have been having trouble as they can't assign software tags on Arc CEs, and they use these tags to do stuff like black/white listing. There was some discussion on this in the ticket, but it fizzled out - I suspect due to the topic moving offline. Can it have an update please? In progress (27/8)<br />
<br />
'''BRISTOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554 106554](29/6)<br /><br />
CMS transfer problems to Bristol. Winnie put an update, where she mentioned she has applied a fix to their Storm that might have fixed the problem. Maybe. She's asked if the problem still persists, as the monitoring links provided have all gone stale. Lukasz is on leave, can anyone CMS savvy help her? Waiting for reply (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 106325](18/6)<br /><br />
CMS Pilots losing contact with home base. No progress since Winnie noticed that the problem only seems to affect one of the Bristol clusters, but none expected due to leave. On Hold (8/9)<br />
<br />
''Update - Bristol have another, possibly related, CMS ticket [https://ggus.eu/index.php?mode=ticket_info&ticket_id=108317 108317]''<br />
<br />
'''EDINBURGH'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108100 108100](1/9)<br /><br />
Maarten ticketed ECDF about this CE's not having the new voms servers configured. Andy is working on it. There's a reminder that, on top of adding the right configs, services do need restarting. In progress (5/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 95303](1/7/2013)<br /><br />
glexec tarball ticket. There's a bit more movement on getting this done, but it's all on me to get the tarball glexec working still - naught the Edinburgh chaps can do.<br />
<br />
'''DURHAM'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108273 108273](5/9)<br /><br />
Duncan noticed some interesting goings on on the Durham perfsonar page. The Durham chaps are talking to their networking team to figure out what the flip is going on. In progress (8/9)<br />
<br />
'''SHEFFIELD'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107886 107886](26/8)<br /><br />
Duncan's unwavering gaze also noticed a problem on Sheffield's perfsonar. Elena was tweaking it when it broke, and it looks like it's still broken, any luck fixing it Elena? In progress (26/8)<br />
<br />
'''LIVERPOOL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108288 108288](8/9)<br /><br />
Liverpool got a ROD ticket when their CREAM CE got poorly. Steve worked his magic and things were fixed, but Gareth asks about the persisting BDII tests still failing. Solved (8/9) ''Update - the problem seems to have disappeared, so it was probably just an artifact of BDII lag.''<br />
<br />
'''LANCASTER'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=100566 100566](27/1)<br /><br />
My personal shame number 1. Lancaster's poor perfsonar performance. Despite a reinstall of the box, and no signs of a bottleneck in transfers or in manual tests, we still have really poor perfsonar results. No problems with the network have been found. Duncan helped formulate a plan at GridPP, but I haven't had the time to test it out yet. On hold (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 95299](1/7/13)<br /><br />
My personal shame number 2 - Lancaster's glexec deployment ticket. Some news in that I have something I'd like to test now - I just need to find time to test it, then see if I can package it somehow. On hold (8/9)<br />
<br />
'''UCL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=95298 95298](1/7/13)<br /><br />
UCL's glexec deployment ticket. This work was pushed back to the end of August - any news on it? On Hold (29/7)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107711 107711](15/8)<br /><br />
A ROD ticket for UCL APEL publishing errors. The apel admins got involved and things are looking better now - although Gareth points out that there is some missing data in the Spring. In progress (8/9)<br />
<br />
'''QMUL'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107799 107799](21/8)<br /><br />
Pointing VO_SNOPLUS_SNOLAB_CA_SW_DIR to /cvmfs/snoplus.gridpp.ac.uk. No news for a while on this after it was acknowledged - has the job fallen to the bottom of the stack? In progress (22/8) ''Solved now, issue was dealt with last week but the ticket wasn't updated.''<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=108217 108217](3/9)<br /><br />
Duncan ticketed QM about one of their perfsonar boxen - which Dan pointed out is their IPv6 perfsonar. So does that mean this ticket can be closed? In progress (4/9) ''Update - Duncan would like the ticket kept open to track this node's assimilation into the mesh.''<br />
<br />
'''EFDA-JET'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=97485 97485](21/9/13)<br /><br />
Longstanding LHCB ticket with JET. No movement on this, but none was expected. Still if anyone wants to heroically interject with some ideas I'm sure it would be appreciated. On hold (29/7)<br />
<br />
'''TIER 1'''<br /><br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107880 107880](26/8)<br /><br />
As mentioned last week, Matt M of Sno+ fame has a user who only has access to srm tools and is having trouble accessing files at RAL. Brian has suggested using the webfts, but Matt doesn't think this will work for the user's limited abilities. Any thoughts? In progress (8/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=107935 107935](27/8)<br /><br />
Inconsistency between BDII and SRM reported storage capacity...hang on, haven't we been here before (105571)? It's not quite the same problem, but it's close. Brian has confirmed the mismatch, Maria has asked for an explanation for it (and how it only really affects ATLASHOTDISK). In progress (3/9)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=105405 105405](14/5)<br /><br />
Checking the site firewall configuration for RAL's Vidyo router. Last update was in July, is the dialogue between the Vidyo team and the RAL networking chaps ongoing? On hold (1/7)<br />
<br />
[https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324 106324](18/6)<br /><br />
The Tier 1's version of 106325 - CMS pilots losing contact. This was waiting on the firewall expert getting back from hols to compare the settings between the Tier 1 and Tier 2 (who don't see this issue). Are they back yet? On Hold (14/8)<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 16th Sep'''<br />
<br />
The multi-VO Nagios maintained at Oxford has been upgraded to add ARC CE tests. <br />
<br />
https://vo-nagios.physics.ox.ac.uk/nagios/<br />
<br />
It is currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk<br />
<br />
Should we start monitoring it more actively and open tickets for sites failing tests? <br />
<br />
'''Monday 14th July'''<br />
<br />
Winnie reported on Saturday 12th July that most of the UK sites were failing Nagios tests. The problem started with an unscheduled power cut, around 2PM on 11th July, at a Greek site hosting an EGI message broker (mq.afroditi.hellasgrid.gr). The message broker was put in downtime, but the top BDIIs continued to publish it for quite a long time. Stephen Burke mentioned in the TB-SUPPORT thread that the default caching time is now 4 days. When I checked on Monday morning only Manchester was still publishing mq.afroditi, and it went away after Alessandra manually restarted the top BDII. It seems that Imperial is configured with a much shorter cache time.<br />
Only Oxford and Imperial were almost unaffected, and the reason may be that Oxford WNs have the Imperial top BDII as the first option in BDII_LIST. Other NGIs have reported the same problem, and this outage is likely to be considered when calculating availability/reliability. All Nagios tests are now back to normal. <br />
<br />
Emir reported this on the tools-admin mailing list:<br />
"We were planning to raise this issue at the next Operations meeting. In these extreme cases 24h cache rule in Top BDII has to be somehow circumvented." <br />
<br />
'''Tuesday 1st July'''<br />
* There was a monitoring problem on 26th June. All ARC CEs were using storage-monit.physics.ox.ac.uk for replicating files as part of the Nagios testing. storage-monit was updated but not re-yaimed until later. It was broken for the morning, leading to all ARC SRM tests failing.<br />
<br />
'''Tuesday 24th June'''<br />
* An update from Janusz on DIRAC:<br />
* We had a stupid bug in DIRAC which affected the gridpp VO and storage. Now it is fixed and I was able to successfully upload a test file to Liverpool and register the file with the DFC.<br />
* The async FTS is still under study; there are some issues with this.<br />
* I have a link to software to sync the user database from a VOMS server; haven’t looked into this in detail yet.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 11th August'''<br />
* Steve J sent an email to hyperk on 7th regarding "software directory for Hyperk (CVMFS)" and entries in the VO ID card.<br />
<br />
'''Monday 14th July 2014'''<br />
* HyperK.org will initially use remote storage (irods at QMUL) - so CPU resources would be appreciated. <br />
<br />
'''Monday 30 June 2014'''<br />
* HyperK.org request for support from other sites<br />
** 2TB storage requested.<br />
** CVMFS required<br />
<br />
* Cernatschool.org<br />
** WebDAV access to storage - world read works at QMUL.<br />
** ideally will configure federated access with DFC as LFC allows.<br />
<br />
<br />
'''Monday 16 June 2014'''<br />
* CVMFS<br />
** Snoplus almost ready to move to CVMFS - waiting on two sites. Will use symlinks in existing software<br />
<br />
* VOMS server: Snoplus has problems with some of the VOMS servers - see GGUS 106243 - may be related to an update. <br />
<br />
<br />
'''Tuesday 15th April'''<br />
* Is there interest in an FTS3 web front end? ([http://indico.cern.ch/event/272620/contribution/11/material/slides/1.pdf more details])<br />
<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 9th September'''<br />
* Intel announced the new generation of Xeon based on Haswell.<br />
<br />
'''Tuesday 20th May'''<br />
* Various sites but notably Oxford have ARGUS problems. 100s of requests seen per minute. Performance issues have been noted after initial installation at RAL, QMUL and others.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 10th September 2014'''<br />
* [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2014-09-10 Operations report]<br />
* CMS are now writing to the newer T10KD tapes and migration of CMS data from 'B' to 'D' tapes is underway. <br />
* Access to the Cream CEs will be withdrawn apart from leaving access for ALICE. This has been announced for Tuesday 30th September. <br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>
Kashif Mohammad 6ae08fa8ff
https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
Operations Bulletin Latest
2014-07-14T23:21:50Z
<p>Kashif Mohammad 6ae08fa8ff: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 14th July 2014<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Monday 14th July'''<br />
* Workshop - CVMFS monitoring feedback <br />
* ATLAS DC14 13TeV simulation starting - note Alessandra's recommendation regarding the Nikhef scripts and multicore running for torque/maui sites.<br />
* topBDII caching and errors<br />
* ILC VOMS changes<br />
<br />
* Sites with ARC CEs who want to support LHCb need to make a few configuration changes. This is to ensure that there is an environment variable available to jobs which specifies the name of the queue.<br />
* EGI [https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics A/R report for June]<br />
* Did anyone else see kernel problems like those at Liverpool (see [http://northgrid-tech.blogspot.co.uk/2014/04/kernel-problems-at-liverpool.html blog])?<br />
* Large numbers of biomed jobs have been impacting various sites. Is setting MaxTotalJobs the answer? Do we need follow-up with the VO?<br />
* HyperK can now make use of additional resources and a general request for enablement was circulated. It has been confirmed that they only need disk at QMUL.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 14th July'''<br />
* The regular meeting would have taken place last week - an update was presented at the workshop. [https://indico.cern.ch/event/326084/ Next meeting is on 24th July].<br />
<br />
'''Tuesday 1st July'''<br />
* There will be a [https://indico.cern.ch/event/327276/ multi-core TF meeting this afternoon] at 13:30 UK time.<br />
* A reminder that the [http://indico.cern.ch/event/320011/ next MW readiness WG meeting] is this Wednesday (2nd July) at 15:00 UK time.<br />
<br />
'''Monday 23rd June'''<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140619 Minutes] from [https://indico.cern.ch/event/323689/ last Thursday's] meeting. Highlights....<br />
* A [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions#Issues_Affecting_the_WLCG_Infras page] is available listing current known middleware issues affecting WLCG.<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions Baselines]: Storm 1.11.4 released in EMI containing several bug fixes. Baseline update with UMD release.<br />
* 3 issues affected some sites after the latest EMI update of Cream and LB. The problems are under investigation by the PTs.<br />
* CVMFS: Starting from July, sites not compliant with the 2.1.19 version will be notified with a GGUS ticket (noted that the upgrade just requires an update of the RPM and a restart of CVMFS). <br />
* T0: The OPS VO now runs in voms-admin instead of VOMRS, after the migration done on June 17th <br />
* Tier-1/Tier-2 feedback: NTR!<br />
* ALICE: successful campaign for users to move away from old ROOT versions. T0 job efficiency issues ongoing.<br />
* ATLAS: DC14 expected to start in approximately 2 weeks from now. Panda/Jedi is now fully ready for user analysis.<br />
* CMS: Started to remove individual release tags from CEs. After the introduction of disk/tape separation at the T1 sites, CMS now must site readiness measures for T1 sites<br />
* LHCb: Recommend CVMFS 2.1.19. General request: ensure that downtimes, including unscheduled outages, accurately reflect the specific services which are unavailable.<br />
* FTS3: Monitoring the auto-tuning algorithm closely and adjusting various monitoring tools of FTS3.<br />
* glexec: 10 sites have yet to enable it. ARGUS instabilities being investigated.<br />
* Machine/job features: PBS/torque and LSF implemented. SLURM pending. SGE and HTCondor in progress.<br />
* MW readiness: ATLAS and CMS DPM setups in progress. Monitoring prototype being deployed at test sites. <br />
* Multicore: CMS stable flow. Gathering reports for July workshop. ATLAS MC jobs on-hold pending new software release.<br />
* SHA-2: New VOMS fix for CERN instances requires sites to update ARGUS, UI, CREAM and WN instances.<br />
* WMS decommissioning: Progress with SAM Condor validation. ARC-CE WN tests failing for some CMS sites (incl. Imperial).<br />
* IPv6: NTR<br />
* HTTP proxy discovery: [https://twiki.cern.ch/twiki/bin/view/LCG/HttpProxyDiscoveryTaskForce#Task_Overview Task overview table] updated.<br />
* Network and transfers metrics: [https://twiki.cern.ch/twiki/bin/view/LCG/MeshLeaders Mesh leaders] developed. Kick off in July.<br />
* AOB: OSG plan to migrate to HTCondor CEs by October.<br />
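For the CVMFS item above, a site could check a client version string against the 2.1.19 baseline along these lines. This is a minimal sketch only: it assumes "compliant" means version 2.1.19 or later, and the function name and sample version strings are hypothetical rather than taken from any WLCG tool.

```python
# Hypothetical helper: compare a CVMFS client version string with the 2.1.19 baseline.
# Assumption: "compliant" means the version is numerically at or above the baseline.
BASELINE = (2, 1, 19)


def is_compliant(version, baseline=BASELINE):
    """Return True if a dotted version string is at or above the baseline."""
    # Convert "2.1.19" into (2, 1, 19) so components compare as numbers, not text.
    parts = tuple(int(p) for p in version.split("."))
    return parts >= baseline


# Sample (made-up) version strings.
for candidate in ("2.1.9", "2.1.17", "2.1.19"):
    print(candidate, is_compliant(candidate))
```

Comparing integer tuples rather than raw strings avoids ranking 2.1.9 above 2.1.19.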
<br />
<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st July'''<br />
* The LHCb Castor stager upgrade was carried out successfully last Thursday. The final update is to the Atlas Castor instance stager, which is planned for Tue 1st July.<br />
* There is a UPS/Generator load test tomorrow morning (Wed 2nd July) and the site has been declared At Risk (warning) in the GOC DB from 10 to 11 local time.<br />
* We are looking at how to end the FTS2 service, now that FTS3 is becoming widely used.<br />
* The software server used by the small VOs will be withdrawn from service. Its use as a software server is very limited (possibly only SNO+) although a few VOs use it for uploading files to the CVMFS repository.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 2 July'''<br />
* Guidance and policies for "small" VOs: how to get them started with stuff, without preventing them later growing bigger.<br />
<br />
'''Tuesday 1st July'''<br />
* Spacetokens for smaller VOs ... most want them but what happens post SRM. Chris's [https://www.gridpp.ac.uk/wiki/Storage/SpaceTokens summary on Spacetokens] needs updating and a consensus! Could the SEs implement some reservation system internally? Is there merit in the suggestion to make use of [https://www.gridpp.ac.uk/wiki/Using_Castor_At_RAL RAL genscratch and its Least Used Policy]?<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st July'''<br />
* There are no SL6 [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HS06 entries] in our wiki for UCL and EFDA.<br />
* Are there any observations from the latest GridPP [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html metrics tables]? (Does anything need addressing or correcting?).<br />
* APEL is not up-to-date for: RHUL, Manchester and Durham.<br />
<br />
<br />
'''Tuesday 24th June'''<br />
* APEL not up-to-date for: RHUL, Manchester, Durham and Sussex.<br />
<br />
<br />
* Check publishing via: http://gstat2.grid.sinica.edu.tw/gstat/summary/Country/UK/ <br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
'''Monday 16th June'''<br />
* A review is starting of old and obsolete pages within the GridPP website - there are many! Please review sections that you have created and update them if necessary.<br />
<br />
'''Tuesday 6th April'''<br />
* KeyDocs are going to be reviewed (in next 4 weeks) as the system is not working (or not adding anything) in some areas.<br />
<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th July'''<br />
* Last meeting yesterday.<br />
<br />
* Agenda: https://wiki.egi.eu/wiki/Agenda-14-07-2014<br />
* Minutes: <br />
<br />
* URT: see agenda for details<br />
* SR: In verification: gfal2 v. 2.5.5; active: globus-info-provider-service v. 0.2.1, cream v. 1.16.3; ready to be released: storm v. 1.11.4, lb v. 11.1, wms v. 3.6.5, dcache v. 2.6.28<br />
* DMSU report: CREAM CLI/GridSite SegFaults at Long-Lived Proxies solved<br />
* Migration of Central SAM services: if services are being reinstalled, make sure that patches are applied<br />
* EMI-2/APEL-2 - Looks like UCL is still publishing with APEL-2 publisher<br />
* Hoped that gr.net issues resolved on Monday. Summary of discussion to be in minutes.<br />
* Next meeting placeholder 28th July, but may not happen (OMD depending)<br />
* Please fill out this UMD customer satisfaction survey in the next couple of weeks if you have a moment: https://www.surveymonkey.com/s/MQ6G8BZ<br />
<br />
'''Tuesday 1st July'''<br />
* Today's ops meeting cancelled - partly due to forthcoming 4th EGI annual review.<br />
* EMI-2 decommissioning: The situation is being followed by COD ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=106354 GGUS 106354]). "Please remember that we passed the decommissioning deadline and after today - Sites still deploying unsupported service end-points risk suspension, unless documented technical reasons prevent a Site Admin from updating these end-points" (source [https://wiki.egi.eu/wiki/PROC16 PROC16]).<br />
* There is STILL use of UMD2/EMI2 APEL clients to send accounting data. As of today there are 20 sites ([http://goc-accounting.grid-support.ac.uk/consumer/ see latest list]) still using UMD2/EMI2 APEL clients.<br />
<br />
<br />
* Updates requested for the [https://www.egi.eu/earlyAdopters/table early adopters table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th June'''<br />
* Meeting last Friday: https://indico.cern.ch/event/324687/<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st July'''<br />
* Quiet week. Sussex emi2 ticket is still open. UCL also has an open ticket regarding a problem with storage.<br />
<br />
'''Tuesday 24th June'''<br />
* Very quiet shift. Dashboard downtime on Tuesday seemed to go ok.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 18th March'''<br />
* The EMI-2 decommissioning task has started.<br />
* The [http://indico.cern.ch/event/MW-Readiness_3 next WLCG middleware readiness WG meeting] takes place this afternoon at 13:30 UK time. <br />
<br />
'''Tuesday 11th February'''<br />
* 31st May has been set as the deadline for EMI-2 decommissioning. There may be an issue for dCache (related to 3rd party/enstore component).<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 14th July'''<br />
* EGI CSIRT ADVISORY [EGI-ADV-20140625] <br />
<br />
<br />
'''Tuesday 1st July'''<br />
* There was a very useful security challenge debrief last week. Thanks to Heiko.<br />
* There may be a site contacts challenge in the coming months. Please could every site review their site security contact details and ensure that the GOCDB entry is up-to-date and working. <br />
* EGI indicates that site ARGUS instances can now be hooked up with the regional instances.<br />
* There was one EGI amber final report last week.<br />
* Next team meeting 16th July.<br />
<br />
'''Monday 23rd June'''<br />
* CVE-2014-3153 - but no public exploit.<br />
** This kernel vulnerability has been patched in errata released last week.<br />
* PerfSonar/Cacti updates.<br />
* New IGTF CA release 1.58 - the EGI release is due on 30th June.<br />
<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 17th June'''<br />
* The GridPP VOMS server was updated on 11/06/2014 - no issues reported.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July 2014, 14.30 BST.'''<br /><br />
29 Open UK tickets today. I might have to send my apologies to this week's meeting as Lancaster is receiving a delivery Tuesday morning.<br />
<br />
'''FNAL VOMS TICKETS'''<br /><br />
As seen on TB-SUPPORT - a number of sites got tickets concerning jobs still contacting the FNAL voms server for CMS/ILC. Birmingham, RHUL, Liverpool and the Tier 1's tickets are still being worked on - RHUL's ticket might not have been spotted yet (still assigned).<br />
<br />
'''DECOMMISSIONING THE FTS2 SERVICE'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106615 (2/7)<br /><br />
Gareth opened a ticket to document the retirement, in accordance with ancient grid laws. As naught is happening until the 2nd of September I put it on hold till nearer the time. On Hold (14/7)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106770 (10/7)<br /><br />
enmr.eu wanted to add tags to one of the Tier 1's ARC CEs, which of course didn't work. There was an interesting exchange about why a VO would still want to have a site publish tags in the age of cvmfs (essentially so they can minimise changes to the submission gubbins). Andrew offered to add in the tag "VO-enmr.eu-CVMFS" by hand to his CE; it's likely that other sites might be asked to do the same - and it's a solution worth noting for other VOs. In progress (14/7)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106610 (2/7)<br /><br />
Enabling HyperK at the Tier 1. Ticket looks a little stalled after Chris commented that it was wise for HyperK to be enabled only on Arc-CEs (in light of RAL going dairy free). In progress (2/7)<br />
<br />
'''UCL'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106425 (4/7)<br /><br />
UCL are still having trouble with nagios tests after a pool node died. Ben is having trouble getting the new disk server set up - I tried to give him some tips and advised shouting out for help. In progress (8/7)<br />
<br />
'''BRISTOL'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554 (1/6)<br /><br />
Bristol are having trouble with CMS transfers - Lukasz noticed Storm was being odd (believing there to be no free space when there was). The SE was kicked but the problem (or a similar one) showed up again. Has anyone seen something similar? (Looking at Chris Walker:Storm Sage again here). In Progress (9/7)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 (1/6)<br /><br />
cf TIER 1 ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324<br /><br />
CMS pilots losing contact with their home base. Looks similar to the issue at RAL, where they seem to have had some success (still waiting to see if it was complete). If the RAL chaps could elaborate on the firewall tweaks that brought about this improvement it would be greatly appreciated (The RAL ticket could do with an update too)! In Progress (14/7)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July'''<br />
<br />
Winnie reported on Saturday 12th July that most of the UK sites were failing Nagios tests. The problem started with an unscheduled power cut at a Greek site hosting an EGI message broker (mq.afroditi.hellasgrid.gr) at around 2PM on 11th July. The message broker was put in downtime but top BDIIs continued to publish it for quite a long time. Stephen Burke mentioned in the TB support thread that the default caching time is now 4 days. When I checked on Monday morning only Manchester was still publishing mq.afroditi, and it went away after Alessandra manually restarted the top BDII. It seems that Imperial is configured with a much shorter cache time.<br />
Only Oxford and Imperial were almost unaffected, and the reason may be that Oxford WNs have the Imperial top BDII as the first option in BDII_LIST. Other NGIs have reported the same problem and this outage is likely to be considered when calculating availability/reliability. All Nagios tests are back to normal now. <br />
<br />
Emir reported this on the tools-admin mailing list:<br />
"We were planning to raise this issue at the next Operations meeting. In these extreme cases 24h cache rule in Top BDII has to be somehow circumvented." <br />
<br />
'''Tuesday 1st July'''<br />
* There was a monitoring problem on 26th June. All ARC CEs were using storage-monit.physics.ox.ac.uk for replicating files as part of the Nagios testing. storage-monit was updated but not re-yaimed until later, so it was broken for the morning, leading to all ARC SRM tests failing.<br />
<br />
'''Tuesday 24th June'''<br />
* An update from Janusz on DIRAC:<br />
* We had a stupid bug in DIRAC which affected the gridpp VO and storage. It is now fixed and I was able to successfully upload a test file to Liverpool and register the file with the DFC.<br />
* The async FTS is still under study; there are some issues with this.<br />
* I have a link to software to sync the user database from a VOMS server, but haven't looked into this in detail yet.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July 2014'''<br />
* HyperK.org will initially use remote storage (irods at QMUL) - so CPU resources would be appreciated. <br />
<br />
'''Monday 30 June 2014'''<br />
* HyperK.org request for support from other sites<br />
** 2TB storage requested.<br />
** CVMFS required<br />
<br />
* Cernatschool.org<br />
** WebDAV access to storage - world read works at QMUL.<br />
** Ideally will configure federated access with DFC as LFC allows.<br />
<br />
<br />
'''Monday 16 June 2014'''<br />
* CVMFS<br />
** Snoplus almost ready to move to CVMFS - waiting on two sites. Will use symlinks in existing software<br />
<br />
* VOMS server: Snoplus has problems with some of the VOMS servers - see ggus 106243 - may be related to update. <br />
<br />
<br />
'''Tuesday 15th April'''<br />
* Is there interest in an FTS3 web front end? ([http://indico.cern.ch/event/272620/contribution/11/material/slides/1.pdf more details])<br />
<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th May'''<br />
* Various sites but notably Oxford have ARGUS problems. 100s of requests seen per minute. Performance issues have been noted after initial installation at RAL, QMUL and others.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members][http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 25th June 2014'''<br />
* [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2014-06-25 Operations report]<br />
* Castor GEN Stager 2.1.14-13 updated yesterday (24th June). Some problems with xroot for ALICE not resolved until following morning. Remaining stager dates as follows (LHCb - Thu 26th June; Atlas - Tue 8th July.)<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2014-07-14T23:16:02Z<p>Kashif Mohammad 6ae08fa8ff: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 14th July 2014<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Monday 14th July'''<br />
* Workshop - CVMFS monitoring feedback <br />
* ATLAS DC14 13TeV simulation starting - note Alessandra's recommendation regarding Nikhef scripts and multicore running (for Torque/Maui sites).<br />
* topBDII caching and errors<br />
* ILC VOMS changes<br />
<br />
* Sites with ARC CEs who want to support LHCb need to make a few configuration changes. This is to ensure that there is an environment variable available to jobs which specifies the name of the queue.<br />
* EGI [https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics A/R report for June]<br />
* Did anyone else see kernel problems like Liverpool (see [http://northgrid-tech.blogspot.co.uk/2014/04/kernel-problems-at-liverpool.html blog])?<br />
* Large numbers of biomed jobs have been impacting various sites. Is setting MaxTotalJobs the answer? Do we need follow-up with the VO?<br />
* HyperK can now make use of additional resources and a general request for enablement was circulated. It has been confirmed that they only need disk at QMUL.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 14th July'''<br />
* The regular meeting would have taken place last week - an update was presented at the workshop. [https://indico.cern.ch/event/326084/ Next meeting is on 24th July].<br />
<br />
'''Tuesday 1st July'''<br />
* There will be a [https://indico.cern.ch/event/327276/ multi-core TF meeting this afternoon] at 13:30 UK time.<br />
* A reminder that the [http://indico.cern.ch/event/320011/ next MW readiness WG meeting] is this Wednesday (2nd July) at 15:00 UK time.<br />
<br />
'''Monday 23rd June'''<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140619 Minutes] from [https://indico.cern.ch/event/323689/ last Thursday's] meeting. Highlights....<br />
* A [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions#Issues_Affecting_the_WLCG_Infras page] is available listing current known middleware issues affecting WLCG.<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions Baselines]: Storm 1.11.4 released in EMI containing several bug fixes. Baseline update with UMD release.<br />
* 3 issues affected some sites after the latest EMI update of CREAM and LB. The problems are under investigation by the PTs.<br />
* CVMFS: Starting from July, sites not compliant with the 2.1.19 version will be notified with a GGUS ticket (noted that the upgrade just requires an update of the RPM and a restart of CVMFS). <br />
* T0: The OPS VO now runs in voms-admin instead of VOMRS, after the migration done on June 17th <br />
* Tier-1/Tier-2 feedback: NTR!<br />
* ALICE: successful campaign for users to move away from old ROOT versions. T0 job efficiency issues ongoing.<br />
* ATLAS: DC14 expected to start in approximately 2 weeks from now. Panda/Jedi is now fully ready for user analysis.<br />
* CMS: Started to remove individual release tags from CEs. After the introduction of disk/tape separation at the T1 sites, CMS now must site readiness measures for T1 sites<br />
* LHCb: Recommend CVMFS 2.1.19. General request: ensure that downtimes, including unscheduled outages, accurately reflect the specific services which are unavailable.<br />
* FTS3: Monitoring the auto-tuning algorithm closely and adjusting various monitoring tools of FTS3.<br />
* glexec: 10 sites have yet to enable it. ARGUS instabilities being investigated.<br />
* Machine/job features: PBS/torque and LSF implemented. SLURM pending. SGE and HTCondor in progress.<br />
* MW readiness: ATLAS and CMS DPM setups in progress. Monitoring prototype being deployed at test sites. <br />
* Multicore: CMS stable flow. Gathering reports for July workshop. ATLAS MC jobs on-hold pending new software release.<br />
* SHA-2: New VOMS fix for CERN instances requires sites to update ARGUS, UI, CREAM and WN instances.<br />
* WMS decommissioning: Progress with SAM Condor validation. ARC-CE WN tests failing for some CMS sites (incl. Imperial).<br />
* IPv6: NTR<br />
* HTTP proxy discovery: [https://twiki.cern.ch/twiki/bin/view/LCG/HttpProxyDiscoveryTaskForce#Task_Overview Task overview table] updated.<br />
* Network and transfers metrics: [https://twiki.cern.ch/twiki/bin/view/LCG/MeshLeaders Mesh leaders] developed. Kick off in July.<br />
* AOB: OSG plan to migrate to HTCondor CEs by October.<br />
<br />
<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st July'''<br />
* The LHCb Castor Stager upgrade was carried out successfully last Thursday. The final update is the Atlas Castor instance stager, which is planned for Tue 1st July.<br />
* There is a UPS/Generator load test tomorrow morning (Wed 2nd July) and the site has been declared At Risk (warning) in the GOC DB from 10 to 11 local time.<br />
* We are looking at how to end the FTS2 service, now that FTS3 is becoming widely used.<br />
* The software server used by the small VOs will be withdrawn from service. Its use as a software server is very limited (possibly only SNO+) although a few VOs use it for uploading files to the CVMFS repository.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 2 July'''<br />
* Guidance and policies for "small" VOs: how to get them started with stuff, without preventing them later growing bigger.<br />
<br />
'''Tuesday 1st July'''<br />
* Spacetokens for smaller VOs ... most want them, but what happens post-SRM? Chris's [https://www.gridpp.ac.uk/wiki/Storage/SpaceTokens summary on Spacetokens] needs updating and a consensus! Could the SEs implement some reservation system internally? Is there merit in the suggestion to make use of [https://www.gridpp.ac.uk/wiki/Using_Castor_At_RAL RAL genscratch and its Least Used Policy]?<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st July'''<br />
* There are no SL6 [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HS06 entries] in our wiki for UCL and EFDA.<br />
* Are there any observations from the latest GridPP [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html metrics tables]? (Does anything need addressing or correcting?).<br />
* APEL is not up-to-date for: RHUL, Manchester and Durham.<br />
<br />
<br />
'''Tuesday 24th June'''<br />
* APEL is not up-to-date for: RHUL, Manchester, Durham and Sussex.<br />
<br />
<br />
* Check publishing via: http://gstat2.grid.sinica.edu.tw/gstat/summary/Country/UK/ <br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
'''Monday 16th June'''<br />
* A review is starting of old and obsolete pages within the GridPP website - there are many! Please review sections that you have created and update them if necessary.<br />
<br />
'''Tuesday 6th April'''<br />
* KeyDocs are going to be reviewed (in next 4 weeks) as the system is not working (or not adding anything) in some areas.<br />
<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 14th July'''<br />
* Last meeting yesterday.<br />
<br />
* Agenda: https://wiki.egi.eu/wiki/Agenda-14-07-2014<br />
* Minutes: <br />
<br />
* URT: see agenda for details<br />
* SR: In verification: gfal2 v. 2.5.5; active: globus-info-provider-service v. 0.2.1, cream v. 1.16.3; ready to be released: storm v. 1.11.4, lb v. 11.1, wms v. 3.6.5, dcache v. 2.6.28<br />
* DMSU report: CREAM CLI/GridSite SegFaults at Long-Lived Proxies solved<br />
* Migration of Central SAM services: Note that if services are being reinstalled, make sure that patches are applied.<br />
* EMI-2/APEL-2 - Looks like UCL is still publishing with APEL-2 publisher<br />
* Hoped that gr.net issues resolved on Monday. Summary of discussion to be in minutes.<br />
* Next meeting placeholder 28th July, but may not happen (OMD depending)<br />
* Please fill out this UMD customer satisfaction survey in the next couple of weeks if you have a moment: https://www.surveymonkey.com/s/MQ6G8BZ<br />
<br />
'''Tuesday 1st July'''<br />
* Today's ops meeting cancelled - partly due to forthcoming 4th EGI annual review.<br />
* EMI-2 decommissioning: The situation is being followed by COD ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=106354 GGUS 106354]). "Please remember that we passed the decommissioning deadline and after today - Sites still deploying unsupported service end-points risk suspension, unless documented technical reasons prevent a Site Admin from updating these end-points" (source [https://wiki.egi.eu/wiki/PROC16 PROC16]).<br />
* There is STILL use of UMD2/EMI2 APEL clients to send accounting data. As of today there are 20 sites ([http://goc-accounting.grid-support.ac.uk/consumer/ see latest list]) still using UMD2/EMI2 APEL clients<br />
<br />
<br />
* Updates requested for the [https://www.egi.eu/earlyAdopters/table early adopters table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th June'''<br />
* Meeting last Friday: https://indico.cern.ch/event/324687/<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st July'''<br />
* Quiet week. Sussex emi2 ticket is still open. UCL also has an open ticket regarding some problem with storage.<br />
<br />
'''Tuesday 24th June'''<br />
* Very quiet shift. Dashboard downtime on Tuesday seemed to go ok.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 18th March'''<br />
* The EMI-2 decommissioning task has started.<br />
* The [http://indico.cern.ch/event/MW-Readiness_3 next WLCG middleware readiness WG meeting] takes place this afternoon at 13:30 UK time. <br />
<br />
'''Tuesday 11th February'''<br />
* 31st May has been set as the deadline for EMI-2 decommissioning. There may be an issue for dCache (related to 3rd party/enstore component).<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 14th July'''<br />
* EGI CSIRT ADVISORY [EGI-ADV-20140625] <br />
<br />
<br />
'''Tuesday 1st July'''<br />
* There was a very useful security challenge debrief last week. Thanks to Heiko.<br />
* There may be a site contacts challenge in the coming months. Please could every site review their site security contact details and ensure that the GOCDB entry is up-to-date and working. <br />
* EGI indicates that site ARGUS instances can now be hooked up with the regional instances.<br />
* There was one EGI amber final report last week.<br />
* Next team meeting 16th July.<br />
<br />
'''Monday 23rd June'''<br />
* CVE-2014-3153 - but no public exploit.<br />
** This kernel vulnerability has been patched in errata released last week.<br />
* PerfSonar/Cacti updates.<br />
* New IGTF CA release 1.58 - the EGI release is due on 30th June.<br />
<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 17th June'''<br />
* The GridPP VOMS server was updated on 11/06/2014 - no issues reported.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July 2014, 14.30 BST.'''<br /><br />
29 Open UK tickets today. I might have to send my apologies to this week's meeting as Lancaster is receiving a delivery Tuesday morning.<br />
<br />
'''FNAL VOMS TICKETS'''<br /><br />
As seen on TB-SUPPORT - a number of sites got tickets concerning jobs still contacting the FNAL voms server for CMS/ILC. Birmingham, RHUL, Liverpool and the Tier 1's tickets are still being worked on - RHUL's ticket might not have been spotted yet (still assigned).<br />
<br />
'''DECOMMISSIONING THE FTS3 SERVICE'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106615 (2/7)<br /><br />
Gareth opened a ticket to document the retirement, in accordance with ancient grid laws. As naught is happening until the 2nd of September I put it on hold until nearer the time. On Hold (14/7)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106770 (10/7)<br /><br />
enmr.eu wanted to add tags to one of the Tier 1's ARC CEs, which of course didn't work. There was an interesting exchange about why a VO would still want to have a site publish tags in the age of cvmfs (essentially so they can minimise changes to the submission gubbins). Andrew offered to add the tag "VO-enmr.eu-CVMFS" by hand to his CE. It's likely that other sites might be asked to do the same - and it's a solution worth noting for other VOs. In progress (14/7)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106610 (2/7)<br /><br />
Enabling HyperK at the Tier 1. Ticket looks a little stalled after Chris commented that it was wise for HyperK to be enabled on only ARC CEs (in light of RAL going dairy free). In progress (2/7)<br />
<br />
'''UCL'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106425 (4/7)<br /><br />
UCL are still having trouble with nagios tests after a pool node died. Ben is having trouble getting the new disk server set up - I tried to give him some tips and advised shouting out for help. In progress (8/7)<br />
<br />
'''BRISTOL'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554 (1/6)<br /><br />
Bristol are having trouble with CMS transfers - Lukasz noticed Storm was being odd (believing there to be no free space when there was). The SE was kicked but the problem (or a similar one) showed up again. Anyone seen similar? (Looking at Chris Walker:Storm Sage again here). In Progress (9/7)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 (1/6)<br /><br />
cf TIER 1 ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324<br /><br />
CMS pilots losing contact with their home base. Looks similar to the issue at RAL, where they seem to have had some success (still waiting to see if it was complete). If the RAL chaps could elaborate on the firewall tweaks that brought about this improvement it would be greatly appreciated (The RAL ticket could do with an update too)! In Progress (14/7)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July'''<br />
<br />
Winnie reported on Saturday 12th July that most of the UK sites were failing nagios tests. The problem started with an unscheduled power cut at a Greek site hosting an EGI message broker (mq.afroditi.hellasgrid.gr) at around 2PM on 11th July. The message broker was put in downtime but top BDIIs continued to publish it for quite a long time. Stephen Burke mentioned in the TB-SUPPORT thread that the default caching time is now 4 days. When I checked on Monday morning only Manchester was still publishing mq.afroditi and it went away after Alessandra manually restarted the top BDII. It seems that Imperial is configured with a much shorter cache time.<br />
Only Oxford and Imperial were almost unaffected, and the reason may be that Oxford WNs have the Imperial top BDII as the first option in BDII_LIST. Other NGIs have reported the same problem and this outage is likely to be considered when calculating availability/reliability. <br />
<br />
Emir reported this on the tools-admin mailing list:<br />
"We were planning to raise this issue at the next Operations meeting. In these extreme cases 24h cache rule in Top BDII has to be somehow circumvented." <br />
<br />
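As a rough illustration of why the cache time matters here, the sketch below works out how long a vanished endpoint would keep being published under the two cache times mentioned above (4 days and 24 hours). It assumes a simplified model in which a top BDII publishes a cached entry for exactly the cache time after the entry was last seen; the function name <code>published_until</code> and the 09:00 Monday check time are assumptions made for illustration, not part of any BDII tool.

```python
from datetime import datetime, timedelta


def published_until(last_seen: datetime, cache_ttl: timedelta) -> datetime:
    """Return when a cached entry stops being published (simplified model)."""
    return last_seen + cache_ttl


# The broker disappeared at around 2PM on 11th July 2014.
outage_start = datetime(2014, 7, 11, 14, 0)
# "Monday morning" check; 09:00 is an assumed time for illustration only.
monday_check = datetime(2014, 7, 14, 9, 0)

# Compare the two cache times quoted in the reports above.
for label, ttl in (("4 days", timedelta(days=4)), ("24 hours", timedelta(hours=24))):
    expiry = published_until(outage_start, ttl)
    still_published = monday_check < expiry
    print(f"cache {label}: published until {expiry:%d %b %H:%M}, "
          f"still visible on Monday morning: {still_published}")
```

Under this model a 4 day cache would keep the broker visible until the afternoon of 15th July, which is consistent with one top BDII still publishing it on Monday morning, while a 24 hour cache would have dropped it by the afternoon of 12th July.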
'''Tuesday 1st July'''<br />
* There was a monitoring problem on 26th June. All ARC CEs were using storage-monit.physics.ox.ac.uk for replicating files as part of the nagios testing. storage-monit was updated but not re-yaimed until later. Storage-monit was broken for the morning, leading to all ARC SRM tests failing.<br />
<br />
'''Tuesday 24th June'''<br />
* An update from Janusz on DIRAC:<br />
* We had a stupid bug in Dirac which affected the gridpp VO and storage. Now it is fixed and I was able to successfully upload a test file to Liverpool and register the file with the DFC<br />
* The async FTS is still under study, there are some issues with this.<br />
* I have a link to software to sync user database from a VOMS server, haven’t looked into this in detail yet.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July 2014'''<br />
* HyperK.org will initially use remote storage (irods at QMUL) - so CPU resources would be appreciated. <br />
<br />
'''Monday 30 June 2014'''<br />
* HyperK.org request for support from other sites<br />
** 2TB storage requested.<br />
** CVMFS required<br />
<br />
* Cernatschool.org<br />
** WebDAV access to storage - world read works at QMUL.<br />
** ideally will configure federated access with DFC as LFC allows.<br />
<br />
<br />
'''Monday 16 June 2014'''<br />
* CVMFS<br />
** Snoplus almost ready to move to CVMFS - waiting on two sites. Will use symlinks in existing software<br />
<br />
* VOMS server: Snoplus has problems with some of the VOMS servers - see ggus 106243 - may be related to update. <br />
<br />
<br />
'''Tuesday 15th April'''<br />
* Is there interest in an FTS3 web front end? ([http://indico.cern.ch/event/272620/contribution/11/material/slides/1.pdf more details])<br />
<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th May'''<br />
* Various sites but notably Oxford have ARGUS problems. 100s of requests seen per minute. Performance issues have been noted after initial installation at RAL, QMUL and others.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 25th June 2014'''<br />
* [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2014-06-25 Operations report]<br />
* Castor GEN Stager 2.1.14-13 updated yesterday (24th June). Some problems with xroot for ALICE not resolved until following morning. Remaining stager dates as follows (LHCb - Thu 26th June; Atlas - Tue 8th July.)<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2014-07-14T23:15:12Z<p>Kashif Mohammad 6ae08fa8ff: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 14th July 2014<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
<br />
'''Monday 14th July'''<br />
* Workshop - CVMFS monitoring feedback <br />
* ATLAS DC14 13TeV simulation starting - note Alessandra's recommendation regarding Nikhef scripts and multicore running (for torque/maui sites).<br />
* topBDII caching and errors<br />
* ILC VOMS changes<br />
<br />
* Sites with ARC CEs who want to support LHCb need to make a few configuration changes. This is to ensure that there is an environment variable available to jobs which specifies the name of the queue.<br />
* EGI [https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics A/R report for June]<br />
* Did anyone else see kernel problems like Liverpool (see [http://northgrid-tech.blogspot.co.uk/2014/04/kernel-problems-at-liverpool.html blog])?<br />
* Large numbers of biomed jobs have been impacting various sites. Is setting MaxTotalJobs the answer? Do we need follow-up with the VO?<br />
* HyperK can now make use of additional resources and a general request for enablement was circulated. It has been confirmed that they only need disk at QMUL.<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Tuesday 14th July'''<br />
* The regular meeting would have taken place last week - an update was presented at the workshop. [https://indico.cern.ch/event/326084/ Next meeting is on 24th July].<br />
<br />
'''Tuesday 1st July'''<br />
* There will be a [https://indico.cern.ch/event/327276/ multi-core TF meeting this afternoon] at 13:30 UK time.<br />
* A reminder that the [http://indico.cern.ch/event/320011/ next MW readiness WG meeting] is this Wednesday (2nd July) at 15:00 UK time.<br />
<br />
'''Monday 23rd June'''<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140619 Minutes] from [https://indico.cern.ch/event/323689/ last Thursday's] meeting. Highlights....<br />
* A [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions#Issues_Affecting_the_WLCG_Infras page] is available listing current known middleware issues affecting WLCG.<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions Baselines]: Storm 1.11.4 released in EMI containing several bug fixes. Baseline update with UMD release.<br />
* 3 issues affected some sites after the latest EMI update of Cream and LB. The problems are under investigation by the PTs.<br />
* CVMFS: Starting from July, sites not compliant with the 2.1.19 version will be notified with a GGUS ticket (noted that the upgrade just requires an update of the RPM and a restart of CVMFS). <br />
* T0: The OPS VO now runs in voms-admin instead of VOMRS, after the migration done on June 17th.<br />
* Tier-1/Tier-2 feedback: NTR!<br />
* ALICE: successful campaign for users to move away from old ROOT versions. T0 job efficiency issues ongoing.<br />
* ATLAS: DC14 expected to start in approximately 2 weeks from now. Panda/Jedi is now fully ready for user analysis.<br />
* CMS: Started to remove individual release tags from CEs. After the introduction of disk/tape separation at the T1 sites, CMS now must site readiness measures for T1 sites<br />
* LHCb: Recommend CVMFS 2.1.19. General request: ensure that downtimes, including unscheduled outages, accurately reflect the specific services which are unavailable.<br />
* FTS3: Monitoring the auto-tuning algorithm closely and adjusting various monitoring tools of FTS3.<br />
* glexec: 10 sites have yet to enable it. ARGUS instabilities being investigated.<br />
* Machine/job features: PBS/torque and LSF implemented. SLURM pending. SGE and HTCondor in progress.<br />
* MW readiness: ATLAS and CMS DPM setups in progress. Monitoring prototype being deployed at test sites. <br />
* Multicore: CMS stable flow. Gathering reports for July workshop. ATLAS MC jobs on-hold pending new software release.<br />
* SHA-2: New VOMS fix for CERN instances requires sites to update ARGUS, UI, CREAM and WN instances.<br />
* WMS decommissioning: Progress with SAM Condor validation. ARC-CE WN tests failing for some CMS sites (incl. Imperial).<br />
* IPv6: NTR<br />
* HTTP proxy discovery: [https://twiki.cern.ch/twiki/bin/view/LCG/HttpProxyDiscoveryTaskForce#Task_Overview Task overview table] updated.<br />
* Network and transfers metrics: [https://twiki.cern.ch/twiki/bin/view/LCG/MeshLeaders Mesh leaders] developed. Kick off in July.<br />
* AOB: OSG plan to migrate to HTCondor CEs by October.<br />
<br />
<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 1st July'''<br />
* The LHCb Castor Stager upgrade was carried out successfully last Thursday. The final update is the Atlas Castor instance stager, which is planned for Tue 1st July.<br />
* There is a UPS/Generator load test tomorrow morning (Wed 2nd July) and the site has been declared At Risk (warning) in the GOC DB from 10 to 11 local time.<br />
* We are looking at how to end the FTS2 service, now that FTS3 is becoming widely used.<br />
* The software server used by the small VOs will be withdrawn from service. Its use as a software server is very limited (possibly only SNO+) although a few VOs use it for uploading files to the CVMFS repository.<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 2 July'''<br />
* Guidance and policies for "small" VOs: how to get them started with stuff, without preventing them later growing bigger.<br />
<br />
'''Tuesday 1st July'''<br />
* Spacetokens for smaller VOs ... most want them but what happens post SRM? Chris's [https://www.gridpp.ac.uk/wiki/Storage/SpaceTokens summary on Spacetokens] needs updating and a consensus! Could the SEs implement some reservation system internally? Is there merit in the suggestion to make use of [https://www.gridpp.ac.uk/wiki/Using_Castor_At_RAL RAL genscratch and its Least Used Policy]?<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st July'''<br />
* There are no SL6 [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HS06 entries] in our wiki for UCL and EFDA.<br />
* Are there any observations from the latest GridPP [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html metrics tables]? (Does anything need addressing or correcting?).<br />
* APEL is not up-to-date for: RHUL, Manchester and Durham.<br />
<br />
<br />
'''Tuesday 24th June'''<br />
* APEL is not up-to-date for: RHUL, Manchester, Durham and Sussex.<br />
<br />
<br />
* Check publishing via: http://gstat2.grid.sinica.edu.tw/gstat/summary/Country/UK/ <br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
'''Monday 16th June'''<br />
* A review is starting of old and obsolete pages within the GridPP website - there are many! Please review sections that you have created and update them if necessary.<br />
<br />
'''Tuesday 6th April'''<br />
* KeyDocs are going to be reviewed (in next 4 weeks) as the system is not working (or not adding anything) in some areas.<br />
<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 14th July'''<br />
* Last meeting yesterday.<br />
<br />
* Agenda: https://wiki.egi.eu/wiki/Agenda-14-07-2014<br />
* Minutes: <br />
<br />
* URT: see agenda for details<br />
* SR: In verification: gfal2 v. 2.5.5; active: globus-info-provider-service v. 0.2.1 cream v. 1.16.3; ready to be released: storm v. 1.11.4 lb v. 11.1 wms v. 3.6.5 dcache v. 2.6.28<br />
* DMSU report: CREAM CLI/GridSite SegFaults at Long-Lived Proxies solved<br />
* Migration of Central SAM services: if services are being reinstalled, make sure that patches are applied.<br />
* EMI-2/APEL-2 - Looks like UCL is still publishing with APEL-2 publisher<br />
* Hoped that gr.net issues resolved on Monday. Summary of discussion to be in minutes.<br />
* Next meeting placeholder 28th July, but may not happen (OMD depending)<br />
* Please fill out this UMD customer satisfaction survey in the next couple of weeks if you have a moment: https://www.surveymonkey.com/s/MQ6G8BZ<br />
<br />
'''Tuesday 1st July'''<br />
* Today's ops meeting cancelled - partly due to forthcoming 4th EGI annual review.<br />
* EMI-2 decommissioning: The situation is being followed by COD ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=106354 GGUS 106354]). "Please remember that we passed the decommissioning deadline and after today - Sites still deploying unsupported service end-points risk suspension, unless documented technical reasons prevent a Site Admin from updating these end-points" (source [https://wiki.egi.eu/wiki/PROC16 PROC16]).<br />
* There is STILL use of UMD2/EMI2 APEL clients to send accounting data. As of today there are 20 sites ([http://goc-accounting.grid-support.ac.uk/consumer/ see latest list]) still using UMD2/EMI2 APEL clients.<br />
<br />
<br />
* Updates requested for the [https://www.egi.eu/earlyAdopters/table early adopters table]. <br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 24th June'''<br />
* Meeting last Friday: https://indico.cern.ch/event/324687/<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 1st July'''<br />
* Quiet week. Sussex emi2 ticket is still open. UCL also has an open ticket regarding a problem with storage.<br />
<br />
'''Tuesday 24th June'''<br />
* Very quiet shift. Dashboard downtime on Tuesday seemed to go ok.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 18th March'''<br />
* The EMI-2 decommissioning task has started.<br />
* The [http://indico.cern.ch/event/MW-Readiness_3 next WLCG middleware readiness WG meeting] takes place this afternoon at 13:30 UK time. <br />
<br />
'''Tuesday 11th February'''<br />
* 31st May has been set as the deadline for EMI-2 decommissioning. There may be an issue for dCache (related to 3rd party/enstore component).<br />
<br />
'''References'''<br />
<br />
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Monday 14th July'''<br />
* EGI CSIRT ADVISORY [EGI-ADV-20140625] <br />
<br />
<br />
'''Tuesday 1st July'''<br />
* There was a very useful security challenge debrief last week. Thanks to Heiko.<br />
* There may be a site contacts challenge in the coming months. Please could every site review their site security contact details and ensure that the GOCDB entry is up-to-date and working. <br />
* EGI indicates that site ARGUS instances can now be hooked up with the regional instances.<br />
* There was one EGI amber final report last week.<br />
* Next team meeting 16th July.<br />
<br />
'''Monday 23rd June'''<br />
* CVE-2014-3153 - but no public exploit.<br />
** This kernel vulnerability has been patched in errata released last week.<br />
* PerfSonar/Cacti updates.<br />
* New IGTF CA release 1.58 - the EGI release is due on 30th June.<br />
<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 17th June'''<br />
* The GridPP VOMS server was updated on 11/06/2014 - no issues reported.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July 2014, 14.30 BST.'''<br /><br />
29 Open UK tickets today. I might have to send my apologies to this week's meeting as Lancaster is receiving a delivery Tuesday morning.<br />
<br />
'''FNAL VOMS TICKETS'''<br /><br />
As seen on TB-SUPPORT - a number of sites got tickets concerning jobs still contacting the FNAL voms server for CMS/ILC. Birmingham, RHUL, Liverpool and the Tier 1's tickets are still being worked on - RHUL's ticket might not have been spotted yet (still assigned).<br />
<br />
'''DECOMMISSIONING THE FTS2 SERVICE'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106615 (2/7)<br /><br />
Gareth opened a ticket to document the retirement, in accordance with ancient grid laws. As naught is happening until the 2nd of September I put it on hold till nearer the time. On Hold (14/7)<br />
<br />
'''TIER 1'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106770 (10/7)<br /><br />
enmr.eu wanted to add tags to one of the Tier 1's ARC CEs, which of course didn't work. There was an interesting exchange about why a VO would still want to have a site publish tags in the age of cvmfs (essentially so they can minimise changes to the submission gubbins). Andrew offered to add in the tag "VO-enmr.eu-CVMFS" by hand to his CE. It's likely that other sites might be asked to do the same, and it's a solution worth noting for other VOs. In progress (14/7)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106610 (2/7)<br /><br />
Enabling HyperK at the Tier 1. Ticket looks a little stalled after Chris commented that it was wise for Hyper K to be enabled on only Arc-CEs (in light of RAL going dairy free). In progress (2/7)<br />
<br />
'''UCL'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106425 (4/7)<br /><br />
UCL are still having trouble with nagios tests after a pool node died. Ben is having trouble getting the new disk server set up - I tried to give him some tips and advised shouting out for help. In progress (8/7)<br />
<br />
'''BRISTOL'''<br /><br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554 (1/6)<br /><br />
Bristol is having trouble with CMS transfers - Lukasz noticed Storm was being odd (believing there to be no free space when there was). The SE was kicked but the problem (or a similar one) showed up again. Anyone seen similar? (Looking at Chris Walker:Storm Sage again here). In Progress (9/7)<br />
<br />
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 (1/6)<br /><br />
cf TIER 1 ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324<br /><br />
CMS pilots losing contact with their home base. Looks similar to the issue at RAL, where they seem to have had some success (still waiting to see if it was complete). If the RAL chaps could elaborate on the firewall tweaks that brought about this improvement it would be greatly appreciated (The RAL ticket could do with an update too)! In Progress (14/7)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July'''<br />
<br />
Winnie reported on Saturday 12th July that most of the UK sites were failing nagios tests. The problem started with an unscheduled power cut at a Greek site hosting an EGI message broker (mq.afroditi.hellasgrid.gr) around 2PM on 11th July. The message broker was put in downtime but top BDIIs continued to publish it for quite a long time. Stephen Burke mentioned in a TB-SUPPORT thread that the default caching time is now 4 days. When I checked on Monday morning only Manchester was still publishing mq.afroditi, and it went away after Alessandra manually restarted the top BDII. It seems that Imperial is configured with a much shorter cache time.<br />
Only Oxford and Imperial were almost unaffected, and the reason may be that Oxford WNs have the Imperial top BDII as first option in BDII_LIST. Other NGIs have reported the same problem and this outage is likely to be considered when calculating availability/reliability. <br />
<br />
Emir reported this on the tools-admin mailing list:<br />
"We were planning to raise this issue at the next Operations meeting. In these extreme cases 24h cache rule in Top BDII has to be somehow circumvented." <br />
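<br />
The effect of the cache time on this incident can be illustrated with a little date arithmetic. The sketch below is only a simplified model, not how the top BDII is implemented: it assumes a cached entry can keep being published for at most the cache time after its source disappears. The function names are hypothetical, and the 09:00 hour used for "Monday morning" is an assumption.<br />

```python
from datetime import datetime, timedelta


def last_possible_publication(removed_at, cache_time):
    """Latest time a cached entry could still be published (simplified model)."""
    return removed_at + cache_time


def still_cached(now, removed_at, cache_time):
    """True if the entry could still be served from cache at time `now`."""
    return now < last_possible_publication(removed_at, cache_time)


# Values taken from the report above; the exact hours are approximate.
broker_lost = datetime(2014, 7, 11, 14, 0)   # power cut around 2PM on 11th July
monday_check = datetime(2014, 7, 14, 9, 0)   # "Monday morning" (hour assumed)

# Compare the two cache times mentioned in the report.
for label, cache_time in [("4 days", timedelta(days=4)),
                          ("24 hours", timedelta(hours=24))]:
    expiry = last_possible_publication(broker_lost, cache_time)
    cached = still_cached(monday_check, broker_lost, cache_time)
    print(f"cache time {label}: entry may persist until {expiry}, "
          f"still cached on Monday morning: {cached}")
```

Under this model a 4 day cache would still allow the broker to be published on Monday morning, whereas a 24 hour cache would not, which is consistent with sites using a shorter cache time being less affected.<br />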
<br />
'''Tuesday 1st July'''<br />
* There was a monitoring problem on 26th June. All ARC CEs were using storage-monit.physics.ox.ac.uk for replicating files as part of the nagios testing. storage-monit was updated but not re-yaimed until later. Storage-monit was broken for the morning, leading to all ARC SRM tests failing.<br />
<br />
'''Tuesday 24th June'''<br />
* An update from Janusz on DIRAC:<br />
* We had a stupid bug in Dirac which affected the gridpp VO and storage. Now it is fixed and I was able to successfully upload a test file to Liverpool and register the file with the DFC<br />
* The async FTS is still under study, there are some issues with this.<br />
* I have a link to software to sync user database from a VOMS server, haven’t looked into this in detail yet.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 14th July 2014'''<br />
* HyperK.org will initially use remote storage (irods at QMUL) - so CPU resources would be appreciated. <br />
<br />
'''Monday 30 June 2014'''<br />
* HyperK.org request for support from other sites<br />
** 2TB storage requested.<br />
** CVMFS required<br />
<br />
* Cernatschool.org<br />
** WebDAV access to storage - world read works at QMUL.<br />
** Ideally will configure federated access with DFC as LFC allows.<br />
<br />
<br />
'''Monday 16 June 2014'''<br />
* CVMFS<br />
** Snoplus almost ready to move to CVMFS - waiting on two sites. Will use symlinks in existing software<br />
<br />
* VOMS server: Snoplus has problems with some of the VOMS servers - see ggus 106243 - may be related to update. <br />
<br />
<br />
'''Tuesday 15th April'''<br />
* Is there interest in an FTS3 web front end? ([http://indico.cern.ch/event/272620/contribution/11/material/slides/1.pdf more details])<br />
<br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th May'''<br />
* Various sites but notably Oxford have ARGUS problems. 100s of requests seen per minute. Performance issues have been noted after initial installation at RAL, QMUL and others.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 25th June 2014'''<br />
* [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2014-06-25 Operations report]<br />
* The Castor GEN Stager 2.1.14-13 update took place yesterday (24th June). Some problems with xroot for ALICE were not resolved until the following morning. The remaining stager dates are: LHCb - Thu 26th June; Atlas - Tue 8th July.<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/GitHub_RepositoriesGitHub Repositories2014-06-16T14:03:37Z<p>Kashif Mohammad 6ae08fa8ff: /* Monitoring */</p>
<hr />
<div>List of useful GitHub repositories. Please feel free to add any new category or repository.<br />
== GridPP catch all repository ==<br />
https://github.com/gridpp<br />
<br />
== Puppet Modules ==<br />
https://github.com/cernops<br />
<br />
https://github.com/HEP-puppet<br />
<br />
https://github.com/oxford-physics<br />
<br />
== Monitoring ==<br />
<br />
https://github.com/alahiff/ral-htcondor-nagios-plugins</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/GitHub_RepositoriesGitHub Repositories2014-06-16T13:50:52Z<p>Kashif Mohammad 6ae08fa8ff: Created page with "List of useful github repositories. Please feel free to add any new category or repository. == GridPP catch all repository == https://github.com/gridpp == Puppet Modules == h..."</p>
<hr />
<div>List of useful GitHub repositories. Please feel free to add any new category or repository.<br />
== GridPP catch all repository ==<br />
https://github.com/gridpp<br />
<br />
== Puppet Modules ==<br />
https://github.com/cernops<br />
<br />
https://github.com/HEP-puppet<br />
<br />
https://github.com/oxford-physics<br />
<br />
== Monitoring ==</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Main_PageMain Page2014-06-16T13:42:23Z<p>Kashif Mohammad 6ae08fa8ff: /* Middleware */</p>
<hr />
<div>==Sites in GridPP==<br />
* [[ScotGrid]] Tier2 - [http://www.scotgrid.ac.uk/wiki Glasgow], [[Edinburgh]], [[Durham]] <br />
* [[NorthGrid]] Tier2 - [[Lancaster]], [[Liverpool]], [[Manchester]], [[Sheffield]], Daresbury Laboratory<br />
* [[SouthGrid]] Tier2 - [[Birmingham]], [[Bristol]], [[Cambridge]], [[Oxford]], [[Warwick]], [http://www.gridpp.ac.uk/wiki/RAL_Tier2 Rutherford Appleton Laboratory], [[EFDA-JET]], [[Sussex]]<br />
* [[London Tier2]] Tier2 - [[Brunel]], [https://www.gridpp.ac.uk/wiki/UCL-HEP UCL], [https://www.gridpp.ac.uk/wiki/IC-HEP Imperial College ], [https://www.gridpp.ac.uk/wiki/QMUL Queen Mary], [https://www.gridpp.ac.uk/wiki/RHUL Royal Holloway]<br />
* [[RAL Tier1]] - Rutherford Appleton Laboratory<br />
<br />
==Security==<br />
* How to [[Report_Security_Incident | Report a Security Incident]]<br />
* How to [[Report_Software_Vulnerability | Report a Software Vulnerability ]]<br />
* How to ban or blacklist a user on a CE or SE [[How to ban/blacklist user on CE and SE]]<br />
* Other [[Security_Information | Security Information ]]<br />
<br />
==GridPP grid services==<br />
*[[Grid services]]<br />
<br />
==Key Documents==<br />
<br />
Important documents are called [https://www.gridpp.ac.uk/php/KeyDocs.php Key Docs]. That page describes how<br />
to make sure the core documents are kept well maintained.<br />
<br />
Deletions or other major changes can be listed for discussion in [[Stale documents]].<br />
<br />
==GridPP Cloud==<br />
* [[GridPP Cloud]]<br />
* [[Cloud Work at Imperial ]]<br />
<br />
==GridPP Tier2 Support==<br />
<br />
===Experiment===<br />
<br />
* [[Site contacts]]<br />
<br />
===Middleware===<br />
<br />
* [[ Argus Server ]]<br />
* [[Grid Certificate]]<br />
* [[Data Management]]<br />
* [[Grid Storage]]<br />
* [[GitHub Repositories ]]<br />
* [[Cluster Management]]<br />
* [[:Category:Batch_Systems|Batch Systems]]<br />
* [[Virtualisation]]<br />
* [https://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/dashboard.cgi Glite software builds]<br />
* [[Yaim - GridPP specific settings]]<br />
* [[Puppet]]<br />
* [https://www.gridpp.ac.uk/wiki/Staged_rollout Staged Rollout]<br />
* Also see [[Security_Information | Security Information ]] for some of the security middleware<br />
<br />
===Operations===<br />
<br />
* [[ARGUS deployment]]<br />
* [[Batch system status]]<br />
* [[Procedures]]<br />
* [[Site information]]<br />
* [[Site status and plans]]<br />
* [[Protected Site networking|Site networking]]<br />
* [[:Category:Incidents]]<br />
* [[Regional operations]]<br />
* [[HEPSPEC06]]<br />
** [[Publishing tutorial]]<br />
<br />
* [[Middleware transition]]<br />
* [[Compatibility table for EMI2/3 and SL5/6 ]]<br />
<br />
===Middleware - early adoption & testing===<br />
<br />
* [https://www.gridpp.ac.uk/wiki/Staged_rollout Staged Rollout]<br />
* [https://www.gridpp.ac.uk/wiki/IPv6 IPv6 TestBed]<br />
<br />
===VO Support===<br />
<br />
* [[GridPP approved VOs]]<br />
** [[Adoption of Backup GridPP Voms Servers]]<br />
** [[Maintaining GridPP approved VOs]]<br />
** [[Policies for GridPP approved VOs]]<br />
***[https://www.gridpp.ac.uk/wiki/EGI_VO_registration EGI policies and registration ]<br />
** [[VomsSnooper Tools]] - Generate and check your config from the [http://operations-portal.egi.eu/vo Operations portal]<br />
** [[Non LHC VO Status and Plans]]<br />
* [[Current VO Fairshares at T2/T1]]<br />
* [[New VO deployment]] <br />
* [[Instruction for VO administrators]]<br />
* [[VOMS proxy time limited and how to request an extension]]<br />
* [[ATLAS Software Installation]]<br />
* [[CMS in the UK]]<br />
* [[Setting up an Automatic Blacklisting Service]] with ganga<br />
<br />
==== Getting up and running on the grid - site admins ====<br />
<br />
A tool, Instant UI, has been written to set up a User Interface (UI) quickly and easily. Instructions are available here: <br />
<br />
* [[User Interface (UI) to support approved VOs]] (tested on EMI3/SL6)<br />
<br />
For those who wish to "roll their own", a routinely updated tarball of a typical site-info.def/vo.d is available here:<br />
<br />
* http://hep.ph.liv.ac.uk/VomsSnooper/UI_glitecfg.tar<br />
<br />
This setup contains all the [[GridPP approved VOs]] and should be up to date.<br />
<br />
Manual ways to set up a UI are also documented here:<br />
<br />
* [[Installing a UI for Grid Submission]] (tested on EMI3/SL6)<br />
* [[Installing a UI]]<br />
* [[Configuring a UI]]<br />
<br />
==== Getting up and running on the grid - users ====<br />
* [[Grid user crash course]]<br />
* [[Quick Guide to Dirac|A quick guide to DiRAC]]<br />
* [[A quick guide to CVMFS]]<br />
* http://wiki.egee-see.org/index.php/Quick_User_Guide_for_Submitting_Jobs - another take on getting started<br />
<br />
* Job management - managing the lifecycle of jobs<br />
** [[Long running jobs using myproxy]]<br />
** http://www.gridpp.ac.uk/deployment/users/gridsite-admin.cgi?cmd=print&file=myproxy.html - a good explanation of running jobs using myproxy<br />
<br />
** [[Requiring software]]<br />
** [[Requiring Data]]<br />
<br />
==== Getting up and running on the grid - new VOs ====<br />
<br />
* [[Setting up a new Virtual Organisation (VO)]]<br />
* [[Where to get help]]<br />
* [[Setting up the VO]] <br />
* [[Wearing different hats]]<br />
* [[Data Management]]<br />
** [[Using Castor At RAL]]<br />
* [[Job Management]]<br />
<br />
* [[Software Deployment]]<br />
* [[Some simple test jobs]]<br />
* [[KeyTokens]] Hardware keys for storing certificates - use for automated data management.<br />
<br />
* [[Other Useful VO links]]<br />
<br />
===Monitoring===<br />
<br />
* [[ UKI WLCG Regional Nagios]]<br />
* [[ Backup Regional Nagios ]]<br />
* [[Nagios]]<br />
* [[MonAMI]]: [[MonAMI_by_example|MonAMI by example]] tutorial, and how to monitor [[MonAMI_dCache_plugin|dCache]] and [[MonAMI_DPM_plugin|DPM]].<br />
* [[Links_Monitoring_pages]]<br />
* [[Ranked Monitoring pages]]<br />
* [[ATLAS Monitoring For Sites]]<br />
* [[ ROD rota]]<br />
* [[MonitoringTools|Monitoring Tools]]<br />
* [[Monitoring]]<br />
<br />
===Availability===<br />
<br />
* [[SL ATLAS tests]]<br />
* [[Atlas_FCR_Procedure]]<br />
* [[Steve Lloyd SAM availability]] (summary graphs)<br />
* [[EGEE_availability_reports]]<br />
* [[Availability graphs]]<br />
<br />
===Hardware===<br />
<br />
* [[Guidance and recent purchases]]<br />
* [[Tier1 Procurements]]<br />
<br />
== LCG Service Challenges ==<br />
<br />
*[[Service Challenges]]<br />
**[[Service Challenge 4]]<br />
** [[GridPP Answers to 10 Easy Network Questions]]<br />
** [[Security Service Challenges]]<br />
<br />
== Operations Team ==<br />
* Agendas for the meetings will be on Indico here: https://indico.cern.ch/categoryDisplay.py?categId=4592<br />
*[[Operations Team Action items|Action items]]<br />
*[[Deployment Issues|Issues log]]<br />
*[[Discussion and tasks list]]<br />
*[[GDB reports]]<br />
*[[Working with NGS]]<br />
*[[NGS Surgery Reports]]<br />
*[[:Category:UKI Testzone|UKI Testzone]]<br />
*[[EVO Tips and Tricks]]<br />
*[[Resiliency and Disaster Planning]]<br />
*[https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest Bulletin Latest]<br />
Deployment Team pages<br />
<br />
*[[Deployment Team Action items|Action items]]<br />
<br />
== Useful Links ==<br />
* FootPrints HelpDesk: http://helpdesk.grid-support.ac.uk<br />
* [[GridPP blogs]]<br />
<br />
== Test Pages (experimental) ==<br />
[[Separate Site Status Pages]]<br />
<br />
==GridPP Wiki Categories==<br />
* [[:Special:Categories]] (List of all categories) <br />
* [[:Special:Allpages]] (List of all pages) <br />
* [[:Category:Top Level]]<br />
* [[:Category:Glossary]]<br />
<br />
==About GridPPWiki==<br />
<br />
* If you want to add content to this Wiki, then see [http://meta.wikimedia.org/wiki/Help:Editing advice on basic markup]. <br />
* Before you add a section it's worth browsing the names of [[Special:Allpages | all pages in the wiki]] to be sure that someone hasn't started an article with a slightly different name.<br />
* Please use [[Special:Categories | wiki categories]] to help organise the information in the wiki. You can make categories hierarchical by [http://meta.wikimedia.org/wiki/Help:Category#Subcategories adding the category] to another category (a good example is that [[:Category:File Catalog]] is a subcategory of [[:Category:Data Management]] which is a subcategory of [[:Category:Grid Middleware]] which is a subcategory of [[:Category:Top Level]]). General [http://meta.wikimedia.org/wiki/Help:Category Documentation] on categories is useful.<br />
<br />
__NOTOC__</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Operations_Bulletin_LatestOperations Bulletin Latest2014-05-20T09:26:08Z<p>Kashif Mohammad 6ae08fa8ff: /* */</p>
<hr />
<div>[[Operations_Bulletin_Archive|Bulletin archive]]<br />
<br />
__NOTOC__<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 19th May 2014<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Task Areas<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start General****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | General updates <br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start General text*********************** -----><br />
'''Monday 19th May'''<br />
* David C has put together a [http://gridpp-monitoring.blogspot.co.uk blog on monitoring]. Who can/will contribute content?<br />
* HEPiX takes place this week (19th-23rd May) and talks are available from [https://indico.cern.ch/event/274555/timetable/#all.detailed the event page]. Monday covered some site reports and OS-related updates. Tuesday's focus is batch systems. Wednesday covers IPv6, security and benchmarking. Thursday covers storage, monitoring and infrastructure deployment. Friday is cloud day.<br />
* The [https://indico.egi.eu/indico/conferenceTimeTable.py?confId=1994#all.detailed EGI Community Forum] takes place this week in Helsinki. There are talks/tracks covering: Helix-nebula; earth sciences; CSIRT (focus on clouds); tools updates (incl. GOCDB and APEL); Lifesciences; data preservation; vulnerability handling; sustainability; federated clouds; DiRAC; data management ... and of course H2020!<br />
<br />
'''Monday 12th May'''<br />
* The PMB discussed the issues raised at last week's ops meeting regarding LHCONE and reiterated that the UK position is that we do not need to join LHCONE, though the technical issue of whether a peering point is possible is being investigated by JANET. The UK position will only change if there is a demonstrable need in this area and the experiments formally request it.<br />
* There is a [http://indico.cern.ch/event/272787/ pre-GDB this week on Data Access]. This includes experiment plans in the area but also a review of the recent workshop (looking at monitoring: what data needs to be kept, since there is already 1TB; cost models etc.).<br />
* There is a [http://indico.cern.ch/event/272621/ GDB on Wednesday 14th May]. It consists mainly of update reports in areas such as configuration management and operations coordination.<br />
* Still waiting on some availability/reliability 'explanations' from last week. How many sites struggle to get useful information from the SAM results?<br />
<br />
<br />
<!-- **********************End General text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End General****************** -----><br />
<!-- ****************Start ops coord****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start ops coord text*********************** -----><br />
'''Thursday 15th May'''<br />
* There was a middleware readiness [http://indico.cern.ch/event/314807/ meeting last Thursday]. Most updates are appearing in the [https://twiki.cern.ch/twiki/bin/view/LCG/MiddlewareReadiness twiki].<br />
* Focus has been on the volunteer sites and getting a testing process in place, starting from what currently exists/happens at each site.<br />
* There was also a look at how the middleware baselines information is referenced and used/applied at the T0 and T1s.<br />
* Some discussion of a [https://indico.cern.ch/event/314807/material/slides/1.pdf proposal] on how to monitor installed middleware packages. <br />
* Discussion mainly about middleware packages vs RPMs and defining what is up-to-date from the results. <br />
* Tests will be carried out with volunteer sites and Pakiti used as a possible way forward.<br />
<br />
<br />
'''Monday 12th May'''<br />
* There was a WLCG operations coordination meeting last [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140508 Thursday 8th May]. The next meeting will be on 22nd May ([http://indico.cern.ch/event/313440/ a preliminary agenda] is available).<br />
* Updates from the 8th May:<br />
<br />
* Alastair Dewhurst replaces Simone Campana in the IPv6 task force<br />
* Future support for ARGUS is being reviewed. SWITCH will support it for another 6 months.<br />
* There is to be a new task force (or working group) on network and transfer metrics. The proposed mandate is to identify and publish the metrics, make sure that issues can be better understood and fixed, and enable use of network-aware tools. <br />
* A reminder to [https://indico.cern.ch/event/305362/ register] for the WLCG workshop.<br />
* Security support for EMI-2 ended on April 30th, all baseline versions increased to EMI-3 except for dCache for which support was extended. <br />
* The [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG baselines page] is up-to-date. Please check it.<br />
<br />
* No significant job efficiency differences between CERN Geneva and Wigner (i.e. depending on location) have been found. Still following up on several possibilities (see the [https://indico.cern.ch/event/302033/contribution/3/material/slides/0.pdf MB presentation]).<br />
* T0 WMSes (except SAM instances) now powered off.<br />
* DPM 1.8.8 has been released to EPEL-stable.<br />
* A series of storage developer meetings is taking place with the aim of ensuring consistent, complete and correct publishing of storage systems to GLUE2, in particular relating to capacity publishing.<br />
<br />
* ALICE: Activity ahead of Quark Matter 2014 (May 19-24, GSI Darmstadt) <br />
* ATLAS: MC - lower activity in the last two weeks. Rucio stress test planned to start after 20th of May. Multi core allocation - sites asked to reduce the multi-core partition in case of static single-core/multi-core allocation.<br />
* CMS: SAM test for glexec goes critical on May 15th. Reminder to sites to please deploy detailed xrootd monitoring. Started to send production workflows through mixture of multi-core and single-core pilots. FTS3 for Phedex Debug transfers becoming mandatory now.<br />
* Some discussion on strategy for availability recalculation in case of failures or timeouts in submission of SAM jobs through gliteWMS, which do not necessarily affect production jobs.<br />
* LHCb: Incremental stripping campaign finished, all productions closed. CASTOR->EOS migration of LHCb user data finished.<br />
<br />
* Tracking tools: GGUS proposal to stop ticket creation through email. <br />
* FTS3: CERN prod instance has been upgraded to the latest stable version 3.2.22. RAL on Wednesday 14th May.<br />
* glexec: Only one change. See [https://twiki.cern.ch/twiki/bin/view/LCG/GlexecDeploymentTracking deployment tracking page]. (UK= UCL, Lancs, ECDF).<br />
* Machine/Job features: development and soon deployment of a machine/job features service for a cloud infrastructure. The mfj.py client will soon not need to be deployed, as the LHC VOs plan to bring it into their s/w stack.<br />
* M/W readiness: Checking usage of baselines page. [http://indico.cern.ch/event/314807/ Next meeting Thursday] 15th @ 09:30 UK time.<br />
* Multicore: start to evaluate the compatibility of ATLAS and CMS approaches to submitting multicore jobs to shared sites.<br />
* SHA-2: New issue found for CERN VOMS - job submission to CREAM fails when the proxy is signed by a VOMS server with a SHA512 host certificate. Fix out soon. Sites will then need to update CEs. CMS has found no blocking issues with RFC proxies.<br />
* WMS decom: CERN WMS instances for experiments were switched off on May 5.<br />
* IPv6: HEPiX IPv6 meeting last week. Also, lxplus-ipv6.cern.ch, an lxplus instance with dual-stack connectivity now available.<br />
* HTTP proxy discovery: Waiting on full implementation of the SquidMonitoringTaskForce recommendations - then sites will need to register squids. <br />
<br />
<br />
<br />
'''Tuesday 6th May'''<br />
* There will be a WLCG ops coordination meeting this [https://indico.cern.ch/event/313378/ Thursday 8th May]. Pre-meeting reports can be found in the [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes140508 twiki].<br />
<br />
<br />
<!-- **********************End ops coord text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End ops coord****************** -----><br />
<br />
<!-- ****************Start T1****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-1 - [http://www.gridpp.rl.ac.uk/status/ Status Page]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- *********************************************************** -----><br />
<!-- ***********************Start T1 text*********************** -----><br />
'''Tuesday 20th May'''<br />
* Testing CVMFS client 2.1.19 ongoing.<br />
* In the process of scheduling the Castor 2.1.14 upgrade (now likely to be 10th June for the nameserver, with stagers in the weeks after that).<br />
* We are looking at how to end the FTS2 service, now that FTS3 is becoming widely used.<br />
* The software server used by the small VOs will be withdrawn from service (aiming for June).<br />
<!-- **********************End T1 text************************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End T1****************** -----><br />
<br />
<!-- ****************Start Storage & DM****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Storage & Data Management - [http://storage.esc.rl.ac.uk/weekly/ Agendas/Minutes]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
====== ======<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 6th May'''<br />
* There was a DPM collaboration meeting last Wednesday. <br />
* The following priorities were agreed for the next year: <br />
** YAIM->Puppet transition (YAIM support ends this year); <br />
** I/O Monitoring; GridFTP redirection - available now for testing; <br />
** Admin interface and improved HTTP file management; <br />
** Nightly testing of WAN HTTP access performance, Hammercloud;<br />
** Removal of legacy components where possible (eg RFIO); <br />
** System logging via dmlite; <br />
** Rebalancing utilities; <br />
** and move of web presence and docs to an indexable Drupal site.<br />
<br />
'''Tuesday 22nd April'''<br />
* A DPM collaboration meeting is being planned for the coming week(s). Are there any site comments or feedback on DPM as a product (e.g. speed of new feature development) and the support it receives?<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- ************************************************************ -----><br />
|}<br />
<br />
<!-- ****************End Storage & DM****************** -----><br />
<!-- ****************Start Accounting****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Accounting - [http://pprc.qmul.ac.uk/~lloyd/gridpp/metrics.html UK Grid Metrics] [[HEPSPEC06]] [http://tinyurl.com/cfbfanf Atlas Dashboard HS06]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 20th May'''<br />
* Sites with APEL '[http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=3&endYear=2014&endMonth=5&yRange=SUBREGION&xRange=VO&voGroup=top10&chart=GRBAR&scale=LIN&localJobs=onlygridjobs delays]': IC, Liverpool, Sheffield, Durham, ECDF and Glasgow.<br />
<br />
'''Tuesday 13th May'''<br />
* Will review GridPP metrics soon. Trying to get table up-to-date first.<br />
* No HEPSPEC06 wiki updates showing SL6 results for UCL or RALPP.<br />
* ATLAS HS06 coefficient for Lancaster 13.9?<br />
* APEL publishing 'stopped' for Liverpool, ECDF and Glasgow.<br />
<br />
<br />
'''Tuesday 29th April'''<br />
* Glasgow looks slightly delayed with recent accounting data publishing.<br />
<br />
'''Tuesday 15th April'''<br />
* The APEL accounting system has been undergoing database maintenance to improve performance and reliability. Networking problems at the RAL site have delayed completion of the operation. Sites may see nagios alerts warning them that they have not published accounting data for 7 days - these will stop after the maintenance work completes. <br />
<br />
* Check publishing via: http://gstat2.grid.sinica.edu.tw/gstat/summary/Country/UK/ <br />
<br />
<!-- *****************Edit stop*********************** -----><br />
<!-- *********************************************************** -----><br />
|}<br />
<!-- ****************End Accounting****************** -----><br />
<!-- ****************Start Documentation****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- *********************************************************** -----><br />
<!-- ******************Edit start********************* -----><br />
<br />
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.<br />
<br />
'''Tuesday 6th April'''<br />
* KeyDocs are going to be reviewed (in the next 4 weeks) as the system is not working (or not adding anything) in some areas.<br />
<br />
'''Tuesday 15th April'''<br />
* There was an extended TB-SUPPORT thread on testing out https://www.gridpp.ac.uk/wiki/Grid_Certificate. Summary and conclusions?<br />
* Ops changes recorded to https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs. <br />
<br />
'''Tuesday 1st April'''<br />
* [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed Keydocs] action needed by Jens J; Rob H/Security T; Alessandra F; Wahid B; David C and Matt D.<br />
* We need to reassign Mark M's documents on Core Grid Services.<br />
<br />
<br />
'''Tuesday 18th March'''<br />
* Keydocs action needed by: Mark M; Jens J; Rob H/Security T; Alessandra F; Wahid B; David C and Matt D.<br />
<br />
|}<br />
<br />
<!-- ****************End Documentation****************** -----><br />
<!-- ****************Start Interop****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday May 6th'''<br />
* There was a meeting yesterday; the agenda is here: https://wiki.egi.eu/wiki/Agenda-05-05-2014<br />
* Two things to pull out:<br />
** UMD 3 EA list:<br />
*** John Gordon was in touch with Cristina Aiftimiei and noted that in his opinion the UK sites listed as UMD-1/2 would probably still be taking part for UMD-3 (not least because UMD-1/2 are effectively no longer extant). <br />
*** In the agenda is a list of a few UK sites that haven't confirmed their contacts with Joao Pina; please have a look and get back to him - all he's looking for is a note to make sure that the contact list is up to date.<br />
** EMI-2 decommissioning<br />
*** Update spreadsheet on progress as of 24th April: http://goo.gl/vY6Mtm (I note that several of the UK mentions in this list have had updates since then)<br />
*** Overview slides [pdf]: https://indico.egi.eu/indico/getFile.py/access?contribId=9&resId=2&materialId=slides&confId=2162<br />
*** Plan is that Cristina will take stock at the end of this week and contact NGIs with outstanding upgrades for an idea of their plans.<br />
* Also discussed was the migration of the central SAM services and the reconfiguration of NGI SAM instances.<br />
* Next meeting June 2<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<!-- *********************************************************** -----><br />
<br />
|}<br />
<!-- ****************End Interop****************** -----><br />
<!-- ****************Start Monitoring****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Monitoring - [https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages Links] [http://grid-monitoring.cern.ch/mywlcg/ MyWLCG]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 6th May'''<br />
* Next meeting this Friday on MVC and caching: <br />
** Agenda: https://indico.cern.ch/event/316273/<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Monitoring****************** -----><br />
<!-- ****************On-duty****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | On-duty - [https://operations-portal.in2p3.fr/dashboard Dashboard] [https://www.gridpp.ac.uk/wiki/ROD_rota ROD rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Monday 12th May'''<br />
* Problems with dashboard<br />
* Issue with UCL availability ticket<br />
<br />
* EGI identified EMI/UMD-2 endpoints at:<br />
** UCL - DPM, WNs, BDII, CE<br />
** Durham - CE<br />
** ECDF - CE, info3<br />
** Sussex - CE, BDII<br />
** Bristol - CEs<br />
<br />
'''Tuesday 6th May'''<br />
* One ticket expiry dealt with promptly.<br />
* A number of the "EMI-3" tickets have now been closed - there has been good progress. However, some do remain.<br />
* UCL ticket about low availability. The cause has been fixed. It is expected to stay open until their availability has risen to an acceptable level again.<br />
* Very slow refresh of the Nagios test results as seen on the ROD dashboard. In some cases the dashboard still showed test result states for the previous day. Use the gridppnagios display to see the 'real' state of any given Nagios test.<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Rollout [http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html Status] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions WLCG Baseline]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 18th March'''<br />
* The EMI-2 decommissioning task has started.<br />
* The [http://indico.cern.ch/event/MW-Readiness_3 next WLCG middleware readiness WG meeting] takes place this afternoon at 13:30 UK time. <br />
<br />
'''Tuesday 11th February'''<br />
* 31st May has been set as the deadline for EMI-2 decommissioning. There may be an issue for dCache (related to 3rd party/enstore component).<br />
<br />
'''References'''<br />
<br />
* The Staged Rollout pages (now separated into EMI1 & 2) and the page listing the deployed versions are generated from information in the BDII, so they should all be reasonably up to date:<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html<br />
* http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End On-duty****************** -----><br />
<!-- ****************Start Security****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
'''Tuesday 29th April'''<br />
* The changes to the regional dashboard make the on-duty task harder. Need to rely on Pakiti again.<br />
<br />
'''Tuesday 15th April'''<br />
* Update on the OpenSSL status.<br />
* The discussion list members have been updated. Anyone missing?<br />
<br />
<br />
|}<br />
<!-- ****************End Security****************** -----><br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start Services****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).<br />
<br />
'''Tuesday 13th May'''<br />
* Ewan's gridpp VO membership expired without warning. Does the expiry notification only go to the VO admin for VOs on the [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]?<br />
<br />
'''Tuesday 29th April'''<br />
* It was mentioned several weeks ago that the perfsonar meshes were being sorted by host name and that sorting by site name would be available soon. This is now the case. You can see the familiar GridPP site sorting [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi?dashboard=UK%20sites here] and the [http://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=WLCG%20sites large WLCG mesh here]. Note the square of GridPP sites towards the bottom right. Red squares represent throughput of less than 500 Mb/s.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Services****************** -----><br />
<!-- ****************Start Tickets****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tickets<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tickets****************** -----><br />
<!-- ****************Start Tools****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tools - [https://gridppnagios.physics.ox.ac.uk/myegi MyEGI] [https://gridppnagios.physics.ox.ac.uk/nagios/ Nagios]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== ===== <br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 20th May'''<br />
* The central MyEGI service has moved from http://grid-monitoring.cern.ch/myegi to https://mon.egi.eu/myegi/ . Please visit the new portal to check availability/reliability figures.<br />
* There was an issue with the central message broker and some results have been lost as a result. Mail from the central SAM team:<br />
''Between May 1st and May 12th, SAM-CENTRAL and the Message Broker Network have experienced a set of chained failures that resulted in the loss of a large portion of the metric results that were published by the SAM NGI Instances. The loss of these messages will result in an unusually high number of UNKNOWNS in the May A/R reports, but the actual A/R numbers will not be affected as UNKNOWNS are not taken into account. No other services have been affected.''<br />
<br />
'''Tuesday 13th May'''<br />
* Following last week's discussion, DIRAC now supports: NA62, vo.landslides.mossaic.org, t2k.org, snoplus, gridpp, CERN@school and northgrid. NA62 are moving from LFC to DFC and plan to use DIRAC in place of the WMS.<br />
<br />
'''Monday 17th March'''<br />
* The [[Quick_Guide_to_Dirac|GridPP DIRAC]] service is now able to submit jobs to VMs on [http://www.gridpp.ac.uk/vac/ Vac] sites. [[Vac configuration for GridPP DIRAC]] explains how to configure a Vac site to run GridPP DIRAC jobs in VMs. More volunteer sites would be useful.<br />
* EGI has published a new [https://documents.egi.eu/document/2069 roadmap for operations tools].<br />
<br />
'''Tuesday 26th November'''<br />
* Regional [https://gridppnagios.lancs.ac.uk/nagios Nagios] updated to release 22. This is a gLite-to-UMD update and it required a fresh installation.<br />
* There have been some internal changes in SAM-Nagios. Test probes are now the responsibility of the product teams, and some test names have changed as a result of this reorganization. For example, the org.sam.CREAMCE-DirectJobSubmit test has become emi.cream.CREAMCE-DirectJobSubmit. This does not affect operational activities. <br />
* Could all site admins please look at the services associated with their site and mail Kashif if anything odd is noticed. Site admins can reschedule tests for their sites, and it would be helpful if most of the functionality is tested.<br />
* Also look at [https://gridppnagios.lancs.ac.uk/myegi/ myegi], which has useful links to the Dashboard, GSTAT, Accounting Portal and GGUS. <br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Tools****************** -----><br />
<!-- ****************Start VOs****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Tuesday 15th April'''<br />
* Is there interest in an FTS3 web front end? ([http://indico.cern.ch/event/272620/contribution/11/material/slides/1.pdf more details])<br />
<br />
'''Monday 17 February 2014'''<br />
* Proxy renewal<br />
** All RAL WMSs now renew proxies with 1024 bits. This looks like the end of this (at last). <br />
<br />
<br />
'''Tuesday 11 February 2014'''<br />
* Proxy renewal<br />
** lcgwms06 at RAL has been upgraded and works<br />
** Both Imperial's WMSs work<br />
** Glasgow's will still need to be upgraded (unless they have been since Friday). <br />
<br />
* Impact<br />
** Citation policy (https://www.gridpp.ac.uk/acknowledging.html)<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End VOs****************** -----><br />
<!-- ****************Start Sites****************** -----><br />
<br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"<br />
|-<br />
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Site Updates<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
'''Tuesday 13th May'''<br />
* QMUL - some issues with CPU allocation times based on ATLAS JDL.<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Sites****************** -----><br />
|}<br />
<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Meeting Summaries<br />
|}<br />
<br />
{| width="100%" cellspacing="0" cellpadding="0"<br />
<br />
|-<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ******************************************* -----><br />
<!-- ****************Start PMB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Project Management Board - [http://www.gridpp.ac.uk/pmb/ Members] [http://www.gridpp.ac.uk/php/pmb/minutes.php Minutes] [http://www.gridpp.ac.uk/pmb/ProjectManagement/QuarterlyReports/reports.html Quarterly Reports]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ******************************************* -----><br />
<!-- ****************End PMB****************** -----><br />
<!-- ****************Start Grid Ops****************** -----><br />
<!-- ******************************************* -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | GridPP ops meeting - [http://indico.cern.ch/categoryDisplay.py?categId=338 Agendas] [https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items Actions] [https://www.gridpp.ac.uk/wiki/Category:GridPP_Operations Core Tasks]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Grid Ops****************** -----><br />
<!-- ****************Start T1 liaison****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) [https://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting Agenda] Meeting takes place on Vidyo.<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
'''Wednesday 14th May 2014'''<br />
* [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2014-05-14 Operations report]<br />
* Ongoing testing of CVMFS client 2.1.19. So far so good<br />
* In process of scheduling Castor 2.1.14 upgrade. Proposed date for Nameserver upgrade now changed to Tuesday 10th June.<br />
* As stated last week we are proposing to turn off the CREAM CEs. We are also starting to plan to end the FTS2 service.<br />
* Reminder: The software server used by the small VOs will be withdrawn from service (aiming for June).<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Liaison****************** -----><br />
<!-- ****************Start GDB****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Grid Deployment Board - [http://indico.cern.ch/categoryDisplay.py?categId=3l181 Agendas] [http://indico.cern.ch/categoryDisplay.py?categId=666 MB agendas]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End GDB****************** -----><br />
<br />
<!-- ******************************************** -----><br />
<!-- ******************COLUMN 2****************** -----><br />
<!-- ******************************************** -----><br />
<br />
<br />
| width="50%" style="vertical-align: top;" |<br />
<!-- ****************Start NGI****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | NGI UK - [http://www.ukngi.ac.uk/ Homepage] [https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index CA]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End NGI****************** -----><br />
<br />
<!-- ****************Start Events****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Events<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************Stop Events****************** -----><br />
<!-- ****************Start ATLAS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK ATLAS - [http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?#currentView=Shifter+view&highlight=false Shifter view] [http://www.atlas.ac.uk/ops/ News & Links]<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End ATLAS****************** -----><br />
<!-- ****************Start CMS****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK CMS<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End CMS****************** -----><br />
<!-- ****************Start LHCb****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK LHCb<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
Empty<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End LHCb****************** -----><br />
<!-- ****************Start Other****************** -----><br />
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0em 0em 0.3em;"<br />
|-<br />
| style="background-color: #f8f9ca; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | UK OTHER<br />
|-<br />
| style="padding: 0.4em 0.4em 0.4em 0.4em; line-height:0.5em;" |<br />
<br />
===== =====<br />
<!-- ******************Edit start********************* -----><br />
<br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
|}<br />
<!-- ****************End Other****************** -----><br />
<!-- ****************Start Requests****************** -----><br />
{| cellspacing="0" cellpadding="0" style="margin: 1em 0em 0em 0.3em; width:100%"<br />
| style="width:50%; vertical-align:top; border:1px solid Gold; background-color: LightGreen;" rowspan="2"|<br />
<div style="border-bottom:1px solid Gold; background-color:#ffffaa; padding:0.2em 0.5em 0.2em 0.5em; font-size:110%; font-weight:bold;"> To note</div><br />
<div style="padding:0.4em 1em 0.3em 1em;"><br />
<br />
==== ====<br />
<!-- ******************Edit start********************* -----><br />
* N/A<br />
<br />
<!-- ******************Edit stop********************* -----><br />
<br />
</div><br />
|}<br />
<br />
<!-- ****************End Requests****************** -----><br />
<br />
|}</div>Kashif Mohammad 6ae08fa8ffhttps://www.gridpp.ac.uk/wiki/Batch_system_statusBatch system status2014-05-06T10:09:38Z<p>Kashif Mohammad 6ae08fa8ff: /* Sites batch system status */</p>
<hr />
<div>== Other links ==<br />
<br />
* [https://twiki.cern.ch/twiki/bin/view/LCG/BatchSystemComparison Batch System Comparison Table]<br />
<br />
== Sites batch system status == <br />
<br />
This page has been set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table seeks the following:<br />
<br />
1) Current product (local/shared) - What is the current batch system at the site? Is it locally managed or shared with other groups?<br />
<br />
2) Concerns - Has your site experienced any problems with the batch system in operation?<br />
<br />
3) Interest/Investigating/Testing - Does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?<br />
<br />
4) CE type(s) - What CE type (gLite, ARC...) do you currently run and do you plan to change this, perhaps in conjunction with a batch system move?<br />
<br />
5) Cloud interface(s)? - Does your site offer access to resources in ways other than via a CE? <br />
<br />
6) Notes - Any other information you wish to share on this topic.<br />
<br />
<br />
<br />
<br />
{| border="1" cellpadding="1"<br />
|+<br />
<br />
|-style="background:#7C8AAF;color:white"<br />
|Site<br />
|Current product (local/shared)<br />
|Concerns and observations<br />
|Interest/Investigating/Testing<br />
|CE type(s) & plans at site<br />
|Cloud interface available/plans<br />
|Notes<br />
<br />
|-<br />
|RAL Tier-1<br />
|<span style="color:green">HTCondor (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason to change</span><br />
|<span style="color:green">ARC & CREAM CEs, but would like to decommission CREAM CEs eventually</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
|-<br />
|UKI-LT2-Brunel<br />
|<span style="color:green">Torque/Maui</span><br />
|<span style="color:green">No support for Torque/Maui</span><br />
|<span style="color:green">Slurm and HTCondor in test</span><br />
|<span style="color:green">ARC in test</span><br />
|<span style="color:green">OpenVZ in production, Docker in test</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-IC-HEP<br />
|<span style="color:green">Gridengine</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">CREAM, ARC</span><br />
|<span style="color:green">GridPP Cloud Tests</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-LT2-QMUL<br />
|<span style="color:green">Gridengine (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">Son of Gridengine</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green">scalable solution to get our storage usable in the cloud</span><br />
|<br />
<br />
|-<br />
|UKI-LT2-RHUL<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-LT2-UCL-HEP<br />
|<span style="color:green"></span><br />
|<span style="color:green"></span><br />
|<span style="color:green"></span><br />
|<span style="color:green"></span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-LANCS-HEP<br />
|<span style="color:green">Son of Gridengine (HEC), Torque/Maui (local)</span><br />
|<span style="color:green">Disillusioned with Torque/Maui.</span><br />
|<span style="color:green">Slurm or HTCondor.</span><br />
|<span style="color:green">CREAM, interested in ARC</span><br />
|<span style="color:green">VMWare testing.</span><br />
|<br />
<br />
|-<br />
|UKI-NORTHGRID-LIV-HEP<br />
|<span style="color:green">Torque/Maui</span><br />
|<span style="color:green">Poor support; Maui intrinsically broken</span><br />
|<span style="color:green">Slurm (Condor?)</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green">None</span><br />
|<br />
<br />
|-<br />
|UKI-NORTHGRID-MAN-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Maui is unsupported. It had memory leaks. Robert wrote a patch, but there was nowhere to feed it back to.</span><br />
|<span style="color:green">Slurm</span><br />
|<span style="color:green">Currently CreamCE, investigating ARC-CE</span><br />
|<span style="color:green">Vac in production on testbed</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-NORTHGRID-SHEF-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-DURHAM<br />
|<span style="color:green">Torque/Maui - Local</span><br />
|<span style="color:green">Becomes unresponsive and unstable. Doesn't behave particularly well if it loses nodes.</span><br />
|<span style="color:green">SLURM</span><br />
|<span style="color:green">Currently CreamCE, would like to use ARC as a replacement</span><br />
|<span style="color:green">N/A</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-ECDF<br />
|<span style="color:green">Gridengine</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No plans to change</span><br />
|<span style="color:green">Cream CE for standard production, ARC CE for exploratory HPC work</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SCOTGRID-GLASGOW<br />
|<span style="color:green">Torque/Maui - Local</span><br />
|<span style="color:green">Becomes unresponsive at times of high load or when nodes are uncontactable.</span><br />
|<span style="color:green">Investigating HTCondor/SoGE/SLURM as a replacement.</span><br />
|<span style="color:green">Currently CreamCE, investigating ARC CE as replacement.</span><br />
|<span style="color:green">N/A</span><br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-BHAM-HEP<br />
|<span style="color:green">Torque/Maui</span><br />
|<span style="color:green">Maui sometimes fails to see new jobs and so nothing is scheduled</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-BRIS<br />
|<span style="color:green">HTCondor (shared), Torque/Maui (local)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason to change</span><br />
|<span style="color:green">ARC & CREAM CEs</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-CAM-HEP<br />
|<span style="color:green">Torque/Maui (local)</span><br />
|<span style="color:green">Torque/Maui support non-existent</span><br />
|<span style="color:green">Will follow the consensus</span><br />
|<span style="color:green">CREAM CE</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
|-<br />
|UKI-SOUTHGRID-OX-HEP<br />
|<span style="color:green">Torque/Maui</span><br />
|<span style="color:green">Becomes unresponsive and unstable.</span><br />
|<span style="color:green">Moved 1/3 of WNs to HTCondor</span><br />
|<span style="color:green">CREAMCE, ARC CE in production</span><br />
|<span style="color:green">OpenStack in production. Testing VAC</span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-RALPP<br />
|<span style="color:green">HTCondor (Legacy Torque/Maui will be switched off soon)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">None, just migrated from Torque/Maui</span><br />
|<span style="color:green">ArcCE (Legacy CreamCEs will be switched off soon)</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|-<br />
|UKI-SOUTHGRID-SUSX<br />
|<span style="color:green">(Shared) Gridengine - (Univa Grid Engine)</span><br />
|<span style="color:green">None</span><br />
|<span style="color:green">No reason to change</span><br />
|<span style="color:green">CREAMCE</span><br />
|<span style="color:green"></span><br />
|<br />
<br />
<br />
|}</div>Kashif Mohammad 6ae08fa8ff