Difference between revisions of "Operations Bulletin Latest"

From GridPP Wiki
 
(2,851 intermediate revisions by 40 users not shown)
Line 5: Line 5:
 
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
 
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
 
|-
 
|-
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing 16th March 2015
+
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Week commencing Monday 25th February 2019
 
|}
 
|}
  
Line 27: Line 27:
 
====== ======
 
====== ======
 
<!-- *********************************************************** ----->
 
<!-- *********************************************************** ----->
<!-- ***********************Start General text*********************** ----->'''
+
'''Tuesday 18th June'''
'''Tuesday 10th March'''
+
* DPM Workshop last week - come to this week's storage meeting for an in-depth look.
* CERN [http://cern.ch/Computing.Seminars computing seminars].
+
** https://indico.cern.ch/event/776832/
* Status of WNs in CVMFS
+
* DIRAC downtime this week due to the move to Slough - good luck!
* Suggestion to write actions to Vidyo chat window
+
* EGI Ops meeting this week.
* Problem with dashboards – following announced GOCDB changes last week.
+
** https://wiki.egi.eu/wiki/Agenda-2019-06-17
* cvmfs 2.1.20 is currently being certified at CERN and RAL. A small update to cvmfs-puppet is required to deploy cvmfs 2.1.20 clients.
+
** HTCondorCE commissioning ongoing, SRM decommissioning survey
* A new version of the CA Portal was released last week. It supports email address updates.
+
** State of SRM usage at DPM sites?
* JK requests those with host certificates with embedded email addresses to phase them out.
+
* CERN accounts: Note that the WLCG office is acting as CERN guarantor for everyone, so by default everyone will be "LCG"'d in the database (new HR rules mean Externals automatically take on the organic unit of the guarantor).
+
* RAL T1: Dell Force10 Z9000 Firmware update. Was on version 9.2.0.0 running on BIOS 3.0.0.3 and boot code 3.0.1.1. Testing version is 9.6.0.0P3 which runs on the same BIOS but requires boot code 3.0.1.4.
+
* For LIGO, Rob Quick informs JC that OSG has recently re-engaged with the VO and will update us on their VOMS situation in the next week.
+
* ARC CONDOR CPU time fix - see Steve Jones's TB-S email on 3rd March. (related to ATLAS submitting some longer running Multi-Core jobs which get killed with the original configuration?)
+
* A reminder that on Tuesday 10th there is a pre-GDB: [https://indico.cern.ch/event/319819/ agenda]. AM: HEP and other sciences. PM: Cloud issues.
+
* Wednesday 11th there is a GDB: [https://indico.cern.ch/event/319745/timetable/#20150311.detailed Agenda] covering EGI/WLCG work, European procurements, WLCG operational costs and supporting other sciences.
+
  
  
 +
'''Tuesday 11th June'''
  
 +
* Technical Meeting last week about the New JSON based Information System:  https://indico.cern.ch/event/821105/
 +
* This week we will get round to looking at the outcome of the Security Day (and HEPSYSMAN).
 +
* The DPM Workshop is this week: https://indico.cern.ch/event/776832/ There's a Vidyo Room planned for people to listen in.
 +
* CentOS7 Migration https://twiki.cern.ch/twiki/bin/view/AtlasComputing/CentOS7Deployment
 +
<!-- ***********************Start General text*********************** ----->'''
 +
'''Tuesday 4th June 2019'''
 +
* The Security Day  + HEPSYSMAN was on t'other week: https://indico.cern.ch/event/721692/
 +
* Please can sites review their GOCDB information: https://ggus.eu/?mode=ticket_info&ticket_id=141296
 +
* iris.ac.uk VO - Andrew explained this.
 +
* New(-ish) HEPOSLib release - 7.2.9 https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineTable
 +
* Gareth's query about the WS interface to ARC on TB-SUPPORT
 +
* Anything else?
  
  
'''Tuesday 3rd March'''
 
* There was an [https://indico.egi.eu/indico/conferenceDisplay.py?confId=2373 EGI OMB last Thursday]. Highlights for awareness:
 
** [https://wiki.egi.eu/wiki/Core_EGI_Activities EGI core activities] to be reviewed for next phase.
 
** EGI provides a catch-all CA; Request to register iRODS endpoints in GOCDB; [http://www.eubrazilcloudconnect.eu EU Brazil Cloud Connect mentioned]; May [http://go.egi.eu/reg2015 conference registration] open. Next OMB 26th March.
 
** [http://accounting-devel.egi.eu/show.php?SubRegion=1.65&query=sum_normcpu&startYear=2015&startMonth=1&endYear=2015&endMonth=2&yrange=SITE&xrange=NUMBER+PROCESSORS&groupVO=all&chart=GRBAR&scale=LIN&localJobs=onlygridjobs Check the status of multi-core publishing].
 
** New release of ops portal coming 16th March. See [https://wiki.egi.eu/wiki/Operations_Portal_Release_Schedule Schedule] and [http://operations-portal.egi.eu/home/tasksList/release_id/10 version 3.1.2 changes].
 
** New [https://wiki.egi.eu/wiki/Performance  performance documents]. QoS; ROD; Availability/Reliability.
 
** Security: FedCloud survey done. EGI-CSIRT collaborating with FedCloud. Security challenges coming - communications one done; next pull in payload and assess capabilities. Also looking at VM incident handling.
 
** [https://indico.egi.eu/indico/getFile.py/access?contribId=3&resId=0&materialId=slides&confId=2373 ARGO monitoring]: Modular architecture. Example deployment models given for EGI/VO/NGI. Uses message brokering - between collectors and review engine. Listed advantages over ACE. Problems with SAM Nagios given (e.g. effort to upgrade, support). ARGO is SAM refactored (no heavy DB). -> Need a UK view on this as being pushed for EGI-Engage.
 
** VO SLAs: to clarify expectations. Some updates including FedCloud inclusion.
 
** EGI Engagement ([http://go.egi.eu/engagementstrategy  strategy]): Call for unfunded partners to join core competency centres: BBMRI, DARIAH, EISCAT_3D, ELIXIR, EPOS, INSTRUCT, LifeWatch & Disaster mitigation.
 
** New infrastructures in preparatory project phase: MIRRI – microbial resources; EuroDISH – food, nutrition, diseases; ISBE – systems biology. Opportunities for national engagement: EMBRC: Marine biological resource centre (... + UK); ERINHA: Highly Pathogenic Agents (... + UK); Euro-BioImaging (... + UK & EMBL). INSTRUCT: Structural biology (many countries). CLARIN (call out).
 
** [https://wiki.egi.eu/wiki/VT_GAPF Genome analysis & protein folding pilot ongoing]: Chipster, RSAT, READemption.
 
** Ongoing integration of ELIXIR reference datasets into EGI. Several use-cases for [http://go.egi.eu/cloud FedCloud] mentioned (any more sites want to join?)
 
 
* Preliminary agendas are available for the [https://indico.cern.ch/event/319819/ pre-GDB] and [https://indico.cern.ch/event/319745/timetable/#20150311.detailed GDB].
 
* The deadline for CHEP registrations is 15th March. ([http://chep2015.kek.jp Bulletins])
 
 
'''Tuesday 24th February'''
 
* [https://indico.cern.ch/event/348657/ CERN VM users workshop - 3rd-5th March].
 
* Registration needed for March: [https://indico.cern.ch/event/319819/ pre-GDB] (HEP & Other sciences + Cloud issue) and [https://indico.cern.ch/event/319745/overview GDB]
 
*  A downtime of the UK eScience CA Services on Wednesday 4th March from around 08:30 till about 10:00 has been announced. They should be considered "at risk" until the afternoon.
 
* ATLAS offline software releases in AGIS were discussed last week (Update on Lancaster?)
 
* [http://www.gridpp.ac.uk/gridpp34/index.html GridPP34 registration] is now open.
 
* Tom is helping to kick-start the (supported) [https://www.gridpp.ac.uk/wiki/VO_Cleanup_Campaign VO clean-up] which is overdue.
 
* [https://espace2013.cern.ch/WLCG-document-repository/ReliabilityAvailability/2015/january-15/ WLCG final T2 A/R figures for January 2015] published.
 
* Elena prepared a bulletin summary of the ATLAS s/w & computing week - see lower down this page.
 
* A reminder that EGI 2015 (Lisbon) has an [http://conf2015.egi.eu open call for contributions].
 
  
 
<!-- **********************End General text************************** ----->
 
<!-- **********************End General text************************** ----->
Line 83: Line 61:
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
|-
 
|-
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas]
+
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | WLCG Operations Coordination - [https://indico.cern.ch/categoryDisplay.py?categId=4372 Agendas][https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsCoordination Wiki Page]
 
|-
 
|-
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
Line 90: Line 68:
 
<!-- *********************************************************** ----->
 
<!-- *********************************************************** ----->
 
<!-- ***********************Start ops coord text*********************** ----->
 
<!-- ***********************Start ops coord text*********************** ----->
'''Monday 9th March'''
+
'''Tuesday 11th June'''
* There was a WLCG ops meeting on Thursday: [https://indico.cern.ch/category/4372/ Agenda] | [https://indico.cern.ch/event/378502/material/minutes/ Minutes].
+
 
* In summary:
+
* I got stuck (figuratively) in my machine room last Thursday afternoon so missed it. [https://indico.cern.ch/event/823800/ Agenda.] [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes190606 Minutes.] Ste was there - any observations?
** News: Looking at survey – GDB & Workshop.  
+
 
** M/W news: Baselines – new FTS 3.3.32 (shares fix).  
+
'''Tuesday 14th May'''
** M/W issues: RAL [https://ggus.eu/?mode=ticket_info&ticket_id=112107 problem upgrading gfal2] in WN ( conflict with gfal). New vulnerability (SSL).  
+
* Next meeting this Thursday.
** T0/T1 services: Various FTS and SE upgrades at CERN, RAL and BNL...
+
 
** Question about WN and UI status and maintenance wrt CVMFS.
+
'''Tuesday 9th April'''
** T0 news: LHCb LFC migration fine. Checking who uses LFC. VOMS-admin in place - some issues require follow-up.
+
* Ops meeting last week: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes190404
** T1/T2: NTR
+
 
** ALICE: High activity. [https://indico.cern.ch/event/354209/ T1/T2 workshop]. Some VOMS-admin problems.
+
 
** ATLAS: Taking cosmics. MC15 started (consider start of Run2). Care with maintenance requested. Transatlantic link problems encountered.
+
'''Tuesday 26th March'''
** CMS: Taking cosmics. Moderate load. Tier-1 tape tests continue. VOMRS migration in progress. Migration to a single global Condor pool for Analysis and Production (For T2s: 80% VOMS pilot; 10% VOMS production; 10% other).
+
* For information, there was an Ops meeting on the 7th: https://twiki.cern.ch/twiki/bin/viewauth/LCG/WLCGOpsMinutes190307 (I think we might have discussed this one).
** LHCb: "Run1 Legacy stripping" finished. Some VOMRS migration issues. LFC to DFC migration done.
+
* Next meeting booked for 4th April
** glexec: Panda campaign at 63 sites (+9)
+
 
** RFC proxies: Presentation at GDB. Checking experiment usage. SAM-Nagios fix easy. Default later this year.
+
'''Monday 11th February 2019'''
**  MJ features: NTR
+
* There was a WLCG ops coordination meeting last Thursday. You can view the meeting notes [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes190207 here].
** M/W readiness: Recent Storm/dCache/DPM validations done, more happening. ARC-CE tests for CMS in preparation. Next meeting 18th March.
+
** End of CREAM support Dec 2020.
** Multicore: Main objectives achieved. Entering 'passive' mode.
+
** Migration from CREAM to be discussed at the EGI conference 6-8 of May in Amsterdam.
** IPv6: NTR
+
** Operational intelligence discussion ([https://indico.cern.ch/event/795889/sessions/302520/attachments/1792600/2920978/Operational_Intelligence_-_WLCG_Ops_Coord-2.pdf see slides]).  
** Squid proxy & http proxy discovery: NTR
+
** Experiment updates (see ATLAS discussion on DOMA)
** Network & transfers WG: [https://indico.cern.ch/event/372546/ Meeting 18th Feb]. Deadline for 3.4.1 passed. Campaign to correct some configurations. Testbed for 3.4.2rc. New meshes (inc. IPv6). Testing and evaluation of the pilot instances for esmond/maddash ongoing. Production instance: [http://psomd.grid.iu.edu psomd.grid.iu.edu]. LHCb pilot project to provide experiment agnostic prototype to access central datastore (esmond). Extending ATLAS FTS performance study to CMS and LHCb. Next [https://indico.cern.ch/event/379017/ meeting 18th March].
+
** WG updates (few).
** Temporary ATLAS solution has been found to publish the HTCondor CEs in the BDII and OIM in a way that satisfies both ATLAS and SAM needs.
+
 
** Next meeting 19th March.
+
 
  
  
'''Monday 2nd March'''
 
* There is a WLCG ops meeting coming up this Thursday. Any T1/T2 issues?
 
* There was a [https://indico.cern.ch/event/377034/ perfSONAR data extraction] meeting today.
 
  
  
Line 136: Line 111:
 
<!-- *********************************************************** ----->
 
<!-- *********************************************************** ----->
 
<!-- ***********************Start T1 text*********************** ----->
 
<!-- ***********************Start T1 text*********************** ----->
'''Tuesday 10th March'''
+
 
* We had a problem on our network around 3pm last Friday (6th March) with very high packet loss for around 20 minutes.
+
'''17 June 2019''' Report for the Experiments Liaison Report (17/06/2019) is [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2019-06-17 here].
* The problems with our primary network router are still being followed up. We had a problem and were not able to try out the spare switch last Thursday. This has been re-scheduled for early on this Thursday morning (12th March). An 'At Risk' has been declared in the GOC DB.
+
<!-- *********************************************************** ----->
* Tomorrow (Wednesday 11th March) we plan to update the firmware in our network switches to fix a problem of periodic reboots.
+
 
<!-- **********************End T1 text************************** ----->
 
<!-- **********************End T1 text************************** ----->
 +
* Ongoing, we are seeing high outbound packet loss over IPv6.  Central networking performed a firmware update to the border routers but this didn’t resolve the issue.  Plan to move connections to the new border routers in Mid June.  Will do this before trying to debug any further.
 +
 +
'''11 June 2019''' Report for the Experiments Liaison Report (10/06/2019) is [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2019-06-10 here].
 
<!-- *********************************************************** ----->
 
<!-- *********************************************************** ----->
 +
<!-- **********************End T1 text************************** ----->
 +
* Ongoing, we are seeing high outbound packet loss over IPv6.  Central networking performed a firmware update to the border routers but this didn’t resolve the issue.  Plan to move connections to the new border routers in Mid June.  Will do this before trying to debug any further.
 +
* Three certificates were revoked mistakenly on ARGUS on Thursday.  All SAM tests failed until this was fixed the next morning.  Batch farm also did not start any new jobs during this time.  We used this accidental draining to reboot nodes that needed to pick up security patching.
 +
* LHCb Castor instance has been completely disabled for LHCb and will be decommissioned.
 +
* Brian Davies has transferred from his role as GridPP Tier-2 Storage Support Officer and has joined the Tier-1 Production Team.  Although this has happened with immediate effect he will still be available for ad-hoc/informal storage support.
 +
 
|}
 
|}
<!-- ****************End T1****************** ----->
 
  
 
<!-- ****************Start Storage & DM****************** ----->
 
<!-- ****************Start Storage & DM****************** ----->
Line 155: Line 137:
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
  
'''[http://storage.esc.rl.ac.uk/weekly/20150311-minutes.txt Wedn 11 March]'''
+
'''[http://storage.esc.rl.ac.uk/weekly/20191030-minutes.txt Wed 30 Oct]'''
* Few odd things encountered with DPM 1.8.9 upgrades. Sites not using puppet hack workarounds.
+
* DOME upgrade problems at Edinburgh
* An audience with [http://www.ligo.org/ LIGO]
+
* Data management support/development for IRIS users
  
'''Tuesday 9th March'''
+
'''[http://storage.esc.rl.ac.uk/weekly/20191023-minutes.txt Wed 23 Oct]'''
* DPM with puppet issues (trying this [https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/InstallationConfigurationPuppetSimple documented approach]).
+
* Rucio reporting
* Command for backing up DPM (see email to TB-SUPPORT on 3rd) - is it documented!?
+
  
'''Tuesday 3rd March'''
+
'''Wed 16 Oct'''
* DiRAC FS backup to RAL (and UCL) very likely to go ahead. GridPP encourages support from co-located sysadmins to help setup transfers... but first there needs to be a discussion of how the data will be handled. More soon.
+
* CEPH workshop at CERN report
  
'''Monday 23rd February'''
+
'''[http://storage.esc.rl.ac.uk/weekly/20191002-minutes.txt Wed 02 Oct]'''
* From February 17th expect xrootd version 4 to be added to the EPEL repository. The update is not transparent if xrootd is supported.
+
* Safe to upgrade to DPM 1.13 but make sure the BDII is working if you support DIRAC
 +
* Roadmap for xroot and http TPC for RAL FTS(es)
  
'''Tuesday 17th February'''
+
'''[http://storage.esc.rl.ac.uk/weekly/20190925-minutes.txt Wed 25 Sept]'''
* Chris suggests linking the [https://www.gridpp.ac.uk/wiki/WebDAV WebDav page] from the storage wiki.
+
* Storage support for IRIS VOs?
  
 +
'''[http://storage.esc.rl.ac.uk/weekly/20190918-minutes.txt Wed 18 Sept]'''
 +
* Report from yesterday's Rucio Face Meeting at Coseners
 +
* Suggestions for following up from yesterday's CEPH day hosted by CERN
 +
 +
'''[http://storage.esc.rl.ac.uk/weekly/20190911-minutes.txt Wed 11 Sept]'''
 +
* Storage related stuff at the FNAL (pre-)GDBs
 +
* DOME upgrade tickets for non-DOME DPM sites
 +
 +
'''[http://storage.esc.rl.ac.uk/weekly/20190904-minutes.txt Wed 04 Sept]'''
 +
* Banning in SSC not entirely successful in non-DOME DPM, and end of support is nigh; tickets to upgrade will go out shortly.
 +
* Storage-and-data-management-wise, GridPP43 was interesting although no-one volunteered to install the next CEPH.
  
  
Line 179: Line 172:
  
 
<!-- ****************End Storage & DM****************** ----->
 
<!-- ****************End Storage & DM****************** ----->
 +
<!-- ****************Start T2EVO****************** ----->
 +
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 +
|-
 +
| style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Tier-2 Evolution - [https://its.cern.ch/jira/browse/GRIDPP/ GridPP JIRA]
 +
|-
 +
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
 +
 +
====== ======
 +
<!-- ******************Edit start********************* ----->
 +
'''Tuesday 30 Apr 2019'''
 +
* ATLAS CernVM4 VMs (equivalent to CentOS 7.6) being tested
 +
 +
<!-- ******************Edit stop********************* ----->
 +
<!-- ************************************************************ ----->
 +
|}
 +
 +
<!-- ****************End T2EVO****************** ----->
 
<!-- ****************Start Accounting****************** ----->
 
<!-- ****************Start Accounting****************** ----->
  
Line 189: Line 199:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Tuesday 10th March'''
+
'''Tuesday 6th February'''
* Possible delay at Birmingham. Worth checking.
+
* HEPSPEC06 on recent Intel CPUs. [https://www.gridpp.ac.uk/wiki/HEPSPEC06 GridPP benchmarking page]
  
'''Tuesday 24th February'''
 
* UKI-SOUTHGRID-SUSX now highlighted as not publishing for 3 months.
 
  
* A reminder to keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 HEPSPEC06 tables].
+
'''Tuesday 30th January'''
 +
* Please keep updating the [https://www.gridpp.ac.uk/wiki/HEPSPEC06 GridPP benchmarking page].
 +
 
 +
'''Tuesday 24th Oct'''
 +
* A talk on end-to-end validation of any site's APEL accounting was presented at the WLCG Accounting Taskforce meeting, 19th Oct. The slides are here: https://indico.cern.ch/event/673843/contributions/2756986/attachments/1542818/2420233/blackBoxAccTesting.pdf
 +
 
 +
 
 +
'''Monday 16th January'''
 +
* The discussion topic for next week will be accounting comparisons. Please note Alessandra's comments last week.
 +
 
 +
'''Monday 14th November'''
 +
* Alessandra has written an [https://twiki.cern.ch/twiki/bin/view/LCG/AccountingFAQ FAQ] to extract numbers from ATLAS and APEL avoiding the SSB.
 +
 
 +
'''Monday 26th September'''
 +
* A problem with the APEL Pub and Sync tests developed last Tuesday and was resolved on Wednesday. This had a temporary impact on the accounting portal.
  
 
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?
 
* [http://accounting.egi.eu/egi.php?SubRegion=1.67&query=normcpu&startYear=2014&startMonth=8&endYear=2014&endMonth=9&yRange=SITE&xRange=VO&voGroup=lhc&chart=GRBAR&scale=LIN&localJobs=onlygridjobs APEL status]: An issue at Sheffield?
Line 206: Line 228:
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
|-
 
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=area KeyDocs]
+
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Documentation - [https://www.gridpp.ac.uk/keydocs?sort=area KeyDocs]
 
|-
 
|-
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
Line 214: Line 236:
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
  
See the [https://www.gridpp.ac.uk/php/KeyDocs.php?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.
+
''' Tue 9th July 2019'''
 +
 
 +
LHCb has added this to their requirements:
 +
 
 +
Sites not having an SRM installation must provide:
 +
 
 +
* disk only storage
 +
* a GRIDFTP endpoint (a single DNS entry)
 +
* an XROOT endpoint (a single DNS entry)
 +
* a way to do the accounting (preferably following the WLCG TF standard: https://twiki.cern.ch/twiki/bin/view/LCG/StorageSpaceAccounting)
 +
 
 +
''' Tue 16th April 2019'''
 +
 
 +
Minor change to LHCb requirements in Approved VOs:
 +
 
 +
* Sites that have migrated to the CentOS7 (including "CERN CentOS7") operating system or later versions are requested to provide support for singularity containers.
 +
 
 +
https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#VO_Resource_Requirements
 +
 
 +
'''Tue 2nd April 2019'''
 +
 
 +
Changes to Approved VOs for DUNE (and LZ)
 +
 
 +
There is a new "Approved VOs" document showing new settings for DUNE. And, since both LZ's voms servers now show up in the EGI Portal, I've removed the (now spurious) entry for voms.hep.wisc.edu that was formerly being inserted by hand.
 +
 
 +
https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs
 +
 
 +
I've also created a new set of RPMs (each has all LSC, VOMS, and XML for one VO) for all approved VOs. See Approved VOs doc for details, version is 1.9.
 +
 
 +
Sites with DUNE services need to update using whatever method they employ.
 +
 
 +
Special note: Since voms1.fnal.gov changed yesterday, I've updated documentation for that right away. But voms2.fnal.gov remains as it is until 23rd April (3 weeks). I'll update documentation for that when it is closer. Staggering the updates like this gives a time margin, so any particular site can have at least one service properly configured at any time.  But it does mean that two site updates are needed; one now, and one in three weeks. However, since it's sufficient for only one (of the two) voms servers to be configured properly, sites could save effort and wait until (say) 19th April, then update both at once. But I can't dictate what site admins should do. It's your call.
 +
 
 +
Ste
 +
 
 +
 
 +
'''Tuesday 5th Mar 2019'''
 +
New VOMS details for Biomed:
 +
https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs
 +
 
 +
New VOMS RPMs to match.
 +
 
 +
http://hep.ph.liv.ac.uk/~sjones/RPMS.voms/
 +
 
 +
'''Tuesday 12th Feb 2019'''
 +
Documentation done for HTCondor-CE APEL accounting.
 +
https://twiki.cern.ch/twiki/bin/view/LCG/HtCondorCeAccounting
 +
 
 +
 
 +
'''Tuesday 12th Feb 2019'''
 +
New CA DN for Biomed
 +
<pre>
 +
< VOMS_CA_DN="'/C=FR/O=CNRS/CN=GRID2-FR' "
 +
---
 +
> VOMS_CA_DN="'/C=FR/O=MENESR/OU=GRID-FR/CN=AC GRID-FR Services' "
 +
</pre>
  
'''Tuesday 3rd March'''
+
'''Tuesday 5th Feb 2019'''
* Proposals for about 4 keydocs to be removed/downgraded. For discussion at next core-ops or focus meeting.
+
For enmr, certificate of voms-02.pd.infn.it
  
'''Tuesday 24th February'''
+
New DN: /DC=org/DC=terena/DC=tcs/C=IT/L=Frascati/O=Istituto Nazionale di Fisica Nucleare/CN=voms-02.pd.infn.it,
* Question of where to direct new VOs. [http://operations-portal.egi.eu/vo/registrationWelcome EGI welcome page]?
+
New CA_DN: /C=NL/ST=Noord-Holland/L=Amsterdam/O=TERENA/CN=TERENA eScience SSL CA 3
  
'''Tuesday 17th February'''
+
Please check approved VOs: https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs
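
For sites that maintain VOMS trust by hand rather than via the RPMs, the change above amounts to rewriting the server's LSC file, which holds the server certificate DN on the first line and the issuing CA DN on the second. The following is only a rough sketch: the VO directory name <code>enmr.eu</code>, the helper name <code>write_lsc</code> and the use of a scratch directory in place of <code>/etc/grid-security/vomsdir</code> are assumptions for illustration, not taken from the Approved VOs document.

```python
from pathlib import Path
import tempfile

# Values copied from the bulletin entry above.
HOST = "voms-02.pd.infn.it"
HOST_DN = ("/DC=org/DC=terena/DC=tcs/C=IT/L=Frascati"
           "/O=Istituto Nazionale di Fisica Nucleare/CN=voms-02.pd.infn.it")
CA_DN = "/C=NL/ST=Noord-Holland/L=Amsterdam/O=TERENA/CN=TERENA eScience SSL CA 3"


def write_lsc(vomsdir: Path, vo: str, host: str, host_dn: str, ca_dn: str) -> Path:
    """Write <vomsdir>/<vo>/<host>.lsc with the server DN, then the CA DN."""
    vo_dir = vomsdir / vo
    vo_dir.mkdir(parents=True, exist_ok=True)
    lsc = vo_dir / f"{host}.lsc"
    lsc.write_text(f"{host_dn}\n{ca_dn}\n")
    return lsc


if __name__ == "__main__":
    # Assumption: a scratch directory stands in for /etc/grid-security/vomsdir,
    # and "enmr.eu" is assumed to be the VO's directory name.
    with tempfile.TemporaryDirectory() as scratch:
        path = write_lsc(Path(scratch), "enmr.eu", HOST, HOST_DN, CA_DN)
        print(path.read_text(), end="")
```

Writing to a scratch directory first makes it easy to diff the result against the live file before copying it into place.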
* KeyDocs review continues. Deadline for removals requests 25th February.  
+
  
'''Tuesday 10th February'''
 
* Survey of KeyDocs has started. If you own a document please take note. Documents remaining after this exercise will need to be brought fully up-to-date.
 
* A reminder that Steve created this [https://www.gridpp.ac.uk/wiki/Current_Activities Current Activities] capture page. Only ARGUS at the moment!?
 
* See also the documents from Tom linked in the 'Other VOs' section.
 
  
'''Tuesday 3 Feb '''
 
  
* New VO, LSST, in [https://www.gridpp.ac.uk/w/index.php?title=GridPP_approved_VOs Approved VOs].
+
'''General note'''
  
* New section in Wiki called "Project Management Pages".
+
See the [https://www.gridpp.ac.uk/keydocs?sort=reviewed worst KeyDocs list] for documents needing review now and the names of the responsible people.
The idea is to cluster all Self-Edited Site Tracking Tables
+
in here. Sites should keep entries in [[Current Activities]]
+
up to date. Once a Self-Edited Site Tracking Table has
+
served its purpose, PM to move it to  [[Historical Archive]]
+
or otherwise dispose of the table.
+
  
 
|}
 
|}
Line 247: Line 313:
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
|-
 
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Grid_Operations_Meetings EGI ops agendas]
+
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Interoperation - [https://wiki.egi.eu/wiki/Operations_Meeting EGI ops agendas] [https://indico.egi.eu/indico/category/32/ Indico schedule]
 
|-
 
|-
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
Line 253: Line 319:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Monday 9th March'''
 
  
* The agenda for February's EGI ops meeting is [https://wiki.egi.eu/wiki/Agenda-09-03-2015 here]. Minutes are [https://indico.egi.eu/indico/materialDisplay.py?materialId=minutes&confId=2465 here]
+
''' 14 May 2019 '''
 +
* Meeting for May cancelled. Next meeting is on 10th June.
 +
 
 +
''' Monday 11th March '''
 +
 
 +
* Agenda: https://wiki.egi.eu/wiki/Agenda-2019-03-11 
 +
* UMD 4.8.2 released: ARC 15.03.19, to address how ARC counts HELD jobs
 +
* DPM Legacy mode is going to end in June 2019
 +
* EGI will open tickets after June
 +
* HTCondor CE accounting status?
 +
* http://egi.ui.argo.grnet.gr/ : https://ggus.eu/index.php?mode=ticket_info&ticket_id=139877 
 +
 
 +
 
 +
''' Tuesday 12th February '''
 +
* There was an EGI ops meeting yesterday - [https://indico.egi.eu/indico/event/4320/ Agenda]. We were asked to review the IPv6 readiness information ([https://wiki.egi.eu/w/index.php?title=IPV6_Assessment see here]). We should perhaps link in the GridPP IPv6 table for status updates.
 +
 
 +
 
 +
''' Friday 1st February '''
 +
 
 +
* Early adopters needed for HTCondor CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=139377
 +
 
 +
''' Thursday 17th January EGI OMB meeting '''
 +
 
 +
* CREAMCE to be out of support by Dec 2020
 +
* More effort to fix accounting issue of HTCondor CE
 +
 
 +
''' Monday 14th January 2019 '''
 +
* Agenda https://wiki.egi.eu/wiki/Agenda-2019-01-14
 +
 
 +
* No UK specific issue mentioned.
 +
 
 +
* IPv6 readiness plan is going to be summarized at OMB.
 +
 
 +
  
** APEL 1.4.0
 
*** Added Month and Year columns to primary key of CloudSummaries table in cloud schema.
 
** DPM-Xrootd 3.5.2 is in EPEL stable - this is the first version of the component compatible with xrootd4
 
** gLExec-wn - v. 1.2.3: lcmaps-plugins-c-pep 1.3.0-1 & mkgltempdir 0.0.5-1
 
*** "The lcmaps-plugins-c-pep-1.3.0-1 preferably needs the argus-pep-api-c-2.3.0. This version will be released into EMI & UMD repositories in a near future."
 
** UMD 3.11.0 released on 16.02.2014, UMD 3.11.1 released on 4.03.2014
 
** lcg-CA 1.62 noted with an intention to broadcast these as they occur as opposed to monthly.
 
** EGI looking at the decommissioning of SL5, possibly by end of 2015, as a byproduct of adding CentOS 7 to UMD. NGIs to make a note if extended SL5 support is required.
 
** Vincenzo Spinoso has joined EGI Ops team from NGI_IT. Vincenzo will chair EGI Ops.
 
** Next meeting is April 20th.
 
  
 
<!-- ******************Edit stop********************* ----->
 
<!-- ******************Edit stop********************* ----->
Line 282: Line 370:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
 +
'''Tuesday 4th July'''
 +
* There were a number of useful links provided in the monitoring talks at the WLCG workshop in Manchester - especially those in the [https://indico.cern.ch/event/609911/timetable/#20170621 Wednesday sessions].
 +
 +
'''Monday 13th February'''
 +
* This category is pretty much inactive. Are there any topics under "monitoring" that anyone wants reported at this ops meeting? If not we will remove this section from the regular updates area of the bulletin and just leave the main links.
 +
 +
'''Tuesday 1st December'''
 +
* Sites are kindly invited to update the monitoring status page at https://www.gridpp.ac.uk/wiki/Site_monitoring_status
 +
 +
 +
'''Tuesday 16th June'''
 +
* F Melaccio & D Crooks decided to add a [https://www.gridpp.ac.uk/wiki/Monitoring_FAQs FAQs section] devoted to common monitoring issues under the monitoring page.
 +
* Feedback welcome.
 +
  
'''Monday 7th December'''
 
  
* Meeting last Friday - agenda: https://indico.cern.ch/event/356853/ minutes: https://indico.cern.ch/event/356853/material/minutes/1.pdf
 
* This was the wrap-up meeting of the consolidation TF; the mailing list will remain extant for a while yet.
 
 
<!-- ******************Edit stop********************* ----->
 
<!-- ******************Edit stop********************* ----->
 
|}
 
|}
Line 300: Line 399:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Monday 16th March'''
+
'''Tuesday 5th February'''
* Low availability alarm for UCL, but no alarm for the actual underlying cause (disk full on DPM).
+
* Birmingham decommissioning of SRM and BDII still going on so tickets are on hold.
* Ticket now issued.
+
* Few availability tickets on hold
* Kashif -> Daniela
+
* Lancaster has a WebDAV ticket on hold, which seems to be an effect of the DOME rollout
 +
 
 +
'''Tuesday 14th August'''
 +
* A couple of new availability tickets (QMUL and Lancaster), both for well-publicised reasons. Otherwise quiet. AM on shift this week.
  
'''Monday 9th March'''
 
* There was a problem with the upgrade of the dashboard when it lost the scope of the ROD role.
 
* Generally quiet.
 
* Low availability alarm for RALPP.
 
* Gareth->Kashif
 
  
 
<!-- ******************Edit stop********************* ----->
 
<!-- ******************Edit stop********************* ----->
Line 322: Line 419:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Tuesday 17th February'''
+
'''Monday 20th November'''
* A message from Maria Dimou: A request to you and the sites you collaborate with, to please install the MW Package Reporter.
+
* IPv6: https://www.gridpp.ac.uk/wiki/IPv6_site_status (up-to-date as of November)
The current version, [https://twiki.cern.ch/twiki/bin/view/LCG/MiddlewarePackageReporter documented in the wiki],
+
* Batch systems and WN moves to SL7/CentOS7: https://www.gridpp.ac.uk/wiki/Batch_system_status (added column for date of last update)
is very easy to install and addresses all site and security requirements.
+
 
* Quick check on: 1. ARC CE readiness testing. 2. Machine job features testing.  
+
 
  
'''Tuesday 20th January'''
 
* From [https://indico.cern.ch/event/319743/contribution/9/material/slides/3.pdf Cristina's GDB talk] last week note that EMI repositories will be frozen and the product team releases will become UMD-preview (repository content and webpages similar but now managed).
 
  
'''References'''
+
'''Historical References'''
  
 
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:
 
* Staged Rollout pages (now separated into EMI1 & 2), and the page listing the deployed versions is extractable from the bdii, so they should all be reasonably up-to-date:
Line 347: Line 442:
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0.5em 1em 0;"
 
|-
 
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [http://www.gridpp.ac.uk/security/inchand/Incident.html Incident Procedure] [http://www.gridpp.ac.uk/security/policies/index.html Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]
+
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Security - [https://www.gridpp.ac.uk/wiki/Report_Security_Incident Incident Procedure] [https://wiki.egi.eu/wiki/SPG:Documents Policies] [https://www.gridpp.ac.uk/wiki/SD_rota Rota]
 
|-
 
|-
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
  
 
===== =====
 
===== =====
'''Monday 9th March'''
+
'''Tuesday 11th June'''
* [https://wiki.egi.eu/wiki/EGI_IGTF_Release Trust anchor release 1.62] (missing announcement!).
+
* From DG: "1.62 is indeed out and was released by the IGTF on Feb 23rd. The EGI release by an unfortunate coincidence got linked to a VOMS update for UMD (v3.3.3), so was released only today."
+
  
'''Tuesday 24th February'''
+
* NTR
* Tracking approvals in GOCDB.
+
* EGI SVG Advisory - dCache vulnerability for some access methods [SVG EGI-SVG-2015-8183]
+
* New security officer starts this week!
+
 
+
'''Tuesday 17th February'''
+
* The IGTF is about to release (23/2) an update to the trust anchor repository (1.62)
+
** This release includes two CAs where the issuer name, but not the end-user names, changes. To make this change transparent, VOMS and VOMS-Admin operators are kindly but urgently requested to [http://italiangrid.github.io/voms/release-notes/voms-admin-server/3.3.2/ review their installation].
+
 
+
* The EGI [https://operations-portal.egi.eu/csiDashboard security dashboard].
+
  
 
|}
 
|}
Line 378: Line 462:
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0em 0 0 0.3em;"
 
|-
 
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi PerfSonar dashboard] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]
+
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | Services - [https://psmad.opensciencegrid.org/maddash-webui/index.cgi?dashboard=UK%20Mesh%20Config PerfSonar production dashboard] |[https://psetf.opensciencegrid.org/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Fhostgroup%3DUK%26opthost_group%3DUK%26view_name%3Dhostgroup PerfSonar ETF] | [http://opensciencegrid.org/networking/ OSG Networking and perfSONAR pages] | [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS]
 
|-
 
|-
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
Line 386: Line 470:
 
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).
 
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).
  
'''Tuesday 10th March'''
+
'''Monday 5th March'''
* From the recent WLCG meeting, [https://indico.cern.ch/event/377034/material/slides/1.pdf two slides (1 & 2)] give the direction of the network monitoring and metrics progress: integration of perfSONAR event types into experiment monitoring and an architecture for data to get from RSV probes to client. Components described on slide 3.
+
* Next LHCOPN and LHCONE meeting: [https://indico.cern.ch/event/772031/ Umeå, Sweden, 4-5 Jun 2019]. Registration required.
* The [https://indico.cern.ch/event/376098/ next LHCOPN and LHCONE joint meeting] will take place on Monday 1st and Tuesday 2nd of June 2015 in Berkeley (US) (hosted by LBL and ESnet).
+
 
 +
'''Monday 10th September'''
 +
* Please check perfSONAR status [https://psetf.opensciencegrid.org/etf/check_mk/index.py?start_url=%2Fetf%2Fcheck_mk%2Fview.py%3Ffilled_in%3Dfilter%26host_regex%3Duk%26view_name%3Dsearchhost here] especially the mesh URL (should be http://psconfig.opensciencegrid.org/pub/auto/FQDN)
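
The mesh URL rule in the bullet above can be checked mechanically. The following is only a sketch: the base URL is taken from the bulletin, while the function names and the example host `ps01.example.ac.uk` are hypothetical, and lower-casing the FQDN is an assumption rather than a documented requirement.

```python
# Base URL as given in the bulletin; everything else here is illustrative.
MESH_BASE = "http://psconfig.opensciencegrid.org/pub/auto/"


def expected_mesh_url(fqdn):
    """Build the mesh URL a perfSONAR host is expected to be configured with."""
    # Assumption: host names are compared case-insensitively, so normalise them.
    return MESH_BASE + fqdn.strip().lower()


def mesh_url_ok(configured_url, fqdn):
    """Return True if the configured URL matches the expected pattern for fqdn."""
    return configured_url.strip().rstrip("/") == expected_mesh_url(fqdn)


if __name__ == "__main__":
    host = "ps01.example.ac.uk"  # hypothetical host name
    print(expected_mesh_url(host))
    print(mesh_url_ok("http://psconfig.opensciencegrid.org/pub/auto/" + host, host))
```

How the configured URL is read from a given perfSONAR installation is deliberately left out, since that depends on the toolkit version in use at the site.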
 +
 
 +
'''Monday 2nd July'''
 +
* Next LHCOPN and LHCONE meeting: [https://indico.cern.ch/event/725706/ Fermilab, Batavia US, 30-31 October 2018]. Registration required.
 +
 
 +
'''Monday 30th April'''
 +
* The [https://indico.cern.ch/event/725706/ next LHCOPN and LHCONE joint meeting] will take place on Tuesday the 30th and Wednesday the 31st of October 2018
 +
 
 +
'''Monday 19th February'''
  
'''Tuesday 3rd March'''
+
Please could sites upgrade their perfsonar hosts to CentOS7. Instructions are [https://opensciencegrid.github.io/networking/perfsonar/installation/ here]. Current OS versions [https://tinyurl.com/y9eode2b here].
* Winnie points out there are a multitude of dashboards for PerfSONAR! Which to use: [http://perfsonar-itb.grid.iu.edu perfsonar-itb] : [https://psomd.grid.iu.edu/WLCGperfSONAR/check_mk/ psomd] : [http://maddash.aglt2.org maddash].
+
  
'''Tuesday 3rd February'''
+
* Duncan has recreated the UK perfSONAR mesh. [http://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=UK%20Config Link here]!
* There was a [https://indico.cern.ch/event/369420/ network and transfer metrics WG meeting last week]. Any feedback?
+
  
'''Tuesday 20th January'''
 
* The next [https://indico.cern.ch/event/342059/ LHCOPN/LHCONE meeting is in Cambridge 9-10 February]. 
 
* Perfsonar is expected to be available at sites - where are we with the reinstall?
 
  
 
<!-- ******************Edit stop********************* ----->
 
<!-- ******************Edit stop********************* ----->
Line 412: Line 501:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Monday 9th March 2015, 15.00 GMT'''<br/ >
 
From last week's crusty ticket round up:
 
  
'''Tier 1'''<br/ >
+
32 Open Tickets this week, which is as in-depth a look as I've been able to take.
[https://ggus.eu/?mode=ticket_info&ticket_id=109694 109694]- 28/10/14<br/ >
+
Matt M has managed to reclaim his tickets after a certificate change orphaned his old ones. Progress has resumed. Duncan has asked Matt to retry his failing tests with a simple copy to local disk example.
+
 
+
[https://ggus.eu/?mode=ticket_info&ticket_id=108944 108944]- 1/10/14<br/ >
+
CMS AAA access at RAL. Andrew posed a question to the CMS xroot experts last week - if you have their details it might be a good idea to involve them in the ticket.
+
 
+
'''100IT'''<br/ >
+
[https://ggus.eu/?mode=ticket_info&ticket_id=108356 108356]- 10/9/14<br/ >
+
This VMCatcher ticket is still stuck waiting for a reply. Deafening silence from our 100IT colleagues.
+
 
+
'''EFDA-JET'''<br/ >
+
[https://ggus.eu/?mode=ticket_info&ticket_id=97485 97485] - 21/9/13<br/ >
+
The JET LHCb ticket is being wrapped up. LHCb has been removed from the local configs, and the site has been removed from LHCb's. I believe that this ticket can be terminated.
+
 
+
'''QMUL'''<br/ >
+
[https://ggus.eu/?mode=ticket_info&ticket_id=110353 110353] - 25/11/14<br/ >
+
Dan's managed to get webdav working on one SE, but not t'other. Very strange, but Dan is investigating (see also [https://ggus.eu/?mode=ticket_info&ticket_id=111942 111942]).
+
 
+
No movement on the three glexec tickets (none expected on the two tarball ones in the last week though), the Lancaster perfsonar ticket is still waiting on another batch of local tweaks (and I still need to make sense of what I'm seeing). Matt RB closed Sussex's perfsonar ticket though - nice one.
+
 
+
'''The "Normal" tickets:'''
+
 
+
Atlas gLexec Hammercloud failures (RALPP and Tier 1)<br/ >
+
[https://ggus.eu/?mode=ticket_info&ticket_id=111699 111699] (Tier 1)<br/ >
+
[https://ggus.eu/?mode=ticket_info&ticket_id=111703 111703] (RALPP)<br/ >
+
These tests were waiting on a new stable job release being made - this has been delayed (hopefully out tomorrow).
+
 
+
'''TIER 1'''<br/ >
+
[https://ggus.eu/?mode=ticket_info&ticket_id=111856 111856](19/2)<br/ >
+
This LHCb ticket about stalled jobs looks like it can be closed (LHCb no longer see a problem). ''Update - set to solved, the jobs were being killed for using too much memory.''
+
 
+
'''OXFORD'''<br/ >
+
[https://ggus.eu/?mode=ticket_info&ticket_id=112011 112011](25/2)<br/ >
+
A CMS user saw glexec failures on some nodes - Kashif asked for some more information but there has been no reply. I'd consider giving the user till the end of the week then closing the ticket if there's still no word. Waiting for reply (27/2)
+
  
 
<!-- ******************Edit stop********************* ----->
 
<!-- ******************Edit stop********************* ----->
Line 464: Line 517:
 
===== =====  
 
===== =====  
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Tuesday 17th February'''
+
'''Monday 20th November'''
* Another period where message brokers were temporarily unavailable seen yesterday. Any news on the last follow-up?
+
* The WLCG dashboard is available here: http://dashboard.cern.ch/.
 +
* Please review the results on [http://pprc.qmul.ac.uk/~lloyd/gridpp/ Steve's test pages].
  
'''Tuesday 27th January'''
+
'''Tuesday 18th July'''
* Unscheduled outage of the EGI message broker (GRNET) caused a short-lived disruption to GridPP site monitoring (jobs failed) last Thursday 22nd January. Suspect BDII caching meant no immediate failover to stomp://mq.cro-ngi.hr:6163/ from stomp://mq.afroditi.hellasgrid.gr:6163/
+
* Following our ops discussion last week, Steve will focus [http://pprc.qmul.ac.uk/~lloyd/gridpp/ his tests] on supporting the GridPP DIRAC area and decommission the other tests.
  
* [http://southgrid.blogspot.co.uk/2014/10/nagios-monitoring-for-non-lhc-vos.html Blog about VO Nagios]
 
* [https://vo-nagios.physics.ox.ac.uk/nagios/ Oxford VO Nagios] currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk.
 
  
  
Line 481: Line 533:
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"
 
{| style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 1em 0 0 0.3em;"
 
|-
 
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~walker/votable.html VO table]
+
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: left; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-left: 0.4em; padding-top: 0.1em; padding-bottom: 0.1em;" | VOs - [https://voms.gridpp.ac.uk:8443/vomses/ GridPP VOMS] [http://operations-portal.egi.eu/vo VO IDs] [https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs Approved] [http://pprc.qmul.ac.uk/~lloyd/gridpp/votable.html VO table]
 
|-
 
|-
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
 
| style="padding: 0.4em 0.4em 0.4em 0.4em;" |
Line 487: Line 539:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Monday 9th March'''
 
* SIXT VOMS parameters have changed.
 
  
'''Tuesday 17th February'''
+
'''Monday 20th November'''
Two changes to approved VOs (https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs)
+
* Tom Whyntie has requested (and been granted) access to the GridPP VO to get some pipelines working for large-scale processing and analysis of MRI scans associated with the [http://www.ukbiobank.ac.uk/ UK Biobank project].
 
+
* All VOs in the incubation page being prompted for updates by the end of November (required input for OC documents).
* LSST uses port 15003 (had been 15002, clashing with dzero)
+
* QMUL (Steve L) is following up on the biomed MoU. GridPP want to be cited in research papers for the support our resources/sites provide.
* t2k has included a note that its software is now distributed via CVMFS.
+
 
+
'''Tuesday 17th February'''
+
* Some interest from UK [http://www.ligo.org LIGO] users. Catalin setting up CVMFS. Issue with VOMS. "Our users have access to CILogon basic CA certificates, which I understand can in principal be entered into VOMS to provide a VO identity."
+
 
+
'''Tuesday 10th February'''
+
* There is now a [https://www.gridpp.ac.uk/wiki/A_quick_guide_to_CVMFS Quick Guide to CVMFS].
+
* Part of Tom's quick guide series! There is also a [https://www.gridpp.ac.uk/wiki/Quick_Guide_to_Dirac Quick Guide to DIRAC].  
+
* Feedback on both welcome... on the CVMFS front some mapping issues for regional VOs are being examined.
+
* GalDyn (UCLan galaxy simulations): test jobs have been submitted via DIRAC; now moving onto data storage testing via the official DIRAC tutorials.
+
* CERN@school has its first student on the grid with a grid certificate! LUCID data arriving.
+
* The CERN VM mechanism looks like it could provide an interesting way for new users to get a grid-ready UI by installing VirtualBox (or similar), downloading the CERN VM image, and applying a GridPP-specific context.
+
  
  
Line 524: Line 562:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Tuesday 24th February'''
+
'''Date'''
* Next review of status today.
+
 
+
'''Tuesday 27th January'''
+
* Squids not in GOCDB for: UCL; ECDF; Birmingham; Durham; RHUL; IC; Sussex; Lancaster
+
* Squids in GOCDB for: EFDA-JET; Manchester; Liverpool; Cambridge; Sheffield; Bristol; Brunel; QMUL; T1; Oxford; Glasgow; RALPPD.
+
 
+
'''Tuesday 2nd December'''
+
* [https://www.gridpp.ac.uk/wiki/Batch_system_status Multicore status]. Queues available (63%)
+
** YES: RAL T1; Brunel; Imperial; QMUL; Lancaster; Liverpool; Manchester; Glasgow; Cambridge; Oxford; RALPP; Sussex (12)
+
** NO: RHUL (testing); UCL; Sheffield (testing); Durham; ECDF (testing); Birmingham; Bristol (7)
+
 
+
* According to our [https://www.gridpp.ac.uk/wiki/Batch_system_status table] for cloud/VMs (26%)
+
** YES: RAL T1; Brunel; Imperial; Manchester; Oxford (5)
+
** NO: QMUL; RHUL; UCL; Lancaster; Liverpool; Sheffield; Durham; ECDF; Glasgow; Birmingham; Bristol; Cambridge; RALPP; Sussex (14)
+
 
+
* [http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=viewlcg GridPP DIRAC jobs] successful  (58%)
+
** YES: Bristol; Glasgow; Lancaster; Liverpool; Manchester; Oxford; Sheffield; Brunel; IC; QMUL; RHUL (11)
+
** NO: Cambridge; Durham; RALPP; RAL T1 (4) + ECDF; Sussex; UCL; Birmingham (4)
+
 
+
* [https://www.gridpp.ac.uk/wiki/IPv6_site_status IPv6 status]
+
** Allocation - 42%
+
** YES: RAL T1; Brunel; IC; QMUL; Manchester; Sheffield; Cambridge; Oxford (8)
+
** NO: RHUL; UCL; Lancaster; Liverpool; Durham; ECDF; Glasgow; Birmingham; Bristol; RALPP; Sussex
+
 
+
* Dual stack nodes - 21%
+
** YES: Brunel; IC; QMUL; Oxford (4)
+
** NO: RHUL; UCL; Lancaster; Glasgow; Liverpool; Manchester; Sheffield; Durham; ECDF; Birmingham; Bristol; Cambridge; RALPP; Sussex, RAL T1 (15)
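
As a cross-check, the percentages quoted in this entry follow from the YES/NO site counts. This is a sketch only: the counts are copied from the lists above (the IPv6 allocation NO count is tallied from the listed sites, since no total is given), and the rounding to whole percent is an assumption about how the figures were produced.

```python
# (YES, NO) site counts copied from the 2nd December entry above.
STATUS_COUNTS = {
    "Multicore queues": (12, 7),
    "Cloud/VMs": (5, 14),
    "GridPP DIRAC jobs": (11, 8),   # NO is listed as 4 + 4
    "IPv6 allocation": (8, 11),     # NO count tallied from the listed sites
    "Dual stack nodes": (4, 15),
}


def percent_yes(yes, no):
    """Share of sites answering YES, rounded to a whole percent."""
    return round(100 * yes / (yes + no))


if __name__ == "__main__":
    for name, (yes, no) in STATUS_COUNTS.items():
        print(f"{name}: {yes}/{yes + no} = {percent_yes(yes, no)}%")
```

Each category totals 19 sites, which matches the quoted 63%, 26%, 58%, 42% and 21%.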
+
 
+
 
+
 
+
'''Tuesday 21st October'''
+
* High loads seen in xroot by several sites: Liverpool and RALT1... and also Bristol (see Luke's TB-S email on 16/10 for questions about changes to help).
+
  
'''Tuesday 9th September'''
 
* Intel announced the new generation of Xeon based on Haswell.
 
  
  
Line 620: Line 624:
 
===== =====
 
===== =====
 
<!-- ******************Edit start********************* ----->
 
<!-- ******************Edit start********************* ----->
'''Wednesday 11th March 2015'''
+
Highlights from this meeting are now included in the Tier1 report farther up this page.
* [https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2015-03-11 Operations report]
+
* Plan to swap out problematic Tier1 network router early tomorrow (Thursday 12th March) morning. (Delayed from last week).
+
* Proposed date for Castor upgrade to version 2.1.14-15 is the 8th April. (To be confirmed).
+
* The rollout of CVMFS version 2.1.20 is just about complete across the batch farm. As we are a test site for this we will check it runs OK for a further fortnight before reporting to CERN that it is OK.
+
* All data has now been migrated off the T10000A, as well as the T10000B tapes.  
+
 
<!-- ******************Edit stop********************* ----->
 
<!-- ******************Edit stop********************* ----->
 
|}
 
|}

Latest revision as of 14:09, 30 October 2019

Bulletin archive


Week commencing Monday 25th February 2019
Task Areas
General updates

Tuesday 18th June


Tuesday 11th June

Tuesday 4th June 2019


WLCG Operations Coordination - Agendas Wiki Page

Tuesday 11th June

  • I got stuck (figuratively) in my machine room last Thursday afternoon so missed it. Agenda. Minutes. Ste was there - any observations?

Tuesday 14th May

  • Next meeting this Thursday.

Tuesday 9th April


Tuesday 26th March

Monday 11th February 2019

  • There was a WLCG ops coordination meeting last Thursday. You can view the meeting notes here.
    • End of CREAM support Dec 2020.
    • Migration from CREAM to be discussed at the EGI conference 6-8 of May in Amsterdam.
    • Operational intelligence discussion (see slides).
    • Experiment updates (see ATLAS discussion on DOMA)
    • WG updates (few).




Tier-1 - Status Page

17 June 2019 Report for the Experiments Liaison Report (17/06/2019) is here.

  • Ongoing, we are seeing high outbound packet loss over IPv6. Central networking performed a firmware update to the border routers but this didn’t resolve the issue. Plan to move connections to the new border routers in mid-June. Will do this before trying to debug any further.

11 June 2019 Report for the Experiments Liaison Report (10/06/2019) is here.

  • Ongoing, we are seeing high outbound packet loss over IPv6. Central networking performed a firmware update to the border routers but this didn’t resolve the issue. Plan to move connections to the new border routers in mid-June. Will do this before trying to debug any further.
  • Three certificates were revoked mistakenly on ARGUS on Thursday. All SAM tests failed until this was fixed the next morning. Batch farm also did not start any new jobs during this time. We used this accidental draining to reboot nodes that needed to pick up security patching.
  • LHCb Castor instance has been completely disabled for LHCb and will be decommissioned.
  • Brian Davies has transferred from his role as GridPP Tier-2 Storage Support Officer and has joined the Tier-1 Production Team. Although this has happened with immediate effect he will still be available for ad-hoc/informal storage support.
Storage & Data Management - Agendas/Minutes

Wed 30 Oct

  • DOME upgrade problems at Edinburgh
  • Data management support/development for IRIS users

Wed 23 Oct

  • Rucio reporting

Wed 16 Oct

  • CEPH workshop at CERN report

Wed 02 Oct

  • Safe to upgrade to DPM 1.13 but make sure the BDII is working if you support DIRAC
  • Roadmap for xroot and http TPC for RAL FTS(es)

Wed 25 Sept

  • Storage support for IRIS VOs?

Wed 18 Sept

  • Report from yesterday's Rucio Face Meeting at Coseners
  • Suggestions for following up from yesterday's CEPH day hosted by CERN

Wed 11 Sept

  • Storage related stuff at the FNAL (pre-)GDBs
  • DOME upgrade tickets for non-DOME DPM sites

Wed 04 Sept

  • Banning in SSC not entirely successful in non-DOME DPM, and end of support is nigh; tickets to upgrade will go out shortly.
  • Storage-and-data-management-wise, GridPP43 was interesting although no-one volunteered to install the next CEPH.


Tier-2 Evolution - GridPP JIRA

Tuesday 30 Apr 2019

  • ATLAS CernVM4 VMs (equivalent to CentOS 7.6) being tested


Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06

Tuesday 6th February


Tuesday 30th January

Tuesday 24th Oct


Monday 16th January

  • The discussion topic for next week will be accounting comparisons. Please note Alessandra's comments last week.

Monday 14th November

  • Alessandra has written an FAQ to extract numbers from ATLAS and APEL avoiding the SSB.

Monday 26th September

  • A problem with the APEL Pub and Sync tests developed last Tuesday and was resolved on Wednesday. This had a temporary impact on the accounting portal.
Documentation - KeyDocs

Tue 9th July 2019

LHCb has added this to their requirements:

Sites not having an SRM installation must provide:

Tue 16th April 2019

Minor change to LHCb requirements in Approved VOs:

  • Sites having migrated to the Centos7 (including "Cern Centos7") operating system or later versions are requested to provide support for singularity containers.

https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs#VO_Resource_Requirements

Tue 2nd April 2019

Changes to Approved for DUNE (and LZ)

There is a new "Approved VOs" document showing new settings for DUNE. And, since both LZ's voms servers now show up in the EGI Portal, I've removed the (now spurious) entry for voms.hep.wisc.edu that was formerly being inserted by hand.

https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs

I've also created a new set of RPMs (each has all LSC, VOMS, and XML for one VO) for all approved VOs. See Approved VOs doc for details, version is 1.9.

Sites with DUNE services need to update using whatever method they employ.

Special note: Since voms1.fnal.gov changed yesterday, I've updated documentation for that right away. But voms2.fnal.gov remains as it is until 23rd April (3 weeks). I'll update documentation for that when it is closer. Staggering the updates like this gives a time margin, so any particular site can have at least one service properly configured at any time. But it does mean that two site updates are needed; one now, and one in three weeks. However, since it's sufficient for only one (of the two) voms servers to be configured properly, sites could save effort and wait until (say) 19th April, then update both at once. But I can't dictate what site admins should do. It's your call.

Ste


Tuesday 5th Mar 2019 New VOMS details for Biomed: https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs

New VOMS RPMs to match.

http://hep.ph.liv.ac.uk/~sjones/RPMS.voms/

Tuesday 12th Feb 2019 Documentation done for HTCondor-CE APEL accounting. https://twiki.cern.ch/twiki/bin/view/LCG/HtCondorCeAccounting


Tuesday 12th Feb 2019 New CA DN for Biomed

< VOMS_CA_DN="'/C=FR/O=CNRS/CN=GRID2-FR' "
---
> VOMS_CA_DN="'/C=FR/O=MENESR/OU=GRID-FR/CN=AC GRID-FR Services' "

Tuesday 5th Feb 2019 For enmr, certificate of voms-02.pd.infn.it

New DN: /DC=org/DC=terena/DC=tcs/C=IT/L=Frascati/O=Istituto Nazionale di Fisica Nucleare/CN=voms-02.pd.infn.it, New CA_DN: /C=NL/ST=Noord-Holland/L=Amsterdam/O=TERENA/CN=TERENA eScience SSL CA 3

Please check approved VOs: https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs
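
For sites that maintain VOMS trust files by hand rather than through the RPMs, the new values above would normally go into an LSC file. The following is a minimal sketch, assuming the conventional layout (server certificate DN on the first line, issuing CA DN on the second) under `/etc/grid-security/vomsdir`. The helper names and the `example.vo` directory name are hypothetical; the DNs are copied from the entry above.

```python
from pathlib import PurePosixPath

# Values copied from the bulletin entry above.
SERVER_HOST = "voms-02.pd.infn.it"
SERVER_DN = ("/DC=org/DC=terena/DC=tcs/C=IT/L=Frascati/"
             "O=Istituto Nazionale di Fisica Nucleare/CN=voms-02.pd.infn.it")
CA_DN = "/C=NL/ST=Noord-Holland/L=Amsterdam/O=TERENA/CN=TERENA eScience SSL CA 3"


def lsc_contents(server_dn, ca_dn):
    """Return LSC file text: server DN first, then the issuing CA DN."""
    return f"{server_dn}\n{ca_dn}\n"


def lsc_path(vo_name, host, vomsdir="/etc/grid-security/vomsdir"):
    """Return the assumed location of the LSC file for one VO and VOMS host."""
    return PurePosixPath(vomsdir) / vo_name / f"{host}.lsc"


if __name__ == "__main__":
    # "example.vo" is a placeholder; use the VO name from the Approved VOs page.
    print(lsc_path("example.vo", SERVER_HOST))
    print(lsc_contents(SERVER_DN, CA_DN), end="")
```

The sketch only prints what would be written; the Approved VOs page remains the authoritative source for the actual values.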


General note

See the worst KeyDocs list for documents needing review now and the names of the responsible people.

Interoperation - EGI ops agendas Indico schedule

14 May 2019

  • Meeting for May cancelled. Next meeting is on 10th June.

Monday 11th March


Tuesday 12th February

  • There was an EGI ops meeting yesterday - Agenda. We were asked to review the IPv6 readiness information (see here). We should perhaps link in the GridPP IPv6 table for status updates.


Friday 1st February

Thursday 17th January EGI OMB meeting

  • CREAMCE to be out of support by Dec 2020
  • More effort to fix accounting issue of HTCondor CE

Monday 14th January 2019

  • No UK-specific issues mentioned.
  • IPv6 readiness plan is going to be summarized at OMB.

'


Monitoring - Links MyWLCG

Tuesday 4th July

  • There were a number of useful links provided in the monitoring talks at the WLCG workshop in Manchester - especially those in the Wednesday sessions.

Monday 13th February

  • This category is pretty much inactive. Are there any topics under "monitoring" that anyone wants reported at this ops meeting? If not we will remove this section from the regular updates area of the bulletin and just leave the main links.

Tuesday 1st December


Tuesday 16th June

  • F Melaccio & D Crooks decided to add a FAQs section devoted to common monitoring issues under the monitoring page.
  • Feedback welcome.


On-duty - Dashboard ROD rota

Tuesday 5th February

  • Birmingham decommissioning of SRM and BDII still going on so tickets are on hold.
  • Few availability tickets on hold
  • Lancaster has a WebDAV ticket on hold, which seems to be an effect of the DOME rollout

Tuesday 14th August

  • A couple of new availability tickets (QMUL and Lancaster), both for well-publicised reasons. Otherwise quiet. AM on shift this week.


Rollout Status WLCG Baseline

Monday 20th November



Historical References


Security - Incident Procedure Policies Rota

Tuesday 11th June

  • NTR


Services - PerfSonar production dashboard |PerfSonar ETF | OSG Networking and perfSONAR pages | GridPP VOMS

- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).

Monday 5th March

Monday 10th September

Monday 2nd July

Monday 30th April

Monday 19th February

Please could sites upgrade their perfsonar hosts to CentOS7. Instructions are here. Current OS versions here.

  • Duncan has recreated the UK perfSONAR mesh. Link here!


Tickets

32 Open Tickets this week, which is as in-depth a look as I've been able to take.

Tools - MyEGI Nagios

Monday 20th November

Tuesday 18th July

  • Following our ops discussion last week, Steve will focus his tests on supporting the GridPP DIRAC area and decommission the other tests.


VOs - GridPP VOMS VO IDs Approved VO table

Monday 20th November

  • Tom Whyntie has requested (and been granted) access to the GridPP VO to get some pipelines working for large-scale processing and analysis of MRI scans associated with the UK Biobank project.
  • All VOs in the incubation page being prompted for updates by the end of November (required input for OC documents).
  • QMUL (Steve L) is following up on the biomed MoU. GridPP want to be cited in research papers for the support our resources/sites provide.


Site Updates

Date



Meeting Summaries
Project Management Board - Members Minutes Quarterly Reports

Empty

GridPP ops meeting - Agendas Actions Core Tasks

Empty


RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) Agenda Meeting takes place on Vidyo.

Highlights from this meeting are now included in the Tier1 report farther up this page.

WLCG Grid Deployment Board - Agendas MB agendas

Empty



NGI UK - Homepage CA

Empty

Events
UK ATLAS - Shifter view News & Links

Atlas S&C week 2-6 Feb 2015

Production

• Prodsys-2 in production since Dec 1st

• Deployment has not been transparent; many issues have been solved and the grid is filled again

• MC15 is expected to start soon, waiting for physics validations; evgen testing is underway and close to finalised. Simulation is expected to be broadly similar to MC14, with no blockers expected.

Rucio

• Rucio in production since Dec 1st and is ready for LHC RUN-2. Some fields need improvements, including transfer and deletion agents, documentation and monitoring.

Rucio dumps available.

Dark data cleaning

files declaration. Only DDM ops can issue a lost-files declaration for now; cloud support needs to file a ticket.

• Webdav panda functional tests with Hammercloud are ongoing

Monitoring

Main page

DDM Accounting

space

Deletion

ASAP

• ASAP (ATLAS Site Availability Performance) in place. Every 3 months the T2 sites performing below 80% are reported to the International Computing Board.


UK CMS

Empty

UK LHCb

Empty

UK OTHER
  • N/A
To note

  • N/A