GridPP PMB Meeting 651

GridPP PMB Meeting 651 (20.11.17)
=================================
Present: Dave Britton(Chair), Tony Cass, Pete Clarke, Jeremy Coles, David Colling, Pete Gronbech, Roger Jones, Steve Lloyd, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Tony Doyle, Dave Kelsey, Andrew McNab, Gareth Smith.

1. LHCONE
=========
DB circulated an email about LHCONE, triggered by meetings that he and PC attended at CERN recently. A couple of years ago we adopted a wait-and-see attitude to LHCONE because, in the UK, it appeared to be a solution looking for a problem and we were concerned about additional costs. LHCONE now appears to have become established, and technology developments mean there are no obvious cost implications. For some non-UK sites LHCONE seems the most appropriate vehicle to connect if we need high bandwidth: we can access other Tier-1 sites via the OPN, but we increasingly peer with Tier-2 sites. With the growth of “DMZ”-like configurations, it is also increasingly likely that LHCONE will be the only route to bypass firewalls at some sites. In the UK we did establish a test LHCONE link at IC and intended to do the same at RAL. However, progress on the latter has been slow (other priorities such as IPv6) and DB has proposed we now pursue this more actively. Imperial is still connected – there is a small cost for one licence of c. £1500. We believe there is (now) no issue from the JANET side because the technology required is increasingly being used anyway and this is not a new link or additional bandwidth. AS submitted a paper on this at the recent RAL Network meeting and is pushing it forward. DB asked the PMB to endorse the proposal; AS confirmed this is probably relatively straightforward. DC confirmed CMS are happy with this and see no change to performance elsewhere when they switch to LHCONE. RJ confirmed this is satisfactory from the ATLAS perspective, with a watching brief maintained. It was suggested it would be helpful to establish a policy that sites can keep in mind for the future. This topic should be included in the GridPP40 agenda, including experience at Imperial and what this might entail as a future option for sites.

2. Data Management Plan
=======================
PC has circulated a copy for discussion here for speed of response. RJ and DC and AM were asked for input.
Page 1 – similar to the previous version. PC asked if any LHC members require changes; he noted the same statement is included regarding computing models and software foundations and asked if any additional points should be included. DC mentioned that the Evolution group has been set up, but this has not evolved yet and the roadmap is out for comment.
Section 1:2 – no changes recommended. It was suggested that the wording is changed to note data used for Outreach, communication and Non-HPC science.
Section 1:3 – same as previous version – taken from the UKTO bid. The purpose is not to provide a full description but to remind CGIs of their reliance on GridPP. RJ noted a potential typo on numbers of Tier2 sites.
Section 1:4 – unchanged since last year – a generic description of the process which could benefit from more clarity/amendment. Final paragraph should be rewritten and moved to a more appropriate place.
Section 1:5 – Software metadata has been slightly changed. It states everyone has catalogues and mentioned storage – this has been reduced to a shorter and more succinct paragraph. PC asked for comments to ensure this remains accurate and up to date.
Section 1:6 – may need to be more specific as there is no date limit for data preservation.
Section 1:7 – begins with open data policies and references the open data sites, which probably have not changed in 3 years. In yellow, some experiments will likely have undertaken more with regard to open data, and PC asked for input from RJ and DC on any updates/changes re CMS and ATLAS. DC noted there were some changes; he will check this and contact Cathy for information. PC advised he will also be circulating the document to other experiments. RJ noted some changes around ATLAS were related to packages but are not specifically relevant here.
Section 1:8 – this is relatively brief and was a heading in the guidelines. It is important to set a constructive tone while explaining challenges re funding for service and scalability. This relates to meeting Government priorities on open data.
Section 1:9 – PC suggested assigning names to sections. Each experiment will not require the same level of detail. PC asked DB to update NA62; PC will update Fermilab; DC will update locks since the LZ document has been submitted; PC will deal with SuperNEMO; RJ will follow up on T2K (PG has been attempting to find a contact re Tier1 – Francesca and Sophie); DC will enquire with Yoshi about COMET, though they were not funded by STFC and may prefer not to contribute.
Ideally, PC would like to send the document to the Chair of the PPMG by the end of this week.

3. Power down at RAL
====================
AS provided an update on a regional power cut that reached at least Abingdon today (GS is dealing with this and sends his apologies). This was due to 2 feeds going into the region – 1 this morning and the other at noon – leading to an 8-minute downtime; the causes are unknown. The generator did not fire up and checks are ongoing, but the decision was taken to stay down for the moment until there is confidence the power is fully and reliably up on both feeds. There should be a smooth restart as many systems remained up, though Castor was down. This has been announced to LHCG. Given that the generator is tested every 3 months, it would be helpful to know why it did not work.

4. AOCB
=======
a) CHEP abstracts submission is open.

b) PC noted another request for committed resources from EUCLID – they enquired about the cost of getting cycles on GridPP. Their recognition of our effort was welcomed – we should consider this further for the future and set up a model in the context of LZ, LSST and others. Something less binding than a formal MoU but with long-term requirements would be useful and should be further discussed. DC noted the answer from Swindon regarding LZ after the end of GridPP5 – Tony’s response is that while he supports this he cannot commit at the moment since it is not clear what the situation will be in future. This requires to be considered over the next 2-3 weeks.

c) PG reminded that Quarterly reports are now due urgently. AS is working on this today and DC confirmed he is working on the CMS report. JC has completed and is emailing to PG today. For future reports it may be useful for PG to prompt with a phone call.

5. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
There was no Technical Group meeting this week.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Nothing significant to report.

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing significant to report. Last week was computing week – lots of interesting talks but nothing of particular reference to the UK.

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
AM not in attendance, no report submitted.

SI-4 Production Manager’s report (JC)
-------------------------------------
1. Glasgow has spotted LHCb user jobs attempting to use 32 threads per job. Cgroups protected the site but it is being actively followed up with LHCb as these jobs could adversely affect a site’s performance.

2. CHEP 2018 abstract submission is open: https://indico.cern.ch/event/587955/abstracts/. DB confirmed that the same policy would apply as for previous years.

3. Biomed MoU: Steve shared an analysis on 8th November indicating this would be okay (0.5% of our capacity and 85% availability). SL confirmed he started this by asking how it can give credit. The MoU should be signed, but it is not clear who should sign as MoUs appear to be between sites rather than with GridPP. SL will pursue this – it should be straightforward enough to update the confirmation text to generate an acknowledgement to GridPP sites in the UK and state this is guaranteed at GridPP level.

4. Tom Whyntie has requested (and been granted) access to the GridPP VO to get some pipelines working for large-scale processing and analysis of MRI scans associated with the UK Biobank project (http://www.ukbiobank.ac.uk/).

5. October T2 availability/reliability was as follows:

ALICE (http://wlcg-sam.cern.ch/reports/2017/201710/wlcg/WLCG_All_Sites_ALICE_Oct2017.pdf)
All okay.

ATLAS (http://wlcg-sam.cern.ch/reports/2017/201710/wlcg/WLCG_All_Sites_ATLAS_Oct2017.pdf)
RHUL 86%:86%

CMS (http://wlcg-sam.cern.ch/reports/2017/201710/wlcg/WLCG_All_Sites_CMS_Oct2017.pdf)
All okay.

LHCb (http://wlcg-sam.cern.ch/reports/2017/201710/wlcg/WLCG_All_Sites_LHCB_Oct2017.pdf)
All okay.

Site issues:

* RHUL had a networking issue.

7. Germany is leaving EGI from January 2018. This has raised various questions around services for WLCG such as APEL and GOCDB. Running GGUS will continue as a WLCG contribution. DB confirmed this has been discussed in various contexts, including the NGI meeting last week, and it will also be covered in the NGI meeting next week. AS confirmed that once EOSC-hub starts no payments will be received direct from EGI, so this is rather a complex process – the funding comes from the Horizon 2020 project.

Announcements of operations related meetings for reference:

A. The next LHCOPN/LHCONE meeting will be at RAL 6-7th March 2018: https://indico.cern.ch/event/681168/. The UK position on LHCONE is being reviewed and this would be worth attending given the above discussion.

B. There is a CERNVM users workshop 30th Jan to 1st Feb: https://indico.cern.ch/e/cvm18.

C. The first workshop of the WLCG Security Operations Centre (SOC) working group will take place at CERN on the 11th (afternoon) and 12th (all day) of December 2017 (https://indico.cern.ch/event/676160/).

SI-5 Tier-1 Manager’s Report (GS)
---------------------------------
GS not present and no report submitted. AS noted re procurement that the Disk tender went out on Friday and the CPU tender will be out this week. DB reiterated the concern raised last week that procurement was later than planned and how this might impact in future. AS confirmed these procurements should end before Christmas so are relatively on track, though there has been some slippage. AS noted the recent appointment of a Project Manager who is doing a great job of keeping track of all procurements, though the Tier1 procurements began before he was in post – this will be very helpful next year in commencing earlier. The decision-making process has been speeded up recently, which will also help.

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
DB was unable to attend due to a power cut. He will circulate the link to the agenda and reports.

SI-7 External Contexts (PC)
---------------------------
UKTO – the meeting will now mainly track procurement initially. PC suggested it would be worth having a F2F meeting around March.

REVIEW OF ACTIONS
=================
644.2: PG and AS will document plans and costings for the remainder of GridPP5 taking account of the Oracle tape issues experienced. (Update: a draft will be produced before Christmas). Ongoing.
644.3: AS will put together a starting plan for staff ramp-down. (Update: a draft will be produced before Christmas). Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. Ongoing.
647.1: PC will update Data Management Plan. Ongoing.
647.2: DB will circulate link for Data Management Plan once agreed. Ongoing.
649.1: DB will write Introduction of OS documents. Ongoing.
649.2: PC will write Wider Context of OS documents. Ongoing.
649.3: PG will schedule a discussion of the Risk Register at a PMB meeting in December then update this in the OS documents. Ongoing.
649.4: GS and AS will write the Tier1 Status section of OS documents. Ongoing.
649.5: JC will write Deployment Status section of OS documents with input from PG. Ongoing.
649.6: RJ, DC and AS will write LHC section of User Reports in OS documents. Ongoing.
649.7: JC will write Other Experiments section of User Reports in OS documents with input from DC and PG. Ongoing.
650.1: SL will confer with PG and sign up to Biomed site to ensure our input will be explicitly credited in future. Ongoing.

ACTIONS AS OF 20.11.17
======================
644.2: PG and AS will document plans and costings for the remainder of GridPP5 taking account of the Oracle tape issues experienced. (Update: a draft will be produced before Christmas). Ongoing.
644.3: AS will put together a starting plan for staff ramp-down. (Update: a draft will be produced before Christmas). Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. Ongoing.
647.1: PC will update Data Management Plan. Ongoing.
647.2: DB will circulate link for Data Management Plan once agreed. Ongoing.
649.1: DB will write Introduction of OS documents. Ongoing.
649.2: PC will write Wider Context of OS documents. Ongoing.
649.3: PG will schedule a discussion of the Risk Register at a PMB meeting in December then update this in the OS documents. Ongoing.
649.4: GS and AS will write the Tier1 Status section of OS documents. Ongoing.
649.5: JC will write Deployment Status section of OS documents with input from PG. Ongoing.
649.6: RJ, DC and AS will write LHC section of User Reports in OS documents. Ongoing.
649.7: JC will write Other Experiments section of User Reports in OS documents with input from DC and PG. Ongoing.
650.1: SL will confer with PG and sign up to Biomed site to ensure our input will be explicitly credited in future. Ongoing.