GridPP PMB Meeting 706 (23.04.19)                               F2F
=================================
Present: Dave Britton (Chair), Tony Cass, Pete Clarke, David Colling, Alastair Dewhurst, Pete Gronbech, Jon Hays, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Gareth Roy, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Tony Doyle

1) GridPP Panel Questions
=========================
Q1 new technologies – response includes case study of Echo in GridPP5 and Datalakes (WP4), for compute Cloud technology and containers; CPU cores are referred to. GPU is covered and mention could be made of emerging technologies, e.g. TPU. Potential for reducing global storage requirements – Datalakes is one area that may contribute to this and experiments drive this based on requirements. There was discussion on practicalities and costings of some of these technologies as well as editing text for clarity and to ensure the response covers aspects we do very well and others we aspire to do, taking account of the constraints that we operate within.

Q2 financial savings – sharing expertise developed during GridPP5 means less manpower is required to operate effectively, and we made efficiency savings to drive down operational costs. There was discussion on edits and making clear the differences between Tier-1 and Tier-2 requirements. IRIS kit was discussed as well as in-kind contribution. GridPP6 will continue the successful things being done and outline cause and effect.

Q3 over-pledged resources – RRB report is referred to noting ATLAS, CMS and LHCb use of pledged resource. These are opportunistic resources and there is no mechanism for reimbursement from partners for over-delivery.

Q4 reduction in UK Tier-2 capacity disadvantages UK analysis capacity – most of this refers to group disk and there is also mention of CPU, reprocessing and Monte Carlo simulation etc. Having Tier-2 allows usage of local storage through the Grid. RJ will finalise the text.

Q5 justification of FEC for each post – this is agreed as robust text. It was noted that at RAL the overheads take the total per post to £100K and the project does not benefit directly from these overheads as they are absorbed by STFC. The text will make clear that GridPP staff do not perform technician or administrative support roles at the host institution. They are employed to undertake GridPP work and support academics in using the Grid.

Q6 rationale for removing scientific capability rather than reducing travel, FEC etc – there was agreement on text reinforcing how critical the travel budget is and that it is comparatively low overall (c. 2.5%) – DK will read over. It should be noted this relates to CG travel and that we do not have any LTAs.

Q7 division between Tier-1 and Tier-2 was less well explained – clarification of text was discussed, including tape service, extra network and on-call service.

Q8 case for support did not explain how the request would meet UK pledges – the first part of the question noted meeting pledges – level of resources required is somewhere between flat cash and 90% scenarios. All pledges to non-LHC experiments were approved by STFC. It was noted the final part of the question is unclear.

Q9 Tier-2 electricity costs justification – Glasgow figures were provided and comparison was possible with Slough where hosting costs are £35K per month and Tier-2 status at Slough is c. £4M per annum for electricity, hosting and other associated non-staff costs. Costs of HW are replicated against running costs; this is also in the Risk Register (Risk 28). Balance of Programmes review comments could be useful to consider for the presentation.

2) Q4/18 reports outstanding (22/4)
===================================
CMS report will be submitted this afternoon. Tier-1 quarterly report is being prepared and will be submitted by tomorrow. Manpower for Tier-1 can be challenging to produce due to complexities with budget codes and people on projects.

3) Review of Risk Register
==========================
This was done for the GridPP6 proposal. There are currently 3 versions: one prepared for the last OSC; an update in the scope of GridPP6 if fully funded; and one taking account of the changes from 6 going back to 5 (minus Brexit). There is a paragraph in the OSC document referring to this. Risk 2 has been replaced with a stand-alone tape service that needs to be planned for. Risk 9 – failure to procure or deploy h/w. Risk 11 – over-contention for resources is now split with Risk 12 (inter-LHC contention and LHC experiments contending with others we want to support – for GridPP6).

Some priorities were also updated. Amber risks:

5 – loss of custodial data (Castor still relies on Oracle and staffing still low) – remains at amber.

8 – recruitment (tweaked text with Tier-1 using apprenticeships and internships) – remains at amber but may increase to red as we approach the end of GridPP5.

9 – failure to deploy h/w – one Tier-2 did not meet FY constraint by £70K and Clustervision went bankrupt – remains at amber.

11 – over-contention for resources, CMS asked for access to tranche funding early – remains at amber.

13 – previously capital risk if prices drop below threshold at Tier-1 (this relates to issues historically with switches and tape media and more relevant at the start of GridPP5 relating to h/w and resources), in the last OSC this was amber – should be reduced to green.

15 – loss of experienced personnel at Tier-2 – remains at amber.

18 – experiment software runs inefficiently on the Grid (e.g. multi-core issues, CMS issues) – since we are in Long Shutdown 2 this risk should be reduced to green.

19 – security problem affecting reputation (GridPP has a very good security team but it is not possible to anticipate when risks may occur) – remains at amber.

21 – insufficient user support is slowly increasing as we try to support more communities – remains at amber.

22 – mismatch between budget costs & price volatility due to Brexit – remains at amber.

4) Oversight Committee Documents
================================
DB would like to submit by Tuesday 30th April. Latest version (2D) is on the website and Q4 report minus Tier-1 is also there. It is recognised the format is useful, especially for a new OSC, and there was a discussion on what should be included as well as how it should best be presented.

  • Introduction

DB has written the introduction and requests feedback where indicated. Some points highlighted include the GridPP6 proposal; during this process we initiated discussion with WLCG regarding interpretation of Flat Cash and UKRI capital funds.

  • Wider Context & GridPP6 Proposal

PC has updated Wider Context and DB provided an overview of the GridPP6 Proposal.

  • GridPP5 Status

PG and GR are undertaking this – PG is doing the first section and needs the outstanding reports to complete it.

  • Risk Register

GR is progressing the Risk Register.

  • Tier-1 Status

AD has commenced this and expects to complete by Thursday/Friday for finalising on Monday.

  • Tier-2 Status

PC and DB have considered this section and PC suggested a template that could be useful for bringing uniformity to Tier-1 and Tier-2 reporting which AD agreed would be useful. DB enquired who should best complete the text for Tier-2, what should be incorporated, including grants to experiments and what aspects can be dispensed with, perhaps utilising SL’s model. This should be a high-level Tier-2 overview report. PG will make a start on the text with input from DB and GR (Matt will be asked for input). There should perhaps be a brief section on Security included in the report covering the last 6 months – DK will consider this in discussion with David Crooks.

  • User Reports (ATLAS, CMS, LHCb, Other)

RJ has written the ATLAS report and will send to GR. DC will complete and submit for CMS in the next few days. AM has submitted LHCb report to DB for review. JH submitted a report for Other VOs based on previous submissions – Dune is included and there may be more on LZ and others.

  • Outreach

This section can in future be placed into the Introduction section.

  • IRIS

PC will prepare text for this section.

  • Financial Tables

Being prepared.

5) Staffing Matters
===================

  • How do we replace work done by JC?

DB will email JC and request the spreadsheet outlining the tasks he was undertaking, previously split between JC and PG. Sharing the weekly meeting with Matt has been going very well – it was agreed Matt should Chair the Ops team meeting.

  • Other matters arising

IRIS distribution of funds was discussed – PC outlined the allocations and associated calculations relating to the proposal that GridPP takes £750K to provide resource for IRIS. DB noted the relevance of the fact that 3 big sites have bought into IRIS: they accepted funds and have an obligation to support IRIS users, and the other 2 big sites could do the same. Also, we need to ensure we purchase the required kit and ensure there are sufficient funds to cover the infrastructure at both sites. The term “Physical Cores” should be included in the document. Monitoring and accounting need to be worked out – the experiments can do this through the EGI portal, but RAL would like some clarity on this. DC and AM will consider requirements in this regard, and further discussion took place on the practicalities of running and upgrading. It was agreed OpenStack responsibilities should go to Imperial for now and be developed at Manchester; there is aspiration at QMUL, and Glasgow will consider this in c. 1 year after moving to the new Datacentre. DB has a spreadsheet tracking funds coming from different sources to maintain a balance in funding across the 5 sites.

ACTION 706.1: ALL should consider individuals who could take on Tier-2.

6) AOCB
=======
None

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.

702.1: DC to identify an LZ presentation for GridPP42. Done.

702.3: GR to update Table 20 to bring numbers in line with the returned JeS forms. Done.

702.5: PC to draft a set of milestones for WP4. Done.

702.6: PC & DB to add some additional text to bring things together (WP1c). Done.

703.1: DC to provide figures for WP2 numbers. Done.

703.2: AD will contact Darren (Tier-1), Tim (Atlas) and Katie (CMS) for Q4 reports. Done.

704.1: ALL should discuss with experiment reps to develop response for question 1 of the GridPP6 Response to the Panel. Done.

704.2: AD to develop a response relating to WP4 work (question 1) of the GridPP6 Response to the Panel. Done.

704.3: DB, RJ, AD and DC will work on question 3 of the GridPP6 Response to the Panel. Done.

704.4: ALL should review and contribute to the GridPP6 Response to the Panel where appropriate. Done.

704.5: DB will write a report for the OSC. Ongoing.

ACTIONS AS OF 23.04.19
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.

704.5: DB will write a report for the OSC. Ongoing.

706.1: ALL should consider individuals who could take on Tier-2.