GridPP PMB Meeting 594 (F2F)

GridPP PMB F2F Meeting 594 (11.04.16)
==============================================================
Present: Pete Gronbech (Chair), Dave Britton, Dave Kelsey, Andrew Sansum, Jeremy Coles, Gareth Smith, Roger Jones, Pete Clarke, David Colling, Tony Doyle, Tony Cass, Andrew McNab, Steve Lloyd, Louisa Campbell (Minutes).

Apologies: None

1. Intro
=========
DB confirmed GridPP5 as the starting point and reiterated that we must position ourselves according to what the context will be in 4 years, and plan accordingly for “GridPP6”. Today’s meeting covers several aspects of this, in the knowledge that funds are secured. Spreadsheets were submitted in STFC format; amounts are known and confirmed. However, the outcome of the spending review for STFC was effectively flat cash, i.e. with no uplift for inflation, as government ODA stipulations have impacted this; the position will become clearer in the coming weeks. Programmes have been asked to submit amounts at plus or minus 5% of last time. Flat cash will be more challenging to cope with by GridPP6, which may require some imaginative ways of securing funding. Some potentially surprising countries currently on the list as Developing (e.g. South Africa, India and China) are eligible for ODA. The Capital split has not been defined for GridPP5; it is unlikely that the Tier-1 will achieve the split hoped for. This year, STFC funding appears to have dropped by c. £12m, but this reflects some Capital issues.

There was a discussion about UK-T0, reminding the PMB that this is one of the external contexts that is becoming important. PC has been involved in ongoing discussions with astronomers, who recognise they can learn a great deal from Particle Physics and GridPP. The more influential members recognise the benefits of working with us, particularly with SKA, and they should be perceived as peers in an inclusive context.

2. OC Documents
===============
The Oversight Committee meeting is 6 May and documents are generally brought together 1 week before that. PG has circulated target deadlines to the PMB for draft version submission. These will summarise GridPP4+ and include final accounts for GridPP4 which are now available. PG noted spend was very close to the target but there was a question as to whether university out-turn numbers need to be incorporated.

RAL – DK noted that the OPN came out cheaper than predicted (£83K rather than £133K); Finance insisted on accruing this over 7 months, in contrast to previous years. There has been a slight underspend on staffing and travel; quarterly estimates have to be submitted and have been underestimated over the last 2 years. The figures will need to be more accurately assessed and finalised as other costs need to be factored in, e.g. HEPIX, PMB F2F meetings, etc.

Financial documents – PG is responsible for the GridPP4 and GridPP4+ financial summaries. Final Staff Numbers rely on figures from quarterly reports, so these need to be submitted imminently for inclusion. PG will suggest appropriate figures based on the amounts generally submitted in previous quarters, which tend to be largely consistent except for Tier-1. We need to check the target Budget and manpower for comparison because an official letter was not received from STFC (though the PMP (Project Management Plan) probably contains all that is required); this will form a reasonably accurate set of figures to work from.

Documents: Introduction (DB – done); International (PC – done); GridPP4 status with more project management things (PG working on); Risk Register fully discussed at today’s PMB (PG will finalise); Tier-1 Status (AS has provided majority of information); Deployment Status (in progress, complete by Wednesday); Experiments – ATLAS (RJ working on), CMS (DC working on), LHC (AM will finish before Wednesday); Other VOs (Tom has written a great deal on this that could form a firm foundation – ALICE, LSST, LZ and others should be included – PG will edit). Information can be generated and a standard graph inserted to demonstrate that we are delivering what is required. Impact & Dissemination – Tom has almost completed writing. PG will circulate a final draft to the PMB for comment after all content has been input.

Support for other VOs was discussed, including non-LHC VOs and the hardware projected and actually used. The differences between supporting other experiments and providing hardware for them were also discussed.

ACTION 594.1 – PG to circulate draft OSC reports to PMB for comment.

3. GridPP5 Plans
================
DB enquired where we are with the Project Map and Deliverables. PG confirmed that the Project Map and the format for quarterly reports need to be reviewed for GridPP5. The OC reports cover only GridPP4+. The reports for GridPP5 should cover basic strategy, procurement, staff, milestones measured and high level deliverables, amongst others. High level milestones need to be proposed and agreed. As the project has to evolve in 4 years this will not necessarily be the most effective way forward, but a system must be in place for monitoring and demonstrating delivery of the core mandate over the next 4 years, i.e. that LHC projects have sufficient support. Engagement with other experiments can be plotted statistically to demonstrate what experiments requested and what was delivered. Perhaps the resource slots available should be stated, rather than those that were taken up, to demonstrate the extra resource available. Plotting of queued jobs may be helpful to determine waits for resources, but this is challenging to measure.

It was noted that a recent, damning report from the National Audit Office criticised BIS for providing funding but no means of measuring success. The number of experiments using GridPP and the publications produced could be inserted to demonstrate success, though this may be impacted by other requirements. For example, DRI was an excellent project example but not easily measurable, justifiable or quantifiable. If we were asked for 10 KPIs to measure the success of GridPP, overlaid onto a Project Map (e.g. average efficiency/usage of Tier-1 or Tier-2 resource, vacancies arising and posts filled, etc.), these could be provided. However, it may not be efficient to gather information that might not be requested. High level information is easy to extract for different contexts. PG will initiate a process to produce the Project Map, meeting different project representatives and then collating and extracting this information.

ACTION 594.2 – PG to initiate the production of a GridPP5 Project Map.

4. Risk Register
================
GridPP4+ provided an opportunity to consider what is required for GridPP5. The GridPP4+ register should be reassessed and used as the basis of a GridPP5 Risk Register; there are currently 33 risks on it. Discussion took place on the best placement of risks (outlined below); DC and PG will discuss and agree this separately:

Risk 1 no change.
Risk 2 is a risk for ALICE – perhaps this should be rephrased e.g. ‘inefficiencies of software have allowed more hardware than is technically required – risk caused by intransigence in some experiments creates costs’, risk is high but impact is low for ALICE – keep risk but re-categorise.
Risk 3 needs to be considered for GridPP5 as it has become a more pressing/justifiable risk as we have been pressed for funding.
Risk 4 high impact.
Risk 5 risk high and impact higher – we are attempting to mitigate this (similar to risk 10 – should be moved up and connected)
Risk 6 could mention procurement and raise up to 30%.
Risk 7 mitigation could relate to the security teams that deal with threats – threats are rising, the ability to deal with them is decreasing and the impact on operations is rising. Wording should be altered and risk level increased.
Risk 8 reputational risk is high – conflating risk 7 and 8 – add DNS tag.
Risk 9 should be deleted.
Risk 10 Retention problems at RAL, to be moved up the list next to risk 5.
Risk 11 to remain but amend wording to ‘mismatch between budget and hardware costs’. Risk is not high on a one-year timeframe, but c. 25% likelihood of it happening.
Risk 12 Rephrase for UK specific risk. No NGI at present now that Claire Devereaux has moved on – she and Charlotte represent RCUK, STFC run the CA. Some services CA undertake for WLCG – who chairs? Matthew Duffy is chair of the Council but not the UK representative. Ian Collier may join the PMB as the connection to EGI, which is necessary for responding to relevant queries and for input to funding applications. Titiana recently confirmed that a proposed PMB for EGI Think Tank will soon be established. PG suggested that we should merely ask relevant questions of the Council (i.e. Charlotte or Matthew) wherever necessary. It was questioned whether a new Standing Item for any EGI/Icloud items etc. should be given a slot on the Agenda.
Risk 13 Travel funds have been cut, in line with the rest of the project. No change to the risk.
Risk 14 no longer high risk.
Risk 15 Juxtaposition with UK risk item – should be moved next to Risk 12.
Risk 16 No increased risk.
Risk 17 possibly move up – the custodial data sits on Castor (which is a long term risk). No change currently.
Risk 18 possibly electricity bills – point out all STFC funds. If electricity price increases this is an overhead onto operations – more concern for Tier-2s. No change.
Risk 19 No change.
Risk 20 connected to Risk 19 – one is an operational risk (Risk 20) and the other a financial risk (Risk 19). Impact is dropping with evolving computing models. May be worth joining into one risk and defining an aggregated risk and mitigation.
Risk 21 low risk at the moment over the next 6 months – decrease risk.
Risk 22 low risk – decrease substantially. Consider including some wording on potential Brexit impact and we will assess it after June 24.
Risk 24 several mitigating factors, remains at current risk level.
Risk 25 No change.
Risk 26 increase to amber as more users now and higher expectations.
Risk 27 no change.
Risk 28 no change.
Risk 29 delete.
Risk 30 low risk.
Risk 31 increase risk – perhaps double as we don’t yet have confirmation of resources. Reword to state STFC.
Risk 32 CEPH project is a major undertaking, RAL has allocated some project management effort to control the project.
Risk 33 amend wording to ‘Failure of achieving further integration within PPAN community’

When the Project Map is designed this may throw up additional risks to consider.

ACTION 594.3 – DC and PG will discuss and agree placement of risks on the risk register.

ACTION 594.4 – LC to create a new Standing Item for future PMB agendas ‘External contexts’

6. CEPH – Echo storage at RAL – status report
=============================================
Alistair Dewhurst (STFC RAL) presented via Skype.

The PMB thanked Alistair for his comprehensive presentation and will further discuss the implications. Discussion took place around the user IDs and passwords, specifically relating to issues associated with any proposed design of authentication software.

7. ALICE long term storage plan
================================

Alistair Dewhurst (STFC, RAL) presented via Skype. From page 32 of attached presentation slides.

DB asked what ALICE require from us. Alistair went over the details that are contained in a presentation and paper to the PMB. There was discussion on whether the same issues were experienced at other Tier-1 sites. We are unusual in that ALICE is a small part of the Tier-1; most other Tier-1s are dedicated or at least have ALICE as a major client. This means other Tier-1s can offer ALICE a more bespoke service, which is not feasible at RAL because of the effort it would require. The PMB suggests offering to continue to support ALICE at RAL using the current hardware and CASTOR until spring 2017. ALICE should use the interim to see if they could align better with the other LHC VOs. If that turns out to be impossible, GridPP would consider options to provide ALICE with the minimal MOU level of service at RAL beyond 2017, but this may be impossible. Even if it is possible, ALICE would not be able to continue their current success in making opportunistic use of free resources beyond that point.

DB enquired about deploying CEPH FS on top of CEPH to meet ALICE’s requirement. It would require repartitioning the service, which could not easily be accommodated, and getting a file system to work at scale is challenging. On the other hand, prolonging the Castor-disk service is also an unattractive option. Catalin is giving a presentation to ALICE on Monday 18 April and it would be very helpful if information were supplied so that a comprehensive answer can be provided.

ACTION 594.5 – ALL to agree a decision on ALICE storage and communicate to Catalin before Monday 18 April 2016.

9. Quarterly Reports
=====================
This agenda item was to discuss what we want from quarterly reports and how to improve/modify them for GridPP5. There was insufficient time to discuss this at this meeting.

REVIEW OF ACTIONS
=================
591.4: PG to collate information for inclusion in OSC Financial Report. Ongoing.

591.5: ALL to contribute to the OSC Project Status Report. Ongoing.

591.6: DB to contribute Introduction and International Context for OSC Report. Done.

591.7: PG to contribute Summary of GridPP Status for OSC Report. Ongoing.

591.8: PG to contribute Discussion of Risk Register for OSC Report. Done.

591.9: GS and AS to contribute Tier-1 Status Report for OSC Report. Ongoing.

591.10: JC to contribute Deployment Status for OSC Report. Ongoing.

591.11: RJ to contribute ATLAS User Report for OSC Report. Ongoing.

591.12: DC to contribute LHCb User Report for OSC Report. Ongoing.

591.14: AS to consider how to model a proposal for short-term temporary sign-ins for new users to access the Grid. AS has started discussing this with Ian Collier. Ongoing.

ACTIONS AS OF 11.04.16
======================
591.4: PG to collate information for inclusion in OSC Financial Report. Ongoing.

591.5: ALL to contribute to the OSC Project Status Report. Ongoing.

591.7: PG to contribute Summary of GridPP Status for OSC Report. Ongoing.

591.9: GS and AS to contribute Tier-1 Status Report for OSC Report. Ongoing.

591.10: JC to contribute Deployment Status for OSC Report. Ongoing.

591.11: RJ to contribute ATLAS User Report for OSC Report. Ongoing.

591.12: DC to contribute LHCb User Report for OSC Report. Ongoing.

591.14: AS to consider how to model a proposal for short-term temporary sign-ins for new users to access the Grid. AS has started discussing this with Ian Collier. Ongoing.

594.1 – PG to circulate draft OSC reports to PMB for comment.

594.2 – PG to initiate the production of a GridPP5 Project Map.

594.3 – DC and PG will discuss and agree placement of risks on the risk register.

594.4 – LC to create a new Standing Item for future PMB agendas ‘External contexts’.

594.5 – ALL to agree a decision on ALICE storage and communicate to Catalin before Monday 18 April 2016.