GridPP PMB Meeting 664 (F2F)

GridPP PMB Meeting 664 (09.04.18) F2F
=================================
Present: Pete Gronbech (Chair), Dave Britton, Tony Cass, Pete Clarke, Jeremy Coles, David Colling, Tony Doyle, Roger Jones, Steve Lloyd, Andrew McNab, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Alastair Dewhurst, Dave Kelsey, Andrew Sansum.

1. Item 1 (DB)
====================
DB noted the agenda items are a guide and the meeting will have a good deal of free-flowing discussion on relevant areas rather than conclusive decisions.

2. Tier2 h/w Allocation Policy
==============================
The original GridPP5 plan included Tier2 amounts per year, and Tony Medland confirms that some institutes will be required to spend in this FY and others in the next FY. Decisions need to be taken on managing this and ensuring the spend is consistent with the financial profile and the evolution of Tier2s. Experiments will consider where disk funds are needed to meet commitments, and CPU funds will primarily be distributed according to the performance metrics. Total spend is about £1.5M, with a nominal spend of c. £788K this FY and slightly less next year.
We need to continue the evolution of Tier2s into big centres hosting disk for ATLAS, CMS, etc. and CPU-only provision at other sites. There has been pressure to reduce the number of smaller sites monitored by the experiments – GridPP could continue to regard all sites as “Tier-2s” but, at the same time, mark smaller sites as “Tier-3” as far as ATLAS are concerned. This will take the pressure off those sites. For example, Bristol operates without centrally managed Grid space, and this has worked very well over the past 18 months. The only issue is that we want to ensure that resources delivered are accounted against the pledge. RJ will check whether ATLAS acknowledge CPU deriving from such sites as part of the pledge – AM confirmed the source of pledges is not a problem for LHCb.
DB enquired where CMS and ATLAS would like their disk located – this was covered under Tier2 evolution, as decisions must be made on what procurement is undertaken in each FY in order for PG to design a model. RJ noted it should be taken into account that ATLAS are still in a transition phase and this will have an impact. At intermediate sites with storage, caches do not need to be very large, but decisions need to be modelled over long timescales, since Holloway, Oxford and other medium-to-large sites create issues for ATLAS and require planning well in advance. Large and medium sites should be surveyed to ascertain their exact disk profiles as current equipment reaches end-of-life.
Glasgow is building a new data centre and it was suggested the funds for Glasgow h/w should not be spent until FY19 to accommodate this timescale – a similar request has come from Imperial.
As previously discussed, RALPP resources need to be integrated into the Tier-1, as it is not justifiable to have two computer centres in one place; hence if CMS disk goes to PPD it should be integrated with the Tier-1. We cannot justify additional manpower to run a separate h/w facility.
ACTION 664.1: RJ will discuss reducing the number of sites with Stephan and confirm whether they are happy to acknowledge that CPU deriving from these sites is part of the pledge.
ACTION 664.2: PG will canvass sites to ascertain when they want to spend money and determine how disk will be phased out.
ACTION 664.3: RJ and DC will advise how their experiments want disk divided for the start of Run 3 (ALICE and LHCb are resolved).

3. Summary of recent questionnaires
===================================
PC summarised surveys recently completed and where they went – see attached slides (Appendix I).
PC has written up a document outlining the infrastructure for the UKRI infrastructure group chaired by Brian Bowsher (now stepped down and possibly replaced by Mark Thomson).
DB summarised the UKRI survey for the Balance of Programmes Review, which had multiple complex questions (some of which were not specifically relevant). This may be an early phase of a later, more relevant, questionnaire.
ACTION 664.4: PC to publish input to Balance of Programmes Review on GridPP website.

4. Experiment Support Posts
===========================
The CMS post has been unfilled for over 12 months and, now that Alastair has moved on, the ATLAS post is also unfilled. PPD has considered filling the ATLAS role with two part-time staff, but GridPP and ATLAS feel that it requires full engagement. The CMS vacancy was subject to a failed recruitment. ChrisB has covered some fraction of the role and a second round of interviews was conducted last week – the preferred candidate is likely to accept an offer. These posts need to be filled and functional if they are to be defended in GridPP6.
The preferred candidate requires a visa and TD noted that a post requiring a PhD does not need a Tier4 visa – a recent role in Glasgow has been offered to a Canadian citizen.

5. Tier1 staff changes and review date?
========================================
AD has taken on the role of Tier-1 manager, and the Tier1 review was originally considered for May 2018. It was, however, proposed that the review would be best undertaken in September to ensure AD has a firm grasp in advance of aspects including the distribution of staff and their responsibilities/funding, etc. Due to other commitments, it was agreed that Thursday 13th September was the most appropriate date for the review.

ACTION 664.5: GS to respond on availability for proposed date of 13 September for Tier1 review.

6. Tier1 Finances
=================
One tranche of the disk procurement has not been delivered on time for this FY, largely due to a shortage of 12TB drives. We are assessing the impact of this.
Less time than expected was booked on the Tier-1 staff line last year and we have requested to carry forward 1.5 FTE to next FY to ease the Tier1 staff ramp-down that was built into the mid-point of the GridPP5 project. A decision on this request should be known around May. Moving from 17.5 FTE to 14.5 FTE would be challenging, and carrying 1.5 FTE over would also significantly help in ensuring Castor can remain in place as necessary.
DK raised the question of travel funding for CHEP and circulated a spreadsheet of requests by site, with estimated costs and a suggested distribution of attendees. The number of CHEP abstracts accepted this year far exceeds that of any previous year, resulting in higher numbers of attendees. However, the information is incomplete as Lancaster is not yet included in the figures – this should be updated before decisions can be taken. Usually around £15K is spent; the early-bird deadline is the end of this week.
Suggested attendees at CHEP: Edinburgh – 3 talks, so 2 plus PC; Glasgow – David Crooks will attend, and a poster and talk involve Sam and Gareth, one of whom should attend; Manchester – AM must attend, as should Alessandra; Brunel – Raul has a talk and should attend; Imperial – 3 talks, and Daniela and Simon or DC and Daniela should attend; RAL – Adrian, Rob Appleyard, Alastair, Fraser and DK, of whom it was suggested 3 should attend. It was suggested that, in addition, there should be 1 attendee from Lancaster plus one extra for Edinburgh. If there are additional requests later, or issues with institutes funding the other half, the group should consider paying if absolutely necessary.
ACTION 664.6: PC will confirm to DK the number of staff from Edinburgh that should attend CHEP.
ACTION 664.7: RJ will send information to DK on CHEP talks and posters accepted.

7. UKT0
========
The PMB took the opportunity to discuss how GridPP relates to UKT0 in the shorter and longer term.
PC noted he wears two hats (computing for PPAN and GridPP) and must balance the good reasons for getting involved in other elements against funding constraints. In the long term this is about establishing trust and collaboration so that future funds can be used to share costs with GridPP for elements of STFC-wide computing. In the shorter term, we operate at the edge of capacity and other groups may not appreciate the effort and resources involved in running big computing systems, so we have to balance this against our own need to secure GridPP in the longer term in the face of potentially decreased funding. This means that staff working in other contexts should, it is hoped, lead to other UKT0 projects where their costs can be covered by other funding applications, i.e. there should be an infrastructure that is useful to the common good, with some costs shared. We are experts in the field, contribute greatly to other projects and should be integrated into grant funding applications. There may be limited opportunity for funds from the Astronomy programme per se, since there are many historical contributory factors and no excess funds – funds should be acquired now, e.g. the potential £16M should be used as a proof of concept.
Consideration should be given to how GridPP6 should be framed and how to explain why both GridPP and UKT0 are necessary, highlighting our provision of support for UK particle physics as an integrated part of UKT0, similar to our position within WLCG. There was some discussion on how best to approach specific aspects, e.g. common infrastructures, differences in resource usage, and encouraging collaboration. Specific issues can be addressed – e.g. security should be integrated across GridPP and UKT0, and this opens the potential for jointly funded roles. Operations security should include h/w provided jointly by UKT0 and GridPP. Other possibilities include a single tape store as a common UKT0/GridPP resource and part of the UKT0 infrastructure. Our peer relationship with UKT0 should be made clear, and similar relationships elsewhere should be explored, as well as how this could best be presented in the context of GridPP6.
At the OSC Tony Medland raised GridPP6 and stated that the timeframe should begin after the grants round is completed in October, with a planned submission by the end of February 2019. DB suggested he would like a formal briefing on the scope, which could clarify distinctions between UKT0 and GridPP (LZ, LSST, etc).
There may be less money in GridPP6 for staff at Tier2s, and it benefits these staff to be engaged in UKT0 as a potential source of funding/career path. AM’s talk at GridPP40 will encourage more engagement with UKT0, e.g. Cambridge is a smaller GridPP site but is becoming an increasingly important UKT0 site and could be a model, since the knowledge is already there in some staff. Sites are encouraged to engage with and support UKT0 at the local level as this is beneficial to GridPP, the sites, individuals, etc.
There was discussion on how far to push GridPP aspects in the UKT0 context, e.g. Vcycle and Dirac, since only a few groups engage with these services. It was noted that these services should be offered as things we can help with but that, ultimately, the choice of what to use is up to the user groups.
In summary, positive engagement with UKT0 should be promoted, making clear that the intention is not to take over but simply to offer help where we can. A vision of co-existence with UKT0 should be developed in advance of drawing up the GridPP6 proposal, perhaps engaging a consultant to participate in this process and consider other areas that have been successful in this field, e.g. Canada. AM will start by making a list of potential participants and their requirements wrt API results.
ACTION 664.8: JC will examine GridPP staff roles/service/areas of expertise.
ACTION 664.9: AM will share baseline of interfaces he will draw up for UKT0 participating sites before a F2F in June.
ACTION 664.10: AM will share list of interfaces which experiments need to be able to participate in the UKT0 service.

8. GridPP6 Planning
===================
This was largely covered in the UKT0 discussion.

9. AOCB
=======
a) GridPP41 will be held in Ambleside 28th-31st August.

REVIEW OF ACTIONS
=================
655.3: PG to consider the agenda and date for Tier1 review and include disaster recovery plans. (UPDATE: appropriate dates are being considered with Alastair Dewhurst). Ongoing.
656.1: DK will report before the end of February on any actions GridPP should take to comply with GDPR. Ongoing.
656.2: DC will report on CPU efficiencies and CMS taskforce. Ongoing.

662.1: Summaries to be provided of any likely contribution to the broad aims of GridPP from the CDT: AM/RJ for Manchester, JC for Cambridge, DB for Glasgow.

ACTIONS AS OF 09.04.18
======================
655.3: PG to consider the agenda and date for Tier1 review and include disaster recovery plans. (UPDATE: appropriate dates are being considered with Alastair Dewhurst). Ongoing.
656.1: DK will report before the end of February on any actions GridPP should take to comply with GDPR. Ongoing.
656.2: DC will report on CPU efficiencies and CMS taskforce. Ongoing.

663.1: Summaries to be provided of any likely contribution to the broad aims of GridPP from the CDT: AM/RJ for Manchester, JC for Cambridge, DB for Glasgow.

664.1: RJ will discuss reducing the number of sites with Stephan and confirm whether they are happy to acknowledge that CPU deriving from these sites is part of the pledge.
664.2: PG will canvass sites to ascertain when they want to spend money and determine how disk will be phased out.
664.3: RJ and DC will advise how the experiments want disk divided for the start of Run 3 (ALICE and LHCb are resolved).
664.4: PC will publish input to Balance of Programmes Review on GridPP website.
664.5: GS will respond on availability for proposed date of 13 September for Tier1 review.
664.6: PC will confirm to DK the number of staff from Edinburgh that should attend CHEP.
664.7: RJ will send information to DK on CHEP talks and posters accepted.
664.8: JC will examine GridPP staff roles/service/areas of expertise.
664.9: AM will share baseline of interfaces he will draw up for UKT0 participating sites before a F2F in June.
664.10: AM will share list of interfaces which experiments need to be able to participate in the UKT0 service.

Appendix I