GridPP PMB Meeting 605 (F2F)

GridPP PMB Meeting 605 (29.08.16) F2F – AMBLESIDE
=================================
Present: Dave Britton(Chair), Pete Clarke, David Colling, Tony Doyle, Pete Gronbech, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Alastair Dewhurst (STFC), Alison Packer (STFC), Louisa Campbell (Minutes).

Apologies: Tony Cass, Jeremy Coles.

1. Intro (DB)
=============
DB welcomed the PMB to Ambleside and thanked AD and AP for attending the PMB to present their report.

2. RAL OPN Link
===============
GS presented on the OPN link (available on the Indico webpage).
DB requested clarification on whether additional issues arise because the system continues to reset itself after a failure, and whether that could be anticipated in future. There is some correlation with FTS traffic: on some days it is commensurate with saturating the link, and there is some packet loss at saturation. Monitoring the plot may not highlight when issues can occur (the plot average is 6.4, but consideration should be given to when more than 20 may be required). There was some discussion on what qualifies as a good transfer: saturating at 20 for short periods is not a great issue, whereas previous saturation at 10 did become so. 20 Gb is probably sufficient for Run 2, but we should determine whether it is possible to go beyond that and ascertain how that can be achieved, and the timescale and costs involved. Tier2s have regular periods when >30 Gb are being used. The back-up link is also important, as we revert to 10 if the plug on either link is pulled; this is manageable in the short term, but longer periods would create issues and require planning. Perhaps 10 for the primary and 20 for the backup. There is no increase in jobs running on Tier1 or Tier2 (May-Aug), so the major effect is luminosity (for CMS and ATLAS).

Costings
DK presented a history of costings over the last 3 years (available on the Indico webpage) and noted that secondary link costs increased because payment was due for connections at both ends. However, costs were greatly reduced from the original £133K level, and last year this changed to £83.5K, which may allow some headroom in the budgets if required. Changes had to be made this year, and from 01.08.16 the new pricing for 2x10 Gb is £14K + VAT per circuit plus an additional third party charge of £34K + VAT. To upgrade one link to 20 there are installation costs (£2,400) and an annual rental of £28,294. Increasing to 100 would raise installation costs to £152,810 with £129K annual rent. However, despite these numbers, it is not clear what the total actual cost will be to upgrade one (or both) links.
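
As a rough illustration of the upgrade figures quoted above, the following Python sketch adds the installation charge to one year of rental for each option. It assumes that first-year cost is simply installation plus annual rental, excluding VAT and any third party charges; as the minutes note, the true total is not yet clear, so these sums are indicative only. The names UPGRADE_OPTIONS and first_year_cost are illustrative and not part of any GridPP tooling.

```python
# Figures in GBP as quoted in the minutes.
# Assumption: VAT and third party charges are excluded.
UPGRADE_OPTIONS = {
    "20": {"installation": 2_400, "annual_rental": 28_294},
    "100": {"installation": 152_810, "annual_rental": 129_000},
}


def first_year_cost(option: str) -> int:
    """Return installation plus one year of rental for the given option."""
    costs = UPGRADE_OPTIONS[option]
    return costs["installation"] + costs["annual_rental"]


if __name__ == "__main__":
    for name in UPGRADE_OPTIONS:
        print(f"Upgrade to {name}: GBP {first_year_cost(name):,} in year one")
```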

Strategy
It is considered that 20 Gb may suffice for the moment, and the cost implications of upgrades should be fully understood. It was agreed to try to upgrade now to 30 Gb (20 + 10) if that is affordable.

ACTION 605.1: DK will investigate costs and timescales of upgrading the OPN Link to 30 and report back to PMB.

3. GridPP5 Project Map and Reports
==================================
PG presented work packages changed in the GridPP5 proposal. Package 1 – Tier1, 2 – Tier2; 3 – Specialist services; 4 – Management, admin.

The Project Map remains almost unchanged but with some altered titles. PMB members are named as responsible for some elements of report generation. LHCb should be AM. Various milestones were contained in the proposals relating to procurement, pledges, installation, external reviews of Tier1, etc. PG compared GridPP4+ to GridPP5 and made the necessary alterations; he will now look at the reports to ensure the metrics are accurate. The Tier1 report remains unchanged and milestones were discussed; these are set out yearly for the duration of the project. A Tier1 review has been undertaken roughly annually (next review scheduled for November 2016, final date TBC). The term ‘external’ should be defined, and it was noted that a previous review was not undertaken because funding had not yet been secured and any review had to take account of the GridPP5 strategy.

Tier2 – experimental reports with ATLAS: metrics remain relatively unchanged but feedback is required on whether the correct elements are being measured and if this is sufficient (RJ confirmed).

CMS – for GridPP4+ Bristol was not included as a Tier-2; DC confirmed Bristol is now operating well (Luke Kresco is running the site) but not undertaking Tier2 work. Unlike ATLAS, the information and criteria differ: CMS sites were very responsive and working very well, but the report should also include issues. Elements such as RAL inefficiency were related to CMS issues outside of our control, but the reports can be simplified if uniform information is collected across ATLAS and CMS.

RAL efficiency should be shown compared to the average worldwide Tier1 efficiency as this factors in experiment-specific issues, though there are some jobs running only at some sites. Therefore, there should be a way that overall information can be included in the report, but DC can provide specific additional information. PG will define a way to select data uniformly, but members will provide relevant contributory information to PG. AM is currently working on elements included in the metrics. There is an OSC on 22 November and documents need to be completed by 15 November and work commenced by the beginning of November.

Data Group section – Jens or Sam should perhaps provide data; Jens provides relevant and timely data. There was some discussion on STFC input to Horizon 2020 (GOCDB, APEL and other elements) to demonstrate our involvement; this is perhaps more related to logistics (under AOCB), and Ian Collier may be best placed to provide the necessary information.

NGI – there is no longer really an owner for the NGI part of the ProjectMap. The elements therein are not logically part of Tier1 and there is a consolidated report that comes from various people.

Security – DK has provided a report to PG.

Planning and execution – PG undertakes these strands. DC & PG will discuss this week.

Outreach – previously focussed on paper etc, this now needs to be reworked. SL will discuss with PG this week, including case studies (e.g. 1 case study per annum would be an acceptable measure). New case studies should be introduced for VACcycle, Deployment, etc, and should be related/linked to storage items. AM, DC and Sam can discuss this week.

DB slides – deliver commitments of Tier1 and Tier2; maintain and monitor services delivery; contribute to and lead WLCG infrastructure in places; reduce effort to run infrastructure (lightweight Tier2 and Tier1 site); engage new user groups; engage with other e-infrastructures in UK; engage with European initiatives; respond to RCUK impact agenda; prepare for the period after GridPP5. These could be more explicit, and the project map needs to illustrate how we are making progress in these areas. The technical side should be monitored, and DC and AM need to be involved in more detail in these discussions. The metric could be ‘have you demonstrated that’, with a paragraph providing details in the quarterly report. Numbers are variable and difficult to incorporate into the metrics. Run 3 document preparations are critical: HSF is fulfilling our mandate to contribute to preparations for Run 3, and this should be built into the project map to demonstrate to the OSC that changes are being made. HSF engagement is undertaken through the experiments but is not disconnected from GridPP, and a method of demonstrating that should be developed, perhaps a way to highlight people connected to GridPP and those who are paid by GridPP. CHEP provides an effective vehicle to demonstrate this, where people from different groups contribute papers but are also connected to GridPP. The OSC are concerned about UK influence and about driving forward in a way that will reduce costs. Care should be taken to ensure any commitment we make is achievable by the end of GridPP5: leading, monitoring and being involved in changes, and the manpower involved.

Risk register:
The register has been reordered by work package (Tier1, Tier2, Expt Support, Project wide) and categorised by risk and effect. Tier1 risks are now together, with detail on where the main risks are. The risks are grouped more logically and should be looked at in more detail before the next OSC. Today the organisation and categorisation were approved for 31 risks, including which category of risk is involved (operational, reputational, financial); this could be given numbers (1-3) and added at the end. The list is now in a more logical order than at GridPP36, and it was agreed that a F2F should be arranged to go through the list in detail, assess the level of risks, and determine whether any risks can be combined without pushing to too high a level and whether new risks should be incorporated (e.g. new sites coming on). PG will circulate the register and request feedback from PMB members.

ACTION 605.2: PG will go through data with PMB members this week to agree data to be included in the Project Map and reports.

ACTION 605.3: ALL to review risks on the register to which their names are attached and provide interim feedback.

4. Echo Status & Development Plan Update
========================================
AD presented an update.

The PMB thanked AD for a comprehensive presentation. Some issues have been raised: there are risks on the Castor side (technical risks and personnel risks, i.e. a dependency on specific individuals). On Ceph, concern was expressed that the GridFTP plugin is not yet complete or close to complete, as well as over the dependency on specific individuals. 2018 looms and the position should be monitored and carefully managed. Internal questions will be addressed, including what support should be in place for assisting development work and providing a contingency for absences. AD confirmed there are reassurances and contingencies in place, including staffing. DB reinforced the critical point of issues with Castor and migration, and enquired whether any issues need to be considered for the migration in 2018. On the physical limit of the migration rate: ATLAS file size is an issue which is challenging to define precisely, but if Castor machines are drained, and based on previous numbers, the rate can be estimated at 1 PB per quarter; this can be drained without too high an impact. There will be 7 PB in Castor, so at that rate, if the migration commences during the third quarter of 2017, it will take 2 years to drain. DB enquired whether other issues should be considered, e.g. zipping up files in line with the Dirac transfer strategy. It was confirmed this would be possible if necessary; e.g. perhaps secondary data that is copied elsewhere need not be migrated. Data with a known lifetime may not need to be planned into the migration. 13% of files on Castor going into Echo would give us a clearer idea of the way forward, and potential issues should be planned around now to try to mitigate them in 2018.
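
The drain estimate above can be checked with a short Python sketch. It takes the quoted volume and rate as petabytes and assumes a constant drain rate with no pauses, which is a simplification; the function names quarters_to_drain and end_quarter are illustrative only.

```python
import math


def quarters_to_drain(volume_pb: float, rate_pb_per_quarter: float) -> int:
    """Number of whole quarters needed to drain the given volume."""
    return math.ceil(volume_pb / rate_pb_per_quarter)


def end_quarter(start_year: int, start_quarter: int, n_quarters: int) -> tuple[int, int]:
    """Return (year, quarter) of the last quarter in which draining takes place."""
    # Convert to a running quarter index, counting the start quarter as the first one used.
    index = start_year * 4 + (start_quarter - 1) + (n_quarters - 1)
    return index // 4, index % 4 + 1


if __name__ == "__main__":
    n = quarters_to_drain(7, 1)       # 7 PB at 1 PB per quarter
    year, quarter = end_quarter(2017, 3, n)
    print(f"{n} quarters ({n / 4:.2f} years), finishing in Q{quarter} {year}")
```

Under these assumptions the drain takes seven quarters, i.e. just under the 2 years quoted, and a migration starting in the third quarter of 2017 would run into early 2019.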

5. AOCB
=======
a) Horizon 2020 calls: There was a call out for submission in early 2017 and it is assumed that EGI will lead the response (EU related content). STFC concerns 1) EUT0 – Tony Medland is Chair of the CB and should be supported on this. PC sometimes attends; Charlotte Jamieson is the representative of the research councils. STFC staff (mainly security and SCD) will respond to the EGI Call, and Ian Collier is involved. DK's security team will submit, and others will be responding to the Call. The main point is to maintain 1 FTE between GOC and APEL, and we could request more for CVMFS. For now the objective is to be part of a bid that may be submitted. PC is having various conversations to understand what we may be able to request. Many of the countries concerned have much larger EGI interests; we have historically been involved in these areas, and the UK strategy should be to build on that success: Security Policy, Vulnerabilities and Operational issues. There are probably opportunities to bid for FTEs in the security strand to meet the needs of research communities. Care should be taken on placing a pricing structure onto services we provide. This funding stream will replace Horizon 2020 which requires 50% matching funding. It would be helpful to have a summary of how this is evolving and our place in it, perhaps from Ian Collier or others at RAL, as it would be useful for Charlotte and others to be aware of the activities we are involved in.

REVIEW OF ACTIONS
=================
600.1: DC to contact Julia Sedgebeer at Imperial to informally discuss and address SuperNemo’s computing needs and request Daniella and Tom to await outcome of these discussions before progressing further. Ongoing.
602.2: AM and AS to resolve the LHCb request at the end of July. (UPDATE: AM and Concezio have worked up figures, which have now gone into 2019, and we could plan on this basis). Done.
602.3: AS will request that Jens make a presentation to the PMB, supported by a written report, on plans for the AAAI project with Dirac as well as proposed reporting. Ongoing.
602.4: AS will contact Charlotte to determine how much effort in total STFC is funding in the AAAI project so that we can see if there are other contributions we should expect. Ongoing.
603.1: AS will discuss with Jens and confirm agreement for 150 TB tape storage capacity for the Nuclear Physics request, on the proviso that it can be accessed by existing mechanisms in the GridPP suite of tools. Ongoing.
603.2: DK will circulate costings for OPN links to determine which options would be affordable. Perhaps discuss at F2F. Done.
603.3: PG will include an item on F2F agenda for costs so that decisions can be made on the best way forward for OPN load. Done.
604.1: AM will discuss moving ‘The Evolving Information System’ to Session with JC and PG. Done.
604.2: DB to arrange chairs for sessions which do not already have one assigned. Session 1 – RJ; Session 2 – PG; Session 3 – JC; Session 4 – DC; Session 5 – PC; Session 6 – DK. Done.

ACTIONS AS OF 12.09.16
======================
600.1: DC to contact Julia Sedgebeer at Imperial to informally discuss and address SuperNemo’s computing needs and request Daniella and Tom to await outcome of these discussions before progressing further. Ongoing.
602.3: AS will request that Jens make a presentation to the PMB, supported by a written report, on plans for the AAAI project with Dirac as well as proposed reporting. Ongoing.
602.4: AS will contact Charlotte to determine how much effort in total STFC is funding in the AAAI project so that we can see if there are other contributions we should expect. Ongoing.
603.1: AS will discuss with Jens and confirm agreement for 150 TB tape storage capacity for the Nuclear Physics request, on the proviso that it can be accessed by existing mechanisms in the GridPP suite of tools. Ongoing.
605.1: DK will investigate costs and timescales of upgrading the OPN Link to 30 and report back to PMB.
605.2: PG will go through data with PMB members this week to agree data to be included in the Project Map and reports.
605.3: ALL to review risks on the register to which their names are attached and provide interim feedback.