GridPP PMB Meeting 683

GridPP PMB Meeting 683 (22/10/18)
=================================
Present: Dave Britton (Chair), Pete Clarke, Jeremy Coles, David Colling, Alastair Dewhurst, Pete Gronbech, Roger Jones, Steve Lloyd, Andrew McNab, Gareth Roy, Louisa Campbell (Minutes).

Apologies: Tony Cass, Dave Kelsey, Andrew Sansum.

1. Q2 Quarterly report overview
===============================
The report is largely complete and Matt will take on this task for Q3. PG provided a summary of key points from the report that he circulated in advance of the PMB.
On the Tape at RAL issue, DB noted that AD should tweak the metric if appropriate, and asked if any of the flagged issues are still outstanding and require attention. For example, the 6% for CMS is being addressed by changing procedure in the quarterly resource meeting to ensure it is picked up – Tier-1 resources have been low and are now filling up to the correct level, though the issue was more likely due to greater use of resources at Tier-2 and other aspects, which procedures are now in place to resolve.

2. Oversight Documents
======================
PG noted the date of the OSC meeting is very close and documents require to be prepared urgently. He has put a copy of the previous report into a Cernbox and has begun to work on the new document. DB asked for the relevant PMB members to deal with their sections – these need to be submitted one week prior to the meeting, not 2 weeks as was previously the case. PG will continue to be responsible for this document due to his experience. DB requested drafts be circulated by next week’s PMB (PC is in Chicago for LHC1 meeting next week and will work on this there). PG will prepare the financial table, project map and risk register in time to process also for the Q3 report. Decisions need to be taken in regard to the late spend funds.
Re the new OSC – DB saw Tony Medland and Sarah on Wednesday and they confirmed the OSC has been changed – Fab and Martin have left, Carlton remains – Chris Alton, Jackie Palace and Andy Buckley have joined (Chris Alton may take over as Chair in the longer term, but that is not clear at this stage – he is the UK Representative of the CRSG at CERN and has a good overview). Therefore we should be aware there are three new members who will not have the same understanding of our normal reports as previously and may have additional questions that will require to be addressed.

3. Risk Register
================
PG raised this in order that this can be covered in next week’s PMB.

4. ALICE/ATLAS – Birmingham Storage
===================================
Relating to an email from AD – Birmingham is migrating from DPM to EOS and also supporting ATLAS, who are happy to continue using EOS. However, this may cause concern at ATLAS regarding a small site which may not have sufficient support, and ATLAS is requiring something different, which we need to consider in the longer term. There was some discussion on the resources required. The question was asked why ATLAS requires this storage at Birmingham and if it is necessary – RJ confirmed it needs disk for caching. UCL is used in diskless mode (VAC and Storage), or we could move 400TB to RAL and give the ATLAS disk at Birmingham to ALICE. Birmingham could move to a caching model with a small allocation on a large well-run service; most of its resources have been allocated to ALICE for some years now. Care should be taken in dealing with the situation, since ALICE should be considered as another community we support, particularly nuclear physics at ALICE, and we should consider how best we can assist them. We could suggest there is no implication for our Tier-1 and, because of our direction of travel, it makes less sense to do this for ATLAS, and then ALICE can continue to use the Tier-2 at this one site, ie Birmingham. There was some discussion on other mitigating factors. DB suggested we are conflating 2 issues – the technical issues of running 2 EOS instances at Birmingham and our relationship with ALICE. They expect the same from us as other communities do and we should try to help if we can. DB will email Mark noting Birmingham’s ability to run 2 EOS instances at one facility and asking whether there are alternatives as suggested above, without prejudice to GridPP6.
ACTION 683.1: DB will write to Mark at ALICE to consider the best solution for EOS proposals at Birmingham.

5. GridPP Resources available to IRIS
======================================
DB noted an issue raised by AS (emails circulated): IRIS will soon put out a call for bids for using IRIS resources, and the resource allocation committee would like to know if GridPP will commit to GridPP-owned resources at that facility. DB emailed the PMB that it may not be appropriate to commit these resources, as we show goodwill and provide resources wherever available, typically from Tier-2 sites who maintain ownership of resources and make local decisions on who uses these resources in line with GridPP guidelines. Sites support the VOs they wish to supply resources and help to; this does not translate to a resource allocation committee in terms of knowing their customer base and the best ways to support them. The PMB agreed not to allow the committee to allocate the resources on our behalf, but we will continue to commit resources if we can.
ACTION 683.2: DB will write to AS confirming GridPP are happy to continue providing resources using processes already in place (quarterly resource meeting) wherever possible, though are unable to agree for the resource allocation committee to directly allocate resources.

6. WLCG SOC Workshop funding
============================
David Crooks made a request to support a WLCG SOC (Security Operations Centre) workshop he would like to host at Cosners 19-21 February and asked if GridPP could contribute £500 to moderate necessary registration fees. It was suggested delegates should not really be asked to contribute fees to register over and above costs for accommodation. It was suggested that the institute concerned should try to find funds to allocate – DB will speak to David Crooks to determine the breakdown of costs and agree to support it accordingly.

7. CMS Storage Request
======================
Request for an extra PB from CMS. It is possible to do at RAL due to underuse by ALICE, as they cannot fill the disk quickly. There is around 650 TB of additional disk being allocated next year, so an extra PB may be used up by early allocation, but the extra 350 TB may be challenging to get back. Also, none of the h/w for ATLAS is being used and is not a GridPP resource that will be used in the next 6 months. DC explained it came out of a report at the management meeting – DC will check the minutes and report back to the PMB. AD, Chris Brew and Katie are on the same flight to CERN and will discuss, then report back to PMB. DB proposes that, provided we have a plan to safely deliver the disk, we can approve the 650TB needed to meet next year’s pledge so long as AD has confirmed it is achievable and has also given assurances of how/when it will be returned.
ACTION 683.3: AD and DC will provide the PMB with information on the provision of 650 TB, with a plan of how and when it will be returned.

8. SCD Strategy for Tape Storage
================================
Email from AS to some members of the community. This picks up on some points raised at the Tier-1 review relating to SCD at RAL not moving toward a tape infrastructure. The email enquired whether a dialogue should be commenced on a shared tape infrastructure. There was some discussion on whether there is scope for this activity with a key aspect of shared risk and equitable distribution of costs – ie a shared tape infrastructure, possibly branded as ALICE. DB invited comments. STFC/SCD placing themselves in a strong position as a tape supplier for several communities makes a lot of sense, and IRIS funds should be allocated to this provided communities require the tape. The PMB agreed it is a sensible approach: economies of scale could mean costs are reduced, though there may be a slight risk of losing an element of control, but we are a large player and should retain a good measure of control that works within GridPP. Additionally, the solution may end up more expensive overall since STFC will take on costs of developing, as AD has alluded to in his background document, but since we will move away from Castor this will become cheaper. This could be an opportunity for AS to make a strong argument that effort for this should come from elsewhere too, directed at JASMIN and programmes, as an example of effort that GridPP are committing.
ACTION 683.4: DB will write to AS advising GridPP strongly support the SCD Strategy for Tape Storage and see this as a way of containing future costs.

9. AOCB
=======
1) DB and PC attended a meeting in Swindon with Tony Medland and Sarah to be briefed on the scope of GridPP6. It was a very helpful and positive meeting with a few points raised, including:

a) We will need to tabulate resource request in different categories (resources for LHC experiments and exploitation and separate number for non-LHC eg NA62) and another for in-construction experiments that are going ahead and have reasonably well known requirements, and experiments in development. Therefore, we need to factorise more clearly into these categories as the committee adjudicating this will need to determine what is or is not in scope.
b) Services and support – we should make clear (perhaps in a section additionally containing elements we propose to do which are not core LHC – DUNE, CVMFS for LIGO, etc – to assist an additional community) the small increments necessary to lever the LHC for other communities.
c) Funding – planning line is flat cash, and we should probably address a +10% scenario as well as a -10% scenario. We will receive a letter with detail on scope, pages, etc. Any potential for -30% was not discussed. DB asked if we could write a +10% case as the default then 2 other cases as we are already at a level to continue to do more with less as a consequence of flat cash over many years, increasing stable of experiments we support and increasing need to provide for IRIS etc. Therefore it may be difficult to make a case at flat cash level. We will await the wording of the instructions in the next week or so.
d) Timescale – GridPP6 should go for 4 years to cover Run3 – £24M project exceeds the £20M threshold which requires a business case to BAES. DB noted PC has done such business cases already in the IRIS context and STFC are having a new format for the paperwork which better matches our proposal design, so this is ultimately better over 4 years and Tony and Sarah appear to concur this makes better sense.
e) Timeline – some discussion on various committees working backward from June – a submission date will preferably be 10 March as last time, but possibly end of February.
f) Strategic objectives for GridPP5 should align well with GridPP6 with some tweaking of wording.

There are positives to come out of a business case scenario as it may mitigate other approval processes. It was noted the timescale coincides with the OSC. The Board will probably be specially constituted with PPGP and PPR members. It is recognised that last time they did not have the expertise to adjudicate, so this will have PPGP with cross-membership and experts brought in – possibly cross membership from OSC which increases importance of engaging appropriately with the new OSC. It is not yet clear who will replace Tony when he retires at Christmas.

2) PG noted emails that he and AM receive occasionally and one received today from a 3rd year computer science student from Brunel asking if they can use our information for a case study. He should be directed towards the Brunel site team.

10. Standing Items
===================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
To be discussed next week.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
To be discussed next week

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
To be discussed next week

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
To be discussed next week

SI-4 Production Manager’s report (JC)
-------------------------------------
To be discussed next week

SI-5 Tier-1 Manager’s Report (AD)
---------------------------------
– Ongoing issues with CMS-AAA service. Machines are regularly going into swap and then failing tests. We believe there are multiple problems, including at least one memory leak in XRootD. Katy, Chris B and myself are out at CERN for CMS C&S week and will discuss how to resolve this.

– Patching of machines against CVE-2018-14634 is in progress. A lot of WN are currently being drained and will be restarted on Monday with the fix.

– There was a problem with our configuration management system that was triggering machines to reboot indefinitely because it wasn’t detecting that the correct kernel was on the machine. This delayed the HPE15 WN going back into production.

– On the 16th October, while host certificates were being renewed in bulk for Echo, the old certificates were revoked before the new ones had been put on. As other sites picked up this revocation, Echo failed more and more transfers. The new certificates were deployed the following morning.

– The HPE15 WN did go back into production last week and this means that for the first time in about a year the Tier-1 was comfortably above our WLCG pledge.

– The Tier-1 disk procurement has (finally) been finalised and the tender is ready to be submitted (today/tomorrow). I am still waiting for the CPU procurement back from SBS but that won’t delay the submission of the disk procurement.

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
To be discussed next week

SI-7 External Contexts (PC)
---------------------------
To be discussed next week

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
667.2: PG will do h/w planning before next OC to provide OC with details of shortfall in funds. Ongoing.
675.1: DC to sign off report on Tier-1 LHC usage. Ongoing.
678.2: DK to finalise the Security, Trust and Identity background document by mid October. Ongoing.
678.3: AD to finalise the Tier1 background document, including tape strategy by end September. Ongoing.
678.5: JC to finalise the Storage background document by end September.
(UPDATE: 17 October meeting with Tony Medland – DB and PC will attend. This is almost complete and awaiting a few minor elements to be worked in – GR will upload into Googledocs for info). Ongoing.
680.2: JC will follow up GDPR implications relating to VOMS with DK. Ongoing.

ACTIONS AS OF 22/10/18
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
667.2: PG will do h/w planning before next OC to provide OC with details of shortfall in funds. Ongoing.
675.1: DC to sign off report on Tier-1 LHC usage. Ongoing.
678.2: DK to finalise the Security, Trust and Identity background document by mid October. Ongoing.
678.3: AD to finalise the Tier1 background document, including tape strategy by end September. Ongoing.
678.5: JC to finalise the Storage background document by end September.
(UPDATE: 17 October meeting with Tony Medland – DB and PC will attend. This is almost complete and awaiting a few minor elements to be worked in – GR will upload into Googledocs for info). Ongoing.
680.2: JC will follow up GDPR implications relating to VOMS with DK. Ongoing.

683.1: DB will write to Mark at ALICE to consider the best solution for EOS proposals at Birmingham.

683.2: DB will write to AS confirming GridPP are happy to continue providing resources using processes already in place (quarterly resource meeting) wherever possible, though are unable to agree for the resource allocation committee to directly allocate resources.

683.3: AD and DC will provide the PMB with information on the provision of 650 TB, with a plan of how and when it will be returned.
683.4: DB will write to AS advising GridPP strongly support the SCD Strategy for Tape Storage and see this as a way of containing future costs.