GridPP PMB Meeting 674

GridPP PMB Meeting 674 (16/07/18)
=================================
Present: Dave Britton (Chair), Pete Clarke, David Colling, Pete Gronbech (minutes), Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum.

Apologies: Tony Cass, Jeremy Coles, Alastair Dewhurst, Tony Doyle, Roger Jones.

1. Project Manager’s Report
===========================
PG stated that all but one of the Q118 reports were in so he will provide a report at the next PMB.
PG has consulted with Tony Medland with regard to reporting the additional capital allocated to the Tier-1 in FY16 and 17 in the financial table. This now shows a ~£300K underspend on the hardware line. This is possible because this line is not split between Capital and Resource. Need to discuss with AS. Will meet on Friday.
Tier-2 h/w allocation needs to be made with as much as possible in FY18. Steve Lloyd had rerun his metrics page following the correction of some anomalies in publishing from RHUL. PG needs the final percentage splits for CMS storage. DC is urged to provide them. The question of RAL PP as a Tier-2 in GridPP6 was discussed.
The GridPP41 agenda requires to be firmed up.
Finally PG informed the group that he has been appointed as the Head of IT for the Physics department at Oxford University and therefore will be having to reduce his GridPP involvement.
DB has proposed that PG could continue to be the ‘Resource Manager’, looking after the high-level finances and allocation of resources at the Tier-1 and Tier-2. More routine tasks such as Quarterly Reports, Indico agendas, meeting reminders (and notes) and other Production Manager duties should be handed over ASAP as people are found to take on these roles.

2. Capitalisable (IRIS) projects
================================
IRIS: Targeted at institutes and Capital allocated to Centre
Capitalizable Items v1: DB believes this will be difficult as there are few FTEs and no FEC. Need to consider how to convince institutes to do this for only 1-2 years. Timescales are quite tight. PC noted this is challenging for various reasons, but there are good reasons for doing this:
~£1M gone to Ada Lovelace Centre for s/w projects.
It was felt that we should take the opportunity to start to develop assets for the benefit of the entire PPAN community.
We have to be careful that it will count as capital.
Both STFC and the institute have to be convinced.
AS noted that part of the discussion has fallen between the gaps and SCD are less certain about how to go about creating a remote capital asset.
DC suggested IC has a way to understand this – Multi-VO DIRAC at IC. s/w could be extended to incorporate this. There was discussion on estimated effort and it was confirmed that manpower could be found to undertake this. A list of IRIS users already using or intending to use it is required and some research requires to be undertaken.
APEL (at RAL): The relationship with Grid SAFE needs to be understood and AM suggested that other projects may not be aware what they need. However, care should be taken as some projects will never grow to the size of LHC so do not or have not needed it. We have to demonstrate that any other activity really needs this. DB suggested the user groups will want to know how much they have used and the sites will require monitoring to determine what they have delivered.
CERNVM File System: Catalin has come up with a proposal to secure CVMFS.
Extend monitoring: ANEAS, DUNE, etc. need the ETF replacement for SAM. ETF is an open-source, flexible, plugin-based framework. DB suggested this seems further down the line but there is an opportunity to make the proposal.
AS/AM: GOCDB (it’s already very flexible). We want this on the diagrams, but cannot think of anything at the moment. DB suggested it should be on the architectural blueprint, not necessarily a capital asset.
Rucio: Would be good to get half an FTE to Edinburgh. Need plans, milestones and use cases. PC is already doing this for DUNE in the UK. SKA are looking at it. There is a possible suggestion that the instance could be at RAL but the development at Edinburgh.
Vcycle: AM is managing OpenStack. This can be extended to commercial providers. Library of templates that experiments can use to create VMs.

From the GridPP side, we encourage bids from RAL for APEL and CVMFS, IC for DIRAC, Edinburgh for Rucio and Manchester for Vcycle. Others may come along but these are the current front runners. We encourage GridPP members to prepare the background for these.

3. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
Nothing to report.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Nothing much from the Tier 2s. Lancaster had a power outage on Friday (the city, not just the site!) but recovered within hours.
There was a data loss in the Castor instance at RAL, but it was small and we believe all the lost files were secondary copies (tbc).
We are also zipping up small files to recover space across ATLAS production spaces. RAL installed a zip utility to help with this.

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing to report.

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing to report.

SI-4 Production Manager’s report (JC)
-------------------------------------
Briefing done for Ops-meeting and storage group background document draft. The work has started in a Google Doc; it was on hold last week due to CHEP.
Tier2 background document draft has not yet been started due to SKA-SDP reviews, but JC will be turning to it shortly.
Quarterly report has not yet been submitted. This has been started but T2 reports were only recently received. The intention is to return to it later this week (probably after the GDB if it finishes early).

SI-5 Tier-1 Manager’s Report (AD)
---------------------------------
We lost a disk server (gdss747) for ATLAS. This is part of the 13 generation of disk servers that are in their 5th year and are being retired. It had 60k files on it, of which 89% were log files. Of the remaining files, some will have been dark data although some may still have been useful for ATLAS. Note: While I said that ATLAS have migrated off Castor for disk, Rod Walker made a mistake while merging the sites back together which meant the output of some jobs got written to Castor.

Completed patching of singularity across batch farm.

We are migrating the LFC to new hardware. The new hardware has been set up and tested. The database team are running some tests for the database migration, which we expect to happen in the next few days.

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
MB meets tomorrow.

SI-7 External Contexts (PC)
---------------------------
All plugged in to rapidly developing IRIS items.

AS has been asked to report to SeIGO board on the governance structure for GridPP. There is no longer a User Board, but the weekly experiment meetings should be viewed as part of the governance structure of GridPP. They have the opportunity to feed back issues rapidly to GridPP. We had descriptions of the roles of the PMB on the web site.

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress). Ongoing.
663.3: RJ and DC will advise how the experiments want disk divided for the start of Run 3 (Alice and LHCb are resolved). (Update: DB will write to DK with DC in copy with proposed way forward – almost complete). Ongoing.
665.2: AD will produce Procurement schedule for the coming FY to build in an additional month to buffer any delays in the future. Ongoing.
667.1: PG to clarify with STFC what exactly is required for the OC feedback wrt the Capital reporting. (Update: PG, AS and AD to meet Friday to create a roadmap to monitor progress). Ongoing.
667.2: Need to do h/w planning before next OC to provide OC with details of shortfall in funds. Ongoing.
671.2: PG and AM to check if there is a common requirement across the Grid that can be negotiated with Dell for a framework agreement (e.g. Storage, Compute, Configurations). Ongoing.
672.1: DB will complete the UKRI Infrastructure Survey and request comments from the PMB by 02/07/18. Done.
672.2: JC and/or PG and/or PC to brief Ops-meeting and request storage group draft background document. AD to contribute. Done.
672.3: RJ, DK and AM to draft the Experiment Support background document. Ongoing.
672.4: DK to draft the Security, Trust and Identity background document. Ongoing.
672.5: AD to draft the Tier1 background document. Ongoing.
672.6: JC, SL, AM and PG to draft the Tier2 background document. Ongoing.
672.7: PG will consider the agenda for GridPP41 incorporating the GridPP6 Background Documents. Ongoing.
673.1: AM will create a matrix of HEP technology that could be of use to other communities (matrix will come from this). Done.
673.2: AD will provide the PMB with an overview of strategy for tapes and drives for the remainder of GridPP5 and GridPP6. Ongoing.

ACTIONS AS OF 16.07.18
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress). Ongoing.
663.3: RJ and DC will advise how the experiments want disk divided for the start of Run 3 (Alice and LHCb are resolved). (Update: DB will write to DK with DC in copy with proposed way forward – almost complete). Ongoing.
665.2: AD will produce Procurement schedule for the coming FY to build in an additional month to buffer any delays in the future. Ongoing.
667.1: PG to clarify with STFC what exactly is required for the OC feedback wrt the Capital reporting. (Update: PG, AS and AD to meet Friday to create a roadmap to monitor progress). Ongoing.
667.2: Need to do h/w planning before next OC to provide OC with details of shortfall in funds. Ongoing.
671.2: PG and AM to check if there is a common requirement across the Grid that can be negotiated with Dell for a framework agreement (e.g. Storage, Compute, Configurations). Ongoing.
672.3: RJ, DK and AM to draft the Experiment Support background document. Ongoing.
672.4: DK to draft the Security, Trust and Identity background document. Ongoing.
672.5: AD to draft the Tier1 background document. Ongoing.
672.6: JC, SL, AM and PG to draft the Tier2 background document. Ongoing.
672.7: PG will consider the agenda for GridPP41 incorporating the GridPP6 Background Documents. Ongoing.
673.2: AD will provide the PMB with an overview of strategy for tapes and drives for the remainder of GridPP5 and GridPP6. Ongoing.