GridPP PMB Meeting 673 (02/07/18)
=================================
Present: Dave Britton (Chair), Tony Cass, Jeremy Coles, David Colling, Alastair Dewhurst, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Pete Clarke, Tony Doyle, Pete Gronbech.

1. IRIS site selection document
===============================
DB circulated a draft of the proposed document. A table should be inserted to summarise the resources from the two sites and make the information more accessible. The submission deadline is today, and feedback from PMB members was extremely positive. Lancaster and Imperial will purchase 1 PTB each and Lancaster will make a purchase to match-fund. RJ will update the document, insert the table, send the draft to DB, and then submit it this evening.

2. IRIS capitalisable projects?
===============================
The IRIS PB have raised the question of possible capitalisable projects. IRIS has capital funding and we are capitalising some projects for AOC; the question is whether that is possible at other institutes, i.e. for digital assets that can be used by the wider IRIS community. We should consider whether any of the projects are worth proposing. The IRIS PB need to establish a process for deciding what they will fund if they are convinced these elements could be used, coupled with an understanding from us on whether these capitalised projects can be used at the proposed sites. Some suggestions were made on how this could operate, and it could be argued that GridPP are already established in some elements that could be presented as capital assets (likely part of the capital asset funding for PPAN). AM will prepare a one-page table/matrix of components, who is using them and potential users; this is mostly available in the latest OS Documents and could also be used to ensure website information is up to date.
Action 673.1: AM will create a matrix of HEP technology that could be of use to other communities.

3. AOCB
=======
a) GridPP41 Registration and Agenda
Registration should be opened this week. DB asked the PMB to consider specific talks and general topics. GridPP41 will be based around the discussion documents – PMB members will raise the topic at forthcoming meetings they are attending.

5. Standing Items
===================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
No report submitted.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
RJ confirmed ATLAS are off the Castor grid (see Tier-1 Manager's Report). This is simplifying setups more than Panda queues and will be easier to manage. ATLAS made the decision that RAL will be part of the data carousel to provide tape as a source for higher RO facilities, and testing will be undertaken in August. Therefore, both CMS and ATLAS are now off Castor.

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
No report submitted.

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
No report submitted.

SI-4 Production Manager’s report (JC)
-------------------------------------
1. The European HTCondor workshop will be held at RAL on 4-7 September 2018.

2. There was an informal DUNE UK meeting last week. Storage estimates of 2-4PB per annum were noted. Hosting sites are RAL, Manchester, IC and Edinburgh. There was a discussion about the necessity of spacetokens and non-SRM access (xrootd). As mentioned last week, a full production use case has been run successfully on GridPP resources.

3. LZ have finished their mock data challenge and produced 3 months of data. They are now at their reprocessing stage.

4. The WLCG Tier-2 A/R reports for May (http://wlcg-docs.web.cern.ch/wlcg-docs/reporting/reliability-availability/2018/05-18/) revealed the following:

· ALICE All OK

· ATLAS
RHUL 81%
Lancaster 86%

· LHCb
RHUL 76%
Bristol 74% (Availability only)

· CMS
Bristol 77%

The site explanations for missing the 90% target are:

RHUL: had a problem with the SE headnode when the node was out of time sync.

Lancaster: had two unscheduled downtimes, one due to a shared file system that the batch system relied upon breaking over a weekend, and another due to supposedly low-risk electrical maintenance going wrong and causing a power outage which kept the site offline for a bit over 24 hours.

Bristol: a problem with a New & Improved network switch for the main Bristol VM + LCG nodes; the problem manifested itself on the afternoon of Friday 4 May, with the result that all the nodes essentially lost network connectivity over the bank holiday weekend.

5. A GOCDB information review has recently completed. The UK situation is good.

6. There was a useful HEPSYSMAN meeting at RAL in June: https://indico.cern.ch/event/721692/. The working (hack) sessions were particularly valued, but site participation was perhaps lower than the historical average, with half of GridPP sites represented. It is interesting to note that Dan’s talk on the HEPSYSMAN role was “What’s the point to your existence”.

7. There has been an ongoing discussion in the ops area about the best place to keep and maintain documentation. Tom had done a good job with Git, but this has been difficult for non-core members to update and some would prefer the wiki to be used. The underlying question is how to maintain things with less manpower.

8. For awareness – Birmingham is deploying EOS for its disk storage (Birmingham have had EOS for ALICE for a while and are now proposing it for other users – perhaps something to discuss at GridPP41).

SI-5 Tier-1 Manager’s Report (AD)
---------------------------------
– Operationally it has been a very quiet last few weeks.

– ATLAS no longer (since 25th June) have any data managed by Rucio on Castor disk. Some log files remain which will be left for a few weeks to ‘expire’ before the service can be decommissioned.

– We have been doing extensive debugging of transfers between Echo S3 and various Grid storage endpoints (for both SKA and DUNE). There is an outstanding issue with certificates and the differences between commercial and Grid ones. We are having to put workarounds in place on a site-by-site basis. In the longer term we will need a policy discussion/change to get things to ‘just work’.
DK suggested DUNE could join the IGTF certificate trust list and there was some discussion on protocol in this regard. This has been raised with David Crooks and the Security team, who are working to resolve it.

– After discussions with multiple tape experts last week at DESY, we have started the process of procuring LTO tapes and drives.
There was some discussion on the LTO tape and Oracle situation; AD confirmed this is being considered separately. AD will provide a more detailed summary of the proposed plans in this regard, including costs for tapes and drives.

Action 673.2: AD will provide the PMB with an overview of strategy for tapes and drives for the remainder of GridPP5 and GridPP6.

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
Nothing to report.

SI-7 External Contexts (PC)
---------------------------
DUNE, LZ and IRIS were discussed at relevant points during the meeting.

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress). Ongoing.
663.3: RJ and DC will advise how the experiments want disk divided for the start of Run 3 (Alice and LHCb are resolved). (Update: DB will write to DK with DC in copy with proposed way forward). Ongoing.
665.2: AD will produce Procurement schedule for the coming FY to build in an additional month to buffer any delays in the future. Ongoing.
667.1: PG will clarify with STFC exactly what is required for the OC feedback with respect to the Capital reporting. Ongoing.
667.2: Need to do h/w planning before next OC to provide OC with details of shortfall in funds. Ongoing.
670.2: DB and PG will consider percentage splits of CPU/Disk. Done.
670.3: DB, PG and PC will undertake a high-level discussion of planning manpower for GridPP6. Done.
671.1: DB will discuss with the sites involved for IRIS h/w allocation and make recommendations to the PMB to consider. Done.
671.2: PG and AM to check if there is a common requirement across the Grid that can be negotiated with Dell for a framework agreement (e.g. Storage, Compute, Configurations). Ongoing.
672.1: DB will complete the UKRI Infrastructure Survey and request comments from the PMB by 02/07/18. Ongoing.
672.2: JC and/or PG and/or PC to brief Ops-meeting and request storage group draft background document. AD to contribute. Ongoing.
672.3: RJ, DK and AM to draft the Experiment Support background document. Ongoing.
672.4: DK to draft the Security, Trust and Identity background document. Ongoing.
672.5: AD to draft the Tier1 background document. Ongoing.
672.6: JC, SL, AM and PG to draft the Tier2 background document. Ongoing.
672.7: PG will consider the agenda for GridPP41 incorporating the GridPP6 Background Documents. Ongoing.

ACTIONS AS OF 02/07/18
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress). Ongoing.
663.3: RJ and DC will advise how the experiments want disk divided for the start of Run 3 (Alice and LHCb are resolved). (Update: DB will write to DK with DC in copy with proposed way forward). Ongoing.
665.2: AD will produce Procurement schedule for the coming FY to build in an additional month to buffer any delays in the future. Ongoing.
667.1: PG will clarify with STFC exactly what is required for the OC feedback with respect to the Capital reporting. Ongoing.
667.2: Need to do h/w planning before next OC to provide OC with details of shortfall in funds. Ongoing.
671.2: PG and AM to check if there is a common requirement across the Grid that can be negotiated with Dell for a framework agreement (e.g. Storage, Compute, Configurations). Ongoing.
672.1: DB will complete the UKRI Infrastructure Survey and request comments from the PMB by 02/07/18. Ongoing.
672.2: JC and/or PG and/or PC to brief Ops-meeting and request storage group draft background document. AD to contribute. Ongoing.
672.3: RJ, DK and AM to draft the Experiment Support background document. Ongoing.
672.4: DK to draft the Security, Trust and Identity background document. Ongoing.
672.5: AD to draft the Tier1 background document. Ongoing.
672.6: JC, SL, AM and PG to draft the Tier2 background document. Ongoing.
672.7: PG will consider the agenda for GridPP41 incorporating the GridPP6 Background Documents. Ongoing.
673.1: AM will create a matrix of HEP technology that could be of use to other communities.
673.2: AD will provide the PMB with an overview of strategy for tapes and drives for the remainder of GridPP5 and GridPP6.