GridPP PMB Meeting 670 (F2F)

GridPP PMB Meeting 670 (06/06/18) F2F
Present: Dave Britton (Chair), Pete Clarke, David Colling, Alastair Dewhurst, Tony Doyle, Pete Gronbech (Minutes), Roger Jones, Steve Lloyd, Andrew McNab, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Tony Cass, Jeremy Coles, Dave Kelsey.

1. GridPP5 2nd tranche h/w allocation
There was some discussion of PG’s presentation of the proposed h/w allocation for this FY. The CPU/Disk split appears reasonable; DB and PG will examine the percentage split in detail and determine any necessary amendments. PG’s approach was accepted in principle, provided it holds up to more detailed scrutiny. Once agreed, DB will write to Tony Medland with a summary of the proposed allocation.
Action 670.1: PG will clarify Tier-1 Capital spend for this FY with Tony Medland to confirm which column h/w spends should be allocated to.
Action 670.2: DB and PG will consider percentage splits of CPU/Disk.

2. GridPP and IRIS (UKT0)
AS summarised IRIS eInfrastructure and its rationale, i.e. bringing together STFC computing interests and delivering resources to communities. Going forward the expectation is to exploit opportunities as they arise, in coherence with the development of National eInfrastructure. There was some discussion on whether some of the GridPP5 activities could be Capitalised. IRIS (e-Infrastructure for Research and Innovation at STFC) should not be confused with the IRIS project at LSE involving Will Venters (£8M over 4 years; GridPP is an external partner).

Action 670.4: PG will invite Tim to give a presentation at Ambleside.
(Tim’s take on this role and things that can be tackled over the next 6 month period)

3. GridPP6 planning

A long and wide-ranging discussion took place on many elements of GridPP and how they might fit into a GridPP6 proposal. It was noted that we will only be briefed on the scope of GridPP6 in October, but GridPP5 was certainly a starting point. At the end of the day it was agreed to commission a set of background documents over the summer, to be discussed at GridPP41. These documents would then be used to inform the GridPP6 proposal writing. In particular, the issue of storage technology, data management, and posts was one area that needed exploring, as did the evolution of the experiment liaison posts (how do we scale to support significant-sized new efforts such as DUNE?). Security is another important area, particularly with the development of more-shared infrastructure (IRIS).

4. Tier-1 Issues
Tier-1 Tape capacity has got a bit tight (but in hand).

If we don’t have enough money, we would spend the majority on disk because we could run existing CPU for longer. Large memory and SSDs show a benefit for certain workflows. That would mean a decent-sized disk procurement and a small or non-existent CPU procurement.

The current purchase has all passed testing (a couple of minor XMA issues were resolved). Next year’s ITTs would be the same as last year’s. It was noted that all of the standstills in the previous plan are actually optional, so some time can be saved.
In previous years Martin, Tim and Andrew were doing a lot of this themselves, but a contractor who was employed at RAL has produced a lessons-learned document.

DB asked why RAL does testing on disk servers when vendors have already done tests. AD is also of the opinion that this could be accelerated.

CMS are happy with Echo, but the XRootD daemons currently need restarting once a week because of a memory leak (about 30 minutes of effort). This is not a massive overhead. It is being investigated; Tim Adye is looking into it.

Tier-1 staffing: are we at capacity? There is 1 definite starter, and 1 DB person with visa problems in Egypt. We will profile the fabric team’s effort more accurately.

Some rationalisation/reorganisation of Tier-1 effort is ongoing. The total is around 15.5 – 15.6 at the moment. Probably need 1 more.

ACTIONS AS OF 06/06/18
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress.) Ongoing.
663.3: RJ and DC will advise how the experiments want disk divided for the start of Run 3 (Alice and LHCb are resolved). Ongoing.
663.8: JC will examine GridPP staff roles/service/areas of expertise. (UPDATE: JC will provide a table with information for discussion at June F2F). Ongoing.
665.2: AD will produce Procurement schedule for the coming FY to build in an additional month to buffer any delays in the future. Ongoing.
667.1: PG will clarify with STFC exactly what is required for the OC feedback with respect to the Capital reporting. Ongoing.
667.2: H/w planning needs to be done before the next OC, to provide the OC with details of the shortfall in funds. Ongoing.
669.1: DB to respond to a request for resources from DUNE.
670.1: Discussion papers to be written for GridPP41: storage, experiment support, others?
670.2: DB and PG will consider percentage splits of CPU/Disk.
670.4: PG will invite Tim to give a talk on the ATLAS liaison role.