GridPP PMB Meeting 697

GridPP PMB Meeting 697 (11.02.19)
=================================
Present: Pete Clarke(Chair), Dave Britton, Tony Cass, David Colling, Alastair Dewhurst, Tony Doyle, Pete Gronbech, Jon Hays, Roger Jones, Steve Lloyd, Andrew McNab, Gareth Roy, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Jeremy Coles, Dave Kelsey.

1. GridPP6 Reviewers (email 6/2/2019)
=====================================
DB circulated an email listing possible reviewers and inviting recommendations for additional reviewers. It was noted that it would be challenging to select people who have specialist background knowledge and no conflicts. Suggestions included Amber (JLab, nuclear physics) or Liz (Fermilab, previously CMS), but USA-based people may not be appropriate since their models differ. Other suggestions were invited. It was also suggested that it could reasonably be argued that a reviewer with a close connection, e.g. Ian Bird, should be included precisely because of their connection to and understanding of the proposal. DB requested responses with suggestions within 24 hours so that he can proceed.

2. GridPP6 Mapping Spreadsheet
===============================
DB shared a mapping spreadsheet for discussion, which JC has been developing together with DB. It succinctly sets out tasks and WPs in a table, since the equivalent text would be too lengthy. The proposal will require a shorter table with hyperlinks to the full online table. JC has mapped the GridPP5 table onto the structure, triangulated it with the new WPs and linked it to tasks. This needs to be completed, particularly in the WP4 area, so that the associated text can be developed. As a starting point the main aspects of the spreadsheet were discussed, though more detailed discussion may be useful when JC is available to participate. The first few points refer to the Tier-1; AD has sent JC his spreadsheets of the tasks, which have been incorporated. Appropriate terminology will be agreed and then updated at the relevant points. Roles need to be confirmed, defining the high level of skills required. Proposed text defining roles (e.g. RSE infrastructure and middleware) and demonstrating the diversity of associated tasks/responsibilities needs to be refined to reflect FEC-able posts. It should be stated somewhere that the different roles in common use throughout the industry are highly specialised, even if they are all referred to as RSE.
Members will absorb the table in more detail and a follow-up meeting was proposed for Thursday 14 February. PC will check availability with JC.

3. GridPP6 Proposal Preparations (Actions below 696.X)
======================================================
See actions below.

4. AOCB
=======
None.

5. Standing Items
===================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
There was no meeting this week. Next week there will be a meeting on LHC Condor.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
The recently reported CPU efficiency issue at RAL appears to be the result of a job mix that produced failed jobs and is, therefore, not systemic. Some ATLAS improvements are being undertaken regarding corrupted output files.

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
There has been an issue since Saturday at RAL which Katie is investigating.

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
One disk service is down at Tier-1 and is being investigated. There have been some issues in submitting jobs at Tier-1 which are being investigated.

SI-4 Production Manager’s report (JC)
-------------------------------------
JC was not in attendance and did not submit a report.

SI-5 Tier-1 Manager’s Report (AD)
---------------------------------
- The garbage collection problems Castor was having with its Tape buffer have been solved. A misconfiguration for NA62 meant their files weren’t being migrated to Tape and this eventually led to the garbage collection breaking. Once this was fixed, the NA62 files were safely migrated to Tape and the buffer cleared up.

- Ongoing issues with ARC-CEs. In the last week arc-ce02, 03 and 04 (these are the CEs LHCb submits to) all failed SAM tests at one point. There has been no impact on jobs, but poking them is using up our time. It has been noted that some other services on the VMware cluster have been sluggish. The problem does originate from a directory that is accessed very frequently. With the extra capital we bought a new VMware cluster, so if this is the root of the problem a solution is in sight.

- Procurement
The final remaining Purchase Orders for the extra capital went out last week.
Capital procurement is waiting for delivery:
DELL Storage: delivery has been booked in for 4th March.
XMA CPU: expected delivery next week. We are discussing acceptance testing plans.
XMA Storage: mid March.
Extra disks for ClusterVision17 storage: In December we ordered some extra disks from XMA and ClusterVision to upgrade the 17 generation. The XMA disks have long since arrived and been installed. Last week we got a message from ClusterVision reporting problems in sourcing the disks. This procurement is ~£30k so it is not a disaster if they can’t deliver…

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
Next meeting is scheduled for 19.02.19.

SI-7 External Contexts (PC)
---------------------------
Nothing to report.

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
696.1: RJ to provide ATLAS resource requirements. Done.
696.2: RJ to provide ATLAS’ guidance for 2 FTE location at Tier-2 sites. Ongoing.
696.3: RJ to draft 4c(iii) in the Plan2 document: A description of WP2. Ongoing.
696.4: JC to update GridPP5 tasks tables and map to GridPP6 Workpackages. Done.
696.5: JC to draft 4c(iv) in the Plan2 document and work with DB on 4c(i). Ongoing.
696.6: SL to refine/check Tier-2 Leverage Section. Done.
696.7: JH to draft pathways-to-impact document and extract 1 page for proposal. (Update: JH is seeking clarification between provision in GridPP5 and requirements for GridPP6). Ongoing.
696.8: PC to work on Context section and suggest merger of motivation section. Done.
696.9: PC to coordinate development of 4c(v) WP4 description. Ongoing.
696.10: AD to provide draft of Tier-1 section 6b. Ongoing.
696.11: AD to contribute via PC to 4c(v). Ongoing.
696.12: DC to provide CMS’s guidance for 1FTE location at Tier-2 sites. Done.
696.13: DC to provide assistance to RJ with 4c(iii). Ongoing.
696.14: DB to contact CB. Done.
696.15: DB to draft 4c(ii) with help from JC. Ongoing.
696.16: DB to coordinate 4c(vi). Ongoing.
696.17: DB to continue to develop effort matrix once Experiment site preferences are known. Ongoing.
696.18: GR to continue to gather resource requirements. (Update – emails have gone out and responses awaited). Ongoing.
696.19: GR to liaise with PG on 4c(vi)2&3. Ongoing.

ACTIONS AS OF 11.02.19
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
696.2: RJ to provide ATLAS’ guidance for 2 FTE location at Tier-2 sites. Ongoing.
696.3: RJ to draft 4c(iii) in the Plan2 document: A description of WP2. Ongoing.
696.5: JC to draft 4c(iv) in the Plan2 document and work with DB on 4c(i). Ongoing.
696.7: JH to draft pathways-to-impact document and extract 1 page for proposal. (Update: JH is seeking clarification between provision in GridPP5 and requirements for GridPP6). Ongoing.
696.9: PC to coordinate development of 4c(v) WP4 description. Ongoing.
696.10: AD to provide draft of Tier-1 section 6b. Ongoing.
696.11: AD to contribute via PC to 4c(v). Ongoing.
696.13: DC to provide assistance to RJ with 4c(iii). Ongoing.
696.15: DB to draft 4c(ii) with help from JC. Ongoing.
696.16: DB to coordinate 4c(vi). Ongoing.
696.17: DB to continue to develop effort matrix once Experiment site preferences are known. Ongoing.
696.18: GR to continue to gather resource requirements. (Update – emails have gone out and responses awaited). Ongoing.
696.19: GR to liaise with PG on 4c(vi)2&3. Ongoing.