GridPP PMB Meeting 686

GridPP PMB Meeting 686 (12.11.18)
=================================
Present: Dave Britton, Tony Cass, Pete Clarke, David Colling, Alastair Dewhurst, Tony Doyle, Pete Gronbech (Chair), Roger Jones, Steve Lloyd, Andrew McNab, Gareth Roy, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Jeremy Coles, Dave Kelsey.

1. OSC Documents
================
PG has uploaded his version (8) to the agenda, which includes up-to-date text from all contributors except some recent minor edits from SL. There are a number of comments that need to be addressed and highlights to be removed.
The document should be reduced and some sections, e.g. Wider Context, will be shortened appropriately. JC has also re-submitted his section with reduced text. There was a full discussion and agreement on finalising all the points highlighted.
The Project Map and Risk Register have been completed. The Financial Table will be updated by PG in consultation with AD over the next few days. Following a request from OSC (email on 02/11/18) to produce a resource estimate/projection for the next report, PG has circulated a proposed template to be completed. There was some discussion on this and allocated funding per FY. PC will address other matters raised if necessary.

2. PMB Membership to represent Non-LHC VOs
==========================================
Following recent ongoing discussion, it has been agreed that it would be appropriate to invite new members to join the PMB. Jon Hays has been involved on various levels and has an excellent history as an academic with close involvement in computing and extremely good connections. It would be very useful to have him as a PMB member relating to non-LHC VOs. There was discussion re SL also being based at QMU, but SL is also involved as Chair of CB. It was suggested that this is another opportunity to consider adding further members (academics) to the PMB in future, and any suggestions will be made to DB for discussion. It was unanimously agreed that Jon will be invited to join the PMB.

3. AOCB
=======
PC has contacted some PMB members re moving forward to getting digital asset grants completed. JeS’s have to be submitted urgently as they won’t be awarded until early December at the earliest, which impacts IRIS. If the grants are received late they can be charged to others who are using and this needs to be clarified very soon. DC hopes this will be processed today.

5. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
There was a meeting relating LHC to Dirac and migrations to T2K – starting next week. This week there will be a Rucio discussion and AD will ensure the agenda is circulated to the Ops meeting tomorrow.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Minor points at Tier-1 – switching to SANTOS today, after testing over the last two weeks. There is an issue with the use of Singularity at Tier-1 – this has been reported by SKA as well.
Tier-1 is now working at pledge, but there are some Tier-2 issues, e.g. Harvester is being replaced, and these are being addressed.

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
DC noted that recent CMS conversations relate to automation and the restart of Monte Carlo. AD mentioned discussions with Katie, Chris and CMS centrally about Katie spending a lot of her time working on the CMS part of Rucio testing with CTA. This would build her experience, which would be strategically useful to have at Tier-1; testing this tape system at CERN could also place her in a good position when we move away from Castor. DC and AD will discuss further after the PMB.

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing to report.

SI-4 Production Manager’s report (JC)
-------------------------------------
No report submitted.

SI-5 Tier-1 Manager’s Report (AD)
---------------------------------
– We have noticed that the CV17 storage nodes have been randomly rebooting. These nodes are not fully weighted up in Echo and further increases have been stopped while the problem is sorted out. A firmware update is being pushed out. Machines that have the new firmware appear to be fixed, but we are still gathering statistics. It should be noted that there has been no observed impact from the user perspective.

– On the 7th November, the first LHCb space token was migrated from Castor to Echo!! This is the FAILOVER space token which is used by jobs from other sites if their storage is down. The test jobs are succeeding.

– On the 7th November NA62 were successfully migrated to the new consolidated Castor tape instance.

– ATLAS are unable to get their jobs to work via Singularity. We believe this is because they require privileged containers and we are currently only offering unprivileged ones. We would hope that ATLAS could improve their code.

– CMS AAA issue remains ongoing, but we believe the situation is improving. We updated the XRootD version to fix a known bug on Friday and the amount of red in the SAM tests dropped over the weekend.

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
No MB and nothing to report.

SI-7 External Contexts (PC)
---------------------------
Nothing to report.

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
667.2: PG will do h/w planning before next OC to provide OC with details of shortfall in funds. (Update: PG will check the OSC minutes for details and cover with GR). Ongoing.
678.2: DK to finalise the Security, Trust and Identity background document by mid October. (Update: DK and David Crooks have been working on this and it is nearly complete) Ongoing.
678.3: AD to finalise the Tier-1 background document, including tape strategy by end September. (Update: Almost complete and will circulate current iteration for comment). Ongoing.
678.5: JC to finalise the Storage background document by end September.
(Update: 17 October meeting with Tony Medland & DB and PC will attend. This is almost complete and awaiting a few minor elements to be worked in – GR will upload into Google Docs for info). Ongoing.
684.1: PG will contact the owners of risks who were absent from today’s PMB to confirm they are satisfied with the decisions taken on the risk register. Done.

ACTIONS AS OF 12.11.18
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
667.2: PG will do h/w planning before next OC to provide OC with details of shortfall in funds. (Update: PG will check the OSC minutes for details and cover with GR). Ongoing.
678.2: DK to finalise the Security, Trust and Identity background document by mid October. (Update: DK and David Crooks have been working on this and it is nearly complete) Ongoing.
678.3: AD to finalise the Tier-1 background document, including tape strategy by end September. (Update: Almost complete and will circulate current iteration for comment). Ongoing.
678.5: JC to finalise the Storage background document by end September.
(Update: 17 October meeting with Tony Medland & DB and PC will attend. This is almost complete and awaiting a few minor elements to be worked in – GR will upload into Google Docs for info). Ongoing.