GridPP PMB Meeting 692

GridPP PMB Meeting 692 (07/01/19)
=================================
Present: Dave Britton (Chair), Tony Cass, Jeremy Coles, David Colling, Alastair Dewhurst, Tony Doyle, Pete Gronbech, Jon Hays, Roger Jones, Dave Kelsey, Andrew McNab, Gareth Roy, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Pete Clarke, Steve Lloyd.

1. GridPP6 Strategy Document
============================
DB has provided various iterations and suggests an outline strategy should be agreed and then progressed with the CB, since time is short. There was some discussion of how to maintain the community aspect of the Grid and keep participants engaged, as well as of the expectations of the CB, which acts as an informal oversight board. DB outlined the various sections of V5 of the strategy document and invited comment from PMB members:
Introduction
Vision of high level project
Guidelines: a 4-year project at flat cash, c. £6M-£6.3M per annum; this needs to be refined. A useful approach could be to take the GridPP5 award, include additional Capital, then divide by 4 – DB requires the actual Capital and Resource split for GridPP5. We can note that the figures do not include inflation for staff costs at RAL.
Average cost of an FTE needs to be clarified, as the CG grant has been delayed due to variable FEC costs at institutions. RJ provided Lancaster figures for mid-Grade 7, which are higher than planned. DB suggested the costs would balance out using the mid-Grade 7 figure (£101K), taking account of Grade 6s and 8s. PG will look at this and determine feasibility. If the resource budget line for staff is just over £4M, the number of FTEs across the project is c. 40 FTE.
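As a rough check of the c. 40 FTE figure, the arithmetic can be sketched as below. This is an illustration only: it assumes the staff budget line and the £101K mid-Grade 7 cost are both annual figures, and it takes exactly £4.0M as a floor for "just over £4M". The function and constant names are hypothetical.

```python
def estimate_fte(staff_budget_gbp: float, cost_per_fte_gbp: float) -> float:
    """Return how many FTEs a staff budget funds at a flat average cost per FTE."""
    if cost_per_fte_gbp <= 0:
        raise ValueError("cost per FTE must be positive")
    return staff_budget_gbp / cost_per_fte_gbp


# Figures quoted in the minutes; 4.0M is an assumed floor for "just over £4M".
STAFF_BUDGET_GBP = 4_000_000
MID_GRADE7_COST_GBP = 101_000

fte = estimate_fte(STAFF_BUDGET_GBP, MID_GRADE7_COST_GBP)
print(f"{fte:.1f} FTE")  # prints "39.6 FTE"
```

A staff line slightly above £4M moves the estimate up towards 40, consistent with the figure quoted.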

The strategy is to present the site, experiment and functionality axes. DB suggests we should focus on functionality to present the project in 5 Work Packages:

1) Operation of UK grid (Operation of WLCG Grid as part of MOU). Covers effort for running infrastructure – h/w and service in line with status quo in 4 sub-sections: a) Tier1, b) Core Tier2, c) non-core Tier2, and d) UK operations, e.g. weekly ops meetings, community infrastructure operations, etc.

(There was some discussion of the evolution of Tier2 infrastructure and the migration of big sites, both of which have been frequently discussed over the last 3 years.)

2) User-focused operation and liaison effort at Tier1 and Tier2 sites – focussed on whether the infrastructure works for users, e.g. Tier1 LHC Liaison posts and Tier2 manpower to solve experiment-specific issues. Need to firmly establish and defend responsibilities for these posts: a) 3 Tier1 liaison posts plus 1 to support ALICE and non-LHC, and b) Tier2 – 2 FTEs. This is work in progress that RJ, DK and AM will give further thought to, e.g. supporting communities and IRIS.

3) WLCG engagement – to influence the collaboration and adapt appropriately. Covers operational security and a significant contribution from staff across sites to WLCG operational and development meetings. This should perhaps change to ‘WLCG Commitment’. LHCb uses large sites and needs responsive resources. Discussions are taking place across the Tier2 experiments on how best to deploy effort. DB has made some suggestions on how this could best work. This may be better placed at sites engaged with LHCb (where there is someone at the site undertaking shifts, supporting users, etc., i.e. locally oriented support).

4) Infrastructure development – a) Tier1 service development, and b) upgrading UK infrastructure for the High Luminosity LHC. We will have larger sites with storage, which will be locally responsible for their storage, and other Tier2 sites without local storage, which may need caches. Smaller experiments and IRIS, Dirac, other STFC communities, etc. will also benefit. Plus, possibly, UK infrastructure development.

5) Administration. It is proposed that 50% of the project leader’s time be bought out for the project, with a small effort for PC as deputy leader (c. 20%). Project and resource management (possibly divided between 2 people) and Ops Team production management need further development.

DB has incorporated Charlotte’s text and responded to points, and has added tables to identify aspects that could be reduced, perhaps with co-funding. The primary message for the CB is that we are consulting and having to mitigate and manage many constraints.

2. GridPP6 Authorship Fractions
===============================
DB thanked DC and Gavin Davis for information distinguishing between current and future M & O authors. He has circulated a revised set of numbers, to which DC pointed out a correction regarding Tier-1 authors: the Finland and Korea Tier-1s do not contribute to CMS and should not be included. It was suggested that AM and RJ should do similarly for the LHCb and ATLAS numbers, and that ALICE should also be considered, though the numbers will be small.
AM advised he can go back to Tim Girshaw to say this will be done differently. There was discussion of how best to arrive at the correct numbers on a column basis, in a way that is acceptable, reliable, robust and justifiable to the funder as the basis for the M & O contribution. It was agreed all experiments should follow the same process to determine whether the computing requirements fit within the proposed budget and costings DB has worked out for GridPP6, and AD will do the same to develop full costings. AM will discuss with LHCb, and ATLAS and CMS will look at authorship to work out grey book and M & O differences.
ACTION 692.1: RJ, DC and AD to provide information on M & O figures for GridPP6.

3. AOB
======
None.

4. Standing Items
===================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
No report.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
No report.

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
No report.

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
No report.

SI-4 Production Manager’s report (JC)
-------------------------------------
No report.

SI-5 Tier-1 Manager’s Report (AD)
---------------------------------
– 28th December: file system problem with one of the Squids; this was fixed on 2nd January.

– Problem with one of the ARC CEs, which seems to be intermittently impacting VOs. This is still under investigation, but the CE does not seem to be deleting jobs once they have finished, so it is suffering from load issues: it is dealing with 250k jobs as opposed to around 10k for each of the other ARC CEs.

I have attached the Tier-1 CPU usage report for December. Everything seems fine.

Procurement status:
– Extra capacity for XMA and CV17 storage nodes. Purchase orders were submitted last year. Expect XMA delivery within 2 weeks. ClusterVision have had problems and may take an extra week or two for delivery.

– CPU capacity from XMA. Purchase order was submitted last year. Expect delivery 2nd half of February.

– Disk capacity from DELL and XMA. Suppliers were awarded the contract on 3rd January. I will be complaining to SBS, as we had provided them with the information they required in line with the schedule they had drawn up before Christmas, and we had expected them to award the contract ~21st December. We are currently in the contract exchange phase (although both suppliers are already working on fulfilling the award). XMA made a mistake in which they undercharged us by ~£5k. They have agreed to honour this, although they have asked us to improve our contract in future*. Both suppliers have said they are on target to deliver before 1st March.

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
No report.

SI-7 External Contexts (PC)
---------------------------
No report.

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
678.3: AD to finalise the Tier1 background document, including tape strategy by end September. (Update: Almost complete and will circulate current iteration for comment). Ongoing.
678.5: JC to finalise the Storage background document by end September. (Update: 17 October meeting with Tony Medland & DB, and PC will attend. This is almost complete and awaiting a few minor elements to be worked in – GR will upload into Google Docs for info). Ongoing.
690.1: DC to confirm CMS M&O numbers and identify any mismatch due to inclusion of PhD students. (DC to follow up and provide a location from which the information can be extracted).
690.2: JH to follow up with BBC contacts on further potential collaboration regarding the Video Encoding proposal. Ongoing.

ACTIONS AS OF 07.01.19
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
678.3: AD to finalise the Tier1 background document, including tape strategy by end September. (Update: Almost complete and will circulate current iteration for comment). Ongoing.
678.5: JC to finalise the Storage background document by end September. (Update: 17 October meeting with Tony Medland & DB, and PC will attend. This is almost complete and awaiting a few minor elements to be worked in – GR will upload into Google Docs for info). Ongoing.
690.1: DC to confirm CMS M&O numbers and identify any mismatch due to inclusion of PhD students. (DC to follow up and provide a location from which the information can be extracted). Ongoing.
690.2: JH to follow up with BBC contacts on further potential collaboration regarding the Video Encoding proposal. Ongoing.
692.1: RJ, DC and AD to provide information on M & O figures for GridPP6.