GridPP PMB Meeting 695 (28.01.19)
Present: Dave Britton (Chair), Jeremy Coles, David Colling, Alastair Dewhurst, Tony Doyle, Pete Gronbech, Jon Hays, Dave Kelsey, Steve Lloyd, Gareth Roy, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Pete Clarke, Tony Cass, Roger Jones, Andrew McNab.

1. STFC Additional HW funding
DB forwarded a message from Charlotte giving permission to propose a request of c. £200-£500K, with a prioritised list, should funds become available. It was decided to draw together a list and submit a request totalling £500K. It may be preferable to concentrate this on 2-3 sites that can spend this FY. AD confirmed a total of £200K for RAL for immediate spend on HW that can be deployed immediately, including £130K of extra CPU (24 worker nodes) in chassis at rack end. QMUL has a list totalling £1.2M in order of priority: c. £400K for resources and c. £700K of compute. Glasgow also has a list of c. £1M. Lists are awaited from Imperial, Manchester and Lancaster.
GR will collate a combined list by institute, which will then be circulated to the PMB to agree a prioritised list. As Manchester, Lancaster and Imperial recently received cash from IRIS, there was discussion on whether QMUL and Glasgow should be prioritised, though they may also receive IRIS money in the coming FY. It was agreed that lists with priorities and deadlines will be submitted. DB is clarifying the meaning of ‘committed’ with Charlotte. There was discussion on whether it refers to the grant having started and the order having been placed, i.e. ‘committed’ on the budget but not yet spent during the FY.

2. GridPP6 Proposal Preparations (from 7/1/2019)

a) Experimental Resource Requirements (see email Sun 27th at 22:46)
DB circulated an email. STFC has agreed that we should submit a full estimate for HW in GridPP6, but we do not yet have clarity on requirements. It is possible to proceed with 20% p.a. flat cash but there is a dichotomy which makes that unrealistic. Profiles from experiments are front-weighted and LHCb has requirements of up to double the resources. DB requested that experiment reps provide best estimates, which RJ has now supplied; LHCb and ALICE numbers are used, along with some figures for CPU. Decisions need to be taken on baselines for the proposal, e.g. best estimates of c. £16M from the available plots, but accurate figures should be supplied by experiment reps. DC will discuss with CMS management. There was some discussion on scenarios and consequences, as well as the processes involved in agreeing numbers, and it was suggested that we should use the 20% figure in the short term. Once an agreed set of figures is received, DB and GR will compare it with the 20% figure and return with suggestions for progressing.

b) Proposal Plan (email Mon 28th at 10:30)
DB circulated a proposal plan with chapter numbers and some content in sections, as he now requires contributions. There are currently Intro, Motivation and Context sections, and Motivation may ultimately be absorbed into other sections.
The messages from last week’s SAGO meeting were that the Executive Board (EB) is keen on integration with IRIS, and also on skills (i.e. those leaving us to move into industry and career paths for those who remain), which should be addressed in the proposals. It may be useful to compile a list of staff who have taken either path. It would also be useful to keep in mind how the WPs contribute to IRIS and a more integrated infrastructure.

c) Proposal Structure (email Mon 28th at 11:06)
Section 4 describes the GridPP project structure in 3 parts (strategic objectives adapted from GridPP5; planning scenarios setting the scene of ‘flat cash’; and a description of work packages). Clarity is required on the WPs, which should be numbered in priority order.
Section 5 (Resource planning): DB is progressing this with H/W and extrapolations etc.
Section 6 (meeting WLCG obligation and experiment requirements) is awaited. There will be an intro and explanation, but we have to present Tier1 and Tier2 sections; the d)-g) sections will summarise the WPs as set out in Section 4, showing how distributed effort maps onto the WPs. DB noted that the service-oriented view of GridPP will not be included (this was in the GridPP5 proposal), though it contains a great deal of useful information that can inform the aforementioned sections; JC is considering how this information could be utilised. DB will then address Section 6. Part b) is a description of the Tier1 service: AD will discuss with AS for background from GridPP5 and aim for 4 sides of A4 of content. Tier2 content will need to be worked up: SL will take a closer look and offer some suggestions for headings/content. SL and PC have already offered comments on some drafted content.
We will need to update management and administration, milestones, metrics and project lists as well as pathways to impact. There may be a role for input from experiments too. The final sections include funding summaries with explanations.

d) WP discussion and actions
A suggestion was made that de-scopes could be placed inside the Work Packages rather than at the end, but experience has shown that it is more effective if de-scopes are maintained and dealt with together rather than peppered throughout.

4. Researchfish
It is time to update Researchfish (the closing date is one week after the GridPP proposal is due to be submitted). GR will discuss with PG how best to progress this, and DB asked members to be responsive to any requests from GR in this regard.


6. Standing Items

SI-0 Bi-Weekly Report from Technical Group (DC)
Nothing to report. Rucio technical discussion is planned for Friday.

SI-1 ATLAS Weekly Review and Plans (RJ)
Low efficiency at RAL and CMS – this may relate to job mix but is being investigated.

SI-2 CMS Weekly Review and Plans (DC)
Nothing significant, though a graph suggested CMS will run out of tape in July. This seems unrealistic due to some assumptions included therein, but DC will watch this.

SI-3 LHCb Weekly Review and Plans (PC)
Nothing to report.

SI-4 Production Manager’s report (JC)
Nothing to report.

SI-5 Tier-1 Manager’s Report (AD)
– CPU Efficiencies are looking bad for ATLAS and CMS (see attached plot). This appears to be a global problem for CMS (i.e. all sites have very poor efficiency). Tim is investigating for ATLAS.

– Some CMS GridFTP errors due to “Address already in use” problem. This was due to new hardware being put into production missing the fix that had been applied to the old machines. This was quickly resolved (intermittent errors for ~24 hours).

– A disk server in Castor for LHCb ran into problems over the weekend and had to be removed from production while the disk array is being rebuilt. Some LHCb files are temporarily unavailable (although they are in Echo so if the LHCb failover mechanism is working, there should be no failed jobs!).

– CMS submitted a GGUS ticket over the weekend due to intermittent SAM failures connecting to Castor with “permission denied”. Under investigation.

– Procurement
DELL Storage, delivery has been booked in for 4th March.
Martin is having a meeting with XMA later today, which will confirm dates:
XMA CPU, expected delivery 2-3 weeks from now.
XMA Storage, they have indicated that they will need to push back the delivery date (from 28th February), but have not indicated they have any problems with the end of March deadline.

SI-6 LCG Management Board Report of Issues (DB)
No MB and nothing to report.

SI-7 External Contexts (PC)
F2F IRIS meeting. Good meeting; AS has a number of actions to take care of. Of interest to GridPP would be data movement and advanced networking.

644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
678.3: AD to finalise the Tier1 background document, including tape strategy, by end September. (Update: Almost complete and will circulate current iteration for comment, more urgent now to feed into DB figures). Done.
694.1: DB will contact Charlotte confirming potential spend of additional HW funding and clarify the definition of ‘confirmed’. Done.
694.2: DB will respond to George Madden confirming proposed dates of the GridPP review meetings. Done.

ACTIONS AS OF 28.01.19
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.