GridPP PMB Meeting 663

GridPP PMB Meeting 663 (05.03.18)
=================================
Present: Pete Gronbech (Chair), Dave Britton, Jeremy Coles, David Colling, Tony Doyle, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Tony Cass, Alastair Dewhurst, Pete Clarke.

1. OSC talk
====================
PG attached a first draft of the previous talk with elements updated. DB will review this and update where necessary then circulate by Wednesday. This will cover what the document does, setting the scene, then discuss issues arising. UKT0 will most likely generate more discussion/questions – DB and PC will discuss.
There is a finance table which derives from our finance report, though it is displayed differently. If Tony Medland is there it would be useful to follow the format normally used. The £140K pulled forward should be covered in the talk (this is already covered in the report).
Feedback document areas to be covered are highlighted in yellow, e.g. costed and resourced capital for the remainder of GridPP5. Actions from the previous talk were discussed and confirmed as complete; the CDT action point requires more thought and updates from all institutions: AM or RJ (Manchester), JC (Cambridge) and DB (Glasgow).
ACTION 663.1: Summaries to be provided of any likely contribution to the broad aims of GridPP from the CDT: AM/RJ for Manchester, JC for Cambridge, DB for Glasgow.

2. ResearchFish
================
Deadline is 14 March – PC has been working on resolving grants appearing on the PIs' interface. At the moment this is not ideal but is workable. Papers need to be submitted, and the system then asks whether there are any further inputs, which is time consuming. SL confirmed the GridPP submissions are fine, but older versions are problematic.
PG uploaded a talk – there are several areas we need to put feedback into, primarily on publications. PG has added 94 ATLAS, 100+ CMS, plus others in addition to existing papers. Papers from other VOs and GridPP staff need to be uploaded, but the information is not readily available now that Tom is no longer here collating it. DOIs are very useful for enabling uploading. Most are on the Inspire HEP database, but is this the best search stream to use or is there a better source for LHC papers? Snow papers or ILC papers etc may have used GridPP resources, but are challenging to find and it is hard to determine whether they should be quoted.
Other sections were discussed – collaborations and partnerships (list new VOs; 6 have been added); new collaborations should be included, e.g. HNSciCloud, EOSC-hub, AENEAS, etc., all listed in the OSC papers – DK will supply the amounts and start/end dates.
Leavers – Andrew Lahiff, Wahid (info in Tier2 report), Tom, George Vasilaskos, will be reported on.
Engagement activities – talks/posters/outreach activities; it would be useful if these were included in Quarterly reports.
Influence on policy (members of committees, etc) and Awards and Recognition (positions on bodies) – both are very closely aligned but the former is more political. Experiment positions should also be included.
We do not tend to report on research tools and methods, intellectual property, etc. User facilities has CERN listed.
Key Findings, Narrative impacts, Secondments etc remain unchanged from last time.

3. GridPP40 & F2F
=================
Tier2 H/W allocation policy is on the agenda to cover FY 18/19. There should be discussion of how GridPP structures interact with UKT0. An update on the Echo project from Alison would be useful.
HSF session appears to be going ahead, and JC has had a suggestion for a speaker. Anna Scaife has a nominal slot and this needs to be confirmed.

4. SKA DIRAC Transformation DB request
=======================================
SKA would like a new DIRAC service (the Transformation DB) included in the GridPP DIRAC, as this does not work properly in the current environment – it creates large numbers of jobs and produces 1000s of output files. The service would normally be installed in DIRAC, but it only works with one VO, i.e. it does not work with multiple VOs, so SKA would need to be set up as the administrators. If this is not done they will have to use another route, e.g. install their own DIRAC. AM will send an email to the PMB with details.

5. AOCB
=======
a) The UKT0 presentation to BEIS last week went very well. At the end of the meeting they were advised verbally that it would be recommended for funding. There will be processes to go through but, if successful, £4M per annum over the next 4 years is a significant injection of funds and governance needs to be considered. This should perhaps be covered in the OSC talk.
b) DC will give outreach talks to schools soon and asked if there are any images to show Grid – Tom used to collect these and had a talk he gave to schools, this should be on the website.

6. Standing Items
===================

SI-0 Bi-Weekly Report from Technical Group (DC)
———————————————–
A short meeting took place last Friday and another will take place this Friday. DC will report thereafter.

SI-1 ATLAS Weekly Review and Plans (RJ)
—————————————
RJ noted that RAL’s FTS service is problematic and we have stopped using it. A configuration problem affecting routing caused Echo problems and we switched to Stack. ADC would like to move to CPU storage and this should be discussed. IC is being switched to CPU.

SI-2 CMS Weekly Review and Plans (DC)
————————————-
Minor issues noted, but nothing of significance to report. Singularity in the UK – RAL Tier1 is currently not enabled and others are scheduled (in production manager’s report).

SI-3 LHCb Weekly Review and Plans (PC)
————————————–
The main impact is a problem with UK certificates accessing storage at some Tier2 sites. Some sites, both dCache and DPM, are affected, but this is not universal so investigation will be on a site-by-site basis. LHCb access is being undertaken this week. RAL/LHCb usage has been largely constant despite the outage.

SI-4 Production Manager’s report (JC)
————————————-

1. CMS are pushing for Singularity to be enabled at their sites (they were targeting March). In the UK, the RAL T1 is not currently enabled. Under T2s: Brunel, IC and RALPP are set up; Bristol is deploying. And finally for T3s: Glasgow is pretty much there, RHUL has started, QMUL and Oxford have no queues yet enabled.

2. There is an LHCOPN and LHCONE joint meeting taking place tomorrow (Tuesday the 6th) and Wednesday at Cosener’s House. (https://indico.cern.ch/event/681168/). Duncan is reporting on WLCG IPv6 status and Ian Collier on the (HSF) Community White Paper.

3. A current priority area for sites is migration to CentOS7. We are collating the status and experiences across sites (e.g. for storage https://www.gridpp.ac.uk/wiki/Storage_site_status).

4. There was a WLCG ops coordination meeting last week (https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes180301). It included an update on the CNAF recovery, a draft of the new policy on SAM recalculation (to reduce the overhead on experiments) and progress on the WLCG Storage Space Accounting prototype. Tier-1s have been asked to complete a survey on tape “to investigate how archival systems can be used most optimally by users, and what metrics are available to track the effectiveness of their use”.

SI-5 Tier-1 Manager’s Report (GS)
———————————
Here is a brief Tier1 report for today (5th March ’18) covering the week since the last meeting.

Castor:
– The Castor intervention scheduled for last Thursday (1st March) was cancelled owing to staff availability in the poor weather. (It was to apply Oracle patches to the database systems behind Castor). It is being re-scheduled – possibly towards the end of March.

Echo (and FTS and IPv6)
– As planned the Echo gateways were dual stacked – so providing IPv6 access – last Tuesday (27th Feb). This change itself went well – however, it revealed problems with transfers to/from Echo via our FTS service. One issue was identified – the load balancers in front of the FTS system were not dual stacked. This was fixed quickly but the problem persists. Some VOs are making use of other FTS servers as a workaround. (LHCb seem to be the only one using our FTS server significantly at the moment).

Infrastructure: (No change from last week)
– We await further updates regarding an ongoing problem with one of the BMS (Building Management Systems) in the R89 machine room. This has an intermittent fault.

Networking:
– Early Tuesday morning (27th Feb) there was a problem with two of the three connections that make up the OPN link to CERN. This corresponded with a problem reported by Janet at London Powergate (their update also indicated that two of our links were affected). We saw the one remaining link maxing out inbound. This was resolved around midday. However, one of the links then showed errors. That link was taken down for a while until that was sorted. By Wednesday morning all three links were again fully operational.

Capacity Purchasing:
– One tranche of CPU is expected this week. There are other concerns around availability of the 12TB disk drives. The increased delivery risk has been flagged with finance.

SI-6 LCG Management Board Report of Issues (DB)
———————————————–
Nothing to report.

SI-7 External Contexts (PC)
———————————
PC was not in attendance, no report submitted.

REVIEW OF ACTIONS
=================
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. (Update: this will be re-profiled in the next FY). Done.
655.3: PG to consider the agenda and date for Tier1 review and include disaster recovery plans. (UPDATE: appropriate dates are being considered with Alastair Dewhurst). Ongoing.
656.1: DK will report before the end of February on any actions GridPP should take to comply with GDPR. Ongoing.
656.2: DC will report on CPU efficiencies and CMS taskforce. Ongoing.

ACTIONS AS OF 05.03.18
======================
655.3: PG to consider the agenda and date for Tier1 review and include disaster recovery plans. (UPDATE: appropriate dates are being considered with Alastair Dewhurst). Ongoing.
656.1: DK will report before the end of February on any actions GridPP should take to comply with GDPR. Ongoing.
656.2: DC will report on CPU efficiencies and CMS taskforce. Ongoing.

663.1: Summaries to be provided of any likely contribution to the broad aims of GridPP from the CDT: AM/RJ for Manchester, JC for Cambridge, DB for Glasgow.