GridPP PMB Meeting 622

GridPP PMB Meeting 622 (30.01.17)
=================================
Present: Dave Britton(Chair), Tony Cass, Pete Clarke, Jeremy Coles, David Colling, Pete Gronbech, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Tony Doyle, Roger Jones, Dave Kelsey, Steve Lloyd.

1. GridPP38 Sussex
==================
The registration link for GridPP38 is now available and LC has circulated it to the PMB and UKHEPGRID. An agenda needs to be developed. Site evolution is clearly still a key issue to discuss, including a presentation of requirements (e.g. what was discussed at the Atlas Jamboree) both for the latter half of LHC Run2 and for Run3. This includes Tier 0 and Tier 1 evolution together with the evolution of the Tier 2s. Much of this was covered last time, but a focus on inputs from the experiments would be beneficial. Large and small Tier-2 sites are perhaps becoming better understood, but the middle-sized sites could benefit from more discussion. CMS is less concerned with how the resources are provided, so long as they work effectively and there is sufficient access. Middle-sized sites may aspire to be larger in order to support other projects beyond particle physics, and GridPP can only provide guidance on what manpower will be available beyond 2020 to allow them to plan effectively.
Other suggestions: progress of other non-GridPP VOs; security issues; site monitoring; testing of caching models; access for CMS (disk volume, overhead, performance, etc.); caching solutions; presentations from site admins to understand their pressures (particularly medium sites, or ones with a small amount of GridPP but also other pressures, e.g. Durham, Oxford, Liverpool, etc.).
ACTION 622.1: DB and PG will work on an agenda for GridPP38 and run this past DC for comment/input.

2. Technical Meeting update
===========================
DC provided a summary of the meeting (unedited notes at https://indico.cern.ch/event/609400/) which focussed on Tier2 Evolution.
Atlas
Sites smaller than RALPP, Liverpool, Oxford and ECDF should not be buying disk, and sites that are larger should be buying storage. These four should decide for themselves and will come up with different solutions. Durham are running with an ARC cache of tens of TB and are no longer a small CPU site. We are already looking at extra volunteer sites. To use ARC caching, something like a global file system is needed. There was some discussion as to whether or not a simple NFS would be performant enough, and alternatives were discussed.
Small sites are in a situation where they do not have enough effort to carry out the migration that a new state requires. There was some discussion about having a team that goes into smaller sites for a few days to help them change over. It was pointed out that each site will have its own configuration system, for example; however, such differences could be sorted out over a few days. It was likened to Grid-in-a-box.
The role of documentation was also viewed as important and such a task force would take this into account.
CMS
Running a CMS site has always been a bit more heavyweight than running an Atlas site. However, running a diskless CMS site is quite straightforward and this has been done with Oxford. These tests have been pretty successful (see slides). We will extend this to have all data hosted at a UK site. Chris also showed that files were being used multiple times, so xrootd caching would be helpful.
LHCb
LHCb would encourage sites to buy CPU rather than have any more sites joining the group of T2-Ds. They are looking at using xrootd (they did look at shttp, but that has stopped at the moment) and are starting to look at ways of working similar to those CMS described above. This will be an evolutionary process.
Further discussion was put off until 10am next week.

There remain issues to be discussed and these could be explored further at GridPP38. Atlas may be slower, but CMS should be well down this road in a couple of months. We need to consider how caching would fit into other sites, e.g. Dirac has several sites with data, but effectively only one holds the data and the other sites have caches that can be accessed. If we have a very large cache, copies could be acquired, and it was noted that at Durham using caching was more efficient than using local disk.

3. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
———————————————–
See point 2 above.

SI-1 Dissemination Report (SL)
——————————
No report submitted.

SI-2 ATLAS Weekly Review and Plans (RJ)
—————————————
No report submitted.

SI-3 CMS Weekly Review and Plans (DC)
————————————-
Nothing significant to report.

SI-4 LHCb Weekly Review and Plans (PC)
————————————–
Nothing significant to report.

SI-5 Production Manager’s report (JC)
————————————-
On Thursday a security check was run across 20 UK sites and Ian will produce a report on the results. The results are sensitive with regard to sites that did not respond; the lack of response was caused by an email routing issue at one university. This is now resolved and a new address has been set up that goes to both the university and the GridPP team, thereby validating the value of this exercise.

SI-6 Tier-1 Manager’s Report (GS)
———————————
Castor 2115 updates took place last Tuesday and GEN status on Thursday. Both are fine, though work on special code for Alice in GEN1 is still under way; this will be complete after tomorrow.

SI-7 LCG Management Board Report of Issues (DB)
———————————————–
No MB meeting has taken place. DB circulated emails about a scientific computing forum meeting arranged by Eckhard on long-term computing needs for the LHC in its high-luminosity phase – re inconsistencies between funders and experiments’ requirements. It may explore ways to resolve these issues – DB will attend and ascertain membership/purpose of the group.

SI-8 External Contexts (PC)
———————————
Nothing significant to report. There was some discussion on challenges faced by projects seeking funding and planning/budgeting for computing requirements.

REVIEW OF ACTIONS
=================
610.1: AS/GS to produce suggestions for one or more metrics that will summarise the Tier-1 network availability/performance. Done.
616.3: DB and SL will discuss how best to progress replacement of TW’s role. Ongoing.
620.1: DB to contact DK re the procedure to deal with a security incident and the media. (Update: DK had devised an interim statement which involved TW as dissemination officer and he is no longer in post – there is no prescriptive full response as this would be dependent on circumstances and probably involve an emergency PMB and communication with relevant PR representatives). DK will send the statement to PMB in case required in future – spokesman SL as head of board or DB as project leader.

ACTIONS AS OF 30.01.17
======================

616.3: DB and SL will discuss how best to progress replacement of TW’s role. Ongoing.
620.1: DB to contact DK re the procedure to deal with a security incident and the media. (Update: DK had devised an interim statement which involved TW as dissemination officer and he is no longer in post – there is no prescriptive full response as this would be dependent on circumstances and probably involve an emergency PMB and communication with relevant PR representatives). DK will send the statement to PMB in case required in future – spokesman SL as head of board or DB as project leader. Ongoing.
622.1: DB and PG will work on an agenda for GridPP38 and run this past DC for comment/input.