GridPP PMB Meeting 628 (20/03/17)
=================================
Present: Dave Britton (Chair), Tony Cass, Pete Clarke, David Colling, Tony Doyle, Pete Gronbech, Roger Jones, Andrew McNab, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Jeremy Coles, Steve Lloyd, Dave Kelsey, Andrew Sansum.

1. GridPP38 Agenda
==================
The agenda is developing well – PG has now received acceptances to talk from Royal Holloway (Simon George) and others.
Day 1 – DB introduction talk then Tier-0, Tier-1 and LHCb. The afternoon includes discussion of the results of the site survey (currently 6 replies; AM will raise it again at the Ops Meeting) – evolution and resources at sites. Short talk from the Sponsor at the end of the day. Decisions need to be taken on the Tier-2 draft that DB circulated.
Day 2 – Technology talks from AM, Sam Skipsey and Andrew Lahiff. After coffee, so far: Alastair Dewhurst on Echo and Daniel Traynor from Queen Mary – this is a 1.5 hour session which will be easily filled. After lunch there are talks from smallish sites, including: Birmingham (via Vidyo); Simon George from Royal Holloway; and AAAI from Jens. There is probably space for one more. After coffee small sites will present, including Sussex (Jeremy Maris), Cambridge (JC) and one more slot TBC. Birmingham has particular ALICE responsibility and is presenting first to allow for a check of Vidyo links. PC may make a suggestion for a non-HEP VO; Dune is at 3 or 4 sites so they may contribute talks. RJ will approach Jarik Novak at Lancaster to give a talk. LZ at Imperial is another possibility – they have just undergone and passed a review so will have material prepared. PG and DC will discuss the possibility of a talk on LZ.
Potential themes were discussed – previous themes include Revolution/evolution and various others, but no decision was taken.
ACTION 628.1: RJ will invite Jarik Novak to speak at GridPP38.
ACTION 628.2: PG and DC will discuss the possibility of an LZ talk at GridPP38.

2. Tier-2 Evolution
===================
DB circulated a draft paper on Tier-2 Evolution for comments. This needs to be tackled in advance of the next proposal: purchasing h/w with lifetimes of 3-5 years means that any change to the course of the super-tanker has to be made well in advance. PG has uploaded a document with graphs of work done at sites, which aligns well with DB’s graphs. PG also suggests changes re RALPP support to sites on ALICE and ATLAS. The order of importance is not related to the volume of work but rather to their importance to the experiments. Green and red lines on PG’s graphs indicate the size of sites – PG will adjust these slightly and DB will include them in the report. DB summarised that manpower in 2020 will be challenging, and that the large sites that require the manpower need to be identified, along with an understanding of h/w procurement and network planning. The PMB agreed this should be progressed and DB invited email comments in the next 2-3 days since this is a PMB document. It should then be circulated to the CB and the sites to form the basis of discussions in the afternoon session on Day 1 of GridPP38. That discussion could focus on h/w purchase and the technical evolution of sites against how we envisage sites will look in 2020.
ACTION 628.3: ALL to offer comments on DB’s Tier-2 Evolution draft paper in the next 2-3 days.

3. GridPP39 Dates
=================
The proposed dates (20-22 September) clash with ATLAS and LHCb weeks. Alternative windows are narrow due to availability issues in Durham in early September and DB’s other commitments at the end of August. The week commencing 25 September is possible – Wed-Fri is most convenient but may clash with term time. LC will check availability with Jeppe at Durham.
ACTION 628.4: LC will check availability w/c 25th September at Durham (27-29th) for GridPP39.

4. AOCB
=======
a) Quarterly reports. PG has received several reports and only awaits the Tier-1 report.

5. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
———————————————–
There was a meeting last Friday – DC circulated notes and presentations. ATLAS's desire for containers at all sites, with a preference for Singularity over Docker, was discussed. RJ confirmed there has been discussion of this and that security is an important element. There are implications for sites, and the strength of this preference was discussed. RJ will cover Singularity in Sussex but it may have been presented as a CMS preference. Andrew Lahiff will cover the technical side in his talk.
At the CMS UK meeting a decision was taken on Oxford using RAL Storage which was very successful and will be visible as it will run on all UK sites to support CMS.
Development of a classic SE storage front with confederation behind it is being worked on, though there are some aspects to work out.
David Crooks gave a Security Operations Centre talk.
CentOS7 is being tested at some sites.

SI-1 Dissemination Report (SL)
——————————
SL was absent and no report was submitted.

SI-2 ATLAS Weekly Review and Plans (RJ)
—————————————
RJ discussed containers from the conclusion of the ATLAS meeting, confirming there is enthusiasm for this.
There has been a great deal of optimising of code to make full use of jobs and a big push on doing more fast simulation which will change the way we work, especially to CPU sites.

SI-3 CMS Weekly Review and Plans (DC)
————————————-
On paper we almost ran out of tape, with only 1PB to spare. Some pledges arrived early, which resolved this, but it is an ongoing issue in CMS. Having the planning in place for this is very helpful in bringing about a positive outcome.

SI-4 LHCb Weekly Review and Plans (PC)
————————————–
Nothing of significance to report, except for some job interruptions due to jobs at CERN.

SI-5 Production Manager’s report (JC)
————————————-
No report submitted.

SI-6 Tier-1 Manager’s Report (GS)
———————————
General:
– Ongoing work on the chillers.
– A problem with one of the five Microsoft Hyper-V hypervisors in the high availability cluster caused a number of VMs to reboot overnight Thursday-Friday.

Castor:
– We have an ongoing problem with the SRM SAM tests for Atlas, which are failing a lot of the time. We have confirmed this is not affecting Atlas operationally; it is just the tests that fail. We still have a GGUS ticket open with Atlas as the test appears to be problematic.
– We continue to fail CMS SRM SAM tests sporadically with timeouts.
– At the end of last week nine of the ’14 generation disk servers – each 100TB – were deployed into AtlasDataDisk. (These are from the batch that was used as CEPH test servers). A further six are being prepared to go into CMS.

Availabilities for February 2017
Alice: 100%
Atlas: 84%
CMS: 97%
LHCb: 98%
OPS: 100%

Comments:
Atlas: We believe a large part of the lack of availability is a problem with the SRM test itself. We have an outstanding GGUS ticket open with Atlas about this (GGUS#126847).
CMS: There is a steady rate of timeouts in the SRM tests. We will investigate further once the SRMs are updated.
LHCb: The lack of availability was dominated by an incident in the early hours of 22nd Feb. when there was a resource problem within the SRMs. This fixed itself in the morning.

SI-7 LCG Management Board Report of Issues (DB)
———————————————–
The meeting is tomorrow so there is no report as yet. The only topic on the agenda is the Oracle announcement, for which they are looking for a Tier-1 statement. AS has had some discussions on this and DB will email AS (cc GS and PG) for a high-level statement.

SI-8 External Contexts (PC)
———————————
Nothing to report.

REVIEW OF ACTIONS
=================
616.3: DB and SL will discuss how best to progress replacement of TW’s role. (Update: DB and SL have amended and are awaiting DB’s final comments) Ongoing.
620.1: DB to contact DK re the procedure to deal with a security incident and the media. (Update: DK had devised an interim statement which involved TW as dissemination officer and he is no longer in post – there is no prescriptive full response as this would be dependent on circumstances and probably involve an emergency PMB and communication with relevant PR representatives). DK will send the statement to PMB in case required in future – spokesman SL as head of board or DB as project leader. Ongoing.
623.3: DB and AS will discuss how best to summarise the Tier-1 review. (Update: a brief summary will be written up and presented). Ongoing.
624.1: AS will rework tape modelling taking account of recent changes. Ongoing.
624.2: PG will firm up the GridPP38 agenda and alternative topics in Session 4 and 5. Done.
627.1: DB will give consideration to appropriate slot for a discussion of Tier-2 Evolution at GridPP38. Done.
627.2: PG will ask Jeremy Maris at Sussex to talk at GridPP38. Done.

ACTIONS AS OF 20/03/17
======================
616.3: DB and SL will discuss how best to progress replacement of TW’s role. (Update: DB and SL have amended and are awaiting DB’s final comments) Ongoing.
620.1: DB to contact DK re the procedure to deal with a security incident and the media. (Update: DK had devised an interim statement which involved TW as dissemination officer and he is no longer in post – there is no prescriptive full response as this would be dependent on circumstances and probably involve an emergency PMB and communication with relevant PR representatives). DK will send the statement to PMB in case required in future – spokesman SL as head of board or DB as project leader. Ongoing.
623.3: DB and AS will discuss how best to summarise the Tier-1 review. (Update: a brief summary will be written up and presented). Ongoing.
624.1: AS will rework tape modelling taking account of recent changes. Ongoing.
628.1: RJ will invite Jarik Novak to speak at GridPP38.
628.2: PG and DC will discuss the possibility of an LZ talk at GridPP38.
628.3: ALL to offer comments on DB’s Tier-2 Evolution draft paper in the next 2-3 days.
628.4: LC will check availability w/c 25th September at Durham (27-29th) for GridPP39.