GridPP PMB Meeting 648

GridPP PMB Meeting 648 (30.10.17)
=================================
Present: Dave Britton (Chair), Tony Cass, Pete Clarke, Jeremy Coles, David Colling, Tony Doyle, Pete Gronbech (Minutes), Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Gareth Smith.

Apologies: Andrew Sansum (Vidyo issues)

1. Cloud meeting and Scientific Computing Forum
===============================================
This has been a busy week with meetings indirectly related to GridPP:

PC, AM & DB attended a meeting with BEIS and cloud vendors. The meeting brought together research communities, cloud vendors (Amazon, Google, Microsoft, Alibaba and IBM) and various users from across the research councils. Researchers gave talks in the morning; PC gave STFC’s view along with Jeremy Yates. In the afternoon there were talks from the vendors, then speed-dating-style quick meetings.

A lot of preparation was made for this meeting. Costs were not (allowed to be) discussed. DB thought there were good talks from researchers. The vendor talks were not uniformly interesting and were somewhat frustrating: the vendors think we are individual users connecting to their cloud, whereas we are talking about using their cloud as a backend to our infrastructure.
For example, IBM said you can run all their stuff at your site and then users will be able to submit to the local cluster or the cloud seamlessly.

The one-on-one meetings were more productive. DB attended two meetings: one with Microsoft, who do have some understanding of what researchers do, and one with IBM. IBM were quite interesting since they do not just rent VMs but can also rent bare metal; they may have something to offer, e.g. this sort of deal can be capitalised. They could in principle respond to a capital bid, but the devil is in the detail as to how cost effective it would be.

DB made a point to all attendees using the analogy of renting a car: we want to own a VW, but they try to rent us a Rolls-Royce; how can that be cheaper? We run our centres quite well with a target reliability of 95%, but they are trying to rent us a service with 99.999% uptime. We then try to treat their Rolls-Royce as a VW by using spot-pricing and preemptible jobs. It was ultimately an interesting discussion.
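To put the two figures quoted above side by side, the following is a minimal sketch of the permitted downtime per year at each level. It assumes both percentages can be read as a simple fraction of a 365-day year; the function name is illustrative only and is not taken from any GridPP or vendor tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at the given percentage."""
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


# The two targets mentioned in the discussion.
for target in (95.0, 99.999):
    hours = annual_downtime_hours(target)
    print(f"{target}% -> {hours:.4f} hours/year ({hours * 60:.1f} minutes)")
```

On these assumptions, 95% allows roughly 18 days of downtime per year, while 99.999% allows only a little over five minutes, which is the gap the car analogy is pointing at.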

GridPP wanted to ensure no one went away with the impression that we are naive users. There is some concern about what people in BEIS might decree. Charlotte Jamieson was in attendance and Nick Trigg chaired the meeting; some years ago he wanted GridPP to buy re-packaged EGI services from Constellation. Tony Hay, AS, Phil Kershaw and others attended from STFC.

LZ has apparently been told by Tony Medland that they would have to rethink their computing model by the end of GridPP, which sounds ominous. The conclusion of the meeting was certainly not that we should use cloud but that we should continue the dialogue.
PC will send a summary of the meeting.

Friday was the CERN Overview Board, followed by the Scientific Computing Forum. PC noted the last meeting was a closed meeting, with one or two from each country, but this one was much better, with an open attendance list. Paul Alexander was on the list for the UK but did not attend. The attendees were a similar set of people to UK-T0 type things. It was run by Eckhard Elsen, and Ian Bird gave a HEP status report (a verbal summary of the talk he had given at the OB). There were reports from the US, Spain and the UK. Spain are downsizing, as they are contributing more than their fair share.

In the US there is an emphasis on HPC and on having to make use of the cycles, but such messages are counterproductive in the UK. If Government thinks HPC will solve our problems they are wrong, as our HPC machines are always oversubscribed and there will never be 10% of an HPC that could meet our requirements. It is an expensive way to provide computing cycles to the LHC but it fits the US ambitions to build the biggest HPC machine. DB and PC gave a summary of the UK status and pressures in the UK.

The final talk was on the HSF roadmap, for which long documents have been created. Graeme Stewart stated they are summarizing them into one document.

2. AOCB
=======
None

3. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
There were too few people for a meeting, but there was a chat regarding ATLAS tests with xrootd, etc.

SI-1 Dissemination Report (SL)
------------------------------
It was agreed that the Dissemination Report should be removed as a standing item (LC has actioned). With regard to the post – QM has sent an email to IC, no response has yet been received. DC believes it should all be happening smoothly, but SL has heard of one similar case where this process is not working correctly.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Nothing significant to report.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing significant to report.
The CPU inefficiency group is finding lots of edge cases. Trends of Tier-1 efficiencies are improving. PG asked if this was having an effect at RAL and DC explained that CMS run 8-core pilots, but some Fortran bits only run on one core. The local vs remote data problem at RAL causes trouble.
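As an illustration of why single-core code inside an 8-core pilot lowers the measured efficiency, here is a minimal sketch. It assumes CPU efficiency is CPU time used divided by wall time multiplied by allocated cores, and that a job is either fully parallel or running on exactly one core; the serial fractions shown are hypothetical and are not CMS measurements.

```python
def pilot_cpu_efficiency(cores: int, serial_fraction: float) -> float:
    """CPU efficiency of a pilot holding `cores` cores.

    `serial_fraction` is the (hypothetical) share of wall time spent in code
    that uses one core; the rest of the time all cores are assumed busy.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    # Average number of busy cores over the lifetime of the job.
    busy_cores = serial_fraction * 1 + (1.0 - serial_fraction) * cores
    return busy_cores / cores


for serial_fraction in (0.0, 0.25, 1.0):
    efficiency = pilot_cpu_efficiency(8, serial_fraction)
    print(f"serial fraction {serial_fraction:.2f} -> efficiency {efficiency:.1%}")
```

Under these assumptions, a job that spends a quarter of its wall time on one core reaches about 78% efficiency, and a job that is entirely single-core reaches only 12.5%.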

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
The LHCb intervention for the Castor SRM fix is this week.

SI-5 Production Manager’s report (JC)
-------------------------------------
Relatively light on operations items needing PMB discussion. For awareness, some items that may be of interest:

1. A focus of the monthly WLCG operations call this week will be to review plans for EL7 migration (https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes171102#Review_EL7_migration_plans). Several of our sites in the UK already run EL7/CentOS7. BNL will be presenting at the meeting to share their storage approach and plans.

2. Computing Insight UK 2017 registration closes on Tuesday (https://eventbooking.stfc.ac.uk/news-events/ciuk-2017). This year the theme is “Joining Up the UK e-Infrastructure”.

Some EGI led items:

3. EGI is starting a survey on NGI/site transition plans to IPv6. We will review it tomorrow and respond this week.

4. EGI Storage Accounting is expected to be deployed at all DPM/dCache sites by today (according to EGI), a deadline we will not meet as we are only 50% deployed. Nevertheless we are working towards all applicable sites being ready in November.

5. From November EGI intends to implement a new way of computing the weights for the NGIs' average A/R values, introducing the concept of a CE’s “computation power”: hep-spec * LogicalCPUs. We are reviewing the approach.
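The weighting described in item 5 can be sketched as a weighted mean. This is a minimal illustration only: it assumes the NGI figure is the average of per-site availability weighted by hep-spec * LogicalCPUs, which may differ from EGI's actual procedure, and the site names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str            # hypothetical site label
    hep_spec: float      # benchmark figure (the "hep-spec" in the formula)
    logical_cpus: int    # the "LogicalCPUs" in the formula
    availability: float  # monthly availability in percent


def computation_power(site: Site) -> float:
    """Weight of a site: hep-spec * LogicalCPUs."""
    return site.hep_spec * site.logical_cpus


def weighted_availability(sites: list[Site]) -> float:
    """Average availability weighted by computation power."""
    total_power = sum(computation_power(s) for s in sites)
    if total_power == 0:
        raise ValueError("total computation power is zero")
    return sum(computation_power(s) * s.availability for s in sites) / total_power


# Made-up example: a small site with high availability and a large site with lower.
example_sites = [
    Site("site-a", hep_spec=10.0, logical_cpus=1000, availability=99.0),
    Site("site-b", hep_spec=10.0, logical_cpus=3000, availability=95.0),
]
print(f"weighted: {weighted_availability(example_sites):.1f}%")
print(f"unweighted: {sum(s.availability for s in example_sites) / len(example_sites):.1f}%")
```

In this made-up case the weighted figure is 96.0% against an unweighted 97.0%, showing how larger sites would dominate the NGI average under such a scheme.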

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
The last report was on the 9th October. It has generally been fairly quiet operationally since then. Here are some of the significant points:

Echo:
-----
– Re-distribution of data in Echo onto the 2015 capacity hardware is still ongoing and making steady progress. We expect this to complete in a few weeks.
– CMS have been switched over to using xrootd.echo.stfc.ac.uk. This means that CMS jobs now use Echo as the primary means of accessing local data via xrootd. If the data is not found there, access will fail over to Castor. There have been a couple of teething problems (a configuration error and then waiting for long-lived CMS jobs to finish).
– The Echo Gateways have had a parameter change that means the GridFTP gateways make better use of memory. This will enable the number of connections to each gateway server to be increased.

Services:
---------
– The FTS3 services have been updated to version 3.7.4. We now have both the FTS services on the same version and can progress consolidating to one instance and making use of a distributed database behind it. During this change there was a problem that led to the loss of some of the FTS settings – with the upshot that we had a problem with file transfers to Echo for some days afterwards until this was identified. We are also investigating dual stack IPv4/IPv6 for one of the existing FTS3 instances.

Castor:
-------
– The tape servers are being upgraded to run SL7.

Staffing (I think this section is not for the minutes):
-------------------------------------------------------
– As previously stated one of our DB Admins (Andrey Smirnov) has left and Andrew Lahiff will be leaving in December.
– Darren Moore started in Production Team last Monday. Darren will take over my role.

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
There was no MB

SI-8 External Contexts (PC)
---------------------------
Nothing to report

REVIEW OF ACTIONS
=================
638.2: AS will check when equipment is due to become obsolete and investigate the legal and manpower implications of donation to the African Data Centre for Bioinformatics and Medical Research. (Update: AS is looking into how this may impact the Global Challenges Research Fund – GCRF – which would involve a cross-Council bid). Ongoing.
644.2: PG and AS will document plans and costings for the remainder of GridPP5 taking account of the Oracle tape issues experienced. Ongoing.
644.3: AS will put together a starting plan for staff ramp-down. Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. Ongoing.
644.6: (during GridPP39 meeting) DB will ascertain what Biomed do and how they use the Grid. (Update: Holloway, then Manchester and Imperial, are the biggest representatives. DC emailed to ascertain whether our input is being explicitly credited and was advised that GridPP has to sign up to the LSGC SLA in order to be acknowledged: https://documents.egi.eu/public/ShowDocument?docid=2874). Ongoing.
647.1: PG will update Data Management Plan. Ongoing.
647.2: DB will circulate link for Data Management Plan once agreed. Ongoing.

ACTIONS AS OF 30.10.17
======================
638.2: AS will check when equipment is due to become obsolete and investigate the legal and manpower implications of donation to the African Data Centre for Bioinformatics and Medical Research. (Update: AS is looking into how this may impact the Global Challenges Research Fund – GCRF – which would involve a cross-Council bid). Ongoing.
644.2: PG and AS will document plans and costings for the remainder of GridPP5 taking account of the Oracle tape issues experienced. Ongoing.
644.3: AS will put together a starting plan for staff ramp-down. Ongoing.
644.4: AS will progress capture of funds for Dirac with Mark Wilkinson. Ongoing.
644.6: (during GridPP39 meeting) DB will ascertain what Biomed do and how they use the Grid. (Update: Holloway, then Manchester and Imperial, are the biggest representatives. DC emailed to ascertain whether our input is being explicitly credited and was advised that GridPP has to sign up to the LSGC SLA in order to be acknowledged: https://documents.egi.eu/public/ShowDocument?docid=2874). Ongoing.
647.1: PG will update Data Management Plan. Ongoing.
647.2: DB will circulate link for Data Management Plan once agreed. Ongoing.