GridPP PMB Meeting 639 (03.07.17)
=================================
Present: Dave Britton (Chair), Tony Cass, Pete Clarke, Jeremy Coles, David Colling, Tony Doyle, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies: Pete Gronbech.

1. Network Look Forward
=======================
PC circulated the document and reminded members that the text in blue needs updating. The ATLAS and CMS sections have been amended on the basis of discussion at last week’s PMB (RJ and DC need to review and approve). DC asked CMS for advice on this: given the increased use of AAA and data movement, the advice was that on a timescale of a few years we should be looking at 100GB for Large Tier-2s (this category is still not defined, but Imperial is likely to be one), as we currently use more than 30GB per second. This is meant to be 2 x 2 x 10, but Imperial run 35. The text should be updated to, for example, “at its main UK Tier-2 site, Imperial, will require bandwidth of c.40 GB and may look to increase to 100GB per second in the future”. It may be preferable to set out the standard requirement of 20 and, under each of ATLAS and CMS, present potential exceptions to it. RALPP’s status in CMS needs to be discussed; Brunel has sufficient resources for now and won’t procure more. The statement relating to ATLAS will be altered to reflect that 20GB is the rule, noting 40GB as the exception. There is currently no statement on OPN and Janet; AS will look at this once the experiment text has been updated. AS will also look at the CMS text and plots and provide a statement to PC, who will distribute it to the Tier-2 sites after the Ops meeting tomorrow.

2. Dissemination Officer Post
=============================
SL noted that the post has been advertised but no suitably qualified candidates applied. Discussion centred on whether the post needs to be based at QMUL; in the past it has been moved to UCL. This is a non-FEC post until March 2020. One option may be to 50% fund a post at Imperial; DC and SL will check the practicalities of this, whether the post could be funded at an alternative institution, or whether it should continue to be funded at QMUL.
ACTION 639.1: DC and SL will investigate the possibility of filling the Dissemination Officer post at Imperial or QMUL with 50% FTE.

3. Locked Space Technologies
============================
DC spoke to Simon: some nodes and an account were previously set up and handed over to Locked Space, but Locked Space did not progress this. DC will forward some information on this to SL.

4. AOCB
=======
a) PC advised that the email from Susan Morrell about an NEI survey relates to a new business case being prepared for submission in September. Previously, the NEI survey has covered infrastructure providers and possibly communities, but not users. AS may email Susan for clarification on what is meant by ‘users’ and cc PC.
b) STFC will put out a call on e-infrastructure (drafted by Anthony Davenport), which he believes should relate to PPAN science. The UKT0 community are working up some text in response, and PC and AS will work something up relating to SCD aspects.
c) Catalin has proposed hosting an HT-Condor workshop for c. 60 delegates in the first week of September and is seeking GridPP support of c. £2500-£2700. A registration fee may be considered, and GridPP may be able to commit some funds to supplement the fees raised. The workshop may follow previous workshops in venue and accommodation. It was agreed to express strong support and commend Catalin’s work in this regard, as the workshop will benefit the community; however, GridPP has a non-increasing travel budget, so it can support up to a maximum of c. £2000, but can review this should additional funds be required.
d) RAL CPU efficiencies for June were circulated. These are good, with the exception of CMS. DC noted an initial report from 2 weeks ago; tomorrow’s meeting will provide more information. There are clearly multiple factors at play, and all will be analysed and quantified.
ACTION 639.2: AS will email Susan Morrell for clarification on terminologies relating to the NEI Survey.
ACTION 639.3: DB will write to Catalin advising that GridPP will support the planned HT-Condor workshop to c. £2000.

5. Standing Items
===================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
Last week’s meeting was cancelled and there is nothing to report.

SI-1 Dissemination Report (SL)
------------------------------
Nothing of significance to report.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Nothing of significance to report.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing of significance to report.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing of significance to report.

SI-5 Production Manager’s report (JC)
-------------------------------------
There was little to report from the last week. One point to come back to today (following a query from AS) is the status of solidexperiment.org, which was discussed on 16.01.17. AS confirmed this appears to be straightforward and feasible: it was approved in principle last year and VOs were run at that time. Janus is writing up the data-moving software, and Daniella and Simon have been happy to run this through DIRAC. This can be set up, but a profile needs to be created and an allocation made (1 PTB over 5 years).

Some points of interest.

1. The next GDB on 12th July has a security bent: https://indico.cern.ch/event/578988/. Topics include container technology, a proposal for an authorisation working group, an update on HNSciCloud, then a series of workshop reports (including of the WLCG workshop).

2. LSST may run a data challenge in the coming months. Consequently we have encouraged sites to consider enabling LSST now (especially those with an association with the VO).

3. EGI continue to run an NGI Operations Centre Managers meeting every month. The agenda for the meeting last Thursday is online at https://indico.egi.eu/indico/event/3237/. Topics include WMS decommissioning, ARGO monitoring development and the new accounting portal release.

4. The pilot version being used by VAC (for GridPP) needs to be updated as GridPP DIRAC has moved to support a newer version.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
Castor:
-------
– There have been some intermittent problems accessing the Atlas Scratch Disk. We received a GGUS ticket from Atlas but the problem had resolved itself beforehand. However, it has also recurred. At present the cause is unknown.

Services:
---------
– There were problems with xroot redirection for CMS last Thursday to Friday. The usual fix – restarting the services – didn’t work. A full disk area on one of the nodes was found to be the cause.

Echo:
-----
– The number of placement groups for the Atlas pool is being steadily increased. This is being done ahead of the installation of the new capacity hardware.

Networking:
-----------
– We are tracking the ongoing problem with the site firewall that affects data flows.
– Implementation of the third 10Gbit link for the OPN to CERN took place last Wednesday (28th June). We are keeping a watch on the link. Packet loss (as reported by Perfsonar) seems to have been unaffected and remains low. Data is flowing across all three links. However, the load is not yet evenly balanced across them; this may need more time to settle.
– The link between R89 and the Atlas building was increased from 2*10Gbit to 2*40Gbit last Tuesday.

Hardware:
------------
– For the last purchase of capacity hardware:
  – The disk servers are around three weeks into their acceptance testing (another week or so to go). So far so good. However, there remains some work to do in the full configuration of these servers. They are quite different to previous purchases with a CPU box each connecting to two external chassis with disks. This has redundant paths to each disk which will need configuration within the OS.
  – The testing of the CPU started last week.

CPU Efficiencies: Here is the report from Andrew Lahiff for the LHC Experiments:
------------------------------------------------------------------------
Global CPU efficiency (CPU time / wall time) was up in June at 88.3%, compared with 77.3% in May. Of 232894 HEP-SPEC06 months of available wall time, 223202 HEP-SPEC06 months were used (95.8% occupancy). Experiment summary:

Experiment     CPU Time   Wall Time        Wait  % Efficiency
(CPU Time, Wall Time and Wait in HEP-SPEC06 months)
ALICE          28174.71    31175.60     3000.90         90.37
ATLAS          99063.16   106247.40     7184.24         93.24
CMS            17786.78    31600.15    13813.38         56.29
LHCb           46247.15    47391.04     1143.89         97.59

LHC Total     191271.79   216414.19    25142.40         88.38
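
As a cross-check on the figures above, the following is a minimal sketch of how the Wait and % Efficiency columns and the occupancy figure can be derived from the CPU time and wall time values. It assumes, as the table suggests, that Wait is wall time minus CPU time; the names `USAGE`, `efficiency_pct`, `occupancy_pct` and `summarise` are illustrative only and are not taken from the Tier-1 accounting tools.

```python
# CPU time and wall time per experiment, in HEP-SPEC06 months,
# copied from the experiment summary table in these minutes.
USAGE = {
    "ALICE": (28174.71, 31175.60),
    "ATLAS": (99063.16, 106247.40),
    "CMS": (17786.78, 31600.15),
    "LHCb": (46247.15, 47391.04),
}


def efficiency_pct(cpu_time, wall_time):
    """CPU efficiency as a percentage: CPU time / wall time."""
    return 100.0 * cpu_time / wall_time


def occupancy_pct(used_wall_time, available_wall_time):
    """Occupancy as a percentage: used wall time / available wall time."""
    return 100.0 * used_wall_time / available_wall_time


def summarise(usage):
    """Return wait time and efficiency per experiment, plus an 'LHC Total' row."""
    rows = {}
    for name, (cpu, wall) in usage.items():
        # Assumption: Wait is simply wall time minus CPU time.
        rows[name] = {
            "cpu": cpu,
            "wall": wall,
            "wait": wall - cpu,
            "efficiency": efficiency_pct(cpu, wall),
        }
    total_cpu = sum(cpu for cpu, _ in usage.values())
    total_wall = sum(wall for _, wall in usage.values())
    rows["LHC Total"] = {
        "cpu": total_cpu,
        "wall": total_wall,
        "wait": total_wall - total_cpu,
        "efficiency": efficiency_pct(total_cpu, total_wall),
    }
    return rows


if __name__ == "__main__":
    for name, row in summarise(USAGE).items():
        print(f"{name:<10} wait={row['wait']:10.2f} eff={row['efficiency']:6.2f}%")
    # Site-wide occupancy, using the figures quoted in the report text.
    print(f"occupancy={occupancy_pct(223202, 232894):.1f}%")
```

The recomputed values agree with the table to within rounding of the last digit, which suggests the published columns were derived from unrounded inputs.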

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
The meeting will take place next week.

SI-8 External Contexts (PC)
---------------------------------
DB has invited Anthony Davenport to attend the next GridPP collaboration meeting in Lancaster. He should be asked to present a high-level talk in the opening session on the RCUK vision, to put this into context for delegates. As the programme manager for infrastructure his input would be invaluable, and the collaboration meeting provides an ideal opportunity to demonstrate what GridPP does. Tony Medland was previously chair of EUT0; for the UK, Tony has passed over attendance to Anthony, and PC will continue to provide support and advice to Anthony in this regard.

REVIEW OF ACTIONS
=================
630.2: DB and PG will continue to work on metrics and funding strategies at the macro level. Ongoing.
630.3: DB will tweak his metrics and funding model based on CPU. Ongoing.
638.1: PC will update the text on the Network Forward Look document for the forthcoming 2 years. Ongoing.
638.2: AS will check when equipment is due to become obsolete and investigate the legal and manpower implications of donation to the African Data Centre for Bioinformatics and Medical Research. (Update: AS is looking into how this may relate to the Global Challenges Research Fund – GCRF – which would involve a cross-Council bid) Ongoing.
638.3: SL and DC will prepare a statement relating to Locked Space technologies. Ongoing.

ACTIONS AS OF 03.07.17
======================
630.2: DB and PG will continue to work on metrics and funding strategies at the macro level. Ongoing.
630.3: DB will tweak his metrics and funding model based on CPU. Ongoing.
638.1: PC will update the text on the Network Forward Look document for the forthcoming 2 years. Ongoing.
638.2: AS will check when equipment is due to become obsolete and investigate the legal and manpower implications of donation to the African Data Centre for Bioinformatics and Medical Research. (Update: AS is looking into how this may impact the Global Challenges Research Fund – GCRF – which would involve a cross-Council bid) Ongoing.
638.3: SL and DC will prepare a statement relating to Locked Space technologies. Ongoing.
639.1: DC and SL will investigate the possibility of filling the Dissemination Officer post at Imperial or QMUL with 50% FTE.
639.2: AS will email Susan Morrell for clarification on terminologies relating to the NEI Survey.
639.3: DB will write to Catalin advising that GridPP will support the planned HT-Condor workshop to c. £2000.