GridPP PMB Meeting 577 (26.10.15)
=================================
Present: Pete Gronbech (Chair), Tony Doyle, Roger Jones, Tony Cass, Andrew McNab, Andrew Sansum, Jeremy Coles, Dave Colling, Steve Lloyd, Claire Devereux, Pete Clarke (Minutes: Louisa Campbell)
Apologies: Gareth Smith, Dave Kelsey, Dave Britton

1. Finalising Hardware (HW) Numbers
====================================
There was a discussion on finalising the hardware allocation to the Tier-2s based on the metrics.

ACTION 577.1 SL to provide relevant hardware numbers to PG.

2. Additional proposed project with LSE
=======================================
Will Venters (LSE) emailed on 22.10.15 with an invitation to be involved as a case study in a large project (c. £5M) looking at “interfaces”. It was noted that his interest relates more to verification and management issues for LHC computing etc., although the other partners in the project are looking at technical (computing) interfaces. Our input will most likely take the form of interviews and observations, as with the earlier Pegasus Project that studied GridPP over a number of years. SL and DB to arrange meetings as required. It was agreed that involvement would be beneficial on several levels, including ResearchFish data and outreach, judging by our involvement in the previous project. There is likely to be less requirement for interviews, and our involvement will probably be restricted to insights on the interface.

3. QReports
===========
Most reports have now been submitted, and CD confirmed receipt of the final input for the final two quarterly reports; these should be submitted to PG by the end of today.

PG mentioned the Q3 reports and requested that all reports be submitted urgently. He will give an update on the Q2 reports next week if all are received in time. GS has been sent a request for the Q1 report.

ACTION 577.2 CD to submit final 2 QReports.

ACTION 577.3 PG to prepare update on Q2 reports for next PMB.

5. Update on WLCG Technical Strategy Group
==========================================
JC recently sent a request for more information to Ian Bird and awaits a response; he will follow up next week during the GDB. Ian advised that things have evolved slightly and that he will provide more detail soon.

Regarding the GDB nominations: Ian is currently checking that the 3 nominees received still want to stand. He will be in a position to circulate names and information shortly. There was some discussion on procedures for the vote at next week’s GDB meeting, and it is thought that country reps are able to vote in person, via email or by proxy. Ian will advise.

6. AOCB
=======
1) The UK-T0 meeting took place last week. AS and PC will prepare and circulate a summary report next week. In the interim they note that the meeting was positively received and attended by a good range of disciplines, all of whom endorsed it. It was agreed that the concept of regular/monthly meetings is extremely helpful in encouraging focussed discussion and action.

ACTION 577.4 AS and PC will prepare and circulate a summary report on the UK-T0 meeting last week.

2) Following on from the meeting on 29.9.15, SL is retaining more metrics information so that anomalies can be checked, and the page should be modified to record Elapsed Time rather than CPU, with 50/50 weighting. This should take effect from 11 April but will be monitored from now until then as it is easier to access and share.
Gareth Roy (Glasgow) has noticed an issue between elapsed time and elapsed time on NCPU, which has more appropriate, accurate and up-to-date numbers. It is clear that the production portal lags behind the development portal, and to do the accounting we need the correct figures in the next production version. The numbers are not currently displayed properly in the accounting portal but are correctly included in the database. John Gordon can supply correct data from the database if an email enquiry is submitted. It was noted that UCL appears in the database but not in the portal. This point should be revisited next week, and it has been suggested that RAL could publish the data straight from the website.

7. Standing Items
=================
SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
DC stated that things were progressing on Tier 2 Evolution. Tests are being undertaken but there is nothing significant to report and things are broadly on course as expected. Members were directed to look at the Jira summary page for a report on progress.

SI-1 Dissemination Report (SL)
------------------------------
Nothing to report.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
RJ noted that we are heavily loading RAL tape and this is not helped by Castor issues. FAC set-up at RAL. We are looking to move Sussex over to a RAL site and to decommission disc at all Tier 2 sites; RJ will look at data requirements for removal. Production disc: it was noted that this is space-saving work and the production disc is still running; the exercise is merely about merging discs.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing to report.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing significant to report.

SI-5 Production Manager’s report (JC)
-------------------------------------
1. I received a question about the use of the GridPP website for advertising our vacancies. It makes sense to collate our openings, therefore do you think we should/could advertise GridPP vacancies on the GridPP web site? The main issue would be managing the adverts (i.e. updating and removing).
The meeting consensus was that this was a good idea and should be tried.
2. There is an LHCOPN and LHCONE joint meeting at the Science Park Amsterdam (NL) 28-29 of October 2015. Is there any GridPP representation?

Pete Clarke said he was going and thought Duncan Rand may also be attending. (It turns out that Duncan cannot make it but will provide Pete with some results/updates from the Imperial LHCONE connection).

3. GridPP had a good attendance at HEPiX (https://indico.cern.ch/event/384358/timetable/#all.detailed) for the autumn/fall meeting and our contributors gave some well received talks. The next meeting is 18th-22nd April in Zeuthen, Germany.

4. There have been some useful follow-up discussions with VOs we are already supporting and others we may support, during and after the UK-T0 meeting last Tuesday and Wednesday: https://eventbooking.stfc.ac.uk/news-events/uk-t0-workshop-296?agenda=1.

5. (Also mentioned by Roger) Matt Raso-Barnett will be leaving Sussex in December and the site is therefore looking at lightweight options for remaining engaged (e.g. Vac).

As always, lots more detail can be found in https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
Castor:
– The upgrade of the Castor Oracle databases to version 11.2.0.4 has been completed. The final step took place successfully on Tuesday 13th October.
– We have seen very high load on the Atlas tape instance. Five additional servers have been added – doubling the size of the disk cache for AtlasTape.

Networking:
– No significant changes made in this period. We are keeping a close watch on some low-level packet loss within our network. We are continuing with the changes needed to remove the old ‘core’ switch from the network.

Batch:
Regarding Action 576.4: (glexec test failures for CMS after the final batch of worker nodes was updated – leading to loss of availability). I do not yet have what I consider a satisfactory answer – please leave action ongoing. However, it is clear that when we had only one batch of worker nodes still to be updated then the large majority of these test jobs ran on the few remaining un-upgraded nodes.

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
DB was absent and the Management Board meeting was postponed from last week until next week. There was a brief discussion of items on the agenda.

REVIEW OF ACTIONS
=================
574.2 On CMS T1 efficiency discrepancies – DC reports that CMS are running multicore pilots for single-core jobs, whereas ATLAS are doing this correctly, with higher efficiency. Ongoing.

574.8 DB to obtain information from PC about conclusion of MB discussion on Memory Items for the Future and share with PMB members. Ongoing.

576.1 AS to ask if Martin Bly will represent GridPP on the benchmarking group and feed back to the PMB. Done.

576.2 DB to talk with DK about the policy for requiring visit notices. Ongoing.

576.3 SL to implement authorization for GridPP funded visits on the system already in use by the experiments. Ongoing.

576.4 GS to respond to the PMB with the explanation of why the glexec test failures were not seen previously during testing and roll-out of the Tier1 worker node configuration. Also to provide list of resulting actions to mitigate this type of problem in future. Ongoing.

ACTIONS AS OF 26.10.15
======================
574.2 On CMS T1 efficiency discrepancies – DC reports that CMS are running multicore pilots for single-core jobs, whereas ATLAS are doing this correctly, with higher efficiency. Ongoing.

574.8 DB to obtain information from PC about conclusion of MB discussion on Memory Items for the Future and share with PMB members. Ongoing.

576.2 DB to talk with DK about the policy for requiring visit notices. Ongoing.

576.3 SL to implement authorization for GridPP funded visits on the system already in use by the experiments. Ongoing.

576.4 GS to respond to the PMB with the explanation of why the glexec test failures were not seen previously during testing and roll-out of the Tier1 worker node configuration. Also to provide list of resulting actions to mitigate this type of problem in future. Ongoing.

577.1 SL to provide relevant hardware numbers to PG.

577.2 CD to submit final 2 QReports.

577.3 PG to prepare update on Q2 reports for next PMB.

577.4 AS and PC will prepare and circulate a summary report on the UK-T0 meeting last week.