GridPP PMB Meeting 602

GridPP PMB Meeting 602 (18.07.16)
=================================
Present: Dave Britton(Chair), Tony Cass, Pete Clarke, Jeremy Coles, David Colling, Pete Gronbech, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Gareth Smith, Tony Doyle.

1. CHEP and other (WLCG) conference funding
===========================================
There are three main issues relating to CHEP in San Francisco: it is very expensive; GBP is falling, so the exchange rate is poor and worsening; and we don’t yet have a full list of those who want to go. [It was subsequently confirmed that GridPP had 11 talks and 15 posters accepted.] Even with only a 50% funding contribution from GridPP, the spend will be much higher than last year, when 15 people attended at a cost of £15,000. This is compounded by this year’s reduced travel budget. There may be some flexibility, which would impact next year’s budget, but there is no CHEP next year so this may be manageable. DB acknowledged our position of success given the unprecedented number of accepted contributions, and the importance of providing support wherever possible was agreed. Oral presenters should be offered 50% support for work directly relevant to GridPP, with a limit of c. £1500. There was discussion of the Brunel talk, which was not clearly related to GridPP. It was agreed that a few poster presenters should be supported, but we should balance funding across institutes as far as possible.

The co-located WLCG workshop will cost in the region of £500-£700 and is core business in a different way to CHEP, which is used to present successes. Thus, the WLCG workshop should be considered independently of CHEP and funded at 100%.

DK will present a list of proposed contributions to CHEP and will put the costs of attending CHEP and the WLCG workshop into a spreadsheet for the PMB to consider. The conference dinner is not included in the registration fee and costs an additional $100. DB has registered and refused to pay this; PC has paid and will contact the organisers to request a refund (subsequently received). Given the significant costs involved, a policy decision was taken to discourage attendance at the conference dinner; perhaps a separate, cheaper GridPP meal can be arranged.

ACTION 602.1: DK will put costs for CHEP and WLCG workshop attendance into a spreadsheet for PMB to consider.

2. LHCb Disk resource at the Tier-1
===================================
As noted at the previous resource meeting, additional resource may be found, but our 30% pledge for this year is based on the resources requested at the time the pledges were made (the LHCb requirement increased in October). Discussion took place on additional provision over and above the pledge, since the disk is not yet full but may be later in the year. It was agreed that the correct pledge was made at the time and that best effort will be made to meet increased requirements beyond that, subject to sufficient disk capacity. New pledges for 2017 will be made at the end of August according to relevant information at that time. Operational and policy statements should determine the likelihood of meeting that best effort. Internal estimates at the Tier-1 should now be made; these will also depend on migration to CEPH and on resources. Checks on pledges are routinely made before ascertaining levels of resources that can be provided. The possibility of LHCb making use of the new CEPH as a way of getting additional disk space was discussed, but this seems unachievable at this time. It was asked whether the Russian T1 is now operational, which should reduce the UK fraction.

ACTION 602.2: AM and AS to resolve the LHCb request at the end of July.

3. ANEAS
========
This is an H2020 project funded to develop the concept and details associated with an SKA European centre. This is primarily a paper-based exercise starting on 27.1.17 for 36 months (coinciding with the end of GridPP5). Alongside this there will probably be testing – not a global distributed test-bed, but parts of test infrastructure. We offered some of AS’s time funded by SCD, some GridPP-funded time from the Tier-1 manager and some use of the GridPP test-bed infrastructure. In return ANEAS will fund a 0.1 FTE (10%) share of the Tier-1 manager role and 5% of Brian’s role. This is not a great deal of funding, but it brings the benefit of our contribution to decisions, some shared post funding and intangible benefits.

The AAAI project of Jeremy Yates is confirmed, as some funding will be provided to RAL; Jens is in touch with Jeremy on this. AS summarised that our contribution to the project is to receive funding at RAL (3 person months), with the start point not yet clear, to work on gatewaying between Moonshot and X509. Jens is aware that this is about getting work that is authenticated through Moonshot, without a Grid certificate, running on GridPP infrastructure. Three months may not suffice to achieve a production system, but there should be a proof of principle to help ascertain a way forward. Since this is in GridPP’s name, some method of reporting in to the PMB is necessary. AS confirmed the 3 months may run from project month 6 (March or April); most activities will be in the next financial year, and the effort perhaps needs to be increased to 6 months. AS will manage this internally, and if work extends beyond 6 months it could be monitored through quarterly reports to the PMB. AS will request a commitment from Jens, who should make a presentation to the PMB backed up with a written report.

ACTION 602.3: AS will request that Jens make a presentation to the PMB supported by a written report on plans for ANEAS as well as proposed reporting.

ACTION 602.4: AS will contact Charlotte to determine how much effort in total STFC is funding in the AAAI project so that we can see if there are other contributions we should expect.

4. GridPP37 Agenda and registration
===================================
A registration page has now been set up and circulated to DB for testing. RJ has been in touch with the sponsors – BOIS (agents of Boston) – to request a logo and other information for the website.

DB emailed UKHEPGrid asking for suggested topics and did not receive much response, though some comments were raised at the Ops meeting, e.g. Accounting and Apel and how this is published as well as Benchmarking – Alessandra is involved in this. Other possibilities include Tier-2 Evolution, particularly how to manage the storage side; and New Technologies. PG will follow up on this and draft some information to circulate to the PMB. Accounting is an ongoing matter and the new portal appears to address many previously experienced issues. Some aspects, e.g. accounting interactions with the batch queues, relate to low level system management and how sites organise their accounting so it would be a very useful topic for discussion. Alessandra and John Gordon from the Apel side are best placed to discuss this and an update talk from George Ryall and Adrian from RAL would be helpful. This will be a long session which could turn into 1.5 sessions on day 1 to address various interconnected elements.

a) Meeting Theme title suggestions included “Negotiating our Position with WLCG” or “Stepping Stones”. PG will consider further and make suggestions.

b) Session timings / subjects

Accounting portal – PG will pull this together.

Bigger picture than LHC experiments – a session on non-LHC VOs inviting talks from sites that have made progress. PC can update on the state of play/politics for 5/10 minutes. An update on the LLST promise once a decision is made would be very useful, perhaps from George Beckett – PC will approach him.

Participation from SuperNemo may be invited. PG has spoken to Ben Morgan from Warwick, who will discuss further with PC at Birmingham. A contribution from DUNE as a means of networking re Neutrino projects/requirements may be useful. A potential contributor on the SKA project would also be useful, perhaps JC or an astronomer suggested by him, possibly Anna or Rosa.

Technical aspects, especially storage, with contributions from Sam Skipsey and possibly Brian. Marcus gave a talk at HEPSYS (and possibly at CHEP) but this may not be appropriate at GridPP. JC, PG or AM may have some suggestions, but storage and benchmarking would be good. Post-setup statistics on usage, work fractions and performance would be useful to confirm VCycle is operating effectively. AM will present something.

WLCG other matters, e.g. Maarten Litmaath could be invited but it is perhaps more effective for him to contribute in Spring.

Dirac – this needs more discussion time allocated, e.g. set-up and solutions provided to the small VOs. PG recently emailed Birmingham who are no longer funded for Ganga and not reporting on this so it is likely to decline in the future. There was a question on funding for Rob – DC is looking into accelerator funding to take him forward until March, he is happy to support any activities e.g. PG PRD funding application and Daniella’s work. He will check whether any funding has been successful for LZ for Dirac.

Tier-1 – AS discussed talks from Tier-1 or related teams. Potentially useful topics include: an update on CEPH activities; and the network, covering both the problems earlier in the year and the heavy use more recently. Most of the focus on this may relate to the Tier-1 meeting in October, which should not be pre-empted and will be covered during the F2F PMB. AS will consider security and other elements, perhaps the IBP6 deadline and plans.

ACTION 602.5: PG will consider a title for GridPP37.

ACTION 602.6: PG will pull together a session relating to the Accounting portal perhaps after the tea break on day 1 and potentially running into the afternoon.

ACTION 602.7: PC will approach George Beckett to provide an update from LST and pull together a session on non-LHC VOs.

ACTION 602.8: JC will discuss with Anna the possibility of giving a talk at non-LHC VO session and put together a session with suggested speakers, e.g. Sam Skipsey and Brian.

ACTION 602.9: AS will consider potential content for Tier1 session.

5. AOCB
=======
a) Brian’s question re disk profiling at T2s and how much information should be exposed to disk for WLCG. PC will give a talk at the PPAC meeting next week on computing and would like an updated set of numbers indicating the leverage of GridPP. This is available through the calculations for the proposals or can be compared with the pledge. SL has undertaken a survey for the T2 h/w and PG has added an extra line in the report including this projection of possibly 2.5 or 3 times as much. PC requested information so that he can try to plot this.

b) Email from Tom regarding press releases.
It was agreed the template looks very useful and the PMB thanked Tom for preparing it.

ACTION 602.10: PG will circulate information on disk profiling at T2s for Brian and PC to use in forthcoming presentation.

6. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
———————————————–
No report submitted.

SI-1 Dissemination Report (SL)
——————————
## GridPP Engagement Officer Notes for PMB

### Satellite Applications Catapult Satuccino BBQ (near RAL)

TW attended the annual Satellite Applications Catapult Satuccino BBQ on Wednesday 6th July 2016. Met a few potentially interesting contacts from the space sector groups and SMEs based around RAL, the first meeting from which is…

### Meeting with Stephen Ringler, Oxfordshire Space Cluster Development Manager for STFC

Scheduled for Tuesday 19th July 2016 at RAL to discuss GridPP (and RAL SCD) facilities and resources for potential collaborations.

SI-2 ATLAS Weekly Review and Plans (RJ)
—————————————
No report submitted.

SI-3 CMS Weekly Review and Plans (DC)
————————————-
No report submitted.

SI-4 LHCb Weekly Review and Plans (PC)
————————————–
No report submitted.

SI-5 Production Manager’s report (JC)
————————————-
1. The Tier-2 availability/reliability figures for June show sites are stable at the moment (http://wlcg-sam.cern.ch/reports/2016/201606/wlcg/):

ALICE (http://wlcg-sam.cern.ch/reports/2016/201606/wlcg/WLCG_All_Sites_ALICE_Jun2016.pdf): All okay.

ATLAS (http://wlcg-sam.cern.ch/reports/2016/201606/wlcg/WLCG_All_Sites_ATLAS_Jun2016.pdf):
Birmingham: 83%:83%

CMS: (http://wlcg-sam.cern.ch/reports/2016/201606/wlcg/WLCG_All_Sites_CMS_Jun2016.pdf): All okay.

LHCb (http://wlcg-sam.cern.ch/reports/2016/201606/wlcg/WLCG_All_Sites_LHCB_Jun2016.pdf): No results!

Birmingham: Persistent disk issues on an old disk server affected the site during June. The server is being replaced but cannot be decommissioned until files have been migrated.

2. From 1st July EGI monitoring moved from Regional SAM to Central ARGO (still nagios based). We are seeing some issues with the ARGO tests and raising tickets as appropriate. For now VO Nagios will remain running regionally as ARGO does not offer this functionality but the main gridppnagios service is being decommissioned.

3. The July pre-GDB (12th July) was on Security Operations Centre activities. The GDB on 13th (https://indico.cern.ch/event/394784/) looked at an update on HNSciCloud (the tender is ready); the BDII and information system; Data issues in WLCG and accounting.

4. The VO solidexperiment.org moved to production status today.

5. The EFDA-JET Grid cluster is now (since 12th July) in downtime as it is being decommissioned. The GridPP PMB notes the scheduled retirement of the Tier-2 site hosted by EFDA-JET and would like to acknowledge many years of productive collaboration and thank our colleagues for the contributions to GridPP of this professionally run site.

6. The SL5 (main services) migration is now effectively complete in the UK. There is one main update still to be completed in relation to the Tier-1 Castor SRM systems. The SRM upgrade is waiting on a Castor upgrade (expected in the August timeframe).

SI-6 Tier-1 Manager’s Report (GS)
———————————
Tape Library:
Significant progress has been made and we now have a stable system. Beforehand we had seen that we could have the system running stably when not all the tape drives were enabled. It was then found that the rate at which we were monitoring the tape system was critical. Although our monitoring has been in place for a long time, its frequency had been increased in response to the hardware problems. This was done because in some cases Castor could become out of step with the status of the tape drives, and the monitoring would re-synchronize them. Reducing this monitoring rate, and improving some long-established code included in it, has enabled us to return to stable running with all tape drives in use. We understand that Oracle will be releasing an update to their software in response to this. They recognized they should at least be able to identify such a problem.

We do have one item left for the tape library: Oracle will come in and carry out a preventative maintenance on it. This is a follow up to the hardware problems of a month or so ago. It will mean an outage of a bit less than a working day (maybe around 6 hours) that we will schedule and announce in due course.

The migration of the Atlas data from ‘C’ to ‘D’ media continues and we are over half way through this migration.

Castor:
– We are in discussion with CERN about a problem seen when testing Castor 2.1.15. The problem is in the interaction of GridFTP with Castor.
– An additional nine disk servers that will replace the oldest ones in the disk caches for tape are being prepared to go into Castor.

Batch:
– There have been some delays in getting the batch of HPE worker nodes set-up so that we can run our tests and software. Some technical issues have been worked through and these should start running our tests in the next week.

Network:
– We have seen continued high use of the OPN link – particularly in this last week. (See attached plot).

Tier1 Availabilities for June – pleasingly all 100%
Alice: 100%
Atlas: 100%
CMS: 100%
LHCb: 100%
OPS: 100%

SI-7 LCG Management Board Report of Issues (DB)
———————————————–
The meeting takes place tomorrow; DB may not be in attendance but DK or PC will cover this.

SI-8 External Contexts (PG)
———————————
No report submitted.

Next PMB meeting is 8 August 2016

REVIEW OF ACTIONS
=================
599.1 – SL will update h/w survey spreadsheet and circulate to PMB. (Update: PG spoke to Ben Morgan and acquired more information regarding Supernemo requirements. Ben will attend PPAN in Birmingham and PG will forward email to DB progressing if necessary). Done.
600.1: DC to contact Julia Sedgebeer at Imperial to informally discuss and address SuperNemo’s computing needs and request Daniella and Tom to await outcome of these discussions before progressing further. Ongoing.

600.2: DB/PC will consider whether to contact the head of SuperNemo in the UK discussing support requirements. Ongoing.

600.3: PC and DB will draft up initial thoughts on how best to respond to enquiry from DOE regarding GridPP computing support and circulate to PMB for comment. Done.

601.1: ALL to look at the draft policy document supporting new VOs and feed back comments to PC by the end of this week. Ongoing.

601.2: DB will send out a call to the UKHEPGRID inviting suggestions for themes, sessions and presentations for GridPP37. Done.
601.3: PG will invite suggestions for themes, sessions and presentations for GridPP37 at the Ops meeting. Done.
601.4: PC will mention commencement of accounting period from 1 July 2016 at the Ops meeting. Done.
601.5: AS will check why the portal does not yet include July figures. Done.
601.6: PC to check guidelines to submit PRD to STFC to develop elements on top of openstack to allow other communities to benefit from the cloud. Ongoing.
601.7: AS will read through Tom Whyntie’s MOU document and pass to PC. It was agreed there is no overlap and each document is valid in its own right. Done.

ACTIONS AS OF 18.07.16
======================
600.1: DC to contact Julia Sedgebeer at Imperial to informally discuss and address SuperNemo’s computing needs and request Daniella and Tom to await outcome of these discussions before progressing further. Ongoing.
600.2: DB/PC will consider whether to contact the head of SuperNemo in the UK discussing support requirements. Ongoing.
601.1: ALL to look at the draft policy document supporting new VOs and feed back comments to PC by the end of this week. Ongoing.
601.6: PC to check guidelines to submit PRD to STFC to develop elements on top of openstack to allow other communities to benefit from the cloud. Ongoing.
602.1: DK will put costs for CHEP and WLCG workshop attendance into a spreadsheet for PMB to consider.
602.2: AM and AS to resolve the LHCb request at the end of July.
602.3: AS will request that Jens make a presentation to the PMB supported by a written report on plans for ANEAS as well as proposed reporting.

602.4: AS will contact Charlotte to determine how much effort in total STFC is funding in the AAAI project so that we can see if there are other contributions we should expect.

602.5: PG will consider a title for GridPP37.

602.6: PG will pull together a session relating to the Accounting portal perhaps after the tea break on day 1 and potentially running into the afternoon.

602.7: PC will approach George Beckett to provide an update from LST and pull together a session on non-LHC VOs.

602.8: JC will discuss with Anna the possibility of giving a talk at non-LHC VO session and put together a session with suggested speakers, e.g. Sam Skipsey and Brian.

602.9: AS will consider potential content for Tier1 session.
602.10: PG will circulate information on disk profiling at T2s for Brian and PC to use in forthcoming presentation.