GridPP PMB Meeting 591 (29.02.16)
=================================
Present: Dave Britton (Chair), Pete Gronbech, Dave Kelsey, Andrew Sansum, Jeremy Coles, Gareth Smith, Pete Clarke, David Colling, Tony Doyle, Tony Cass, Andrew McNab, Louisa Campbell (Minutes).

Apologies: Steve Lloyd, Roger Jones.

2. RAL OPN Traffic graphs
=========================
GS circulated FTS plots (Appendix I). The 4 plots of OPN traffic broken down by VO were produced in response to discussions on which VOs were heavily using the OPN. They include:

i) RAL to Tier-1s by VO for the past 6 months. The peak on the right, from a few weeks ago, demonstrates higher traffic. ATLAS and CMS are the big players, with LHCb also showing (ALICE is not represented as it does not utilise this).
ii) RAL from Tier-1s – fairly similar to i).
iii) Deliberately singled out CERN traffic. No great surprises in the distribution of VOs.
iv) By site – the distribution over the 6 months highlights that, as expected, the preponderance of data coming from CERN is more notable when LHCb was running. US sites (first and last alphabetically) are quite significant.

In conclusion, there are no real surprises. VO usage is as expected, US sites are very significant and the high loading of a few weeks ago has now decreased. The large amount of traffic at the start of February was probably the result of CMS and LHCb running at the same time. DB concluded that there is a factor of 5 variation in each bar for a particular experiment, and that three experiments were coincidentally at the upper end of that distribution at the same time. This is most evident in the plot for traffic from RAL to CERN by VO, where CMS and LHCb were at the upper limits of their bandwidth usage. This was possibly additionally impacted by conference season.

Appropriate provisioning levels were discussed and it was agreed to monitor and respond to issues accordingly – if issues arise for only 5% of the time they are resolved as and when required, but more frequent issues will require action.

GS has now demonstrated the total usage over the longer period, which shows this has been rising, but not significantly. DB suggested finding a place to document this in the annual report presentation to ensure it remains a focus. The importance of traffic both from and to RAL was noted: there is now contention on the JANET link, which the RAL team are aware of, and the link needs upgrading in the foreseeable future. This should be raised at the next FTS meeting.

1. GridPP36 agenda
==================
PG provided an update. The draft agenda is now available on Indico, though content needs to be fleshed out.

Session 1: GridPP5 (PC to Chair)
– Welcome and GridPP5 talk – DB (timing needs to be considered)
– WLCG Summary: DB suggests Ian Bird’s summary from the MB would be very useful – PC and DC can present these slides and then go into a discussion which should consider how we engage with working groups and pick up on some themes to revisit later in the meeting.

Session 2: Network & Security (DK to Chair)
– Titles so far from PC (apart from security report from Ian Neilson) so this is looking solid, though could be condensed if necessary.
– It was suggested someone can collate then present summaries of statements from Tier-2s on, for example, how they are connected, major changes, impending or known bottlenecks. DK will discuss with Duncan to send out request to Tier-2s and collate the info.

Session 3: Experience with new technologies (AM & DC to Chair).
– AM summarised ideas of 4 sites of various sizes he will approach today and confirm next week.

Session 4: Non LHC VO Support (JC to Chair)
– Possibly just one presentation rather than a full session. However there are a number of issues here that may be helpful to cover. E.g. a Ganga talk – possibly Ganga vs DIRAC (per discussion during last week’s PMB) and how new users would handle larger jobs to allow for contrasting viewpoints. Titles can be reworked.

DB suggests lots of discussion should be scheduled in sessions 3 and 4.

Session 5: What’s new from the experiments (RJ to Chair)
– PG asked if we have obvious talks in addition to ATLAS and LHCb. Event Indexes will affect sites and this should be discussed as well as Event Service and Networking, etc. As this is the beginning of GridPP5 it was agreed the focus should be on the core mission of what is required to be delivered and immediate pressures of new VOs and UK-T0s as well as flagging up issues that should remain on the radar in the medium and longer term.

Session 6: GridPP in the wider arena (general session – SL to Chair)
TC noted that SL asked him to give a talk on CERN projects and outlets in the GridPP5 timeline, but the timing is problematic for travel. DB suggested moving TC’s contribution to Session 1 on Tuesday or swapping sessions 5 and 6 if necessary.
PG invited suggestions for other talks that should be included.

ACTION 591.1: DK will ask Duncan to collate information from Tier-2 sites for the Network & Security session summary.

ACTION 591.2: PG or LC will re-set the size for the Dell logo on Indico which is currently too large.

2. Researchfish status
======================
PG summarised: some details have been added from DK and CD, but there has been little change to report. He confirmed his intention to submit something by the end of this week, though some changes can be incorporated thereafter.
PC summarised his discussion with STFC: he has raised this with Janet Seed as a concern and suggested that some amendments and improvements should be implemented. Janet has given an assurance that she will progress this. It will not be possible to incorporate any changes on this occasion, but PC invited more input for the future. Currently each PI must upload all publications individually per grant, which is time-consuming and a duplication of effort. There was some discussion of potential solutions, though none was particularly effective. It may be possible to request that GridPP be discounted as requiring a return, or simply to use the QMUL ones; it is not effective to attempt to include the publications that PG is inputting, but clarity is required on whether this may attract a black mark. PC will write stating that bulk input of 1000 publications is not possible and suggesting that all PIs submit a single attachment pointing to PG’s submission. However, it should be recognised that some PIs may have institutional requirements. Once per annum someone should be able to upload all outputs associated with the experiments, and this should appear in Researchfish as a new entity; group leaders could then go onto Researchfish and request that these be added to their profile.

ACTION 591.3: PC will write to Ian Fuller stating PIs cannot load individual publications into the current Researchfish and advise that each PI will upload a single submission pointing towards PG submission for all GridPP grants. He will also enquire whether they can be marked as no report required and advise SL what to tell the CB.

3. OSC Papers to prepare
========================
Meeting arranged for Friday 6th May at MRC, London. We need to prepare 2 papers: a Financial Report (primarily for PG to collate) and a Project Status Report (contributions from several people). Based on past submissions, the following should be included, with a focus on meeting deliverables for GridPP4+ and not on future evolution: Introduction & International Context (DB), Summary of GridPP status (PG), discussion of risk register (PG), Tier-1 status report (GS and AS), deployment status (JC), User Reports (RJ on ATLAS, DC on LHCb), Impact & Dissemination (Tom Whyntie; SL to coordinate with Tom). This results in the undernoted actions, to be completed by the third week in April.

PG reiterated the importance of receiving quarterly reports on time, to ensure a full and up to date picture of current status is available for incorporation into such reports.

ACTION 591.4: PG to collate information for inclusion in OSC Financial Report.

ACTION 591.5: ALL to contribute to the OSC Project Status Report.

ACTION 591.6: DB to contribute Introduction and International Context for OSC Report.

ACTION 591.7: PG to contribute Summary of GridPP Status for OSC Report.

ACTION 591.8: PG to contribute Discussion of Risk Register for OSC Report.

ACTION 591.9: GS and AS to contribute Tier-1 Status Report for OSC Report.

ACTION 591.10: JC to contribute Deployment Status for OSC Report.

ACTION 591.11: RJ to contribute ATLAS User Report for OSC Report.

ACTION 591.12: DC to contribute LHCb User Report for OSC Report.

ACTION 591.13: SL to coordinate with Tom Whyntie to contribute Impact and Dissemination Report for OSC Report.

4. AOCB
=======
a) CD starts a new job today and the PMB thank her sincerely for her contributions to GridPP in the EGI context and more generally for her invaluable advice and support. The members extended their very best wishes for her in the new role and look forward to continued interaction with her.

b) A GridPP UI?
---------------
Directed at new users and GridPP support (e.g. small users from Oxford, LSST, etc.). Networks or firewalls at some sites block network access, and some proxies show that network sites are blocked. The possibility of GridPP providing a central UI on which users could have a temporary account was discussed (possibly based at RAL). AS mentioned that this relates to Andrew Lahiff’s recent discussions on EUCLID, who need a central UI, which is a challenge in the absence of a central location for user management. It would be helpful to know what appetite exists for this at these sites, as it is not their normal practice, would require a build to accommodate and comes with security risks, though these could be minimised. DB suggested setting up a temporary system for new users to deal with initial issues until their own systems are established (e.g. guest accounts for 6 months). EUCLID discussions are moving in that direction; this avoids the long-term issues and is a pragmatic way to provide support in the short term. Only a small number of individuals are involved at the start, so this should be acceptable as a short-term solution. This could be discussed further at GridPP36.

ACTION 591.14: AS to consider how to model a proposal for short-term temporary sign-ins for new users to access the Grid.

c) Tier-2 Evolution (AM)
------------------------
Next release of Vac (0.21) should be this afternoon. This is effectively a release candidate for 1.00, when Vac for running VMs will be “finished” (before we start on container support, etc.). Vcycle will go 1.00 at the same time.

I presented the Vacuum Platform we use with Vac and Vcycle based sites at the EGI OMB last week, with a view to becoming an EGI Community Platform. Went well. No objections. They are now going to discuss it where they discuss proposals for new services.

I’d like to include Vac/Vcycle as an HEP Software Foundation project (http://hepsoftwarefoundation.org/projects.html). In practice, this doesn’t put any new requirements on the way we’re already working (changing the license, copyright, etc.) but it’s another place to advertise as we approach the 1.00 release.

The new style Cloud Init ATLAS VMs are now running very well, and I’m just waiting for changes at the CERN end on authentication before we can start rolling them out across our Vac/Vcycle sites.

5. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
No meeting last week – this will take place next week and DC will update then.

SI-1 Dissemination Report (SL)
------------------------------
SL not present at meeting and no Report submitted.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
ATLAS are doing a large deletion on the RAL tape as in a couple of weeks we will begin repacking the ATLAS tapes onto new media. It is going smoothly.

The Amazon 100k seems to still be ongoing. There seemed to be a lot of problems but they have been resolved and things are being ramped up now.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing of significance to report.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing of significance to report.

SI-5 Production Manager’s report (JC)
-------------------------------------
1) We are continuing to follow up on glibc related patching – occasionally nodes show up in the tests but the sites are quick to respond.

2) We had a core-ops meeting last Thursday. We identified several priority areas.

– The direction of DIRAC and how we handle pilots if glexec support decreases. (Today Daniela has noted that “no SL7 version of dirac on the horizon”!).

– Glexec support within the WN tarball we support has not resulted in an easy to implement solution. Workarounds will be written up but given the WLCG move to stop further deployment of glexec we will not spend more time on it (our goal is now to equip sites to build their own glexec to go alongside their tarballs).

– We will support the UI tarball in CVMFS (done as best efforts and widely used and appreciated but not formally what we took on).

– We are tidying up our core docs (overdue) as some are no longer needed whilst other documented topics have become more important.

– A fresh approach to what was ‘staged rollout’ is needed. This was an EGI related activity. We will focus on middleware readiness testing contributions.

– Engagement with the to-be-formed WLCG future evolution working groups will be considered once the groups and their remits are better developed.

3) The renewal mechanism in GridPP DIRAC seems to work now (it was a concern for LSST) and IC confirm a problem encountered with ARC CE sites has been resolved. 9 sites are now enabled for LSST and jobs are successfully running on many of them.

4) The jobs Steve runs as the “Steve Lloyd tests” for GridPP sites (http://pprc.qmul.ac.uk/~lloyd/gridpp/) have been very useful but are now largely superseded by other monitoring tests. We will review with Steve whether the tests can usefully evolve or whether they are no longer needed. SL agrees and has started testing DIRAC using the gridpp VO instead of running ATLAS jobs. See http://pprc.qmul.ac.uk/~lloyd/gridpp/gridtests/diractest.html for a first look.

For information:

A) CMS will shortly be submitting multi-core jobs to Tier-2s.
B) There is a GDB next Monday: https://indico.cern.ch/event/394780/.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
General:
– Updates for the most recent security update (in glibc) were rolled out during the first part of last week.

Castor:
– Testing of the 2.1.15 version is ongoing. The problem identified and reported last week (slower file access times) is still not resolved. We are therefore unable to schedule the update.

Networking:
– No change to report.

Batch:
– Nothing particular to report.

Procurement:
– Disk and CPU capacity orders are in place and we are keeping an eye on delivery times. SBS is now working, but not fully; a reboot tonight should improve matters. This will be clearer later in the week and AS will send out an email to the PMB. STFC has not had the capacity to do anything other than put pressure on SBS to resolve.

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
Covered during last week’s PMB.

REVIEW OF ACTIONS
=================
582.4: DC to insert an update in the wiki page regarding communication with LZ. JC will send a reminder to DC. This will have to be done again as the person who was previously working on this will leave in July. JC noted lack of input/updates and reiterated this information would be very helpful to receive from DC monthly for dissemination. JC will raise at the PMB meeting immediately before the next monthly meeting. Done.

587.2: AM will invite selected small, medium and large sites to contribute presentations at GridPP36 on their plans for site evolution over the next few years and construct a session around this. Ongoing.

588.4: ALL to inform PG of any new roles and other items that need to be inserted into different categories and grants on Researchfish so that he can ensure all are included and circulate to PMB to check. Ongoing.

588.6: GS will investigate reasons for saturation on OPN and report back to the PMB with findings. Done.

589.1: PC and PG will discuss and agree the GridPP36 Agenda. Done.

590.1: ALL contact PC with suggested names for CHEP track convenor. Done.

590.2: DB will consider the suggested GridPP36 session themes and email PG. Done.

590.3: PG will create a draft of sessions for circulation and develop the agenda for GridPP36. Done.

590.4: PC and PG will push Ian Fuller for a response on how to make the Researchfish system more intuitive for inputting information and request contact with a representative. Done.

ACTIONS AS OF 29.02.16
======================

587.2: AM will invite selected small, medium and large sites to contribute presentations at GridPP36 on their plans for site evolution over the next few years and construct a session around this. Ongoing.

588.4: ALL to inform PG of any new roles and other items that need to be inserted into different categories and grants on Researchfish so that he can ensure all are included and circulate to PMB to check. Ongoing.

591.1: DK will ask Duncan to collate information from Tier-2 sites for the Network & Security session summary.

591.2: PG or LC will re-set the size for the Dell logo on Indico which is currently too large.

591.3: PC will write to Ian Fuller stating PIs cannot load individual publications into the current Researchfish and advise that each PI will upload a single submission pointing towards PG submission for all GridPP grants. He will also enquire whether they can be marked as no report required and advise SL what to tell the CB.

591.4: PG to collate information for inclusion in OSC Financial Report.

591.5: ALL to contribute to the OSC Project Status Report.

591.6: DB to contribute Introduction and International Context for OSC Report.

591.7: PG to contribute Summary of GridPP Status for OSC Report.

591.8: PG to contribute Discussion of Risk Register for OSC Report.

591.9: GS and AS to contribute Tier-1 Status Report for OSC Report.

591.10: JC to contribute Deployment Status for OSC Report.

591.11: RJ to contribute ATLAS User Report for OSC Report.

591.12: DC to contribute LHCb User Report for OSC Report.

591.13: SL to coordinate with Tom Whyntie to contribute Impact and Dissemination Report for OSC Report.

591.14: AS to consider how to model a proposal for short-term temporary sign-ins for new users to access the Grid.