GridPP PMB Meeting 593

GridPP PMB Meeting 593 (04/04/16)
=================================
Present: Dave Britton (Chair), Tony Cass, Tony Doyle, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Pete Gronbech (Minutes).

Apologies: Pete Clarke, David Colling, Roger Jones, Dave Kelsey, Jeremy Coles, Louisa Campbell.

Again, the minute taker was absent, so these are notes rather than full minutes.

1. GridPP36 Agenda
==================
WLCG talk will be by RJ, PC or DB.

New technologies: if people are concerned that they do not have enough staff, VAC is much less effort; if they have the manpower to install OpenStack, fair enough.

HTCondor and HTCondor CE are probably good things to look at. It is thought that HTCondor CE is going to be important.

DB suggested we need a discussion about storage for ALICE, e.g. at Birmingham, which has low manpower. Also ATLAS: where do we want the storage to be?

The Spanish Tier-2s run dCache; what is our feeling about DPM in the longer term?

Maarten Litmaath is starting a lightweight Tier-2 working group; AM has spoken to him.

PG to move the sponsor talk to 17:45.

An HSF summary talk is required (after the WLCG summary). Registration for the meeting in Paris has nearly closed.

2. For the F2F: Quarterly Reports
=================================
Why do we need them, and how many do we need?

What carrot or stick can we use if we need them delivered on time?

3. AOCB
=====================
The SKA proposal to H2020 was submitted by the deadline. The STFC contribution is 6 PM (person-months) of funding, with 6 PM matching from GridPP and 3 PM from STFC. Potentially good.

AS: the Tier-1 will have a visit from David Lidington MP on 29th April. He has an interest in CERN, etc.


4. Standing Items
===================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
Nothing of significance to report.

SI-1 Dissemination Report (SL)
------------------------------
## GridPP Dissemination Officer Notes for PMB

### New User Engagement Programme – GridPP36 Preview

#### General observations

These are provided for PMB ahead of GridPP36 for comment/discussion. Thoughts welcome!

* Monitoring problems is easy; monitoring success is not!

We have many examples of engaging new user communities from the past year. Where problems have occurred, these have been managed and largely solved through the user “Point of Contact” or others CCed on email conversations. More recently, support has been provided by the GRIDPP-SUPPORT JiscMail list. These conversations are, by definition, easy to monitor and track as the user is providing feedback during the problem-solving process. However, when things work, progress is more difficult to track. We’ve all been there – when stuff is working the priority becomes completing the work; emails take a back seat. Over the past few months we have been able to obtain more information on the successes through a campaign of “reminder” emails to users; however, these have sometimes taken weeks (and several emails) to retrieve. Persistence is key! Unfortunately, this scenario is also difficult to distinguish from the case where users have got everything working but have not had time to make progress.

* There is no “one-size-fits-all” solution

Almost all of the examples so far have used different combinations of tools and approaches. On the one hand, this demonstrates that we have a range of solutions available to meet the unique requirements of different user communities. On the other, it takes more work to document and support each case; we are not offering a “one-size-fits-all”/Minimum Viable Product (MVP). In fact, this was commented on by a colleague of Colin Hayhurst (University of Sussex Innovation Fellow), who said on reviewing what GridPP had to offer:

“Just a quick glance over GridPP, the main differentiator seems to be the level of hand holding they provide to users, specifically scientific and data analysis work loads. Any other cloud provider (Amazon, Microsoft et al) could provide the same if not a greater compute footprint, but do not provide application specific support.”

* Expectations need to be set and managed from the outset

Some problems have been linked to a disconnect between what users have expected from GridPP and what is on offer. While the large LHC experiments have production-level systems in place to manage, store and process their data, their existence is largely due to a team of dedicated developers and a large user base who can test and provide timely feedback. New user communities do not have such resources available. The development of customised, application-specific solutions for new users should therefore be viewed as just that – GridPP can offer a development system that will require testing and resources from the new user(s) in question, with the understanding that GridPP will provide the infrastructure and support where possible. A production-level solution should be the ultimate aim of any collaborative work with GridPP, but this will involve formal Service Level Agreements detailing the commitment of resources, etc. As long as this is made clear at the outset, this is a reasonable approach that should prevent new users from getting the wrong impression of what can be achieved (at least to begin with).

#### Communities engaged during GridPP4+

##### Research users

* GalDyn: Contact re-established with the GalDyn group (UCLan). The contact has now submitted their thesis and is looking to continue research by moving to large-scale simulations on the grid after renewing their expired grid certificate.

* PRaVDA: Contact re-established with Tony Price of the PRaVDA group. It turns out that they have successfully used grid resources for their simulations and have been working on the actual device itself – hence the lack of updates. Simulations to resume. Great support from the local Ganga team (MS/MW), currently looking at integrating file catalogs with DIRAC and Ganga.

* SuperNEMO – looking to resurrect the VO – ongoing. Major barrier seems to be lack of time/resource from new user(s).

* Climate change (Oxford) – GRIDPP-SUPPORT email list working well to support new user from a climate change group at Oxford. Some issues with using a GridPP CernVM from the university network – NAT networking not working (see LIGO below). To bypass this they have been given an account on the Oxford cluster (thanks to Ewan M!). Testing now proceeding as planned.

* SNO+: Matt M (QMUL) has been given considerable assistance via the GRIDPP-SUPPORT mailing list with setting up a GridFTP endpoint in the SNO+ cavern to allow data transfers to the grid (i.e. the outside world). Test transfers have been conducted; we are now waiting for it to be used in anger.

* LSST: Thanks to Alessandra the LSST effort is now going great guns with thousands of jobs being submitted with DIRAC using the Ganga interface.

##### Industrial users

* Contact made (via the new website) with the Nuclear Physics and Neutronics group, Clean Energy Europe, Amec Foster Wheeler, regarding using grid resources for fusion-related Monte Carlo simulations. Based near Manchester. Discussions are ongoing regarding distributing sensitive software (i.e. MCNP); CVMFS is probably not appropriate in this case.

* Contact made (via RAL) with start-up SeeCycle www.seecycle.com regarding using GPU or CPU resources for training image-processing/object recognition algorithms for improving cycling safety. As it turns out, the GPU cluster would be most useful, and they have been given the details required for setting up accounts on RAL’s Emerald cluster.

AM commented:
A problem is that the large LHC experiments have huge machinery behind them and

CERN is pushing to use HTCondor CE and to drop ARC. If this is the case, it could have ramifications for the UK. We (LHCb) had a message from Gavin asking whether LHCb could stop using ARC (at CERN), and they said yes.

The US sites seem to be standardising on HTCondor CE. HTCondor and HTCondor CE are from the same stable, so they should work together better.
We can get accounting out of it (it is no worse than ARC).

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Nothing to report.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing to report.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing to report.

SI-5 Production Manager’s report (JC)
-------------------------------------
Nothing to report.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
Shaun De Whitt and Juan Sierra have left the database team.
Alison Packer is looking into getting short-term cover with contract staff. It is a very vulnerable area, but cover is very difficult to obtain.
There has been a problem deploying CASTOR 2.1.15. Rob Appleyard is looking to upgrade the SRM.

Draining seems to be slow. This is an internal problem, but it could have visible effects.

Gen scratch will be decommissioned, as it gets very little use.
One of the 10 Gbit links to the UKLight router failed.

Another problem caused two links to flap. This was fixed manually over the Easter weekend, and a configuration error has now been rectified to prevent further occurrences.

Procurement: the bulk procurements have been delivered and receipted.
Further details are in the report.

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
No Report.

REVIEW OF ACTIONS
=================
587.2: AM will invite selected small, medium and large sites to contribute presentations at GridPP36 on their plans for site evolution over the next few years and construct a session around this. Done.

591.4: PG to collate information for inclusion in OSC Financial Report. Ongoing.

591.5: ALL to contribute to the OSC Project Status Report. Ongoing.

591.6: DB to contribute Introduction and International Context for OSC Report. Ongoing.

591.7: PG to contribute Summary of GridPP Status for OSC Report. Ongoing.

591.8: PG to contribute Discussion of Risk Register for OSC Report. Ongoing.

591.9: GS and AS to contribute Tier-1 Status Report for OSC Report. Ongoing.

591.10: JC to contribute Deployment Status for OSC Report. Ongoing.

591.11: RJ to contribute ATLAS User Report for OSC Report. Ongoing.

591.12: DC to contribute LHCb User Report for OSC Report. Ongoing.

591.13: SL to coordinate with Tom Whittle to contribute Impact and Dissemination Report for OSC Report. Ongoing.

591.14: AS to consider how to model a proposal for short-term temporary sign-ins for new users to access the Grid. AS has started discussing with Ian Collier. Ongoing.

592.1: DK to speak to David Salmon re a possible Janet network update at GridPP36 (including as much as is allowed on the security incident). Done.

ACTIONS AS OF 04/04/16
======================
591.4: PG to collate information for inclusion in OSC Financial Report. Ongoing.

591.5: ALL to contribute to the OSC Project Status Report. Ongoing.

591.6: DB to contribute Introduction and International Context for OSC Report. Ongoing.

591.7: PG to contribute Summary of GridPP Status for OSC Report. Ongoing.

591.8: PG to contribute Discussion of Risk Register for OSC Report. Ongoing.

591.9: GS and AS to contribute Tier-1 Status Report for OSC Report. Ongoing.

591.10: JC to contribute Deployment Status for OSC Report. Ongoing.

591.11: RJ to contribute ATLAS User Report for OSC Report. Ongoing.

591.12: DC to contribute LHCb User Report for OSC Report. Ongoing.

591.13: SL to coordinate with Tom Whittle to contribute Impact and Dissemination Report for OSC Report. Ongoing.

591.14: AS to consider how to model a proposal for short-term temporary sign-ins for new users to access the Grid. AS has started discussing with Ian Collier. Ongoing.