GridPP PMB Meeting 616 (05.12.16)
=================================
Present: Dave Britton(Chair), Tony Cass, Pete Clarke, Jeremy Coles, Tony Doyle, Pete Gronbech, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies: David Colling.

1. GridPP38 location
====================
DB confirmed initial costing for Sussex was extremely high. However, after discussion, Sussex University have reduced the costs to more affordable rates. Premier Inn in Brighton centre is the best value for accommodation (£77.99 for 5 April and £63.99 for 6 April) and is a 5 minute walk from the rail station so delegates can take public transport to the campus and save the cost of bus hire. The costs amount to c. £12,000 which is on par with other meetings – Glasgow venues were also investigated and found to be broadly comparable. Therefore, since Sussex costs are consistent with elsewhere and with previous meetings, DB requested PMB approval for Sussex as the host of GridPP38 – this was agreed for 5-7 April 2017.
ACTION 616.1: LC will secure venues and accommodation for GridPP38 in Sussex and advise Fab.

2. Tier-1 Procurement
=====================
AS provided a brief update following on from his recent email. Timescales are rather tight. Martin believes it is possible to complete in time, but it would require a very simplified procurement process and clear statements of requirements for HW. Notably, the number of companies responding to recent procurements has fallen to just one, so a simplified process may expand the pool of potential suppliers and attract more competitive tenders. Martin and the procurement team are investigating the most effective way to progress this. AS presented a case for not requiring BIS approval and is hopeful it may be possible as a precedent has previously been set – he will keep the PMB advised on progress. The PMB agreed this route should be supported as it would be helpful if we can ease the procurement process and encourage more bids to continue operating at this level.
ACTION 616.2: AS will update the PMB on Tier-1 procurement by next week.

3. AOCB
=======
a) Q-Reports
------------
PG has received all the reports and hopes to review this week.

b) Tier-1 meeting
-----------------
This is taking place on Wednesday; DB can no longer attend.

c) Outbound network
-------------------
The outbound network load on Janet is increasing. Before upgrading the UK Light Router we were well under 10 GB/s but now are much higher. This is not fully understood.

d) Research Council and network procurement
-------------------------------------------
AS had some concerns and was brought in to coordinate the stakeholder meeting feeding in to procurement. AS is attempting to understand what is valued in the network to build this in to his report.

e) STFC internal policy committee
---------------------------------
The STFC internal policy committee has asked Network Technical Advisory Group for a vision of an excellent network, i.e. what would be required for an internal site network enhancement in the next 5 years. Thus, it is positive to note there is growing recognition that rather than merely updating, a longer term plan of action is required for this type of work.

4. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
The PDG Cloud workshop on Tuesday was very helpful – allowing us to make people aware of work we have done and to engage with more communities. Some communities were keen on OpenStack and this is a good opportunity to make them aware of our plans. AM will discuss further with DC and ensure some groups are signed up relatively soon. DB provided context: DC chairs the Cloud Working Group and there were some issues relating to HPC; this is a task force under that banner to focus on presenting resources in a cloud-like way, which helps us to present our work in a more tailored way to users of the Grid within the cloud interface. AM noted offers of cooperation from Amazon, Microsoft and Google in terms of cloud, and this group offers a forum to progress that and consider how best to use cloud if we were given capacity. We have contacts through Lancaster and Manchester and we need to consider how to take this a step further.

SI-1 Dissemination Report (SL)
------------------------------
## GridPP Engagement Officer Notes for PMB

### More LSST press coverage

* http://www.scientistlive.com/content/shear-brilliance
* http://www.asi.it/it/news/alla-scoperta-delluniverso-oscuro – in Italian!
* http://tech.qq.com/a/20161128/004260.htm – in Chinese!

### Potential collaboration with China with Higgs factory simulations

Thanks to Adrian Bevan (PPRC, QMUL), we have the opportunity to work with the Circular Electron Positron Collider (CEPC) [1]. A team based in China has set up a multi-VO DIRAC instance and Dan Traynor (QMUL) is investigating configuring the QMUL cluster to work with them, as they are short on computing resource and this seems like an interesting opportunity for international collaboration.

The PMB agreed this was a positive route to progress and support.

### GridPP GPU cluster – first production use for IceCube

Dan Traynor (QMUL) has set up a GPU cluster at the QMUL Tier-2 and has been working with Teppei Katori (PPRC, QMUL) to get analysis software for the IceCube experiment [2] running on it. This has been successful. Case study and press release to follow (appropriately for Christmas!).

### So long and thanks for all the Grid!

After three or four years of combining engagement with research, TW is leaving the post of GridPP Dissemination (Engagement) Officer to go back into full-time research. Full details of the handover/transition materials will be supplied in due course, but for now TW wishes to express thanks to the PMB for the many fantastic opportunities and support offered by the Collaboration to engage many new users and communities with GridPP research and technologies. In particular, those behind the GridPP DIRAC system, which has revolutionised how GridPP works with smaller Virtual Organisations, and more recently the Ganga User Interface, deserve immense credit for opening up many exciting opportunities with the astrophysics “Big Science” communities and beyond.

The PMB expressed their sincere thanks to TW who moved the role on exceptionally well in the time he has been with the group – his work on putting together a toolkit and guidelines is hugely appreciated and the PMB wish him every success with his future research career at UCL.

Decisions now need to be taken on the way forward – we can use the meta-kit to support the toolkit but this needs to be formalised. The post was non-FEC so thought needs to be given to how and where to progress a post. DC may be able to consider it as a DIRAC post, but further discussion is required to achieve some gains to build on the work TW has undertaken. SL will discuss with TW the key demands for his input/support to consider for his replacement. DB is in London on Wednesday and will try to meet with SL and/or TW late Wednesday afternoon.

[1] http://cepc.ihep.ac.cn/

[2] https://icecube.wisc.edu/

ACTION 616.3: DB and SL will discuss how best to progress replacement of TW’s role.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Nothing of significance to report. There was broader discussion on the computing model, but nothing new to report on operational issues and no update on the issue with the pilot from last week – this has been challenging to trace. DB is preparing a short paper for the PMB on recommendations for evolution of Tier-2 at a high level to provide a simplified message re procurement of kit. DB discussed with RJ at CERN to understand ATLAS' requirements; he is awaiting input from DC and will then circulate to the PMB for comment before circulating to Tier-2s. The statement will present how GridPP would like them to evolve but they will have flexibility and be able to judge their procurements against the suggested trajectory.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
DC not present, no report submitted.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Two issues re LHCb disk were discussed at the Ops meeting today; other than those there is nothing significant to report. GS reported that 2 disk servers from different batches were out – one seemed to have a failed disk, which was replaced when a problem was flagged, so this was understood. The two issues had separate causes – GS has been monitoring the statistics, which do not show cause for concern.

SI-5 Production Manager’s report (JC)
-------------------------------------
1. There was a DPM workshop last week in Paris: https://indico.cern.ch/event/559673/. There remains a strong community effort in this area. Sam represented GridPP. One issue arising for us is the possible deprecation of the DPM interface to ARGUS.

2. QMUL have engaged a Chinese group working on investigations related to a future circular electron positron collider (CEPC). They also run a multi-VO DIRAC instance. Currently only QMUL are involved but CEPC could use additional resources. Is this an area where we might like to pursue more collaboration?

3. Planet GridPP run by Glasgow for our GridPP blog posts is being moved to run on a VM this week. Testing has gone well.

4. VO Nagios for LSST has started and is indicating various issues to follow-up (and thus highlights the need for this type of service): https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?servicegroup=lsst&style=detail.

5. The dune VO started a production run last week and one of their principal sites encountered problems, so there was a push for more sites to enable the VO.

6. NA62 has been set up with a support unit in GGUS.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
– There was a problem last night on one of the Power Distribution Units to a rack in the UPS room. This affected two network switches – which in turn affected some core services (including the TopBDII). This was resolved by a member of staff attending during the night.

Castor:
– There were some problems with the CMS Castor instance during last week. A restart of the “transfermanager” on Friday cleared out a backlog of transfer requests that were not progressing – and this enabled the service to work normally.
– As reported before the testing of Castor 2.1.15 is largely complete. Owing to staff availability this update will be carried out in the New Year, with the intention of completing it by the end of January.
– We are looking to merge smaller disk pools into larger ones for both LHCb and Atlas. We are planning to do this on the 8th December.

Tape:
– Migration of LHCb data from ‘C’ to ‘D’ tapes ongoing. Approaching the 40% mark with just over 600 out of the 1000 tapes still to do.

Services:
– The FTS service was upgraded to version 3.5.7 last week.

Infrastructure:
– There was maintenance on the UPS and generator last week.

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
There has been no management board meeting, though there was an overview board at CERN last week. This was rather UK-dominated, with the WLCG Chair (RJ), DB as UK rep and Ian Collier attending, and sparse attendance from other countries. There was nothing new or controversial to report other than a status report from Ian Bird and a presentation on computing in the LHC era by Ian in Eckard which DB can circulate to anyone particularly interested, but this contained nothing contentious as most of this was covered in the WLCG workshop in San Francisco. There was an update on the Cycloud project which we are involved in at a very low level. AS is sorting out a payment mechanism to send some funds to CERN (c. 35,000 EUR); DB was advised that resources may be chosen to be used as part of the pledge. This forum is a good opportunity to speak directly to Eckard.

SI-8 External Contexts (PC)
---------------------------
Nothing new from the Autumn review.

REVIEW OF ACTIONS
=================
605.1: DK will investigate costs and timescales of upgrading the OPN Link to 30 and report back to PMB. (Update: DK chased this and Janet are still working with new pairing points in London as they go through the same rack but this is being re-engineered for better resilience, costs are as yet unconfirmed – DK will keep the PMB informed). Done.
610.1: AS/GS Produce suggestions for one or more metrics that will summarise the Tier-1 network availability/performance. Ongoing.
612.3: PG will determine which small sites can undertake procurement this FY. (Update: DB and RJ both had notices from JES system confirming their applications have been approved. PG will discuss with each PI to establish figures and remind them imminent action must be taken). Ongoing.
613.1: AS will undertake a post mortem on CMS issues at Tier-1. (UPDATE: AS has pulled a lot of information together and will speak more to AL, he has good information from Chris Brew and will speak to Rob Appleyard about CASTOR). Ongoing.
613.5: ALL submit Q3 reports to PG. Done.
614.2: DC will liaise with various Operational team members on the CMS issue at Tier-1 to ascertain when the issue was identified and by what mechanism it was escalated. Done.
614.4: DB will enter pledged amounts into REBUS then check before formal submission. Done.

614.5: PG will advertise the pledges summary on the website to the Ops team and ask them to highlight any issues with the proposed numbers for each site. Done.

614.7: JC will contact Ian Collier to discuss potential date clashes between HEPSYSMAN and WLCG meetings. Done.
615.1: RJ and DB to discuss policy on CPU and storage with Atlas. Done.
615.2: RJ will forward Simone’s statement to DC to disseminate to tb-support. Done.

ACTIONS AS OF 05.12.16
======================
610.1: AS/GS Produce suggestions for one or more metrics that will summarise the Tier-1 network availability/performance. Ongoing.
612.3: PG will determine which small sites can undertake procurement this FY. (Update: DB and RJ both had notices from JES system confirming their applications have been approved. PG will discuss with each PI to establish figures and remind them imminent action must be taken). Ongoing.
613.1: AS will undertake a post mortem on CMS issues at Tier-1. (UPDATE: AS has pulled a lot of information together and will speak more to AL, he has good information from Chris Brew and will speak to Rob Appleyard about CASTOR). Ongoing.
616.1: LC will secure venues and accommodation for GridPP38 in Sussex and advise Fab.

616.2: AS will update the PMB on Tier-1 procurement by next week.

616.3: DB and SL will discuss how best to progress replacement of TW’s role.