GridPP PMB Meeting 582

GridPP PMB Meeting 582 (07.12.15)
=================================
Present: Dave Britton (Chair), Pete Gronbech, Tony Cass, Andrew Sansum, Jeremy Coles, Steve Lloyd, Dave Kelsey, Gareth Smith, Peter Clarke, Dave Colling, Andrew McNab. (Minutes – Louisa Campbell)

Apologies: Tony Doyle, Claire Devereux, Roger Jones.

1. WLCG workshop and DPHEP workshop (travel)
=========================================
The workshops take place in Lisbon in the first week of February and registration is now open. Members were asked to advise their staff about the workshops and monitor the numbers attending. DB and PC will attend, as it is appropriate for long-term planning. Registration costs around Euro 180 and hotels are quite reasonable. Members agreed to promote this but ensure attendees are appropriate; this will be picked up through Visit Notices.

ACTION 582.1 – ALL bring WLCG and DPHEP workshops to the attention of key staff who should be in attendance and monitor costs and visit notices.

2. Tape Planning at the Tier-1
==============================
AS and PC have been discussing the best way of proceeding with tape planning given that there are a number of different options to plan for delivery. Points to consider include:

1) We should aim to use the most current estimates of experiment requirements, noting that in some places these are now inconsistent with the GridPP5 plan. These changes reflect evolution in the experiments’ modelling and changing circumstances at the accelerator that were not known at the time of GridPP5 planning. For example, the REBUS 2017 numbers now differ from GridPP5; we should try to match REBUS.

2) Where VO-stated requirements in REBUS no longer match our MoU commitment (or GridPP5 plan) we should plan (in the tape purchasing sense) to meet the REBUS figure, even if it differs from the commitment. For example, the LHCb 2016 REBUS request is now 5PB lower at RAL than previously. It makes little sense to plan for usage that will not transpire. For guidance, 5PB costs c. £100K. DB suggests we commit to the MoU and reflect the numbers in REBUS as usual: we commit at the end of August, revisit the REBUS numbers in July/early August, and make the commitment on what remains in REBUS. It is now too late to change the 2016 figures; in any case, for tape we can commit the level but only deploy what is actually required. The Plan needs to reflect the commitment. This would be more challenging for disc or CPU, but not for tape, as we can demonstrate clearly that there was no need to deploy. Consistency in reporting is essential.
3) We have now made a commitment to deliver tape to ALICE (0.87PB) at well above 2016 MoU commit (0.31PB) and we may attempt to meet that commitment in the planning. DB suggests we can leave the situation as it currently stands regarding ALICE tape.
4) Although the DiRAC tape requirement in 2016 (5PB) was not planned for, we should plan for it. It is not yet clear whether this will be used, but it is a high priority (note it is £100K of tape).
5) MICE was discussed, along with their stated requirement of 2PB (£40K) in 2016 and 4PB in 2017. MICE should be contacted regarding this. Once we reach multi-PBs it becomes very expensive and requires accurate planning.
6) Other VOs look more plausible.
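
As a rough illustration of the cost figures quoted in the points above, the following Python sketch scales the guidance figure (5PB costs c. £100K) to the other requests mentioned. It assumes cost is simply proportional to capacity, which is an assumption made here for illustration and not a statement of actual procurement pricing; the function and variable names are hypothetical.

```python
# Illustrative only: assumes tape cost scales linearly with capacity,
# using the guidance figure from the discussion (5PB costs c. GBP 100K).
GUIDE_PB = 5
GUIDE_COST_GBP = 100_000
COST_PER_PB_GBP = GUIDE_COST_GBP / GUIDE_PB  # GBP 20K per PB


def tape_cost_gbp(petabytes: float) -> float:
    """Approximate purchase cost in GBP for a given tape capacity in PB."""
    return petabytes * COST_PER_PB_GBP


# Capacities quoted in the points above.
requests_pb = {
    "DiRAC 2016": 5,
    "MICE 2016": 2,
    "MICE 2017": 4,
}

for name, pb in requests_pb.items():
    print(f"{name}: {pb}PB is roughly GBP {tape_cost_gbp(pb):,.0f}")
```

Under this linear assumption the 2PB and 5PB figures reproduce the £40K and £100K quoted above; the 4PB figure is derived here and does not appear in the discussion.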

ACTION 582.2 AS to advise ALICE to open a dialogue with us regarding additional tape space if they reach a crisis, but in the meantime the situation should be left as previously agreed.

ACTION 582.3 AS and PC to continue to model the costs and planning before the Resourcing meeting on 16.12.15.

3. CVMFS users workshop at RAL (funding request)
=============================================
DB received an email from Catalin informally enquiring whether GridPP might consider providing some funding to support the second CVMFS user workshop at RAL. DB responded that it would be necessary to see costs and determine value for money before PMB can consider. AS may be able to persuade the SCD to cover some costs, but noted the costs for this 2-day meeting should be reasonably low. Securing a suitable sponsor for the annual dinner will reduce costs and delegates normally pay for their own accommodation. Assistance with costs relating to venue hire and coffee/refreshments should be possible. Agreement was reached that it would be a very helpful opportunity to encourage non-HEP people who are using CVMFS to make a visit to the lab and it is possible that EGI may part-fund. The workshop would be organised with CERN-based people, which would widen impact beyond the EGI communities, and it can be announced on the UKT0 list. It was agreed that GridPP would be prepared to sponsor if funds are available from the GridPP5 budget, provided that Catalin can provide costs and secure sponsorship for the dinner.

4. LZ VO contact
===============
The importance of understanding and monitoring communication between LZ and VOs was raised. Some concern was expressed over contact from LZ that should have come via Imperial. JC set up a webpage to monitor the VOs communication with LZ and this is covered at weekly Ops meetings, but this is unnecessarily time-consuming and a more effective way to monitor should be sought. Agreement was reached for DC to cover this point verbally in monthly meetings and JC will monitor this. A mention in the wiki page would be helpful and it is important to ensure that as well as Imperial all sites are dealing appropriately with LZ communication.

ACTION 582.4 DC to insert an update in the wiki page regarding communication with LZ.

5. Status of quarterly reports?
=============================
PG reminded members who have not already done so to submit quarterly reports.

6. Status of GridPP5 grants?
==========================
As yet there is no news on the new GridPP5 grant status as STFC are currently focussing on Consolidated Grants and the RAL PPD review. It was noted that staff are beginning to express concerns over job security – HR have automatically generated redundancy letters and staff at Glasgow will shortly be registered on the internal Job Seeker Register. DB will formally approach Sarah Verth to address this.

ACTION 582.5 DB will email Sarah and ask her to confirm if a decision on the GridPP5 grant can be provided in the next week.

7. AOCB
=======
a) New GridPP web site
———————-
The new website has now been announced and has had very positive initial feedback. Tom has updated instructions for uploading documents and these are now available on the website.

b) Recent JANET Network issues and impact on GridPP services
——————————————————-
Some issues were experienced last week that affected some jobs. JANET issues can impact operations if services cannot be contacted in good time. Specifically the latest update on the webpage was noted – the recent attacks may derive from Twitter updates and future updates will be made through TB emails. PMB members are not always on TB lists that reach all sites. Some issues using the DNS server and a range of other issues have also been experienced.

8. Standing Items
===================
SI-0 Bi-Weekly Report from Technical Group (DC)
——————————————-
Nothing to report on DIRAC service at Imperial, which is working well.

Evolution of Tier2s – AMcN described improvements which are working well and will be released before Christmas. This provides OpenStack to VMs, which will make it more usable as OpenStack is much more common. This should be well advertised when AMcN has concluded testing.

ATLAS had nothing significant to report – more news about HLT running well and Tier-0 issues were minor.

LHCb had nothing of note to report.

GridPP experiment had nothing of note to report.

Tier-2 Storage – there was some discussion on running disc within ATLAS and this is progressing.

Security – Ian had been discussing security and all is working well.

No comment on GridPP cloud sites.

SI-1 Dissemination Report (SL)
——————————————-
##GridPP Dissemination Officer Notes for PMB

###GridPP Website 2.0 – launched!

The new GridPP public-facing website [DON1] launched this morning. Huge thanks to the PMB for all of their feedback and help throughout the process of re-designing, re-organising, and re-imagining how GridPP presents itself to the world.

Special thanks too to Andrew M for masterminding the infrastructure that makes it all possible (particularly the WordPress x.509 plugin) and Louisa for very helpful feedback with the PMB Minutes and Documents sections.

We have also re-branded the Facebook [DON2], Twitter [DON3], and Google+ [DON4] pages to ensure brand consistency across all social media platforms. Google Analytics has also been enabled on the new site. We are sure the website will continue to evolve as it is used and feedback is received, but for now – enjoy!

[DON1] https://www.gridpp.ac.uk

[DON2] https://facebook.com/gridpp

[DON3] https://twitter.com/gridpp

[DON4] https://plus.google.com/102505903458149799840

SI-2 ATLAS Weekly Review and Plans (RJ)
—————————————
Nothing to report.

SI-3 CMS Weekly Review and Plans (DC)
——————————————-
Nothing UK related to report.

SI-4 LHCb Weekly Review and Plans (PC)
——————————————-
Nothing to report.

SI-5 Production Manager’s report (JC)
——————————————-
1. JANET has been suffering from problems that appear to be related to a DoS on the network. This has impacted smooth access to various services including the new GridPP website, VOMS and monitoring. Access to the website was intermittent this morning.

2. The new website has been well received. There has been a comment that it is currently hosted without IPv6 support (no AAAA record in DNS); this should be enabled as soon as possible.

3. Those intending to go to the Lisbon WLCG workshop in February are being encouraged to register: https://indico.cern.ch/e/WLCG-Workshop-Lisbon-2016.

4. Lancaster has had to go into unscheduled downtime due to flooding of the Lune river and flooding of a local substation (thanks to Alessandra for quick action at the weekend after a request from Roger to change the GOCDB status). The site is expected to be back by mid-week.

5. The November Tier-2 A/R reports are now available. For the UK:

ALICE (http://wlcg-sam.cern.ch/reports/2015/201511/wlcg/WLCG_All_Sites_ALICE_Nov2015.pdf):
All okay.

ATLAS (http://wlcg-sam.cern.ch/reports/2015/201511/wlcg/WLCG_All_Sites_ATLAS_Nov2015.pdf):

Sheffield: 79%:79%
RALPP: 86%: 86%

CMS (http://wlcg-sam.cern.ch/reports/2015/201511/wlcg/WLCG_All_Sites_CMS_Nov2015.pdf):
All okay.

LHCb (http://wlcg-sam.cern.ch/reports/2015/201511/wlcg/WLCG_All_Sites_LHCB_Nov2015.pdf):
RALPP: 88%:88%

Explanations:

RALPP: The internal network between the dCache storage and the worker nodes was running flat out at 10 G for about 3 days, which caused SE tests for ATLAS and LHCb to occasionally time out and fail. This high network load was due to some odd CMS activity.

Sheffield: A badly configured worker node was causing problems at the end of October and beginning of November. This has been resolved.

ATLAS ASAP metrics for November 2015 showed no specific concerns with any UK sites.

6. The autumn HEPSYSMAN meeting has been moved to Friday 15th January and will take place in Manchester.
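
Regarding the missing AAAA record noted in item 2 above, one simple way to check whether a hostname resolves to an IPv6 address is sketched below in Python. It relies on the standard library `socket.getaddrinfo` call and the system resolver, so the result depends on local DNS configuration; the function name `has_ipv6_address` and the injectable `resolver` parameter are illustrative assumptions, not part of any GridPP tooling.

```python
import socket


def has_ipv6_address(hostname, resolver=socket.getaddrinfo):
    """Return True if the resolver finds at least one IPv6 address for hostname.

    The resolver argument defaults to socket.getaddrinfo and exists only so
    the lookup can be replaced by a stub when no network is available.
    """
    try:
        results = resolver(
            hostname, 443, family=socket.AF_INET6, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # Name did not resolve for the IPv6 family (e.g. no AAAA record).
        return False
    return len(results) > 0
```

A call such as `has_ipv6_address("www.gridpp.ac.uk")` would then report whether an IPv6 address is published for the site at the time of the check.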

SI-6 Tier-1 Manager’s Report (GS)
——————————————-
General:
– Generally a quiet week last week.

Castor:
– No significant changes in last week.
– We did have a failure of one ATLAS disk server on Friday. One of the disk partitions used by Castor went read-only. This correlated with a disk failure. After some checking and allowing the rebuild of the RAID set to complete, the server was put back in production on Saturday.

Networking:
– We have had some problems with the link between the Tier1 network and the UKLight router that provides our data link bypassing the firewall. One of the pair of 10Gbit links has dropped out a few times. It was already planned that this link will be changed tomorrow (Tuesday 8th December). This will both move the Tier1 end of the link off our old core switch and move to 4 * 10Gbit links.
I should point out that this will not, in itself, increase our external bandwidth as the onward links from the UKLight router are 10Gbit to the OPN and 10Gbit to the RAL border routers and JANET. However, we plan to double this latter link to the border router in the new year.
– Other changes needed to remove the old ‘core’ switch from the network are scheduled for completion this week.
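
The point about external bandwidth can be illustrated with a small Python sketch. It treats the usable bandwidth as the smaller of the aggregate Tier1-to-router capacity and the aggregate onward capacity, which is a deliberate simplification (it assumes traffic can be spread across both onward links); the function name is hypothetical and the link figures are those quoted above.

```python
def external_bandwidth_gbit(tier1_to_router_links, onward_links):
    """Usable bandwidth in Gbit: the smaller of the two aggregate capacities."""
    return min(sum(tier1_to_router_links), sum(onward_links))


# Current: a pair of 10Gbit links to the UKLight router,
# with 10Gbit onward to the OPN and 10Gbit onward to the border routers.
current = external_bandwidth_gbit([10, 10], [10, 10])

# After the planned change: 4 * 10Gbit links, onward links unchanged.
after_link_change = external_bandwidth_gbit([10] * 4, [10, 10])

# After also doubling the link to the border router (planned for the new year).
after_doubling = external_bandwidth_gbit([10] * 4, [10, 20])

print(current, after_link_change, after_doubling)
```

Under these assumptions the first two figures are equal, since the onward links remain the limiting factor until the border router link is doubled.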

Batch:
We have rolled out to one batch of worker nodes the changed algorithm that will allow “pre-emptable” jobs to run while worker nodes are draining to make space for multi-core jobs.

Procurement:
No further update regarding the procurements. As reported last week the tenders are set to close on 18th December. A couple of questions have come in for clarification.

SI-7 LCG Management Board Report of Issues (DB)
———————————————–
No MB report as the meeting takes place tomorrow – PG and DB will attempt to attend remotely between breaks in travel.

REVIEW OF ACTIONS
=================
578.2 AM and JC to investigate EGI community platforms to see whether VAC and possibly DIRAC could be registered. Ongoing.

580.1 AS will look at tape planning to determine if we want to take forward increasing ALICE space and if this can be easily accommodated in all the scenarios. (To be discussed at the Resource Meeting on 16.12.15) Ongoing.

581.1 DB and LC to make enquiries on the viability and potential benefits of hosting CHEP conference at Glasgow in 2018. Done.

581.2 CD to provide DB and LC with copies of relevant spreadsheets, budgets and planning information from recent EGI conference. Ongoing.

581.3 ALL members who have not already done so should submit reports to PG. Ongoing.

581.4 AMcN to liaise with Tom on announcing the new website. Done.

ACTIONS AS OF 07.12.15
======================
578.2 AM and JC to investigate EGI community platforms to see whether VAC and possibly DIRAC could be registered. Ongoing.

580.1 AS will look at tape planning to determine if we want to take forward increasing ALICE space and if this can be easily accommodated in all the scenarios. (To be discussed at the Resource Meeting on 16.12.15) Ongoing.

581.2 CD to provide DB and LC with copies of relevant spreadsheets, budgets and planning information from recent EGI conference. Ongoing.

581.3 ALL members who have not already done so should submit reports to PG. Ongoing.

582.1 – ALL bring WLCG and DPHEP workshops to attention of key staff that should be registered and monitor costs and visit notices.

582.2 AS to advise ALICE to open a dialogue with us regarding additional tape space if they reach a crisis, but in the meantime the situation should be left as previously agreed.

582.3 AS and PC to continue to model the costs and planning before the resourcing meeting (16.12.15).

582.4 DC to insert an update in the wiki page regarding communication with LZ.

582.5 DB will email Sarah and ask her to confirm if a decision on the GridPP5 grant can be provided in the next week.