GridPP PMB Meeting 586

GridPP PMB Meeting 586 (18.01.16)
=================================
Present: Pete Gronbech (Chair), Tony Doyle, Andrew Sansum, Jeremy Coles, Gareth Smith, Andrew McNab, Roger Jones, Steve Lloyd, Pete Clarke, Dave Colling, Tony Cass, Louisa Campbell (Minutes).

Apologies: Dave Britton, Dave Kelsey, Claire Devereux.

1. Travel requests from Peter Love
==================================
DB raised this as relatively expensive. This is because of the location in a major downtown US city; it gives a full taste of what we would have had to do with CHEP. DC can part-fund the costs, but not fully; the total is c. £2400-£2700. PMB members in attendance agreed to support 50% of the costs, as this is considered a very laudable and worthwhile event to attend.

2. HEPSYSMAN Meetings
=====================
A 1-day meeting was held last Friday, preceded by a ½-day Ganga training course for 10-15 people. This was very well received, as was the workshop organised by Mark Slater. It was suggested that the annual 2-day meeting at RAL (previously around Easter, but more recently scheduled in the summer) might be used as a follow-up, concluding with a security session. Sponsorship is currently being pursued and PMB approval was sought to support this. No objections were lodged, but costs will be kept low.

3. Quarterly Reports
====================
All reports have now been received and PG has provided the PMB with a summary.

PG summarised the key points and enquired whether the CMS efficiencies were being maintained at a good rate. GS confirmed they were, and that all the Tier-1s were performing consistently.

The procedure for disseminating quarterly reports was clarified. PG confirmed these are loaded onto the website minus sensitive financial information. PG’s summarised report is provided only to the PMB and then attached to the minutes (attached herewith as Appendix I) and the minutes are also ultimately loaded onto the website.

CMS – some discussion took place on the booking of Tier-1 codes. It was suggested these should continue on the report as the procedures and responsibilities for these have been less clear of late. It was agreed, for example, that Andrew Lahiff should be reported at 1.00 FTE.

4. GridPP36 Themes
==================
GridPP36 is confirmed at the Atholl Palace Hotel, Pitlochry and arrangements are in hand. LC summarised:

a) STFC will pay the cost of delegate accommodation through direct invoicing from the Atholl Palace Hotel as well as venue hire and catering. This covers B & B for Monday 11th-Wednesday 13th April. Any delegate who wishes to book additional nights or incurs additional costs will be required to pay these at check-out.
b) DB has secured sponsorship from Dell for the conference dinner.
c) Lunch will be provided on Wednesday 13th, but the conference sessions conclude before lunch.
d) AM and LC have been working on the design of the online registration form on Indico. This is near completion and will be available for delegates to register soon.

DB asked the PMB to consider potential themes and the following was discussed:

Tier2 Evolution (ongoing practicalities and where we will be at the end of GridPP5). Moving sites: middle sites need to be considered more carefully (probably a question for each individual site, since each will respond differently and this will evolve between now and GridPP36). AMcN will push this through the weekly monitoring meeting to encourage people to input data. LHCb want some Tier2s to have input, but further down the line will need more Monte Carlo. ATLAS is moving in a different direction, towards fewer sites holding lots of data, because of the complexity of managing a large number of sites. The importance of Monte Carlo should be stressed by the end of this run.

The PMB agreed the remainder of the agenda/themes will continue to evolve over the coming weeks.

5. AOCB
=======
a) Wish lists for funding are now in place (Jeremy Yates), but they are incomplete at the moment. Costs were required in the form of wish-lists for computer funding over the next 5-6 years for presentation at a meeting of the RCUK infrastructure group. It was reiterated that there is no guarantee the Government will provide this, but previous exercises have had reasonable success in this regard. Some PMB members were working on this for RAL and HTC computing etc. and a total wish-list of £50M across science is included on the spreadsheet. This is for information only and any updates will be forwarded as and when they are available.

b) DB and PC had a very brief conversation with Tony Hey at the science summit meeting at the Royal Society. Although nothing specifically relevant arose, Tony is STFC Chief Data Scientist and co-chair of STFC e-Leadership Council and charged with providing information on planning/infrastructure. DB and PC have arranged a meeting with him soon to formally discuss his thoughts on our remit and how we align with his vision. Tony is currently data-gathering and it is hoped he may be persuaded to attend the next CAP meeting by which time he may have formulated his thoughts in this regard.

A Cloud and data-infrastructure workshop to revise the PDG Data e-Infrastructure Document has been arranged for 05.02.16 at the Farr Institute, Euston Road, London (near UCL) and several PMB members are participating. DB and PC have now confirmed a meeting with Tony Hey the day before. It may be useful for the PMB to discuss this in advance to ensure Tony has a clear appreciation of our role and current objectives.

c) DC noted that the CERN IT reorganisation, or interaction with Cloud and other providers, is rumoured to be more aligned with batch systems. Tim Bell is leading the group merged from virtualisation and batch; he and others at CERN find VAC less appealing. However, work continues to focus on how we can best work with Cloud. The intention is to run HTCondor on commercial Cloud resources, buying in a given number of machines for a given number of months, placing a batch system on them and then having conventional HTCondor in front of it. This will be base-load dependent and CERN are checking whether it is cost-effective in the longer term. This is thought to be more expensive, but the EU are paying for 2/3 of the costs and longer-term costs should be considered in light of that funding. It was noted that a contract with a Cloud provider recently had to be cancelled because the provider could not accommodate requirements, therefore this needs clarification. It is easier to do that with Monte Carlo as a baseline load. DC will discuss with Tim Bell.

ACTION 586.1: DC will discuss proposed IT reorganisation at CERN with Tim Bell.

d) AS will discuss scheduling for MoBrain with PG (an EGI project to which we offered resources of c. 50m spec hours). It was agreed that the tape allocation for ALICE would be increased to the 870TB level they are currently using, but they must not use any more. This level is above our MoU commitment and will be reviewed later.

ACTION 586.2: AS will contact MoBrain and discuss resources for their EGI project.

e) PC noted the next LHCONE meeting in Taiwan on Sunday-Monday 13-14 March 2016 (ASGC meeting, https://indico.cern.ch/event/461511/). No-one is attending specifically for it, but AS confirmed Ian Collier is likely to attend, and TC may attend, although there may be a clash elsewhere. GS questioned progress on putting RAL on to LHCONE. This links directly with the earlier point on Cloud and RAL and the UK in the longer term.

6. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
This has been moved via Indico to Fridays to avoid other meetings. DC will therefore report next week.

SI-1 Dissemination Report (SL)
------------------------------
Nothing of significance to report.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
There was some concern over the situation in Glasgow last week, but no storage failures were identified. It has been a slightly bumpy week for UK operations. Trialling 200 pilot event simulations: these were tested at RAL and appeared to work satisfactorily, though they are memory-hungry (8 cores and 38 GB, which is beyond the ATLAS request). Working on S3 identification (2 sites on that: RAL and Lancaster).

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing of direct significance to report.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing of direct significance to report.

SI-5 Production Manager’s report (JC)
-------------------------------------
JC not present and no report presented.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
Castor:
– The CMS Tape instance has experienced some of the same problems seen by the LHCb tape instance a few weeks ago. Notably there were failures of disk servers within the CMSTape service class. Last Monday two of the five disk servers were taken out of service owing to multiple disk failures. One of these was returned to service the following day; however, the other had more significant problems and was only returned to service on Thursday. Some of CMS’s monitoring was badly affected by this failed server, as some test files were not available and these took some time to read back from tape owing to the large backlog. One further disk server has now been added to CMSTape, bringing the total to six. It is planned to add another one in the next week or so.
– All remaining SL5 disk servers have been upgraded to SL6.

Networking:
– Nothing to report this week.

Batch:
– Nothing to report this week.

Procurement:
– The tenders closed on Friday (18th December) and evaluations have been done. The disk procurement order has been placed.

Actions:
584.4: GS will investigate issues experienced with jobs at RAL between 17-18 December.
Having checked back through our “Admin On Duty” log and ticket system for these dates, this was part of the batch problem reported last week. Performance of the batch system became worse in the lead-up to, and then through, Christmas. This problem was resolved between Christmas and the New Year.

585.5: GS will check if Tier-1 access can be provided for EC2 support at RAL to manage resources.
Yes. This is going ahead. (Just waiting for a firewall hole to be opened to provide access to our development cloud.)

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
PC – awaiting next meeting scheduled in February (the meeting originally scheduled for tomorrow has recently been cancelled).

REVIEW OF ACTIONS
=================
582.4: DC to insert an update in the wiki page regarding communication with LZ. Ongoing.

584.4: GS will investigate issues experienced with jobs at RAL between 17-18 December. GS reported that the dates do not have particular significance; the issues relate to other batch issues GS has already reported on and resolved. Done.

585.1: GS to report on RAL job efficiencies before Christmas. Ongoing.

585.2: DB and AM will determine who best to send to SSI Collaboration meeting and report back on outcomes. Ongoing.

585.3: DB and PC to discuss and determine what figures should be included as capital amounts for computing infrastructure. Done.

585.4: DB will provide AS with information on costings etc, for a containerised datacentre. Done.

585.5: GS will check if Tier-1 access can be provided for EC2 support at RAL to manage resources. Done.

ACTIONS AS OF 18.01.16
======================

582.4: DC to insert an update in the wiki page regarding communication with LZ. Ongoing.

585.1: GS to report on RAL job efficiencies before Christmas. Ongoing.

585.2: DB and AM will determine who best to send to SSI Collaboration meeting and report back on outcomes. Ongoing.

586.1: DC will discuss proposed IT reorganisation at CERN with Tim Bell.

586.2: AS will contact MoBrain and discuss resources for their EGI project.