GridPP PMB Meeting 689 (03/12/18)
=================================
Present: Dave Britton (Chair), David Colling, Alastair Dewhurst, Tony Doyle, Pete Gronbech, Jon Hays, Roger Jones, Steve Lloyd, Andrew McNab, Gareth Roy, Andrew Sansum, Louisa Campbell (Minutes).

Apologies: Tony Cass, Pete Clarke, Jeremy Coles, Dave Kelsey.

1. GridPP42
===========
DB made suggestions for locations and dates; costs, logistics and interest are being investigated. It is challenging to pin down a date that works well because of the Easter school holidays and clashes with various meetings, e.g. ACAS, WLCG & HSF workshop, HEPEX, ISGC, IOP. LC is investigating options in North Wales (Betws-y-Coed), Pitlochry and Glasgow. AD has suggested Coseners at RAL, which he is investigating. It was agreed that, despite school holidays in Scotland, 1-3 April may be the most attractive dates, with 24-26 April as the alternative. AD will look into costs and availability at RAL and at the Crown & Thistle Hotel for the Collaboration Dinner. LC and AD will investigate further.

2. IRIS TWG
===========
Several PMB members attended an IRIS Technical Group meeting last week that generated some discussion and some animosity between Grid and non-Grid attendees. It was reiterated that care should be taken not to present the Grid as an all-consuming solution to problems without regard to others’ perspectives. DB will circulate a message asking all members to be more diplomatic in non-Grid focussed meetings. Care should be taken in our communications with IRIS and other external groups. It may be useful to reiterate the parameters of what should and should not be addressed, recognising that we can help people use the Grid but that they may prefer to use other solutions. AD will use the Technical Meeting this week as a useful context to address this issue.

3. Quarterly Reports
====================
GR has 3 reports outstanding: JC; AD, who will complete his today (Matt provided a deadline of Thursday); and Dan, regarding experimental support.

4. OSC
======
The OSC membership remained as before for this meeting; new members will be rotated in for the next meeting. Neither Tony Medland nor his replacement was able to attend. The new members will be Jackie Palace, Andy Buckley and Chris Alton (who will probably become chair eventually). PG and DB gave a presentation and answered questions. Some aspects were picked up and responded to, e.g. ATLAS being low and how CDT students could use GridPP; it was suggested that this might only be possible via the Tier-1, with the mechanisms already in place. There were questions about the longer term for the LHC, since GridPP6 gets us to the end of Run 3, and DB responded in detail on how WLCG is addressing this at various levels. DB noted that the US has stepped up to address some of these issues with $25M of funding and that, while the UK had kick-started this initially, it is now lagging slightly behind. There was some discussion of the GridPP6 proposal on Run 3 and of the challenges, particularly around manpower. The OSC thanked us for the documents and for addressing the actions raised at the last meeting, and noted and commended our work with other projects. They also noted that we have hit a critical stage for manpower, particularly for the Tier-1. It was noted that the Echo project was going well, and by the next OSC they would like to see tape planning. They generally commended a good operation and our responsive action on issues arising. The next OSC may be in April/May 2019 (after the GridPP6 proposal submission). Sarah mentioned she had not yet been able to supply terms of reference as these are currently with the new programmes manager; DB has requested that written guidelines be provided this week. There will be planning on assessing the proposal. The consolidated grant processes have been challenging and FECs from universities have risen considerably, which has a knock-on effect. From our perspective, the main focus is GridPP6 planning.

5. Overview Board
=================

The Overview Board met on Friday at CERN. It was an interesting meeting, with a WLCG status update from Ian Bird that included new (increased) luminosity estimates for Run 3 but reduced predicted requirements from ATLAS and CMS as they continue to develop their computing models.

6. AOCB
=======
Next PMBs – 10th and 17th December then 7th January 2019

7. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
AD noted a discussion on Friday about HTCondor before it goes to production; the minutes are attached to the Agenda. Steve Jones has developed a solution requiring minimal coding changes and will push on with this in Liverpool, to be reviewed in February. There are ongoing discussions about a possibly simpler approach that will require some effort.

SI-1 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Nothing substantive. The migration of Tier-2 to use Harvester is now complete. AD noted that Tim will be raising a disk space request from ATLAS, asking whether the Tier-1 could provide next year’s capacity early; this may result in a formal request.

SI-2 CMS Weekly Review and Plans (DC)
-------------------------------------
Nothing of significance to report.

SI-3 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing significant to report. Expecting to move in February. LHCb are having issues submitting to RAL and there have been few jobs; AM confirmed there are reasons for this.

SI-4 Production Manager’s report (JC)
-------------------------------------
JC not present, no report submitted.

SI-5 Tier-1 Manager’s Report (AD)
---------------------------------
Operational
– The batch farm was drained and rebooted on 27th and 28th November, to apply the security patch for CVE-2018-18955.

– CMS SAM tests are still appearing as “missing”. We believe that CMS is aware of this problem and is manually correcting it while working on a fix.

– While completing the weighting up of the ClusterVision 17 storage nodes, the Ceph Manager Daemons crashed and were unavailable for 30 minutes. This did not impact any production work as, despite their grand name, they simply provide monitoring information about the cluster. We believe this is a bug in the web dashboard and have disabled it for now and informed the Ceph developers.

Procurement
– On 30th November we issued a purchase order for £67k to upgrade existing Ceph storage nodes with extra disks. This will provide an extra 1.7PB of raw storage at significantly lower cost than buying new machines (at least 25% cheaper).

– We have received the quote for CPU that we will be issuing via a direct award. Assuming the same hardware as last year provides the same HS06*, the price per HS06 comes to £9.10, which is equivalent to a 7% price drop on last year’s XMA quotes. We intend to spend ~£310k, which buys 44 machines providing 2816 job slots and 34k HS06. We intend to issue this purchase order on Wednesday 5th December.

– The tender for the disk procurement closes on Friday 7th December.

– Next round of Tape purchases is ongoing.
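
The procurement figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the rounded figures quoted in this report, the variable names are hypothetical, and the implied price for last year is back-calculated from the quoted 7% drop rather than taken from the XMA quotes themselves.

```python
# Rounded figures quoted in the Tier-1 procurement report above.
price_per_hs06 = 9.10   # GBP per HS06, from the CPU quote
cpu_budget = 310_000    # GBP, approximate planned CPU spend
machines = 44
job_slots = 2816
price_drop = 0.07       # quoted drop relative to last year's quotes

disk_budget = 67_000    # GBP, Ceph disk upgrade purchase order
disk_raw_tb = 1_700     # 1.7PB of raw storage, taking 1PB = 1000TB

# Derived quantities.
hs06_total = cpu_budget / price_per_hs06
slots_per_machine = job_slots / machines
cost_per_machine = cpu_budget / machines
# Assumption: back-calculated from the 7% drop, not a quoted figure.
implied_last_year_price = price_per_hs06 / (1 - price_drop)
cost_per_tb_raw = disk_budget / disk_raw_tb

print(f"Total HS06 purchased:    {hs06_total:,.0f}")
print(f"Job slots per machine:   {slots_per_machine:.0f}")
print(f"Cost per machine:        GBP {cost_per_machine:,.0f}")
print(f"Implied last-year price: GBP {implied_last_year_price:.2f} per HS06")
print(f"Disk upgrade cost:       GBP {cost_per_tb_raw:.2f} per raw TB")
```

The total comes out close to the 34k HS06 stated above, so the quoted price, budget and capacity are mutually consistent to within rounding.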

SI-6 LCG Management Board Report of Issues (DB)
-----------------------------------------------
No MB report.

SI-7 External Contexts (PC)
---------------------------
PC not present – no report submitted.

REVIEW OF ACTIONS
=================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
667.2: PG will do h/w planning before next OC to provide OC with details of shortfall in funds. (Update: PG will check the OSC minutes for details and cover with GR). Done.
678.3: AD to finalise the Tier1 background document, including tape strategy by end September. (Update: Almost complete and will circulate current iteration for comment). Ongoing.
678.5: JC to finalise the Storage background document by end September. (Update: 17 October meeting with Tony Medland & DB and PC will attend. This is almost complete and awaiting a few minor elements to be worked in and GR will upload into Googledocs for info). Ongoing.

ACTIONS AS OF 03.12.18
======================
644.4: AD will progress capture of funds for Dirac with Mark Wilkinson. (Update: funding from DIRAC. AS has emailed Mark. They are now using it more heavily. Could use the money for tape, but have to be careful not to buy tape we won’t use. May be better charging later rather than during this FY? AD will now progress. 08/10/18 – Leicester are producing a PO for tapes and will send to AD to produce an invoice). Ongoing.
678.3: AD to finalise the Tier1 background document, including tape strategy by end September. (Update: Almost complete and will circulate current iteration for comment). Ongoing.
678.5: JC to finalise the Storage background document by end September. (Update: 17 October meeting with Tony Medland & DB and PC will attend. This is almost complete and awaiting a few minor elements to be worked in and GR will upload into Googledocs for info). Ongoing.