GridPP PMB Meeting 619

GridPP PMB Meeting 619 (09.01.17)
=================================
Present: Dave Britton(Chair), Tony Cass, Pete Clarke, Jeremy Coles, David Colling, Tony Doyle, Pete Gronbech, Roger Jones, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Louisa Campbell (Minutes).

Apologies:

1. GridPP5 h/w grant estimated spend status
===========================================
PG has estimates from all but 3 PIs, totalling £554K expected spend. It is not necessary to ask other sites to spend more than currently committed. DC, David Hutchcroft at Liverpool and PC will indicate what may be spent this financial year. DB will update Tony Medland that we have met this planned commitment. Major spenders are Glasgow (£194K), Manchester (£160K), Durham (£65K) and Oxford (£30K); most other sites will spend £10K or less. The Glasgow tender has been agreed and is currently awaiting formalisation from Procurement, so this is well advanced. DC will respond to PG this week; a spend of up to £50K will maintain the £600K target previously suggested by Tony Medland. Today’s 1% drop in the exchange rate will have an impact, so it is potentially beneficial to undertake the procurement sooner rather than later.
ACTION 619.1: DB will update Tony Medland on the planned HW spend at £554K.
ACTION 619.2: DC will respond to PG this week on procurement plans for this financial year.
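
A minimal sketch checking the arithmetic behind the figures in this item. It assumes that DC's spend of up to £50K would be additional to the £554K already estimated, which the minutes imply but do not state explicitly. All variable and function names are illustrative.

```python
# All figures are in £K and are taken from item 1 above.
ESTIMATED_TOTAL = 554   # estimates received so far (all but 3 PIs)
TARGET = 600            # target previously suggested by Tony Medland
POSSIBLE_EXTRA = 50     # upper bound on the additional spend (assumed additive)

# The four major spenders named in the minutes.
MAJOR_SPENDERS = {"Glasgow": 194, "Manchester": 160, "Durham": 65, "Oxford": 30}


def shortfall(total, target):
    """Return how much more must be committed to reach the target (0 if met)."""
    return max(target - total, 0)


major_total = sum(MAJOR_SPENDERS.values())
other_sites = ESTIMATED_TOTAL - major_total   # remaining sites that sent estimates
gap = shortfall(ESTIMATED_TOTAL, TARGET)

print(f"Major spenders combined: £{major_total}K")
print(f"Other reporting sites combined: £{other_sites}K")
print(f"Still needed to reach £{TARGET}K: £{gap}K")
print(f"Total with a further £{POSSIBLE_EXTRA}K: £{ESTIMATED_TOTAL + POSSIBLE_EXTRA}K")
```

On these numbers, any additional spend at or above the computed gap keeps the total at or above the target, which is consistent with the statement that up to £50K would maintain it.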

2. Tier-1 procurement status
============================
There has been some progress since the festive break. Tenders were prepared before Christmas, but BEIS advised that their approval must be sought prior to procurement. Just before Christmas SBS stated they could not undertake procurement until BEIS approval was received. AS has spoken to STFC procurement specialists and completed the relevant forms for the BEIS approval; once this approval is received we can progress the tenders and procurement this week. Procurement should then be achievable within the timeframe if we go with suppliers (e.g. Dell) who can deliver in the timescale (i.e. mid-February). Glasgow had a broad range of delivery times (between 2 and 12 weeks). AS will contact BEIS to advise that the forms are completed and to check procedures. He will send the form to DB shortly to agree the proposed text and will attach the GridPP5 proposal when submitting the forms. £395K extra in this financial year means this is not technically GridPP5.

3. Disk planning at the Tier-1 to meet pledges (Alastair Dewhurst’s email)
==========================================================================
Alastair enquired about the figures we have pledged, what we mean by a pledge and which figures we will try to meet. He listed some figures and requested confirmation that they are correct; they are. He also asked how this will be implemented, as some disk is on Castor etc. The technical question of how this is deployed needs to be addressed. There are 2 sets of numbers in the email (1 from PG’s pledge emails in Aug 2016, which included a modification to LHCb requirements; these were a snapshot and are no longer relevant). PG will advise that we are aiming at the numbers contained in REBUS and, if this is not achievable, we will aim for 90% across the experiments. Regarding LHCb and provision of the old allocation in Castor and the extra in Echo, LHCb will discuss this with AM and RJ. We cannot move all Castor HW to Echo and will maintain some in Castor, but individual experiments will have different requirements. It would be useful to keep this running till 2018; this can be further discussed at the forthcoming review. In advance of the review there should be discussions on how the Castor capacity that can’t be moved to Echo will coexist with Echo over the next 2 years (i.e. which VOs and workflows will use it), depending on numbers. A planning meeting is scheduled this week; AS will raise this there and add it to the list of discussion topics for the review.
ACTION 619.3: PG will respond to Alastair Dewhurst’s email on Tier-1 pledges.

4. AOCB
=======
None.

5. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
No technical meeting this week; the next meeting is on 20.1.17. DB has drafted a high-level document on evolution; DC has now contributed and will email it to DB.

SI-1 Dissemination Report (SL)
------------------------------
## GridPP Engagement Officer Notes for PMB (Addendum)

### GridPP Dissemination and Engagement summary documents

These have been emailed out to TB-SUPPORT, but (quite literally) for the record, the following documents have been published on the CERN-backed Zenodo repository:

* The GridPP Dissemination and Engagement Document Index: http://doi.org/10.5281/zenodo.223054
* The GridPP New User Engagement Programme: Selected Case Studies: http://doi.org/10.5281/zenodo.220995
* The GridPP UserGuide (offline version): http://doi.org/10.5281/zenodo.222702

The Document Index also lists many internal documents that have been securely stored on the Grid. These may be retrieved via the GridPP DIRAC service (see the UserGuide for instructions on how to do this using the LFNs provided).

The Handbook (GridPP-ENG-003) and Addendum (GridPP-ENG-004) should provide all the information required for the dissemination/engagement role in future.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
Big processing over Christmas completed, no other news to report.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
Tests on running Oxford diskless are going very well and efficiently so far. The next step is to install UK-wide redirectors which should be available to all sites. If this works well we can consider using a few sites until February. There was a CMS UK meeting last week where this was discussed.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing of significance to report: jobs were running well over Christmas, though there is one unusual ticket against Tier-1 where a user could not read files due to authentication issues, and a few jobs failed. Both issues are being monitored and should be resolved next week.

SI-5 Production Manager’s report (JC)
-------------------------------------
The holiday period has been relatively quiet so there is not much to report today. These items may be of varying interest to some of you.

1) There is a pre-GDB on networking (http://indico.cern.ch/event/571501/) at CERN this Tuesday (Duncan has a short slot to update participants on GridPP site inputs) and a GDB on Wednesday with (currently) a light agenda (http://indico.cern.ch/event/578982/).

2) Tom’s parting summary on GridPP Dissemination and Engagement documents can be found at http://doi.org/10.5281/zenodo.223054. Links have been circulated to TB-SUPPORT and put in the bulletin for this week.

3) In December JISC deployed a RIPE ATLAS anchor on the Harwell campus: https://atlas.ripe.net/probes/6241/. This may be of some use to us as we have not yet deployed an anchor at a GridPP institute (the RIPE probes complement the perfSONAR hosts on which we focus. The latest perfSONAR mesh for the UK can be seen at http://maddash.aglt2.org/maddash-webui/index.cgi?dashboard=UK%20Config).

4) Sussex were having availability/reliability problems just before the Christmas break but some focused effort in the days before going on leave resolved them.

5) A round-up of the WLCG-wide operational status and issues over the last few weeks is being put together here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMeetingWeek170109. At the time of writing nothing of particular note for us has appeared.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
Operationally, a fairly quiet period over Christmas. Overnight Thursday/Friday 22/23 December there was a problem on one of the Power Distribution Units to a rack in the UPS room. This mainly affected internal services, although the Top-BDII was also unavailable for a couple of hours. This was the second time this unit had given problems and it was swapped out on Friday morning (23rd).

Castor:
– Firmware updates on the RAID cards in the Viglen ’13 batch of disk servers were successfully carried out last Thursday. Security patching of the SRM nodes was also done.
– The Castor 2.1.15 update is scheduled for this month. The first step, the update of the nameserver component, takes place tomorrow.
– We have been seeing load on the CMS Castor instance that has led to intermittent failures of the SAM tests of the SRMs, which in turn has led to poor availability for CMS. An adjustment to the number of available slots for transfers on the disk servers in “CMS Tape” was made last week.

Tape:
– Migration of LHCb data from ‘C’ to ‘D’ tapes ongoing. Now a little over 70% done. Around 280 out of the 1000 tapes still to do.
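
A minimal sketch confirming that the two tape figures above agree with each other. Both inputs are the approximate values quoted in the report, and the function name is illustrative.

```python
TOTAL_TAPES = 1000      # approximate total quoted in the report
REMAINING_TAPES = 280   # approximate number still to migrate


def fraction_migrated(total, remaining):
    """Return the fraction of tapes already migrated."""
    return (total - remaining) / total


done = fraction_migrated(TOTAL_TAPES, REMAINING_TAPES)
print(f"Migrated so far: {done:.0%}")
```

The result sits just above 70%, matching the statement that a little over 70% is done.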

Services:
– Various updates have been applied (CVMFS client on worker nodes, Condor batch system version, squids).

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
The December meeting was cancelled and there is nothing to report.

SI-8 External Contexts (PC)
---------------------------
There was a UK-NGI meeting which Charlotte attended. It may be 6 months before information is available on the Government’s infrastructure funding; although money is still expected, there may not be funds available for UKT0 work in the near future. Normally funds announced in the Autumn Statement must be used by the following April, but Charlotte suggests the situation could be more complicated as BIS may require additional information. Infrastructure 12 status: the second tranche of funds is looking positive and there is an EGI, EU VAC, data cloud working with others to submit a bid. STFC are involved; this is positive and potentially a continuation of existing work we do with EGI.

REVIEW OF ACTIONS
=================
610.1: AS/GS Produce suggestions for one or more metrics that will summarise the Tier-1 network availability/performance. Ongoing.
612.3: PG will determine which small sites can undertake procurement this FY. (Update: See action point 619.1). Done.
613.1: AS will undertake a post mortem on CMS issues at Tier-1. (UPDATE: DB will digest and provide brief summary of document). Done.
616.1: LC will secure venues and accommodation for GridPP38 in Sussex and advise Fab. Done.
616.2: AS will update the PMB on Tier-1 procurement by next week. Done.
616.3: AS & GS to undertake a sanity check on Janet. Done.
616.4: DB and SL will discuss how best to progress replacement of TW’s role. (UPDATE: SL will take action on TW’s replacement). Done.
617.1: ALL to review and comment on the Tier-2 Evolution document this week to agree a final version next week. Done.
617.2: DC will append a statement to the Tier-2 Evolution document on CMS requirements. Done.
617.3: RJ will establish a priority order for resources to address issues arising. Ongoing.
617.4: JC will document what sites and periods CPU is idle and could be used elsewhere and will summarise in an email to the PMB. Ongoing.
617.5: PG will discuss with Ulrich requirements for GANGA going forward and report back to the PMB. Ongoing.
617.6: TC will discuss with Romain to consider submitting an abstract for CyberUK 2017. Ongoing.
617.7: SL will look into possible saturation at 10% level for LHCb jobs and determine if more resources should be allocated. Ongoing.

ACTIONS AS OF 9.1.17
====================
610.1: AS/GS Produce suggestions for one or more metrics that will summarise the Tier-1 network availability/performance. Ongoing.
617.3: RJ will establish a priority order for resources to address issues arising. Ongoing.
617.4: JC will document what sites and periods CPU is idle and could be used elsewhere and will summarise in an email to the PMB. Ongoing.
617.5: PG will discuss with Ulrich requirements for GANGA going forward and report back to the PMB. Ongoing.
617.6: TC will discuss with Romain to consider submitting an abstract for CyberUK 2017. Ongoing.
617.7: SL will look into possible saturation at 10% level for LHCb jobs and determine if more resources should be allocated. Ongoing.
619.1: DB will update Tony Medland on the planned HW spend at £554K.
619.2: DC will respond to PG this week on procurement plans for this financial year.
619.3: PG will respond to Alastair Dewhurst’s email on Tier-1 pledges.