GridPP PMB Meeting 611

GridPP PMB Meeting 611 (31.10.16)
=================================
Present: Dave Britton(Chair), Pete Clarke, Jeremy Coles, Tony Doyle, Dave Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Gareth Smith, Pete Gronbech (Minutes).

Apologies: David Colling, Roger Jones, Tony Cass.

1. OC Docs
==========
The report needs input from AS (the Tier-1 section and the numbers for the PC section).
The PMP needs input from AS (the purchase plan for the Tier-1).

2. Pledges
==========
Can we meet the new experiment requests, given the budget and £390K? To meet 100% we would have to run the 2011 generation for another year (which provides 20K HS06) and stop meeting the ATLAS and CMS CPU uplift. (This was an uplift for UK analysis that came from an extra £370K capital in 2013.)
For disk, had to trim down the reserve from 5PB to 4PB?
There is even more pessimism about exchange rates; a figure of 20% is being used.

It is difficult to hit the target as all other capital has already been spent at the T1. AS to come back this afternoon with a number.

3. Accounting Issues
====================
Need to follow up with RHUL on whether they will republish the previous year’s data.
Need to find out whether Edinburgh is over- or under-publishing.
Send JeS instructions out now.
The value will be calculated and sent in a couple of days.
RAL: if they hold a 5PB disk buffer then they can meet ~60% of the increase for FY17.
Re capitalizing the tape media: AS needs to try to do that.
DK to send something to DB et al. today.

4. Standing Items
=================

SI-0 Bi-Weekly Report from Technical Group (DC)
-----------------------------------------------
DC not present, nothing of significance to report.

SI-1 Dissemination Report (SL)
------------------------------
Nothing to report.

SI-2 ATLAS Weekly Review and Plans (RJ)
---------------------------------------
RJ not present, nothing of significance to report.

SI-3 CMS Weekly Review and Plans (DC)
-------------------------------------
DC not present, nothing of significance to report.

SI-4 LHCb Weekly Review and Plans (PC)
--------------------------------------
Nothing to report.

SI-5 Production Manager’s report (JC)
-------------------------------------
Very little to report this week. Items (mainly upcoming meetings) of potential interest:

1) Pete and Gareth are arranging the next HEPSYSMAN for next Monday: https://indico.cern.ch/event/577279/.

2) Sites have been responding to a CRITICAL Linux kernel privilege escalation vulnerability (aka DirtyCOW). Since the vulnerability was publicly announced last week, publicly available exploits have emerged and are trivial to use to gain root on affected systems. Sites have responded well, but in a mixed fashion. The T1, for example, closed user interfaces, batch entries etc. and went into downtime while tests and investigations of deploying alternative patched kernels and the SystemTap mitigation were carried out.

3) There is a GDB next week: https://indico.cern.ch/event/394788/. Topics include an update on the HNSciCloud Tender Evaluation, WLCG workshop summary and a Regional Federations Demonstrator.

4) The SKA-GridPP agenda for this Wednesday and Thursday has evolved a little and can be found at: https://indico.cern.ch/event/570594/.

SI-6 Tier-1 Manager’s Report (GS)
---------------------------------
Here is a Tier1 report covering the period since the meeting on 17th October.

General:
– The main focus of work last week was the patching for the security bug (CVE-2016-5195 or “DirtyCOW”). In response we stopped new
batch work on Monday (24th). Following patching, the batch system was available again by the end of Wednesday afternoon (26th
Oct). We have announced an outage tomorrow for Castor for the patching.

Castor:
– As reported before, the testing of Castor 2.1.15 is largely complete. Owing to staff availability, this update will be carried out
in the New Year, with the intention of completing it by the end of January.
– We have seen problems on the “AtlasScratch” instance in Castor. This is a disk-only pool with a small number (only 5) of old disk
servers. A plan to merge this disk pool into the larger AtlasDataDisk has been developed. This will alleviate the bottleneck of this
being served by a small number of old disk servers. A similar merger is also proposed for LHCb (merging the smaller LHCbuser disk
pool into the larger LHCbDst one).

Tape System:
– The intervention (by Oracle) to replace the fixings for the rails used by the handbots within the Tier1 tape library has been
delayed one day to Wednesday 2nd November.
– The additional ‘D’ tape drives have been installed by Oracle and we will start the migration of the LHCb data from ‘C’ to ‘D’
after Wednesday’s intervention.

Services:
– While some services were down awaiting the security patch referred to above, the opportunity was taken to migrate some services to
the Windows 2012 Hyper-V infrastructure (from the older 2008 infrastructure).

SI-7 LCG Management Board Report of Issues (DB)
-----------------------------------------------
Nothing to report.

SI-8 External Contexts (PG)
---------------------------
Nothing to report.

REVIEW OF ACTIONS
=================
605.1: DK will investigate costs and timescales of upgrading the OPN Link to 30 and report back to PMB. Ongoing.
606.3: AS will propose a convenient date for Tier1 review and circulate to PMB for consideration. Ongoing.
607.2: PG will produce a spreadsheet containing explicit detail on Capital and Resource for Tier1, as well as Tier1 and Tier2 pledges, to include LHC requirements. Ongoing.
607.4: ALL to contribute to the OSC Project Status Report. (Almost complete) Ongoing.
607.8: JC to contribute Deployment Status for OSC Report. Ongoing.
610.1: AS/GS Produce suggestions for one or more metrics that will summarise the Tier1 network availability/performance.
610.2: All – review Pete’s comments in the metrics spreadsheet and act accordingly.
610.3: AS Attempt to get tape media re-classified from resource to capital.
610.4: AS/DB Contact Tony Medland to get new budget allocation (regarding extra capital) in writing so we can start procurement.
610.5: AS Provide numbers/details for H2020 bids. DB will contextualize them.
610.6: GS Produce report on how Tier1 missed that a very low number of CMS jobs were running and therefore fell significantly behind running the CMS re-reco jobs.

ACTIONS AS OF 31.10.16
======================
605.1: DK will investigate costs and timescales of upgrading the OPN Link to 30 and report back to PMB. Ongoing.
606.3: AS will propose a convenient date for Tier1 review and circulate to PMB for consideration. Ongoing.
607.2: PG will produce a spreadsheet containing explicit detail on Capital and Resource for Tier1, as well as Tier1 and Tier2 pledges, to include LHC requirements. Ongoing.
607.4: ALL to contribute to the OSC Project Status Report. (Almost complete) Ongoing.
607.8: JC to contribute Deployment Status for OSC Report. Ongoing.
610.1: AS/GS Produce suggestions for one or more metrics that will summarise the Tier1 network availability/performance.
610.2: All – review Pete’s comments in the metrics spreadsheet and act accordingly.
610.3: AS Attempt to get tape media re-classified from resource to capital.
610.4: AS/DB Contact Tony Medland to get new budget allocation (regarding extra capital) in writing so we can start procurement.
610.5: AS Provide numbers/details for H2020 bids. DB will contextualize them.
610.6: GS Produce report on how Tier1 missed that a very low number of CMS jobs were running and therefore fell significantly behind running the CMS re-reco jobs.