GridPP PMB Minutes 872 [08 Jan 2024]

GridPP PMB Meeting 872, 08.01.24

Present: David Britton (Chair), Peter Clarke, David Colling, Davide Costanzo, Alastair Dewhurst, Katy Ellis, Alessandra Forti, Jonathan Hays, Roger Jones, Steve Lloyd, Andrew Sansum, Tony Cass, Sam Skipsey (Minutes), Jill Sambrook (Minutes).

Apologies: Tony Doyle, Peter Gronbech, Andrew McNab, David Kelsey.

ITEMS

0) DMP for CG 2024 [PC]
The group discussed the Data Management Plan (DMP) that has to be attached to the CG.
PC confirmed that it is almost complete and that he has a version which he should circulate shortly.
There are still a few sections that need to be updated:
LHC: still waiting on the CMS update. KE will look at this.
NA62: has been updated (by DB).
Hyper-K and T2K: need an update (Sophie King).
SuperNEMO: will look up who to contact.
COMET: contacts at IC.
SNO+: Jan Wilson.

DB said he can write a short summary of what the DMP contains, leaving the full details to follow.

1) Trusted Research [PC]
DB asked SS to include this as a point on the GridPP7 Risk Register.
It was also agreed that this would be an agenda item for the F2F PMB meeting.

AOCB
Exascale computing in the UK was discussed.

STANDING ITEMS

SI-0 Tier-1 Manager’s weekly report & Technical Meetings [AD]
AD shared screen – bullets from slides below
Technical meetings

  • A meeting is scheduled for this Friday on network R&D for DC24.
  • Expects to have fairly frequent technical meetings this year on:

Token support
Moving on after the death of CentOS 7

Antares

  • No operational problems over Christmas
  • Drafted some slides regarding how ALICE will use Antares in the future.

Once finalised, a meeting will be set up with ALICE Computing Management.

  • Two downtimes planned:

15th January – upgrading Mellanox switches to latest OS
23rd January – upgrading to EOS5

Echo

  • Echo performed very well over Christmas
  • On 2nd January at 10am we added a Rocky 8 storage node into production.
    This broke the cluster until the change was reverted.
  • We will do a post mortem.
    We do not believe the problem was related to the new OS.

Batch Farm

  • Smooth running of Batch Farm over Christmas.
    There was a drop in usage during the Echo outage.
  • Last week we upgraded the 2022 generation to:

Kernel 6.5.10
Docker 24.0.7

GGUS tickets

  • No new operational tickets over Christmas.
  • Open tickets are on:

Token support (CMS/ATLAS)
IPv6 on batch farm (on hold, expected before June 2024)
PerfSonar configuration change (in progress)
Some failing transfers between Tier-1 and RALPP (LHCb). Aim to make more progress in the new year.
Slow checksums (LHCb). Alex made some improvements last year, but there is still a bottleneck in the XRootD code, which he is investigating.

Procurement

  • Storage

Order has been placed for 32 storage nodes (19712 TB raw storage).

  • CPU

Tender in progress

  • VMware replacement

Requisition for new storage array is in progress.

SI-1 ATLAS Weekly Review and Plans [DCos]
Update on ATLAS Liaison post start date from AD
Last information from December: planning on starting on 15th January. AD will send an email update if this is no longer correct.
Technical front: a patch to Oracle is being applied tomorrow afternoon. There may be a spike of failing jobs for 15 to 20 minutes; this is usually not a big deal. Lancaster downtime is planned for the 16th and 17th of January.
Otherwise very quiet holiday period.

SI-2 CMS Weekly Review and Plans [DC/KE]

Quiet end of year at RAL.
Issue with Echo on the first day back, as Alastair mentioned.
Intermittent SAM test failure.
Upgrade to Antares.
Labelling of tokens at CMS.
Speed tests showed some interesting anomalies last week. Speaking with James Adams after this meeting. 
Data challenge token testing: still working on this with Rucio.

SI-3 LHCb Weekly Review and Plans [AM]
NTR

SI-4 Operations Meeting Report [SS,PG,PC]
Next ops meeting is on the 16th.

SI-5 LCG Management Board Report of Issues [DB]
NTR

SI-6 External Contexts (eg NGI/EGI) [PC/JH]
NTR; see the update under the actions below.

REVIEW OF ACTIONS

868.3 DB and SS to look at storage accounting. Ongoing:
emails were exchanged this morning and work on this will continue.
868.4 JH to contact Adrian Bevan regarding his Belle II query. Action complete.
870.1 DB to make contact with Sussex on the topic of their relationship with GridPP.
Do they wish to continue as things stand or engage in a different way? Ongoing:
waiting for things to settle down over Christmas. DB will leave this on his list for now.