GridPP PMB Meeting 819

Present: David Britton (Chair), Tony Cass, Peter Clarke, David Colling, Davide Costanzo, Alistair Dewhurst, Tony Doyle, Peter Gronbech, Jonathan Hays, Roger Jones, David Kelsey, Steve Lloyd, Andrew McNab, Andrew Sansum, Sam Skipsey (Minutes), Jill Sambrook (Minutes)

Apologies: Katy Ellis

Items

  • OSC talk – update and comments [DB]

Dave Britton, Peter Clarke, Roger Jones, Alistair Dewhurst and Sam Skipsey will all attend the OSC meeting on Tuesday 17th May at 10am. (The Zoom link and agenda have now been circulated to the group.)

DB circulated slides at the weekend. AD, PC and JH all provided comments, which DB has now incorporated. Anyone with further information to update or add should send it to DB.

DB has now asked STFC twice what will happen after GridPP6, seeking an update on GridPP7, but there is no commitment yet. This may be an opportunity to highlight to the OSC that funding decisions need to be made sooner rather than later; perhaps Jeremy could chase STFC on GridPP’s behalf.

The Ambleside meeting will be a good opportunity to start writing for the next round of funding.

  • LHCb tickets (including the Vector Reads) update [AD]

As mentioned last week, ATLAS and LHCb have reported issues deleting files from Echo. LHCb are seeing an average deletion rate of one file every 17 seconds. This is clearly insufficient and is likely the result of something misconfigured or broken. The team are still investigating.

LHCb “slow FTS transfers” ticket – in the meantime, LHCb have switched from GridFTP to WebDAV, so performance will have changed. AD suggests closing the existing long-standing ticket and reopening it if the issue also occurs with WebDAV.

LHCb ticket around Vector Reads – AD provided a presentation about this ongoing investigation.

The presentation focused on runs of Rob Currie’s script, which is independent of LHCb software and regularly failed against Echo.

Points noted:

  • The failures present as socket timeout errors.
  • If the vector read is successful then the response is quick.
  • We have found that if we target individual gateway machines then the success rate is higher.
    • We are looking at a possible issue with the DNS round-robin alias: a connection may sometimes re-resolve the host’s IP address and receive a different one of the entries.
  • The XRootD solution is to place the hosts behind an XRootD redirector rather than a DNS round-robin alias.

The improved vector read code (lockless reads) has been tested on production flows on some WNs. It gives measurable gains (~20% speed improvement), although these are smaller than those seen in purely theoretical benchmarks.

AM asked AD when we might be able to tell users that they can use RAL for analysis jobs. We need to run larger numbers of jobs to see how reliable this is. It might be possible to run specific tests for LHCb by targeting pilots at specific CEs, which would then run workloads on the WNs with the test configurations.

AM asked for an update on the LHCb Liaison post. AD confirmed that interviews took place last week and the post has been offered to a candidate. We hope to give a further update soon.

DB asked whether the slow deletes seen by LHCb have any connection to these issues. It was noted that JW has been, and still is, looking into that issue.

We will return to these issues in a future PMB meeting.

  • Next week’s WLCG Overview Board. [DB]

The request for input to the upcoming WLCG Overview Board (19th of May) was discussed; DB and RJ will attend and report back at the next PMB.

AOCB: N/A

Standing Items

SI-0 Tier-1 Manager’s weekly report & Technical Meetings [AD]

Report discussed above.

A major upgrade of Antares is planned for 25th May, which should resolve some known issues.

A technical meeting on Rucio/DIRAC is likely to take place this Friday or next. In the meantime, Janusz presented at the DIRAC Workshop.

An update on the LHCb Liaison post will follow soon. Interviews were held, and confirmation that the offer has been accepted is awaited.

There was discussion around the RAL Liaison posts. There are 3.5 FTE in total (1 FTE each for ATLAS, CMS and LHCb, plus an extra 0.5), and their effort could possibly be used to monitor (long-standing) tickets. AD is keen for the Liaisons to focus on more complex issues rather than being used to track business-as-usual tickets, although they can be involved in solving these issues.

SI-1 ATLAS Weekly Review and Plans [DCos]

NTR

SI-2 CMS Weekly Review and Plans [DC/KE]

KE is on holiday. Nothing to update this week.

SI-3 LHCb Weekly Review and Plans [AM]

No further update

SI-4 Operations Meeting Report [SS,PG,PC]
– Minutes 10-05-22

It was agreed the special network meeting would take place on the 24th.

This is a pre-meeting, so it is not crucial for everyone to attend. The main discussion will take place at the Ambleside meeting (GridPP48).

SI-5 LCG Management Board Report of Issues [DB]
– Environmental Impact of WLCG
– Update on WLCG Privacy Notices

SI-6 External Contexts (eg NGI/EGI) [PC/JH]

JISC are finally interviewing for David Salmon’s replacement on Wednesday (update from PC).

JH noted that IRIS-CasT is now approved, and will be interested in nominating some GridPP resources as test cases.

Actions

782.4 DCos – Investigate VAC migration plan for Birmingham. [Ongoing – raised at UK Cloud Support meeting. No formal request as yet.] – follow up

800.3 SS – Status report on Oxford performance and storage (OSC MEETING) – SS send to DB

800.5 AD – Arrange in-person DIRAC/Rucio meeting at IC (Jan22).

[on-going]

– AD plans to re-establish virtual meetings and then set up an in-person meeting. AD and DC to try to arrange a date.

818.1 – SS/PC Operations meeting on the 24th to focus on networking discussion

818.2 – SS and PC to talk to Matt D and sketch out agenda for larger network meeting over the summer

818.3 – Dave/Sam/Jill to get in touch with RJ soon to start discussion and plans around F2F meeting in Ambleside

818.4 – JS and SS to produce and add the PMB minutes to the CERN website every Monday morning when the Agenda is circulated to the PMB

818.5 – DB/SS Group to formalise VOs to be added to the approved list.

818.5 – PG/AD Pencil in the 25th of May as potential date for a T1 resource meeting. 1st June is plan B date.
AD to create an agenda.

819.1 – AD/AS to provide an update for the OB meeting on the Tier-1 position regarding digital resource and energy costs.
