ARC6 EL9

From GridPP Wiki

ARC6 on EL9

I'll try to keep this as neutral as possible with regard to EL9 choice.

For full disclosure, I'm testing these parts on Alma9(.2) myself, which I've found most closely matches the RHEL9 experience out of the box. (CentOS Stream may also work, but it may eat your homework, so I'm ignoring this option for now.)


ECDF stack CO7

We use Ansible playbooks as recipes for deploying (NOT for keeping state on) VMs and the services on a VM.

We treat our Tier2 services as disposable (as much as possible). If a single VM completely breaks down and stops working, we can back up, nuke, and start again.

Our intention for running ARC6 on EL9 is simple: break away from CO7 before the OS is clearly Out Of Support.

We have 3 CEs registered in GOCDB:

* 2 for production (ce1.gridpp.ecdf.ed.ac.uk, ce2.gridpp.ecdf.ed.ac.uk)
* 1 for testing/development/peace-of-mind (ce3.gridpp.ecdf.ed.ac.uk)

We now have an experimental VM, not in the GOCDB, which we're bringing up and down to understand ARC6 on EL9 and whether it works well enough for us to jump. The goal is to attempt to migrate one CE in late 2023/early 2024.

Our CEs use ARGUS for account mapping and central banning.

`/etc/arc.conf`:

...
map_with_plugin = all-vos 30 /usr/libexec/arc/arc-lcmaps %D %P liblcmaps.so /usr/lib64 /etc/lcmaps/lcmaps.db arc
...

This is done through LCMAPS, which ARC calls via the `map_with_plugin` line above.

ECDF testing EL9

Our hope was to deploy a new VM using an EL9 repo, then slightly modify our playbook to support EL9 and deploy a new ARC instance.

I'm currently syncing both the ARC and LCMAPS repos to a box in Edinburgh in case:

* We need to roll back
* We find a significant problem following master
* Master moves to Next
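
One way to keep such a local copy is `dnf reposync` (from dnf-plugins-core), writing each sync into a dated directory so that an older snapshot is still around for a rollback. This is only a sketch: the `mirror_repo` helper name, the repo id and the destination path are illustrative assumptions, not a description of what actually runs at ECDF.

```shell
#!/bin/sh
# Hypothetical helper: mirror one enabled repo into a dated snapshot directory.
# DNF may be overridden (e.g. DNF="echo dnf") to print the command instead of running it.
mirror_repo() {
    repo_id="$1"                        # repo id as listed by `dnf repolist`
    dest_root="$2"                      # e.g. /srv/mirror (assumed path)
    snapshot="${3:-$(date +%Y-%m-%d)}"  # snapshot label, defaults to today
    ${DNF:-dnf} reposync --repoid="$repo_id" \
        --download-metadata \
        --download-path "$dest_root/$snapshot"
}
```

For example, `mirror_repo nordugrid-nightly /srv/mirror` would download the packages and repo metadata into `/srv/mirror/<today>/`, assuming a repo with that id is configured on the box.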

ARC6 EL9 rpms

For now we're making use of ARC6 nightly builds: https://www.nordugrid.org/arc/arc6/common/repos/nightlies-repo.html

To find a build that is available for your architecture, look through here: http://builds.nordugrid.org/nordugrid-arc/master/

For our use case, our .repo file looks like the one below, with a cron job updating the `arcnightly` yum variable.

`/etc/yum.repos.d/nordugrid-nightly.repo`:

[nordugrid-nightly]
name=Nordugrid ARC Master Nightly Builds - $basearch
baseurl=http://builds.nordugrid.org/nightlies/nordugrid-arc/master/$arcnightly/rocky/9/$basearch
enabled=1
gpgcheck=0
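
The cron job itself is not shown here, so the following is a minimal sketch of what it could do. It assumes the nightly directories are named by date (YYYY-MM-DD) and relies on dnf reading variable files from `/etc/dnf/vars` (it also reads `/etc/yum/vars`). The helper name `set_arcnightly` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pin the $arcnightly repo variable to one dated nightly build.
# Assumption: build directories on the server are named YYYY-MM-DD.
set_arcnightly() {
    build_date="$1"                 # e.g. 2023-11-21
    vars_dir="${2:-/etc/dnf/vars}"  # directory dnf reads variable files from
    # Refuse anything that does not look like YYYY-MM-DD.
    case "$build_date" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "bad build date: $build_date" >&2; return 1 ;;
    esac
    # The file name is the variable name; its content is the value.
    mkdir -p "$vars_dir" && printf '%s\n' "$build_date" > "$vars_dir/arcnightly"
}
```

A daily cron script could call `set_arcnightly "$(date +%F)"` to follow the newest build, or pass a fixed date to stay on a build known to work. Following today's date only works if that day's build has already been published.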

Unfortunately, to replicate our CO7 setup on an EL9 host we need to enable both the `epel` and `epel-testing` repositories on Alma9 and RHEL9. (The solution is potentially different with Rocky and I don't know/care how to fix this; I just know the packages were getting built and put in other repos.)

The problem(?) is that the LDAP interface components rely on packages which have been deprecated in RHEL and therefore removed from the core distros. As of 22/11/2023 these packages are in epel-testing (hopefully they will make it to epel soon-ish).
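
As a rough sketch, enabling both repositories on Alma9 could look like the following. It assumes `epel-release` is installable from the repos already enabled on the host (on RHEL9 it is usually installed from the EPEL release RPM instead), and the wrapper name `enable_epel_repos` is just for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: install the EPEL repo definitions, then switch on
# both epel and epel-testing.
# DNF may be overridden (e.g. DNF="echo dnf") for a dry run.
enable_epel_repos() {
    # config-manager is provided by dnf-plugins-core
    ${DNF:-dnf} -y install epel-release dnf-plugins-core &&
    ${DNF:-dnf} config-manager --set-enabled epel epel-testing
}
```

Once the needed packages reach `epel` proper, `dnf config-manager --set-disabled epel-testing` turns the testing repo back off.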


LCMAPS EL9 rpms

msalle to the rescue! (Nikhef?)

https://copr.fedorainfracloud.org/coprs/msalle/LCMAPS/

Enable the repo using COPR:

dnf copr enable msalle/LCMAPS

This provides LCMAPS packages from the CO7 era with EL9 compatibility.

There isn't much more to say here, other than to ask: should we encourage getting these builds into EPEL?

ARC6 on EL9 Testing

Testing at this point is with ARC master (master being ARC6-based), currently at the 21-11-2023 build.

I'm backing up remote builds and comparing the official NorduGrid builds to RPMs compiled locally at ECDF.

I will be testing local builds of stable ARC6 releases as an alternative to relying on nightly builds. That said, builds from the developers are likely to be more correct, even if they're not completely stable.


What I've found works

* SGE
* Submitting x509 jobs from a CO7 client using the LHCb and GridPP VOs (the full cycle: submit, start, finish, retrieve output)
* Most command-line interfaces
* Looks similar to the interfaces we see from ARC6 on CO7
* Integration with old ARGUS services
* Production ATLAS/LHCb jobs
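
For anyone wanting to repeat the basic x509 submission check, a minimal smoke test could be sketched as below. The job description is an assumption for illustration (it is not the job used in the tests above), `write_test_job` is a made-up helper, and `CE_HOST` is a placeholder for whichever CE is being tested.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal xRSL job description for a smoke test.
write_test_job() {
    job_file="$1"
    cat > "$job_file" <<'EOF'
&(executable="/bin/hostname")
 (jobname="arc6-el9-smoke-test")
 (stdout="stdout.txt")
 (stderr="stderr.txt")
EOF
}

# Typical cycle from a client with a valid proxy (CE_HOST is a placeholder):
#   write_test_job hello.xrsl
#   arcsub -c "$CE_HOST" hello.xrsl   # prints the job id on success
#   arcstat <jobid>                   # poll until the job has finished
#   arcget <jobid>                    # fetch stdout.txt and stderr.txt
```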

What doesn't work

* External calls to the ARC6-EL9 build via `arcinfo -c`. This is potentially a bug in our test deployment (or in our build).
* Can't get a token-based job to submit


What needs testing

* Token job submission. (So far, I think this relies on arcinfo working!)
* Other backends (no idea what support is like for Slurm or others)
* The logs: I'm not 100% sure that there aren't other errors/bugs in there. We need to review any errors and compare them to the 'known errors' we see from ARC6 on CO7.
* Whether EL8 works better, as a fallback to jumping to EL9. (We have no EL8 base image for our VMs, which would delay trying this, but I think RPMs exist.)