RHEL9 systems

From GridPP Wiki
Revision as of 13:38, 19 July 2023 by Robert Currie 63054938fd (Talk | contribs)

RHEL9 Experiences

Experiences with RHEL9-based systems, and changes that are worth knowing about.

Stream vs Derivatives

What is Stream

CentOS Stream is a _minimal_ rolling release distro.

What this means is that CentOS Stream 9 can be thought of as permanently 9.1.5 or 9.2.5 i.e. mid-release between minor versions, but it is NOT 10-beta or 11-alpha.

Calling this a 'rolling release', whilst true, makes most people think of Gentoo or Arch Linux, which are enthusiast distros that more readily embrace the 'move fast and break things' approach.

Reasons to use Stream

Environments which are running user or 'untrusted' code are very well suited to a well-maintained Stream environment.

This provides:

* Faster access to new features
* Faster/Better security updates
* Stable RHEL base

For developers this means that you also get to identify problems early for people who are planning to upgrade from 9.1->9.2->9.3 so you can offer them better support.

Reasons to not use Stream

Environments running dedicated services which just _need_ security patching might experience breakage of certain 3rd-party apps/services, or even some supported ones (given the huge codebase RHEL supports).

Having fixed 9.1/9.2/9.3 releases makes it easier for service maintainers to revert temporarily until a bug is fixed.

Developers running large complex projects may also experience slight pain points from having a rolling distro underneath them. That being said, this is no different from trying to support Fedora, openSUSE or Ubuntu, in that these distros regularly patch and add minor updates between major product cycles.

Missing Packages from EPEL??

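Some EPEL packages depend on packages from the PowerTools repo in EL8, which is now named CRB in EL9. If an EPEL install fails on missing dependencies, check that CRB is enabled.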
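As a rough sketch of enabling the right repo, the helper below maps an EL major version to the repo id. The function names are hypothetical, and the repo ids `powertools` and `crb` are the ones used by the rebuild distros (Alma, Rocky, CentOS Stream); RHEL proper names its repos differently and manages them through subscription-manager.

```shell
# Map an EL major version to the id of its "builder" repo.
# builder_repo_name is a hypothetical helper, not part of any package.
builder_repo_name() {
    case "$1" in
        8) echo "powertools" ;;
        9) echo "crb" ;;
        *) echo "unknown EL major version: $1" >&2; return 1 ;;
    esac
}

# Enable the repo on the running system.
# Needs root and the dnf config-manager plugin (dnf-plugins-core).
enable_builder_repo() {
    major="$(rpm -E '%{rhel}')"
    dnf config-manager --set-enabled "$(builder_repo_name "$major")"
}
```

Calling `enable_builder_repo` as root on an Alma 9 box should be equivalent to running `dnf config-manager --set-enabled crb` by hand.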

XFS incompatibility

XFS filesystems created on EL9 aren't backwards compatible with EL7/8. At Edinburgh we couldn't mount or edit our VM template from a 7/8 host unless it was made using ext4.

I don't know how/if this will impact Proxmox.

Fixed Hostname

If you're running Alma9 in a VM, chances are you want the system hostname fixed to match the DNS entry for that VM.

Unfortunately, by default NetworkManager will aggressively try to set the system hostname based upon what it finds from the network. (Great for the hypervisor admin, not so great for the VM admin.)

To force the system to set the hostname to what you want without NetworkManager overriding it:


kernel.hostname = myawesomehostname
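One way to apply the setting above persistently is a sysctl drop-in file. This is only a sketch: the helper name and the file name `99-hostname.conf` are made up for illustration, and the assumption is that the line lives under `/etc/sysctl.d/`.

```shell
# Write a sysctl drop-in that pins kernel.hostname.
# $1 = hostname, $2 = target directory (defaults to /etc/sysctl.d)
# The file name 99-hostname.conf is a hypothetical choice.
write_hostname_sysctl() {
    name="$1"
    dir="${2:-/etc/sysctl.d}"
    printf 'kernel.hostname = %s\n' "$name" > "$dir/99-hostname.conf"
}

# Example (as root):
#   write_hostname_sysctl myawesomehostname
#   sysctl --system        # reload all sysctl configuration files
```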

Console Access

If you want to enable console access to a VM (an alternative to VNC), the guest kernel has to be told to use a serial tty as a console. This makes a good fallback for reaching a VM when all else fails.

See here: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_and_managing_virtualization/assembly_connecting-to-virtual-machines_configuring-and-managing-virtualization#proc_opening-a-virtual-machine-serial-console_assembly_connecting-to-virtual-machines
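As a short sketch of what the linked procedure amounts to: check whether the kernel command line already names a serial console, and add one with grubby if not. The helper name is hypothetical, and `ttyS0` and the guest name `my-guest` are assumptions to adjust for your own setup.

```shell
# Return success if a kernel command line already names a serial console.
has_serial_console() {
    case " $1 " in
        *" console=ttyS"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Inside the guest, as root (ttyS0 is an assumption):
#   has_serial_console "$(cat /proc/cmdline)" || \
#       grubby --update-kernel=ALL --args="console=ttyS0"
#   reboot
# Then, from the hypervisor (guest name is a placeholder):
#   virsh console my-guest
```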


No more network-scripts

Whilst I think you can beat RH9 clones into submission to accept network-scripts, it's worth just using NetworkManager.

An example static IP configuration for a RH9 box if you're not using DHCP might look something like:







If you change these files you have to first restart NetworkManager to get it to re-read its config from disk. Then re-up any changed interfaces to get NM to apply the changes you made. Or, if you fancy a cuppa, just reboot.
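A minimal sketch of a static configuration and the reload steps, assuming a NetworkManager keyfile under `/etc/NetworkManager/system-connections/`. The helper name, the interface name `eth0` and all addresses (taken from the 192.0.2.0/24 documentation range) are placeholders, not a real site configuration.

```shell
# Write a static-IP NetworkManager keyfile.
# $1 = interface name, $2 = directory (defaults to NM's keyfile directory)
# All addresses below are documentation-range placeholders.
write_static_keyfile() {
    ifname="$1"
    dir="${2:-/etc/NetworkManager/system-connections}"
    file="$dir/$ifname.nmconnection"
    cat > "$file" <<EOF
[connection]
id=$ifname
type=ethernet
interface-name=$ifname

[ipv4]
method=manual
address1=192.0.2.10/24,192.0.2.1
dns=192.0.2.53;
EOF
    # NetworkManager ignores keyfiles that other users can read.
    chmod 600 "$file"
}

# Apply (as root; eth0 is a placeholder):
#   write_static_keyfile eth0
#   systemctl restart NetworkManager
#   nmcli connection up eth0
```

In `address1` the part after the comma is the gateway, so the address, prefix and default route all sit on one line.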


Logs in RAM

By default journald put the whole system journal in /run, which is a tmpfs and so lives in RAM. After some period of time the system locked up due to memory exhaustion, which required a reboot and a reconfiguration of journald. (Most of the noise in the logs came from external scanning services/tools probing http(s) endpoints, but on one host a few GB or so was due to the box being hammered with ssh requests.)

This was the case for Alma 9.0/9.1; it's unknown whether Rocky configures this differently.

Performance Bottleneck

In high-verbosity environments (multiple podman containers, or dCache), incorrectly tuning journald can lead to performance problems and to useful debugging messages being lost. Edinburgh is investigating a good set of configuration parameters to recommend for high-verbosity environments running on HDDs.

Config from Edinburgh:

[root@neeps ~]# mkdir -p /var/log/journal/
[root@neeps ~]# restorecon -R -v /var/log/journal/
[root@neeps ~]# cat /etc/systemd/journald.conf 
[root@neeps ~]# systemctl restart systemd-journald
[root@neeps ~]# journalctl --sync
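The sketch below shows the kind of settings involved in moving the journal to disk. It is not the Edinburgh configuration: the helper name, the drop-in file name and the size limits are placeholder assumptions, while `Storage`, `SystemMaxUse` and `RuntimeMaxUse` are standard journald.conf options.

```shell
# Write a journald drop-in that keeps the journal on disk and caps its size.
# $1 = directory (defaults to /etc/systemd/journald.conf.d)
# The file name and the size limits are placeholder assumptions.
write_journald_dropin() {
    dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir"
    cat > "$dir/10-persistent.conf" <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=2G
RuntimeMaxUse=256M
EOF
}

# As root:
#   write_journald_dropin
#   systemctl restart systemd-journald
#   journalctl --disk-usage   # report how much space the journal uses
```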

Podman

Installing `docker-compose` and using `podman` on RHEL9 systems works quite well with complex networking configs.


This seems to work well with `docker-compose.yml` recipes, although defining "networks" in podman is not quite 100% compatible with docker syntax.
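One common way to point `docker-compose` at podman is through podman's API socket. This is an assumption about the setup rather than something confirmed above, and the helper name is hypothetical. The socket paths are the usual defaults for root and for rootless users.

```shell
# Print the DOCKER_HOST value that points docker-compose at podman's socket.
# $1 = numeric user id; root uses the system socket, others the per-user one.
podman_docker_host() {
    if [ "$1" -eq 0 ]; then
        echo "unix:///run/podman/podman.sock"
    else
        echo "unix:///run/user/$1/podman/podman.sock"
    fi
}

# As root:
#   systemctl enable --now podman.socket
#   export DOCKER_HOST="$(podman_docker_host "$(id -u)")"
#   docker-compose up -d
# Rootless: use `systemctl --user enable --now podman.socket` instead.
```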



Containers created/managed by `podman-compose` vs `podman` typically end up with different properties. Whilst you can enter a podman-compose container from podman, if you (re-)start a podman-compose container using the podman command directly you will get different behaviour, affecting things like networking, due to podman's different defaults.

There are some similar gotchas between `docker-compose` vs `docker` but in podman these are more readily apparent and can cause headaches.

Firewall

If you disable firewalld you can now install `iptables-services` from the core repos rather than EPEL, and this gives back the ability to manage your firewall via `/etc/sysconfig/iptables`.
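The switch can be sketched as below. The helper name is hypothetical and the ruleset is only a placeholder in iptables-save format (loopback, established traffic and ssh), not a recommended policy.

```shell
# Write a minimal ruleset in iptables-save format.
# $1 = output path (defaults to /etc/sysconfig/iptables)
# The rules themselves are placeholders, not a recommended policy.
write_iptables_rules() {
    out="${1:-/etc/sysconfig/iptables}"
    cat > "$out" <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
EOF
}

# As root:
#   systemctl disable --now firewalld
#   dnf install -y iptables-services
#   write_iptables_rules
#   systemctl enable --now iptables
```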


We plan to investigate moving some iptables policies to nftables.

Certificate Key Length Policy

To fix this for SSH see: https://access.redhat.com/solutions/6973518

This is mainly to allow connections back to legacy systems, and I don't think changing the system policy to allow its use on RHEL9+ is encouraged.

Certificate Encryption Type

SHA-1 at the time of writing is used by

update-crypto-policies --set DEFAULT:SHA1

Failure to load cert (RC2-40-CBC)

The above algorithm, among others, is now deprecated in OpenSSL 3.

The fix for this is easy: just add the `-legacy` flag when trying to load an old-format cert.

This was encountered when loading an old-format host cert generated by CertWizard, which handles certs BADLY. It is not confirmed on PeCR, as that generates the CSR and struggles to download the signed cert, and it has not yet been tested with cert_sorcerer, due to having to spend >3 days understanding and fixing a Tier2 that CertWizard broke.
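A small sketch of this, assuming the old-format cert is a PKCS#12 bundle (RC2-40-CBC is typical for those). The helper name and the file names are placeholders. The helper only emits `-legacy` when the local openssl reports version 3 or later, so the same command line also works on older hosts.

```shell
# Echo "-legacy" when the given `openssl version` string reports OpenSSL 3+.
legacy_flag_for() {
    major="$(printf '%s\n' "$1" | sed -n 's/^OpenSSL \([0-9][0-9]*\)\..*/\1/p')"
    if [ -n "$major" ] && [ "$major" -ge 3 ]; then
        echo "-legacy"
    fi
}

# Convert an old PKCS#12 bundle to PEM (file names are placeholders):
#   openssl pkcs12 $(legacy_flag_for "$(openssl version)") \
#       -in hostcert.p12 -out hostcert.pem -nodes
```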