Centos7 Adoption


Liverpool T2 Centos7 Workernodes

Introduction

This document summarises how we build our Centos7 workernodes at Liverpool, without Yaim. It describes a test deployment. Up to now, we have had jobs arriving from the atlas, gridpp, dteam, t2k, lhcb, ilc and ops VOs.

A repository of UMD middleware code has been ported to Centos7. The organiser of the release is Andrea Manzi (Cern). These instructions relate to an HTCondor workernode; you would have to adapt them to suit any other batch system.

The headnode for this mini-cluster is a normal HTCondor headnode running on SL6. The build for that is documented in the GridPP wiki: Example Build of an ARC/Condor Cluster [1].

I only describe the salient points that can lead to a successful deployment. Many of the details are particular to Liverpool, and they are not all covered. If anyone is interested, I'd be glad to provide more information on several areas such as multicore, draining, and so on (sjones@hep.ph.liv.ac.uk).

Make a plain vanilla worker-node

As a basis for all this, we make a plain-vanilla Centos7 worker-node to our local site standard. This is a bare hardware worker-node, which is then installed with the Centos7 operating system via our basic installation system, with IPv4 networking and a configured firewalld (the replacement for iptables). We use the xfs filesystem for local volumes. The release we use is

# cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core) 

The Centos7 software we use is mirrored daily from the yum repositories at this site:

http://ftp.mirrorservice.org/sites/mirror.centos.org/7.4.1708/

Some extra material is obtained from the EPEL yum repositories at this site:

ftp://ftp.mirrorservice.org/sites/download.fedora.redhat.com/pub/epel/7/x86_64/

Puppet3 and Hiera

We have a Puppet server that works in conjunction with Hiera. Together these programs provide the high-level build system that completes the build. Puppet provides a declarative means to define the desired end-state of nodes, allowing modular build designs that exhibit low coupling and high cohesion. A set of Puppet modules can be assigned to nodes in order to build them up incrementally. A module defines requirements that have to be fulfilled by (or applied to) the node, called “resources”. There are many types of resources; the common ones are yum repositories, packages, cron jobs, filesystem mounts, and configuration files.
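
As a concrete illustration of those resource types (not taken from our modules; the repository URL, package, cron job, mount and file below are all invented for the example), a minimal Puppet class might look like this:

# Illustrative only: a minimal class showing the common resource types.
# Every name and value here is made up for the example.
class example_wn {

  # A yum repository resource
  yumrepo { 'example-repo':
    descr    => 'Example repository',
    baseurl  => 'http://repo.example.org/centos/7/x86_64',
    enabled  => '1',
    gpgcheck => '0',
  }

  # A package resource
  package { 'screen':
    ensure => installed,
  }

  # A cron job resource
  cron { 'cleanup_tmp':
    command => '/usr/bin/find /tmp -mtime +7 -delete',
    user    => 'root',
    hour    => '3',
    minute  => '0',
  }

  # A filesystem mount resource
  mount { '/data':
    ensure  => mounted,
    device  => '/dev/sdb1',
    fstype  => 'xfs',
    options => 'defaults',
  }

  # A configuration file resource
  file { '/etc/example.conf':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => "# managed by puppet\n",
  }
}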

Hiera reduces the amount of code and promotes re-use by giving a way to define parameters (or, interchangeably, variables) that can be used during the puppet build; this is useful for making modules generic, and hence portable. Without Hiera, the alternatives would be either to hard-code the definitions and use conditional logic in the modules to deal with minor variations, or to have multiple modules covering a similar area, each with slightly different settings. Either alternative increases complexity.

Aside: There is a debate about what belongs in Puppet and what belongs in Hiera. In my simplistic view, Hiera should contain only what is desired to make the modules portable. The rest should be in Puppet.

So, Puppet modules can contain parameters that will be resolved by Hiera when the node is built. Hiera looks them up in a user-defined hierarchical database. Hence default settings can be applied generally to some nodes, while more specific settings can be defined where specificity is needed for a particular node. For example, I can override a nodetype parameter with a node-specific parameter for one particular node, if I want to change that node alone, while leaving all the other ones to use the more general nodetype values. Thus I can control many node-specific properties of the module without changing the module itself; the module remains general, i.e. portable.

At a high level, at our site, the build of our Centos7 workernodes is defined in Hiera in a “nodetype” file, which contains yaml code. A nodetype yaml file is illustrated below (with much code omitted, relating to security, firewall details, ssh keys, ganglia checks and so forth). The hardware of a node is represented in the name, i.e. this type of node uses the L5420 processor; each node has 2 CPUs with 4 cores each, giving 8 job slots.

# cat /etc/puppet/environments/production/hiera/nodetype/condor-c7-L5420.yaml
---

classes:
 - grid_users
 - condor_c7
 - cvmfs
 - grid_hosts

condor_c7::params::ral_node_label: "L5420"
condor_c7::params::ral_scaling: "0.896"
condor_c7::params::num_slots: "1"
condor_c7::params::slot_type_1: "cpus=8,mem=auto,disk=auto"
condor_c7::params::num_slots_type_1: "1"
condor_c7::params::slot_type_1_partitionable: "TRUE"
condor_c7::params::processors: "8"
condor_c7::params::numberOfCpus: "2"
condor_c7::params::headnode: "igrid5.ph.liv.ac.uk"

cvmfs::cvmfs_quota_limit: '20000'
cvmfs::cvmfs_http_proxy: 'http://hepsquid1.ph.liv.ac.uk:3128|http://hepsquid2.ph.liv.ac.uk:3128'
cvmfs::cvmfs_cache_base: '/var/lib/cvmfs'

yum_repositories:
  local:
    descr: "local RPMs"
    baseurl: "http://map2.ph.liv.ac.uk/yum/local/rhel/7/x86_64"
    enabled: "1"
    protect: "1"
    gpgcheck: "0"
    priority: "1"
  epel:
    descr: "Extra Packages for Enterprise Linux 7 - $basearch"
    baseurl: "http://map2.ph.liv.ac.uk/yum/ONLINE/pub/epel/7/$basearch"
    enabled: "1"
    gpgkey: "http://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7"
    gpgcheck: "1"
    priority: "99"


An actual node, say r26-n15.ph.liv.ac.uk, is associated with this particular nodetype via another Hiera yaml file, thus defining a concrete node.

cat /etc/puppet/environments/production/hiera/node/r26-n15.ph.liv.ac.uk.yaml 
---
classes:
 - 
nodetype: "condor-c7-L5420"
# condor_c7::params::someparm : "somevalue"

I'll now describe the meaning of the nodetype file above, i.e. condor-c7-L5420.yaml. The classes section defines the build at a high level: the set of puppet modules listed under classes will be applied to the workernodes. I'll give more details on each module later; suffice to say that the main module for this build is called “condor_c7”, while the others are ancillary modules that are also needed.

The next block of code is a set of parameters, at the nodetype level. The base name, e.g. condor_c7, corresponds to the puppet module that the parameters apply to. In our example these parameters are unique to this hardware type; we would copy and customise condor-c7-L5420.yaml for other hardware as we make new builds for them. The parameters set properties for nodes of this type, including a label, a scaling factor, the slot layout and the HTCondor headnode to be used. Lastly, we add a local repository and an EPEL repository to the ones already present from the basic build.

So, in summary, when Puppet/Hiera compiles this, the system will (a) add some yum repositories, (b) apply the listed modules to the node, and (c) resolve parameters, possibly replacing default “module” level values with the specific “nodetype” ones or the very specific “node” ones. Hence Hiera selects values from a hierarchy, from the most general to the more specific (if present).
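
To make the “module level” defaults concrete: with Puppet 3's automatic parameter lookup, the defaults sit in the class declaration (our params.pp), and Hiera overrides them using keys of the form condor_c7::params::<name>, exactly as in the nodetype yaml above. The snippet below is a sketch of the mechanism, not a copy of our params.pp; the headnode default shown is invented:

# Sketch only: module-level defaults that Hiera can override.
# Puppet 3 automatically looks up condor_c7::params::ral_node_label (etc.)
# in the Hiera hierarchy; the class defaults apply only if no level matches.
class condor_c7::params (
  $ral_node_label = 'UNSET',                 # overridden per nodetype
  $ral_scaling    = '1.0',
  $headnode       = 'headnode.example.org',  # hypothetical default
) {
  # Other classes in the module (e.g. init.pp) read these values, for
  # example when rendering the .erb templates.
}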

Hiera customisation

To implement the above, Rob Fay modified the standard Hiera configuration, since the default layout does not include the nodetype level in its hierarchy. If we built a large cluster without the nodetype level, we would have lots of almost identical node definitions containing duplication. The nodetype level allows us to group nodes into an abstract nodetype, which reduces duplication by allowing re-use. The search hierarchy for Hiera at our site is now defined as follows.

# cat /etc/puppet/hiera.yaml
---
:backends:
  - yaml
:hierarchy:
  - node/%{::clientcert}
  - nodetype/%{::nodetype}
  - os/%{::operatingsystem}/%{::operatingsystemmajrelease}
  - os/%{::operatingsystem}
  - site-info
  - common

:yaml:
# datadir is empty here, so hiera uses its defaults:
# - /var/lib/hiera on *nix
# - %CommonAppData%\PuppetLabs\hiera\var on Windows
# When specifying a datadir, make sure the directory exists.
  :datadir: '/etc/puppet/environments/%{environment}/hiera'

We inserted the nodetype level between the node level and the operating system level. In this representation, the levels are searched from the bottom up, hence values relating to the highly general site-info are searched early on, progressing to the very specific node level, passing through various others of varying specificity, now including our user-defined nodetype level.

During the search, to find hits, Hiera uses some “fact” that is known about the node to compare to the database fields. If a hit is made, the specific value is pulled from the Hiera database and may be used as the final resolution of a variable (or it may be overridden later, by an even more specific value). The “facts” are obtained from a node via a program called facter, which delivers a long list of values from the node about itself. Puppet subsequently uses these “facts” as pre-set variables/parameters. The facts include standard things that can be used in the hierarchy matching process, like clientcert, operatingsystem, operatingsystemmajrelease, etc. They do not, by default, include our invented “%{::nodetype}” fact. It is possible to extend facter, but Rob found an easier way to add the nodetype fact.

In the default set-up, when puppet starts a run, it always begins with the site.pp manifest, which is a general file that applies to all builds:

/etc/puppet/environments/production/manifests/site.pp

Rob explicitly added this line to that file to read and set the nodetype variable:

$nodetype = hiera('nodetype')

When puppet executes this, Hiera searches its standard hierarchy for a parameter called nodetype. So we set up, in the definition for each node, something like this:

nodetype: "condor-c7-L5420"

Hence, the nodetype variable is defined and added to those pre-set fact variables as “just another variable”. Subsequently, nodetype can be used to make matches, so our newly inserted level operates properly.
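
Putting the pieces together, a site.pp along the following lines achieves this. It is a sketch rather than a copy of our manifest; in particular, the hiera_include call is my assumption about how the classes lists in the yaml files get applied:

# Sketch of a site.pp for this scheme (not a verbatim copy of ours).
# Read the nodetype key from Hiera and expose it as a top-scope variable,
# so the nodetype/%{::nodetype} level of the hierarchy can match.
$nodetype = hiera('nodetype')

# Assumption: the "classes" arrays in the yaml files are collected and
# declared for the node with hiera_include.
hiera_include('classes')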

Puppet module layout

As can be seen in the classes section of the nodetype yaml file, modules provide modularity by separating out concerns. Once I've defined the condor-c7-L5420 nodetype, I can focus on coding the condor_c7 module without too much worry about how it will interact with the others. Below I elaborate on the salient details of some of the modules so that they could be adopted elsewhere without too much modification. The modules typically live in directories in:

/etc/puppet/environments/production/modules/...

And the modules typically have a layout like this abridged representation.

condor_c7/manifests/init.pp
condor_c7/manifests/params.pp
condor_c7/files/hostcert_hg95.pem
condor_c7/files/hostkey_hg95.pem
condor_c7/templates/00-node_parameters.erb
condor_c7/templates/condor_config.local.erb

The main stipulations of the module are usually in the init.pp file, while the module-level parameters (i.e. quite general ones) are in the params.pp file. Files to be installed or used by the module live in the files/... directory structure, and templates (.erb files), which are special files that can be parametrised, live under templates/. Templates are useful when some generic config file needs specific edits to be applied when the file is put on the node.
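
As a sketch of how the two directories are used from init.pp (the resource bodies below are illustrative, not lifted from our module):

# Illustrative only: how files/ and templates/ are typically referenced.
class condor_c7_example {

  # A static file from condor_c7/files/... is copied on verbatim.
  file { '/etc/grid-security/hostcert.pem':
    ensure => file,
    owner  => 'root',
    mode   => '0644',
    source => 'puppet:///modules/condor_c7/hostcert_hg95.pem',
  }

  # A template from condor_c7/templates/... is rendered first, so that
  # placeholders such as <%= @ral_node_label %> are filled in from variables
  # in scope, whose values ultimately come from Hiera.
  file { '/etc/condor/config.d/00-node_parameters':
    ensure  => file,
    owner   => 'root',
    mode    => '0644',
    content => template('condor_c7/00-node_parameters.erb'),
  }
}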

Module details

I've made a tar archive of some of the actual modules used at Liverpool, with modifications due to security concerns. It is available [2]. Other Puppet modules were copied from various sources; you have to get the cvmfs module, written at Cern. The grid_hosts module just sets up /etc/hosts, which is specific to our site. I only cover the most relevant modules, which are grid_users and condor_c7.

The grid_users module

The grid_users module makes the grid users, as you might expect. Yaim is not used, so it is very fast. The layout is as follows.

#  find grid_users -type f
grid_users/files/users.conf
grid_users/files/makeNewUsers.pl
grid_users/manifests/init.pp
grid_users/manifests/params.pp

The params.pp file is empty apart from some version number code. The users.conf file is a standard users.conf file of the kind that would, in former times, have been consumed by Yaim. Since we no longer use Yaim, it is used as input for the makeNewUsers.pl script instead.

The init.pp file is the heart of the module. It simply installs the users.conf file and the makeNewUsers.pl file, and runs that perl program to make the users specified in the users.conf file.
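
The real init.pp is in the tar archive [2]; a minimal sketch of the same idea looks like this (the target paths and the way the script is invoked are my guesses, for illustration only):

# Sketch of the grid_users idea: ship users.conf and the script, then run it.
class grid_users {

  file { '/root/users.conf':
    ensure => file,
    owner  => 'root',
    mode   => '0644',
    source => 'puppet:///modules/grid_users/users.conf',
  }

  file { '/root/makeNewUsers.pl':
    ensure => file,
    owner  => 'root',
    mode   => '0755',
    source => 'puppet:///modules/grid_users/makeNewUsers.pl',
  }

  # Re-run the script whenever users.conf or the script itself changes.
  exec { 'make_grid_users':
    command     => '/root/makeNewUsers.pl /root/users.conf',
    refreshonly => true,
    subscribe   => [ File['/root/users.conf'], File['/root/makeNewUsers.pl'] ],
  }
}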

The condor_c7 module

This is a more complex module so I'll take some time to document what it does. Here is the layout of the module.

# find condor_c7 -type f | grep -v vo.d
condor_c7/files/pool_password
condor_c7/files/cgconfig.conf
condor_c7/files/GLITE
condor_c7/templates/00-node_parameters.erb
condor_c7/templates/condor_config.local.erb
condor_c7/manifests/init.pp
condor_c7/manifests/params.pp

Due to security concerns, it is incomplete, but it should be sufficient to build a worker-node with the appropriate tweaks. Looking at the condor_c7/files directory, there is the usual hostcert/key pair. Also, for HTCondor, there is the pool_password, which both the head node and the worker nodes share; you'll need to read the HTCondor Administration Manual for details on that. There is also a cgroups configuration file (cgconfig.conf) for the set-up, and a GLITE file that incoming jobs source in order to get the GLITE bash variables, e.g. the SW_DIRs etc.

Now look at the condor_c7/templates directory. First consider the condor_config.local.erb file. This is an HTCondor local config file that will be used by manifests/init.pp. As the file is copied on, the parameters within it are replaced by the values found by the Hiera system. A similar thing is done for 00-node_parameters.erb. In fact, both files could be combined, but it is useful to separate out the node parameters because they vary according to nodetype, and it is easy to check that the lookup works when they are in their own file.

Aside: By some cosmic coincidence, the parameter placeholders in the erb files resemble the tags in a Java JSP file, which serve a similar function in a Java Servlets application: e.g. <%= @ral_node_label %>.

The manifests/params.pp file contains only version material. Note, though, that the HTCondor version is fixed at 8.6.0-1 because the newer version, 8.6.1-1, has dependency problems. The other default parameters are deliberately silly, put in only to make it obvious if they have not been overridden.
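
The pin itself amounts to a package resource with an explicit version, along these lines (a sketch; 'condor' is the package name provided by the HTCondor yum repositories):

# Sketch: hold HTCondor at 8.6.0-1 rather than tracking the latest release.
package { 'condor':
  ensure => '8.6.0-1',
}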

The manifests/init.pp file orchestrates the whole module. You can download the tar archive [2] and open that file. Inside it, you will find that it installs some yum repositories. You'll have to update these to your own values, since we cache some of the repos locally at Liverpool, but the names of the repos should be a sufficient clue as to where to find them.

Next, a set of required packages is specified. Some of them are absolute requirements, others are just nice to have; some have been asked for by various VOs, and some are for testing the system.
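
For illustration, such a package set can be declared in one go; the list below is invented, not our actual list:

# Illustrative package set; the real list at Liverpool is longer and
# includes VO-requested and diagnostic packages.
$wn_packages = [ 'glibc.i686', 'bc', 'time', 'perl', 'zip', 'unzip' ]

package { $wn_packages:
  ensure => installed,
}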

Next I specify those .erb template files, i.e. the HTCondor local config and the node parameters. (Aside: it is customary, when configuring HTCondor, to put the “fixed” config unchanged from the distribution in the /etc/condor directory, and to put an overlay file, i.e. a condor_config.local file, in the /etc/condor/config.d directory, which overrides some of the values in the fixed file.) The parameters in the files will be replaced by the values obtained by Hiera. After the .erb files are written, the module puts down some files and directories, specifically a control groups config file, a pool_password file (which must be shared with the head node), some shell scripts, some mount points and so on.
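
A sketch of that arrangement (the paths follow the HTCondor convention just described; the modes and bodies are illustrative rather than copied from our manifest):

# Sketch: the overlay config is rendered from the template into config.d,
# leaving the distribution's /etc/condor/condor_config untouched.
file { '/etc/condor/config.d/condor_config.local':
  ensure  => file,
  owner   => 'root',
  mode    => '0644',
  content => template('condor_c7/condor_config.local.erb'),
}

# The pool password must match the one on the head node and must not be
# world-readable.
file { '/etc/condor/pool_password':
  ensure => file,
  owner  => 'root',
  mode   => '0600',
  source => 'puppet:///modules/condor_c7/pool_password',
}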

Next, you see ARC runtime files. Instead of the software tags directory, which GLITE uses, ARC uses these tag files; they inform a job what is available on the system. It's a bit of an arcane idea, but I still do it... for now. Next, you find an ssh key and then a mount of the VO software area. Again, it's an old idea, now that we have cvmfs, but I still implement it, as asked. Finally, I start some services running (condor, cgroups) and deal with the fetch-crl procedure. And that's it. That's the summary of how we use Puppet/Hiera to build Centos7 worker nodes!
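
To round this off, the service handling might look like the following sketch (the cgconfig service comes with the libcgroup packages on Centos7, and fetch-crl is the standard EPEL package; the exact resources we use may differ):

# Sketch: make sure the services the worker node needs are running and
# enabled at boot.
service { 'condor':
  ensure => running,
  enable => true,
}

service { 'cgconfig':
  ensure => running,
  enable => true,
}

# fetch-crl keeps the CA CRLs up to date; the package's own cron machinery
# does the periodic fetching once the package is installed.
package { 'fetch-crl':
  ensure => installed,
}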

References

[1] Example Build of an ARC/Condor Cluster, GridPP wiki.

[2] The tar archive with example modules.