Example Build of an EMI-UMD Cluster


Example EMI/UMD Cluster

A comprehensive set of generic installation instructions can be found here:

https://wiki.italiangrid.it/twiki/bin/view/CREAM/SystemAdministratorGuideForEMI1#Installation_of_the_CREAM_CLI

Example

I provide some salient notes on an example installation so that other sysadmins can compare notes and bring it fully up to date. Please add to or edit this document as necessary. The example describes some generic information about the install process at a well-known university. Several important aspects, such as firewall settings and the exact network configuration, are omitted entirely from this description.

Note: There has been a bit of discussion on whether to use EMI or UMD repositories for the installs. The example describes UMD. The process for EMI is similar, but uses different servers for its yum repositories, which you can google for.

Server Layout

The example site makes use of YAIM for configuring the systems, as well as the Kickstart and Puppet fabric management tools (the latter two can be ignored for this guide). The hardware is:

  • a set of worker nodes
  • a TORQUE/Maui server
  • a CREAM CE

Important note: In the example, the whole set-up will be built. That is because the EMI build of TORQUE features a new authentication tool, known as munge. It was necessary to have munge-enabled software on all the components, which makes a piecemeal install difficult (though perhaps still possible).

Building the TORQUE server

Prepare a server to host the TORQUE software. Our server (a VM running under KVM) has 3 CPUs, 4 GB of RAM and a 45 GB disk. The example server is called mace.ph.famous.ac.uk, and it was initially set up with a base install of SL5.5. It's good to do a yum clean all after the base install.

Host certificates are not required in the TORQUE configuration.

The TORQUE and CREAM servers will be built on separate hardware, so note the following. There will be no BLAH parser running on this TORQUE server. Instead, the CREAM server will itself run a suite of BLAH programs to interface to this TORQUE server remotely (meaning it will use qstat etc. to get job status). To facilitate this, it is necessary for this TORQUE server to export certain log directories (see NFS exports, below) so that the BLAH parser running remotely on the CREAM server can keep itself up to date with events. For reference, this configuration uses the New BLAH parser.


Odd users and groups

Get rid of any group called stap-server, which will collide with YAIM if it's left in.

Add a new group for munge, e.g.

/usr/sbin/groupadd -g 200 munge

Yum repositories and packages

The UMD yum repositories can be downloaded as RPMs from http://repository.egi.eu/category/umd_releases/distribution/umd_1/ . It is necessary to install the RPMs for both the epel and umd-release repos. Since it's not provided in EPEL or UMD, you'll also need the egi-trustanchors.repo, published as text here: http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo . Finally, for historical reasons related to our build system, we also installed these two repos from the gLite 3.2 instructions: jpackage5.0.repo and dag.repo.
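
To illustrate, the repository set-up might look something like this. The exact RPM file names under the UMD-1 distribution page change over time, so treat the URLs below as placeholders:

# Install the EPEL and umd-release repo RPMs (placeholder file names -
# check the UMD-1 distribution page for the current ones)
rpm -ivh http://repository.egi.eu/.../epel-release.noarch.rpm
rpm -ivh http://repository.egi.eu/.../umd-release.noarch.rpm

# Drop the EGI trust anchors repo file into place
wget -O /etc/yum.repos.d/egi-trustanchors.repo \
  http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo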

For the sake of completeness, here's a list of the other repos that existed in the example install as a by-product of the OS install (sl-testing.repo, sl-srpms.repo, sl-fastbugs.repo, sl-debuginfo.repo, sl-contrib.repo, sl.repo, sl-security.repo, atrpms.repo, adobe.repo), a local-sl5.repo for some packages that are private to this site, and one repo that is redundant for TORQUE but is used by CREAM (CA.repo).

Packages

We needed to install two packages to make priorities work: yum-protectbase and yum-priorities.

Then, with respect to the installation, we needed these packages:

java-1.6.0-sun-compat
java-1.6.0-openjdk
emi-torque-server
ca-policy-egi-core
emi-torque-utils

Note on this: for the EMI packages, use yum update but disable the sl-base and sl-security repos (--disablerepo=sl-base --disablerepo=sl-security), otherwise it pulls in the gcj version of Java, which is a disaster.
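
To illustrate (this is a sketch rather than a verbatim transcript of the example install), the package step might look like:

# Priority/protection plugins first
yum install yum-protectbase yum-priorities

# EMI/UMD packages, keeping the SL gcj java out of the way
yum install --disablerepo=sl-base --disablerepo=sl-security \
    java-1.6.0-sun-compat java-1.6.0-openjdk \
    emi-torque-server emi-torque-utils ca-policy-egi-core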

NFS exports

Prior to YAIM, it is necessary to make some shared directories and export them with some lines like this in the NFS /etc/exports file:

/var/torque                   192.168.0.0/16(ro,async,no_root_squash) 138.253.178.0/24(ro,async,no_root_squash)
/etc/grid-security/gridmapdir 192.168.0.0/16(rw,async,no_root_squash) 138.253.178.0/24(rw,async,no_root_squash)
/opt/edg/var/info             192.168.0.0/16(rw,async,no_root_squash) 138.253.178.0/24(rw,async,no_root_squash)

The torque and gridmapdir directories are for the log files and security details, respectively. The other dir, info, is for the software tags. It isn't used on this server, but it's a good place to keep it if multiple CEs ever access your TORQUE server (it is shared for the WNs and maybe the CE).
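
A minimal sketch of preparing and exporting those directories, assuming they don't already exist, might be:

mkdir -p /var/torque /etc/grid-security/gridmapdir /opt/edg/var/info

# Re-read /etc/exports and check what is being offered
exportfs -ra
showmount -e localhost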

Munge

As mentioned above, munge needs its own group. In the example, the munge package came from (I think) the EPEL repository and was pulled in automatically as a dependency of TORQUE. If you install some other way, make sure munge ends up installed regardless.

The idea of munge is that all the servers in a certain group share the same “munge key”, and they are thus able to authenticate themselves to each other (and possibly encrypt traffic). It is therefore necessary for all the systems (TORQUE, CREAM and WNs) to have copies of the same munge key. In our case, we use Puppet to distribute it. However you do it, make a munge key using /usr/sbin/create-munge-key on some system that has munge installed (this TORQUE server is a natural choice) and use the resulting key on all the systems. The munge key is installed as shown below and must be installed identically on all systems in the example cluster. That's another reason why it's good to build the TORQUE server first, along with the NFS shares.

[root@mace yum.repos.d]# ls -lrt /etc/munge/munge.key 
-r-------- 1 munge munge 1024 Apr  4 19:14 /etc/munge/munge.key
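
For reference, a by-hand sketch of generating the key and pushing it to another cluster member looks like this (the target hostname is just an example; the example site uses Puppet for this step instead):

# On the node where the key is generated
/usr/sbin/create-munge-key

# Copy it to another cluster member and fix ownership/permissions there
scp /etc/munge/munge.key root@r31-n01.ph.famous.ac.uk:/etc/munge/munge.key
ssh root@r31-n01.ph.famous.ac.uk 'chown munge:munge /etc/munge/munge.key; chmod 400 /etc/munge/munge.key; service munge restart'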

Other files and settings

  • All systems in the example must have ssh installed with hostbased authentication enabled, sharing a copy of the same shosts.equiv (see the sketch after this list).
  • Install a /var/torque/server_priv/nodes file that enumerates all the nodes in your cluster, e.g.
r31-n01.ph.famous.ac.uk np=8 lcgpro
r31-n02.ph.famous.ac.uk np=8 lcgpro
...
  • Install a /var/spool/maui/maui.cfg file that contains all the rules for the Maui scheduler. The syntax for this file is out of scope here.
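
Here is a minimal sketch of the hostbased ssh pieces mentioned in the first bullet, assuming stock OpenSSH; the hostnames are illustrative:

# /etc/ssh/shosts.equiv -- one trusted host per line, identical on all systems
mace.ph.famous.ac.uk
hepgrid5.ph.famous.ac.uk
r31-n01.ph.famous.ac.uk
r31-n02.ph.famous.ac.uk

# /etc/ssh/sshd_config (server side)
HostbasedAuthentication yes

# /etc/ssh/ssh_config (client side)
HostbasedAuthentication yes
EnableSSHKeysign yes

# Each host's public key must also appear in /etc/ssh/ssh_known_hosts on every system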

Yaim

I suggest you nominate one of your existing glite site-info.def files as a standard, then edit it to make new ones, as required. You'll need to put your “standard” file in /root/glitecfg/site-info.def, and make sure your /opt/glite/yaim/etc/users.conf and /opt/glite/yaim/etc/groups.conf files are in place.

Here are some specific YAIM variables that we needed to use in the site-info.def file:

TORQUE_SERVER=mace.ph.famous.ac.uk
CONFIG_MAUI=yes
CONFIG_TORQUE_NODES=yes
CONFIG_GRIDMAPDIR=yes
TORQUE_VAR_DIR=/var/torque
BATCH_SERVER=$TORQUE_SERVER
JOB_MANAGER=pbs
CE_BATCH_SYS=pbs
BATCH_BIN_DIR=/usr/bin
BATCH_VERSION=torque-2.5.7-7
BATCH_LOG_DIR=/var/torque/

Run the YAIM command:

yaim -c -s /root/glitecfg/site-info.def -n TORQUE_server -n TORQUE_utils


More files and settings

  • Install and run the qmgr commands, e.g. write a file called qmgr.conf with (something like) this inside:
#
create queue long
set queue long queue_type = Execution
set queue long resources_max.cput = 48:00:00
set queue long resources_max.walltime = 72:00:00
set queue long resources_default.ncpus = 1
set queue long resources_default.walltime = 48:00:00
set queue long acl_group_enable = True
set queue long acl_groups = alice
set queue long acl_groups += alicesgm
set queue long acl_groups += aliceprd
set queue long acl_groups += atlas
...
set queue long enabled = True
set queue long started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = root@mace.ph.famous.ac.uk
set server operators = root@mace.ph.famous.ac.uk
set server operators += root@hepgrid5.ph.famous.ac.uk

set server submit_hosts = root@mace.ph.famous.ac.uk
set server submit_hosts += root@hepgrid5.ph.famous.ac.uk

set server authorized_users=*@hepgrid5.ph.famous.ac.uk

set server default_queue = long
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server default_node = lcgpro
set server node_pack = False
set server kill_delay = 10
set server next_job_number = 1

Note in particular the “set server authorized_users” commands; there should be one for each CE (only one in this case, hepgrid5).

Then run this command to load the configuration into the pbs_server.

qmgr < /root/scripts/qmgr.conf
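
Once that has loaded, a quick sanity check with the standard TORQUE client commands might be:

# Show the server configuration as pbs_server now sees it
qmgr -c 'print server'

# Confirm the queue exists and is enabled/started, and that nodes are listed
qstat -q
pbsnodes -a | head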

Hopefully, that will complete the TORQUE server installation.

Building the CREAM server

Prepare a server to host the CREAM software. Our server (a VM running under KVM) had 2 CPUs, 2 GB of RAM and a 35 GB disk. The example server is called hepgrid5.ph.famous.ac.uk, and it was initially set up with a base install of SL5.5. It's good to do a yum clean all after the base install.

Host certificates are required in this CREAM configuration, e.g.

[root@hepgrid5 ~]# ls -lrt /etc/grid-security/host*
-rw-r--r-- 1 root root 2146 Apr  4 19:25 /etc/grid-security/hostcert.pem
-r-------- 1 root root 1845 Apr  4 19:27 /etc/grid-security/hostkey.pem

Odd users and groups

Get rid of any group called stap-server, which will collide with YAIM if it's left in.

Add a new group for munge, e.g.

/usr/sbin/groupadd -g 200 munge

Yum repositories and packages

Use the same yum repos as you did for the TORQUE setup, above.

Packages

We needed to install two packages to make priorities work: yum-protectbase and yum-priorities.

Then, with respect to the installation, we needed these packages:

lcg-CA
emi-cream-ce
java-1.6.0-sun-compat
emi-torque-utils
ca-policy-egi-core

Note on this: for the tomcat5 package, use yum update but disable the sl-base and sl-security repos (--disablerepo=sl-base --disablerepo=sl-security), otherwise it pulls in the gcj version of Java, which is a disaster.
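
As with the TORQUE node, an illustrative (not verbatim) package step might be:

yum install yum-protectbase yum-priorities
yum install --disablerepo=sl-base --disablerepo=sl-security \
    lcg-CA emi-cream-ce java-1.6.0-sun-compat \
    emi-torque-utils ca-policy-egi-core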

NFS Mounts

When building the TORQUE server, three directories were exported. It is necessary to make mount point directories on this CREAM server for those shared directories, and then put some mount entries in /etc/fstab to make sure they get mounted, e.g.

mace:/var/torque        /var/torque     nfs     ro,sync,intr    0       0
mace:/opt/edg/var/info        /opt/edg/var/info       nfs     rw,sync,intr    0       0
mace:/etc/grid-security/gridmapdir      /etc/grid-security/gridmapdir   nfs     rw,sync,intr    0       0
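
A minimal sketch of creating the mount points and bringing the mounts up:

mkdir -p /var/torque /opt/edg/var/info /etc/grid-security/gridmapdir

# Mount everything listed in /etc/fstab and check the shares are visible
mount -a
df -h | grep mace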

Munge

Configure munge similarly to the TORQUE server.

Other files and settings

  • All systems in the example must have ssh installed with hostbased authentication, sharing a copy of the same shosts.equiv.
  • CREAM requires a /root/glitecfg/services/glite-creamce file to set it up. This file (on the example cluster) looks like this:
[root@hepgrid5 services]# cat /root/glitecfg/services/glite-creamce
CEMON_HOST=hepgrid5.ph.famous.ac.uk
CREAM_DB_USER=dpmuser
CREAM_DB_PASSWORD=nufc4EVA
BLPARSER_HOST=mace.ph.famous.ac.uk # Don't think this is needed?

ACCESS_BY_DOMAIN=no
BLP_PORT=33335
CREAM_PORT=56567

CREAM_CE_STATE=Special
BLAH_JOBID_PREFIX=cr003_

BLPARSER_WITH_UPDATER_NOTIFIER=true

We also need an /etc/grid-security/admin-list file, e.g.

[root@hepgrid5 services]# cat /etc/grid-security/admin-list
/C=UK/O=eScience/OU=FamousUni/L=CSD/CN=some user 
/C=UK/O=eScience/OU=FamousUni/L=CSD/CN=someother user
/C=UK/O=eScience/OU=FamousUni/L=CSD/CN=yetanother user

A change to the MySQL config is recommended. Edit /etc/my.cnf so that it contains an innodb_buffer_pool_size parameter, as below (note: some users recommend an even bigger value, e.g. 1024M).

innodb_buffer_pool_size=256M
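
The new value only takes effect after mysqld is restarted; a quick way to check it took hold might be:

service mysqld restart
# Add -u root -p if credentials are required on your MySQL server
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"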

Yaim

Again, I suggest you nominate one of your existing glite site-info.def files as a standard, then edit it to make new ones, as required. You'll need to put your “standard” file in /root/glitecfg/site-info.def, and make sure your /opt/glite/yaim/etc/users.conf and /opt/glite/yaim/etc/groups.conf files are in place.

Here are some CE-specific YAIM variables that we needed to use in the site-info.def file. Edit to fit your needs.

CE_DATADIR=""
CE_HOST=hepgrid5.$MY_DOMAIN
TORQUE_SERVER=mace.ph.famous.ac.uk
CE_OTHERDESCR=Cores=4,Benchmark=14.59-HEP-SPEC06
CE_SI00=3805
CE_CAPABILITY="CPUScalingReferenceSI00=3805 Share=atlas:63 Share=lhcb:25 glexec"
CE_SF00=1820
CE_CPU_MODEL=Xeon
CE_CPU_VENDOR=intel
CE_CPU_SPEED=2400
CE_OS=ScientificSL
CE_OS_RELEASE=5.5
CE_OS_VERSION=Boron
CE_OS_ARCH=x86_64
CE_MINPHYSMEM=16384
CE_MINVIRTMEM=16384
CE_PHYSCPU=0
CE_LOGCPU=0
CE_SMPSIZE=8
CE_OUTBOUNDIP=TRUE
CE_INBOUNDIP=FALSE
CE_RUNTIMEENV="whatever ...
CE_BATCH_SYS=pbs

Furthermore, the site-info.def file in the example is identical to the one used for the TORQUE server, with just the deltas below, which relate to the CPU counts and whether YAIM should configure TORQUE/Maui. A special note slightly related to this: setting the CPU counts to zero on one CE and to non-zero values on another CE allows multiple CEs to feed into the same batch system without double counting the CPU power (see http://map2.ph.liv.ac.uk/2010/03/22/capacity-publishing-and-accounting/).

CE_PHYSCPU=142   # add your counts here
CE_LOGCPU=568
CONFIG_MAUI=no
CONFIG_TORQUE_NODES=no

Run the YAIM command:

yaim -c -s /root/glitecfg/site-info.def -n creamCE -n TORQUE_utils


More files and settings

None to speak of.


Hopefully, that will complete the CREAM server installation.

Building the worker nodes

TBD-sj-130412

Testing it out

TBD-sj-130412

Tips

Some GridPP users have provided this feedback, which may help.

  • If you have a small percentage of jobs failing with munge authentication problems (due to a bug in the torque version which is in EPEL), build a more recent version of torque from source (e.g. 2.5.10).
  • Increase PBS_NET_MAX_CONNECTIONS in src/include/server_limits.h if you start getting occasional jobs aborting with errors like this in the CE logs:
BLAH error: submission command failed (exit code = 1) (stdout:) (stderr:qsub: Error (15033 - Batch protocol error
  • Reduce the blah purge time parameter to "purge_interval=1000000" in /etc/blah.config. This prevents the BLAH registry becoming very large, which slows down submission.
  • Note that if you have separate TORQUE and CREAM CE machines you will need to allow *both* access to the APEL accounting DB on the APEL node. Log on to the DB and do the following for both:

GRANT ALL ON accounting.* TO 'accounting'@'<host_name>' identified by '<password>';
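
For the example cluster that would mean something like the following (the hostnames are the example ones from above; keep whatever account name and password your APEL set-up already uses):

GRANT ALL ON accounting.* TO 'accounting'@'mace.ph.famous.ac.uk' IDENTIFIED BY '<password>';
GRANT ALL ON accounting.* TO 'accounting'@'hepgrid5.ph.famous.ac.uk' IDENTIFIED BY '<password>';
FLUSH PRIVILEGES;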


This page is a Key Document, and is the responsibility of Steve Jones. It was last reviewed on 2014-09-01 when it was considered to be 90% complete. It was last judged to be accurate on 2014-09-01.