HTCondor Jobs In Containers
This page explains two methods of running LHC jobs in SL6 containers using HTCondor with SL7 worker nodes.
The SL7 or CentOS 7 worker nodes should have CVMFS, HTCondor, Docker engine, CA certs and fetch-crl installed. See https://docs.docker.com/engine/installation/linux/rhel/ for information on how to install Docker engine.
Usually autofs is used with CVMFS but this will not work with Docker - you will likely get errors of the form:
ls: cannot open directory /cvmfs/cms.cern.ch: Too many levels of symbolic links
Each CVMFS repository must be mounted manually, e.g.
mkdir -p /cvmfs/grid.cern.ch /cvmfs/cms.cern.ch /cvmfs/atlas.cern.ch
mount -t cvmfs grid.cern.ch /cvmfs/grid.cern.ch
mount -t cvmfs cms.cern.ch /cvmfs/cms.cern.ch
mount -t cvmfs atlas.cern.ch /cvmfs/atlas.cern.ch
In a production environment, /etc/fstab should of course be used instead. For example:
atlas /cvmfs/atlas.cern.ch cvmfs defaults 0 0
cms /cvmfs/cms.cern.ch cvmfs defaults 0 0
grid /cvmfs/grid.cern.ch cvmfs defaults 0 0
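If many repositories are needed, the fstab entries can be generated rather than typed by hand. The following is a minimal sketch, assuming every repository lives under cern.ch and is referred to by the short name used in the entries above. The function name cvmfs_fstab_line is hypothetical.

```shell
# Print one /etc/fstab line for a CVMFS repository under cern.ch.
# $1 is the short repository name, e.g. "cms" (assumed naming scheme).
cvmfs_fstab_line() {
    printf '%s /cvmfs/%s.cern.ch cvmfs defaults 0 0\n' "$1" "$1"
}

# Print the entries for the three repositories used on this page.
for repo in atlas cms grid; do
    cvmfs_fstab_line "$repo"
done
```

The output is written to the terminal only. Review it before appending it to /etc/fstab as root.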
Add the condor user to the Docker group so that HTCondor has permission to run containers:
usermod -aG docker condor
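It can be worth confirming that the condor user really is in the docker group afterwards. The following is a minimal sketch, assuming the group list is obtained from id -nG condor on the worker node. The helper name has_group is hypothetical, and the example call uses a literal list so that it runs anywhere.

```shell
# Return 0 if group $2 appears in the space-separated group list $1.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a literal list; on a worker node pass "$(id -nG condor)" instead.
if has_group "condor docker" docker; then
    echo "condor is in the docker group"
else
    echo "condor is NOT in the docker group"
fi
```

Padding both strings with spaces makes the match apply to whole group names, so a group such as dockerroot is not mistaken for docker.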
Some additional HTCondor configuration is required in order to automatically bind mount CVMFS and /etc/grid-security into all Docker containers run by HTCondor:
DOCKER_VOLUMES = CVMFS, GRID_SECURITY, PASSWD, GROUP
DOCKER_VOLUME_DIR_CVMFS = /cvmfs:/cvmfs:ro
DOCKER_VOLUME_DIR_GRID_SECURITY = /etc/grid-security:/etc/grid-security:ro
DOCKER_VOLUME_DIR_PASSWD = /etc/passwd:/etc/passwd:ro
DOCKER_VOLUME_DIR_GROUP = /etc/group:/etc/group:ro
DOCKER_MOUNT_VOLUMES = CVMFS, GRID_SECURITY, PASSWD, GROUP
Here we also bind mount /etc/passwd and /etc/group into the containers so that pool accounts are available. The pool accounts must be configured on the host: for HTCondor to run a job as a particular user, that user must exist on the host.
With HTCondor 8.5.8 and above it is possible to specify which directories to mount in containers using an expression. For example, with this configuration:
DOCKER_VOLUME_DIR_GRID_SECURITY = /etc/grid-security:/etc/grid-security:ro
DOCKER_VOLUME_DIR_PASSWD = /etc/passwd:/etc/passwd:ro
DOCKER_VOLUME_DIR_GROUP = /etc/group:/etc/group:ro
DOCKER_VOLUME_DIR_CVMFS_GRID = /cvmfs/grid.cern.ch:/cvmfs/grid.cern.ch:ro
DOCKER_VOLUME_DIR_CVMFS_CMS = /cvmfs/cms.cern.ch:/cvmfs/cms.cern.ch:ro
DOCKER_VOLUME_DIR_CVMFS_ATLAS = /cvmfs/atlas.cern.ch:/cvmfs/atlas.cern.ch:ro
DOCKER_MOUNT_VOLUMES = GRID_SECURITY, PASSWD, GROUP, CVMFS_GRID, CVMFS_CMS, CVMFS_ATLAS
DOCKER_VOLUME_DIR_CVMFS_CMS_MOUNT_IF = regexp("cms",Owner)
DOCKER_VOLUME_DIR_CVMFS_ATLAS_MOUNT_IF = regexp("atl",Owner)
the CMS CVMFS repository would only be available to jobs which are running with a username containing "cms" and the ATLAS CVMFS repository would only be available to jobs which are running with a username containing "atl". Other volumes, such as the grid CVMFS repository, are made available to all jobs.
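The effect of the two MOUNT_IF expressions can be illustrated outside HTCondor. The following is a rough sketch that only emulates the matching in shell, assuming the expressions behave as described above (a case-sensitive match anywhere in the username). The function name and the pool account names are hypothetical.

```shell
# Emulate which CVMFS volumes would be mounted for a given job owner.
# A glob such as *cms* matches anywhere in the name, like regexp("cms",Owner).
cvmfs_volumes_for_owner() {
    volumes="CVMFS_GRID"    # no MOUNT_IF expression, so always mounted
    case $1 in *cms*) volumes="$volumes CVMFS_CMS" ;; esac
    case $1 in *atl*) volumes="$volumes CVMFS_ATLAS" ;; esac
    echo "$volumes"
}

# Hypothetical pool account names.
for owner in cms001 atlprd042 lhcb007; do
    echo "$owner: $(cvmfs_volumes_for_owner "$owner")"
done
```

Because the match is not anchored, any account whose name merely contains "cms" or "atl" gets the corresponding repository, which is worth keeping in mind when choosing pool account names.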
If you want to force all jobs to run in Docker containers by default, this can be done with configuration like the following:
WantDocker = True
DockerImage = "alahiff/grid-worker-node:1"
SUBMIT_EXPRS = $(SUBMIT_EXPRS), WantDocker, DockerImage
where the image name should be changed as appropriate. In an environment where jobs could be run on either normal worker nodes or in containers (e.g. during migration from SL6 to SL7 with the Docker universe), it is probably better to control the number of jobs requesting the Docker universe by using a job router. E.g.
For the second method, which uses Singularity, CVMFS needs to be configured to include the cernvm-prod.cern.ch repository, since this is where the container root filesystems will come from. This means that the variable CVMFS_REPOSITORIES in /etc/cvmfs/default.local should contain cernvm-prod.cern.ch. Also, /etc/fstab should contain the CVMFS repositories, for example:
atlas /cvmfs/atlas.cern.ch cvmfs defaults 0 0
cms /cvmfs/cms.cern.ch cvmfs defaults 0 0
grid /cvmfs/grid.cern.ch cvmfs defaults 0 0
cernvm-prod /cvmfs/cernvm-prod.cern.ch cvmfs defaults 0 0
Follow the instructions at http://singularity.lbl.gov/install-linux to build the Singularity RPM, which should be installed on your SL7/CentOS 7/RHEL 7 worker nodes.
HTCondor 8.5.8 or above must be used on the worker nodes. The HTCondor configuration should be almost the same as what you are already using on SL6, except that PID namespaces should be disabled. Some additional configuration is needed for Singularity. The following example HTCondor configuration runs all jobs in an SL6 CernVM container using Singularity, with CVMFS mounted inside the containers:
SINGULARITY = /usr/bin/singularity
SINGULARITY_JOB = true
SINGULARITY_IMAGE_EXPR = "/cvmfs/cernvm-prod.cern.ch/cvm3"
SINGULARITY_TARGET_DIR = /srv
SINGULARITY_BIND_EXPR = "/cvmfs"
MOUNT_UNDER_SCRATCH = /tmp, /var/tmp
Note that by default /etc/grid-security/certificates is a symbolic link to /cvmfs/grid.cern.ch/etc/grid-security/certificates, so in theory you do not need to install CA certs and run fetch-crl on the worker nodes, nor bind mount /etc/grid-security/certificates from the host into the containers.
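Whether the certificates directory really is such a link can be checked before relying on it. The following is a minimal sketch, assuming it is run wherever the link is expected to exist (presumably inside the container). The function name links_into_cvmfs is hypothetical.

```shell
# Return 0 if $1 is a symbolic link whose target lies under /cvmfs/.
links_into_cvmfs() {
    [ -L "$1" ] || return 1
    case "$(readlink "$1")" in
        /cvmfs/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the path named in the text above.
if links_into_cvmfs /etc/grid-security/certificates; then
    echo "CA certificates are provided via CVMFS"
else
    echo "CA certificates are not provided via CVMFS"
fi
```

The check only inspects the link target. It does not verify that the grid.cern.ch repository is actually mounted and readable.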