HTCondor Jobs In Containers

From GridPP Wiki

Latest revision as of 10:04, 9 July 2017

This page describes the setup at RAL for running jobs in CentOS 6 containers on SL7 worker nodes.

Worker nodes

The SL7 worker nodes are configured to be as close as possible to our original SL6 worker nodes. The exceptions are:

  • No grid middleware installed
  • No HEP_OSlibs_SL6 RPM and dependencies or equivalent RPMs
  • glexec, lcas and lcmaps configuration files only (no RPMs are required)

We are currently using Docker 17.03.0-ce. We found that the only reliable choice of storage driver is OverlayFS, which now seems to be the default on RHEL7-based systems. The file /etc/docker/daemon.json contains:

{
    "storage-driver": "overlay",
    "graph": "/pool/docker"
}

The partition /pool is an XFS filesystem formatted with the option -n ftype=1. This is essential: the overlay storage driver relies on the directory entry file type support that this option enables, and without it there will be many kernel errors.
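
The two requirements above (an overlay storage driver with a configured graph directory, and ftype=1 on the XFS partition) can be checked with a small script. The following is only a sketch: the function names are hypothetical, and it assumes that xfs_info reports the setting as a "ftype=1" token in its output.

```python
import json
import re


def check_daemon_config(text):
    """Return a list of problems found in the text of a daemon.json file."""
    cfg = json.loads(text)
    problems = []
    # "overlay" is the value used on this page; "overlay2" would also pass here.
    if not str(cfg.get("storage-driver", "")).startswith("overlay"):
        problems.append("storage-driver is not an overlay driver")
    if "graph" not in cfg:
        problems.append("no graph directory set")
    return problems


def xfs_ftype_enabled(xfs_info_output):
    """Return True if the given xfs_info output reports ftype=1."""
    match = re.search(r"\bftype=(\d)", xfs_info_output)
    return match is not None and match.group(1) == "1"
```

To use it on a worker node, pass the contents of /etc/docker/daemon.json to check_daemon_config and the output of "xfs_info /pool" to xfs_ftype_enabled.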

We are using CVMFS 2.3.2. To ensure that CVMFS mounts survive autofs restarts, create the file /etc/systemd/system/autofs.service.d/fuse.conf with the following contents:

[Service]
KillMode=process

HTCondor 8.6.3 or above is recommended. Add the following to sudoers to enable HTCondor to use the Docker CLI as root:

User_Alias      CONDORUSER = condor
Cmnd_Alias      DOCKERCMD = /usr/bin/docker
CONDORUSER      ALL = NOPASSWD: DOCKERCMD

and add the following line to the HTCondor configuration:

DOCKER = sudo /usr/bin/docker

The alternative method of giving HTCondor permission to run containers, i.e. adding the condor user to the docker group, is problematic with Docker 1.13.1 and above (Docker commands will try to read a config file from /root and not have permission to do so).

Our full HTCondor configuration relating to Docker is as follows:

DOCKER = sudo /usr/bin/docker
DOCKER_DROP_ALL_CAPABILITIES=regexp("pilot",x509UserProxyFirstFQAN) =?= False
DOCKER_MOUNT_VOLUMES=GRID_SECURITY, MJF, GRIDENV, GLEXEC, LCMAPS, LCAS, PASSWD, GROUP, CVMFS, CGROUPS, ATLAS_RECOVERY, ETC_ATLAS, ETC_CMS, ETC_ARC
DOCKER_VOLUME_DIR_ATLAS_RECOVERY=/pool/atlas/recovery:/pool/atlas/recovery
DOCKER_VOLUME_DIR_ATLAS_RECOVERY_MOUNT_IF=regexp("atl",Owner)
DOCKER_VOLUME_DIR_CGROUPS=/sys/fs/cgroup:/sys/fs/cgroup:ro
DOCKER_VOLUME_DIR_CGROUPS_MOUNT_IF=regexp("atl",Owner)
DOCKER_VOLUME_DIR_CVMFS=/cvmfs:/cvmfs:shared
DOCKER_VOLUME_DIR_ETC_ARC=/etc/arc:/etc/arc:ro
DOCKER_VOLUME_DIR_ETC_ATLAS=/etc/atlas:/etc/atlas:ro
DOCKER_VOLUME_DIR_ETC_ATLAS_MOUNT_IF=regexp("atl",Owner)
DOCKER_VOLUME_DIR_ETC_CMS=/etc/cms:/etc/cms:ro
DOCKER_VOLUME_DIR_ETC_CMS_MOUNT_IF=regexp("cms",Owner)
DOCKER_VOLUME_DIR_GLEXEC=/etc/glexec.conf:/etc/glexec.conf:ro
DOCKER_VOLUME_DIR_GRIDENV=/etc/profile.d/grid-env.sh:/etc/profile.d/grid-env.sh:ro
DOCKER_VOLUME_DIR_GRID_SECURITY=/etc/grid-security:/etc/grid-security:ro
DOCKER_VOLUME_DIR_GROUP=/etc/group:/etc/group:ro
DOCKER_VOLUME_DIR_LCAS=/etc/lcas:/etc/lcas:ro
DOCKER_VOLUME_DIR_LCMAPS=/etc/lcmaps:/etc/lcmaps:ro
DOCKER_VOLUME_DIR_MJF=/etc/machinefeatures:/etc/machinefeatures:ro
DOCKER_VOLUME_DIR_PASSWD=/etc/passwd:/etc/passwd:ro

Some comments on this:

  • By default HTCondor drops all Linux capabilities in the containers it runs. This prevents glexec from working, so we unfortunately have to keep all standard capabilities for jobs using the pilot role.
  • Directories such as /cvmfs, /etc/grid-security, /etc/machinefeatures, /etc/lcas and /etc/lcmaps are bind mounted into the containers for all jobs.
  • The glexec config file is bind mounted into the containers.
  • For ATLAS jobs only, /sys/fs/cgroup and the job recovery directory are bind mounted into the containers.
  • /etc/passwd and /etc/group are bind mounted into the containers so that the pool accounts are available.
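
To make the effect of the MOUNT_IF expressions and of DOCKER_DROP_ALL_CAPABILITIES concrete, here is a rough Python model of the configuration above. It is not how HTCondor evaluates ClassAd expressions internally: it assumes that regexp() behaves like an unanchored regular expression search, it does not model an undefined x509UserProxyFirstFQAN, and the account names in the usage note are made up.

```python
import re

# Volume name -> pattern tested against Owner (from the *_MOUNT_IF settings).
# None means the volume has no MOUNT_IF expression and is always mounted.
# The order follows DOCKER_MOUNT_VOLUMES.
MOUNT_IF = {
    "GRID_SECURITY": None, "MJF": None, "GRIDENV": None, "GLEXEC": None,
    "LCMAPS": None, "LCAS": None, "PASSWD": None, "GROUP": None,
    "CVMFS": None,
    "CGROUPS": "atl", "ATLAS_RECOVERY": "atl", "ETC_ATLAS": "atl",
    "ETC_CMS": "cms", "ETC_ARC": None,
}


def volumes_for(owner):
    """Return the volume names that would be mounted for a job owner."""
    return [name for name, pattern in MOUNT_IF.items()
            if pattern is None or re.search(pattern, owner)]


def drops_all_capabilities(first_fqan):
    """Model of regexp("pilot", x509UserProxyFirstFQAN) =?= False."""
    return re.search("pilot", first_fqan) is None
```

For example, volumes_for("atlprd001") includes CGROUPS, ATLAS_RECOVERY and ETC_ATLAS but not ETC_CMS, and drops_all_capabilities("/atlas/Role=pilot/Capability=NULL") is False, so a pilot job keeps its capabilities.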

CEs

We need to ensure that jobs are submitted using the Docker universe with the appropriate image specified rather than the default Vanilla universe. Assuming HTCondor 8.6.x is running on the CEs, a schedd job transform can be used:

JOB_TRANSFORM_NAMES = DefaultDocker
JOB_TRANSFORM_DefaultDocker @=end
[
   Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios";
   set_WantDocker = true;
   eval_set_DockerImage = "alahiff/grid-workernode-c6:20170627.1";
   set_Requirements = ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer );
   copy_TransferInput = "OriginalTransferInput";
   eval_set_TransferInput = strcat(OriginalTransferInput, ",", Cmd);
]
@end
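
The logic of this transform can be sketched in Python, treating a job ClassAd as a plain dictionary. This is an illustration rather than HTCondor code: the function names are hypothetical, the Requirements expression is kept as an opaque string, and the sketch assumes the job already defines TransferInput and Cmd. It applies the copy rule before the set and eval_set rules, which is the order we understand HTCondor to use.

```python
DEFAULT_IMAGE = "alahiff/grid-workernode-c6:20170627.1"

NEW_REQUIREMENTS = (
    "( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && "
    "( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && "
    "( TARGET.HasFileTransfer )"
)


def transform_applies(job):
    """Vanilla universe (5), no image chosen yet, and not a nagios job."""
    return (job.get("JobUniverse") == 5
            and "DockerImage" not in job
            and job.get("Owner") != "nagios")


def apply_default_docker(job):
    """Return a transformed copy of the job; the input is left unchanged."""
    out = dict(job)
    if not transform_applies(job):
        return out
    # copy_TransferInput = "OriginalTransferInput"
    out["OriginalTransferInput"] = out["TransferInput"]
    # set_WantDocker, set_Requirements
    out["WantDocker"] = True
    out["Requirements"] = NEW_REQUIREMENTS
    # eval_set_DockerImage, eval_set_TransferInput
    out["DockerImage"] = DEFAULT_IMAGE
    out["TransferInput"] = out["OriginalTransferInput"] + "," + out["Cmd"]
    return out
```

Jobs that already specify a DockerImage, and jobs owned by nagios, pass through unchanged, which matches the Requirements line of the transform.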

Image

The Dockerfile for the image in use is here: https://github.com/alahiff/grid-workernode/blob/master/centos6/Dockerfile. The contents of the image are based on the standard SL6 worker nodes at RAL.