Enable Cgroups in HTCondor

The following are the steps to enable cgroups on an HTCondor WN. In this example we use node067 as a representative node.


1. Ensure the libcgroup package is installed; if not, install it with yum.

node067:~# rpm -qa | grep cgroup
libcgroup-0.37-7.el6.x86_64
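
If the rpm query returns nothing, the package can be installed with yum:

node067:~# yum install -y libcgroup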

2. Add a group htcondor to /etc/cgconfig.conf

node067:~# cat /etc/cgconfig.conf
mount {
      cpu     = /cgroup/cpu;
      cpuset  = /cgroup/cpuset;
      cpuacct = /cgroup/cpuacct;
      devices = /cgroup/devices;
      memory  = /cgroup/memory;
      freezer = /cgroup/freezer;
      net_cls = /cgroup/net_cls;
      blkio   = /cgroup/blkio;
}
group htcondor {
      cpu {}
      cpuacct {}
      memory {}
      freezer {}
      blkio {}
}

3. Start the cgconfig daemon; a directory htcondor will be created under /cgroup/*/

node067:~# service cgconfig start
node067:~# chkconfig cgconfig on
node067:~# ll -d  /cgroup/memory/htcondor/
drwxr-xr-x. 66 root root 0 Oct  9 11:58 /cgroup/memory/htcondor/
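
Optionally, the group can be checked for every controller configured in /etc/cgconfig.conf; this assumes the lscgroup utility from the libcgroup package is available on the node:

node067:~# lscgroup | grep htcondor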

4. In the condor WN configuration, add the following lines for the STARTD daemon and then restart the startd daemon:

# Enable CGROUP control
BASE_CGROUP = htcondor
# hard: job can't access more physical memory than allocated
# soft: job can access more physical memory than allocated when there is free memory
CGROUP_MEMORY_LIMIT_POLICY = soft
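
A minimal sketch of checking the new settings and restarting the startd, assuming the standard condor_config_val and condor_restart tools are in the path on the WN:

node067:~# condor_config_val BASE_CGROUP CGROUP_MEMORY_LIMIT_POLICY
node067:~# condor_restart -daemon startd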

When there are jobs running on this WN, a set of condor_tmp_condor_slot* directories will be created under /cgroup/*/htcondor/:

node067:~# ll -d  /cgroup/memory/htcondor/condor_tmp_condor_slot1_*
drwxr-xr-x. 2 root root 0 Oct  8 09:22 /cgroup/memory/htcondor/condor_tmp_condor_slot1_10@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  9 06:26 /cgroup/memory/htcondor/condor_tmp_condor_slot1_11@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  9 05:02 /cgroup/memory/htcondor/condor_tmp_condor_slot1_12@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  9 05:18 /cgroup/memory/htcondor/condor_tmp_condor_slot1_13@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  9 10:42 /cgroup/memory/htcondor/condor_tmp_condor_slot1_14@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  8 12:32 /cgroup/memory/htcondor/condor_tmp_condor_slot1_15@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  9 06:52 /cgroup/memory/htcondor/condor_tmp_condor_slot1_16@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  9 08:43 /cgroup/memory/htcondor/condor_tmp_condor_slot1_17@node067.beowulf.cluster
drwxr-xr-x. 2 root root 0 Oct  9 06:14 /cgroup/memory/htcondor/condor_tmp_condor_slot1_18@node067.beowulf.cluster

From these directories you can retrieve the resource usage information recorded by the cgroup controllers.
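
As an illustration, the per-slot accounting can be read straight from the cgroup filesystem; for example, using one of the slot directories listed above and assuming the matching directory exists under the cpuacct controller:

node067:~# cat /cgroup/memory/htcondor/condor_tmp_condor_slot1_10@node067.beowulf.cluster/memory.max_usage_in_bytes
node067:~# cat /cgroup/cpuacct/htcondor/condor_tmp_condor_slot1_10@node067.beowulf.cluster/cpuacct.usage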

More information can be found in the HTCondor manual.


Glasgow Scheduling Modifications

To improve scheduling on the Glasgow cluster we statically assign a memory amount based on the type of job in the system. This gives fine-grained control over our memory overcommit and allows us to restrict the number of jobs we run on our memory-constrained systems. There are other ways to do this, but this approach lets us play with the parameters to see what works.

In the submit-condor-job script found in /usr/share/arc/ we alter the following section to look like this:

############################################################## 
# Requested memory (mb)
##############################################################
set_req_mem
if [ ! -z "$joboption_memory" ] ; then
 memory_bytes=$(( 2000 * 1024 ))
 memory_req=2000
 # HTCondor needs to know the total memory for the job, not memory per core
 if [ ! -z "$joboption_count" ] && [ "$joboption_count" -gt 1 ] ; then
    memory_bytes=$(( $joboption_count * 2000 * 1024 ))
    memory_req=$(( $joboption_count * 2000 ))
 fi
 memory_bytes=$(( $memory_bytes + 4000 * 1024  ))  # +4GB extra as hard limit
 echo "request_memory=$memory_req" >> $LRMS_JOB_DESCRIPT
 echo "+JobMemoryLimit=$memory_bytes" >> $LRMS_JOB_DESCRIPT
 REMOVE="${REMOVE} || ResidentSetSize > JobMemoryLimit"
fi
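
As a worked example, an 8-core job submitted through this script gets request_memory=16000 (MB) and +JobMemoryLimit=20480000 (8*2000*1024 + 4000*1024), i.e. a hard limit of roughly 20 GB expressed in the same KiB units as ResidentSetSize.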

RAL Modifications

In the submit-condor-job script found in /usr/share/arc/ we comment out the line:

 REMOVE="${REMOVE} || ResidentSetSize > JobMemoryLimit"

to ensure that jobs are not killed if their memory usage exceeds the requested memory. To put a hard memory limit on jobs we include the following in SYSTEM_PERIODIC_REMOVE:

 ResidentSetSize > 3000*RequestMemory
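
A minimal sketch of how this could look in the HTCondor configuration; any other clauses in SYSTEM_PERIODIC_REMOVE are site-specific and omitted here:

# Remove jobs whose resident set size grows to roughly three times the
# requested memory (ResidentSetSize is in KiB, RequestMemory in MiB)
SYSTEM_PERIODIC_REMOVE = (ResidentSetSize > 3000*RequestMemory)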