Enable Cgroups in HTCondor
Revision as of 15:47, 9 October 2014
The following steps enable cgroups on an HTCondor worker node (WN). In this example, node067 serves as a representative node.
1. Ensure the libcgroup package is installed; if it is not, install it with yum.
    node067:~# rpm -qa | grep cgroup
    libcgroup-0.37-7.el6.x86_64
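The check and the conditional install in step 1 can be combined into one small helper. This is only a sketch: the function name ensure_libcgroup is hypothetical, and it assumes an RPM-based node where rpm and yum are available and the shell is running as root.

```shell
# Install libcgroup only when rpm reports that it is missing.
# ensure_libcgroup is a hypothetical helper name, not part of any package.
ensure_libcgroup() {
    if rpm -q libcgroup >/dev/null 2>&1; then
        echo "libcgroup already installed"
    else
        yum install -y libcgroup
    fi
}
```

Calling ensure_libcgroup on the node either confirms the package is present or installs it.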
2. Add a group named htcondor to /etc/cgconfig.conf:
    node067:~# cat /etc/cgconfig.conf
    mount {
        cpu     = /cgroup/cpu;
        cpuset  = /cgroup/cpuset;
        cpuacct = /cgroup/cpuacct;
        devices = /cgroup/devices;
        memory  = /cgroup/memory;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        blkio   = /cgroup/blkio;
    }

    group htcondor {
        cpu {}
        cpuacct {}
        memory {}
        freezer {}
        blkio {}
    }
To restrict memory for the cgroup, add limits inside the memory {} block. The following example caps the physical memory available to the cgroup at 132122908K and the combined physical memory plus swap at 158966452K, which protects the machine's swap space from being overloaded.
    memory {
        memory.use_hierarchy="1";
        memory.limit_in_bytes=132122908K;
        memory.memsw.limit_in_bytes=158966452K;
    }
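The gap between the two limits is the amount of swap the cgroup may use once physical memory is exhausted. The sketch below computes it from the values in this example; the helper name swap_headroom_k is hypothetical, and it assumes both limits are written with the K suffix.

```shell
# Print the swap headroom (in K) implied by the two cgroup memory limits.
# swap_headroom_k is a hypothetical helper; both arguments carry the K suffix
# used in cgconfig.conf.
swap_headroom_k() {
    mem_k=${1%K}      # memory.limit_in_bytes with the suffix stripped
    memsw_k=${2%K}    # memory.memsw.limit_in_bytes with the suffix stripped
    echo $(( $memsw_k - $mem_k ))
}

swap_headroom_k 132122908K 158966452K   # prints 26843544
```

With these numbers the cgroup can use up to 26843544K of swap, roughly 25.6 GiB when K is read as 1024 bytes.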
3. Start the cgconfig service and enable it at boot. A directory named htcondor will be created under /cgroup/*/.
    node067:~# service cgconfig start
    node067:~# chkconfig cgconfig on
    node067:~# ll -d /cgroup/memory/htcondor/
    drwxr-xr-x. 66 root root 0 Oct 9 11:58 /cgroup/memory/htcondor/
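Rather than listing a single controller, the htcondor directory can be checked under every controller named in the group definition. A minimal sketch, assuming the /cgroup mount root used in this example; check_cgroup_group is a hypothetical helper name.

```shell
# Report, for each controller in the htcondor group, whether the group
# directory exists. check_cgroup_group is a hypothetical helper; the root
# defaults to /cgroup as in this example.
check_cgroup_group() {
    group=$1
    root=${2:-/cgroup}
    missing=0
    for ctrl in cpu cpuacct memory freezer blkio; do
        if [ -d "$root/$ctrl/$group" ]; then
            echo "ok      $root/$ctrl/$group"
        else
            echo "missing $root/$ctrl/$group"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_cgroup_group htcondor returns a non-zero status if any controller lacks the directory, which makes it usable in a provisioning script.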
4. In the HTCondor WN configuration, add the following lines for the STARTD daemon, then restart the daemon:
    # Enable CGROUP control
    BASE_CGROUP = htcondor
    # hard: job can't access more physical memory than allocated
    # soft: job can access more physical memory than allocated when free memory is available
    CGROUP_MEMORY_LIMIT_POLICY = soft
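Before restarting the startd, it can help to confirm that HTCondor actually resolves BASE_CGROUP to the expected group. The sketch below uses condor_config_val for that; the helper name verify_base_cgroup is hypothetical, and it assumes condor_config_val is on the PATH of the node.

```shell
# Compare the value HTCondor resolves for BASE_CGROUP with the expected name.
# verify_base_cgroup is a hypothetical helper; condor_config_val must be on PATH.
verify_base_cgroup() {
    expected=${1:-htcondor}
    # An undefined knob makes condor_config_val fail; treat that as empty.
    actual=$(condor_config_val BASE_CGROUP 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
        echo "BASE_CGROUP=$actual"
    else
        echo "BASE_CGROUP is '$actual', expected '$expected'" >&2
        return 1
    fi
}
```

If the check passes, the startd can then be restarted, for instance with condor_restart -daemon startd, assuming that is how daemons are managed at your site.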
Once jobs are running on this WN, condor_tmp_condor_slot* directories are created under /cgroup/*/htcondor/:
    node067:~# ll -d /cgroup/memory/htcondor/condor_tmp_condor_slot1_*
    drwxr-xr-x. 2 root root 0 Oct 8 09:22 /cgroup/memory/htcondor/condor_tmp_condor_slot1_10@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 9 06:26 /cgroup/memory/htcondor/condor_tmp_condor_slot1_11@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 9 05:02 /cgroup/memory/htcondor/condor_tmp_condor_slot1_12@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 9 05:18 /cgroup/memory/htcondor/condor_tmp_condor_slot1_13@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 9 10:42 /cgroup/memory/htcondor/condor_tmp_condor_slot1_14@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 8 12:32 /cgroup/memory/htcondor/condor_tmp_condor_slot1_15@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 9 06:52 /cgroup/memory/htcondor/condor_tmp_condor_slot1_16@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 9 08:43 /cgroup/memory/htcondor/condor_tmp_condor_slot1_17@node067.beowulf.cluster
    drwxr-xr-x. 2 root root 0 Oct 9 06:14 /cgroup/memory/htcondor/condor_tmp_condor_slot1_18@node067.beowulf.cluster
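The files in these directories hold the information recorded for each slot's cgroup. In the memory controller, for example, memory.usage_in_bytes and memory.max_usage_in_bytes report current and peak memory usage.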
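As an illustration of reading that information back, the following sketch prints the peak memory usage recorded for each slot cgroup. The helper name report_slot_peak_memory is hypothetical, and the default path assumes the /cgroup/memory/htcondor layout used in this example.

```shell
# Print peak memory usage (bytes) for every slot cgroup under a group directory.
# report_slot_peak_memory is a hypothetical helper; the default path matches
# the layout used in this example.
report_slot_peak_memory() {
    base=${1:-/cgroup/memory/htcondor}
    for slot in "$base"/condor_tmp_condor_slot*; do
        # Skip entries without a readable counter (including an unmatched glob).
        [ -r "$slot/memory.max_usage_in_bytes" ] || continue
        printf '%s %s\n' "${slot##*/}" "$(cat "$slot/memory.max_usage_in_bytes")"
    done
    return 0
}
```

Each output line pairs a slot directory name with its peak usage in bytes, so the result can be piped to sort -k2 -n to find the heaviest job.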
More information can be found in the HTCondor wiki and manual:
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage
http://research.cs.wisc.edu/htcondor/manual/current/3_12Setting_Up.html#SECTION0041212000000000000000
http://research.cs.wisc.edu/htcondor/manual/current/3_12Setting_Up.html#SECTION0041214000000000000000