RAL Memory Limits

From GridPP Wiki

Current situation

All jobs on the RAL batch system run in their own cgroups, using the following cgroup subsystems: cpu, cpuacct, memory, freezer and blkio. HTCondor itself applies no memory limits; instead, soft memory limits are applied via cgroups. The cgroup attribute memory.soft_limit_in_bytes for each job is set to the amount of memory requested by the job. Jobs are allowed to exceed this limit if there is free memory available on the system. Only when processes are contending for physical memory will the system move pages from physical memory into swap and push the physical memory used by a job back towards its assigned limit.

In addition, for the htcondor cgroup we have memory.limit_in_bytes set to the physical memory available on the worker node, and memory.memsw.limit_in_bytes set to the sum of the physical memory and 20% of the swap (by default our worker nodes have the same amount of swap as physical memory). This limits the total amount of memory and swap used by all jobs on each worker node.

The condor_starter for each job registers with the cgroup memory controller to be notified when the per-cgroup OOM killer fires, so HTCondor knows when a job has been killed by the OOM killer.
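The arithmetic behind these limits can be illustrated with a short Python sketch. It is not the configuration used at RAL; the function names and the 64 GiB example node are hypothetical, and it assumes the job's memory request is expressed in MiB (as HTCondor's request_memory normally is).

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def job_soft_limit_bytes(request_memory_mib: int) -> int:
    """Value for a job's memory.soft_limit_in_bytes.

    Assumes the job's memory request is given in MiB.
    """
    if request_memory_mib < 0:
        raise ValueError("memory request must be non-negative")
    return request_memory_mib * MIB


def node_cgroup_limits(physical_bytes: int, swap_bytes: int,
                       swap_percent: int = 20) -> dict:
    """Hard limits for the parent htcondor cgroup on one worker node.

    memory.limit_in_bytes       = physical memory
    memory.memsw.limit_in_bytes = physical memory + swap_percent% of swap
    Integer arithmetic is used so the result is an exact byte count.
    """
    if physical_bytes < 0 or swap_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return {
        "memory.limit_in_bytes": physical_bytes,
        "memory.memsw.limit_in_bytes":
            physical_bytes + swap_bytes * swap_percent // 100,
    }


# Hypothetical worker node with 64 GiB of RAM and the same amount of swap.
limits = node_cgroup_limits(64 * GIB, 64 * GIB)
for name, value in limits.items():
    print(f"{name} = {value}")

# Hypothetical job requesting 3000 MiB.
print("memory.soft_limit_in_bytes =", job_soft_limit_bytes(3000))
```

With equal amounts of swap and physical memory, the combined memory and swap limit works out to 1.2 times the physical memory of the node.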

For more information about cgroups, see for example https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html