RAL Memory Limits

From GridPP Wiki

All jobs on the RAL batch system run in their own cgroups, using the following cgroup subsystems: cpu, cpuacct, memory, freezer and blkio. On most worker nodes no memory limits are applied via cgroups. Instead, memory limits are enforced by HTCondor: if a job's resident set size exceeds the memory it requested, the job is killed.
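As a rough illustration of the two points above, the following Python sketch parses the cgroup v1 format of /proc/<pid>/cgroup to check that a process sits in the five listed subsystems, and compares a resident set size against a memory request. The function names, the sample cgroup path and the assumption that the request is expressed in MiB are illustrative only; the actual policy expression used at RAL is not given on this page.

```python
# Subsystems named in the text above.
EXPECTED = {"cpu", "cpuacct", "memory", "freezer", "blkio"}


def parse_proc_cgroup(text):
    """Map each cgroup v1 controller name to its cgroup path.

    `text` is the content of /proc/<pid>/cgroup, whose v1 lines look like
    "hierarchy-ID:controller-list:path".
    """
    controllers = {}
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, names, path = parts
        for name in names.split(","):
            if name:  # cgroup v2 entries have an empty controller list
                controllers[name] = path
    return controllers


def missing_subsystems(text, expected=EXPECTED):
    """Return the expected subsystems that the process is not attached to."""
    return sorted(set(expected) - set(parse_proc_cgroup(text)))


def exceeds_request(rss_bytes, request_memory_mib):
    """True if the resident set size is above the requested memory.

    Assumption: the request is in MiB and the RSS has been converted to bytes.
    """
    return rss_bytes > request_memory_mib * 1024 * 1024
```

On a worker node one might call missing_subsystems(open("/proc/self/cgroup").read()) from inside a job to confirm the expected subsystems are in use.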

On one tranche of worker nodes, corresponding to around 2000 cores, soft memory limits are applied via cgroups. Here the cgroup attribute memory.soft_limit_in_bytes for each job is set to the amount of memory requested by the job. A job may exceed this limit while there is free memory available on the system. Only when processes contend for physical memory does the system push memory out to swap, driving the job's physical memory use back towards its assigned limit.
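To see how far a job is above its soft limit, one could read the relevant cgroup v1 memory files directly. The sketch below assumes the caller already knows the job's memory cgroup directory (its location on the RAL worker nodes is not stated here, so it is passed in as a parameter), and the helper names are hypothetical.

```python
from pathlib import Path


def read_int(cgroup_dir, name):
    """Read a single integer value from a cgroup v1 control file."""
    return int((Path(cgroup_dir) / name).read_text().strip())


def soft_limit_status(cgroup_dir):
    """Compare current memory usage with the soft limit for one cgroup.

    `cgroup_dir` is the job's memory cgroup directory (an assumption: the
    caller must supply the real path for the node in question).
    """
    limit = read_int(cgroup_dir, "memory.soft_limit_in_bytes")
    usage = read_int(cgroup_dir, "memory.usage_in_bytes")
    return {
        "soft_limit": limit,
        "usage": usage,
        # Bytes above the soft limit; this is the part the kernel may
        # reclaim first when physical memory is under contention.
        "over_by": max(0, usage - limit),
    }
```

A non-zero over_by value does not mean the job is in trouble: as described above, the excess is only reclaimed when the node runs short of physical memory.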