Manchester Multicore Torque Configuration
Dynamic Partitioning with Nikhef scripts
This method relies on a custom python script, written by Jeff Templon, to create dynamic partitions in Torque. It needs a dedicated queue for multicore jobs to be created. It also requires changing the (so far) standard properties of the nodes, since it uses the resources_default.neednodes queue attribute to create the partitions. I eventually put everything in puppet, but below are the basic steps to install it.
Scripts Installation
You can download the scripts from the Nikhef SVN repository. There are the main script *mcfloat* and three python modules to install. I've opted to install everything in /usr/local. The script also requires a $HOME/tmp directory for a file that holds the list of nodes that are too empty and should be moved back to the single-core partition.
Python Modules
mkdir -p $HOME/tmp
mkdir -p /usr/local/lib/python2.6/site-packages
cd /usr/local/lib/python2.6/site-packages
wget https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/trunk/nl.nikhef.ndpf.tools/pjobstats/"torqueJobs.py?revision=2698" -O torqueJobs.py
wget https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/trunk/nl.nikhef.ndpf.tools/pjobstats/"torqueAttMappers.py?revision=2526" -O torqueAttMappers.py
wget https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/nl.nikhef.pdp.dynsched-pbs-plugin/trunk/torque_utils.py?revision=2722 -O torque_utils.py
- NOTE 1: scripts have been tested with python >=2.4
- NOTE 2: I've put the wget commands here to make installation easy, but they pin specific revisions. If you want to make sure you have the latest version you'll need to check the repository in your browser and download that instead.
- NOTE 3: /usr/local/lib/python2.6/site-packages will need to be added to the PYTHONPATH. I've done it in the cron job at the end to avoid adding an extra profile script.
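A quick way to check that the modules are importable with the path used later in the cron job is, for example:
PYTHONPATH=/usr/local/lib/python2.6/site-packages python -c "import torqueJobs, torqueAttMappers, torque_utils; print 'modules OK'"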
mcfloat
cd /usr/local/bin
wget https://ndpfsvn.nikhef.nl/cgi-bin/viewvc.cgi/pdpsoft/nl.nikhef.ndpf.mcfloat/trunk/"mcfloat?revision=2770" -O mcfloat
You need to edit mcfloat to set five things:
- Torque server
TORQUE = "<torque-server-fqdn>"
- Initial set of WNs to use. My nodes don't have sequential names, so I replaced the elegant for loop that builds the list with a plainer comma-separated list of node names (see the sketch after this list). Do not forget the quotes around the names.
CANDIDATE_NODES = [ 'node-0%02d.domain' % (n) for n in range(1,19) ]
- Queue name: you can leave it or replace it. I've replaced it with a less experiment-oriented name (mcore, created with the qmgr commands below).
MCQUEUE = 'atlasmc'
- MAXDRAIN and MAXFREE: these depend on the size of your nodes and cluster, so you may want to play with them. I reduced MAXDRAIN to 4, for example, but I will probably review it further to tune it.
MAXDRAIN = 7   # max num of nodes allowed to drain
MAXFREE = 49   # max num of free slots to tolerate
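For reference, a minimal sketch of the comma-separated variant mentioned above, with hypothetical node names (replace them with your own quoted, fully qualified names):

# hypothetical node names, adjust to your own cluster
CANDIDATE_NODES = [ 'wn01.example.domain', 'wn03.example.domain', 'wn07.example.domain',
                    'wn12.example.domain', 'wn18.example.domain' ]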
Torque settings
nodes properties
mcfloat relies on the node properties in /var/lib/torque/server_priv/nodes. If you are still using YAIM they are usually set to lcgpro. The mcfloat script uses *el6* for the nodes dedicated to single-core jobs and *mc* for the nodes dedicated to multicore jobs. If you want to use something else you need to edit the mcfloat script. To switch the existing property I've opted for a simple sed command:
sed -i.old 's/lcgpro/el6/g' /var/lib/torque/server_priv/nodes
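With hypothetical host names, the entries in /var/lib/torque/server_priv/nodes end up looking something like this; mcfloat switches the property between el6 and mc as it moves a node between partitions:

wn01.example.domain np=8 el6
wn02.example.domain np=8 mc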
qmgr commands
Now you need to create the queue for multicore and limit access to the groups that will run multicore jobs. For the moment at my site it is only atlas production, but some sites may need to add cms too, so I've included the additional line as an example.
qmgr
create queue mcore
set queue mcore queue_type = Execution
set queue mcore resources_max.cput = 48:00:00
set queue mcore resources_max.walltime = 72:00:00
set queue mcore resources_default.cput = 48:00:00
set queue mcore resources_default.neednodes = mc
set queue mcore resources_default.walltime = 72:00:00
set queue mcore acl_group_enable = True
set queue mcore acl_groups = atlprd
set queue mcore acl_groups += cmsprd
set queue mcore enabled = True
set queue mcore started = True
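You can check the result with qmgr, for example:

qmgr -c "print queue mcore"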
You also need to set resources_default.neednodes for the other queues for the partitioning to work. For example, at my site the other main queue is set to
set queue long resources_default.neednodes = el6
Other queues have their own parameters.
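A quick way to check the neednodes setting of all queues is, for example:

qmgr -c "print server" | grep neednodes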
cron job
Finally, you need to set up a cron job to run mcfloat. Here is mine:
cat /etc/cron.d/mcfloat.cron
PYTHONPATH=/usr/local/lib/python2.6/site-packages:${PYTHONPATH}
*/20 * * * * root python /usr/local/bin/mcfloat >> /var/log/mcfloat.log 2>&1
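Before enabling the cron job you may want to run the same command once by hand and check the log (note that this will already start moving nodes between partitions):

PYTHONPATH=/usr/local/lib/python2.6/site-packages python /usr/local/bin/mcfloat >> /var/log/mcfloat.log 2>&1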
YAIM
If you are still using YAIM you will need to add the extra queue there too. We create the queue with puppet but the BDII is still handled by YAIM. For example:
LONG_GROUP_ENABLE="...................."
MCORE_GROUP_ENABLE="/atlas/ROLE=production"
QUEUES="long mcore"
Puppet &co
If you are handling the torque/maui configuration via puppet, it is likely that puppet will try to overwrite your /var/lib/torque/server_priv/nodes file after mcfloat modifies it.
If you are using the HEP-puppet torque module you can override this behaviour by creating a class that inherits from torque::server::config
class local::torque inherits torque::server::config {
  File['/etc/torque/nodes', '/var/lib/torque/server_priv/nodes'] {
    replace => "no",
  }
}
and then call it in your cream/CE manifest after you have called torque::server. i.e.
class { 'torque::server': }
class { 'local::torque': }
class { 'torque::server::limits': }
Since the "replace" attribute wasn't set in the parent class, puppet merges the attributes of the File resources from the parent and the child class: if the file is deleted puppet recreates it, but if it is there puppet will not overwrite its content.
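You can verify that puppet now leaves the nodes file alone with a no-op run on the CE, for example:

puppet agent --test --noop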
Manchester modified script
Smaller sites do not have their slots full all the time, in particular when the bigger players stop running for a while. Manchester modified the Nikhef script to allocate the emptier nodes to multicore, if there are any. It also changed the release of nodes' slots to depend on MAXDRAIN, to slow the release down.