Enable Queues on ARC HTCondor

From GridPP Wiki


One of the major differences between a standard CREAM + Torque/Maui system and ARC-CE + HTCondor is that HTCondor has no queues to partition the nodes.

ARC has a queue concept; however, the system, and in particular the BDII part, is not really coded to handle queues if the underlying batch system doesn't have them. To make queues work I had to patch the ARC code for HTCondor. I use the arc_ce puppet module from HEP_puppet, though lately my clone has diverged quite a bit. The code described below is in the Manchester arc_ce module. I'm also opening issues against the upstream arc_ce, but it seems to be abandoned.

Here is what I've done

Code changes

ARC code changes

  • I've modified Condor.pm, which publishes the numbers in the GLUE 2.0 part of the BDII, so that it collects the max and default walltime and cputime values from arc.conf rather than publishing fixed numbers. The code I've added is:
    # Inside queue_info(), just before it returns: read this queue's [queue/<name>] section
    my %queue_ei = queue_extra_info($qname);

    # Publish the limits from arc.conf; fall back to '' if the queue doesn't set them
    $lrms_queue{maxwalltime}  = $queue_ei{maxwalltime}  || '';
    $lrms_queue{minwalltime}  = $queue_ei{minwalltime}  || '';
    $lrms_queue{defaultwallt} = $queue_ei{defaultwallt} || '';
    $lrms_queue{maxcputime}   = $queue_ei{maxcputime}   || '';
    $lrms_queue{mincputime}   = $queue_ei{mincputime}   || '';
    $lrms_queue{defaultcput}  = $queue_ei{defaultcput}  || '';

    $lrms_queue{status} = 1;
    return %lrms_queue;
}   # end of queue_info()

# New helper: return the [queue/<name>] section of arc.conf as a hash
sub queue_extra_info($) {
    require ConfigParser;
    my $qname = shift;

    my $parser = ConfigParser->new($arcconf)
        or die "Cannot parse $arcconf config file";
    return $parser->get_section("queue/$qname");
}
  • I've added the same code to glue-generator.pl, because several systems still use GLUE 1.3.
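The lookup-with-fallback that the Condor.pm change performs can be sketched in Ruby, the language of the puppet templates further down. This is only an illustration: `parse_arc_conf` and `queue_limits` are hypothetical helpers, the parser is deliberately simplified (section headers and `key=value` lines only), and the real change is the Perl above using ARC's own ConfigParser.

```ruby
# Hypothetical, simplified arc.conf parser: handles [section] headers and
# key="value" / key=value lines; skips comments and blank lines.
# Repeated keys (e.g. several ac_policy lines) keep only the last value here.
def parse_arc_conf(text)
  sections = Hash.new { |h, k| h[k] = {} }
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#')
    if (m = line.match(/\A\[(.+)\]\z/))
      current = m[1]
    elsif current && (m = line.match(/\A(\w+)=(.*)\z/))
      value = m[2]
      # Strip one pair of surrounding double quotes, if present
      value = value[1..-2] if value.length >= 2 && value.start_with?('"') && value.end_with?('"')
      sections[current][m[1]] = value
    end
  end
  sections
end

TIME_KEYS = %w[maxwalltime minwalltime defaultwallt maxcputime mincputime defaultcput].freeze

# Mirrors the Perl change: take each limit from [queue/<name>] and fall back
# to '' when arc.conf does not set it (or the queue section is missing).
def queue_limits(sections, qname)
  queue = sections.fetch("queue/#{qname}", {})
  TIME_KEYS.to_h { |k| [k, queue[k] || ''] }
end

conf = <<~CONF
  [queue/grid]
  name="grid"
  maxwalltime="4320"
  defaultwallt="1440"
CONF
limits = queue_limits(parse_arc_conf(conf), 'grid')
p limits['maxwalltime']   # "4320"
p limits['maxcputime']    # ""
```

One small difference from the Perl: Perl's `||` also replaces a value of 0, while Ruby's only replaces nil or false.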

Puppet code changes

I've changed the queue support a bit. The two main changes are:

  • I've removed the cluster-level authorizedvo default values from the class, because they override the per-queue values.
  • I've replaced the existing handling, an extra class that creates a resource per queue, with just another template fragment. The template simply loops over a hash of queues like this:
<% @queues.each_pair do |queue_name, queue_data| %>
[queue/<%= queue_name -%>]
name="<%= queue_name -%>"
<% if queue_data['default_memory'] -%>
defaultmemory=<%= queue_data['default_memory'] %>
<% end -%>
<% if queue_data['comment'] -%>
comment="<%= queue_data['comment'] -%>"
<% end -%>
<% end -%>

I've put the template in a file called queues.erb rather than queue.erb, so as not to completely replace the previous method. The hiera data is as before, with some extra parameters: even though they are not in the A-REX schema, they can still be used (see the time limits below).
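To see what the loop produces, the fragment can be rendered outside Puppet with Ruby's standard ERB library. This is a sketch under assumptions: `QueueScope` is a hypothetical stand-in for Puppet's template scope (it only exposes `@queues`), the queue names and values are made up, and `trim_mode: '-'` mimics the `-%>` newline trimming that Puppet templates use.

```ruby
require 'erb'

# The queues.erb fragment, with the @queues loop closed by a final <% end -%>.
QUEUES_ERB = <<~'TEMPLATE'
  <% @queues.each_pair do |queue_name, queue_data| %>
  [queue/<%= queue_name -%>]
  name="<%= queue_name -%>"
  <% if queue_data['default_memory'] -%>
  defaultmemory=<%= queue_data['default_memory'] %>
  <% end -%>
  <% if queue_data['comment'] -%>
  comment="<%= queue_data['comment'] -%>"
  <% end -%>
  <% end -%>
TEMPLATE

# Hypothetical stand-in for Puppet's template scope: only exposes @queues.
class QueueScope
  def initialize(queues)
    @queues = queues
  end

  def render(template)
    # trim_mode '-' lets "-%>" at end of line swallow the newline
    ERB.new(template, trim_mode: '-').result(binding)
  end
end

# Made-up hiera-like data (string keys, as hiera hashes arrive in templates)
queues = {
  'grid' => { 'default_memory' => 2048 },
  'gpu'  => { 'default_memory' => 4096, 'comment' => 'GPU nodes' }
}
puts QueueScope.new(queues).render(QUEUES_ERB)
```

Each queue becomes its own [queue/<name>] block, and the optional lines only appear for queues whose hiera entry sets them.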

  • The corresponding hiera data structure for the queues is as follows
   default_memory: 2048
   maxwalltime: 4320
   defaultwallt: 1440
   maxcputime: 4320
   defaultcput: 1440
     - '8cpu:2'
   homogeneity: true
   condor_requirements: "(Opsys == \"linux\") && (OpSysMajorVer == 7)"
     - atlas
     - vo.northgrid.ac.uk
   ac_policy: true
  • I've added support for arc-vomsac-check (a tool that enables per-queue ACLs) to the grid-manager template:
authplugin="ACCEPTED timeout=60,onfailure=fail,onsuccess=pass %W/libexec/arc/arc-vomsac-check -L %C/job.%I.local -P %C/job.%I.proxy"
  • The puppet module already contained earlier fixes to the ARC code; I've committed mine under a different version number (the ARC version). Previously, fixes were applied if the apply_fixes parameter was set to true. I changed apply_fixes from a boolean to a string, so you can now select which fixes to apply. To apply these fixes, add the following to your arc_ce.yaml hiera file:
arc_ce::apply_fixes: 5.4.1


ARC setup

Once the queues are set up in hiera as described above, you'll get entries like these in your /etc/arc.conf:

condor_requirements="(OpSys =?= "LINUX") && (OpSysMajorVer =?= 7)"
ac_policy="VOMS: atlas"
ac_policy="VOMS: vo.northgrid.ac.uk"
condor_requirements="(OpSys =?= "LINUX") && (OpSysMajorVer =?= 7) && (HasGPUs =?= True)"
ac_policy="VOMS: atlas"
ac_policy="VOMS: lhcb"
ac_policy="VOMS: vo.northgrid.ac.uk"
ac_policy="VOMS: icecube"
ac_policy="VOMS: lsst"
ac_policy="VOMS: skatelescope.eu"

Things to note here:

  • authorizedvo: is now set per queue and published as such in the BDII, so that VOs that use DIRAC, for example, automatically go to the right queues.
  • ac_policy: is used by arc-vomsac-check to determine whether the job is authorized to access that queue; ARC will not submit jobs that are not authorized. This is important if you want only some groups to access certain resources.
  • condor_requirements: these are added to the requirements the job arrives with. For example, you can direct different queues to nodes with different OSs ((OpSys =?= "LINUX") && (OpSysMajorVer =?= 7)), or you can create a queue, as I did, that matches GPU nodes by adding a custom ClassAd condition (HasGPUs =?= True).
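What the two per-queue settings do to an incoming job can be modelled in a few lines of Ruby. This is a simplified, hypothetical model, not the real code of arc-vomsac-check or submit-condor-job: it only understands plain `ac_policy="VOMS: <vo>"` lines (no groups, roles or negation), and it assumes the queue's condor_requirements are simply ANDed onto the job's own requirements.

```ruby
# Hypothetical, simplified model of the per-queue settings in arc.conf.

# Extract VO names from plain ac_policy="VOMS: <vo>" lines; any other line
# (and any richer ac_policy syntax) is ignored by this sketch.
def parse_ac_policies(lines)
  lines.filter_map { |l| l.strip[/\Aac_policy="VOMS:\s*([^"]+)"\z/, 1]&.strip }
end

# A job is accepted only if its VO is listed for the queue.
def authorized?(job_vo, queue_lines)
  parse_ac_policies(queue_lines).include?(job_vo)
end

# The queue's condor_requirements are ANDed onto the job's own requirements;
# each side is parenthesised so an || inside either one keeps its meaning.
def combined_requirements(job_req, queue_req)
  return job_req if queue_req.nil? || queue_req.empty?
  return queue_req if job_req.nil? || job_req.empty?
  "(#{job_req}) && (#{queue_req})"
end

gpu_queue = [
  'condor_requirements="(OpSys =?= "LINUX") && (OpSysMajorVer =?= 7) && (HasGPUs =?= True)"',
  'ac_policy="VOMS: atlas"',
  'ac_policy="VOMS: lhcb"'
]
p authorized?('lhcb', gpu_queue)   # true
p authorized?('cms', gpu_queue)    # false
```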

HTCondor setup

To select nodes with a particular property, you need to add the property to STARTD_ATTRS; to stop everyone else from using those nodes, you need to add a condition on a job ClassAd attribute to START on the WNs you want to group. The simplest way to do this is to install a file in /etc/condor/config.d that is read after the startd configuration. In the ARC setup above, for example, I've added the condition HasGPUs =?= True, which means the job will only be matched with nodes that advertise HasGPUs as true.

The file that configures the startd to advertise HasGPUs looks like this:

[root@vm73 ~]# cat /etc/condor/config.d/21_man-gpu.config
# HasGPUs is used to match the job with the nodes (i.e. the job requires nodes which have HasGPUs)
HasGPUs = true
# Publish HasGPUs in the machine ClassAd so that job requirements can see it
STARTD_ATTRS = $(STARTD_ATTRS) HasGPUs
# This means the startd will only start jobs that come from the gpu queue
START = $(START) && (NordugridQueue =?= "gpu")

HasGPUs is a WN attribute, not a job attribute, so it doesn't belong in START: START tells the startd which conditions a job has to satisfy to run there. In this case I want only jobs arriving from the gpu queue to run, so I've added the (NordugridQueue =?= "gpu") condition to it.
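Both halves of this setup, the job's requirement on HasGPUs and the node's START on NordugridQueue, have to hold for a match. The Ruby below is a toy model of that two-sided matchmaking for this setup only, not a ClassAd evaluator: each side's condition is a lambda over the other side's attributes, `=?=` is modelled with Ruby `==` on values that may be nil (undefined), and the node ads, the base START of true, and the `grid` queue name are assumptions.

```ruby
# Toy model of HTCondor two-sided matchmaking for this setup only.
# =?= is modelled with Ruby ==: an undefined attribute (nil) never equals a
# defined value, and string comparison is case-sensitive, as with =?=.
def meta_eq(a, b)
  a == b
end

# attrs: this ad's attributes; condition: lambda over the *other* ad's attrs
Ad = Struct.new(:attrs, :condition)

# A match needs the job's requirements AND the machine's START to be true.
def match?(job, machine)
  job.condition.call(machine.attrs) && machine.condition.call(job.attrs)
end

EL7 = ->(m) { meta_eq(m['OpSys'], 'LINUX') && meta_eq(m['OpSysMajorVer'], 7) }

# GPU WN: advertises HasGPUs; START = (base START, assumed true) && queue check
GPU_NODE = Ad.new({ 'OpSys' => 'LINUX', 'OpSysMajorVer' => 7, 'HasGPUs' => true },
                  ->(job) { meta_eq(job['NordugridQueue'], 'gpu') })
# Ordinary WN: no HasGPUs attribute, START assumed true
PLAIN_NODE = Ad.new({ 'OpSys' => 'LINUX', 'OpSysMajorVer' => 7 }, ->(_job) { true })

# Jobs carry NordugridQueue; the gpu queue adds (HasGPUs =?= True)
GPU_JOB  = Ad.new({ 'NordugridQueue' => 'gpu' },
                  ->(m) { EL7.call(m) && meta_eq(m['HasGPUs'], true) })
GRID_JOB = Ad.new({ 'NordugridQueue' => 'grid' }, EL7)

p match?(GPU_JOB, GPU_NODE)    # true
p match?(GRID_JOB, GPU_NODE)   # false: START rejects non-gpu jobs
p match?(GPU_JOB, PLAIN_NODE)  # false: HasGPUs is undefined there
```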

Note that the startd configuration is in /etc/condor/config.d/20_workernode.config, so my GPU file, 21_man-gpu.config, is read afterwards: HTCondor reads the files in config.d in lexicographic order.
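The ordering matters because the START line refers to itself. A rough Ruby model shows the effect, assuming only simple `NAME = value` lines and `$(NAME)` references, and a hypothetical base `START = True` in 20_workernode.config: files are read in sorted order, and `$(START)` picks up the value defined before it.

```ruby
# Rough model of reading /etc/condor/config.d: files in lexicographic order,
# "NAME = value" lines only, $(NAME) expanded from what has been read so far.
# Expanding at read time matches HTCondor for self-references like $(START).
def read_config(files)
  macros = {}
  files.keys.sort.each do |name|
    files[name].each_line do |raw|
      line = raw.strip
      next if line.empty? || line.start_with?('#')
      key, value = line.split('=', 2).map(&:strip)
      next if value.nil?
      macros[key] = value.gsub(/\$\((\w+)\)/) { macros.fetch(Regexp.last_match(1), '') }
    end
  end
  macros
end

FILES = {
  # Deliberately listed out of order; sorting still reads 20_ before 21_.
  '21_man-gpu.config'    => "HasGPUs = true\nSTART = $(START) && (NordugridQueue =?= \"gpu\")\n",
  '20_workernode.config' => "START = True\n"   # hypothetical base START
}.freeze

puts read_config(FILES)['START']   # prints: True && (NordugridQueue =?= "gpu")
```

If the GPU file were read first, the later `START = True` would silently drop the gpu-queue condition.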

HTCondor has other, more sophisticated mechanisms for accessing GPU nodes, but they would require further changes to submit-condor-job and the ARC CE so that they understand the request, so for now I haven't used them.