Enable Queues on ARC HTCondor

From GridPP Wiki
Revision as of 07:19, 28 March 2018 by Alessandra Forti


One of the major differences between a standard CREAM+Torque/Maui system and ARC-CE+HTCondor is that the latter lacks queues to partition the nodes.

ARC has a queue concept; however, the system, in particular the BDII portion, is not really coded to handle it if the batch system underneath doesn't have a queue concept. To make queues work I had to patch the ARC code for HTCondor. I use the arc_ce puppet module in HEP_puppet, though lately my clone is diverging quite a bit. The code described below is in the Manchester arc_ce. I'm also opening issues for the upstream arc_ce, but it seems to be abandoned. If you are not interested in the description of the code changes, you can skip to the setup.

Here is what I've done

Code changes

ARC code changes

  • I've modified Condor.pm, which publishes the numbers in the GLUE 2.0 portion of the BDII, to collect the maximum and default walltime and cputime from arc.conf rather than publishing fixed numbers. The code I've added is
   # Read the extra parameters of this queue from its arc.conf section
   my %queue_ei = queue_extra_info($qname);

   # Publish each limit if it is set in arc.conf, otherwise an empty string
   $lrms_queue{maxwalltime} = $queue_ei{maxwalltime} || '';
   $lrms_queue{minwalltime} = $queue_ei{minwalltime} || '';
   $lrms_queue{defaultwallt} = $queue_ei{defaultwallt} || '';
   $lrms_queue{maxcputime} = $queue_ei{maxcputime} || '';
   $lrms_queue{mincputime} = $queue_ei{mincputime} || '';
   $lrms_queue{defaultcput} = $queue_ei{defaultcput} || '';

   $lrms_queue{status} = 1;
   return %lrms_queue;

# Return the [queue/<name>] section of arc.conf as a hash
sub queue_extra_info($) {
   require ConfigParser;
   my $qname = shift;

   my $parser = ConfigParser->new($arcconf)
       or die "Cannot parse $arcconf config file";
   return $parser->get_section("queue/$qname");
}
  • I've added similar code in glue-generator.pl because several systems are still using GLUE 1.3.
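The lookup that the Perl change performs can be illustrated with a minimal Python sketch. It is not the ARC code: the parser below is a hand-written stand-in for ARC's ConfigParser, and the queue name "grid" and the sample values are assumptions made for the example. It shows the same idea, namely reading the [queue/<name>] section and falling back to an empty string for any limit that is not set.

```python
# Keys published by the Perl snippet above.
LIMIT_KEYS = ("maxwalltime", "minwalltime", "defaultwallt",
              "maxcputime", "mincputime", "defaultcput")


def parse_sections(text):
    """Parse INI-like text into {section: {key: value}} (toy stand-in parser)."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            # Drop the surrounding double quotes used in arc.conf values.
            current[key.strip()] = value.strip().strip('"')
    return sections


def queue_limits(text, qname):
    """Return the time limits of one queue, '' for every limit that is unset."""
    section = parse_sections(text).get(f"queue/{qname}", {})
    return {key: section.get(key) or "" for key in LIMIT_KEYS}


# Hypothetical arc.conf fragment, for illustration only.
SAMPLE = '''
[queue/grid]
name="grid"
maxwalltime="4320"
defaultwallt="1440"
'''

limits = queue_limits(SAMPLE, "grid")
print(limits)
```

A queue that has no section at all simply yields empty strings for every key, which mirrors the `|| ''` fallback in the Perl code.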

Puppet code changes

I've changed the support for queues a bit. The two main changes are

  • I've removed the cluster authorizedvo default values from the class because they override those set on a per-queue basis.
  • I've changed the queue handling to just another template snippet rather than an extra class creating resources. The template then simply loops over a hash of queues in this way
<% @queues.each_pair do |queue_name, queue_data| %>
[queue/<%= queue_name -%>]
name="<%= queue_name -%>"
<% if queue_data['default_memory'] -%>
defaultmemory=<%= queue_data['default_memory'] %>
<% end -%>
<% if queue_data['comment'] -%>
comment="<%= queue_data['comment'] -%>"
<% end -%>
<% end -%>

I've put the template in a file called queues.erb instead of queue.erb to avoid completely overriding the previous method. The hiera snippet is like before with some extra parameters; even if they are not in the A-REX schema they can still be used (see the time limits later).
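To make the effect of the loop concrete, here is a rough Python equivalent of the template logic. It is only a sketch of the two optional fields shown in the excerpt; the queue names and values in the example hash are assumptions, and the real rendering is of course done by Puppet's ERB engine.

```python
def render_queues(queues):
    """Render one [queue/<name>] block per entry, skipping unset options."""
    lines = []
    for queue_name, queue_data in queues.items():
        lines.append(f"[queue/{queue_name}]")
        lines.append(f'name="{queue_name}"')
        default_memory = queue_data.get("default_memory")
        if default_memory:
            lines.append(f"defaultmemory={default_memory}")
        comment = queue_data.get("comment")
        if comment:
            lines.append(f'comment="{comment}"')
    return "\n".join(lines)


# Hypothetical queue hash, standing in for the hiera data.
example_queues = {
    "grid": {"default_memory": 2048},
    "gpu": {"comment": "GPU nodes"},
}

rendered = render_queues(example_queues)
print(rendered)
```

Because each option is guarded by its own test, a queue only gets the lines for the parameters that are actually present in its hash.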

  • The corresponding hiera data structure for the queues is as follows
   default_memory: 2048
   maxwalltime: 4320
   defaultwallt: 1440
   maxcputime: 4320
   defaultcput: 1440
     - '8cpu:2'
   homogeneity: true
   condor_requirements: "(Opsys == \"linux\") && (OpSysMajorVer == 7)"
     - atlas
     - vo.northgrid.ac.uk
   ac_policy: true
  • I've added support for arc-vomsac-check (a tool that enables ACLs on a per-queue basis) in the grid-manager template
authplugin="ACCEPTED timeout=60,onfailure=fail,onsuccess=pass %W/libexec/arc/arc-vomsac-check -L %C/job.%I.local \
-P %C/job.%I.proxy"
  • The puppet module already contained earlier fixes to the ARC code; I've committed my fixes under a different version number (the ARC version). Fixes were applied if the apply_fixes parameter was set to true. I changed the apply_fixes parameter from boolean to string, so people can now select which fixes to apply.

How to apply the fixes above

If you don't use puppet you'll need to get the ARC.5.4.1 patched files from GitHub and then change your arc.conf manually as described in the setup section. This is untested, but it can serve as an example for making changes to your own code.

If you use puppet you can download the Manchester branch and add the following to your arc_ce.yaml hiera file to apply the fixes.

arc_ce::apply_fixes: 5.4.1


ARC setup

After this, if you set up the queues in hiera as described above, you'll get them in your /etc/arc.conf like this

condor_requirements="(OpSys =?= "LINUX") && (OpSysMajorVer =?= 7)"
ac_policy="VOMS: atlas"
ac_policy="VOMS: vo.northgrid.ac.uk"
condor_requirements="(OpSys =?= "LINUX") && (OpSysMajorVer =?= 7) && (HasGPUs =?= True)"
ac_policy="VOMS: atlas"
ac_policy="VOMS: lhcb"
ac_policy="VOMS: vo.northgrid.ac.uk"
ac_policy="VOMS: icecube"
ac_policy="VOMS: lsst"
ac_policy="VOMS: skatelescope.eu"

Things to note here are

  • authorizedvo: is now set on a per-queue basis and published as such in the BDII, so that VOs that use DIRAC, for example, automatically go to the right queues.
  • ac_policy: is used by arc-vomsac-check to determine if the job is authorized to access that queue. ARC will not submit jobs that are not authorized. This is important if you want only some groups to access certain resources. The way I did it is to declare authorizedvo in hiera and let the code set up the ac_policy as well, so you don't have to write the same thing twice; just set "ac_policy: true" for each queue in hiera.
  • condor_requirements: these are added to the requirements the job arrives with. For example, you can direct different queues to nodes with different OSs with (OpSys =?= "LINUX") && (OpSysMajorVer =?= 7), or you can create a queue, like I did, that matches GPU nodes by adding a custom ClassAd condition (HasGPUs =?= True).
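The "declare the VOs once" idea from the ac_policy note can be sketched in a few lines of Python. This is an illustration rather than the puppet code: the function name and its arguments are made up for the example, and only the output format (ac_policy="VOMS: <vo>") is taken from the arc.conf lines above.

```python
def ac_policy_lines(authorized_vos, ac_policy_enabled):
    """Derive the ac_policy lines of a queue from its list of authorized VOs."""
    if not ac_policy_enabled:
        # No per-queue ACL requested: emit nothing.
        return []
    return [f'ac_policy="VOMS: {vo}"' for vo in authorized_vos]


vos = ["atlas", "vo.northgrid.ac.uk"]
policy = ac_policy_lines(vos, True)
print("\n".join(policy))
```

With the flag switched off the same list of VOs produces no ac_policy lines at all, so the ACL stays opt-in per queue.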

HTCondor setup

To select nodes with a particular property you need to add the property to STARTD_ATTRS, while to stop everyone else from using those nodes you need to add a condition on a job ClassAd attribute to START on the WNs you want to group. The simplest way to do this is to install a file in /etc/condor/config.d which gets read after the startd configuration. In the ARC setup above, for example, I've added the condition HasGPUs; this means the job will only be matched with nodes that advertise HasGPUs as true.

The file to configure startd to have HasGPUs looks like this

[root@vm73 ~]# cat /etc/condor/config.d/21_man-gpu.config 
# HasGPUs is used to match the job with the nodes (i.e. the job requires the nodes which have HasGPUs)
HasGPUs = true
# This instead means the node will start only jobs that come from the gpu queue
START = $(START)  && (NordugridQueue =?= "gpu")

HasGPUs is a WN attribute, not a job one, so it cannot be added to START. START tells startd which conditions a job has to satisfy for the node to run it. In this case I want only the jobs arriving from the gpu queue to run, and therefore I've added the (NordugridQueue =?= "gpu") condition to it.
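The two-sided nature of the match can be modelled with a small Python toy. It is not how HTCondor evaluates ClassAds; it only mimics the two conditions used here, with `dict.get` standing in for the `=?=` operator (an undefined attribute never matches). The queue name "grid" and the dictionary layout are assumptions for the example.

```python
def job_wants_machine(job, machine):
    """Job side: jobs from the gpu queue require (HasGPUs =?= True)."""
    if job.get("NordugridQueue") == "gpu":
        return machine.get("HasGPUs") is True
    return True


def machine_accepts_job(job, machine):
    """Node side: GPU nodes add (NordugridQueue =?= "gpu") to START."""
    if machine.get("HasGPUs") is True:
        return job.get("NordugridQueue") == "gpu"
    return True


def matches(job, machine):
    """A job runs on a node only if both sides agree."""
    return job_wants_machine(job, machine) and machine_accepts_job(job, machine)


gpu_node = {"HasGPUs": True}
plain_node = {}
gpu_job = {"NordugridQueue": "gpu"}
grid_job = {"NordugridQueue": "grid"}  # hypothetical non-GPU queue

for job in (gpu_job, grid_job):
    for node in (gpu_node, plain_node):
        print(job, node, matches(job, node))
```

The job-side condition keeps gpu-queue jobs off ordinary nodes, while the node-side START condition keeps ordinary jobs off the GPU nodes; neither condition alone gives both guarantees.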

Note that the startd configuration is in a file called /etc/condor/config.d/20_workernode.config, so my GPU setup file, 21_man-gpu.config, will be read afterwards because its name starts with 21.
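The reason the numeric prefix matters is that the files in the config directory are read in lexicographical order, and a later file can extend a macro by referring to its previous value with $(START). The Python sketch below imitates that behaviour in a very simplified way; the START value given for 20_workernode.config is a placeholder assumption, since the content of that file is not shown here.

```python
def apply_configs(files):
    """Apply {filename: {macro: value}} in sorted filename order.

    A self-reference such as $(START) is replaced by the value the macro
    had before the current file was read (simplified model).
    """
    macros = {}
    for name in sorted(files):
        for key, value in files[name].items():
            previous = macros.get(key, "")
            macros[key] = value.replace(f"$({key})", previous)
    return macros


config_files = {
    "21_man-gpu.config": {"START": '$(START) && (NordugridQueue =?= "gpu")'},
    "20_workernode.config": {"START": "TRUE"},  # placeholder value
}

read_order = sorted(config_files)
final = apply_configs(config_files)
print(read_order)
print(final["START"])
```

If the GPU file were named so that it sorted before the worker node file, its START line would be overwritten instead of extended, which is why the prefix is chosen to be higher.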

HTCondor has other, more sophisticated mechanisms to access GPU nodes, but they require further changes to submit-condor-job and the ARC CE to understand the request, so for now I haven't used them.

Still to do

Not every VO arrives with resource requirements, and in fact the time and memory values are ignored by most VOs. With the numbers in arc.conf, though, it should be possible to add the limits to jobs that arrive without resource requirements, as happens in standard batch systems when the default queue values are set. This will require another iteration on submit-condor-job.
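As a rough idea of what that iteration would have to do, the Python sketch below fills in the queue defaults for a job that arrives without time requirements. It is purely hypothetical: nothing like this exists in submit-condor-job yet, and the job field names ("walltime", "cputime") are invented for the example; only the defaultwallt and defaultcput values come from the hiera data above.

```python
# Default values taken from the hiera example; the structure is illustrative.
QUEUE_DEFAULTS = {"defaultwallt": 1440, "defaultcput": 1440}


def with_queue_defaults(job, queue):
    """Return a copy of the job with missing time requests set to queue defaults."""
    filled = dict(job)
    if filled.get("walltime") is None:
        filled["walltime"] = queue["defaultwallt"]
    if filled.get("cputime") is None:
        filled["cputime"] = queue["defaultcput"]
    return filled


bare_job = {"owner": "atlas"}
explicit_job = {"owner": "atlas", "walltime": 60}

print(with_queue_defaults(bare_job, QUEUE_DEFAULTS))
print(with_queue_defaults(explicit_job, QUEUE_DEFAULTS))
```

A job that already states a value keeps it, so the defaults only affect jobs that would otherwise run with no limit at all.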