Enable Queues on ARC HTCondor

Introduction

One of the major differences between a standard CREAM+Torque/Maui system and an ARC-CE+HTCondor one is the lack of queues to partition the nodes.

ARC has a queue concept, but the system, in particular the BDII portion, is not really coded to handle it if the underlying batch system doesn't have queues itself. To make queues work I had to patch the ARC code for HTCondor. I use the arc_ce puppet module in HEP_puppet, though lately my clone has diverged quite a bit. The code described below is in the Manchester arc_ce clone; I'm also opening issues for the upstream arc_ce, but it seems to be abandoned.

Here is what I've done.

Code changes

ARC code changes

  • I've modified Condor.pm, which provides the numbers for the GLUE 2.0 portion of the BDII, to collect max and default walltime and cputime information from arc.conf rather than publishing fixed numbers (a quick way to check the published values is shown after this list). The code I've added is
   # inside the existing per-queue info subroutine in Condor.pm
   my %queue_ei = queue_extra_info($qname);

   $lrms_queue{maxwalltime}  = $queue_ei{maxwalltime}  || '';
   $lrms_queue{minwalltime}  = $queue_ei{minwalltime}  || '';
   $lrms_queue{defaultwallt} = $queue_ei{defaultwallt} || '';
   $lrms_queue{maxcputime}   = $queue_ei{maxcputime}   || '';
   $lrms_queue{mincputime}   = $queue_ei{mincputime}   || '';
   $lrms_queue{defaultcput}  = $queue_ei{defaultcput}  || '';

   $lrms_queue{status} = 1;
   return %lrms_queue;
}

# read the [queue/<name>] section of arc.conf and return its options
sub queue_extra_info($){
   require ConfigParser;
   my $qname = shift;

   my $parser = ConfigParser->new($arcconf)
       or die "Cannot parse $arcconf config file";
   return $parser->get_section("queue/$qname");
}
  • I've added similar code in glue-generator.pl because several systems are still using GLUE 1.3.
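To check that the patched infoproviders actually publish these values, a query along these lines should work against the CE once everything is set up. This is only a sketch: the hostname is a placeholder, and it assumes the usual ARC LDAP infosys on port 2135 with the GLUE2 tree under o=glue.

ldapsearch -x -LLL -H ldap://ce.example.ac.uk:2135 -b 'o=glue' \
  '(objectClass=GLUE2ComputingShare)' \
  GLUE2ComputingShareMaxWallTime GLUE2ComputingShareDefaultWallTime \
  GLUE2ComputingShareMaxCPUTime GLUE2ComputingShareDefaultCPUTime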

Puppet code changes

I've changed the support for queues a bit. The two main changes are

  • I've removed the cluster authorizedvo default values in the class because they override the values set on a per-queue basis.
  • I've changed the queue handling to just another template snippet rather than an extra class creating resources. The template then just loops over a hash of queues in this way
<% @queues.each_pair do |queue_name, queue_data| %>
[queue/<%= queue_name -%>]
name="<%= queue_name -%>"
<% if queue_data['default_memory'] -%>
defaultmemory=<%= queue_data['default_memory'] %>
<% end -%>
<% if queue_data['comment'] -%>
comment="<%= queue_data['comment'] -%>"
<% end -%>
[.....]
<% end -%>

I've put the template in a file called queues.erb instead of queue.erb to avoid completely overriding the previous method. The hiera snippet is like before, with some extra parameters; even though they are not in the a-rex schema they can still be used (see the time limits later).

  • The corresponding hiera data structure for the queues is as follows
arc_ce::queues:
 long:
   default_memory: 2048
   maxwalltime: 4320
   defaultwallt: 1440
   maxcputime: 4320
   defaultcput: 1440
   cpudistribution:
     - '8cpu:2'
   homogeneity: true
   condor_requirements: "(Opsys == \"linux\") && (OpSysMajorVer == 7)"
   authorized_vos:
     - atlas
     - vo.northgrid.ac.uk
   ac_policy: true
 medium:
   [.....]
 gpu:
   [.....]
  • I've added support for arc-vomsac-check (a tool that enables ACLs on a per-queue basis) in the grid-manager template
authplugin="ACCEPTED timeout=60,onfailure=fail,onsuccess=pass %W/libexec/arc/arc-vomsac-check -L %C/job.%I.local \
-P %C/job.%I.proxy"
  • The puppet module contained previous fixes to the ARC code; I've committed my fixes under a different version number (the ARC version). Fixes used to be applied if the apply_fixes parameter was set to true; I changed apply_fixes from a boolean to a string, so people can now select which fixes to apply.

How to apply the fixes above

If you don't use puppet you'll need to get the ARC 5.4.1 patched files from github and then change your arc.conf manually as described in the setup section.

If you use puppet you can use my clone; to apply these fixes you need to add the following to your arc_ce.yaml hiera

arc_ce::apply_fixes: 5.4.1

Setup

ARC setup

After this, if you set up the queues in hiera as described above, you'll get queue sections like this in your /etc/arc.conf

[queue/long]
name="long"
defaultmemory=2048
maxcputime=4320
maxwalltime=4320
condor_requirements="(OpSys =?= "LINUX") && (OpSysMajorVer =?= 7)"
cpudistribution=8cpu:2
authorizedvo="atlas"
ac_policy="VOMS: atlas"
authorizedvo="vo.northgrid.ac.uk"
ac_policy="VOMS: vo.northgrid.ac.uk"
[queue/gpu]
name="gpu"
defaultmemory=2048
maxcputime=4320
maxwalltime=4320
condor_requirements="(OpSys =?= "LINUX") && (OpSysMajorVer =?= 7) && (HasGPUs =?= True)"
authorizedvo="atlas"
ac_policy="VOMS: atlas"
authorizedvo="lhcb"
ac_policy="VOMS: lhcb"
authorizedvo="vo.northgrid.ac.uk"
ac_policy="VOMS: vo.northgrid.ac.uk"
authorizedvo="icecube"
ac_policy="VOMS: icecube"
authorizedvo="lsst"
ac_policy="VOMS: lsst"
authorizedvo="skatelescope.eu"
ac_policy="VOMS: skatelescope.eu"

Things to note here are

  • authorizedvo: is now set on a per-queue basis and published as such in the BDII, so that VOs that use Dirac, for example, automatically go to the right queues.
  • ac_policy: is used by arc-vomsac-check to determine whether the job is authorized to access that queue; ARC will not submit jobs that are not authorized. The latter is important if you want only some groups to access certain resources. The way I did it is to declare authorizedvo in hiera and have the code also set up the ac_policy, so you don't have to write the same thing twice; just set "ac_policy: true" for each queue in hiera.
  • condor_requirements: these are added to the requirements the job arrives with (see the sketch after this list). For example you can direct different queues to nodes with different OSs, (OpSys =?= "LINUX") && (OpSysMajorVer =?= 7), or you can create a queue, like I did, that matches GPU nodes by adding a custom ClassAd condition (HasGPUs =?= True).
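Just to illustrate the effect (this is only a sketch, not the exact expression the backend generates), for a job landing on the gpu queue above the final HTCondor requirements end up roughly like

# sketch: the queue's condor_requirements are ANDed with whatever the job asked for
requirements = (<job's own requirements>) && (OpSys =?= "LINUX") && (OpSysMajorVer =?= 7) && (HasGPUs =?= True)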

HTCondor setup

To select some nodes with a particular property you need to add the property to STARTD_ATTRS, while to stop everyone else using those nodes you need to add a job ClassAd condition to START on the WNs you want to group. The simplest way to do this is to install a file in /etc/condor/config.d, which gets read after the startd configuration. In the ARC setup above, for example, I've added a HasGPUs condition; this means the job will be matched with nodes that advertise it.

The file to configure startd to have HasGPUs looks like this

[root@vm73 ~]# cat /etc/condor/config.d/21_man-gpu.config 
# HasGPUs is used to match the job with the nodes (i.e. the job requires the nodes which have HasGPUs)
HasGPUs = true
STARTD_ATTRS = $(STARTD_ATTRS), HasGPUs
# This instead means the node will start only jobs that come from the gpu queue 
START = $(START)  && (NordugridQueue =?= "gpu")

HasGPUs is a WN attribute, not a job one, so it cannot be added to START. START tells the startd which conditions a job has to satisfy to be run. In this case I want only jobs arriving from the gpu queue to run there, so I've added the (NordugridQueue =?= "gpu") condition to it.

Note that the startd configuration is in a file called /etc/condor/config.d/20_workernode.config, so my GPU setup file is read afterwards because its name, 21_man-gpu.config, sorts after it.
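To check that the new file is actually picked up, something like this should work on the worker node (condor_reconfig makes the running daemons re-read the configuration; the condor_status query can be run from anywhere that can see the collector):

[root@vm73 ~]# condor_reconfig
[root@vm73 ~]# condor_config_val STARTD_ATTRS     # should now include HasGPUs
[root@vm73 ~]# condor_config_val START            # should show the NordugridQueue condition
[root@vm73 ~]# condor_status -constraint 'HasGPUs =?= True'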

HTCondor has other, more sophisticated mechanisms to access GPU nodes, but they require further changes to submit-condor-job and the ARC CE to understand the request, so for now I haven't used them.

Still to do

Not every VO arrives with resource requirements, and in fact the time and memory values are ignored by most VOs. With the numbers in arc.conf, though, it should be possible to add the limits to jobs that arrive without resource requirements, like what happens in standard batch systems when default queue values are set. This will require another iteration on submit-condor-job.
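As a purely hypothetical sketch of what that iteration could add to the generated submit description (none of this is implemented; taking the values from the queue's defaultmemory and defaultwallt is my assumption), it could look something like

# hypothetical: apply the queue defaults from arc.conf when the job brought no limits
# defaultmemory is in MB, defaultwallt in minutes (1440 = 1 day, as in the hiera example)
request_memory = 2048
periodic_remove = (JobStatus == 2) && ((time() - JobCurrentStartDate) > (1440 * 60))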