=Introduction=

NOTE: This document is based on an ARC 5 set-up, which is behind the newest releases. New installations should consider using ARC 6 instead, although some of the config options are quite different.

We initially installed ARC to support multicore at our site. A multicore job is one which needs to use more than one processor on a node. Before 2014, multicore jobs were not used much on the grid infrastructure. This has changed because ATLAS and other large users have asked sites to enable multicore on their clusters.

Unfortunately, it is not just a simple matter of setting some parameter on the head node and sitting back while jobs arrive. Different grid systems have varying levels of support for multicore, ranging from non-existent to virtually full support.

This report discusses the multicore configuration at Liverpool. We decided to build a cluster using one of the most capable batch systems currently available, called HTCondor (or CONDOR for short). We also decided to front the system with an ARC CE.

I thank Andrew Lahiff at RAL for the initial configuration and for many suggestions and much help. Links to some of Andrew's material are in the "See Also" section.
  
=Important Documents=

You'll need a copy of the ARC System Admin Manual:

 http://www.nordugrid.org/documents/arc-ce-sysadm-guide.pdf

And a copy of the Condor System Admin Manual (this one is for 8.2.10):

 http://research.cs.wisc.edu/htcondor/manual/v8.2/condor-V8_2_10-Manual.pdf

In addition to those, read these notes on Condor/CGROUPS:

 https://www.gridpp.ac.uk/wiki/Enable_Cgroups_in_HTCondor

And this JURA document will help with the accounting:

 http://www.nordugrid.org/documents/jura-tech-doc.pdf
  
=Infrastructure/Fabric=

The multicore cluster consists of an SL6 headnode that runs the ARC CE and the Condor batch system. The headnode has a dedicated set of 132 workernodes of various types, providing a total of around 1100 single threads of execution, which I shall call unislots, or slots for short.

==Head Node==

The headnode is a virtual system running on KVM.

{|border="1" cellpadding="1"
|+Head node hardware
|-style="background:#7C8AAF;color:white"
!Host Name
!OS
!CPUs
!RAM
!Disk Space (mostly /var)
|-
|hepgrid2.ph.liv.ac.uk
|SL6.4
|5
|10 GB
|55 GB
|}
  
== Worker nodes ==

This is the output of our [https://indico.cern.ch/event/577279/contributions/2359189/attachments/1367049/2071368/SiteLayoutDB.pdf Site Layout Database], showing how the ARC/Condor cluster is made up. All the nodes currently run SL6.4.

'''Workernode types'''

<table border="1">
  <tr>
    <td>Node type name</td>
    <td>CPUs per node</td>
    <td>Slots per node</td>
    <td>HS06 per slot</td>
    <td>GB per slot</td>
    <td>Scale factor</td>
  </tr>
  <tr>
    <td>BASELINE</td>
    <td>0</td>
    <td>0</td>
    <td>10.0</td>
    <td>0.0</td>
    <td>0.0</td>
  </tr>
  <tr>
    <td>L5420</td>
    <td>2</td>
    <td>8</td>
    <td>8.9</td>
    <td>2.0</td>
    <td>0.89</td>
  </tr>
  <tr>
    <td>E5620</td>
    <td>2</td>
    <td>12</td>
    <td>10.6325</td>
    <td>2.0</td>
    <td>1.0632</td>
  </tr>
  <tr>
    <td>X5650</td>
    <td>2</td>
    <td>24</td>
    <td>8.66</td>
    <td>2.08</td>
    <td>0.866</td>
  </tr>
  <tr>
    <td>E5-2630</td>
    <td>2</td>
    <td>23</td>
    <td>11.28</td>
    <td>2.17</td>
    <td>1.128</td>
  </tr>
  <tr>
    <td>E5-2630V3</td>
    <td>2</td>
    <td>32</td>
    <td>11.07</td>
    <td>4.12</td>
    <td>1.107</td>
  </tr>
</table>
  
'''Rack layout'''

<table border="1">
  <tr>
    <td>Node Set</td>
    <td>Node Type</td>
    <td>Node Count</td>
    <td>CPUs in Node</td>
    <td>Slots per Node</td>
    <td>HS06 per Slot</td>
    <td>HS06</td>
  </tr>
  <tr>
    <td>21</td>
    <td>E5620</td>
    <td>4</td>
    <td>2</td>
    <td>12</td>
    <td>10.6325</td>
    <td>510.36</td>
  </tr>
  <tr>
    <td>21X</td>
    <td>X5650</td>
    <td>16</td>
    <td>2</td>
    <td>24</td>
    <td>8.66</td>
    <td>3325.4399</td>
  </tr>
  <tr>
    <td>22</td>
    <td>E5620</td>
    <td>20</td>
    <td>2</td>
    <td>12</td>
    <td>10.6325</td>
    <td>2551.7999</td>
  </tr>
  <tr>
    <td>23p1</td>
    <td>E5620</td>
    <td>10</td>
    <td>2</td>
    <td>12</td>
    <td>10.6325</td>
    <td>1275.9</td>
  </tr>
  <tr>
    <td>26</td>
    <td>E5-2630</td>
    <td>4</td>
    <td>2</td>
    <td>23</td>
    <td>11.28</td>
    <td>1037.76</td>
  </tr>
  <tr>
    <td>26L</td>
    <td>L5420</td>
    <td>7</td>
    <td>2</td>
    <td>8</td>
    <td>8.9</td>
    <td>498.4</td>
  </tr>
  <tr>
    <td>26V</td>
    <td>E5-2630V3</td>
    <td>5</td>
    <td>2</td>
    <td>32</td>
    <td>11.07</td>
    <td>1771.2</td>
  </tr>
</table>
  
'''General cluster properties'''

<table border="1">
  <tr>
    <td>HS06</td>
    <td>10970.86</td>
  </tr>
  <tr>
    <td>Physical CPUs</td>
    <td>132</td>
  </tr>
  <tr>
    <td>Logical CPUs (slots)</td>
    <td>1100</td>
  </tr>
  <tr>
    <td>Cores</td>
    <td>8.333</td>
  </tr>
  <tr>
    <td>Benchmark</td>
    <td>9.974</td>
  </tr>
  <tr>
    <td>CE_SI00</td>
    <td>2493</td>
  </tr>
  <tr>
    <td>CPUScalingReferenceSI00</td>
    <td>2500</td>
  </tr>
</table>
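
The figures in this table follow from the rack layout above: each node set contributes Node Count × Slots per Node × HS06 per Slot, the Benchmark figure is the total HS06 divided by the number of slots, and CE_SI00 is that per-slot benchmark expressed in SI2K using the usual 250 SI2K per HS06 conversion. A quick sanity check of the sums (all numbers copied from the tables above):

 # Recompute the cluster totals from the rack layout rows (count * slots * HS06-per-slot)
 awk 'BEGIN {
   hs06  = 4*12*10.6325 + 16*24*8.66 + 20*12*10.6325 + 10*12*10.6325
   hs06 += 4*23*11.28 + 7*8*8.9 + 5*32*11.07
   slots = 4*12 + 16*24 + 20*12 + 10*12 + 4*23 + 7*8 + 5*32
   printf "HS06=%.2f slots=%d HS06/slot=%.3f CE_SI00=%.0f\n", hs06, slots, hs06/slots, 250*hs06/slots
 }'
 # Prints: HS06=10970.86 slots=1100 HS06/slot=9.974 CE_SI00=2493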
  
=Software Builds and Configuration=

There are a few particulars of the Liverpool site that I want to get out of the way to start with. For the initial installation of an operating system on our head nodes and worker nodes, we use tools developed at Liverpool (BuildTools) based on Kickstart, NFS, TFTP and DHCP. The source (synctool.pl and linktool.pl) can be obtained from sjones@hep.ph.liv.ac.uk. Alternatively, similar functionality is said to exist in the [http://en.wikipedia.org/wiki/Cobbler_%28software%29 Cobbler] suite, which is released as Open Source, and some sites have based their initial install on that. Once the OS is on, the first reboot starts [http://puppetlabs.com/ Puppet] to give a personality to the node. Puppet is becoming something of a de-facto standard in its own right, so I'll use some Puppet terminology within this document where an explanation of a particular feature is needed.

== Special Software Control Measures ==

The software for the installation is all contained in various yum repositories. Here at Liverpool, we maintain two mirrored copies of the yum material. One of them, the online repository, is mirrored daily from the Internet. It is not used for any installation. The other copy, termed the local repository, is used to take a snapshot of the online repository when necessary. Installations are done from the local repository. Thus we maintain precise control of the software we use. There is no need to make any further reference to this set-up.

We'll start with the headnode and "work down", so to speak.
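
For what it's worth, this kind of two-stage mirror can be done with the standard yum tools; a minimal sketch, assuming reposync and createrepo are available and using made-up paths (the real repository layout at Liverpool is not described here):

 # Stage 1 (daily, e.g. from cron): refresh the online copy from the Internet
 reposync --repoid=epel --download_path=/srv/yum/online/
 createrepo /srv/yum/online/epel/
 
 # Stage 2 (run by hand when a snapshot is wanted): copy online -> local
 rsync -a --delete /srv/yum/online/epel/ /srv/yum/local/epel/
 createrepo /srv/yum/local/epel/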
  
== Yum repos ==

This table shows the origin of the software releases via yum repositories.

{|border="1" cellpadding="1"
|+Yum Repositories
|-style="background:#7C8AAF;color:white"
!Product
!Where
!Yum repo
!Source
!Keys
|-
|ARC
|Head node
|http://download.nordugrid.org/repos/15.03/centos/el6/x86_64/base, http://download.nordugrid.org/repos/15.03/centos/el6/x86_64/updates
|http://download.nordugrid.org/repos/15.03/centos/el6/source
|http://download.nordugrid.org/RPM-GPG-KEY-nordugrid
|-
|VomsSnooper
|Head node
|http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.vomstools/
|null
|null
|-
|Condor (we use 8.2.X)
|Head and Worker
|http://research.cs.wisc.edu/htcondor/yum/stable/rhel6
|null
|null
|-
|WLCG
|Head and Worker
|http://linuxsoft.cern.ch/wlcg/sl6/x86_64/
|null
|null
|-
|Trust anchors
|Head and Worker
|http://repository.egi.eu/sw/production/cas/1/current/
|null
|null
|-
|Puppet
|Head and Worker
|http://yum.puppetlabs.com/el/6/products/x86_64
|null
|null
|-
|epel
|Head and Worker
|http://download.fedoraproject.org/pub/epel/6/x86_64/
|null
|null
|-
|emi (to be phased out, June 2017; use UMD)
|Head and Worker
|http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/base, http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/third-party, http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/updates
|null
|null
|-
|CernVM-packages
|Worker
|http://map2.ph.liv.ac.uk//yum/cvmfs/EL/6.4/x86_64/
|null
|http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM
|}
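
However the mirroring is arranged, each row of this table ends up on the node as an ordinary yum .repo stanza. An illustrative sketch for the ARC row (the file name and the enabled/gpgcheck choices here are mine, not part of the table):

 [nordugrid-base]
 name=NorduGrid 15.03 base
 baseurl=http://download.nordugrid.org/repos/15.03/centos/el6/x86_64/base
 enabled=1
 gpgcheck=1
 gpgkey=http://download.nordugrid.org/RPM-GPG-KEY-nordugrid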
  
==Head Node==

=== Head Standard build ===

The basis for the initial build follows the standard model for any grid server node at Liverpool. I won't explain that in detail – each site is likely to have its own standard, which is general to all the components used to build any grid node (such as a CE, ARGUS, BDII, TORQUE etc.) but prior to any middleware. Such a baseline build might include networking, iptables, nagios scripts, ganglia, ssh etc.

=== Head Extra Directories ===

I had to make these specific directories myself:

 /etc/arc/runtime/ENV
 /etc/condor/ral
 /etc/lcmaps/
 /root/glitecfg/services
 /root/scripts
 /var/spool/arc/debugging
 /var/spool/arc/grid
 /var/spool/arc/jobstatus
 /var/urs
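
They can all be made in one go (mkdir -p is harmless if a directory already exists):

 mkdir -p /etc/arc/runtime/ENV /etc/condor/ral /etc/lcmaps \
          /root/glitecfg/services /root/scripts \
          /var/spool/arc/debugging /var/spool/arc/grid /var/spool/arc/jobstatus /var/urs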
  
=== Head Additional Packages ===

These packages were needed to add the middleware required, i.e. ARC, Condor and ancillary material.

{|border="1" cellpadding="1"
|+Additional Packages
|-style="background:#7C8AAF;color:white"
!Package
!Description
|-
|nordugrid-arc-compute-element
|The ARC CE middleware
|-
|condor
|HTCondor, the main batch server package (we are on 8.2.7)
|-
|apel-client
|Accounting; ARC/Condor bypasses the APEL server and publishes direct.
|-
|ca_policy_igtf-classic
|Certificates
|-
|lcas-plugins-basic
|Security
|-
|lcas-plugins-voms
|Security
|-
|lcas
|Security
|-
|lcmaps
|Security
|-
|lcmaps-plugins-basic
|Security
|-
|lcmaps-plugins-c-pep
|Security
|-
|lcmaps-plugins-verify-proxy
|Security
|-
|lcmaps-plugins-voms
|Security
|-
|globus-ftp-control
|Extra packages for Globus
|-
|globus-gsi-callback
|Extra packages for Globus
|-
|VomsSnooper
|VOMS helper, used to set up the LSC (list of certificates) files
|-
|glite-yaim-core
|Yaim; we just use Yaim to make accounts.
|-
|yum-plugin-priorities.noarch
|Helpers for Yum
|-
|yum-plugin-protectbase.noarch
|Helpers for Yum
|-
|yum-utils
|Helpers for Yum
|}
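
With the yum repositories above in place, the whole set can be pulled in with one transaction; a sketch (package names as in the table, dependencies come in automatically):

 yum install nordugrid-arc-compute-element condor apel-client ca_policy_igtf-classic \
             lcas lcas-plugins-basic lcas-plugins-voms \
             lcmaps lcmaps-plugins-basic lcmaps-plugins-c-pep lcmaps-plugins-verify-proxy lcmaps-plugins-voms \
             globus-ftp-control globus-gsi-callback \
             VomsSnooper glite-yaim-core \
             yum-plugin-priorities yum-plugin-protectbase yum-utils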
  
=== Head Files ===

The following set of files was additionally installed. Some of them are empty. Some can be used as they are. Others have to be edited to fit your site. Any that is a script must have executable permissions (e.g. 755).

* '''File:''' /etc/arc.conf
* Notes: The main configuration file of the ARC CE. It adds support for scaling factors, APEL reporting, ARGUS mapping, BDII publishing (power and scaling), multiple VO support, and default limits.
* '''Special note:''' Ext3 has a limit of 31998 directories in the sessiondir. This limit is easily breached on a large cluster. Either use (say) xfs, or define multiple sessiondir variables to spread the load over several directories, as per the “ARC CE System Administrator Guide” (see the sketch after this file listing).
* Customise: Yes. You'll need to edit it to suit your site. Please see the [[Publishing tutorial]].
* Content:

 [common]
 debug="1"
 x509_user_key="/etc/grid-security/hostkey.pem"
 x509_user_cert="/etc/grid-security/hostcert.pem"
 x509_cert_dir="/etc/grid-security/certificates"
 gridmap="/etc/grid-security/grid-mapfile"
 lrms="condor"
 hostname="hepgrid2.ph.liv.ac.uk"
 
 [grid-manager]
 debug="3"
 logsize=30000000 20
 enable_emies_interface="yes"
 arex_mount_point="https://hepgrid2.ph.liv.ac.uk:443/arex"
 user="root"
 controldir="/var/spool/arc/jobstatus"
 sessiondir="/var/spool/arc/grid"
 runtimedir="/etc/arc/runtime"
 logfile="/var/log/arc/grid-manager.log"
 pidfile="/var/run/grid-manager.pid"
 joblog="/var/log/arc/gm-jobs.log"
 shared_filesystem="no"
 authplugin="PREPARING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/default_rte_plugin.py %S %C %I ENV/GLITE"
 authplugin="FINISHING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/scaling_factors_plugin.py %S %C %I"
 # This copies the files containing useful output from completed jobs into a directory /var/spool/arc/debugging
 #authplugin="FINISHED timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/debugging_rte_plugin.py %S %C %I"
 mail="root@hep.ph.liv.ac.uk"
 jobreport="APEL:http://mq.cro-ngi.hr:6162"
 jobreport_options="urbatch:1000,archiving:/var/urs,topic:/queue/global.accounting.cpu.central,gocdb_name:UKI-NORTHGRID-LIV-HEP,use_ssl:true,Network:PROD,benchmark_type:Si2k,benchmark_value:2500.00"
 jobreport_credentials="/etc/grid-security/hostkey.pem /etc/grid-security/hostcert.pem /etc/grid-security/certificates"
 jobreport_publisher="jura_dummy"
 # Disable (1 month !)
 jobreport_period=2500000
 
 [gridftpd]
 debug="1"
 logsize=30000000 20
 user="root"
 logfile="/var/log/arc/gridftpd.log"
 pidfile="/var/run/gridftpd.pid"
 port="2811"
 allowunknown="yes"
 globus_tcp_port_range="20000,24999"
 globus_udp_port_range="20000,24999"
 maxconnections="500"
 
 #
 # Notes:
 #
 # The first two args are implicitly given to arc-lcmaps, and are
 #    argv[1] - the subject/DN
 #    argv[2] - the proxy file
 #
 # The remaining attributes are explicit, after the "lcmaps" field in the examples below.
 #    argv[3] - lcmaps_library
 #    argv[4] - lcmaps_dir
 #    argv[5] - lcmaps_db_file
 #    argv[6 etc.] - policynames
 #
 # lcmaps_dir and/or lcmaps_db_file may be '*', in which case they are
 # fully truncated (placeholders).
 #
 # Some logic is applied. If the lcmaps_library is not specified with a
 # full path, it is given the path of the lcmaps_dir. We have to assume that
 # the lcmaps_dir is a poor name for that field, as discussed in the following
 # examples.
 #
 # Examples:
 #  In this example, used at RAL, the liblcmaps.so is given no
 #  path, so it is assumed to exist in /usr/lib64 (note the poorly
 #  named field - the lcmaps_dir is populated by a library path.)
 #
 # Fieldnames:      lcmaps_lib  lcmaps_dir lcmaps_db_file            policy
 #unixmap="* lcmaps liblcmaps.so /usr/lib64 /usr/etc/lcmaps/lcmaps.db arc"
 #
 #  In the next example, used at Liverpool, lcmaps_lib is fully qualified. Thus
 #  the lcmaps_dir is not used (although it does set the LCMAPS_DIR env var).
 #  In this case, the lcmaps_dir really does contain the lcmaps dir location.
 #
 # Fieldnames:      lcmaps_lib              lcmaps_dir  lcmaps_db_file policy
 unixmap="* lcmaps  /usr/lib64/liblcmaps.so /etc/lcmaps lcmaps.db      arc"
 unixmap="arcfailnonexistentaccount:arcfailnonexistentaccount all"
 
 [gridftpd/jobs]
 debug="1"
 path="/jobs"
 plugin="jobplugin.so"
 allownew="yes"
 
 [infosys]
 debug="1"
 user="root"
 overwrite_config="yes"
 port="2135"
 registrationlog="/var/log/arc/inforegistration.log"
 providerlog="/var/log/arc/infoprovider.log"
 provider_loglevel="1"
 infosys_glue12="enable"
 infosys_glue2_ldap="enable"
 
 [infosys/glue12]
 debug="1"
 resource_location="Liverpool, UK"
 resource_longitude="-2.964"
 resource_latitude="53.4035"
 glue_site_web="http://www.gridpp.ac.uk/northgrid/liverpool"
 glue_site_unique_id="UKI-NORTHGRID-LIV-HEP"
 cpu_scaling_reference_si00="2493"
 processor_other_description="Cores=8.333,Benchmark=9.974-HEP-SPEC06"
 provide_glue_site_info="false"
 
 [infosys/admindomain]
 debug="1"
 name="UKI-NORTHGRID-LIV-HEP"
 
 # infosys view of the computing cluster (service)
 [cluster]
 debug="1"
 name="hepgrid2.ph.liv.ac.uk"
 localse="hepgrid11.ph.liv.ac.uk"
 cluster_alias="hepgrid2 (UKI-NORTHGRID-LIV-HEP)"
 comment="UKI-NORTHGRID-LIV-HEP Main Grid Cluster"
 homogeneity="True"
 nodecpu="Intel(R) Xeon(R) CPU L5420 @ 2.50GHz"
 architecture="x86_64"
 nodeaccess="inbound"
 nodeaccess="outbound"
 #opsys="SL64"
 opsys="ScientificSL : 6.4 : Carbon"
 nodememory="3000"
 
 authorizedvo="alice"
 authorizedvo="atlas"
 authorizedvo="biomed"
 authorizedvo="calice"
 authorizedvo="camont"
 authorizedvo="cdf"
 authorizedvo="cernatschool.org"
 authorizedvo="cms"
 authorizedvo="dteam"
 authorizedvo="dzero"
 authorizedvo="epic.vo.gridpp.ac.uk"
 authorizedvo="esr"
 authorizedvo="fusion"
 authorizedvo="geant4"
 authorizedvo="gridpp"
 authorizedvo="hyperk.org"
 authorizedvo="ilc"
 authorizedvo="lhcb"
 #authorizedvo="lz"
 authorizedvo="lsst"
 authorizedvo="magic"
 authorizedvo="mice"
 authorizedvo="na62.vo.gridpp.ac.uk"
 authorizedvo="neiss.org.uk"
 authorizedvo="ops"
 authorizedvo="pheno"
 authorizedvo="planck"
 authorizedvo="snoplus.snolab.ca"
 authorizedvo="t2k.org"
 authorizedvo="vo.northgrid.ac.uk"
 authorizedvo="zeus"
 
 benchmark="SPECINT2000 2493"
 benchmark="SPECFP2000 2493"
 totalcpus=1100
 
 [queue/grid]
 debug="1"
 name="grid"
 homogeneity="True"
 comment="Default queue"
 nodecpu="adotf"
 architecture="adotf"
 defaultmemory=3000
 maxrunning=1400
 totalcpus=1100
 maxuserrun=1400
 maxqueuable=2800
 #maxcputime=2880
 #maxwalltime=2880
 MainMemorySize="16384"
 OSFamily="linux"

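Regarding the special note above about the 31998-subdirectory limit on ext3: ARC accepts the sessiondir command more than once, so the single sessiondir above can be replaced by several, e.g. (directory names are just an illustration):

 sessiondir="/var/spool/arc/grid1"
 sessiondir="/var/spool/arc/grid2"
 sessiondir="/var/spool/arc/grid3"
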
* '''File:''' /etc/arc/runtime/ENV/GLITE
* Notes: The GLITE runtime environment.
* Content:

  #!/bin/sh
 
  export GLOBUS_LOCATION=/usr
 
  if [ "x$1" = "x0" ]; then
    # Set environment variable containing queue name
    env_idx=0
    env_var="joboption_env_$env_idx"
    while [ -n "${!env_var}" ]; do
      env_idx=$((env_idx+1))
      env_var="joboption_env_$env_idx"
    done
    eval joboption_env_$env_idx="NORDUGRID_ARC_QUEUE=$joboption_queue"
 
    export RUNTIME_ENABLE_MULTICORE_SCRATCH=1
 
  fi
 
  if [ "x$1" = "x1" ]; then
    # Set grid environment
    if [ -e /etc/profile.d/env.sh ]; then
      source /etc/profile.d/env.sh
    fi
    if [ -e /etc/profile.d/zz-env.sh ]; then
      source /etc/profile.d/zz-env.sh
    fi
    export LD_LIBRARY_PATH=/opt/xrootd/lib
 
    # Set basic environment variables
    export GLOBUS_LOCATION=/usr
    HOME=`pwd`
    export HOME
    USER=`whoami`
    export USER
    HOSTNAME=`hostname -f`
    export HOSTNAME
  fi
 
  export DPM_HOST=hepgrid11.ph.liv.ac.uk
  export DPNS_HOST=hepgrid11.ph.liv.ac.uk
  export GLEXEC_LOCATION=/usr
  export RFIO_PORT_RANGE=20000,25000
  export SITE_GIIS_URL=hepgrid4.ph.liv.ac.uk
  export SITE_NAME=UKI-NORTHGRID-LIV-HEP
  export MYPROXY_SERVER=lcgrbp01.gridpp.rl.ac.uk
 
  export VO_ALICE_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_ALICE_SW_DIR=/opt/exp_soft_sl5/alice
  export VO_ATLAS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/sw
  export VO_BIOMED_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_BIOMED_SW_DIR=/opt/exp_soft_sl5/biomed
  export VO_CALICE_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_CALICE_SW_DIR=/opt/exp_soft_sl5/calice
  export VO_CAMONT_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_CAMONT_SW_DIR=/opt/exp_soft_sl5/camont
  export VO_CDF_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_CDF_SW_DIR=/opt/exp_soft_sl5/cdf
  export VO_CERNATSCHOOL_ORG_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_CERNATSCHOOL_ORG_SW_DIR=/opt/exp_soft_sl5/cernatschool
  export VO_CMS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_CMS_SW_DIR=/opt/exp_soft_sl5/cms
  export VO_DTEAM_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_DTEAM_SW_DIR=/opt/exp_soft_sl5/dteam
  export VO_DZERO_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_DZERO_SW_DIR=/opt/exp_soft_sl5/dzero
  export VO_EPIC_VO_GRIDPP_AC_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_EPIC_VO_GRIDPP_AC_UK_SW_DIR=/opt/exp_soft_sl5/epic
  export VO_ESR_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_ESR_SW_DIR=/opt/exp_soft_sl5/esr
  export VO_FUSION_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_FUSION_SW_DIR=/opt/exp_soft_sl5/fusion
  export VO_GEANT4_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_GEANT4_SW_DIR=/opt/exp_soft_sl5/geant4
  export VO_GRIDPP_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_GRIDPP_SW_DIR=/opt/exp_soft_sl5/gridpp
  export VO_HYPERK_ORG_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_HYPERK_ORG_SW_DIR=/cvmfs/hyperk.egi.eu
  export VO_ILC_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_ILC_SW_DIR=/cvmfs/ilc.desy.de
  export VO_LHCB_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_LHCB_SW_DIR=/cvmfs/lhcb.cern.ch
  export VO_LZ_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_LZ_SW_DIR=/opt/exp_soft_sl5/lsst
  export VO_LSST_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_LSST_SW_DIR=/opt/exp_soft_sl5/lsst
  export VO_MAGIC_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_MAGIC_SW_DIR=/opt/exp_soft_sl5/magic
  export VO_MICE_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_MICE_SW_DIR=/cvmfs/mice.egi.eu
  export VO_NA62_VO_GRIDPP_AC_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_NA62_VO_GRIDPP_AC_UK_SW_DIR=/cvmfs/na62.cern.ch
  export VO_NEISS_ORG_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_NEISS_ORG_UK_SW_DIR=/opt/exp_soft_sl5/neiss
  export VO_OPS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_OPS_SW_DIR=/opt/exp_soft_sl5/ops
  export VO_PHENO_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_PHENO_SW_DIR=/opt/exp_soft_sl5/pheno
  export VO_PLANCK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_PLANCK_SW_DIR=/opt/exp_soft_sl5/planck
  export VO_SNOPLUS_SNOLAB_CA_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_SNOPLUS_SNOLAB_CA_SW_DIR=/cvmfs/snoplus.egi.eu
  export VO_T2K_ORG_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_T2K_ORG_SW_DIR=/cvmfs/t2k.egi.eu
  export VO_VO_NORTHGRID_AC_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_VO_NORTHGRID_AC_UK_SW_DIR=/opt/exp_soft_sl5/northgrid
  export VO_ZEUS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
  export VO_ZEUS_SW_DIR=/opt/exp_soft_sl5/zeus
 
  export RUCIO_HOME=/cvmfs/atlas.cern.ch/repo/sw/ddm/rucio-clients/0.1.12
  export RUCIO_AUTH_TYPE=x509_proxy
 
  export LCG_GFAL_INFOSYS="lcg-bdii.gridpp.ac.uk:2170,topbdii.grid.hep.ph.ic.ac.uk:2170"
 
  # Fix to circumvent Condor Globus Libraries
  # (i.e. this error: lcg-cr: /usr/lib64/condor/libglobus_common.so.0: no version information available (required by /usr/lib64/libcgsi_plugin.so.1)
  export LD_LIBRARY_PATH=/usr/lib64/:$LD_LIBRARY_PATH

* '''File:''' /etc/condor/config.d/14accounting-groups-map.config
* Notes: Implements accounting groups, so that fairshares can be used that refer to whole groups of users, instead of individual ones.
* Customise: Yes. You'll need to edit it to suit your site.
* Content:

  # Liverpool Tier-2 HTCondor configuration: accounting groups
 
  # Primary group, assign individual test submitters into the HIGHPRIO group,
  # else just assign job into primary group of its VO
  LivAcctGroup = strcat("group_",toUpper( ifThenElse(regexp("sgmatl34",Owner),"highprio", ifThenElse(regexp("sgmops11",Owner),"highprio", ifThenElse(regexp("^alice", x509UserProxyVOName), "alice", ifThenElse(regexp("^atlas", x509UserProxyVOName), "atlas", ifThenElse(regexp("^biomed", x509UserProxyVOName), "biomed", ifThenElse(regexp("^calice", x509UserProxyVOName), "calice", ifThenElse(regexp("^camont", x509UserProxyVOName), "camont", ifThenElse(regexp("^cdf", x509UserProxyVOName), "cdf", ifThenElse(regexp("^cernatschool.org", x509UserProxyVOName), "cernatschool_org", ifThenElse(regexp("^cms", x509UserProxyVOName), "cms", ifThenElse(regexp("^dteam", x509UserProxyVOName), "dteam", ifThenElse(regexp("^dzero", x509UserProxyVOName), "dzero", ifThenElse(regexp("^epic.vo.gridpp.ac.uk", x509UserProxyVOName), "epic_vo_gridpp_ac_uk", ifThenElse(regexp("^esr", x509UserProxyVOName), "esr", ifThenElse(regexp("^fusion", x509UserProxyVOName), "fusion", ifThenElse(regexp("^geant4", x509UserProxyVOName), "geant4", ifThenElse(regexp("^gridpp", x509UserProxyVOName), "gridpp", ifThenElse(regexp("^hyperk.org", x509UserProxyVOName), "hyperk_org", ifThenElse(regexp("^ilc", x509UserProxyVOName), "ilc", ifThenElse(regexp("^lhcb", x509UserProxyVOName), "lhcb", ifThenElse(regexp("^lsst", x509UserProxyVOName), "lsst", ifThenElse(regexp("^magic", x509UserProxyVOName), "magic", ifThenElse(regexp("^mice", x509UserProxyVOName), "mice", ifThenElse(regexp("^na62.vo.gridpp.ac.uk", x509UserProxyVOName), "na62_vo_gridpp_ac_uk", ifThenElse(regexp("^neiss.org.uk", x509UserProxyVOName), "neiss_org_uk", ifThenElse(regexp("^ops", x509UserProxyVOName), "ops", ifThenElse(regexp("^pheno", x509UserProxyVOName), "pheno", ifThenElse(regexp("^planck", x509UserProxyVOName), "planck", ifThenElse(regexp("^snoplus.snolab.ca", x509UserProxyVOName), "snoplus_snolab_ca", ifThenElse(regexp("^t2k.org", x509UserProxyVOName), "t2k_org", ifThenElse(regexp("^vo.northgrid.ac.uk", x509UserProxyVOName), "vo_northgrid_ac_uk", ifThenElse(regexp("^zeus", x509UserProxyVOName), "zeus","nonefound"))))))))))))))))))))))))))))))))))
 
  # Subgroup
  # For the subgroup, just assign job to the group of the owner (i.e. owner name less all those digits at the end).
  # Also show whether multi or single core.
  LivAcctSubGroup = strcat(regexps("([A-Za-z0-9]+[A-Za-z])\d+", Owner, "\1"),ifThenElse(RequestCpus > 1,"_mcore","_score"))
 
  # Now build up the whole accounting group
  AccountingGroup = strcat(LivAcctGroup, ".", LivAcctSubGroup, ".", Owner)
 
  # Add these ClassAd specifications to the submission expressions
  SUBMIT_EXPRS = $(SUBMIT_EXPRS) LivAcctGroup, LivAcctSubGroup, AccountingGroup

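Once jobs are arriving, it is worth checking that they really land in the intended groups. Two quick (sketch) checks from the head node:

 # Show the accounting group that each queued/running job was given
 condor_q -autoformat Owner AccountingGroup
 
 # Show accumulated usage and priorities per user/group
 condor_userprio -allusers
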
* '''File:''' /etc/condor/config.d/11fairshares.config
* Notes: Implements fair share settings, relying on groups of users.
* Customise: Yes. You'll need to edit it to suit your site.
* Content:

  # Liverpool Tier-2 HTCondor configuration: fairshares
 
  # use this to stop jobs from starting.
  # CONCURRENCY_LIMIT_DEFAULT = 0
 
  # Half-life of user priorities
  PRIORITY_HALFLIFE = 259200
 
  # Handle surplus
  GROUP_ACCEPT_SURPLUS = True
  GROUP_AUTOREGROUP = True
 
  # Weight slots using CPUs
  #NEGOTIATOR_USE_SLOT_WEIGHTS = True
 
  # See: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3271
  NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = False
 
  # Calculate the surplus allocated to each group correctly
  NEGOTIATOR_USE_WEIGHTED_DEMAND = True
 
  GROUP_NAMES = \
  group_HIGHPRIO, \
  group_ALICE, \
  group_ATLAS, \
  group_BIOMED, \
  group_CALICE, \
  group_CAMONT, \
  group_CDF, \
  group_LSST, \
  group_CERNATSCHOOL_ORG, \
  group_CMS, \
  group_DTEAM, \
  group_DZERO, \
  group_EPIC_VO_GRIDPP_AC_UK, \
  group_ESR, \
  group_FUSION, \
  group_GEANT4, \
  group_GRIDPP, \
  group_HYPERK_ORG, \
  group_ILC, \
  group_LHCB, \
  group_MAGIC, \
  group_MICE, \
  group_NA62_VO_GRIDPP_AC_UK, \
  group_NEISS_ORG_UK, \
  group_OPS, \
  group_PHENO, \
  group_PLANCK, \
  group_LZ, \
  group_SNOPLUS_SNOLAB_CA, \
  group_T2K_ORG, \
  group_VO_NORTHGRID_AC_UK, \
  group_VO_SIXT_CERN_CH, \
  group_ZEUS
 
  GROUP_QUOTA_DYNAMIC_group_HIGHPRIO = 0.05
  GROUP_QUOTA_DYNAMIC_group_ALICE = 0.05
  GROUP_QUOTA_DYNAMIC_group_ATLAS = 0.65
  GROUP_QUOTA_DYNAMIC_group_BIOMED = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_CALICE = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_CAMONT = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_CDF = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_LSST = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_CERNATSCHOOL_ORG = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_CMS = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_DTEAM = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_DZERO = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_EPIC_VO_GRIDPP_AC_UK = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_ESR = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_FUSION = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_GEANT4 = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_GRIDPP = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_HYPERK_ORG = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_ILC = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_LHCB = 0.20
  GROUP_QUOTA_DYNAMIC_group_MAGIC = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_MICE = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_NA62_VO_GRIDPP_AC_UK = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_NEISS_ORG_UK = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_OPS = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_PHENO = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_PLANCK = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_LZ = 0.01
  GROUP_QUOTA_DYNAMIC_group_SNOPLUS_SNOLAB_CA = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_T2K_ORG = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_VO_NORTHGRID_AC_UK = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_VO_SIXT_CERN_CH = 0.00806452
  GROUP_QUOTA_DYNAMIC_group_ZEUS = 0.00806452
 
  DEFAULT_PRIO_FACTOR = 5000.00
  GROUP_PRIO_FACTOR_group_HIGHPRIO = 1000.0
  GROUP_PRIO_FACTOR_group_ALICE = 1000.0
  GROUP_PRIO_FACTOR_group_ATLAS = 1000.0
  GROUP_PRIO_FACTOR_group_BIOMED = 1000.0
  GROUP_PRIO_FACTOR_group_CALICE = 1000.0
  GROUP_PRIO_FACTOR_group_CAMONT = 1000.0
  GROUP_PRIO_FACTOR_group_CDF = 1000.0
  GROUP_PRIO_FACTOR_group_LSST = 1000.0
  GROUP_PRIO_FACTOR_group_CERNATSCHOOL_ORG = 1000.0
  GROUP_PRIO_FACTOR_group_CMS = 1000.0
  GROUP_PRIO_FACTOR_group_DTEAM = 1000.0
  GROUP_PRIO_FACTOR_group_DZERO = 1000.0
  GROUP_PRIO_FACTOR_group_EPIC_VO_GRIDPP_AC_UK = 1000.0
  GROUP_PRIO_FACTOR_group_ESR = 1000.0
  GROUP_PRIO_FACTOR_group_FUSION = 1000.0
  GROUP_PRIO_FACTOR_group_GEANT4 = 1000.0
  GROUP_PRIO_FACTOR_group_GRIDPP = 1000.0
  GROUP_PRIO_FACTOR_group_HYPERK_ORG = 1000.0
  GROUP_PRIO_FACTOR_group_ILC = 1000.0
  GROUP_PRIO_FACTOR_group_LHCB = 1000.0
  GROUP_PRIO_FACTOR_group_MAGIC = 1000.0
  GROUP_PRIO_FACTOR_group_MICE = 1000.0
  GROUP_PRIO_FACTOR_group_NA62_VO_GRIDPP_AC_UK = 1000.0
  GROUP_PRIO_FACTOR_group_NEISS_ORG_UK = 1000.0
  GROUP_PRIO_FACTOR_group_OPS = 1000.0
  GROUP_PRIO_FACTOR_group_PHENO = 1000.0
  GROUP_PRIO_FACTOR_group_PLANCK = 1000.0
  GROUP_PRIO_FACTOR_group_LZ = 10000.00
  GROUP_PRIO_FACTOR_group_SNOPLUS_SNOLAB_CA = 1000.0
  GROUP_PRIO_FACTOR_group_T2K_ORG = 1000.0
  GROUP_PRIO_FACTOR_group_VO_NORTHGRID_AC_UK = 1000.0
  GROUP_PRIO_FACTOR_group_VO_SIXT_CERN_CH = 1000.0
  GROUP_PRIO_FACTOR_group_ZEUS = 1000.0
 
  # Change the order in which the negotiator considers groups: (1) high priority groups used for
  # SUM tests etc, (2) multicore groups ordered by how far below their quota each group is,
  # (3) single core groups ordered by how far below their quota each group is
 
  GROUP_SORT_EXPR = ifThenElse(AccountingGroup=?="<none>", 3.4e+38, \
                    ifThenElse(AccountingGroup=?="group_HIGHPRIO", -23, \
                    ifThenElse(AccountingGroup=?="group_DTEAM", -18, \
                    ifThenElse(AccountingGroup=?="group_OPS", -17, \
                    ifThenElse(regexp("mcore",AccountingGroup),ifThenElse(GroupQuota > 0,-2+GroupResourcesInUse/GroupQuota,-1), \
                    ifThenElse(GroupQuota > 0, GroupResourcesInUse/GroupQuota, 3.3e+38))))))

* '''File:''' /etc/condor/pool_password
* Notes: Will have its own section (TBD). The description below is taken from the Condor System Admin Manual.
* Customise: Yes.
* Content:

  Password Authentication
  The password method provides mutual authentication through the use of a shared
  secret. This is often a good choice when strong security is desired, but an existing
  Kerberos or X.509 infrastructure is not in place. Password authentication is available
  on both Unix and Windows. It currently can only be used for daemon-to-daemon
  authentication. The shared secret in this context is referred to as the pool password.
  Before a daemon can use password authentication, the pool password must be stored
  on the daemon's local machine. On Unix, the password will be placed in a file defined
  by the configuration variable SEC_PASSWORD_FILE. This file will be accessible only by
  the UID that HTCondor is started as. On Windows, the same secure password store that
  is used for user passwords will be used for the pool password (see section 7.2.3).
  Under Unix, the password file can be generated by using the following command to
  write directly to the password file:
 
  condor_store_cred -f /path/to/password/file

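In practice, on this set-up, that amounts to something like the following on the head node and on each worker node (the path matches SEC_PASSWORD_FILE in the configuration below; the chown/chmod lines simply make the expected ownership explicit):

 condor_store_cred -f /etc/condor/pool_password     # prompts for the pool password
 chown root:root /etc/condor/pool_password
 chmod 600 /etc/condor/pool_password
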
* '''File:''' /etc/condor/condor_config.local
* Notes: The main client CONDOR configuration custom file.
* Customise: Yes. You'll need to edit it to suit your site.
* Content:

  ##  What machine is your central manager?
 
  CONDOR_HOST = $(FULL_HOSTNAME)
 
  ## Pool's short description
 
  COLLECTOR_NAME = Condor at $(FULL_HOSTNAME)
 
  ##  When is this machine willing to start a job?
 
  START = FALSE
 
  ##  When to suspend a job?
 
  SUSPEND = FALSE
 
  ##  When to nicely stop a job?
  # When a job is running and the PREEMPT expression evaluates to True, the
  # condor_startd will evict the job. The PREEMPT expression should reflect the
  # requirements under which the machine owner will not permit a job to continue to run.
  # For example, a policy to evict a currently running job when a key is hit or when
  # it is the 9:00am work arrival time, would be expressed in the PREEMPT expression
  # and enforced by the condor_startd.
 
  PREEMPT = FALSE
 
  # If there is a job from a higher priority user sitting idle, the
  # condor_negotiator daemon may evict a currently running job submitted
  # from a lower priority user if PREEMPTION_REQUIREMENTS is True.
 
  PREEMPTION_REQUIREMENTS = FALSE
 
  # No job has pref over any other
 
  #RANK = FALSE
 
  ##  When to instantaneously kill a preempting job
  ##  (e.g. if a job is in the pre-empting stage for too long)
 
  KILL = FALSE
 
  ##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
  ##  The list is a comma or space separated list of subsystem names
 
  DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
 
  #######################################
  # Andrew Lahiff's scaling
 
  MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
  MachineRalNodeLabel = "$$([ifThenElse(isUndefined(RalNodeLabel), "NotKnown", RalNodeLabel)])"
  SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling MachineRalNodeLabel
 
  #######################################
  # Andrew Lahiff's security
 
  ALLOW_WRITE =
 
  UID_DOMAIN = ph.liv.ac.uk
 
  CENTRAL_MANAGER1 = hepgrid2.ph.liv.ac.uk
  COLLECTOR_HOST = $(CENTRAL_MANAGER1)
 
  # Central managers
  CMS = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk
 
  # CEs
  CES = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk
 
  # Worker nodes
  WNS = condor_pool@$(UID_DOMAIN)/192.168.*
 
  # Users
  USERS = *@$(UID_DOMAIN)
  USERS = *
 
  # Required for HA
  HOSTALLOW_NEGOTIATOR = $(COLLECTOR_HOST)
  HOSTALLOW_ADMINISTRATOR = $(COLLECTOR_HOST)
  HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST)
 
  # Authorization
  HOSTALLOW_WRITE =
  ALLOW_READ = */*.ph.liv.ac.uk
  NEGOTIATOR.ALLOW_WRITE = $(CES), $(CMS)
  COLLECTOR.ALLOW_ADVERTISE_MASTER = $(CES), $(CMS), $(WNS)
  COLLECTOR.ALLOW_ADVERTISE_SCHEDD = $(CES)
  COLLECTOR.ALLOW_ADVERTISE_STARTD = $(WNS)
  SCHEDD.ALLOW_WRITE = $(USERS)
  SHADOW.ALLOW_WRITE = $(WNS), $(CES)
  ALLOW_DAEMON = condor_pool@$(UID_DOMAIN)/*.ph.liv.ac.uk, $(FULL_HOSTNAME)
  ALLOW_ADMINISTRATOR = root@$(UID_DOMAIN)/$(IP_ADDRESS), condor_pool@$(UID_DOMAIN)/$(IP_ADDRESS), $(CMS)
  ALLOW_CONFIG = root@$(FULL_HOSTNAME)
 
  # Don't allow nobody to run jobs
  SCHEDD.DENY_WRITE = nobody@$(UID_DOMAIN)
 
  # Authentication
  SEC_PASSWORD_FILE = /etc/condor/pool_password
  SEC_DEFAULT_AUTHENTICATION = REQUIRED
  SEC_READ_AUTHENTICATION = OPTIONAL
  SEC_CLIENT_AUTHENTICATION = REQUIRED
  SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD,FS
  SCHEDD.SEC_WRITE_AUTHENTICATION_METHODS = FS,PASSWORD
  SCHEDD.SEC_DAEMON_AUTHENTICATION_METHODS = FS,PASSWORD
  SEC_CLIENT_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE
  SEC_READ_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE
 
  # Integrity
  SEC_DEFAULT_INTEGRITY = REQUIRED
  SEC_DAEMON_INTEGRITY = REQUIRED
  SEC_NEGOTIATOR_INTEGRITY = REQUIRED
 
  # Multicore
  # Disable DEFRAG
  #####DAEMON_LIST = $(DAEMON_LIST) DEFRAG
 
  DEFRAG_SCHEDULE = graceful
 
  DEFRAG_INTERVAL = 90
  DEFRAG_MAX_CONCURRENT_DRAINING = 1
  DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
  DEFRAG_MAX_WHOLE_MACHINES = 4
 
  ## Allow some defrag configuration to be settable
  DEFRAG.SETTABLE_ATTRS_ADMINISTRATOR = DEFRAG_MAX_CONCURRENT_DRAINING,DEFRAG_DRAINING_MACHINES_PER_HOUR,DEFRAG_MAX_WHOLE_MACHINES
  ENABLE_RUNTIME_CONFIG = TRUE
 
  # The defrag depends on the number of spares already present, biased towards systems with many cpus
  DEFRAG_RANK = Cpus * pow(TotalCpus,(1.0 / 2.0))
 
  # Definition of a "whole" machine:
  DEFRAG_WHOLE_MACHINE_EXPR =  Cpus >= 8 && StartJobs =?= True && RalNodeOnline =?= True
 
  # Cancel once we have 8
  DEFRAG_CANCEL_REQUIREMENTS = Cpus >= 8
 
  # Decide which slots can be drained
  DEFRAG_REQUIREMENTS = PartitionableSlot && StartJobs =?= True && RalNodeOnline =?= True
 
  ## Logs
  MAX_DEFRAG_LOG = 104857600
  MAX_NUM_DEFRAG_LOG = 10
 
  #DEFRAG_DEBUG = D_FULLDEBUG
 
  #NEGOTIATOR_DEBUG        = D_FULLDEBUG
 
  # Port limits
  HIGHPORT = 65000
  LOWPORT = 20000
 
  # History
  HISTORY = $(SPOOL)/history
 
  # Longer but better
  NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
 
  ## Allow some negotiator configuration to be settable
  NEGOTIATOR.PERSISTENT_CONFIG_DIR=/var/lib/condor/persistent_config_dir
  NEGOTIATOR.ENABLE_PERSISTENT_CONFIG = True
  NEGOTIATOR.SETTABLE_ATTRS_ADMINISTRATOR = NEGOTIATOR_CYCLE_DELAY
 
  # Try to kill hogs
  SYSTEM_PERIODIC_REMOVE = RemoteWallClockTime > 259200
 
  # Try again with ones that have some vars temporarily undef
  SYSTEM_PERIODIC_RELEASE = (JobRunCount < 10 && (time() - EnteredCurrentStatus) > 1200 ) && (HoldReasonCode == 5 && HoldReasonSubCode == 0)

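A side note on the runtime-config settings near the end of that file: because ENABLE_RUNTIME_CONFIG is on and NEGOTIATOR_CYCLE_DELAY is listed in NEGOTIATOR.SETTABLE_ATTRS_ADMINISTRATOR, that particular knob can be changed on the fly rather than by editing files; a sketch:

 # Set it persistently on the running negotiator, then reconfigure
 condor_config_val -negotiator -set "NEGOTIATOR_CYCLE_DELAY = 120"
 condor_reconfig
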
* '''File:''' /etc/ld.so.conf.d/condor.conf
* Notes: CONDOR needed this to access its libraries. I had to run 'ldconfig' to make it take hold.
* Customise: Maybe not.
* Content:

  /usr/lib64/condor/

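That is, roughly:

 echo '/usr/lib64/condor/' > /etc/ld.so.conf.d/condor.conf
 ldconfig
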
* '''File:''' /usr/local/bin/scaling_factors_plugin.py
* Notes: This implements part of the scaling factor logic (see the '''Notes on Accounting, Scaling and Publishing''' section, below).
* Customise: It should be generic.
* Content:

 #!/usr/bin/python
 # Copyright 2014 Science and Technology Facilities Council
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #  http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
 import re
 from os.path import isfile
 import shutil
 import datetime
 import time
 import os
 
 """Usage: scaling_factors_plugin.py <status> <control dir> <jobid>
 
 Authplugin for FINISHING STATE
 
 Example:
 
   authplugin="FINISHING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/scaling_factors_plugin.py %S %C %I"
 
 """
 
 def ExitError(msg,code):
     """Print error message and exit"""
     from sys import exit
     print(msg)
     exit(code)
 
 def GetScalingFactor(control_dir, jobid):
 
     errors_file = '%s/job.%s.errors' %(control_dir,jobid)
 
     if not isfile(errors_file):
         ExitError("No such errors file: %s"%errors_file,1)
 
     f = open(errors_file)
     errors = f.read()
     f.close()
 
     scaling = -1
 
     m = re.search('MATCH_EXP_MachineRalScaling = \"([\dE\+\-\.]+)\"', errors)
     if m:
         scaling = float(m.group(1))
 
     return scaling
 
 def SetScaledTimes(control_dir, jobid):
 
     scaling_factor = GetScalingFactor(control_dir, jobid)
 
     diag_file = '%s/job.%s.diag' %(control_dir,jobid)
 
     if not isfile(diag_file):
         ExitError("No such errors file: %s"%diag_file,1)
 
     f = open(diag_file)
     lines = f.readlines()
     f.close()
 
     newlines = []
 
     types = ['WallTime=', 'UserTime=', 'KernelTime=']
 
     for line in lines:
         for type in types:
           if type in line and scaling_factor > 0:
               m = re.search('=(\d+)s', line)
               if m:
                 scaled_time = int(float(m.group(1))*scaling_factor)
                 line = type + str(scaled_time) + 's\n'
 
         newlines.append(line)
 
     fw = open(diag_file, "w")
     fw.writelines(newlines)
     fw.close()
     # Save a copy. Use this for the DAPDUMP analyser.
     #tstamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%d%H%M%S')
     #dest = '/var/log/arc/diagfiles/' + tstamp + '_' + os.path.basename(diag_file)
     #shutil.copy2(diag_file, dest)
 
     return 0
 
 def main():
     """Main"""
 
     import sys
 
     # Parse arguments
 
     if len(sys.argv) == 4:
         (exe, status, control_dir, jobid) = sys.argv
     else:
         ExitError("Wrong number of arguments\n"+__doc__,1)
 
     if status == "FINISHING":
         SetScaledTimes(control_dir, jobid)
         sys.exit(0)
 
     sys.exit(1)
 
 if __name__ == "__main__":
     main()

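To make the effect concrete: MachineRalScaling is injected into the job ClassAd by the SUBMIT_EXPRS in condor_config.local above, using the RalScaling value advertised by each worker node (presumably the scale factor column in the Workernode types table, e.g. 0.89 for an L5420). ARC records the matched value in the job's .errors file, and this plugin then rescales the times in the .diag file before JURA accounts them. An invented before/after illustration:

 MATCH_EXP_MachineRalScaling = "0.89"        (as found in job.<id>.errors)
 WallTime=1000s  becomes  WallTime=890s      (rewritten in job.<id>.diag)
 UserTime=950s   becomes  UserTime=845s
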
* '''File:''' /usr/local/bin/debugging_rte_plugin.py
* Notes: Useful for capturing debug output.
* Customise: It should be generic.
* Content:

 #!/usr/bin/python
 
 # This copies the files containing useful output from completed jobs into a directory
 
 import shutil
 
 """Usage: debugging_rte_plugin.py <status> <control dir> <jobid>
 
 Authplugin for FINISHED STATE
 
 Example:
 
   authplugin="FINISHED timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/debugging_rte_plugin.py %S %C %I"
 
 """
 
 def ExitError(msg,code):
     """Print error message and exit"""
     from sys import exit
     print(msg)
     exit(code)
 
 def ArcDebuggingL(control_dir, jobid):
 
     from os.path import isfile
 
     try:
         m = open("/var/spool/arc/debugging/msgs", 'a')
     except IOError, err:
         print err.errno
         print err.strerror
 
     local_file = '%s/job.%s.local' %(control_dir,jobid)
     grami_file = '%s/job.%s.grami' %(control_dir,jobid)
 
     if not isfile(local_file):
         ExitError("No such description file: %s"%local_file,1)
 
     if not isfile(grami_file):
         ExitError("No such description file: %s"%grami_file,1)
 
     lf = open(local_file)
     local = lf.read()
     lf.close()
 
     if 'Organic Units' in local or 'stephen jones' in local:
         shutil.copy2(grami_file, '/var/spool/arc/debugging')
 
         f = open(grami_file)
         grami = f.readlines()
         f.close()
 
         for line in grami:
             m.write(line)
             if 'joboption_directory' in line:
                 comment = line[line.find("'")+1:line.find("'",line.find("'")+1)]+'.comment'
                 shutil.copy2(comment, '/var/spool/arc/debugging')
             if 'joboption_stdout' in line:
                 mystdout = line[line.find("'")+1:line.find("'",line.find("'")+1)]
                 m.write("Try Copy mystdout - " + mystdout + "\n")
                 if isfile(mystdout):
                   m.write("Copy mystdout - " + mystdout + "\n")
                   shutil.copy2(mystdout, '/var/spool/arc/debugging')
                 else:
                   m.write("mystdout gone - " + mystdout + "\n")
             if 'joboption_stderr' in line:
                 mystderr = line[line.find("'")+1:line.find("'",line.find("'")+1)]
                 m.write("Try Copy mystderr - " + mystderr + "\n")
                 if isfile(mystderr):
                   m.write("Copy mystderr - " + mystderr + "\n")
                   shutil.copy2(mystderr, '/var/spool/arc/debugging')
                 else:
                   m.write("mystderr gone - " + mystderr + "\n")
 
     m.close()
     return 0
 
 def main():
     """Main"""
 
     import sys
 
     # Parse arguments
 
     if len(sys.argv) == 4:
         (exe, status, control_dir, jobid) = sys.argv
     else:
         ExitError("Wrong number of arguments\n",1)
 
     if status == "FINISHED":
         ArcDebuggingL(control_dir, jobid)
         sys.exit(0)
 
     sys.exit(1)
 
 if __name__ == "__main__":
     main()

* '''File:''' /usr/local/bin/default_rte_plugin.py
* Notes: Sets up the default run time environment. Patched (25 Jul 2016) to work with xRSL and EMI-ES job file inputs.
* Customise: It should be generic.
* Content:

#!/usr/bin/python

# Copyright 2014 Science and Technology Facilities Council
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Usage: default_rte_plugin.py <status> <control dir> <jobid> <runtime environment>

Authplugin for PREPARING STATE

Example:

  authplugin="PREPARING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/default_rte_plugin.py %S %C %I <rte>"

"""

def ExitError(msg,code):
    """Print error message and exit"""
    from sys import exit
    print(msg)
    exit(code)

def SetDefaultRTE(control_dir, jobid, default_rte):

    from os.path import isfile

    desc_file = '%s/job.%s.description' %(control_dir,jobid)

    if not isfile(desc_file):
        ExitError("No such description file: %s"%desc_file,1)

    f = open(desc_file)
    desc = f.read()
    f.close()

    if default_rte not in desc:
      if '<esadl:ActivityDescription' in desc:
        lines = desc.split('\n')
        with open(desc_file, "w") as myfile:
          for line in lines:
            myfile.write( line + '\n')
            if '<Resources>' in line:
              myfile.write( '  <RuntimeEnvironment>\n')
              myfile.write( '    <Name>' + default_rte + '</Name>\n')
              myfile.write( '  </RuntimeEnvironment>\n')
      else:
        if '<jsdl:JobDefinition' not in desc:
          with open(desc_file, "a") as myfile:
            myfile.write("( runtimeenvironment = \"" + default_rte + "\" )")

    return 0

def main():
    """Main"""

    import sys

    # Parse arguments

    if len(sys.argv) == 5:
        (exe, status, control_dir, jobid, default_rte) = sys.argv
    else:
        ExitError("Wrong number of arguments\n"+__doc__,1)

    if status == "PREPARING":
        SetDefaultRTE(control_dir, jobid, default_rte)
        sys.exit(0)

    sys.exit(1)

if __name__ == "__main__":
    main()
* '''File:''' /etc/lcmaps/lcmaps.db
* Notes: Connects the authentication layer to an ARGUS server.
* Customise: Yes. It must be changed to suit your site.
* Content:

path = /usr/lib64/lcmaps

verify_proxy = "lcmaps_verify_proxy.mod"
                    "-certdir /etc/grid-security/certificates"
                    "--discard_private_key_absence"
                    "--allow-limited-proxy"

pepc = "lcmaps_c_pep.mod"
            "--pep-daemon-endpoint-url https://hepgrid9.ph.liv.ac.uk:8154/authz"
            "--resourceid http://authz-interop.org/xacml/resource/resource-type/arc"
            "--actionid http://glite.org/xacml/action/execute"
            "--capath /etc/grid-security/certificates/"
            "--certificate /etc/grid-security/hostcert.pem"
            "--key /etc/grid-security/hostkey.pem"

# Policies:
arc:
verify_proxy -> pepc
* '''File:''' /etc/profile.d/env.sh
* Notes: Sets up environment variables for specific VO jobs.
* Customise: Yes. It must be changed to suit your site.
* Content:

if [ "X${GLITE_ENV_SET+X}" = "X" ]; then
. /usr/libexec/grid-env-funcs.sh
if [ "x${GLITE_UI_ARCH:-$1}" = "x32BIT" ]; then arch_dir=lib; else arch_dir=lib64; fi
gridpath_prepend    "PATH" "/bin"
gridpath_prepend    "MANPATH" "/opt/glite/share/man"

gridenv_set "DPM_HOST" "hepgrid11.ph.liv.ac.uk"
gridenv_set "DPNS_HOST" "hepgrid11.ph.liv.ac.uk"
gridenv_set "GLEXEC_LOCATION" "/usr"
gridenv_set "RFIO_PORT_RANGE" "20000,25000"
gridenv_set "SITE_GIIS_URL" "hepgrid4.ph.liv.ac.uk"
gridenv_set "SITE_NAME" "UKI-NORTHGRID-LIV-HEP"
gridenv_set "MYPROXY_SERVER" "lcgrbp01.gridpp.rl.ac.uk"

gridenv_set        "VO_ZEUS_SW_DIR" "/opt/exp_soft_sl5/zeus"
gridenv_set        "VO_ZEUS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_VO_NORTHGRID_AC_UK_SW_DIR" "/opt/exp_soft_sl5/northgrid"
gridenv_set        "VO_VO_NORTHGRID_AC_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_T2K_ORG_SW_DIR" "/cvmfs/t2k.gridpp.ac.uk"
gridenv_set        "VO_T2K_ORG_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_SNOPLUS_SNOLAB_CA_SW_DIR" "/cvmfs/snoplus.gridpp.ac.uk"
gridenv_set        "VO_SNOPLUS_SNOLAB_CA_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_PLANCK_SW_DIR" "/opt/exp_soft_sl5/planck"
gridenv_set        "VO_PLANCK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_PHENO_SW_DIR" "/opt/exp_soft_sl5/pheno"
gridenv_set        "VO_PHENO_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_OPS_SW_DIR" "/opt/exp_soft_sl5/ops"
gridenv_set        "VO_OPS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_NEISS_ORG_UK_SW_DIR" "/opt/exp_soft_sl5/neiss"
gridenv_set        "VO_NEISS_ORG_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_NA62_VO_GRIDPP_AC_UK_SW_DIR" "/cvmfs/na62.cern.ch"
gridenv_set        "VO_NA62_VO_GRIDPP_AC_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_MICE_SW_DIR" "/cvmfs/mice.gridpp.ac.uk"
gridenv_set        "VO_MICE_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_MAGIC_SW_DIR" "/opt/exp_soft_sl5/magic"
gridenv_set        "VO_MAGIC_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_LHCB_SW_DIR" "/cvmfs/lhcb.cern.ch"
gridenv_set        "VO_LHCB_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_LZ_SW_DIR" "/opt/exp_soft_sl5/lz"
gridenv_set        "VO_LZ_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_LSST_SW_DIR" "/opt/exp_soft_sl5/lsst"
gridenv_set        "VO_LSST_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_ILC_SW_DIR" "/cvmfs/ilc.desy.de"
gridenv_set        "VO_ILC_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_GRIDPP_SW_DIR" "/opt/exp_soft_sl5/gridpp"
gridenv_set        "VO_GRIDPP_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_GEANT4_SW_DIR" "/opt/exp_soft_sl5/geant4"
gridenv_set        "VO_GEANT4_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_FUSION_SW_DIR" "/opt/exp_soft_sl5/fusion"
gridenv_set        "VO_FUSION_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_ESR_SW_DIR" "/opt/exp_soft_sl5/esr"
gridenv_set        "VO_ESR_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_EPIC_VO_GRIDPP_AC_UK_SW_DIR" "/opt/exp_soft_sl5/epic"
gridenv_set        "VO_EPIC_VO_GRIDPP_AC_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_DZERO_SW_DIR" "/opt/exp_soft_sl5/dzero"
gridenv_set        "VO_DZERO_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_DTEAM_SW_DIR" "/opt/exp_soft_sl5/dteam"
gridenv_set        "VO_DTEAM_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_CMS_SW_DIR" "/opt/exp_soft_sl5/cms"
gridenv_set        "VO_CMS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_CERNATSCHOOL_ORG_SW_DIR" "/cvmfs/cernatschool.gridpp.ac.uk"
gridenv_set        "VO_CERNATSCHOOL_ORG_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_CDF_SW_DIR" "/opt/exp_soft_sl5/cdf"
gridenv_set        "VO_CDF_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_CAMONT_SW_DIR" "/opt/exp_soft_sl5/camont"
gridenv_set        "VO_CAMONT_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_CALICE_SW_DIR" "/opt/exp_soft_sl5/calice"
gridenv_set        "VO_CALICE_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_BIOMED_SW_DIR" "/opt/exp_soft_sl5/biomed"
gridenv_set        "VO_BIOMED_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_ATLAS_SW_DIR" "/cvmfs/atlas.cern.ch/repo/sw"
gridenv_set        "VO_ATLAS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "VO_ALICE_SW_DIR" "/opt/exp_soft_sl5/alice"
gridenv_set        "VO_ALICE_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "SITE_NAME" "UKI-NORTHGRID-LIV-HEP"
gridenv_set        "SITE_GIIS_URL" "hepgrid4.ph.liv.ac.uk"
gridenv_set        "RFIO_PORT_RANGE" "20000,25000"
gridenv_set        "MYPROXY_SERVER" "lcgrbp01.gridpp.rl.ac.uk"
gridenv_set        "LCG_LOCATION" "/usr"
gridenv_set        "LCG_GFAL_INFOSYS" "lcg-bdii.gridpp.ac.uk:2170,topbdii.grid.hep.ph.ic.ac.uk:2170"
gridenv_set        "GT_PROXY_MODE" "old"
gridenv_set        "GRID_ENV_LOCATION" "/usr/libexec"
gridenv_set        "GRIDMAPDIR" "/etc/grid-security/gridmapdir"
gridenv_set        "GLITE_LOCATION_VAR" "/var"
gridenv_set        "GLITE_LOCATION" "/usr"
gridenv_set        "GLITE_ENV_SET" "TRUE"
gridenv_set        "GLEXEC_LOCATION" "/usr"
gridenv_set        "DPNS_HOST" "hepgrid11.ph.liv.ac.uk"
gridenv_set        "DPM_HOST" "hepgrid11.ph.liv.ac.uk"
. /usr/libexec/clean-grid-env-funcs.sh
fi
* '''File:''' /etc/grid-security/grid-mapfile
* Notes: Useful for directly mapping a user for testing. Superseded by ARGUS now, so optional.
* Customise: Yes. It must be changed to suit your site.
* Content:

"/C=UK/O=eScience/OU=Liverpool/L=CSD/CN=stephen jones" dteam184

* '''File:''' /root/glitecfg/site-info.def
* Notes: Just a copy of the site standard SID file. Used to make the accounts with YAIM.
* Content: as per site standard
* '''File:''' /root/glitecfg/vo.d
* Notes: Just a copy of the site standard vo.d dir. Used to make the VOMS config with YAIM.
* Content: as per site standard

* '''File:''' /opt/glite/yaim/etc/users.conf
* Notes: Just a copy of the site standard users.conf file. Used to make the accounts with YAIM.
* Content: as per site standard

* '''File:''' /opt/glite/yaim/etc/groups.conf
* Notes: Just a copy of the site standard groups.conf file. Used to make the accounts with YAIM.
* Content: as per site standard

* '''File:''' /etc/arc/runtime/ENV/PROXY
* Notes: Stops error messages of one kind or another.
* Content: empty

* '''File:''' /etc/init.d/nordugrid-arc-egiis
* Notes: Stops error messages of one kind or another.
* Content: empty
=== Head Cron jobs ===

I had to add these cron jobs.

* Cron: jura
* Purpose: Run the jura APEL reporter now and again
* Content:

16 6 * * * /usr/libexec/arc/jura /var/spool/arc/jobstatus &>> /var/log/arc/jura.log

* Cron: fetch-crl
* Purpose: Run fetch-crl
* Content:

# Cron job running by default every 6 hours, at 45 minutes +/- 3 minutes
# The lock file can be enabled or disabled via a
# service fetch-crl-cron start
# chkconfig fetch-crl-cron on

# Note the lock file not existing is success (and over-all success is needed
# in order to prevent error messages from cron. "-q" makes it really
# quiet, but beware that the "-q" overrides any verbosity settings

42 */6 * * * root [ ! -f /var/lock/subsys/fetch-crl-cron ] || ( [ -f /etc/sysconfig/fetch-crl ] && . /etc/sysconfig/fetch-crl ; /usr/sbin/fetch-crl -q -r 360 $FETCHCRL_OPTIONS $FETCHCRL_CRON_OPTIONS )
=== Patch to give a fixed number of logical and physical CPUs ===

The GLUE2 [http://glue20.web.cern.ch/glue20/# schema] shows that the TotalLogicalCPUs element is intended to represent the total installed capacity (otherwise known as the nameplate capacity or nominal capacity), i.e. including resources which are temporarily unavailable. But the out of the box behaviour yielded strange, varying values for the total of physical and logical cpus in the BDII output. That output is produced in this Perl module.

 /usr/share/arc/ARC1ClusterInfo.pm

To fix the values to nominal, static values representative of the nameplate capacity at our site, I added these lines to that file (around line 586), which short-circuits the existing logic completely.

  $totalpcpus = 260;
  $totallcpus = 1994;

<!-- === Patch to fix additional text in GLUE2ServiceAdminDomainForeignKey ===

Edit the /usr/share/arc/glue-generator.pl file.

After the assignment to $ldif_input (around line 84) add a line of perl to delete extra text (i.e. urn:ad:).
<pre>
my $ldif_input=`$LDIF_GENERATOR_FILE_NG`;  # ADD THE NEXT LINE AFTER HERE ...
$ldif_input =~ s/GLUE2ServiceAdminDomainForeignKey: urn:ad:/GLUE2ServiceAdminDomainForeignKey: /;
</pre>
-->
=== Patch for BDII Job Count Breakdown ===

I put in a set of patches (provided by Andrew Lahiff) to make corrections to the BDII output such that it gave individual breakdowns of job counts in glue1. This consisted of various parts. First, I added some cron jobs to create job count statistics every 10 minutes.

*/10 * * * * root /usr/bin/condor_q -constraint 'JobStatus==2' -autoformat x509UserProxyVOName | sort | uniq -c > /var/local/condor_jobs_running
*/10 * * * * root /usr/bin/condor_q -constraint 'JobStatus==1' -autoformat x509UserProxyVOName | sort | uniq -c > /var/local/condor_jobs_idle

These create files in the following format, showing the job count of each VO.

 # cat /var/local/condor_jobs_running
     805 atlas
     10 ilc
     251 lhcb

I made additional changes to /usr/share/arc/glue-generator.pl to parse these files and convert them to BDII output. First, I added two subroutines near the top of the file:
sub getCondorJobsRunning
{
    my ($vo) = @_;
    my $file = "/var/local/condor_jobs_running";
    if (-e $file)
    {
      open(FILE, "<$file");
      foreach my $line (<FILE>)
      {
          if ($line =~ /$vo/)
          {
            my @pieces = split(" ", $line);
            return $pieces[0];
          }
      }
      close(FILE);
    }
    return 0;
}

sub getCondorJobsIdle
{
    my ($vo) = @_;
    my $file = "/var/local/condor_jobs_idle";
    if (-e $file)
    {
      open(FILE, "<$file");
      foreach my $line (<FILE>)
      {
          if ($line =~ /$vo/)
          {
            my @pieces = split(" ", $line);
            return $pieces[0];
          }
      }
      close(FILE);
    }
    return 0;
}
And I used the following section of code lower in the file to build the new readings into the output. To insert this patch, delete all lines from the second "foreach (@vos){" down to the corresponding close bracket, then add this code:

            foreach (@vos){
                chomp;
                $vo = $_;
                $vo =~ s/VO:// ;
                my $vob;
                if ($vo =~ /(\w+)/ || $vo =~ /(\w+)\./) { $vob = $1; }

                my @pieces = split(/\s+/, $cluster_attributes{'nordugrid-cluster-localse'});
                my $useLocalSE = "";
                foreach my $piece (@pieces)
                {
                    if ($piece =~ /$vob/) { $useLocalSE = $piece; }
                }
                if ($vo =~ /superb/) { $useLocalSE = "srm-superb.gridpp.rl.ac.uk"; }
                if ($useLocalSE eq "") { $useLocalSE = "srm-dteam.gridpp.rl.ac.uk"; }

                my $myVoRunning = getCondorJobsRunning($vo);
                my $myVoIdle = getCondorJobsIdle($vo);
                my $myVoTotal = $myVoRunning + $myVoIdle;

                print "
dn: GlueVOViewLocalID=$vo,GlueCEUniqueID=$ce_unique_id,Mds-Vo-name=resource,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2
GlueCEInfoDefaultSE: $cluster_attributes{'nordugrid-cluster-localse'}
GlueCEStateTotalJobs: $myVoTotal
GlueCEInfoDataDir: unset
GlueCEAccessControlBaseRule: VO:$vo
GlueCEStateRunningJobs: $myVoRunning
GlueChunkKey: GlueCEUniqueID=$ce_unique_id
GlueVOViewLocalID: $vo
GlueCEInfoApplicationDir: unset
GlueCEStateWaitingJobs: $myVoIdle
GlueCEStateEstimatedResponseTime: $estRespTime
GlueCEStateWorstResponseTime: $worstRespTime
GlueCEStateFreeJobSlots: $freeSlots
GlueCEStateFreeCPUs: $freeSlots
";
            }

=== Alternative Patch for BDII Job Count Breakdown -- GRIF/IRFU modification ===

This is another way to do the same thing. I adapted the above patch so that it's simpler to implement, even if it calls condor_q more often, which shouldn't have any impact on the bdii performance anyway.
Just apply this diff and you're good to go:

<nowiki>--- /usr/share/arc/glue-generator.pl.orig 2017-05-15 12:23:47.703420951 +0200
+++ /usr/share/arc/glue-generator.pl 2017-05-15 12:45:27.536352858 +0200
@@ -515,6 +515,8 @@
                chomp;
        $vo = $_;
        $vo =~ s/VO:// ;
+          my $vo_running=`/usr/bin/condor_q -constraint 'JobStatus==2 && x509UserProxyVOName=="$vo"' -autoformat x509UserProxyVOName |/usr/bin/wc -l` ;
+          my $vo_waiting=`/usr/bin/condor_q -constraint 'JobStatus==1 && x509UserProxyVOName=="$vo"' -autoformat x509UserProxyVOName |/usr/bin/wc -l` ;

                print "
dn: GlueVOViewLocalID=$vo,GlueCEUniqueID=$ce_unique_id,Mds-Vo-name=resource,o=grid
@@ -532,11 +534,11 @@
GlueCEStateTotalJobs: $totalJobs
GlueCEInfoDataDir: unset
GlueCEAccessControlBaseRule: VO:$vo
-GlueCEStateRunningJobs: $queue_attributes{'nordugrid-queue-running'}
+GlueCEStateRunningJobs: $vo_running
GlueChunkKey: GlueCEUniqueID=$ce_unique_id
GlueVOViewLocalID: $vo
GlueCEInfoApplicationDir: unset
-GlueCEStateWaitingJobs: $waitingJobs
+GlueCEStateWaitingJobs: $vo_waiting
GlueCEStateEstimatedResponseTime: $estRespTime
GlueCEStateWorstResponseTime: $worstRespTime
GlueCEStateFreeJobSlots: $freeSlots
</nowiki>

Note: the .pl file contains tabs, and so should this patch file (the two lines just before the "my $vo_..." variable declarations), otherwise the patch program will fail to apply the patch.
=== Patch for Extra BDII Fields ===

To set the GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime BDII publishing values, you need to change the lines involving GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime in /usr/share/arc/glue-generator.pl. For example:

GlueCEPolicyMaxCPUTime: 4320
GlueCEPolicyMaxWallClockTime: 4320

=== Patch for Correct Cores Parsing ===

Sites can (and do) use floating point numbers in the cpu counts. A detailed explanation of this is given here: [[Publishing_tutorial#Logical_and_physical_CPUs]]. In summary, the calculation of installed capacity involves multiplying the average cores per logical cpu by the total number of logical cpus, and multiplying that by the average HEPSPEC06 of a logical cpu. Obviously, the average cores per logical cpu can be a floating point number.

But the ARC system, as it stands, only reads <b>Cores</b> as an integer, so a change to the regexp is needed if the site uses a floating point number.

The problem lies in two spots in /usr/share/arc/glue-generator.pl. A regex is supposed to pull out the Cores=XXX.XXX value, but only matches integers. Since we set Cores to an average value (Cores=5.93,Benchmark...) it rounds down to 5, setting glueSubClusterPhysicalCPUs to 724/5 = 144. The true value should be 724/5.93 = 122.
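
As a quick check on the regexp change (this is not part of the patch itself), a couple of perl one-liners show what each pattern captures from a floating point Cores value:

 # perl -e '"Cores=5.93,Benchmark" =~ m/Cores=(\d+)/; print "$1\n"'
 5
 # perl -e '"Cores=5.93,Benchmark" =~ m/Cores=([0-9]*\.?[0-9]+)/; print "$1\n"'
 5.93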
I put in the patch below to "fix" it.

 # svn diff  ./modules/emi-server/files/condor/glue-generator.pl
Index: modules/emi-server/files/condor/glue-generator.pl
===================================================================
--- modules/emi-server/files/condor/glue-generator.pl    (revision 2817)
+++ modules/emi-server/files/condor/glue-generator.pl    (working copy)
@@ -217,7 +217,7 @@
$glueHostArchitecturePlatformType=$cluster_attributes{'nordugrid-cluster-architecture'};
$glueSubClusterUniqueID=$cluster_attributes{'nordugrid-cluster-name'};
      $glueSubClusterName=$glue_site_unique_id;
-    if ( $processorOtherDesc =~ m/Cores=(\d+)/ ){
+    if ( $processorOtherDesc =~ m/Cores=([0-9]*\.?[0-9]+)/ ){
          $smpSize=$1;
$glueSubClusterPhysicalCPUs=int($cluster_attributes{'nordugrid-cluster-totalcpus'}/$smpSize);
      }
@@ -227,6 +227,7 @@
      }
$glueSubClusterLogicalCPUs=$cluster_attributes{'nordugrid-cluster-totalcpus'};
$glueClusterUniqueID=$cluster_attributes{'nordugrid-cluster-name'};
+        $smpSize = int($smpSize);

      WriteSubCluster();
      }
@@ -438,7 +439,7 @@
$glueHostArchitecturePlatformType=$queue_attributes{'nordugrid-queue-architecture'};
##XX
$glueSubClusterUniqueID=$queue_attributes{'nordugrid-queue-name'}; ##XX
$glueSubClusterName=$queue_attributes{'nordugrid-queue-name'};  ##XX
-        if ( $processorOtherDesc =~ m/Cores=(\d+)/ ){
+        if ( $processorOtherDesc =~ m/Cores=([0-9]*\.?[0-9]+)/ ){
              $smpSize=$1;
$glueSubClusterPhysicalCPUs=int($queue_attributes{'nordugrid-queue-totalcpus'}/$smpSize);
          }
@@ -448,6 +449,7 @@
          }
$glueSubClusterLogicalCPUs=$queue_attributes{'nordugrid-queue-totalcpus'};
##XX
$glueClusterUniqueID=$cluster_attributes{'nordugrid-cluster-name'}; ##XX
+                $smpSize = int($smpSize);

          WriteSubCluster();
          }
=== Patch to turn on SSL in APEL ===

After installing the APEL package, I had to make this change by hand. On line 136 of the /usr/libexec/arc/ssmsend file, I had to add a parameter; use_ssl = _use_ssl.

=== Install the vomsdir LSC Files ===

I used VomsSnooper to do this as follows.

 # cd /opt/GridDevel/vomssnooper/usecases/getLSCRecords
 # sed -i -e "s/ vomsdir/ \/etc\/grid-security\/vomsdir/g" getLSCRecords.sh
 # ./getLSCRecords.sh
=== Yaim to make head user accounts, /etc/vomses file <del>and glexec.conf etc.</del>===

I used Yaim to do this as follows.

 # yaim  -r -s /root/glitecfg/site-info.def -n ABC -f config_users
 # yaim  -r -s /root/glitecfg/site-info.def -n ABC -f config_vomses
 <del> # yaim -c -s /root/glitecfg/site-info.def -n GLEXEC_wn </del>

For this to work, a priori, the site-info.def file must be present. A users.conf file and a groups.conf file must exist in the /opt/glite/yaim/etc/ directory. This is usually a part of any grid system CE install, but advice on how to prepare these is given in this Yaim guide (that I hope will be maintained for a little while longer):

https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400

(As far as I know, there is no reason for the headnode to use glexec.)
=== Head Services ===

I had to set some services running.

 A-rex - the ARC CE service
 condor - the CONDOR batch system service
 nordugrid-arc-ldap-infosys - part of the bdii
 nordugrid-arc-slapd - part of the bdii
 nordugrid-arc-bdii - part of the bdii
 gridftpd - the gridftp service
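
On an SL6-style init system, enabling and starting them amounts to something like this (a sketch only; the init script names are assumed to follow the service names above, e.g. a-rex for the ARC CE, so check what your packages actually installed):

 for s in a-rex gridftpd nordugrid-arc-slapd nordugrid-arc-bdii nordugrid-arc-ldap-infosys condor; do
     chkconfig $s on
     service $s start
 done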
  
  
=== File cleanup ===

ARC keeps a prodigious number of tiny stale output files that need to be cleaned up. Eventually, so many are kept that the head node can run out of inodes or file space. I keep the system clean with a cronjob that runs a script like this one.
#!/bin/bash

MAXAGE=21
echo `date` cleanJobstatusDirs.sh starts with maxage of $MAXAGE days
fname=/opt/jobstatus_archive/jobstatus_"$(date +%Y%m%d%H%M%S)".tar
sleep 1

if [ ! -d /opt/jobstatus_archive ]; then
  mkdir /opt/jobstatus_archive
  if [ $? != 0 ]; then
    echo Some kind of problem so I cannot make the jobstatus_archive dir
    exit 1
  fi
fi

cd /var/spool/arc/jobstatus
if [ $? != 0 ]; then
  echo Some problem getting to the jobstatus dir so I am bailing out
  exit 1
fi

# Back up all the jobstatus files older than MAXAGE
tmpListOfOldFiles=$(mktemp /tmp/jobstatus_archive_files.XXXXXX)
find /var/spool/arc/jobstatus  -mtime +$MAXAGE -type f  > $tmpListOfOldFiles
tar -cf $fname -T $tmpListOfOldFiles
gzip $fname

# Delete all the jobstatus files older than MAXAGE
for f in `cat $tmpListOfOldFiles`; do
  echo Deleting empty file $f
  rm -f $f
done

tmpListOfOldDirs=$(mktemp /tmp/jobstatus_archive_dirs.XXXXXX)
for f in `cat $tmpListOfOldFiles`; do echo `dirname $f`; done | sort -n | uniq > $tmpListOfOldDirs

for d in `cat $tmpListOfOldDirs`; do
  ls -1 $d | wc -l | grep -q "^0$"
  if [ $? == 0 ]; then
    echo Deleting empty dir $d
    rmdir $d
  fi
done

# Clean the delegations of empty dirs more than 90 days old
find /var/spool/arc/jobstatus/delegations/ -depth -type d -empty -mtime +90 -delete

# Clean the urs
find /var/urs -depth -type f -mtime +90 -delete

rm $tmpListOfOldFiles
rm $tmpListOfOldDirs
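
The script is then driven from cron; below is a sketch of the sort of entry used, assuming the script has been saved as /root/scripts/cleanJobstatusDirs.sh (the name it announces in its own log output). Adjust the path and schedule to suit.

 30 5 * * 0 root /root/scripts/cleanJobstatusDirs.sh >> /var/log/cleanJobstatusDirs.log 2>&1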
  
And that was it. That's all I did to get the server working, as far as I can recall.

== Worker Node ==

=== Worker Standard build ===
As for the headnode, the basis for the initial worker node build follows the standard model for any workernode at Liverpool, prior to the installation of any middleware. Such a baseline build might include networking, cvmfs, iptables, nagios scripts, the emi-wn package, ganglia, ssh etc.

Aside: After an installation mistake, it was discovered that an ordinary TORQUE workernode could be used as the basis of the build, and it would then be possible to use the same worker node on both ARC/CONDOR and CREAM/TORQUE systems, but not simultaneously. This idea was not pursued, however.
=== Worker Extra Directories ===

I needed to make these directories:

 /root/glitecfg
 /etc/condor/config.d
 /etc/grid-security/gridmapdir
 /etc/arc/runtime/ENV
 /etc/condor/ral
 /data/condor_pool

And these:

 /opt/exp_soft_sl5              # Note: this is our traditional software mount point
 /usr/libexec/condor/scripts    # Only used by our automatic test routines

On our system, exp_soft_sl5 is actually a mount point to a central location. CVMFS takes over this role now, but it might be necessary to set up a shared mount system such as this and point the VO software directories to it, as shown in the head node file /etc/profile.d/env.sh (see above).
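
A one-liner covers the plain directories (a sketch; /opt/exp_soft_sl5 is omitted because on our system it is a mount point rather than a local directory):

 mkdir -p /root/glitecfg /etc/condor/config.d /etc/grid-security/gridmapdir \
          /etc/arc/runtime/ENV /etc/condor/ral /data/condor_pool /usr/libexec/condor/scripts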
  
=== Worker Additional Packages ===

We had to install the main CONDOR package:

 condor

And Andrew McNab's Machine Job Features package, which provides run time information that jobs can read:

 mjf-htcondor-00.13-1.noarch

We also had to install various bits of extra middleware:

 emi-wn    # for glite-brokerinfo (at least)
 lcg-util
 libcgroup
 fetch-crl
 voms-clients3
 voms
 lcg-util-libs
 lcg-util-python
 lfc-devel
 lfc
 lfc-perl
 lfc-python
 uberftp
 gfal2-plugin-lfc
 HEP_OSlibs_SL6

These libraries were also needed:

 libXft-devel
 libxml2-devel
 libXpm-devel

We also installed some things, mostly for various VOs, I think:

 bzip2-devel
 compat-gcc-34-c++
 compat-gcc-34-g77
 gcc-c++
 gcc-gfortran
 git
 gmp-devel
 imake
 ipmitool
 libgfortran
 liblockfile-devel
 ncurses-devel
 python
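
All of these go on with yum in the usual way; something along the lines of the command below (a sketch only, assuming the EMI and EPEL repositories are already configured on the node; trim the list to taste):

 yum -y install condor mjf-htcondor emi-wn lcg-util lcg-util-libs lcg-util-python libcgroup fetch-crl \
     voms voms-clients3 lfc lfc-devel lfc-perl lfc-python uberftp gfal2-plugin-lfc HEP_OSlibs_SL6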
  
 +
=== Worker Files ===

* '''File:''' /root/scripts/set_node_parameters.pl
* Notes: This script senses the type of the system and sets it up according to how many slots it has etc. You'll also have to make arrangements to run this script once when you set up the machine. On the Liverpool system, this is done with the following puppet stanza. If you are using Puppet with Hiera, you can probably parameterise these settings.

 exec { "set_node_parameters.pl": command =>  "/root/scripts/set_node_parameters.pl > /etc/condor/config.d/00-node_parameters; \
 /bin/touch /root/scripts/done-set_node_parameters.pl", require => [ File["/root/scripts/set_node_parameters.pl"],
 File["/etc/condor/config.d"] ], onlyif => "/usr/bin/test ! -f /root/scripts/done-set_node_parameters.pl", timeout => "86400" }

* Customise: Yes. You'll need to edit it to suit your site.
* Content:
#!/usr/bin/perl

use strict;
my $foundType = 0;
my @outputLines;

#processor : 3
#physical id : 0

my $processors = 0;
my %physicalIds;

open(CPUINFO,"/proc/cpuinfo") or die("Can't open /proc/cpuinfo, $?");
while(<CPUINFO>) {

  if (/processor/) {
    $processors++;
  }
  if (/physical id\s*:\s*(\d+)/) {
    $physicalIds{$1} = 1;
  }
  if (/model name/) {
    if (! $foundType) {
      s/.*CPU\s*//;s/\s.*//;
      if (/E5620/){
        $foundType=1;
        push (@outputLines, "RalNodeLabel = E5620\n");
        push (@outputLines, "RalScaling =  1.205\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1              = cpus=10,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      elsif (/L5420/){
        $foundType=1;
        push (@outputLines, "RalNodeLabel = L5420\n");
        push (@outputLines, "RalScaling =  0.896\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1              = cpus=8,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      elsif (/X5650/){
        $foundType=1;
        push (@outputLines, "RalNodeLabel = X5650\n");
        push (@outputLines, "RalScaling =  1.229\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1              = cpus=16,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      elsif (/E5-2630/){
        $foundType=1;
        push (@outputLines, "RalNodeLabel = E5-2630\n");
        push (@outputLines, "RalScaling =  1.386\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1              = cpus=18,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      else {
        $foundType=1;
        push (@outputLines, "RalNodeLabel = BASELINE\n");
        push (@outputLines, "RalScaling =  1.0\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1              = cpus=8,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
    }
  }
}
close(CPUINFO);
foreach my $line(@outputLines) {
  print $line;
}
my @keys = keys(%physicalIds);
my $numberOfCpus = $#keys+1;
print ("# processors : $processors\n");
print ("# numberOfCpus : $numberOfCpus\n");

exit(0);
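
By way of illustration, on a dual-socket E5620 node with hyper-threading the script would emit something like the following into /etc/condor/config.d/00-node_parameters; the two trailing comment lines simply report what it counted in /proc/cpuinfo, so the exact numbers depend on the box:

 RalNodeLabel = E5620
 RalScaling =  1.205
 NUM_SLOTS = 1
 SLOT_TYPE_1              = cpus=10,mem=auto,disk=auto
 NUM_SLOTS_TYPE_1          = 1
 SLOT_TYPE_1_PARTITIONABLE = TRUE
 # processors : 16
 # numberOfCpus : 2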
* '''File:''' /etc/condor/condor_config.local
 +
* Notes: The main client condor configuration custom file.
 +
* Customise: Yes. You'll need to edit it to suit your site.
 +
* Content:
  
 ##  What machine is your central manager?
 CONDOR_HOST = hepgrid2.ph.liv.ac.uk
 
 ## Pool's short description
 COLLECTOR_NAME = Condor at $(FULL_HOSTNAME)
 
 ## Put the output in a huge dir
 EXECUTE = /data/condor_pool/
 
 ##  Make it switchable when this machine is willing to start a job
 ENABLE_PERSISTENT_CONFIG = TRUE
 PERSISTENT_CONFIG_DIR = /etc/condor/ral
 STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline, OnlyMulticore
 STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs , OnlyMulticore
 StartJobs = False
 RalNodeOnline = False
 OnlyMulticore = False
 
 #START = ((StartJobs =?= True) && (RalNodeOnline =?= True) && (ifThenElse(OnlyMulticore =?= True,ifThenElse(RequestCpus =?= 8, True, False) ,True ) ))
 START = ((StartJobs == True) && (RalNodeOnline == True) && (ifThenElse(OnlyMulticore == True,ifThenElse(RequestCpus == 8, True, False) ,True ) ))
 
 ##  When to suspend a job?
 SUSPEND = FALSE
 
 ##  When to nicely stop a job?
 # When a job is running and the PREEMPT expression evaluates to True, the
 # condor_startd will evict the job. The PREEMPT expression should reflect the
 # requirements under which the machine owner will not permit a job to continue to run.
 # For example, a policy to evict a currently running job when a key is hit or when
 # it is the 9:00am work arrival time, would be expressed in the PREEMPT expression
 # and enforced by the condor_startd.
 PREEMPT = FALSE
 
 # If there is a job from a higher priority user sitting idle, the
 # condor_negotiator daemon may evict a currently running job submitted
 # from a lower priority user if PREEMPTION_REQUIREMENTS is True.
 PREEMPTION_REQUIREMENTS = FALSE
 
 # No job has pref over any other
 #RANK = FALSE
 
 ##  When to instantaneously kill a preempting job
 ##  (e.g. if a job is in the pre-empting stage for too long)
 KILL = FALSE
 
 ##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
 ##  The list is a comma or space separated list of subsystem names
 DAEMON_LIST = MASTER, STARTD
 
 ALLOW_WRITE = *
 
 #######################################
 # scaling
 #
 STARTD_ATTRS = $(STARTD_ATTRS) RalScaling RalNodeLabel
 
 #######################################
 # Andrew Lahiff's tip for over committing memory
 #MEMORY = 1.35 * quantize( $(DETECTED_MEMORY), 1000 )
 MEMORY = 2.2 * quantize( $(DETECTED_MEMORY), 1000 )
 
 #######################################
 # Andrew Lahiff's security
 ALLOW_WRITE =
 
 UID_DOMAIN = ph.liv.ac.uk
 
 CENTRAL_MANAGER1 = hepgrid2.ph.liv.ac.uk
 COLLECTOR_HOST = $(CENTRAL_MANAGER1)
 
 # Central managers
 CMS = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk
 
 # CEs
 CES = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk
 
 # Worker nodes
 WNS = condor_pool@$(UID_DOMAIN)/192.168.*
 
 # Users
 USERS = *@$(UID_DOMAIN)
 USERS = *
 
 # Required for HA
 HOSTALLOW_NEGOTIATOR = $(COLLECTOR_HOST)
 HOSTALLOW_ADMINISTRATOR = $(COLLECTOR_HOST)
 HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST)
 
 # Authorization
 HOSTALLOW_WRITE =
 ALLOW_READ = */*.ph.liv.ac.uk
 NEGOTIATOR.ALLOW_WRITE = $(CES), $(CMS)
 COLLECTOR.ALLOW_ADVERTISE_MASTER = $(CES), $(CMS), $(WNS)
 COLLECTOR.ALLOW_ADVERTISE_SCHEDD = $(CES)
 COLLECTOR.ALLOW_ADVERTISE_STARTD = $(WNS)
 SCHEDD.ALLOW_WRITE = $(USERS)
 SHADOW.ALLOW_WRITE = $(WNS), $(CES)
 ALLOW_DAEMON = condor_pool@$(UID_DOMAIN)/*.ph.liv.ac.uk, $(FULL_HOSTNAME)
 ALLOW_ADMINISTRATOR = root@$(UID_DOMAIN)/$(IP_ADDRESS), condor_pool@$(UID_DOMAIN)/$(IP_ADDRESS), $(CMS)
 ALLOW_CONFIG = root@$(FULL_HOSTNAME)
 
 # Temp debug
 #ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST)
 
 # Don't allow nobody to run jobs
 SCHEDD.DENY_WRITE = nobody@$(UID_DOMAIN)
 
 # Authentication
 SEC_PASSWORD_FILE = /etc/condor/pool_password
 SEC_DEFAULT_AUTHENTICATION = REQUIRED
 SEC_READ_AUTHENTICATION = OPTIONAL
 SEC_CLIENT_AUTHENTICATION = REQUIRED
 SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD,FS
 SCHEDD.SEC_WRITE_AUTHENTICATION_METHODS = FS,PASSWORD
 SCHEDD.SEC_DAEMON_AUTHENTICATION_METHODS = FS,PASSWORD
 SEC_CLIENT_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE
 SEC_READ_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE
 
 # Integrity
 SEC_DEFAULT_INTEGRITY  = REQUIRED
 SEC_DAEMON_INTEGRITY = REQUIRED
 SEC_NEGOTIATOR_INTEGRITY = REQUIRED
 
 # Separation
 USE_PID_NAMESPACES = False
 
 # Smooth updates
 MASTER_NEW_BINARY_RESTART = PEACEFUL
 
 # Give jobs 3 days
 MAXJOBRETIREMENTTIME = 3600 * 24 * 3
 
 # Port limits
 HIGHPORT = 65000
 LOWPORT = 20000
 
 # Startd Crons
 STARTD_CRON_JOBLIST=TESTNODE
 STARTD_CRON_TESTNODE_EXECUTABLE=/usr/libexec/condor/scripts/testnodeWrapper.sh
 STARTD_CRON_TESTNODE_PERIOD=300s
 
 # Make sure values get over
 STARTD_CRON_AUTOPUBLISH = If_Changed
 
 # One job per claim
 CLAIM_WORKLIFE = 0
 
 # Enable CGROUP control
 BASE_CGROUP = htcondor
 # hard: job can't access more physical memory than allocated
 # soft: job can access more physical memory than allocated when there is free memory
 CGROUP_MEMORY_LIMIT_POLICY = soft
 
 # Use Machine-Job-Features
 USER_JOB_WRAPPER=/usr/sbin/mjf-job-wrapper

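One setting worth calling out is the memory over-commit line above. As a worked example (the numbers are illustrative, not from our configuration): quantize() rounds the detected memory up to the next multiple of 1000 MB, so a node that detects 15900 MB of RAM advertises 16000 * 2.2 = 35200 MB to Condor, deliberately over-committing the physical memory as per Andrew Lahiff's tip.
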
* '''File:''' /etc/profile.d/liv-lcg-env.sh
* Notes: Some environment script needed by the system.
* Customise: Yes. You'll need to edit it to suit your site.
* Content:

 export ATLAS_RECOVERDIR=/data/atlas
 EDG_WL_SCRATCH=$TMPDIR
 
 ID=`id -u`
 
 if [ $ID -gt 19999 ]; then
   ulimit -v 10000000
 fi

* '''File:''' /etc/profile.d/liv-lcg-env.csh
* Notes: Some other environment script needed by the system.
* Customise: Yes. You'll need to edit it to suit your site.
* Content:

 setenv ATLAS_RECOVERDIR /data/atlas
 if ( "$?TMPDIR" == "1" ) then
 setenv EDG_WL_SCRATCH $TMPDIR
 else
 setenv EDG_WL_SCRATCH ""
 endif

  
* '''File:''' /etc/condor/pool_password
* Notes: Will have its own section (TBD)
* Customise: Yes.
* Content: The content is the same as the one on the head node (see above).
* '''File:''' /root/glitecfg/site-info.def
* Notes: Just a copy of the site standard SID file. Used to make the accounts.
* Content: as per site standard
* '''File:''' /root/glitecfg/vo.d
* Notes: Just a copy of the site standard vo.d dir. Used to make the accounts.
* Content: as per site standard
* '''File:''' /opt/glite/yaim/etc/users.conf
* Notes: Just a copy of the site standard users.conf file. Used to make the accounts.
* Content: as per site standard
* '''File:''' /opt/glite/yaim/etc/groups.conf
* Notes: Just a copy of the site standard groups.conf file. Used to make the accounts.
* Content: as per site standard
* '''File:''' /etc/lcas/lcas-glexec.db
* Notes: Stops yaim from complaining about a missing file.
* Content: empty
* '''File:''' /etc/arc/runtime/ENV/GLITE
* Notes: Same as the head node version; see above. The GLITE runtime environment.
* Content: See above
* '''File:''' /etc/arc/runtime/ENV/PROXY
* Notes: Same as the head node version; see above. Stops error messages of one kind or another.
* Content: empty
* '''File:''' /usr/etc/globus-user-env.sh
* Notes: Jobs just need it to be there.
* Content: empty
  
=== Worker Cron jobs ===

We run a cronjob to keep cvmfs clean:

 0 5 */3 * * /root/bin/cvmfs_fsck.sh >> /var/log/cvmfs_fsck.log 2>&1

=== Worker Special notes ===

None to speak of (yet).

=== Worker user accounts ===

As with the head node, I used Yaim to do this, as follows.

 # yaim  -r -s /root/glitecfg/site-info.def -n ABC -f config_users

For this to work, a priori, a users.conf file and a groups.conf file must exist in the /opt/glite/yaim/etc/ directory. This is usually a part of any grid CE install, but advice on how to prepare these files is given in this Yaim guide (which I hope will be maintained for a little while longer):

https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400

=== Worker Services ===

You have to set this service running:

 condor

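On an SL6-era worker node that just means enabling the init script shipped with the Condor RPM. A minimal sketch, assuming the standard service name:

 service condor start
 chkconfig condor on
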
=== Workernode On/Off Control (and Health Checking) ===

For health checking, we use a script that checks the worker node and "turns it off" if it fails. To implement this, we use a CONDOR feature: startd_cron jobs.

This config in the /etc/condor/condor_config.local file on a worker node defines some new configuration variables.

 ENABLE_PERSISTENT_CONFIG = TRUE
 PERSISTENT_CONFIG_DIR = /etc/condor/ral
 STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline
 STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs
 StartJobs = False
 RalNodeOnline = False

The prefix "Ral" is used here because some of this material is inherited from Andrew Lahiff at RAL. It's just to de-conflict names.

Anyway, the first section says to keep a persistent record of configuration settings; it adds new configuration settings called "StartJobs" and "RalNodeOnline"; it sets them initially to False; and it makes the START configuration setting dependent upon them both being set. Note: the START setting is very important because the node won't start jobs unless it is True.

Next, this config, also in the /etc/condor/condor_config.local file, tells the system (startd) to run a cron script every five minutes.

 STARTD_CRON_JOBLIST=TESTNODE
 STARTD_CRON_TESTNODE_EXECUTABLE=/usr/libexec/condor/scripts/testnodeWrapper.sh
 STARTD_CRON_TESTNODE_PERIOD=300s
 
 # Make sure values get over
 STARTD_CRON_AUTOPUBLISH = If_Changed

The testnodeWrapper.sh script looks like this:

 #!/bin/bash
 
 MESSAGE=OK
 
 /usr/libexec/condor/scripts/testnode.sh > /dev/null 2>&1
 STATUS=$?
 
 if [ $STATUS != 0 ]; then
   MESSAGE=`grep ^[A-Z0-9_][A-Z0-9_]*=$STATUS\$ /usr/libexec/condor/scripts/testnode.sh | head -n 1 | sed -e "s/=.*//"`
   if [[ -z "$MESSAGE" ]]; then
     MESSAGE=ERROR
   fi
 fi
 
 if [[ $MESSAGE =~ ^OK$ ]] ; then
   echo "RalNodeOnline = True"
 else
   echo "RalNodeOnline = False"
 fi
 echo "RalNodeOnlineMessage = $MESSAGE"
 
 echo `date`, message $MESSAGE >> /tmp/testnode.status
 exit 0

This just wraps an existing script that I reuse from our TORQUE/MAUI cluster. The existing script simply returns a non-zero code if any error happens. To add a bit of extra information, the wrapper also looks up the meaning of the code. The important thing to notice is that it echoes out a line that sets RalNodeOnline to False, which is then used in the setting of START. Note: on TORQUE/MAUI, the script ran as "root"; here it runs as "condor". It uses sudo for some of the sections which (e.g.) check disks, because condor could not otherwise get smartctl readings.

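Because STARTD_CRON_AUTOPUBLISH sends the cron results to the collector, the outcome can be checked from the head node. This is only a sketch, assuming the attributes are being advertised as described above:

 condor_status -constraint 'RalNodeOnline =!= True' -autoformat Machine RalNodeOnlineMessage
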
When a node fails the test, START goes to False and the node won't run any more jobs.

For On/Off control, we use another setting (as well as RalNodeOnline) to control START: the "StartJobs" setting. We can control this independently, so we can take a node offline whether or not it has an error. This is useful for stopping the node in order to (say) rebuild it, similar to the pbsnodes command on TORQUE/MAUI. The command to control the worker node can be issued remotely from the head node, like this.

 condor_config_val -verbose -name r21-n01 -startd -set "StartJobs = false"
 condor_reconfig r21-n01
 condor_reconfig -daemon startd r21-n01

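To put the node back online once it is healthy again, the same pattern is used in reverse (a sketch, using the same example node name as above):

 condor_config_val -verbose -name r21-n01 -startd -set "StartJobs = true"
 condor_reconfig -daemon startd r21-n01
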
=GOCDB Entries and Registration =

Add new service entries for the head node in GOCDB for the following service types.

* gLite-APEL
* gLExec
* ARC-CE

It is safe to monitor all these services once they are marked as in production. Once the system is in GOCDB, the accounting system, APEL, will be able to accept accounting records (or contact APEL-SUPPORT@JISCMAIL.AC.UK).

Also contact representatives of the big experiments and tell them about the new CE. Ask Atlas to add the new CE to its analysis, production and multicore job queues.

= Software Tags =

The use of software tags has almost disappeared since we started using CVMFS. We expect that to continue.

An ARC CE, unlike CREAM, does not support software tags in the same way. ARC has a different but broadly equivalent mechanism of its own, called ARC runtime environments. These get published in the same way as software tags in the information system. The site admin has to put files into the runtimedir directory (e.g. /etc/arc/runtime). For example, at Liverpool, I've put in this tag for biomed:

 # ls /etc/arc/runtime/
 /etc/arc/runtime/VO-biomed-CVMFS

These are managed by our configuration management system - VOs can't make changes themselves. Users can query for the tag as follows:

 # ldapsearch -LLL -x -h lcg-bdii.gridpp.ac.uk:2170 -b o=grid 'GlueSubClusterUniqueID=hepgrid2.ph.liv.ac.uk' GlueHostApplicationSoftwareRunTimeEnvironment
 ...
 GlueHostApplicationSoftwareRunTimeEnvironment: VO-biomed-CVMFS
 ...

=Notes on Accounting, Scaling and Publishing=

== Background ==

Various notes on Jura accounting are available on the [http://wiki.nordugrid.org/wiki/Accounting nordugrid] wiki. I gave a [https://indico.cern.ch/event/556609/contributions/2256481/attachments/1329531/1997362/benchAndPub.pdf presentation] on Accounting, Scaling and Publishing for ARC/Condor and other systems at GridPP37 in Ambleside, UK, which forms the basis for the [[Benchmarking procedure]]. The material in this section is all based on the CREAM/Torque publishing tutorial written some time ago: [[Publishing_tutorial]].

The salient points in this document explain (A) how to apply scaling factors to individual nodes in a mixed cluster and (B) how the total power of a site is transmitted. I'll first lay out how it was done with CREAM/TORQUE and then explain the changes required for ARC/CONDOR.

==Historical Set-up with CREAM/TORQUE/MAUI==

===Application of Scaling Factors ===

At Liverpool, we introduced an abstract node-type, called BASELINE, with a reference value of 10 HEPSPEC. This is transmitted to the information system on a per CE basis, and can be seen as follows.

 $ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueCEUniqueID=hepgrid5.ph.liv.ac.uk:8443/cream-pbs-long GlueCECapability | perl -p0e 's/\n //g'
 GlueCECapability: CPUScalingReferenceSI00=2500

All CEs share the same value. Note: the value of 2500 corresponds to 10 HEPSPEC expressed in "bogoSpecInt2k" (1 bogoSpecInt2k is equal to 1/250th of a HEPSPEC).

All real nodes receive a TORQUE scaling factor that describes how powerful their slots are relative to the abstract reference. For example, a machine with slightly less powerful slots than BASELINE might have a factor of 0.896. TORQUE then automatically normalises cpu durations with the scaling factor, so the accounting system merely needs to know the CPUScalingReferenceSI00 value to be able to compute the work done.

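For reference, the per-node factor was applied on each pbs_mom. This is only a sketch, assuming the standard TORQUE mom_priv/config multiplier parameters, with an illustrative path and value rather than a copy of our real files:

 # /var/spool/torque/mom_priv/config (illustrative fragment)
 $cputmult 0.896
 $wallmult 0.896
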
  
===Transmit Total Power of a Site===

The total power of a site is conveyed to the information system by sending out values for Total Logical CPUs (or unislots) and Benchmark (the average power of a single slot) and multiplying them together. It is done on a per CE basis, and the calculation at Liverpool (which then had 4 CREAM CEs) looks like this:

 $ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid5.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'
 GlueSubClusterLogicalCPUs: 1
 GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
 
 $ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid6.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'
 GlueSubClusterLogicalCPUs: 1
 GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
 
 $ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid10.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'
 GlueSubClusterLogicalCPUs: 1
 GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
 
 $ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid97.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'
 GlueSubClusterLogicalCPUs: 1381
 GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
 
 $ bc -l
 (1 + 1 + 1 + 1381) * 12.53

giving 17341.52 HEPSPEC.

Note: all 1384 slots (logical CPUs) are/were available to each CE to submit to, but the bulk is allocated to hepgrid97 for the purposes of power publishing only.

== The Setup with ARC/CONDOR ==

===Application of Scaling Factors===

There's an ARC "authplugin" script, called scaling_factors_plugin.py, that gets run when a job finishes. It normalises the accounting. It gets the MachineRalScaling value (which has been buried in an "errors" file; see "RalScaling" below), then parses the diag file, multiplying the run-times by the factor.

Also in ARC there is a "jobreport_options" parameter that contains (e.g.) "benchmark_value:2500.00". I assume this is the equivalent of the "GlueCECapability: CPUScalingReferenceSI00=2500" in the "Application of Scaling Factors" section above, i.e. it is expressed in bogoSpecInt2k (250 times the HEPSPEC value). I assume that it represents the power of the reference node type, i.e. the power to which all the other nodes relate by way of their individual scaling factors.

The next thing to consider is the RalScaling / MachineRalScaling mechanism. This is set in one of the config files on the WNs:

 RalScaling = 2.14
 STARTD_ATTRS = $(STARTD_ATTRS) RalScaling

It tells the node how powerful it is by setting a new variable with some arbitrary name. This goes on the ARC CE:

 MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
 SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling

This gets hold of the RalScaling variable on the WN and passes it through via the SUBMIT_EXPRS parameter. It winds up in the "errors" file, which is then used by the normalisation script. Note that the scaling factor is applied to the worker node at build time by the set_node_parameters.pl script described in the Files section above.

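To make the mechanism concrete, here is a minimal sketch of the kind of normalisation the plugin performs. It is not the real scaling_factors_plugin.py and should not be used in its place; it assumes diag entries of the form WallTime=1234s, UserTime=1234s and KernelTime=1234s, and that the scaling factor has already been recovered from the job's errors file.

 #!/usr/bin/perl
 # Illustrative only: scale the timing entries in an ARC diag file by a given factor.
 use strict;
 use warnings;
 
 my ($diagFile, $factor) = @ARGV;
 die("Usage: $0 <diag-file> <scaling-factor>\n") unless (defined($diagFile) and defined($factor));
 
 open(my $in, '<', $diagFile) or die("Cannot read $diagFile: $!\n");
 my @lines = <$in>;
 close($in);
 
 foreach my $line (@lines) {
   # Multiply entries such as WallTime=3600s by the node's scaling factor
   if ($line =~ /^(WallTime|UserTime|KernelTime)=([0-9.]+)s/) {
     my $scaled = sprintf("%.0f", $2 * $factor);
     $line = "$1=${scaled}s\n";
   }
 }
 
 open(my $out, '>', $diagFile) or die("Cannot write $diagFile: $!\n");
 print $out @lines;
 close($out);
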
=== Notes on HEPSPEC Publishing Parameters ===

The [[Publishing_tutorial]] describes a situation where Yaim is used to convert and transfer the information. In this case, the same data has to be transposed into the arc.conf configuration file so that the ARC BDII can access and publish the values. The following table shows how to map the YAIM values referenced in the tutorial to the relevant configuration settings in the ARC system. (An illustrative arc.conf fragment assembling these example values follows the table.)

{|border="1" cellpadding="1"
|+Mapping of Yaim publishing values to arc.conf settings
|-style="background:#7C8AAF;color:white"
!Description
!Yaim variable
!ARC Conf Section
!Example ARC Variable
!Notes
|-
|Total physical cpus in cluster
|CE_PHYSCPU=114
|N/A
|N/A
|No equivalent in ARC
|-
|Total cores/logical-cpus/unislots/threads... in cluster
|CE_LOGCPU=652
|[cluster] and [queue/grid]
|totalcpus=652
|Only 1 queue; same in both sections
|-
|Accounting Scaling
|CE_CAPABILITY="CPUScalingReferenceSI00=2500 ...
|[grid-manager]
|jobreport_options="... benchmark_value:2500.00"
|Provides the reference for accounting
|-
|Power of 1 logical cpu, in HEPSPEC * 250 (bogoSI00)
|CE_SI00
|[infosys/glue12]
|N/A
|See Yaim Manual; equivalent to benchmark * 250
|-
|Cores: the average unislots in a physical cpu
|CE_OTHERDESCR=Cores=n.n, ...
|[infosys/glue12]
|processor_other_description="Cores=5.72 ..."
|Yaim var was shared with Benchmark (below)
|-
|Benchmark: the scaled power of a single core/logical-cpu/unislot/thread ...
|CE_OTHERDESCR=....,Benchmark=11.88-HEP-SPEC06
|[infosys/glue12]
|processor_other_description="...,Benchmark=11.88-HEP-SPEC06"
|Yaim var was shared with Cores (above)
|}

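To make that mapping concrete, here is a minimal illustrative arc.conf fragment assembling the example values from the table. The numbers are only the table's examples, the section contents are truncated, and the "..." in jobreport_options stands for the site's other jura options (e.g. the archiving settings shown for the head node):

 [cluster]
 totalcpus=652
 
 [queue/grid]
 totalcpus=652
 
 [grid-manager]
 jobreport_options="... benchmark_value:2500.00"
 
 [infosys/glue12]
 processor_other_description="Cores=5.72,Benchmark=11.88-HEP-SPEC06"
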
  
  
Once the system is operating, the following script can be used to test the published power of your site.

 #!/usr/bin/perl
 
 my @glasgow = qw ( svr010.gla.scotgrid.ac.uk  svr011.gla.scotgrid.ac.uk  svr014.gla.scotgrid.ac.uk  svr026.gla.scotgrid.ac.uk);
 my @liverpoolCE = qw (hepgrid5.ph.liv.ac.uk hepgrid6.ph.liv.ac.uk hepgrid10.ph.liv.ac.uk hepgrid97.ph.liv.ac.uk );
 my @liverpoolCE = qw (hepgrid2.ph.liv.ac.uk );
 
 my $power = 0;
 for my $server (@liverpoolCE  ) {
   my $p = getPower($server);
   $power = $power + $p;
 }
 
 print("Total power is $power\n");
 
 sub getPower() {
 
   $bdii = "hepgrid2.ph.liv.ac.uk:2135";
 
   my $server = shift;
 
   open(CMD,"ldapsearch -LLL -x -h $bdii -b o=grid 'GlueSubClusterUniqueID=$server' |") or die("No get $server stuff");
   my $buf = '';
   my @lines;
   while (<CMD>) {
     chomp();
     if (/^ /) {
       s/^ //; $buf .= $_;
     }
     else {
       push(@lines,$buf); $buf = $_;
     }
   }
   close(CMD);
   push(@lines,$buf);
 
   my $avgHepspec = -1;
   my $slots = -1;
   foreach my $l (@lines) {
     if ($l =~ /^GlueHostProcessorOtherDescription: Cores=([0-9\.]+),Benchmark=([0-9\.]+)-HEP-SPEC06/) {
       $avgHepspec = $2;
       print("avgHepspec -- $avgHepspec, $l\n");
     }
     if ($l =~ /^GlueSubClusterLogicalCPUs: ([0-9]+)/) {
       $slots = $1;
       print("slots      -- $slots\n");
     }
   }
 
   die("Reqd val not found $avgHepspec $slots \n") if (($avgHepspec == -1) or ($slots == -1));
 
   my $power =  $avgHepspec * $slots;
   print("power avgHepspec slots, $power, $avgHepspec, $slots\n");
   return $power;
 }
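To use the script at another site, edit the CE list and the $bdii host near the top and run it with perl; it prints the values it finds for each CE and the summed total.
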
  
=== Transmit Total Power of a Site ===

At present, there is no mechanism for that as far as I know.

===Republishing Accounting Records===

You can find some more reading on what you need to do to publish when you set up a new ARC-CE in [https://wiki.chipp.ch/twiki/bin/view/LCGTier2/ServiceApel#ARC_CE_Jura this twiki page].

Republishing records from ARC is only possible for APEL if the archiving option was set up in arc.conf (see above for the settings). If this was set for the period covered, you can use the script below (called merge-and-create-publish.sh, and written by Jernej Porenta) to collect the relevant archived records and put them in the republishing directory. After doing this, you can run jura publishing in the normal manner, or wait for the cron job to kick off. You must set the following attributes in the script before running it.

* the archiving directory (ARCHIVEDIR)
* the date range of the records to republish (FROM/TO)
* the output directory for the new files (OUTPUT)

 #!/bin/bash
 
 # Script to create republish data for JURA from archive dir
 
 # JURA archive dir, where all the old accounting records from ARC are saved (archiving setting from jobreport_options in arc.conf)
 ARCHIVEDIR="/var/urs/"
 
 # Time frame of republish data
 FROM="28-Feb-2015"
 TO="02-Apr-2015"
 
 # Output directory for new files, which should go into JURA outgoing dir (usually: /var/spool/arc/ssm/<APEL server>/outgoing/00000000)
 OUTPUT="/var/spool/arc/ssm/mq.cro-ngi.hr/outgoing/00000000/"
 
 #####
 
 TMPFILE="file.$$"
 
 if [ ! -d $OUTPUT ] || [ ! -d $ARCHIVEDIR ]; then
         echo "Output or Archive dir is missing"
         exit 0
 fi
 
 # find all accounting records from archive dir with modification time in the specified timeframe and paste the records into temporary file
 find $ARCHIVEDIR -type f -name 'usagerecordCAR.*' -newermt "$FROM -1 sec" -and -not -newermt "$TO -1 sec" -printf "%C@ %p\n" | sort | awk '{ print $2 }' | xargs -L1 -- grep -h UsageRecord >> $TMPFILE
 
 # fix issues with missing CpuDuration
 perl -p -i -e 's|WallDuration><ServiceLevel|WallDuration><CpuDuration urf:usageType="all">PT0S</CpuDuration><ServiceLevel|' $TMPFILE
 
 # split the temporary file into smaller files with only 999 accounting records each
 split -a 4 -l 999 -d $TMPFILE $OUTPUT/
 
 # rename the files into a format that the JURA publisher will understand
 for F in `find $OUTPUT -type f`; do
         FILE=`basename $F`
         NEWFILE=`date -d "$FROM + $FILE second" +%Y%m%d%H%M%S`
         mv -v $OUTPUT/$FILE $OUTPUT/$NEWFILE
 done
 
 # prepend opening XML tags to the accounting files
 find $OUTPUT -type f -print0 | xargs -0 -L1 -- sed -i '1s/^/<?xml version="1.0"?>\n<UsageRecords xmlns="http:\/\/eu-emi.eu\/namespaces\/2012\/11\/computerecord">\n/'
 
 # append closing XML tags to the accounting files
 for file in `find $OUTPUT -type f`; do
         echo "</UsageRecords>" >> $file
 done
 
 rm -f $TMPFILE
 
 echo "Publish files are in $OUTPUT directory"

=Tests and Testing=

The following URL lists some critical tests for ATLAS and the Liverpool site. You'll have to modify the site name.

<pre>http://dashb-atlas-sum.cern.ch/dashboard/request.py/historicalsmryview-sum#view=serviceavl&time[]=last48&granularity[]=default&profile=ATLAS_CRITICAL&group=All+sites&site[]=UKI-NORTHGRID-LIV-HEP&flavour[]=All+Service+Flavours&flavour[]=ARC-CE&disabledFlavours=true</pre>

To check the UK job submission status:

<pre>http://bigpanda.cern.ch/dash/production/?cloudview=region&computingsite=*MCORE*#cloud_UK </pre>

= Defragmentation for multicore jobs =

In this section, I discuss various approaches to defragmenting a cluster to make room for multi-core jobs.

== Fallow ==

I currently recommend Fallow over the other methods I have tried.

=== Introduction to Fallow ===

Fallow is a tool based on the older idea, DrainBoss (see below). Fallow is smaller, simpler and more precise. The integral term (which was complex) has been dropped and the proportional controller has been simplified.

=== Config Settings ===

To use Fallow, some new config is required on the worker nodes. The reason for this is described below in the Principles of Operation section.

Lines in the /etc/condor/condor_config.local file need to be amended to hold the OnlyMulticore attribute, as shown here.

  ENABLE_PERSISTENT_CONFIG = TRUE
  PERSISTENT_CONFIG_DIR = /etc/condor/ral
  STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline, OnlyMulticore
  STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs , OnlyMulticore
  OnlyMulticore = False

And the START classad, in the same file, has to be modified to use the OnlyMulticore attribute, as follows.

 START = ((StartJobs =?= True) && (RalNodeOnline =?= True) && (ifThenElse(OnlyMulticore =?= True,ifThenElse(RequestCpus =?= 8, True, False) ,True ) ))

The OnlyMulticore attribute is a persistent, settable attribute that can be altered by (say) an admin user or a script. The START classad, which is consulted before a job is started, will only yield True for a specific job if (as well as certain other conditions) OnlyMulticore is False, or OnlyMulticore is True and the job needs 8 cpus. Thus the node can be barred from running single-core jobs by setting OnlyMulticore to True.

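Since OnlyMulticore appears in STARTD.SETTABLE_ATTRS_ADMINISTRATOR, it can also be flipped by hand from the head node in the same way as StartJobs in the On/Off control section (a sketch, reusing the example node name from there; Fallow does the equivalent programmatically):

 condor_config_val -verbose -name r21-n01 -startd -set "OnlyMulticore = True"
 condor_reconfig -daemon startd r21-n01
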
  
=== Principles of Operation ===

Fallow takes a parameter that tells it how many unislots (single cores) should ideally be used by multi-core jobs. This is called the setpoint.

Fallow detects how many multi-core and single-core jobs are running and queued, and uses the OnlyMulticore attribute (see below) to control whether nodes are allowed to run single-core jobs. A node that is not allowed to run single-core jobs is, effectively, draining.

It does nothing if there are no jobs in the queue, or if there are only multi-core jobs in the queue. This is fine: the cluster is already effectively draining if there are no single-core jobs in the queue, and there is nothing useful to do if the queue is empty.

If there are only single-core jobs in the queue, Fallow sets OnlyMulticore on all nodes to False, allowing all nodes to run any type of job. This is fine because there are no multi-core jobs waiting, so no reservations are needed.

If there are multi-core and single-core jobs in the queue, Fallow uses the following algorithm.
Fallow works out how many multi-core (8-core) slots are needed to achieve the setpoint. Fallow exits if there are already enough running (Fallow never stops a running job to achieve the setpoint).

Fallow then subtracts the running multi-core jobs from the desired number to find how many newly drained nodes would be needed to reach the desired state. This gives the number of new nodes on which to set OnlyMulticore.

Fallow obtains a list of nodes that can run jobs. It then removes from the list those nodes that are already OnlyMulticore but do not yet have 8 cores of slack; these are already in progress.

Fallow then tries to find a set of nodes that are not OnlyMulticore, and sets them OnlyMulticore, starting the drain. Following this algorithm, the system should eventually converge on the desired number of multi-core jobs.

To avoid confusion, I haven't yet mentioned how newly drained nodes are put back online. This is actually the first thing Fallow does. It scans all the nodes, finding ones that are OnlyMulticore but which now have 8 cores of slack. It turns OnlyMulticore off for those nodes, putting them back into service.
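To make the arithmetic concrete (the running-job count is invented, and the rounding is my assumption rather than something read out of fallow.py), with the setpoint of 350 unislots used later in this document and 8-core multicore jobs:

 desired multicore jobs  = ceil(350 / 8) = 44
 running multicore jobs  = 40 (say)
 shortfall               = 44 - 40 = 4   =>  set OnlyMulticore on 4 more nodes that are not already draining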
=== Preferring Multicore Jobs ===

==== Algorithmic ====

For this system to work, it must prefer to start multi-core jobs over single-core jobs, because the drain process described above is futile if single-core jobs grab the newly prepared nodes. The system at Liverpool ensures this through various measures. The first and most effective measure is inherent in the Fallow algorithm. As a node drains in OnlyMulticore mode, single-core jobs are not allowed. At some point, 8 or more slots become free. The system will schedule a multicore job in those slots, because single-core jobs are barred. The next run of Fallow will put the node back in service by allowing single-core jobs, but it is too late - a multicore job is (usually) already running, assuming any were queued.

The only exception to this is a race condition. Say the Condor scheduler considers a draining (OnlyMulticore) node and finds that it has too few free cores to schedule a multi-core job. Then say that, between then and the next run of Fallow, enough cores become free. Fallow will then run and turn off OnlyMulticore. The first run of the scheduler after Fallow can then start a single-core job, which spoils the plan.

Fallow has logic to counter this. After Fallow discovers a node has enough cores to turn OnlyMulticore off, it waits for a period exceeding one scheduling cycle, to ensure that the scheduler has a chance to put a multi-core job on it. Only then does Fallow turn OnlyMulticore off. The scheduling cycle period is given to Fallow as a command-line parameter.

It is recommended anyway that the scheduler should run much more frequently than Fallow, to minimise the chance that this window opens. There are also other measures that can be used to give more certainty over this aspect, described next for the sake of completeness.
  
==== User Priorities ====

On our cluster, we define accounting groups and every job is assigned to a user that belongs to an accounting group (with reference to its proxy certificate, via an authentication and mapping system based on lcmaps and Argus). The rules that do this are described in the main document, and look something like this:

 LivAcctGroup = strcat("group_",toUpper(
 ifThenElse(regexp("sgmatl34",Owner),"highprio",
 ifThenElse(regexp("sgmops11",Owner),"highprio",
 ifThenElse(regexp("^alice", x509UserProxyVOName), "alice",
 ifThenElse(regexp("^atlas", x509UserProxyVOName), "atlas",
 ifThenElse(regexp("^biomed", x509UserProxyVOName), <…. and so on …>
 "nonefound")))))))))))))))))))))))))))))))) )) ))

 LivAcctSubGroup = strcat(regexps("([A-Za-z0-9]+[A-Za-z])\d+", Owner,
 "\1"),ifThenElse(RequestCpus > 1,"_mcore","_score"))

 AccountingGroup = strcat(LivAcctGroup, ".", LivAcctSubGroup, ".", Owner)

 SUBMIT_EXPRS = $(SUBMIT_EXPRS) LivAcctGroup, LivAcctSubGroup, AccountingGroup
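As an illustration of what these expressions produce (the owner name is taken from the condor_userprio output shown a little further down), an 8-core ATLAS production job owned by the pool account prdatl28 ends up with:

 LivAcctGroup    = "group_ATLAS"
 LivAcctSubGroup = "prdatl_mcore"     (the trailing digits of the owner are stripped, and RequestCpus > 1 gives "_mcore")
 AccountingGroup = "group_ATLAS.prdatl_mcore.prdatl28"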
  
The idea is that we have a major accounting group and a sub accounting group for each job, which is put in SUBMIT_EXPRS as a parameter. The sub accounting group always ends in _mcore or _score, for reasons that will become obvious in a minute. When I run condor_userprio, I see this for e.g. ATLAS (some columns omitted). Note the priority factor in the last column.

 group_ATLAS                              0.65  Regroup   1000.00
 pilatl_score.pilatl08@ph.liv.ac.uk     500.00             1000.00
 atlas_score.atlas006@ph.liv.ac.uk      500.33             1000.00
 prdatl_mcore.prdatl28@ph.liv.ac.uk   49993.42                1.00
 pilatl_score.pilatl24@ph.liv.ac.uk   96069.21             1000.00
 prdatl_score.prdatl28@ph.liv.ac.uk  202372.86             1000.00

The priority factor for the _mcore subgroup has been set to 1, using

 condor_userprio -setfactor prdatl_mcore.prdatl28@ph.liv.ac.uk 1

If the default priority factor is (say) 1000, then this makes mcore jobs much more likely to be selected to run than score jobs. Thus, if a wide slot is asking for jobs, it should get a wide job. This seems to be borne out in experience.

==== GROUP_SORT_EXPR ====

Andrew Lahiff has had good results from GROUP_SORT_EXPR, but I haven't tried it out yet.
  
=== Download, Install, Configure ===

The Fallow controller is available as an RPM in this location:

[http://hep.ph.liv.ac.uk/~sjones/ hep.ph.liv.ac.uk/~sjones/]

As it is an RPM, it can be installed on the ARC/Condor headnode with rpm or yum. Once installed, open the

 /root/scripts/runFallow.sh

script and modify the line that runs fallow.py, i.e.

 ./fallow.py -s 350 -n 61

The -s parameter is the number of unislots (single cores) to be reserved for multicore jobs. The -n parameter is the negotiator interval + 1. Change these to your site-specific values. You can then start the fallow service, i.e.

 service fallow start

It will write a log file to

 /root/scripts/fallow.log
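To pick the -n value, the negotiator interval can be read straight from the Condor configuration. For example (60 is the stock default; your value may differ):

 # NEGOTIATOR_INTERVAL is in seconds; -n should exceed it by one
 condor_config_val NEGOTIATOR_INTERVAL
 60

which is consistent with the -n 61 shown in the example above.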
== DrainBoss ==

DrainBoss has been superseded by Fallow, above.

=== Introduction to DrainBoss ===

If all jobs on a cluster require the same number of CPUs (e.g. all need one, or all need two), then you can simply load up each node with jobs until it is full. When one job ends, another can use its slot. A problem occurs when you try to run jobs which vary in the number of CPUs they require. Consider a node that has (say) eight cores and is running eight single-core jobs. One is the first to end, and a slot becomes free. But let us say that the highest-priority job in the queue is an eight-core job. The newly freed slot is not wide enough to take it, so it has to wait. Should the scheduler use the slot for a waiting single-core job, or hold it back until the other seven jobs end? If it holds jobs back, then resources are wasted. If it pops another single-core job into the slot, then the multicore job has no prospect of ever running. The solution that Condor provides to the problem has two rules: start multicore jobs in preference to single-core jobs, and periodically drain down nodes so that a multicore job can fit on them. This is implemented using the Condor DEFRAG daemon, which has parameters, described in the section below, that control the way nodes are selected and drained for multicore jobs. DrainBoss provides similar functionality, but adds a process controller that senses the condition of the cluster and adjusts the way nodes are drained and put back into service, giving a certain amount of predictability.

=== Process controller principles ===

A process controller provides a feedback control system. It measures some variable and compares it to an ideal value, called the setpoint, finding the error. It then corrects the process to try to eliminate the error. There are a large number of algorithms used to compute the correction, but DrainBoss makes use of the first two terms of the well-known Proportional Integral Derivative (PID) control algorithm, i.e. it is a PI controller. The proportional term sets the correction proportionally to the size of the error; its coefficient is sometimes called the gain of the controller. This is sufficient for many fast-acting processes, but any process involving the draining of compute nodes is likely to have a period of some hours or days. In this application, pure proportional control is too sensitive to time lags and the control would be very poor, so the proportional term is used with a very low gain to damp down its sensitivity. The second term, integral action, is more important in this application. Integral action sums (i.e. integrates, hence the name) the error over time and feeds that into the controller output as well. Thus, as the area under the error builds over time, the control output grows to offset it. This eventually overcomes the offset and returns the measured variable to the setpoint.
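For reference, the discrete PI law this describes has the familiar shape below. The symbols are generic textbook ones, not DrainBoss's actual variable names, and the output is interpreted as how many additional nodes to start draining in a given cycle:

 error(t)    = setpoint - measured(t)
 integral(t) = integral(t-1) + error(t) * dt
 output(t)   = Kp * error(t) + Ki * integral(t)

As described above, Kp is kept deliberately small and the integral (Ki) term does most of the work.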
  
=== Application ===

There are a few particulars to this application that affect the design of the controller.

First, the prime objectives of the system are to maximise the usage of the cluster and get good throughput of both single-core and multicore jobs. A good controller might be able to achieve this, but there are a few problems to deal with.

* Minimal negative corrections: To achieve control, the controller usually only puts more nodes into the draining state. It never stops nodes draining, with one exception - once a drain starts, it usually completes. The reason for this policy is that drains represent a cost to the system, and cancelling one throws away any achievement made from the draining. Just because there are few multicore jobs in the queue at present doesn't mean some might not crop up at any time, so cancelling drains could easily be premature. Instead, the nodes are left to drain out and are then put back into service, just in case a multicore job comes along and needs the slot. The only exception to this rule is when there are no multicore jobs in the queue but single-core jobs are queued. In that case, the single-core jobs are potentially being held back for no reason, so all draining is immediately cancelled to allow the single-core jobs to run.

* Traffic problems: On a cluster, there is no guarantee that a constant supply of multicore jobs and single-core jobs is available. There could be periods when the queue is depleted of one or both types of work. The controller deals with these issues in the best way it can, using these rules. If there are no multicore jobs queued, then it is pointless to start draining any systems, because there are no jobs to fill the resulting wide slots. Also, if there are no multicore jobs but some single-core jobs are queued, then the controller cancels the ongoing drains to let the single-core jobs run; otherwise the jobs would be held back for no valid reason. The truth table below shows the simple picture.
{|border="1" cellpadding="1"
|+
|-style="background:#7C8AAF;color:white"
|Queue state
|
|
|
|
|-
|mc jobs queued
|no
|yes
|no
|yes
|-
|sc jobs queued
|no
|no
|yes
|yes
|-style="background:#7C8AAF;color:white"
|Actions
|
|
|
|
|-
|start drain if nec.
|no
|yes
|no
|yes
|-
|cancel on-going drains
|no
|no
|yes
|no
|}
  
=== Tuning ===

Tuning was done entirely by hand, although there are technical ways to tune the system more accurately that I hope to research in future.

=== Current status ===

blah

=== Download ===

The DrainBoss controller is available as an RPM in this Yum repository:

[http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.vomstools/ www.sysadmin.hep.ac.uk]
== The DEFRAG daemon ==

This is the traditional approach to defragmentation, used in the initial version of the example build of an ARC/Condor cluster. It uses the DEFRAG daemon that comes with Condor.

To configure this set-up, you need to edit condor_config.local on the server and create a script, set_defrag_parameters.sh, to control the amount of defragging. The script is operated by a cron job. Full details on this configuration are given in the section of server files, above. The meaning of some important fragmentation parameters used to control the DEFRAG daemon is discussed next; an illustrative static configuration is sketched after the list.

* DEFRAG_INTERVAL – How often the daemon evaluates the defrag status and sets systems draining.
* DEFRAG_REQUIREMENTS – Only machines that match these requirements will start to drain.
* DEFRAG_DRAINING_MACHINES_PER_HOUR – Only this many machines will be set off draining each hour.
* DEFRAG_MAX_WHOLE_MACHINES – Don't start any draining if you already have this many whole machines.
* DEFRAG_MAX_CONCURRENT_DRAINING – Never drain more than this many machines at once.
* DEFRAG_RANK – This allows you to prefer some machines over others to drain.
* DEFRAG_WHOLE_MACHINE_EXPR – This defines whether a given machine counts as whole or not.
* DEFRAG_CANCEL_REQUIREMENTS – Draining will be stopped when a draining machine matches these requirements.
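The sketch below shows these knobs together, with placeholder values. These are not the Liverpool settings (the script in the next section adjusts the limits on the fly), so treat it purely as an illustration of the syntax:

 # Run the defrag daemon on the head node
 DAEMON_LIST = $(DAEMON_LIST) DEFRAG

 # How often to evaluate (seconds), and how aggressively to drain
 DEFRAG_INTERVAL = 600
 DEFRAG_DRAINING_MACHINES_PER_HOUR = 1
 DEFRAG_MAX_CONCURRENT_DRAINING = 2
 DEFRAG_MAX_WHOLE_MACHINES = 6

 # Only consider nodes that could ever yield an 8-core slot, and treat a
 # node with 8 or more free cores in its partitionable slot as "whole"
 DEFRAG_REQUIREMENTS = PartitionableSlot && TotalCpus >= 8
 DEFRAG_WHOLE_MACHINE_EXPR = PartitionableSlot && Cpus >= 8
 DEFRAG_CANCEL_REQUIREMENTS = $(DEFRAG_WHOLE_MACHINE_EXPR)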
  
 +
Note: The meaning of the ClassAds and parameters used to judge the fragmentation state of a machine is Byzantine in its complexity. The following definitions have been learned from experience.
  
|-
+
The multicore set-up in CONDOR makes use of the idea of a abstract Partitonable Slot (PSlot) that can't run jobs but contains real slots of various sizes that can. In our set-up, every node has a single PSlot on it. Smaller "real" slots are made from it, each with either 1 single simultaneous thread of execution (a unislot) or 8 unislots for multicore jobs. The table below shows the meaning of some ClassAds used to express the usage of a node that is currently runing seven single core jobs (I think it's taken from an E5620 CPU).
|superbvo.org
+
|2000
+
|1440
+
|1800
+
|15000
+
| _ A minimum of 10 prd pool accounts related to Proxy VOMS Role "ProductionManager" is required
+
  
  _ Software installation procedure based on Grid job submission needs the yum-utils package. It is available only on SL5 (or greater) OS Worker Nodes.
+
The ClassAds in the first columns (Pslot) have the following meanings. DetectedCpus shows that the node has 16 hyper-threads in total - this is the hardware limit for simultaneous truly concurrent threads.  The next row, TotalSlots, shows the size of the PSlot on this node. In this case, only 10 unislots can ever be used for jobs, unusing 6 unislots (note: it has been found that total throughput does not increase even if all the unislots are used so it is not inefficient to unuse 6 unislots.) Next, TotalSlots is equal to 8 in this case, which represents the total of all the used unislots in the sub slots, plus 1 to represent the PSlot. A value of 8 shows that this PSlot currently has seven of its unislots used by sub slots, and three unused. These could be used to make new sub slots to run jobs in. The last ClassAd, Cpus, represents the usable unislots in the PSlot that are left over (i.e. 3).  
  
_ LCG-Utils suite should be accessible at job execution.
+
With respect to the sub slot columns, the DetectedCpus and TotalSlots values can be ignored as they are always the same. Both TotalSlot and Cpus in the sub slot columns represent how many unislots are in this sub slot.
  
_ The SL5X superb software release needs the following packages installed on the wn [x86_64|i386] arch systems, the package release value is not a strict requirement, i386 and x86_64 RPM architecture should be provided:
+
It's as clear as mud, isn't it? But my experiments show it is consistent.
  
{|border="1" cellpadding="1"
|-style="background:#7C8AAF;color:white"
!PSlot
!Sub slot
!Sub slot
!Sub slot
!Sub slot
!Sub slot
!Sub slot
!Sub slot
!Empty 3 unislots
|-
|DetectedCpus:<br>How Many<br>HyperThreads<br>e.g. 16
|Ignore
|Ignore
|Ignore
|Ignore
|Ignore
|Ignore
|Ignore
|Empty
|-
|TotalSlotCpus:<br>How many CPUs<br>can be used<br>e.g. 10
|Ignore
|Ignore
|Ignore
|Ignore
|Ignore
|Ignore
|Ignore
|Empty
|-
|TotalSlots:<br>Total of main plus<br>all sub slots<br>e.g. 8
|TotalSlots:<br>How many unislots in<br>this sub slot.<br>e.g. 1
|TotalSlots:<br>How many unislots in<br>this sub slot.<br>e.g. 1
|TotalSlots:<br>How many unislots in<br>this sub slot.<br>e.g. 1
|TotalSlots:<br>How many unislots in<br>this sub slot.<br>e.g. 1
|TotalSlots:<br>How many unislots in<br>this sub slot.<br>e.g. 1
|TotalSlots:<br>How many unislots in<br>this sub slot.<br>e.g. 1
|TotalSlots:<br>How many unislots in<br>this sub slot.<br>e.g. 1
|Empty
|-
|Cpus:<br>Usable unislots<br>left over<br><br>e.g. 3
|As above<br>Always the same.
|As above<br>Always the same.
|As above<br>Always the same.
|As above<br>Always the same.
|As above<br>Always the same.
|As above<br>Always the same.
|As above<br>Always the same.
|Empty
|}
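These ClassAds can be inspected on a live node with condor_status; for example (the node name is made up for illustration):

 # The partitionable slot, slot1@..., plus its dynamic sub-slots (slot1_1@..., slot1_2@..., etc.)
 condor_status -long slot1@r22-n05.ph.liv.ac.uk | egrep '^(DetectedCpus|TotalSlotCpus|TotalSlots|Cpus) '

 # One line per slot on the node, showing how many unislots each holds
 condor_status r22-n05.ph.liv.ac.uk -autoformat Name Cpus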
  
=== Setting Defrag Parameters ===

The script below senses the load condition of the cluster and sets appropriate parameters for defragmentation.

* '''File:''' /root/scripts/set_defrag_parameters.sh
* Notes: This script senses changes to the running and queueing job load, and sets parameters related to defragmentation. This allows the cluster to support a load consisting of both multicore and singlecore jobs.
* Customise: Yes. You'll need to edit it to suit your site. BTW: I'm experimenting with a swanky new version that involves a rate controller. I'll report on that in due course.
* Content:

 #!/bin/bash
 #
 # Change condor_defrag daemon parameters depending on what's queued
 
 function setDefrag () {
 
     # Get the address of the defrag daemon
     defrag_address=$(condor_status -any -autoformat MyAddress -constraint 'MyType =?= "Defrag"')
 
     # Log
     echo `date` " Setting DEFRAG_MAX_CONCURRENT_DRAINING=$3, DEFRAG_DRAINING_MACHINES_PER_HOUR=$4, DEFRAG_MAX_WHOLE_MACHINES=$5 (queued multicore=$1, running multicore=$2)"
 
     # Set configuration
     /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_MAX_CONCURRENT_DRAINING = $3" >& /dev/null
     /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_DRAINING_MACHINES_PER_HOUR = $4" >& /dev/null
     /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_MAX_WHOLE_MACHINES = $5" >& /dev/null
     /usr/sbin/condor_reconfig -daemon defrag >& /dev/null
 }
 
 function cancel_draining_nodes () {
   # Get draining nodes
   for dn in `condor_status | grep Drained | sed -e "s/.*@//" -e "s/\..*//" `; do
     slot1=0
     condor_status -long $dn | while read line; do
 
       # Toggle if slot1@ (not slot1_...). slot1@ lists the empty (i.e. drained) total
       if [[ $line =~ ^Name.*slot1@.*$ ]] ; then
         slot1=1
       fi
       if [[ $line =~ ^Name.*slot1_.*$ ]] ; then
         slot1=0
       fi
 
       if [ $slot1 == 1 ]; then
         if [[ $line =~ ^Cpus\ \=\ (.*)$ ]] ; then
 
           # We must capture the empty/drained total
           cpus="${BASH_REMATCH[1]}"
           if [ $cpus -ge 8 ]; then
             # We have enough already. Pointless waiting longer.
             echo Cancel drain of $dn, as we have $cpus free already
             condor_drain -cancel $dn
           fi
         fi
       fi
     done
   done
 }
 
 queued_mc_jobs=$(condor_q -global -constraint 'RequestCpus == 8 && JobStatus == 1' -autoformat ClusterId | wc -l)
 queued_sc_jobs=$(condor_q -global -constraint 'RequestCpus == 1 && JobStatus == 1' -autoformat ClusterId | wc -l)
 running_mc_jobs=$(condor_q -global -constraint 'RequestCpus == 8 && JobStatus == 2' -autoformat ClusterId | wc -l)
 running_sc_jobs=$(condor_q -global -constraint 'RequestCpus == 1 && JobStatus == 2' -autoformat ClusterId | wc -l)
 
 queued_mc_slots=`expr $queued_mc_jobs \* 8`
 queued_sc_slots=$queued_sc_jobs
 
 # Ratio control
 P_SETPOINT=0.5    # When the ratio between multicore and singlecore is more than this, take action
 
 # CONSTANTS
 C_MxWM=1000  # At max, pay no heed to how many whole systems
 C_MxDH=3     # At max, kick off N per hour to drain
 C_MxCD=2     # At max, never more than this many should defrag at once (for goodness sake)
 
 C_MnWM=6     # At min, don't bother if n already whole
 C_MnDH=1     # At min, only start 1 per hour max
 C_MnCD=1     # At min, don't bother if n already going
 
 C_ZWM=0      # At zero, don't bother if 0 already whole
 C_ZDH=0      # At zero, only start 0 per hour max
 C_ZCD=0      # At zero, don't bother if 0 already going
 
 if [ $queued_sc_slots -le 3 ]; then
   # Very few sc jobs. Max defrag.
   setDefrag $queued_mc_jobs $running_mc_jobs $C_MxCD $C_MxDH $C_MxWM
 else
   if [ $queued_mc_slots -le 1 ]; then
     # More than a couple of sc jobs, and almost no mc jobs.
     # No defragging starts; cancel current defragging.
     setDefrag $queued_mc_jobs $running_mc_jobs $C_ZCD $C_ZDH $C_ZWM
     cancel_draining_nodes
   else
     # More than a couple of sc jobs, and mc jobs
     RATIO=`echo "$queued_mc_slots / $queued_sc_slots" | bc -l`
     RESULT=$(echo "${RATIO} > ${P_SETPOINT}" | bc -l )
 
     if [ $RESULT -eq 1 ]; then
       # Surplus of MC over SC, lots of defrag.
       setDefrag $queued_mc_jobs $running_mc_jobs $C_MxCD $C_MxDH $C_MxWM
     else
       # Not more MC than SC, little defrag.
       setDefrag $queued_mc_jobs $running_mc_jobs $C_MnCD $C_MnDH $C_MnWM
     fi
   fi
 fi
 
 # Raise priority of MC jobs
 /root/scripts/condor_q_cores.pl > /tmp/c
 
 # Put all the MC records in one file, with I (idle) jobs only
 grep ^MC /tmp/c | grep ' I ' > /tmp/mc.c
 
 # Go over those queued multicore jobs and up their prio
 for j in `cat /tmp/mc.c | sed -e "s/\S*\s//" -e "s/ .*//"`; do condor_prio -p 6 $j; done
 rm /tmp/c /tmp/mc.c
 
 exit
  
This cron job runs the script periodically.

* Cron: defrag
* Purpose: Sets the defrag parameters dynamically
* Puppet stanza:

  cron { "set_defrag_parameters.sh":
    command => "/root/scripts/set_defrag_parameters.sh >> /var/log/set_defrag_parameters.log",
    require => File["/root/scripts/set_defrag_parameters.sh"],
    user => root,
    minute  => "*/5",
    hour    => "*",
    monthday => "*",
    month    => "*",
    weekday  => "*",
  }
  
= Further Work =

blah blah blah

= Also see =

* https://www.gridpp.ac.uk/wiki/ARC_HTCondor_Basic_Install
* https://www.gridpp.ac.uk/wiki/Imperial_arc_ce_for_cloud
* https://www.gridpp.ac.uk/wiki/ARC_CE_Tips
* http://www.slideserve.com/hanzila/multi-core-jobs-at-the-ral-tier-1
* https://www.gridpp.ac.uk/wiki/ARC_HTCondor_Accounting
* https://www.gridpp.ac.uk/wiki/Enable_Cgroups_in_HTCondor
* https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor

[[Category:Batch Systems]]
[[Category:arcce]]
[[Category:HTcondor]]



Infrastructure/Fabric

The multicore cluster consists of an SL6 headnode to run the ARC CE and the Condor batch system. The headnode has a dedicated set of 132 workernodes of various types, providing a total of around 1100 single threads of execution, which I shall call unislots, or slots for short.

Head Node

The headnode is a virtual system running on KVM.

Head node hardware

 Host Name               OS     CPUs  RAM     Disk Space (mostly /var)
 hepgrid2.ph.liv.ac.uk   SL6.4  5     10 GB   55 GB

Worker nodes

This is the output of our Site Layout Database, showing how the ARC/Condor cluster is made up. All the nodes currently run SL6.4.

Workernode types

Node type name CPUs per node Slots per node HS06 per slot GB per slot Scale factor
BASELINE 0 0 10.0 0.0 0.0
L5420 2 8 8.9 2.0 0.89
E5620 2 12 10.6325 2.0 1.0632
X5650 2 24 8.66 2.08 0.866
E5-2630 2 23 11.28 2.17 1.128
E5-2630V3 2 32 11.07 4.12 1.107

Rack layout

Node Set Node Type Node Count CPUs in Node Slots per Node HS06 per Slot HS06
21 E5620 4 2 12 10.6325 510.36
21X X5650 16 2 24 8.66 3325.4399
22 E5620 20 2 12 10.6325 2551.7999
23p1 E5620 10 2 12 10.6325 1275.9
26 E5-2630 4 2 23 11.28 1037.76
26L L5420 7 2 8 8.9 498.4
26V E5-2630V3 5 2 32 11.07 1771.2

General cluster properties

HS06 10970.86
Physical CPUs 132
Logical CPUs (slots) 1100
Cores 8.333
Benchmark 9.974
CE_SI00 2493
CPUScalingReferenceSI00 2500

Software Builds and Configuration

There are a few particulars of the Liverpool site that I want to get out of the way to start with. For the initial installation of an operating system on our head nodes and worker nodes, we use tools developed at Liverpool (BuildTools) based on Kickstart, NFS, TFTP and DHCP. The source (synctool.pl and linktool.pl) can be obtained from sjones@hep.ph.liv.ac.uk. Alternatively, similar functionality is said to exist in the Cobbler suite, which is released as Open Source and some sites have based their initial install on that. Once the OS is on, the first reboot starts Puppet to give a personality to the node. Puppet is becoming something of a de-facto standard in its own right, so I'll use some puppet terminology within this document where some explanation of a particular feature is needed.

Special Software Control Measures

The software for the installation is all contained in various yum repositories. Here at Liverpool, we maintain two mirrored copies of the yum material. One of them, the online repository, is mirrored daily from the Internet. It is not used for any installation. The other copy, termed the local repository, is used to take a snapshot when necessary of the online repository. Installations are done from the local repository. Thus we maintain precise control of the software we use. There is no need to make any further reference to this set-up.

We'll start with the headnode and "work down" so to speak.

Yum repos

This table shows the origin of the software releases via yum repositories.

Yum Repositories
Product Where Yum repo Source Keys
ARC Head node http://download.nordugrid.org/repos/15.03/centos/el6/x86_64/base, download.nordugrid.org/repos/15.03/centos/el6/x86_64/updates http://download.nordugrid.org/repos/15.03/centos/el6/source http://download.nordugrid.org/RPM-GPG-KEY-nordugrid


VomsSnooper Head node http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.vomstools/ null null


Condor (we use 8.2.X): Head and Worker http://research.cs.wisc.edu/htcondor/yum/stable/rhel6 null null


WLCG Head and Worker http://linuxsoft.cern.ch/wlcg/sl6/x86_64/ null null


Trust anchors Head and Worker http://repository.egi.eu/sw/production/cas/1/current/ null null
Puppet Head and Worker http://yum.puppetlabs.com/el/6/products/x86_64 null null
epel Head and worker http://download.fedoraproject.org/pub/epel/6/x86_64/ null null
emi (to be phased out, June 2017; use UMD) Head and Worker http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/base,http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/third-party, http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/updates null null
CernVM-packages: Worker http://map2.ph.liv.ac.uk//yum/cvmfs/EL/6.4/x86_64/ null http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM
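As an illustration only (the file name and layout are my assumptions, not a file taken from the Liverpool build), the NorduGrid entries in the table above could be captured in a yum repo file such as:

 # /etc/yum.repos.d/nordugrid.repo   (illustrative)
 [nordugrid-base]
 name=NorduGrid ARC 15.03 base
 baseurl=http://download.nordugrid.org/repos/15.03/centos/el6/x86_64/base
 enabled=1
 gpgcheck=1
 gpgkey=http://download.nordugrid.org/RPM-GPG-KEY-nordugrid

 [nordugrid-updates]
 name=NorduGrid ARC 15.03 updates
 baseurl=http://download.nordugrid.org/repos/15.03/centos/el6/x86_64/updates
 enabled=1
 gpgcheck=1
 gpgkey=http://download.nordugrid.org/RPM-GPG-KEY-nordugrid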


Head Node

Head Standard build

The basis for the initial build follows the standard model for any grid server node at Liverpool. I won't explain that in detail – each site is likely to have its own standard, which is general to all the components used to build any grid node (such as a CE, ARGUS, BDII, TORQUE etc.) but prior to any middleware. Such a baseline build might include networking, iptables, nagios scripts, ganglia, ssh etc.

Head Extra Directories

I had to make these specific directories myself:

/etc/arc/runtime/ENV
/etc/condor/ral
/etc/lcmaps/
/root/glitecfg/services
/root/scripts
/var/spool/arc/debugging
/var/spool/arc/grid
/var/spool/arc/jobstatus
/var/urs

Head Additional Packages

These packages were needed to add the middleware required, i.e. ARC, Condor and ancillary material.

Additional Packages
Package Description
nordugrid-arc-compute-element The ARC CE Middleware
condor HT Condor, the main batch server package (we are on 8.2.7)
apel-client Accounting, ARC/Condor bypasses the APEL server and goes direct.


ca_policy_igtf-classic Certificates
lcas-plugins-basic Security
lcas-plugins-voms Security
lcas Security
lcmaps Security
lcmaps-plugins-basic Security
lcmaps-plugins-c-pep Security
lcmaps-plugins-verify-proxy Security
lcmaps-plugins-voms Security


globus-ftp-control Extra packages for Globus
globus-gsi-callback Extra packages for Globus


VomsSnooper VOMS Helper, used to set up the LSC (list of Certificates) files
glite-yaim-core Yaim; just used to make accounts.
yum-plugin-priorities.noarch Helpers for Yum
yum-plugin-protectbase.noarch Helpers for Yum
yum-utils Helpers for Yum


Head Files

The following set of files were additionally installed. Some of them are empty. Some of them can be used as they are. Others have to be edited to fit your site. Any that is a script must have executable permissions (e.g. 755).


  • File: /etc/arc.conf
  • Notes:
  • The main configuration file of the ARC CE. It adds support for scaling factors, APEL reporting, ARGUS Mapping, BDII publishing (power and scaling), multiple VO support, and default limits.
  • Special note: Ext3 has a limit of 31998 directories in the sessiondir. This limit is easily breached on a large cluster. Either use (say) xfs, or define multiple sessiondir variables to spread the load over several directories, as per the “ARC CE System Administrator Guide”.
  • Customise: Yes. You'll need to edit it to suit your site. Please see the Publishing tutorial.
  • Content:


[common]
debug="1"
x509_user_key="/etc/grid-security/hostkey.pem"
x509_user_cert="/etc/grid-security/hostcert.pem"
x509_cert_dir="/etc/grid-security/certificates"
gridmap="/etc/grid-security/grid-mapfile"
lrms="condor" 
hostname="hepgrid2.ph.liv.ac.uk"

[grid-manager]
debug="3"
logsize=30000000 20
enable_emies_interface="yes"
arex_mount_point="https://hepgrid2.ph.liv.ac.uk:443/arex"
user="root"
controldir="/var/spool/arc/jobstatus"
sessiondir="/var/spool/arc/grid"
runtimedir="/etc/arc/runtime"
logfile="/var/log/arc/grid-manager.log"
pidfile="/var/run/grid-manager.pid"
joblog="/var/log/arc/gm-jobs.log"
shared_filesystem="no" 
authplugin="PREPARING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/default_rte_plugin.py %S %C %I ENV/GLITE"
authplugin="FINISHING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/scaling_factors_plugin.py %S %C %I"
# This copies the files containing useful output from completed jobs into a directory /var/spool/arc/debugging 
#authplugin="FINISHED timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/debugging_rte_plugin.py %S %C %I"

mail="root@hep.ph.liv.ac.uk"
jobreport="APEL:http://mq.cro-ngi.hr:6162"
jobreport_options="urbatch:1000,archiving:/var/urs,topic:/queue/global.accounting.cpu.central,gocdb_name:UKI-NORTHGRID-LIV-HEP,use_ssl:true,Network:PROD,benchmark_type:Si2k,benchmark_value:2500.00"
jobreport_credentials="/etc/grid-security/hostkey.pem /etc/grid-security/hostcert.pem /etc/grid-security/certificates"
jobreport_publisher="jura_dummy"
# Disable (1 month !)
jobreport_period=2500000

[gridftpd]
debug="1"
logsize=30000000 20
user="root"
logfile="/var/log/arc/gridftpd.log"
pidfile="/var/run/gridftpd.pid"
port="2811"
allowunknown="yes"
globus_tcp_port_range="20000,24999"
globus_udp_port_range="20000,24999"
maxconnections="500"

#
# Notes:
#
# The first two args are implicitly given to arc-lcmaps, and are
#    argv[1] - the subject/DN
#    argv[2] - the proxy file
#
# The remain attributes are explicit, after the "lcmaps" field in the examples below.
#    argv[3] - lcmaps_library
#    argv[4] - lcmaps_dir
#    argv[5] - lcmaps_db_file
#    argv[6 etc.] - policynames
#
# lcmaps_dir and/or lcmaps_db_file may be '*', in which case they are
# fully truncated (placeholders).
#
# Some logic is applied. If the lcmaps_library is not specified with a
# full path, it is given the path of the lcmaps_dir. We have to assume that
# the lcmaps_dir is a poor name for that field, as discussed in the following
# examples.
#
# Examples:
#   In this example, used at RAL, the liblcmaps.so is given no
#   path, so it is assumed to exist in /usr/lib64 (note the poorly
#   named field - the lcmaps_dir is populated by a library path.)
#
# Fieldnames:      lcmaps_lib   lcmaps_dir lcmaps_db_file            policy
#unixmap="* lcmaps liblcmaps.so /usr/lib64 /usr/etc/lcmaps/lcmaps.db arc"
#
#   In the next example, used at Liverpool, lcmaps_lib is fully qualified. Thus
#   the lcmaps_dir is not used (although is does set the LCMAPS_DIR env var).
#   In this case, the lcmaps_dir really does contain the lcmaps dir location.
#
# Fieldnames:      lcmaps_lib              lcmaps_dir  lcmaps_db_file policy
unixmap="* lcmaps  /usr/lib64/liblcmaps.so /etc/lcmaps lcmaps.db      arc"
unixmap="arcfailnonexistentaccount:arcfailnonexistentaccount all"


[gridftpd/jobs]
debug="1"
path="/jobs"
plugin="jobplugin.so"
allownew="yes" 

[infosys]
debug="1"
user="root"
overwrite_config="yes"
port="2135"
registrationlog="/var/log/arc/inforegistration.log"
providerlog="/var/log/arc/infoprovider.log"
provider_loglevel="1"
infosys_glue12="enable"
infosys_glue2_ldap="enable"

[infosys/glue12]
debug="1"
resource_location="Liverpool, UK"
resource_longitude="-2.964"
resource_latitude="53.4035"
glue_site_web="http://www.gridpp.ac.uk/northgrid/liverpool"
glue_site_unique_id="UKI-NORTHGRID-LIV-HEP"
cpu_scaling_reference_si00="2493"
processor_other_description="Cores=8.333,Benchmark=9.974-HEP-SPEC06"
provide_glue_site_info="false"

[infosys/admindomain]
debug="1"
name="UKI-NORTHGRID-LIV-HEP"

# infosys view of the computing cluster (service)
[cluster]
debug="1"
name="hepgrid2.ph.liv.ac.uk"
localse="hepgrid11.ph.liv.ac.uk"
cluster_alias="hepgrid2 (UKI-NORTHGRID-LIV-HEP)"
comment="UKI-NORTHGRID-LIV-HEP Main Grid Cluster"
homogeneity="True"
nodecpu="Intel(R) Xeon(R) CPU L5420 @ 2.50GHz"
architecture="x86_64"
nodeaccess="inbound"
nodeaccess="outbound"
#opsys="SL64"
opsys="ScientificSL : 6.4 : Carbon"
nodememory="3000"

authorizedvo="alice"
authorizedvo="atlas"
authorizedvo="biomed"
authorizedvo="calice"
authorizedvo="camont"
authorizedvo="cdf"
authorizedvo="cernatschool.org"
authorizedvo="cms"
authorizedvo="dteam"
authorizedvo="dzero"
authorizedvo="epic.vo.gridpp.ac.uk"
authorizedvo="esr"
authorizedvo="fusion"
authorizedvo="geant4"
authorizedvo="gridpp"
authorizedvo="hyperk.org"
authorizedvo="ilc"
authorizedvo="lhcb"
#authorizedvo="lz"
authorizedvo="lsst"
authorizedvo="magic"
authorizedvo="mice"
authorizedvo="na62.vo.gridpp.ac.uk"
authorizedvo="neiss.org.uk"
authorizedvo="ops"
authorizedvo="pheno"
authorizedvo="planck"
authorizedvo="snoplus.snolab.ca"
authorizedvo="t2k.org"
authorizedvo="vo.northgrid.ac.uk"
authorizedvo="zeus"

benchmark="SPECINT2000 2493"
benchmark="SPECFP2000 2493"
totalcpus=1100

[queue/grid]
debug="1"
name="grid"
homogeneity="True"
comment="Default queue"
nodecpu="adotf"
architecture="adotf"
defaultmemory=3000
maxrunning=1400
totalcpus=1100
maxuserrun=1400
maxqueuable=2800
#maxcputime=2880
#maxwalltime=2880
MainMemorySize="16384"
OSFamily="linux"


 


  • File: /etc/arc/runtime/ENV/GLITE
  • Notes: The GLITE runtime environment.
  • Content:
 #!/bin/sh
 
 export GLOBUS_LOCATION=/usr
 
 if [ "x$1" = "x0" ]; then
   # Set environment variable containing queue name
   env_idx=0
   env_var="joboption_env_$env_idx"
   while [ -n "${!env_var}" ]; do
      env_idx=$((env_idx+1))
      env_var="joboption_env_$env_idx"
   done 
   eval joboption_env_$env_idx="NORDUGRID_ARC_QUEUE=$joboption_queue"
 	
   export RUNTIME_ENABLE_MULTICORE_SCRATCH=1
 
 fi
 
 if [ "x$1" = "x1" ]; then
   # Set grid environment
   if [ -e /etc/profile.d/env.sh ]; then
      source /etc/profile.d/env.sh
   fi 
   if [ -e /etc/profile.d/zz-env.sh ]; then
      source /etc/profile.d/zz-env.sh
   fi
   export LD_LIBRARY_PATH=/opt/xrootd/lib
 
   # Set basic environment variables
   export GLOBUS_LOCATION=/usr
   HOME=`pwd`
   export HOME
   USER=`whoami`
   export USER
   HOSTNAME=`hostname -f`
   export HOSTNAME
 fi
 
 export DPM_HOST=hepgrid11.ph.liv.ac.uk
 export DPNS_HOST=hepgrid11.ph.liv.ac.uk
 export GLEXEC_LOCATION=/usr
 export RFIO_PORT_RANGE=20000,25000
 export SITE_GIIS_URL=hepgrid4.ph.liv.ac.uk
 export SITE_NAME=UKI-NORTHGRID-LIV-HEP
 export MYPROXY_SERVER=lcgrbp01.gridpp.rl.ac.uk
 
 
 export VO_ALICE_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_ALICE_SW_DIR=/opt/exp_soft_sl5/alice
 export VO_ATLAS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/sw
 export VO_BIOMED_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_BIOMED_SW_DIR=/opt/exp_soft_sl5/biomed
 export VO_CALICE_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_CALICE_SW_DIR=/opt/exp_soft_sl5/calice
 export VO_CAMONT_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_CAMONT_SW_DIR=/opt/exp_soft_sl5/camont
 export VO_CDF_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_CDF_SW_DIR=/opt/exp_soft_sl5/cdf
 export VO_CERNATSCHOOL_ORG_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_CERNATSCHOOL_ORG_SW_DIR=/opt/exp_soft_sl5/cernatschool
 export VO_CMS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_CMS_SW_DIR=/opt/exp_soft_sl5/cms
 export VO_DTEAM_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_DTEAM_SW_DIR=/opt/exp_soft_sl5/dteam
 export VO_DZERO_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_DZERO_SW_DIR=/opt/exp_soft_sl5/dzero
 export VO_EPIC_VO_GRIDPP_AC_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_EPIC_VO_GRIDPP_AC_UK_SW_DIR=/opt/exp_soft_sl5/epic
 export VO_ESR_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_ESR_SW_DIR=/opt/exp_soft_sl5/esr
 export VO_FUSION_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_FUSION_SW_DIR=/opt/exp_soft_sl5/fusion
 export VO_GEANT4_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_GEANT4_SW_DIR=/opt/exp_soft_sl5/geant4
 export VO_GRIDPP_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_GRIDPP_SW_DIR=/opt/exp_soft_sl5/gridpp
 export VO_HYPERK_ORG_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_HYPERK_ORG_SW_DIR=/cvmfs/hyperk.egi.eu
 export VO_ILC_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_ILC_SW_DIR=/cvmfs/ilc.desy.de
 export VO_LHCB_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_LHCB_SW_DIR=/cvmfs/lhcb.cern.ch
 export VO_LZ_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_LZ_SW_DIR=/opt/exp_soft_sl5/lsst
 export VO_LSST_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_LSST_SW_DIR=/opt/exp_soft_sl5/lsst
 export VO_MAGIC_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_MAGIC_SW_DIR=/opt/exp_soft_sl5/magic
 export VO_MICE_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_MICE_SW_DIR=/cvmfs/mice.egi.eu
 export VO_NA62_VO_GRIDPP_AC_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_NA62_VO_GRIDPP_AC_UK_SW_DIR=/cvmfs/na62.cern.ch
 export VO_NEISS_ORG_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_NEISS_ORG_UK_SW_DIR=/opt/exp_soft_sl5/neiss
 export VO_OPS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_OPS_SW_DIR=/opt/exp_soft_sl5/ops
 export VO_PHENO_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_PHENO_SW_DIR=/opt/exp_soft_sl5/pheno
 export VO_PLANCK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_PLANCK_SW_DIR=/opt/exp_soft_sl5/planck
 export VO_SNOPLUS_SNOLAB_CA_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_SNOPLUS_SNOLAB_CA_SW_DIR=/cvmfs/snoplus.egi.eu
 export VO_T2K_ORG_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_T2K_ORG_SW_DIR=/cvmfs/t2k.egi.eu
 export VO_VO_NORTHGRID_AC_UK_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_VO_NORTHGRID_AC_UK_SW_DIR=/opt/exp_soft_sl5/northgrid
 export VO_ZEUS_DEFAULT_SE=hepgrid11.ph.liv.ac.uk
 export VO_ZEUS_SW_DIR=/opt/exp_soft_sl5/zeus
 
 export RUCIO_HOME=/cvmfs/atlas.cern.ch/repo/sw/ddm/rucio-clients/0.1.12
 export RUCIO_AUTH_TYPE=x509_proxy 
 
 export LCG_GFAL_INFOSYS="lcg-bdii.gridpp.ac.uk:2170,topbdii.grid.hep.ph.ic.ac.uk:2170"
 
 # Fix to circumvent Condor Globus Libraries
 # (i.e. this error: lcg-cr: /usr/lib64/condor/libglobus_common.so.0: no version information available (required by /usr/lib64/libcgsi_plugin.so.1)
 export LD_LIBRARY_PATH=/usr/lib64/:$LD_LIBRARY_PATH
 
  • File: /etc/condor/config.d/14accounting-groups-map.config
  • Notes: Implements accounting groups, so that fairshares can be used that refer to whole groups of users, instead of individual ones.
  • Customise: Yes. You'll need to edit it to suit your site.
  • Content:
 # Liverpool Tier-2 HTCondor configuration: accounting groups 
 
 # Primary group, assign individual test submitters into the HIGHPRIO group, 
 # else just assign job into primary group of its VO
 
 LivAcctGroup = strcat("group_",toUpper( ifThenElse(regexp("sgmatl34",Owner),"highprio", ifThenElse(regexp("sgmops11",Owner),"highprio", ifThenElse(regexp("^alice", x509UserProxyVOName), "alice", ifThenElse(regexp("^atlas", x509UserProxyVOName), "atlas", ifThenElse(regexp("^biomed", x509UserProxyVOName), "biomed", ifThenElse(regexp("^calice", x509UserProxyVOName), "calice", ifThenElse(regexp("^camont", x509UserProxyVOName), "camont", ifThenElse(regexp("^cdf", x509UserProxyVOName), "cdf", ifThenElse(regexp("^cernatschool.org", x509UserProxyVOName), "cernatschool_org", ifThenElse(regexp("^cms", x509UserProxyVOName), "cms", ifThenElse(regexp("^dteam", x509UserProxyVOName), "dteam", ifThenElse(regexp("^dzero", x509UserProxyVOName), "dzero", ifThenElse(regexp("^epic.vo.gridpp.ac.uk", x509UserProxyVOName), "epic_vo_gridpp_ac_uk", ifThenElse(regexp("^esr", x509UserProxyVOName), "esr", ifThenElse(regexp("^fusion", x509UserProxyVOName), "fusion", ifThenElse(regexp("^geant4", x509UserProxyVOName), "geant4", ifThenElse(regexp("^gridpp", x509UserProxyVOName), "gridpp", ifThenElse(regexp("^hyperk.org", x509UserProxyVOName), "hyperk_org", ifThenElse(regexp("^ilc", x509UserProxyVOName), "ilc", ifThenElse(regexp("^lhcb", x509UserProxyVOName), "lhcb", ifThenElse(regexp("^lsst", x509UserProxyVOName), "lsst", ifThenElse(regexp("^magic", x509UserProxyVOName), "magic", ifThenElse(regexp("^mice", x509UserProxyVOName), "mice", ifThenElse(regexp("^na62.vo.gridpp.ac.uk", x509UserProxyVOName), "na62_vo_gridpp_ac_uk", ifThenElse(regexp("^neiss.org.uk", x509UserProxyVOName), "neiss_org_uk", ifThenElse(regexp("^ops", x509UserProxyVOName), "ops", ifThenElse(regexp("^pheno", x509UserProxyVOName), "pheno", ifThenElse(regexp("^planck", x509UserProxyVOName), "planck", ifThenElse(regexp("^snoplus.snolab.ca", x509UserProxyVOName), "snoplus_snolab_ca", ifThenElse(regexp("^t2k.org", x509UserProxyVOName), "t2k_org", ifThenElse(regexp("^vo.northgrid.ac.uk", x509UserProxyVOName), "vo_northgrid_ac_uk", ifThenElse(regexp("^zeus", x509UserProxyVOName), "zeus","nonefound"))))))))))))))))))))))))))))))))))
 
 # Subgroup
 # For the subgroup, just assign job to the group of the owner (i.e. owner name less all those digits at the end).
 # Also show whether multi or single core.
 LivAcctSubGroup = strcat(regexps("([A-Za-z0-9]+[A-Za-z])\d+", Owner, "\1"),ifThenElse(RequestCpus > 1,"_mcore","_score"))
 
 # Now build up the whole accounting group
 AccountingGroup = strcat(LivAcctGroup, ".", LivAcctSubGroup, ".", Owner)
 
 # Add these ClassAd specifications to the submission expressions
 SUBMIT_EXPRS = $(SUBMIT_EXPRS) LivAcctGroup, LivAcctSubGroup, AccountingGroup 
   

  • File: /etc/condor/config.d/11fairshares.config
  • Notes: Implements fair share settings, relying on groups of users.
  • Customise: Yes. You'll need to edit it to suit your site.
  • Content:
 # Liverpool Tier-2 HTCondor configuration: fairshares
 
 # use this to stop jobs from starting.
 # CONCURRENCY_LIMIT_DEFAULT = 0
 
 # Half-life of user priorities
 PRIORITY_HALFLIFE = 259200
 
 # Handle surplus
 GROUP_ACCEPT_SURPLUS = True
 GROUP_AUTOREGROUP = True
 
 # Weight slots using CPUs
 #NEGOTIATOR_USE_SLOT_WEIGHTS = True
 
 # See: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3271
 NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = False
 
 # Calculate the surplus allocated to each group correctly
 NEGOTIATOR_USE_WEIGHTED_DEMAND = True
 
 GROUP_NAMES = \
 	group_HIGHPRIO,  \
 	group_ALICE,  \
 	group_ATLAS,  \
 	group_BIOMED,  \
 	group_CALICE,  \
 	group_CAMONT,  \
 	group_CDF,  \
         group_LSST,  \
 	group_CERNATSCHOOL_ORG,  \
 	group_CMS,  \
 	group_DTEAM,  \
 	group_DZERO,  \
 	group_EPIC_VO_GRIDPP_AC_UK,  \
 	group_ESR,  \
 	group_FUSION,  \
 	group_GEANT4,  \
 	group_GRIDPP,  \
 	group_HYPERK_ORG,  \
 	group_ILC,  \
 	group_LHCB,  \
 	group_MAGIC,  \
 	group_MICE,  \
 	group_NA62_VO_GRIDPP_AC_UK,  \
 	group_NEISS_ORG_UK,  \
 	group_OPS,  \
 	group_PHENO,  \
 	group_PLANCK,  \
         group_LZ,  \
 	group_SNOPLUS_SNOLAB_CA,  \
 	group_T2K_ORG,  \
 	group_VO_NORTHGRID_AC_UK,  \
 	group_VO_SIXT_CERN_CH,  \
 	group_ZEUS
 
 
 GROUP_QUOTA_DYNAMIC_group_HIGHPRIO  = 0.05
 GROUP_QUOTA_DYNAMIC_group_ALICE =  0.05
 GROUP_QUOTA_DYNAMIC_group_ATLAS =  0.65
 GROUP_QUOTA_DYNAMIC_group_BIOMED = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_CALICE = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_CAMONT = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_CDF = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_LSST = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_CERNATSCHOOL_ORG = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_CMS = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_DTEAM = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_DZERO = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_EPIC_VO_GRIDPP_AC_UK = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_ESR = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_FUSION = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_GEANT4 = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_GRIDPP = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_HYPERK_ORG = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_ILC = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_LHCB =  0.20
 GROUP_QUOTA_DYNAMIC_group_MAGIC = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_MICE = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_NA62_VO_GRIDPP_AC_UK = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_NEISS_ORG_UK = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_OPS = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_PHENO = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_PLANCK = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_LZ =  0.01
 GROUP_QUOTA_DYNAMIC_group_SNOPLUS_SNOLAB_CA = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_T2K_ORG = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_VO_NORTHGRID_AC_UK = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_VO_SIXT_CERN_CH = 0.00806452
 GROUP_QUOTA_DYNAMIC_group_ZEUS = 0.00806452
 
 DEFAULT_PRIO_FACTOR = 5000.00
 GROUP_PRIO_FACTOR_group_HIGHPRIO = 1000.0
 GROUP_PRIO_FACTOR_group_ALICE = 1000.0
 GROUP_PRIO_FACTOR_group_ATLAS = 1000.0
 GROUP_PRIO_FACTOR_group_BIOMED = 1000.0
 GROUP_PRIO_FACTOR_group_CALICE = 1000.0
 GROUP_PRIO_FACTOR_group_CAMONT = 1000.0
 GROUP_PRIO_FACTOR_group_CDF = 1000.0
 GROUP_PRIO_FACTOR_group_LSST = 1000.0
 GROUP_PRIO_FACTOR_group_CERNATSCHOOL_ORG = 1000.0
 GROUP_PRIO_FACTOR_group_CMS = 1000.0
 GROUP_PRIO_FACTOR_group_DTEAM = 1000.0
 GROUP_PRIO_FACTOR_group_DZERO = 1000.0
 GROUP_PRIO_FACTOR_group_EPIC_VO_GRIDPP_AC_UK = 1000.0
 GROUP_PRIO_FACTOR_group_ESR = 1000.0
 GROUP_PRIO_FACTOR_group_FUSION = 1000.0
 GROUP_PRIO_FACTOR_group_GEANT4 = 1000.0
 GROUP_PRIO_FACTOR_group_GRIDPP = 1000.0
 GROUP_PRIO_FACTOR_group_HYPERK_ORG = 1000.0
 GROUP_PRIO_FACTOR_group_ILC = 1000.0
 GROUP_PRIO_FACTOR_group_LHCB = 1000.0
 GROUP_PRIO_FACTOR_group_MAGIC = 1000.0
 GROUP_PRIO_FACTOR_group_MICE = 1000.0
 GROUP_PRIO_FACTOR_group_NA62_VO_GRIDPP_AC_UK = 1000.0
 GROUP_PRIO_FACTOR_group_NEISS_ORG_UK = 1000.0
 GROUP_PRIO_FACTOR_group_OPS = 1000.0
 GROUP_PRIO_FACTOR_group_PHENO = 1000.0
 GROUP_PRIO_FACTOR_group_PLANCK = 1000.0
 GROUP_PRIO_FACTOR_group_LZ = 10000.00
 GROUP_PRIO_FACTOR_group_SNOPLUS_SNOLAB_CA = 1000.0
 GROUP_PRIO_FACTOR_group_T2K_ORG = 1000.0
 GROUP_PRIO_FACTOR_group_VO_NORTHGRID_AC_UK = 1000.0
 GROUP_PRIO_FACTOR_group_VO_SIXT_CERN_CH = 1000.0
 GROUP_PRIO_FACTOR_group_ZEUS = 1000.0
 
 
 # Change the order in which the negotiator considers groups: (1) high priority groups used for
 # SUM tests etc, (2) multicore groups ordered by how far below their quota each group is,
 # (3) single core groups ordered by how far below their quota each group is
 
 GROUP_SORT_EXPR = ifThenElse(AccountingGroup=?="<none>", 3.4e+38,                                                                 \
                   ifThenElse(AccountingGroup=?="group_HIGHPRIO", -23,                                                             \
                   ifThenElse(AccountingGroup=?="group_DTEAM", -18,                                                            \
                   ifThenElse(AccountingGroup=?="group_OPS", -17,                                                            \
                   ifThenElse(regexp("mcore",AccountingGroup),ifThenElse(GroupQuota > 0,-2+GroupResourcesInUse/GroupQuota,-1), \
                   ifThenElse(GroupQuota > 0, GroupResourcesInUse/GroupQuota, 3.3e+38))))))
 
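A quick way to confirm that the negotiator has picked up these groups and quotas (a sketch; the variable queried is just one example from the list above):

 # Ask the running negotiator for one of the dynamic quotas
 condor_config_val -negotiator GROUP_QUOTA_DYNAMIC_group_ATLAS
 # Show per-group and per-user priorities and usage as the negotiator sees them
 condor_userprio -allusers
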
  • File: /etc/condor/pool_password
  • Notes: Will have its own section (TBD)
  • Customise: Yes.
  • Content:


 Password Authentication
 The password method provides mutual authentication through the use of a shared
 secret. This is often a good choice when strong security is desired, but an existing
 Kerberos or X.509 infrastructure is not in place. Password authentication is
 available on both Unix and Windows. It can currently only be used for
 daemon-to-daemon authentication. The shared secret in this context is referred to as
 the pool password. Before a daemon can use password authentication, the pool
 password must be stored on the daemon's local machine. On Unix, the password will
 be placed in a file defined by the configuration variable SEC_PASSWORD_FILE. This file
 will be accessible only by the UID that HTCondor is started as. On Windows, the same
 secure password store that is used for user passwords will be used for the pool
 password (see section 7.2.3). Under Unix, the password file can be generated by
 using the following command to write directly to the password file:
 condor_store_cred -f /path/to/password/file
 
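In this set-up the file named by SEC_PASSWORD_FILE (see condor_config.local below) is /etc/condor/pool_password, so generating it amounts to something like the following sketch, run as root on each machine (you are prompted for the password unless it is supplied another way):

 condor_store_cred -f /etc/condor/pool_password
 # belt and braces: make sure only the user HTCondor runs as can read it
 chmod 600 /etc/condor/pool_password
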
  • File: /etc/condor/condor_config.local
  • Notes: The main client CONDOR configuration custom file.
  • Customise: Yes. You'll need to edit it to suit your site.
  • Content:
 ##  What machine is your central manager?
 
 CONDOR_HOST = $(FULL_HOSTNAME)
 
 ## Pool's short description
 
 COLLECTOR_NAME = Condor at $(FULL_HOSTNAME)
 
 ##  When is this machine willing to start a job? 
 
 START = FALSE
 
 ##  When to suspend a job?
 
 SUSPEND = FALSE
 
 ##  When to nicely stop a job?
 # When a job is running and the PREEMPT expression evaluates to True, the
 # condor_startd will evict the job. The PREEMPT expression should reflect the
 # requirements under which the machine owner will not permit a job to continue to run.
 # For example, a policy to evict a currently running job when a key is hit or when
 # it is the 9:00am work arrival time, would be expressed in the PREEMPT expression
 # and enforced by the condor_startd.
 
 PREEMPT = FALSE
 
 # If there is a job from a higher priority user sitting idle, the
 # condor_negotiator daemon may evict a currently running job submitted
 # from a lower priority user if PREEMPTION_REQUIREMENTS is True.
 
 PREEMPTION_REQUIREMENTS = FALSE
 
 # No job has pref over any other
 
 #RANK = FALSE
 
 ##  When to instantaneously kill a preempting job
 ##  (e.g. if a job is in the pre-empting stage for too long)
 
 KILL = FALSE
 
 ##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
 ##  The list is a comma or space separated list of subsystem names
 
 DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
 
 #######################################
 # Andrew Lahiff's scaling
 
 MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
 MachineRalNodeLabel = "$$([ifThenElse(isUndefined(RalNodeLabel), "NotKnown", RalNodeLabel)])"
 SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling MachineRalNodeLabel
  
 #######################################
 # Andrew Lahiff's security
 
 ALLOW_WRITE = 
 
 UID_DOMAIN = ph.liv.ac.uk
 
 CENTRAL_MANAGER1 = hepgrid2.ph.liv.ac.uk
 COLLECTOR_HOST = $(CENTRAL_MANAGER1)
 
 # Central managers
 CMS = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk
 
 # CEs
 CES = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk
 
 # Worker nodes
 WNS = condor_pool@$(UID_DOMAIN)/192.168.*
 
 # Users
 USERS = *@$(UID_DOMAIN)
 USERS = *
 
 # Required for HA
 HOSTALLOW_NEGOTIATOR = $(COLLECTOR_HOST)
 HOSTALLOW_ADMINISTRATOR = $(COLLECTOR_HOST)
 HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST)
 
 # Authorization
 HOSTALLOW_WRITE =
 ALLOW_READ = */*.ph.liv.ac.uk
 NEGOTIATOR.ALLOW_WRITE = $(CES), $(CMS)
 COLLECTOR.ALLOW_ADVERTISE_MASTER = $(CES), $(CMS), $(WNS)
 COLLECTOR.ALLOW_ADVERTISE_SCHEDD = $(CES)
 COLLECTOR.ALLOW_ADVERTISE_STARTD = $(WNS)
 SCHEDD.ALLOW_WRITE = $(USERS)
 SHADOW.ALLOW_WRITE = $(WNS), $(CES)
 ALLOW_DAEMON = condor_pool@$(UID_DOMAIN)/*.ph.liv.ac.uk, $(FULL_HOSTNAME)
 ALLOW_ADMINISTRATOR = root@$(UID_DOMAIN)/$(IP_ADDRESS), condor_pool@$(UID_DOMAIN)/$(IP_ADDRESS), $(CMS)
 ALLOW_CONFIG = root@$(FULL_HOSTNAME)
 
 # Don't allow nobody to run jobs
 SCHEDD.DENY_WRITE = nobody@$(UID_DOMAIN)
 
 # Authentication
 SEC_PASSWORD_FILE = /etc/condor/pool_password
 SEC_DEFAULT_AUTHENTICATION = REQUIRED
 SEC_READ_AUTHENTICATION = OPTIONAL
 SEC_CLIENT_AUTHENTICATION = REQUIRED
 SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD,FS
 SCHEDD.SEC_WRITE_AUTHENTICATION_METHODS = FS,PASSWORD
 SCHEDD.SEC_DAEMON_AUTHENTICATION_METHODS = FS,PASSWORD
 SEC_CLIENT_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE
 SEC_READ_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE
 
 # Integrity
 SEC_DEFAULT_INTEGRITY  = REQUIRED
 SEC_DAEMON_INTEGRITY = REQUIRED
 SEC_NEGOTIATOR_INTEGRITY = REQUIRED
 
 # Multicore
 # Disable DEFRAG
 #####DAEMON_LIST = $(DAEMON_LIST) DEFRAG
 
 DEFRAG_SCHEDULE = graceful
 
 DEFRAG_INTERVAL = 90  
 DEFRAG_MAX_CONCURRENT_DRAINING = 1 
 DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
 DEFRAG_MAX_WHOLE_MACHINES = 4
 
 ## Allow some defrag configuration to be settable
 DEFRAG.SETTABLE_ATTRS_ADMINISTRATOR = DEFRAG_MAX_CONCURRENT_DRAINING,DEFRAG_DRAINING_MACHINES_PER_HOUR,DEFRAG_MAX_WHOLE_MACHINES
 ENABLE_RUNTIME_CONFIG = TRUE
 
 # The defrag depends on the number of spares already present, biased towards systems with many cpus
 DEFRAG_RANK = Cpus * pow(TotalCpus,(1.0 / 2.0))
 
 # Definition of a "whole" machine:
 DEFRAG_WHOLE_MACHINE_EXPR =  Cpus >= 8 && StartJobs =?= True && RalNodeOnline =?= True
 
 # Cancel once we have 8
 DEFRAG_CANCEL_REQUIREMENTS = Cpus >= 8 
 
 # Decide which slots can be drained
 DEFRAG_REQUIREMENTS = PartitionableSlot && StartJobs =?= True && RalNodeOnline =?= True
 
 ## Logs
 MAX_DEFRAG_LOG = 104857600
 MAX_NUM_DEFRAG_LOG = 10
 
 #DEFRAG_DEBUG = D_FULLDEBUG
 
 #NEGOTIATOR_DEBUG        = D_FULLDEBUG
 
 # Port limits
 HIGHPORT = 65000
 LOWPORT = 20000
 
 # History
 HISTORY = $(SPOOL)/history
 
 # Longer but better
 NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
 
 ## Allow some negotiator configuration to be settable
 NEGOTIATOR.PERSISTENT_CONFIG_DIR=/var/lib/condor/persistent_config_dir
 NEGOTIATOR.ENABLE_PERSISTENT_CONFIG = True
 NEGOTIATOR.SETTABLE_ATTRS_ADMINISTRATOR = NEGOTIATOR_CYCLE_DELAY
 
 # Try to kill hogs
 SYSTEM_PERIODIC_REMOVE = RemoteWallClockTime > 259200
 
 # Try again with ones that have some vars temporarily undef
 SYSTEM_PERIODIC_RELEASE = (JobRunCount < 10 && (time() - EnteredCurrentStatus) > 1200 ) && (HoldReasonCode == 5 && HoldReasonSubCode == 0)
 
 

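Because ENABLE_RUNTIME_CONFIG, the persistent config directory and the SETTABLE_ATTRS lists are switched on above, a few settings can be adjusted without editing this file. A minimal sketch (the value 120 is arbitrary):

 # Change the negotiator cycle delay on the fly, then tell the negotiator to re-read its config
 condor_config_val -negotiator -set "NEGOTIATOR_CYCLE_DELAY = 120"
 condor_reconfig -daemon negotiator
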

  • File: /etc/ld.so.conf.d/condor.conf
  • Notes: CONDOR needed this to access its libraries. I had to run 'ldconfig' to make it take hold.
  • Customise: Maybe not.
  • Content:
/usr/lib64/condor/
  • File: /usr/local/bin/scaling_factors_plugin.py
  • Notes: This implements part of the scaling factor logic (see the Notes on Accounting, Scaling and Publishing section, below).
  • Customise: It should be generic.
  • Content:
#!/usr/bin/python
# Copyright 2014 Science and Technology Facilities Council
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import re
from os.path import isfile
import shutil
import datetime
import time
import os


"""Usage: scaling_factors_plugin.py <status> <control dir> <jobid>

Authplugin for FINISHING STATE

Example:

  authplugin="FINISHING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/scaling_factors_plugin.py %S %C %I"

"""

def ExitError(msg,code):
    """Print error message and exit"""
    from sys import exit
    print(msg)
    exit(code)

def GetScalingFactor(control_dir, jobid):

    errors_file = '%s/job.%s.errors' %(control_dir,jobid)

    if not isfile(errors_file):
       ExitError("No such errors file: %s"%errors_file,1)

    f = open(errors_file)
    errors = f.read()
    f.close()

    scaling = -1

    m = re.search('MATCH_EXP_MachineRalScaling = \"([\dE\+\-\.]+)\"', errors)
    if m:
       scaling = float(m.group(1))

    return scaling


def SetScaledTimes(control_dir, jobid):

    scaling_factor = GetScalingFactor(control_dir, jobid)

    diag_file = '%s/job.%s.diag' %(control_dir,jobid)


    if not isfile(diag_file):
       ExitError("No such errors file: %s"%diag_file,1)

    f = open(diag_file)
    lines = f.readlines()
    f.close()

    newlines = []

    types = ['WallTime=', 'UserTime=', 'KernelTime=']

    for line in lines:
       for type in types:
          if type in line and scaling_factor > 0:
             m = re.search('=(\d+)s', line)
             if m:
                scaled_time = int(float(m.group(1))*scaling_factor)
                line = type + str(scaled_time) + 's\n'

       newlines.append(line)

    fw = open(diag_file, "w")
    fw.writelines(newlines)
    fw.close()
    # Save a copy. Use this for the DAPDUMP analyser.
    #tstamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%d%H%M%S')
    #dest = '/var/log/arc/diagfiles/' + tstamp + '_' + os.path.basename(diag_file)
    #shutil.copy2(diag_file, dest)

    return 0


def main():
    """Main"""

    import sys

    # Parse arguments

    if len(sys.argv) == 4:
        (exe, status, control_dir, jobid) = sys.argv
    else:
        ExitError("Wrong number of arguments\n"+__doc__,1)

    if status == "FINISHING":
        SetScaledTimes(control_dir, jobid)
        sys.exit(0)

    sys.exit(1)

if __name__ == "__main__":
    main()


  • File: /usr/local/bin/debugging_rte_plugin.py
  • Notes: Useful for capturing debug output.
  • Customise: It should be generic.
  • Content:
#!/usr/bin/python

# This copies the files containing useful output from completed jobs into a directory 

import shutil

"""Usage: debugging_rte_plugin.py <status> <control dir> <jobid>

Authplugin for FINISHED STATE

Example:

  authplugin="FINISHED timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/debugging_rte_plugin.py %S %C %I"

"""

def ExitError(msg,code):
    """Print error message and exit"""
    from sys import exit
    print(msg)
    exit(code)

def ArcDebuggingL(control_dir, jobid):

    from os.path import isfile
   
    try:
        m = open("/var/spool/arc/debugging/msgs", 'a') 
    except IOError ,  err:
        print err.errno 
        print err.strerror 


    local_file = '%s/job.%s.local' %(control_dir,jobid)
    grami_file = '%s/job.%s.grami' %(control_dir,jobid)

    if not isfile(local_file):
       ExitError("No such description file: %s"%local_file,1)

    if not isfile(grami_file):
       ExitError("No such description file: %s"%grami_file,1)

    lf = open(local_file)
    local = lf.read()
    lf.close()

    if 'Organic Units' in local or 'stephen jones' in local:
        shutil.copy2(grami_file, '/var/spool/arc/debugging')

        f = open(grami_file)
        grami = f.readlines()
        f.close()
    
        for line in grami:
            m.write(line)
            if 'joboption_directory' in line:
               comment = line[line.find("'")+1:line.find("'",line.find("'")+1)]+'.comment'
               shutil.copy2(comment, '/var/spool/arc/debugging')
            if 'joboption_stdout' in line:
               mystdout = line[line.find("'")+1:line.find("'",line.find("'")+1)]
               m.write("Try Copy mystdout - " + mystdout + "\n")
               if isfile(mystdout):
                 m.write("Copy mystdout - " + mystdout + "\n")
                 shutil.copy2(mystdout, '/var/spool/arc/debugging')
               else:
                 m.write("mystdout gone - " + mystdout + "\n")
            if 'joboption_stderr' in line:
               mystderr = line[line.find("'")+1:line.find("'",line.find("'")+1)]
               m.write("Try Copy mystderr - " + mystderr + "\n")
               if isfile(mystderr):
                 m.write("Copy mystderr - " + mystderr + "\n")
                 shutil.copy2(mystderr, '/var/spool/arc/debugging')
               else:
                 m.write("mystderr gone - " + mystderr + "\n")

    m.close()
    return 0

def main():
    """Main"""

    import sys

    # Parse arguments

    if len(sys.argv) == 4:
        (exe, status, control_dir, jobid) = sys.argv
    else:
        ExitError("Wrong number of arguments\n",1)

    if status == "FINISHED":
       ArcDebuggingL(control_dir, jobid)
       sys.exit(0)

    sys.exit(1)

if __name__ == "__main__":
    main()
  

  • File: /usr/local/bin/default_rte_plugin.py
  • Notes: Sets up the default run time environment. Patched (25 Jul 2016) to work with xRSL and EMI-ES job file inputs.
  • Customise: It should be generic.
  • Content:
#!/usr/bin/python
# Copyright 2014 Science and Technology Facilities Council
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Usage: default_rte_plugin.py <status> <control dir> <jobid> <runtime environment>

Authplugin for PREPARING STATE

Example:

  authplugin="PREPARING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/default_rte_plugin.py %S %C %I <rte>"

"""

def ExitError(msg,code):
    """Print error message and exit"""
    from sys import exit
    print(msg)
    exit(code)

def SetDefaultRTE(control_dir, jobid, default_rte):

    from os.path import isfile

    desc_file = '%s/job.%s.description' %(control_dir,jobid)

    if not isfile(desc_file):
       ExitError("No such description file: %s"%desc_file,1)

    f = open(desc_file)
    desc = f.read()
    f.close()

    if default_rte not in desc:
      if '<esadl:ActivityDescription' in desc:
        lines = desc.split('\n')
        with open(desc_file, "w") as myfile:
          for line in lines:
            myfile.write( line + '\n')
            if '<Resources>' in line:
              myfile.write( '   <RuntimeEnvironment>\n')
              myfile.write( '     <Name>' + default_rte + '</Name>\n')
              myfile.write( '   </RuntimeEnvironment>\n')
      else:
        if '<jsdl:JobDefinition' not in desc:
          with open(desc_file, "a") as myfile:
            myfile.write("( runtimeenvironment = \"" + default_rte + "\" )")

    return 0

def main():
    """Main"""

    import sys

    # Parse arguments

    if len(sys.argv) == 5:
        (exe, status, control_dir, jobid, default_rte) = sys.argv
    else:
        ExitError("Wrong number of arguments\n"+__doc__,1)

    if status == "PREPARING":
        SetDefaultRTE(control_dir, jobid, default_rte)
        sys.exit(0)

    sys.exit(1)

if __name__ == "__main__":
    main()

 

  • File: /etc/lcmaps/lcmaps.db
  • Notes: Connects the authentication layer to an ARGUS server
  • Customise: Yes. It must be changed to suit your site.
  • Content:
path = /usr/lib64/lcmaps

verify_proxy = "lcmaps_verify_proxy.mod"
                    "-certdir /etc/grid-security/certificates"
                    "--discard_private_key_absence"
                    "--allow-limited-proxy"

pepc = "lcmaps_c_pep.mod"
            "--pep-daemon-endpoint-url https://hepgrid9.ph.liv.ac.uk:8154/authz"
            "--resourceid http://authz-interop.org/xacml/resource/resource-type/arc"
            "--actionid http://glite.org/xacml/action/execute"
            "--capath /etc/grid-security/certificates/"
            "--certificate /etc/grid-security/hostcert.pem"
            "--key /etc/grid-security/hostkey.pem"

# Policies:
arc:
verify_proxy -> pepc

  • File: /etc/profile.d/env.sh
  • Notes: Sets up environment variables for specific VO jobs.
  • Customise: Yes. It must be changed to suit your site.
  • Content:
if [ "X${GLITE_ENV_SET+X}" = "X" ]; then
. /usr/libexec/grid-env-funcs.sh
if [ "x${GLITE_UI_ARCH:-$1}" = "x32BIT" ]; then arch_dir=lib; else arch_dir=lib64; fi
gridpath_prepend     "PATH" "/bin"
gridpath_prepend     "MANPATH" "/opt/glite/share/man"

gridenv_set "DPM_HOST" "hepgrid11.ph.liv.ac.uk"
gridenv_set "DPNS_HOST" "hepgrid11.ph.liv.ac.uk"
gridenv_set "GLEXEC_LOCATION" "/usr"
gridenv_set "RFIO_PORT_RANGE" "20000,25000"
gridenv_set "SITE_GIIS_URL" "hepgrid4.ph.liv.ac.uk"
gridenv_set "SITE_NAME" "UKI-NORTHGRID-LIV-HEP"
gridenv_set "MYPROXY_SERVER" "lcgrbp01.gridpp.rl.ac.uk"

gridenv_set         "VO_ZEUS_SW_DIR" "/opt/exp_soft_sl5/zeus"
gridenv_set         "VO_ZEUS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_VO_NORTHGRID_AC_UK_SW_DIR" "/opt/exp_soft_sl5/northgrid"
gridenv_set         "VO_VO_NORTHGRID_AC_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_T2K_ORG_SW_DIR" "/cvmfs/t2k.gridpp.ac.uk"
gridenv_set         "VO_T2K_ORG_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_SNOPLUS_SNOLAB_CA_SW_DIR" "/cvmfs/snoplus.gridpp.ac.uk"
gridenv_set         "VO_SNOPLUS_SNOLAB_CA_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_PLANCK_SW_DIR" "/opt/exp_soft_sl5/planck"
gridenv_set         "VO_PLANCK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_PHENO_SW_DIR" "/opt/exp_soft_sl5/pheno"
gridenv_set         "VO_PHENO_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_OPS_SW_DIR" "/opt/exp_soft_sl5/ops"
gridenv_set         "VO_OPS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_NEISS_ORG_UK_SW_DIR" "/opt/exp_soft_sl5/neiss"
gridenv_set         "VO_NEISS_ORG_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_NA62_VO_GRIDPP_AC_UK_SW_DIR" "/cvmfs/na62.cern.ch"
gridenv_set         "VO_NA62_VO_GRIDPP_AC_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_MICE_SW_DIR" "/cvmfs/mice.gridpp.ac.uk"
gridenv_set         "VO_MICE_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_MAGIC_SW_DIR" "/opt/exp_soft_sl5/magic"
gridenv_set         "VO_MAGIC_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_LHCB_SW_DIR" "/cvmfs/lhcb.cern.ch"
gridenv_set         "VO_LHCB_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_LZ_SW_DIR" "/opt/exp_soft_sl5/lz"
gridenv_set         "VO_LZ_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_LSST_SW_DIR" "/opt/exp_soft_sl5/lsst"
gridenv_set         "VO_LSST_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_ILC_SW_DIR" "/cvmfs/ilc.desy.de"
gridenv_set         "VO_ILC_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_GRIDPP_SW_DIR" "/opt/exp_soft_sl5/gridpp"
gridenv_set         "VO_GRIDPP_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_GEANT4_SW_DIR" "/opt/exp_soft_sl5/geant4"
gridenv_set         "VO_GEANT4_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_FUSION_SW_DIR" "/opt/exp_soft_sl5/fusion"
gridenv_set         "VO_FUSION_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_ESR_SW_DIR" "/opt/exp_soft_sl5/esr"
gridenv_set         "VO_ESR_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_EPIC_VO_GRIDPP_AC_UK_SW_DIR" "/opt/exp_soft_sl5/epic"
gridenv_set         "VO_EPIC_VO_GRIDPP_AC_UK_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_DZERO_SW_DIR" "/opt/exp_soft_sl5/dzero"
gridenv_set         "VO_DZERO_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_DTEAM_SW_DIR" "/opt/exp_soft_sl5/dteam"
gridenv_set         "VO_DTEAM_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_CMS_SW_DIR" "/opt/exp_soft_sl5/cms"
gridenv_set         "VO_CMS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_CERNATSCHOOL_ORG_SW_DIR" "/cvmfs/cernatschool.gridpp.ac.uk"
gridenv_set         "VO_CERNATSCHOOL_ORG_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_CDF_SW_DIR" "/opt/exp_soft_sl5/cdf"
gridenv_set         "VO_CDF_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_CAMONT_SW_DIR" "/opt/exp_soft_sl5/camont"
gridenv_set         "VO_CAMONT_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_CALICE_SW_DIR" "/opt/exp_soft_sl5/calice"
gridenv_set         "VO_CALICE_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_BIOMED_SW_DIR" "/opt/exp_soft_sl5/biomed"
gridenv_set         "VO_BIOMED_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_ATLAS_SW_DIR" "/cvmfs/atlas.cern.ch/repo/sw"
gridenv_set         "VO_ATLAS_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "VO_ALICE_SW_DIR" "/opt/exp_soft_sl5/alice"
gridenv_set         "VO_ALICE_DEFAULT_SE" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "SITE_NAME" "UKI-NORTHGRID-LIV-HEP"
gridenv_set         "SITE_GIIS_URL" "hepgrid4.ph.liv.ac.uk"
gridenv_set         "RFIO_PORT_RANGE" "20000,25000"
gridenv_set         "MYPROXY_SERVER" "lcgrbp01.gridpp.rl.ac.uk"
gridenv_set         "LCG_LOCATION" "/usr"
gridenv_set         "LCG_GFAL_INFOSYS" "lcg-bdii.gridpp.ac.uk:2170,topbdii.grid.hep.ph.ic.ac.uk:2170"
gridenv_set         "GT_PROXY_MODE" "old"
gridenv_set         "GRID_ENV_LOCATION" "/usr/libexec"
gridenv_set         "GRIDMAPDIR" "/etc/grid-security/gridmapdir"
gridenv_set         "GLITE_LOCATION_VAR" "/var"
gridenv_set         "GLITE_LOCATION" "/usr"
gridenv_set         "GLITE_ENV_SET" "TRUE"
gridenv_set         "GLEXEC_LOCATION" "/usr"
gridenv_set         "DPNS_HOST" "hepgrid11.ph.liv.ac.uk"
gridenv_set         "DPM_HOST" "hepgrid11.ph.liv.ac.uk"
. /usr/libexec/clean-grid-env-funcs.sh
fi


  • File: /etc/grid-security/grid-mapfile
  • Notes: Useful for directly mapping a user for testing. Superseded by ARGUS now, so optional.
  • Customise: Yes. It must be changed to suit your site.
  • Content:
"/C=UK/O=eScience/OU=Liverpool/L=CSD/CN=stephen jones" dteam184
  • File: /root/glitecfg/site-info.def
  • Notes: Just a copy of the site standard SID file. Used to make the accounts with YAIM.
  • Content: as per site standard
  • File: /root/glitecfg/vo.d
  • Notes: Just a copy of the site standard vo.d dir. Used to make the VOMS config with YAIM.
  • Content: as per site standard
  • File: /opt/glite/yaim/etc/users.conf
  • Notes: Just a copy of the site standard users.conf file. Used to make the accounts with YAIM.
  • Content: as per site standard
  • File: /opt/glite/yaim/etc/groups.conf
  • Notes: Just a copy of the site standard groups.conf file. Used to make the accounts with YAIM.
  • Content: as per site standard
  • File: /etc/arc/runtime/ENV/PROXY
  • Notes: Stops error messages of one kind or another
  • Content: empty
  • File: /etc/init.d/nordugrid-arc-egiis
  • Notes: Stops error messages of one kind or another
  • Content: empty

Head Cron jobs

I had to add these cron jobs.

  • Cron: jura
  • Purpose: Run the jura APEL reporter now and again
  • Content:
16 6 * * * /usr/libexec/arc/jura /var/spool/arc/jobstatus &>> /var/log/arc/jura.log
  • Cron: fetch-crl
  • Purpose: Run fetch-crl
  • Content:
# Cron job running by default every 6 hours, at 45 minutes +/- 3 minutes
# The lock file can be enabled or disabled via a
# service fetch-crl-cron start
# chkconfig fetch-crl-cron on

# Note the lock file not existing is success (and over-all success is needed
# in order to prevent error messages from cron. "-q" makes it really
# quiet, but beware that the "-q" overrides any verbosity settings

42 */6 * * *	root	[ ! -f /var/lock/subsys/fetch-crl-cron ] || ( [ -f /etc/sysconfig/fetch-crl ] && . /etc/sysconfig/fetch-crl ; /usr/sbin/fetch-crl -q -r 360 $FETCHCRL_OPTIONS $FETCHCRL_CRON_OPTIONS )

Patch to give a fixed number of logical and physical CPUs

The GLUE2 schema shows that the TotalLogicalCPUs element is intended to represent the total installed capacity (otherwise known as the nameplate or nominal capacity), i.e. including resources which are temporarily unavailable. But the out-of-the-box behaviour yielded strange, varying values for the totals of physical and logical CPUs in the BDII output. That output is produced in this Perl module.

/usr/share/arc/ARC1ClusterInfo.pm

To fix the values to static, nominal figures representative of the nameplate capacity at our site, I added these lines to that file (around line 586), which short-circuit the existing logic completely.

 $totalpcpus = 260;
 $totallcpus = 1994;


Patch for BDII Job Count Breakdown

I put in a set of patches (provided by Andrew Lahiff) to make corrections to the BDII output such that it gave individual breakdowns of job counts in glue1. This consisted of various parts. First, I added some cron jobs to create job count statistics every 10 minutes.

*/10 * * * * root /usr/bin/condor_q -constraint 'JobStatus==2' -autoformat x509UserProxyVOName | sort | uniq -c > /var/local/condor_jobs_running
*/10 * * * * root /usr/bin/condor_q -constraint 'JobStatus==1' -autoformat x509UserProxyVOName | sort | uniq -c > /var/local/condor_jobs_idle

These create files in the following format, showing the job count of each VO.

# cat /var/local/condor_jobs_running
   805 atlas
    10 ilc
   251 lhcb

I made additional changes to /usr/share/arc/glue-generator.pl to parse these files and convert them to BDII output. First, I added two subroutines near the top of the file:

sub getCondorJobsRunning
{
   my ($vo) = @_;
   my $file = "/var/local/condor_jobs_running";
   if (-e $file)
   {
      open(FILE, "<$file");
      foreach my $line (<FILE>)
      {
         if ($line =~ /$vo/)
         {
            my @pieces = split(" ", $line);
            return $pieces[0];
         }
      }
      close(FILE);
   }
   return 0;
}


sub getCondorJobsIdle
{
   my ($vo) = @_;
   my $file = "/var/local/condor_jobs_idle";
   if (-e $file)
   {
      open(FILE, "<$file");
      foreach my $line (<FILE>)
      {
         if ($line =~ /$vo/)
         {
            my @pieces = split(" ", $line);
            return $pieces[0];
         }
      }
      close(FILE);
   }
   return 0;
}

And I used the following section of code lower in the file to build the new readings into the output. To insert this patch, delete all lines from the second "foreach (@vos){" down to the corresponding close bracket, then add this code:

            foreach (@vos){
                chomp;
                $vo = $_;
                $vo =~ s/VO:// ;
                my $vob;
                if ($vo =~ /(\w+)/ || $vo =~ /(\w+)\./) { $vob = $1; }

                my @pieces = split(/\s+/, $cluster_attributes{'nordugrid-cluster-localse'});
                my $useLocalSE = "";
                foreach my $piece (@pieces)
                {
                   if ($piece =~ /$vob/) { $useLocalSE = $piece; }
                }
                if ($vo =~ /superb/) { $useLocalSE = "srm-superb.gridpp.rl.ac.uk"; }
                if ($useLocalSE eq "") { $useLocalSE = "srm-dteam.gridpp.rl.ac.uk"; }

                my $myVoRunning = getCondorJobsRunning($vo);
                my $myVoIdle = getCondorJobsIdle($vo);
                my $myVoTotal = $myVoRunning + $myVoIdle;

                print "
dn: GlueVOViewLocalID=$vo,GlueCEUniqueID=$ce_unique_id,Mds-Vo-name=resource,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2
GlueCEInfoDefaultSE: $cluster_attributes{'nordugrid-cluster-localse'}
GlueCEStateTotalJobs: $myVoTotal
GlueCEInfoDataDir: unset
GlueCEAccessControlBaseRule: VO:$vo
GlueCEStateRunningJobs: $myVoRunning
GlueChunkKey: GlueCEUniqueID=$ce_unique_id
GlueVOViewLocalID: $vo
GlueCEInfoApplicationDir: unset
GlueCEStateWaitingJobs: $myVoIdle
GlueCEStateEstimatedResponseTime: $estRespTime
GlueCEStateWorstResponseTime: $worstRespTime
GlueCEStateFreeJobSlots: $freeSlots
GlueCEStateFreeCPUs: $freeSlots
";
            }

Alternative Patch for BDII Job Count Breakdown -- GRIF/IRFU modification

This is another way to do the same thing. I adapted the above patch so that it is simpler to implement, even though it calls condor_q more often; that should not have any noticeable impact on BDII performance.

Just apply this diff and you're good to go:

--- /usr/share/arc/glue-generator.pl.orig	2017-05-15 12:23:47.703420951 +0200
+++ /usr/share/arc/glue-generator.pl	2017-05-15 12:45:27.536352858 +0200
@@ -515,6 +515,8 @@
                 chomp;
         	$vo = $_;
         	$vo =~ s/VO:// ;
+          my $vo_running=`/usr/bin/condor_q -constraint 'JobStatus==2 && x509UserProxyVOName=="$vo"' -autoformat x509UserProxyVOName |/usr/bin/wc -l` ;
+          my $vo_waiting=`/usr/bin/condor_q -constraint 'JobStatus==1 && x509UserProxyVOName=="$vo"' -autoformat x509UserProxyVOName |/usr/bin/wc -l` ;
 
                 print "
 dn: GlueVOViewLocalID=$vo,GlueCEUniqueID=$ce_unique_id,Mds-Vo-name=resource,o=grid
@@ -532,11 +534,11 @@
 GlueCEStateTotalJobs: $totalJobs
 GlueCEInfoDataDir: unset
 GlueCEAccessControlBaseRule: VO:$vo
-GlueCEStateRunningJobs: $queue_attributes{'nordugrid-queue-running'}
+GlueCEStateRunningJobs: $vo_running
 GlueChunkKey: GlueCEUniqueID=$ce_unique_id
 GlueVOViewLocalID: $vo
 GlueCEInfoApplicationDir: unset
-GlueCEStateWaitingJobs: $waitingJobs
+GlueCEStateWaitingJobs: $vo_waiting
 GlueCEStateEstimatedResponseTime: $estRespTime
 GlueCEStateWorstResponseTime: $worstRespTime
 GlueCEStateFreeJobSlots: $freeSlots


Note: the .pl file contains tabs, and so should this patch file (the two lines just before the "my $vo_" variable declarations); otherwise the patch program will fail to apply the patch.

Patch for Extra BDII Fields

To set the GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime bdii publishing values, you need to change the lines involving GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime in /usr/share/arc/glue-generator.pl. For example:

GlueCEPolicyMaxCPUTime: 4320
GlueCEPolicyMaxWallClockTime: 4320
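
If those values appear in the generator as literal lines of the form shown above, the edit could be scripted roughly as follows. This is only a sketch that assumes that line format; check the file by hand first.

 sed -i -e 's/^GlueCEPolicyMaxCPUTime: .*/GlueCEPolicyMaxCPUTime: 4320/' \
        -e 's/^GlueCEPolicyMaxWallClockTime: .*/GlueCEPolicyMaxWallClockTime: 4320/' \
        /usr/share/arc/glue-generator.pl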

Patch for Correct Cores Parsing

Sites can (and do) use floating point numbers in the CPU counts. A detailed explanation of this is given here: Publishing_tutorial#Logical_and_physical_CPUs. In summary, the calculation of installed capacity involves multiplying the average cores per logical CPU by the total number of logical CPUs, and multiplying that by the average HEPSPEC06 of a logical CPU. Obviously, the average cores per logical CPU can be a floating point number.

But the ARC system, as it stands, only reads Cores as an integer, so a change to the regexp is needed if the site uses a floating point number.

The problem lies in two spots in /usr/share/arc/glue-generator.pl. A regex is supposed to pull out the Cores=XXX.XXX value, but only matches integers. Since we set Cores to an average value (Cores=5.93,Benchmark...) it rounds down to 5, setting glueSubClusterPhysicalCPUs to 724/5 = 144. The true value should be 724/5.93 = 122.

I put in the patch below to "fix" it.

# svn diff   ./modules/emi-server/files/condor/glue-generator.pl
Index: modules/emi-server/files/condor/glue-generator.pl
===================================================================
--- modules/emi-server/files/condor/glue-generator.pl    (revision 2817)
+++ modules/emi-server/files/condor/glue-generator.pl    (working copy)
@@ -217,7 +217,7 @@
$glueHostArchitecturePlatformType=$cluster_attributes{'nordugrid-cluster-architecture'};
$glueSubClusterUniqueID=$cluster_attributes{'nordugrid-cluster-name'};
     $glueSubClusterName=$glue_site_unique_id;
-    if ( $processorOtherDesc =~ m/Cores=(\d+)/ ){
+    if ( $processorOtherDesc =~ m/Cores=([0-9]*\.?[0-9]+)/ ){
         $smpSize=$1;
$glueSubClusterPhysicalCPUs=int($cluster_attributes{'nordugrid-cluster-totalcpus'}/$smpSize);
     }
@@ -227,6 +227,7 @@
     }
$glueSubClusterLogicalCPUs=$cluster_attributes{'nordugrid-cluster-totalcpus'};
$glueClusterUniqueID=$cluster_attributes{'nordugrid-cluster-name'};
+        $smpSize = int($smpSize);

     WriteSubCluster();
     }
@@ -438,7 +439,7 @@
$glueHostArchitecturePlatformType=$queue_attributes{'nordugrid-queue-architecture'}; 
##XX
$glueSubClusterUniqueID=$queue_attributes{'nordugrid-queue-name'}; ##XX
$glueSubClusterName=$queue_attributes{'nordugrid-queue-name'};  ##XX
-        if ( $processorOtherDesc =~ m/Cores=(\d+)/ ){
+        if ( $processorOtherDesc =~ m/Cores=([0-9]*\.?[0-9]+)/ ){
             $smpSize=$1;
$glueSubClusterPhysicalCPUs=int($queue_attributes{'nordugrid-queue-totalcpus'}/$smpSize);
         }
@@ -448,6 +449,7 @@
         }
$glueSubClusterLogicalCPUs=$queue_attributes{'nordugrid-queue-totalcpus'}; 
##XX
$glueClusterUniqueID=$cluster_attributes{'nordugrid-cluster-name'}; ##XX
+                $smpSize = int($smpSize);

         WriteSubCluster();
         }

Patch to turn on SSL in APEL

After installing the APEL package, I had to make this change by hand. On line 136 of the /usr/libexec/arc/ssmsend file, I had to add a parameter: use_ssl = _use_ssl.

Install the vomsdir LSC Files

I used VomsSnooper to do this as follows.

# cd /opt/GridDevel/vomssnooper/usecases/getLSCRecords  
# sed -i -e "s/ vomsdir/ \/etc\/grid-security\/vomsdir/g" getLSCRecords.sh
# ./getLSCRecords.sh

Yaim to make head user accounts, /etc/vomses file and glexec.conf etc.

I used Yaim to do this as follows.

# yaim  -r -s /root/glitecfg/site-info.def -n ABC -f config_users
# yaim  -r -s /root/glitecfg/site-info.def -n ABC -f config_vomses
 # yaim -c -s /root/glitecfg/site-info.def -n GLEXEC_wn 

For this to work, a priori, the site-info.def file must be present. A users.conf file and a groups.conf file must exist in the /opt/glite/yaim/etc/ directory. These are usually part of any grid CE install, but advice on how to prepare them is given in this Yaim guide (which I hope will be maintained for a little while longer).

https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400

(As far as I know, there is no reason for the headnode to use glexec.)

Head Services

I had to set some services running.

A-rex – the ARC CE service
condor – the CONDOR batch system service
nordugrid-arc-ldap-infosys – part of the bdii
nordugrid-arc-slapd – part of the bdii
nordugrid-arc-bdii – part of the bdii
gridftpd – the gridftp service
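
On an SL6-style init system that amounts to something like the following sketch; the exact init-script names (in particular a-rex for A-rex) may differ between ARC releases, so check what your packages installed.

 for s in a-rex gridftpd nordugrid-arc-ldap-infosys nordugrid-arc-slapd nordugrid-arc-bdii condor; do
   chkconfig $s on
   service $s start
 done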


File cleanup

ARC keeps a prodigious number of tiny stale output files that need to be cleaned up. Eventually, so many are kept that the head node can run out of inodes or file space. I keep the system clean with a cronjob that runs a script like this one.


#!/bin/bash

MAXAGE=21
echo `date` cleanJobstatusDirs.sh starts with maxage of $MAXAGE days
fname=/opt/jobstatus_archive/jobstatus_"$(date +%Y%m%d%H%M%S)".tar
sleep 1

if [ ! -d /opt/jobstatus_archive ]; then
  mkdir /opt/jobstatus_archive
  if [ $? != 0 ]; then
    echo Some kind of problem so I cannot make the jobstatus_archive dir
    exit 1
  fi
fi

cd /var/spool/arc/jobstatus
if [ $? != 0 ]; then
  echo Some problem getting to the jobstatus dir so I am bailing out
  exit 1
fi

# Back up all the jobstatus files older than MAXAGE
tmpListOfOldFiles=$(mktemp /tmp/jobstatus_archive_files.XXXXXX)
find /var/spool/arc/jobstatus  -mtime +$MAXAGE -type f  > $tmpListOfOldFiles
tar -cf $fname -T $tmpListOfOldFiles
gzip $fname

# Delete all the jobstatus files older than MAXAGE
for f in `cat $tmpListOfOldFiles`; do
  echo Deleting old file $f
  rm -f $f
done

tmpListOfOldDirs=$(mktemp /tmp/jobstatus_archive_dirs.XXXXXX)
for f in `cat $tmpListOfOldFiles`; do echo `dirname $f`; done | sort -n | uniq > $tmpListOfOldDirs 

for d in `cat $tmpListOfOldDirs`; do
  ls -1 $d | wc -l | grep -q "^0$"
  if [ $? == 0 ]; then
    echo Deleting empty dir $d
    rmdir $d
  fi
done

# Clean the delegations of empty dirs more than 90 days old
find /var/spool/arc/jobstatus/delegations/ -depth -type d -empty -mtime +90 -delete

# Clean the urs
find /var/urs -depth -type f -mtime +90 -delete

rm $tmpListOfOldFiles
rm $tmpListOfOldDirs

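The script is driven from cron; an entry along these lines would do (the path, log file and schedule are examples rather than site values):

 # /etc/cron.d entry, illustrative only
 30 3 * * 0 root /root/scripts/cleanJobstatusDirs.sh >> /var/log/cleanJobstatusDirs.log 2>&1
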

And that was it. That's all I did to get the server working, as far as I can recall.

Worker Node

Worker Standard build

As for the headnode, the basis for the initial worker node build follows the standard model for any workernode at Liverpool, prior to the installation of any middleware. Such a baseline build might include networking, cvmfs, iptables, nagios scripts, emi-wn package, ganglia, ssh etc.

Aside: After an installation mistake, it was discovered that an ordinary TORQUE workernode could be used as the basis of the build, and it would then be possible to use the same worker node on both ARC/CONDOR and CREAM/TORQUE systems, but not simultaneously. This idea was not pursued, however.

Worker Extra Directories

I needed to make these directories:

/root/glitecfg
/etc/condor/config.d
/etc/grid-security/gridmapdir
/etc/arc/runtime/ENV
/etc/condor/ral
/data/condor_pool

And these:

/opt/exp_soft_sl5               # Note: this is our traditional software mount point
/usr/libexec/condor/scripts     # Only used by our automatic test routines

On our system, exp_soft_sl5 is actually a mount point to a central location. CVMFS takes over this role now, but it might be necessary to set up a shared mount system such as this and point the VO software directories to it, as shown in the head node file /etc/profile.d/env.sh (see above.)
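
Creating the directories themselves is just a matter of the following (a sketch; on our system /opt/exp_soft_sl5 is in fact provided by the fabric as a mount point rather than made by hand):

 mkdir -p /root/glitecfg /etc/condor/config.d /etc/grid-security/gridmapdir \
          /etc/arc/runtime/ENV /etc/condor/ral /data/condor_pool \
          /opt/exp_soft_sl5 /usr/libexec/condor/scripts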

Worker Additional Packages

We had to install the main CONDOR package:

condor

And Andrew McNab's Machine Job Features package, which provides run time information that jobs can read.

mjf-htcondor-00.13-1.noarch

We also had to install some various bits of extra middleware:

emi-wn     # for glite-brokerinfo (at least)
lcg-util
libcgroup
fetch-crl
voms-clients3
voms
lcg-util-libs
lcg-util-python
lfc-devel
lfc
lfc-perl
lfc-python
uberftp
gfal2-plugin-lfc
HEP_OSlibs_SL6

These libraries were also needed:

libXft-devel
libxml2-devel
libXpm-devel

We also installed some things, mostly for various VOs, I think:

bzip2-devel
compat-gcc-34-c++
compat-gcc-34-g77
gcc-c++
gcc-gfortran
git
gmp-devel
imake
ipmitool
libgfortran
liblockfile-devel
ncurses-devel
python

Worker Files

  • File: /root/scripts/set_node_parameters.pl
  • Notes: This script senses the type of the system and sets it up according to how many slots it has, etc. You'll also have to make arrangements to run this script once when you set up the machine. On the Liverpool system, this is done with the following puppet stanza. If you are using Puppet with Hiera, you can probably parameterise these settings.
exec { "set_node_parameters.pl": command =>  "/root/scripts/set_node_parameters.pl > /etc/condor/config.d/00-node_parameters; \
/bin/touch /root/scripts/done-set_node_parameters.pl", require => [ File["/root/scripts/set_node_parameters.pl"], 
File["/etc/condor/config.d"] ], onlyif => "/usr/bin/test ! -f /root/scripts/done-set_node_parameters.pl", timeout => "86400" }
  • Customise: Yes. You'll need to edit it to suit your site.
  • Content:
#!/usr/bin/perl

use strict;
my $foundType = 0;
my @outputLines;

#processor	: 3
#physical id	: 0

my $processors = 0;
my %physicalIds;

open(CPUINFO,"/proc/cpuinfo") or die("Can't open /proc/cpuinfo, $?");
while(<CPUINFO>) {
 
  if (/processor/) {
    $processors++;
  }
  if (/physical id\s*:\s*(\d+)/) {
    $physicalIds{$1} = 1;
  }
  if (/model name/) {
    if (! $foundType) {
      s/.*CPU\s*//;s/\s.*//;
      if (/E5620/){ 
        $foundType=1;
        push (@outputLines, "RalNodeLabel = E5620\n");
        push (@outputLines, "RalScaling =  1.205\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1               = cpus=10,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      elsif (/L5420/){ 
        $foundType=1;
        push (@outputLines, "RalNodeLabel = L5420\n");
        push (@outputLines, "RalScaling =  0.896\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1               = cpus=8,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      elsif (/X5650/){ 
        $foundType=1;
        push (@outputLines, "RalNodeLabel = X5650\n");
        push (@outputLines, "RalScaling =  1.229\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1               = cpus=16,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      elsif (/E5-2630/){ 
        $foundType=1;
        push (@outputLines, "RalNodeLabel = E5-2630\n");
        push (@outputLines, "RalScaling =  1.386\n");
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1               = cpus=18,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
      else {
        $foundType=1;
        push (@outputLines, "RalNodeLabel = BASELINE\n");
        push (@outputLines, "RalScaling =  1.0\n"); 
        push (@outputLines, "NUM_SLOTS = 1\n");
        push (@outputLines, "SLOT_TYPE_1               = cpus=8,mem=auto,disk=auto\n");
        push (@outputLines, "NUM_SLOTS_TYPE_1          = 1\n");
        push (@outputLines, "SLOT_TYPE_1_PARTITIONABLE = TRUE\n");
      }
    }
  }
}
close(CPUINFO); 
foreach my $line(@outputLines) {
  print $line;
}
my @keys = keys(%physicalIds);
my $numberOfCpus = $#keys+1;
print ("# processors : $processors\n");
print ("# numberOfCpus : $numberOfCpus\n");

exit(0);
 
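For reference, on an E5620 node the script emits something like the following; the two count lines at the end are illustrative, since they depend on the actual /proc/cpuinfo of the machine.

 RalNodeLabel = E5620
 RalScaling =  1.205
 NUM_SLOTS = 1
 SLOT_TYPE_1               = cpus=10,mem=auto,disk=auto
 NUM_SLOTS_TYPE_1          = 1
 SLOT_TYPE_1_PARTITIONABLE = TRUE
 # processors : 16
 # numberOfCpus : 2
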

  • File: /etc/condor/condor_config.local
  • Notes: The main client condor configuration custom file.
  • Customise: Yes. You'll need to edit it to suit your site.
  • Content:
##  What machine is your central manager?

CONDOR_HOST = hepgrid2.ph.liv.ac.uk

## Pool's short description

COLLECTOR_NAME = Condor at $(FULL_HOSTNAME)

## Put the output in a huge dir

EXECUTE = /data/condor_pool/

##  Make it switchable when this machine is willing to start a job 

ENABLE_PERSISTENT_CONFIG = TRUE
PERSISTENT_CONFIG_DIR = /etc/condor/ral
STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline, OnlyMulticore
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs , OnlyMulticore
StartJobs = False
RalNodeOnline = False
OnlyMulticore = False

#START = ((StartJobs =?= True) && (RalNodeOnline =?= True) && (ifThenElse(OnlyMulticore =?= True,ifThenElse(RequestCpus =?= 8, True, False) ,True ) ))
START = ((StartJobs == True) && (RalNodeOnline == True) && (ifThenElse(OnlyMulticore == True,ifThenElse(RequestCpus == 8, True, False) ,True ) ))

##  When to suspend a job?

SUSPEND = FALSE

##  When to nicely stop a job?
# When a job is running and the PREEMPT expression evaluates to True, the 
# condor_startd will evict the job. The PREEMPT expression should reflect the 
# requirements under which the machine owner will not permit a job to continue to run. 
# For example, a policy to evict a currently running job when a key is hit or when 
# it is the 9:00am work arrival time, would be expressed in the PREEMPT expression 
# and enforced by the condor_startd. 

PREEMPT = FALSE

# If there is a job from a higher priority user sitting idle, the 
# condor_negotiator daemon may evict a currently running job submitted 
# from a lower priority user if PREEMPTION_REQUIREMENTS is True.

PREEMPTION_REQUIREMENTS = FALSE

# No job has pref over any other

#RANK = FALSE

##  When to instantaneously kill a preempting job
##  (e.g. if a job is in the pre-empting stage for too long)

KILL = FALSE

##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
##  The list is a comma or space separated list of subsystem names

DAEMON_LIST = MASTER, STARTD

ALLOW_WRITE = *

#######################################
# scaling 
#

STARTD_ATTRS = $(STARTD_ATTRS) RalScaling RalNodeLabel

#######################################
# Andrew Lahiff's tip for over committing memory

#MEMORY = 1.35 * quantize( $(DETECTED_MEMORY), 1000 )
MEMORY = 2.2 * quantize( $(DETECTED_MEMORY), 1000 )

#######################################
# Andrew Lahiff's security

ALLOW_WRITE = 

UID_DOMAIN = ph.liv.ac.uk

CENTRAL_MANAGER1 = hepgrid2.ph.liv.ac.uk
COLLECTOR_HOST = $(CENTRAL_MANAGER1)

# Central managers
CMS = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk

# CEs
CES = condor_pool@$(UID_DOMAIN)/hepgrid2.ph.liv.ac.uk

# Worker nodes
WNS = condor_pool@$(UID_DOMAIN)/192.168.*

# Users
USERS = *@$(UID_DOMAIN)
USERS = *

# Required for HA
HOSTALLOW_NEGOTIATOR = $(COLLECTOR_HOST)
HOSTALLOW_ADMINISTRATOR = $(COLLECTOR_HOST)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST)

# Authorization
HOSTALLOW_WRITE =
ALLOW_READ = */*.ph.liv.ac.uk
NEGOTIATOR.ALLOW_WRITE = $(CES), $(CMS)
COLLECTOR.ALLOW_ADVERTISE_MASTER = $(CES), $(CMS), $(WNS)
COLLECTOR.ALLOW_ADVERTISE_SCHEDD = $(CES)
COLLECTOR.ALLOW_ADVERTISE_STARTD = $(WNS)
SCHEDD.ALLOW_WRITE = $(USERS)
SHADOW.ALLOW_WRITE = $(WNS), $(CES)
ALLOW_DAEMON = condor_pool@$(UID_DOMAIN)/*.ph.liv.ac.uk, $(FULL_HOSTNAME)
ALLOW_ADMINISTRATOR = root@$(UID_DOMAIN)/$(IP_ADDRESS), condor_pool@$(UID_DOMAIN)/$(IP_ADDRESS), $(CMS)
ALLOW_CONFIG = root@$(FULL_HOSTNAME)

# Temp debug
#ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST)


# Don't allow nobody to run jobs
SCHEDD.DENY_WRITE = nobody@$(UID_DOMAIN)

# Authentication
SEC_PASSWORD_FILE = /etc/condor/pool_password
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_READ_AUTHENTICATION = OPTIONAL
SEC_CLIENT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD,FS
SCHEDD.SEC_WRITE_AUTHENTICATION_METHODS = FS,PASSWORD
SCHEDD.SEC_DAEMON_AUTHENTICATION_METHODS = FS,PASSWORD
SEC_CLIENT_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE
SEC_READ_AUTHENTICATION_METHODS = FS,PASSWORD,CLAIMTOBE

# Integrity
SEC_DEFAULT_INTEGRITY  = REQUIRED
SEC_DAEMON_INTEGRITY = REQUIRED
SEC_NEGOTIATOR_INTEGRITY = REQUIRED

# Separation
USE_PID_NAMESPACES = False

# Smooth updates
MASTER_NEW_BINARY_RESTART = PEACEFUL

# Give jobs 3 days
MAXJOBRETIREMENTTIME = 3600 * 24 * 3

# Port limits
HIGHPORT = 65000
LOWPORT = 20000

# Startd Crons
STARTD_CRON_JOBLIST=TESTNODE
STARTD_CRON_TESTNODE_EXECUTABLE=/usr/libexec/condor/scripts/testnodeWrapper.sh
STARTD_CRON_TESTNODE_PERIOD=300s

# Make sure values get over
STARTD_CRON_AUTOPUBLISH = If_Changed

# One job per claim
CLAIM_WORKLIFE = 0

# Enable CGROUP control
BASE_CGROUP = htcondor
# hard: job can't access more physical memory than allocated
# soft: job can access more physical memory than allocated when there is free memory
CGROUP_MEMORY_LIMIT_POLICY = soft

# Use Machine-Job-Features
USER_JOB_WRAPPER=/usr/sbin/mjf-job-wrapper
 
  • File: /etc/profile.d/liv-lcg-env.sh
  • Notes: Some environment script needed by the system.
  • Customise: Yes. You'll need to edit it to suit your site.
  • Content:
export ATLAS_RECOVERDIR=/data/atlas
EDG_WL_SCRATCH=$TMPDIR

ID=`id -u`

if [ $ID -gt 19999 ]; then
  ulimit -v 10000000
fi


  • File: /etc/profile.d/liv-lcg-env.csh
  • Notes: Some other environment script needed by the system.
  • Customise: Yes. You'll need to edit it to suit your site.
  • Content:
setenv ATLAS_RECOVERDIR /data/atlas
if ( "$?TMPDIR" == "1" ) then
setenv EDG_WL_SCRATCH $TMPDIR
else
setenv EDG_WL_SCRATCH ""
endif



  • File: /etc/condor/pool_password
  • Notes: Will have its own section (TBD)
  • Customise: Yes.
  • Content: The content is the same as the one on the head node (see above).
  • File: /root/glitecfg/site-info.def
  • Notes: Just a copy of the site standard SID file. Used to make the accounts.
  • Content: as per site standard
  • File: /root/glitecfg/vo.d
  • Notes: Just a copy of the site standard vo.d dir. Used to make the accounts.
  • Content: as per site standard
  • File: /opt/glite/yaim/etc/users.conf
  • Notes: Just a copy of the site standard users.conf file. Used to make the accounts.
  • Content: as per site standard
  • File: /opt/glite/yaim/etc/groups.conf
  • Notes: Just a copy of the site standard groups.conf file. Used to make the accounts.
  • Content: as per site standard
  • File: /etc/lcas/lcas-glexec.db
  • Notes: Stops yaim from complaining about missing file
  • Content: empty
  • File: /etc/arc/runtime/ENV/GLITE
  • Notes: Same as the head node version; see above. The GLITE runtime environment.
  • Content: See above
  • File: /etc/arc/runtime/ENV/PROXY
  • Notes: Same as the head node version; see above. Stops error messages of one kind or another
  • Content: empty
  • File: /usr/etc/globus-user-env.sh
  • Notes: Jobs just need it to be there.
  • Content: empty

Worker Cron jobs

We run a cronjob to keep cvmfs clean:

0 5 */3 * * /root/bin/cvmfs_fsck.sh >> /var/log/cvmfs_fsck.log 2>&1

Worker Special notes

None to speak of (yet).

Worker user accounts

As with the head node, I used Yaim to do this as follows.

# yaim  -r -s /root/glitecfg/site-info.def -n ABC -f config_users

For this to work, a priori, a users.conf file and a groups.conf file must exist in the /opt/glite/yaim/etc/ directory. These are usually part of any grid CE install, but advice on how to prepare them is given in this Yaim guide (which I hope will be maintained for a little while longer).

https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400

Worker Services

You have to set this service running:

condor

Workernode On/Off Control (and Health Checking)

For health checking, we use a script that checks the worker node and "turns it off" if it fails. To implement this, we use a CONDOR feature: startd_cron jobs.

This config in the /etc/condor/condor_config.local file on a worker node defines some new configuration variables.

ENABLE_PERSISTENT_CONFIG = TRUE
PERSISTENT_CONFIG_DIR = /etc/condor/ral
STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs
StartJobs = False
RalNodeOnline = False

The prefix "Ral" is used here because some of this material is inherited from Andrew Lahiff at RAL. It's just to de-conflict names.

Anyway, the first section says to keep a persistent record of configuration settings; it adds new configuration settings called "StartJobs" and "RalNodeOnline"; it sets them initially to False; and it makes the START configuration setting dependent upon both of them being True. Note: the START setting is very important because the node won't start jobs unless it evaluates to True.

Next, this config, also in the /etc/condor/condor_config.local file, tells the startd to run a cron script every five minutes.

STARTD_CRON_JOBLIST=TESTNODE
STARTD_CRON_TESTNODE_EXECUTABLE=/usr/libexec/condor/scripts/testnodeWrapper.sh
STARTD_CRON_TESTNODE_PERIOD=300s

# Make sure values get over
STARTD_CRON_AUTOPUBLISH = If_Changed

The testnodeWrapper.sh script looks like this:

#!/bin/bash

MESSAGE=OK

/usr/libexec/condor/scripts/testnode.sh > /dev/null 2>&1
STATUS=$?

if [ $STATUS != 0 ]; then
  MESSAGE=`grep ^[A-Z0-9_][A-Z0-9_]*=$STATUS\$ /usr/libexec/condor/scripts/testnode.sh | head -n 1 | sed -e "s/=.*//"`
  if [[ -z "$MESSAGE" ]]; then
    MESSAGE=ERROR
  fi
fi

if [[ $MESSAGE =~ ^OK$ ]]; then
  echo "RalNodeOnline = True"
else
  echo "RalNodeOnline = False"
fi
echo "RalNodeOnlineMessage = $MESSAGE"

echo `date`, message $MESSAGE >> /tmp/testnode.status
exit 0

This just wraps an existing script which I reuse from our TORQUE/MAUI cluster. The existing script simply returns a non-zero code if any error happens; to add a bit of extra info, the wrapper also looks up the meaning of the code. The important thing to notice is that it echoes out a line setting RalNodeOnline to True or False, which is then used in the setting of START. Note: on TORQUE/MAUI the script ran as "root"; here it runs as "condor", so it uses sudo for some of the sections (e.g. those that check disks) because the condor user could not read smartctl output and the like directly.

When a node fails the test, START goes to False and the node won't run more jobs.

For On/Off control, we use another setting to control START (as well as RalNodeOnline): the "StartJobs" setting. We can control this independently, so we can take a node offline whether or not it has an error. This is useful for stopping the node in order to (say) rebuild it, similar to the pbsnodes command on TORQUE/MAUI. The command to control the worker node can be issued remotely from the head node, like this.

condor_config_val -verbose -name r21-n01 -startd -set "StartJobs = false"
condor_reconfig r21-n01
condor_reconfig -daemon startd r21-n01
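To bring the node back into service, the same mechanism is used in reverse (a sketch, using the node name from the example above):

 condor_config_val -verbose -name r21-n01 -startd -set "StartJobs = true"
 condor_reconfig -daemon startd r21-n01
 # check the current value afterwards
 condor_config_val -name r21-n01 -startd StartJobs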

GOCDB Entries and Registration

Add new service entries for the head node in GOCDB for the following service types.

  • gLite-APEL
  • gLExec
  • ARC-CE

It is safe to monitor all these services once they are marked as in production. Once the system is in GOCDB, the accounting system, APEL, will be able to accept accounting records (if there are problems, contact APEL-SUPPORT@JISCMAIL.AC.UK).

Also contact representatives of the big experiments and tell them about the new CE. Ask Atlas to add the new CE in its analysis, production and multicore job queues.

Software Tags

The use of software tags has almost disappeared since we started using CVMFS. We expect that to continue.

An ARC CE, unlike CREAM, does not support software tags in the same way. ARC has a different but broadly equivalent mechanism of its own, called ARC runtime environments. These get published in the same way as software tags in the information system. The site admin has to put files into the runtimedir directory (e.g. /etc/arc/runtime). For example, at Liverpool, I've put in this tag for biomed:

# ls /etc/arc/runtime/
/etc/arc/runtime/VO-biomed-CVMFS

These are managed by our configuration management system - VOs can't make changes themselves. Users can query for the tag like so:

# ldapsearch -LLL -x -h lcg-bdii.gridpp.ac.uk:2170 -b o=grid 'GlueSubClusterUniqueID=hepgrid2.ph.liv.ac.uk' GlueHostApplicationSoftwareRunTimeEnvironment
...
GlueHostApplicationSoftwareRunTimeEnvironment: VO-biomed-CVMFS
...
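Publishing a new tag is then just a matter of dropping another file into the runtimedir; as far as I know an empty file is enough for a bare tag (the VO name here is only an example):

 touch /etc/arc/runtime/VO-example-CVMFS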

Notes on Accounting, Scaling and Publishing

Background

Various notes on Jura accounting are available on the nordugrid wiki. I gave a presentation on Accounting, Scaling and Publishing for ARC/Condor and other systems at GridPP37 in Ambleside, UK, which forms the basis for the benchmarking procedure. The material in this section is all based on the CREAM/Torque publishing tutorial written some time ago: Publishing_tutorial.

The salient points in this document explain (A) how to apply scaling factors to individual nodes in a mixed cluster and (B) how the total power of a site is transmitted. I'll first lay out how it was done in CREAM/TORQUE and then explain the changes required for ARC/CONDOR.

Historical Set-up with CREAM/TORQUE/MAUI

Application of Scaling Factors

At Liverpool, we introduced an abstract node-type, called BASELINE, with a reference value of 10 HEPSPEC. This is transmitted to the information system on a per CE basis, and can be seen as follows.

$ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueCEUniqueID=hepgrid5.ph.liv.ac.uk:8443/cream-pbs-long GlueCECapability | perl -p0e 's/\n //g'

GlueCECapability: CPUScalingReferenceSI00=2500

All CEs share the same value. Note: the value of 2500 corresponds to 10 HEPSPEC expressed in "bogoSpecInt2k" (one bogoSpecInt2k is equal to 1/250th of a HEPSPEC).

All real nodes receive a TORQUE scaling factor that describes how powerful their slots are relative to the abstract reference. For example, a machine with slightly less powerful slots than BASELINE might have a factor of 0.896. TORQUE then automatically normalises cpu durations with the scaling factor. Thus the accounting system merely needs to know the CPUScalingReferenceSI00 value to be able to compute work done.
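As a worked example (hypothetical numbers): a job that uses 3600 CPU seconds on a node with scaling factor 0.896 is recorded as about 3226 seconds of reference-node time, which the accounting system then values against the 2500 bogoSI2k (10 HEPSPEC) reference.

 $ bc -l
 3600 * 0.896
 3225.600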

Transmit Total Power of a Site

The total power of a site is conveyed to the information system by sending out values for Total Logical Cpus (or unislots) and Benchmark (average power of a single slot) and multiplying them together. It is done on a per CE basis, and the calculation at Liverpool (which then had 4 CREAM CEs) looks like this:

$ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid5.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'

GlueSubClusterLogicalCPUs: 1
GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
$ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid6.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'
GlueSubClusterLogicalCPUs: 1
GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
$ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid10.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'
GlueSubClusterLogicalCPUs: 1
GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
$ ldapsearch -LLL -x -h hepgrid4:2170 -b o=grid GlueSubClusterUniqueID=hepgrid97.ph.liv.ac.uk GlueSubClusterLogicalCPUs GlueHostProcessorOtherDescription | perl -p0e 's/\n //g'
GlueSubClusterLogicalCPUs: 1381
GlueHostProcessorOtherDescription: Cores=6.23,Benchmark=12.53-HEP-SPEC06
$ bc -l
(1 + 1 + 1 + 1381) * 12.53

Giving 17341.52 HEPSPEC

Note: All 1384 nodes are/were available to each CE to submit to, but the bulk is allocated to hepgrid97 for the purposes of power publishing only.

The Setup with ARC/CONDOR

Application of Scaling Factors

There's an ARC "authplugin" script called scaling_factors_plugin.py that gets run when a job finishes. It normalises the accounting: it reads the MachineRalScaling value (which has been buried in an "errors" file; see "RalScaling" below), then parses the diag file, multiplying the run-times by the factor.

Also in ARC is a “jobreport_options” parameter that contains (e.g.) “benchmark_value:2500.00". I assume this is the equivalent of the “GlueCECapability: CPUScalingReferenceSI00=2500 ” in the “Application of Scaling Factors” section above, i.e. it is in bogospecint2k (250 * HEPSPEC). I assume that it represents the power of the reference node type, i.e. the power to which all the other nodes relate by way of their individual scaling factor.

The next thing considered is this RalScaling / MachineRalScaling mechanism. This is set in one of the config files on the WNs:

RalScaling = 2.14
STARTD_ATTRS = $(STARTD_ATTRS) RalScaling

It tells the node how powerful it is by setting a new variable with some arbitrary name. This goes on the ARC CE:

MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling

This gets hold of the RalScaling variable on the WN and passes it through via the SUBMIT_EXPRS parameter. It winds up in the "errors" file, which is then used by the normalisation script. Note that the scaling factor is applied to the worker node at build time by the set_node_parameters.pl script described in the Files section above.
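A quick way to sanity-check the factors being advertised by the worker nodes (a sketch; nodes with no RalScaling defined will print "undefined"):

 condor_status -autoformat Machine RalScaling | sort -u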

Notes on HEPSPEC Publishing Parameters

The Publishing_tutorial describes a situation where Yaim is used to convert and transfer the information. In this case, the same data has to be transposed into the arc.conf configuration file so that the ARC BDII can access and publish the values. The following table shows how to map the YAIM values referenced in the tutorial to the relevant configuration settings in the ARC system.


Worker node hardware

  • Total physical cpus in cluster
    Yaim variable: CE_PHYSCPU=114
    ARC conf section: N/A; ARC variable: N/A
    Notes: No equivalent in ARC
  • Total cores/logical-cpus/unislots/threads in cluster
    Yaim variable: CE_LOGCPU=652
    ARC conf section: [cluster] and [queue/grid]; ARC variable: totalcpus=652
    Notes: Only 1 queue; same value in both sections
  • Accounting scaling
    Yaim variable: CE_CAPABILITY="CPUScalingReferenceSI00=2500 ..."
    ARC conf section: [grid-manager]; ARC variable: jobreport_options="... benchmark_value:2500.00"
    Notes: Provides the reference for accounting
  • Power of 1 logical cpu, in HEPSPEC * 250 (bogoSI00)
    Yaim variable: CE_SI00
    ARC conf section: [infosys/glue12]; ARC variable: N/A
    Notes: See Yaim Manual; equivalent to benchmark * 250
  • Cores: the average unislots in a physical cpu
    Yaim variable: CE_OTHERDESCR=Cores=n.n,...
    ARC conf section: [infosys/glue12]; ARC variable: processor_other_description="Cores=5.72 ..."
    Notes: Yaim variable was shared with Benchmark (below)
  • Benchmark: the scaled power of a single core/logical-cpu/unislot/thread
    Yaim variable: CE_OTHERDESCR=...,Benchmark=11.88-HEP-SPEC06
    ARC conf section: [infosys/glue12]; ARC variable: processor_other_description="...,Benchmark=11.88-HEP-SPEC06"
    Notes: Yaim variable was shared with Cores (above)



Once the system is operating, the following script can be used to test the published power of your site.

#!/usr/bin/perl

my @glasgow = qw ( svr010.gla.scotgrid.ac.uk  svr011.gla.scotgrid.ac.uk  svr014.gla.scotgrid.ac.uk  svr026.gla.scotgrid.ac.uk);
my @liverpoolCE = qw (hepgrid5.ph.liv.ac.uk hepgrid6.ph.liv.ac.uk hepgrid10.ph.liv.ac.uk hepgrid97.ph.liv.ac.uk );
my @liverpoolCE = qw (hepgrid2.ph.liv.ac.uk );

my $power = 0;
for my $server (@liverpoolCE  ) {
  my $p = getPower($server);
  $power = $power + $p;
}

print("Total power is $power\n");

sub getPower() {

  $bdii = "hepgrid2.ph.liv.ac.uk:2135";

  my $server = shift;

  open(CMD,"ldapsearch -LLL -x -h $bdii -b o=grid 'GlueSubClusterUniqueID=$server' |") or die("No get $server stuff");
  my $buf = "";   # accumulator for unfolding LDIF continuation lines
  my @lines;
  while (<CMD>) {
    chomp();
    if (/^ /) {
      s/^ //; $buf .= $_;
    }
    else {
      push(@lines,$buf); $buf = $_;
    }
  } 
  close(CMD);
  push(@lines,$buf);
  
  my $avgHepspec = -1;
  my $slots = -1;
  foreach my $l (@lines) {
    if ($l =~ /^GlueHostProcessorOtherDescription: Cores=([0-9\.]+),Benchmark=([0-9\.]+)-HEP-SPEC06/) {
      $avgHepspec = $2;
      print("avgHepspec -- $avgHepspec, $l\n");
    }
    if ($l =~ /^GlueSubClusterLogicalCPUs: ([0-9]+)/) {
      $slots = $1;
      print("slots      -- $slots\n");
    }
  }
  
  die("Reqd val not found $avgHepspec $slots \n") if (($avgHepspec == -1) or ($slots == -1));

  my $power =  $avgHepspec * $slots;
  print("power avgHepspec slots, $power, $avgHepspec, $slots\n");
  return $power;
}

Transmit Total Power of a Site

At present, there is no mechanism for that as far as I know.

Republishing Accounting Records

You can find some more reading on what you need to do to publish when you set up a new ARC-CE on this twiki page.

Republishing records from ARC is only possible for APEL if the archiving option was set up in arc.conf (see above for the settings). If this was set for the period to be covered, you can use the script below (called merge-and-create-publish.sh, written by Jernej Porenta) to collect the relevant archived records and put them in the republishing directory. After doing this, you can run jura publishing in the normal manner, or wait for the cron job to kick off. You must set the following attributes in the script before running it.

  • archiving directory
  • required data gap
  • output directory for a new file
#!/bin/bash

# Script to create republish data for JURA from archive dir

# JURA archive dir, where all the old accounting records from ARC are saved (archiving setting from jobreport_options in arc.conf)
ARCHIVEDIR="/var/urs/"

# Time frame of republish data
FROM="28-Feb-2015"
TO="02-Apr-2015"

# Output directory for new files, which should go into the JURA outgoing dir (usually /var/spool/arc/ssm/<APEL server>/outgoing/00000000)
OUTPUT="/var/spool/arc/ssm/mq.cro-ngi.hr/outgoing/00000000/"

#####

TMPFILE="file.$$"

if [ ! -d $OUTPUT ] || [ ! -d $ARCHIVEDIR ]; then
        echo "Output or Archive dir is missing"
        exit 0
fi


# find all accounting records from the archive dir with modification time in the specified timeframe and paste the records into a temporary file
find $ARCHIVEDIR -type f -name 'usagerecordCAR.*' -newermt "$FROM -1 sec" -and -not -newermt "$TO -1 sec" -printf "%C@ %p\n" | sort | awk '{ print $2 }' | xargs -L1 -- grep -h UsageRecord >> $TMPFILE

# fix issues with missing CpuDuration
perl -p -i -e 's|WallDuration><ServiceLevel|WallDuration><CpuDuration urf:usageType="all">PT0S</CpuDuration><ServiceLevel|' $TMPFILE

# split the temporary file into smaller files with only 999 accounting records each
split -a 4 -l 999 -d $TMPFILE $OUTPUT/

# rename the files into format that JURA publisher will understand
for F in `find $OUTPUT -type f`; do
        FILE=`basename $F`
        NEWFILE=`date -d "$FROM + $FILE second" +%Y%m%d%H%M%S`
        mv -v $OUTPUT/$FILE $OUTPUT/$NEWFILE
done

# prepend XML tags for accounting files
find $OUTPUT -type f -print0 | xargs -0 -L1 -- sed -i '1s/^/<?xml version="1.0"?>\n<UsageRecords xmlns="http:\/\/eu-emi.eu\/namespaces\/2012\/11\/computerecord">\n/'

# attach XML tags for accounting files
for file in `find $OUTPUT -type f`; do
        echo "</UsageRecords>" >> $file
done

rm -f $TMPFILE

echo "Publish files are in $OUTPUT directory"

Tests and Testing

The following URL lists some critical ATLAS tests for the Liverpool site. You'll have to modify the site name for your own site.

http://dashb-atlas-sum.cern.ch/dashboard/request.py/historicalsmryview-sum#view=serviceavl&time[]=last48&granularity[]=default&profile=ATLAS_CRITICAL&group=All+sites&site[]=UKI-NORTHGRID-LIV-HEP&flavour[]=All+Service+Flavours&flavour[]=ARC-CE&disabledFlavours=true

To check the UK job submission status:

http://bigpanda.cern.ch/dash/production/?cloudview=region&computingsite=*MCORE*#cloud_UK 

Defragmentation for multicore jobs

In this section, I discuss various approaches to defragmenting a cluster to make room for multi-core jobs.

Fallow

I currently recommend Fallow over the other methods I have tried.

Introduction to Fallow

Fallow is a tool based on the older idea, DrainBoss (see below). Fallow is smaller, simpler and more precise. The integral term (which was complex) has been dropped and the proportional controller has been simplified.

Config Settings

To use Fallow, some new config is required on the workernodes. The reason for this is described below in the Principles of Operation section.

Lines in the /etc/condor/condor_config.local file need to be amended to hold the OnlyMulticore attribute, as shown here.

 ENABLE_PERSISTENT_CONFIG = TRUE
 PERSISTENT_CONFIG_DIR = /etc/condor/ral
 STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline, OnlyMulticore
 STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs , OnlyMulticore
 OnlyMulticore = False

And the START classad, in the same file, has to be modified to use the OnlyMulticore attribute, as follows.

START = ((StartJobs =?= True) && (RalNodeOnline =?= True) && (ifThenElse(OnlyMulticore =?= True,ifThenElse(RequestCpus =?= 8, True, False) ,True ) ))

The OnlyMulticore attribute is a persistent, settable attribute that can be altered by (say) an admin user or a script. The START classad, which is consulted before a job is started, will only yield True for a specific job if (as well as certain other conditions) either OnlyMulticore is False, or OnlyMulticore is True and the job requests 8 cpus. Thus a node can be barred from running single-core jobs by setting OnlyMulticore to True.
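For example, mirroring the StartJobs commands shown earlier, an admin could put a node into multicore-only mode remotely from the head node (and reverse it by setting the attribute back to false):

 condor_config_val -verbose -name r21-n01 -startd -set "OnlyMulticore = true"
 condor_reconfig -daemon startd r21-n01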

Principles of Operation

Fallow takes a parameter that tells it how many unislots (single cores) should be used ideally by multi-core jobs. This is called the setpoint.

Fallow detects how many multi-core and single-core jobs are running and queued, and uses the OnlyMulticore attribute (see above) to control whether or not nodes are allowed to run single-core jobs. A node that is not allowed to run single-core jobs is, effectively, draining.

It does nothing if there are no jobs in the queue or if there are only multi-core jobs in the queue. This is OK because the cluster is already effectively draining if there are no single-core jobs in the queue, and it's pointless doing anything if there are no jobs at all in the queue.

If there are only single-core jobs in the queue, Fallow sets OnlyMulticore on all nodes to False, allowing all nodes to run any type of job. This is OK because there are no multi-core jobs waiting, so no reservations are needed.

If there are multi-core and single-core jobs in the queue, Fallow uses the following algorithm.

Fallow works out how many multi-core (8 core) slots are needed to achieve the setpoint. Fallow exits if there are already enough running (Fallow never stops a running job to achieve the setpoint.)

Fallow then subtracts the number of running multi-core jobs from the desired number to find how many newly drained nodes are needed to reach the desired state. This gives the number of additional nodes to set OnlyMulticore on.

Fallow obtains a list of nodes that can run jobs. It then removes from the list those nodes that are already OnlyMulticore but not yet with 8 cores of slack; these are already in progress.

Fallow then tries to find a set of nodes that are not OnlyMulticore, and sets them OnlyMulticore, starting the drain. Following this algorithm, the system should eventually converge on the desired number of multi-core jobs.

To avoid confusion, I haven't yet mentioned how newly drained nodes are put back online. This is actually done as the first thing in Fallow. It scans all the nodes, finding ones that are OnlyMulticore but which have now got 8 cores of slack. It turns OnlyMulticore off for those nodes, putting them back into service.
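The following bash sketch illustrates one pass of the decision logic described above. It is not the real fallow.py: it omits the node-selection details (the "8 cores of slack" check, the filtering of nodes already set OnlyMulticore, and the pass that puts drained nodes back into service), and the setpoint and 8-core job size are assumptions taken from the text.

 #!/bin/bash
 # Illustrative sketch of one Fallow pass (not the real fallow.py).
 SETPOINT=350   # unislots to reserve for multicore work
 
 queued_mc=$(condor_q -global -constraint 'RequestCpus == 8 && JobStatus == 1' -autoformat ClusterId | wc -l)
 running_mc=$(condor_q -global -constraint 'RequestCpus == 8 && JobStatus == 2' -autoformat ClusterId | wc -l)
 
 # If no multicore jobs are queued, the real Fallow would instead clear
 # OnlyMulticore everywhere; that branch is omitted from this sketch.
 [ $queued_mc -eq 0 ] && exit 0
 
 # Work out how many more 8-core slots are wanted to reach the setpoint.
 wanted=$(( SETPOINT / 8 - running_mc ))
 [ $wanted -le 0 ] && exit 0
 
 # Set OnlyMulticore on that many nodes, starting their drain.
 for node in $(condor_status -autoformat Machine | sort -u | head -n $wanted); do
   condor_config_val -name $node -startd -set "OnlyMulticore = true"
   condor_reconfig -daemon startd $node
 done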

Preferring Multicore Jobs

Algorithmic

For this system to work, it is necessary for it to prefer to start multi-core jobs over single-core jobs. This is because the drain process described above is futile if single-core jobs grab the newly prepared nodes. The system at Liverpool ensures this through various measures. The first and most effective measure is inherent in the Fallow algorithm. As a node drains in OnlyMulticore mode, single-core jobs are not allowed. At some point, 8 or more slots will become free. The system will schedule a multicore job in those slots, because single-core jobs are barred. The next run of Fallow will put the node back in service by allowing single-core jobs, but it is too late - a multicore job is (usually) already running, assuming any were queued.

The only exception to this is a race condition. Say the condor scheduler considers a draining (OnlyMulticore) node and finds that it has too few free cores to schedule a multi-core job. Then say that between then, and the next run of Fallow, enough cores become free. Fallow will then run and turn off OnlyMulticore. The first run of the scheduler after Fallow can then start a single-core job, which spoils the plan.

Fallow has logic to counter this. After Fallow discovers a node has enough free cores to turn OnlyMulticore off, it waits for a period exceeding one scheduling cycle to ensure that the scheduler has a chance to put a multi-core job on it. Only then does Fallow turn OnlyMulticore off. The scheduling cycle period is given to Fallow as a command line parameter.

It is recommended anyway that the scheduler should run much more frequently than Fallow, to minimise the chance that this window is available. There are also other measures that can be used to give more certainty over this aspect, described next for the sake of completeness.

User Priorities

On our cluster, we define accounting groups, and any job is assigned to some user that belongs to an accounting group (based on the user's proxy certificate, via an authentication and mapping system comprising lcmaps and Argus). The rules that do this are described in the main document, and look something like this:

LivAcctGroup = strcat("group_",toUpper(
ifThenElse(regexp("sgmatl34",Owner),"highprio",
ifThenElse(regexp("sgmops11",Owner),"highprio",
ifThenElse(regexp("^alice", x509UserProxyVOName), "alice",
ifThenElse(regexp("^atlas", x509UserProxyVOName), "atlas",
ifThenElse(regexp("^biomed", x509UserProxyVOName), <…. and so on …>
"nonefound")))))))))))))))))))))))))))))))) )) ))
LivAcctSubGroup = strcat(regexps("([A-Za-z0-9]+[A-Za-z])\d+", Owner,
"\1"),ifThenElse(RequestCpus > 1,"_mcore","_score"))
AccountingGroup = strcat(LivAcctGroup, ".", LivAcctSubGroup, ".", Owner)
SUBMIT_EXPRS = $(SUBMIT_EXPRS) LivAcctGroup, LivAcctSubGroup,
AccountingGroup

The idea is that we have a major accounting group and a sub accounting group for each job, which are passed through via SUBMIT_EXPRS. The sub accounting group always ends in _mcore or _score, for reasons that will be obvious in a minute. When I run condor_userprio, I see this for e.g. ATLAS (some columns omitted). Note the priority factor in the last column.

group_ATLAS 0.65 Regroup 1000.00
pilatl_score.pilatl08@ph.liv.ac.uk 500.00 1000.00
atlas_score.atlas006@ph.liv.ac.uk 500.33 1000.00
prdatl_mcore.prdatl28@ph.liv.ac.uk 49993.42 1.00 
pilatl_score.pilatl24@ph.liv.ac.uk 96069.21 1000.00
prdatl_score.prdatl28@ph.liv.ac.uk 202372.86 1000.00

The priority factor for the _mcore subgroup has been set to 1 , using

condor_userprio -setfactor prdatl_mcore.prdatl28@ph.liv.ac.uk 1 

If the default priority factor is (say) 1000, then this makes mcore jobs much more likely to be selected to run than score jobs. Thus if a wide slot is asking for jobs, it should get a wide job. This seems to be borne out in experience.

GROUP_SORT_EXPR

Andrew Lahiff has had good results with GROUP_SORT_EXPR, but I haven't tried it out yet.

Download, Install, Configure

The Fallow controller is available as an RPM in this location:

hep.ph.liv.ac.uk/~sjones/

It's an RPM so it can be installed on the ARC/Condor head node with rpm or yum. Once installed, open the

/root/scripts/runFallow.sh

script and modify the line that runs Fallow, i.e.

./fallow.py -s 350 -n 61

The -s parameter is the number of unislots (single cores) to be reserved for multicore jobs. The -n parameter is the negotiator interval + 1. Change these to your site-specific values. You can then start the fallow service, i.e.

service fallow start

It will write a log file to

/root/scripts/fallow.log

DrainBoss

DrainBoss has been superseded by Fallow, above.

Introduction to DrainBoss

If all jobs on a cluster require the same number of CPUs, e.g. all need one, or all need two etc., then you can simply load up each node with jobs until it is full. When one job ends, another can use its slot. But a problem occurs when you try to run jobs which vary in the number of cpus they require. Consider a node with (say) eight cores that is running eight single core jobs. One is the first to end, and a slot becomes free. But let us say that the highest priority job in the queue is an eight core job. The newly freed slot is not wide enough to take it, so it has to wait. Should the scheduler use the slot for a waiting single core job, or hold it back until the other seven jobs end? If it holds slots back, then resources are wasted. If it pops another single core job into the slot, then the multicore job has no prospect of ever running. The solution that Condor provides to this problem has two rules: start multicore jobs in preference to single core jobs, and periodically drain down nodes so that a multicore job can fit on them. This is implemented using the Condor DEFRAG daemon, which has parameters, described in the section below, that control the way nodes are selected and drained for multicore jobs. DrainBoss provides functionality for a similar approach, but has the additional feature of a process controller that senses the condition of the cluster and adjusts the way nodes are drained and put back into service, giving a certain amount of predictability.

Process controller principles

A process controller provides a feedback control system. It measures some variable and compares it to some ideal value, called a setpoint, finding the error. It then corrects the process to try to bring the measured variable back to the setpoint, eliminating the error. There are a large number of algorithms used to compute the correction, but DrainBoss makes use of the first two terms of the well-known Proportional Integral Derivative (PID) control algorithm, i.e. it is a PI controller. The proportional term sets the correction proportionally to the size of the error; the constant of proportionality is sometimes called the gain of the controller. This is sufficient for many fast-acting processes, but any process involving the draining of compute nodes is likely to have a period of some hours or days. In this application, pure proportional control is too sensitive to time lags and the control would be very poor. Thus, in this application, the proportional term is used but with a very low gain to damp down its sensitivity. The second term, integral action, is more important here. Integral action sums (i.e. integrates, hence the name) the error over time and feeds that into the controller output as well. Thus, as the area under the error builds over time, the control output grows to offset it. This eventually overcomes the offset and returns the measured variable to the setpoint.
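For reference, the standard PI control law implemented here can be written (in the usual notation) as:

 u(t) = K_p \, e(t) + K_i \int_0^t e(\tau) \, d\tau

where e(t) is the error (setpoint minus measured value), K_p is the proportional gain and K_i the integral gain. As described above, DrainBoss uses a deliberately small K_p.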

Application

There are a few particulars to this application that affect the design of the controller.

First, the prime objectives of the system are to maximise the usage of the cluster and get good throughput of both single-core and multicore jobs. A good controller might be able to achieve this but there are a few problems to deal with.

  • Minimal negative corrections: To achieve control, the controller usually only puts more nodes into the drain state. It never stops nodes draining, with one exception - once a drain starts, it usually completes. The reason for this policy is that drains represent a cost to the system, and cancelling them throws away any achievement made from the draining. Just because there are few multicore jobs in the queue at present doesn't mean some might not crop up at any time, so cancelling drains could easily be premature. Instead, the nodes are left to drain out and are then put back into service, just in case a multicore job comes along and needs the slot. The only exception to this rule is when there are no multicore jobs in the queue but there are single core jobs queued. In that case, the single core jobs are potentially being held back for no reason, so all draining is immediately cancelled to allow the single core jobs to run.
  • Traffic problems: On a cluster, there is no guarantee that a constant supply of multicore jobs and single core jobs is available. There could be periods when the queue is depleted of one or both types of work. The controller deals with these issues as best it can using the following rules. If there are no multicore jobs queued, then it is pointless to start draining any systems, because there are no jobs to fill the resulting wide slots. Also, if there are no multicore jobs but some single core jobs are queued, then the controller cancels the on-going drains to let the single core jobs run; otherwise those jobs would be held back for no valid reason. The truth table below shows the simple picture.


Queue state
  mc jobs queued           no    yes   no    yes
  sc jobs queued           no    no    yes   yes
Actions
  start drain if nec.      no    yes   no    yes
  cancel on-going drains   no    no    yes   no

Tuning

Tuning was done entirely by hand although there are technical ways to tune the system more accurately that I hope to research in future.

Current status

(To be completed.)


Download

The DrainBoss controller is available as an RPM in this Yum repository:

www.sysadmin.hep.ac.uk

The DEFRAG daemon

This is the traditional approach to defragmentation, used in the initial version of the example build of an ARC/Condor cluster. It uses the DEFRAG daemon that comes with Condor. To configure this set-up, you need to edit condor_config.local on the server and create a script, set_defrag_parameters.sh, to control the amount of defragging. The script is operated by a cron job. Full details of this configuration are given in the section on server files, above. The meaning of some important parameters used to control the DEFRAG daemon is discussed next, with an illustrative configuration sketch after the list.

  • DEFRAG_INTERVAL – How often the daemon evaluates defrag status and sets systems draining.
  • DEFRAG_REQUIREMENTS – Only machines that fit these requirements will start to drain.
  • DEFRAG_DRAINING_MACHINES_PER_HOUR – Only this many machines will be set off draining each hour.
  • DEFRAG_MAX_WHOLE_MACHINES – Don't start any draining if you already have this many whole machines.
  • DEFRAG_MAX_CONCURRENT_DRAINING – Never drain more than this many machines at once.
  • DEFRAG_RANK – This allows you to prefer some machines over others to drain.
  • DEFRAG_WHOLE_MACHINE_EXPR – This defines whether a certain machine is whole or not.
  • DEFRAG_CANCEL_REQUIREMENTS – Draining will be stopped when a draining machine matches these requirements.
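For reference, a minimal static configuration enabling the DEFRAG daemon might look like the following. The values are illustrative only; at Liverpool, DEFRAG_MAX_CONCURRENT_DRAINING, DEFRAG_DRAINING_MACHINES_PER_HOUR and DEFRAG_MAX_WHOLE_MACHINES are set dynamically by the set_defrag_parameters.sh script shown below.

 DAEMON_LIST = $(DAEMON_LIST) DEFRAG
 DEFRAG_INTERVAL = 600
 DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
 DEFRAG_MAX_CONCURRENT_DRAINING = 2
 DEFRAG_MAX_WHOLE_MACHINES = 6
 DEFRAG_WHOLE_MACHINE_EXPR = Cpus >= 8
 DEFRAG_SCHEDULE = graceful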

Note: The meaning of the ClassAds and parameters used to judge the fragmentation state of a machine is Byzantine in its complexity. The following definitions have been learned from experience.

The multicore set-up in CONDOR makes use of the idea of an abstract Partitionable Slot (PSlot) that can't run jobs itself but contains real slots of various sizes that can. In our set-up, every node has a single PSlot on it. Smaller "real" slots are made from it, each with either 1 single simultaneous thread of execution (a unislot) for single-core jobs, or 8 unislots for multicore jobs. The table below shows the meaning of some ClassAds used to express the usage of a node that is currently running seven single core jobs (I think it's taken from an E5620 CPU).

The ClassAds in the first column (PSlot) have the following meanings. DetectedCpus shows that the node has 16 hyper-threads in total - this is the hardware limit for simultaneous truly concurrent threads. The next row, TotalSlotCpus, shows the size of the PSlot on this node. In this case, only 10 unislots can ever be used for jobs, leaving 6 unislots unused (note: it has been found that total throughput does not increase even if all the unislots are used, so leaving 6 unused is not inefficient.) Next, TotalSlots is equal to 8 in this case, which represents the total of all the used unislots in the sub slots, plus 1 to represent the PSlot itself. A value of 8 shows that this PSlot currently has seven of its unislots used by sub slots, and three unused. These could be used to make new sub slots to run jobs in. The last ClassAd, Cpus, represents the usable unislots in the PSlot that are left over (i.e. 3).

With respect to the sub slot columns, the DetectedCpus and TotalSlotCpus values can be ignored as they are always the same as the PSlot's. Both TotalSlots and Cpus in the sub slot columns represent how many unislots are in that sub slot.

It's as clear as mud, isn't it? But my experiments show it is consistent.

PSlot (partitionable slot):
  DetectedCpus:  how many hyper-threads, e.g. 16
  TotalSlotCpus: how many CPUs can be used, e.g. 10
  TotalSlots:    total of main plus all sub slots, e.g. 8
  Cpus:          usable unislots left over, e.g. 3

Each of the seven sub slots (all identical in this example):
  DetectedCpus:  ignore (same as the PSlot)
  TotalSlotCpus: ignore (same as the PSlot)
  TotalSlots:    how many unislots in this sub slot, e.g. 1
  Cpus:          how many unislots in this sub slot, e.g. 1

The remaining 3 unislots of the PSlot are empty (no sub slot).
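These ClassAds can be inspected directly with condor_status; for example (the slot name here is just an illustration, use one from your own pool):

 condor_status -long slot1@r21-n01 | egrep '^(DetectedCpus|TotalSlotCpus|TotalSlots|Cpus) '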


Setting Defrag Parameters

The script below is for sensing the load condition of the cluster and setting appropriate parameters for defragmentation.

  • File: /root/scripts/set_defrag_parameters.sh
  • Notes: This script senses changes to the running and queueing job load, and sets parameters related to defragmentation. This allows the cluster to support a load consisting of both multicore and singlecore jobs.
  • Customise: Yes. You'll need to edit it to suit your site. BTW: I'm experimenting with a swanky new version that involves a rate controller. I'll report on that in due course.
  • Content:
#!/bin/bash
#
# Change condor_defrag daemon parameters depending on what's queued

function setDefrag () {

   # Get the address of the defrag daemon
   defrag_address=$(condor_status -any -autoformat MyAddress -constraint 'MyType =?= "Defrag"')

   # Log
   echo `date` " Setting DEFRAG_MAX_CONCURRENT_DRAINING=$3, DEFRAG_DRAINING_MACHINES_PER_HOUR=$4, DEFRAG_MAX_WHOLE_MACHINES=$5 (queued multicore=$1, running multicore=$2)"

   # Set configuration
   /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_MAX_CONCURRENT_DRAINING = $3" >& /dev/null
   /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_DRAINING_MACHINES_PER_HOUR = $4" >& /dev/null
   /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_MAX_WHOLE_MACHINES = $5" >& /dev/null
   /usr/sbin/condor_reconfig -daemon defrag >& /dev/null
}

function cancel_draining_nodes () {
  # Get draining nodes
  for dn in `condor_status | grep Drained | sed -e "s/.*@//" -e "s/\..*//" `; do
    slot1=0
    condor_status -long $dn| while read line; do
  
      # Toggle if slot1@ (not slot1_...). slot1@ lists the empty (i.e. drained) total
      if [[ $line =~ ^Name.*slot1@.*$ ]]; then
        slot1=1
      fi
      if [[ $line =~ ^Name.*slot1_.*$ ]]; then
        slot1=0
      fi
    
      if [ $slot1 == 1 ]; then
        if [[ $line =~ ^Cpus\ \=\ (.*)$ ]]; then
  
          # We must capture empty/drained total
          cpus="${BASH_REMATCH[1]}"
          if [ $cpus -ge 8 ]; then
            # We have enough already. Pointless waiting longer.
            echo Cancel drain of $dn, as we have $cpus free already
            condor_drain -cancel $dn
          fi
        fi
      fi
    done
  done
}

queued_mc_jobs=$(condor_q -global -constraint 'RequestCpus == 8 && JobStatus == 1' -autoformat ClusterId | wc -l)

queued_sc_jobs=$(condor_q -global -constraint 'RequestCpus == 1 && JobStatus == 1' -autoformat ClusterId | wc -l)

running_mc_jobs=$(condor_q -global -constraint 'RequestCpus == 8 && JobStatus == 2' -autoformat ClusterId | wc -l)

running_sc_jobs=$(condor_q -global -constraint 'RequestCpus == 1 && JobStatus == 2' -autoformat ClusterId | wc -l)

queued_mc_slots=`expr $queued_mc_jobs \* 8`

queued_sc_slots=$queued_sc_jobs

# Ratio control
P_SETPOINT=0.5    # When the ratio between multicore and singlecore is more than this, take action

#CONSTANTS
C_MxWM=1000  # At max, pay no heed to how many whole systems
C_MxDH=3    # At max, kick off N per hour to drain
C_MxCD=2     # At max, never more than this many machines should defrag at once

C_MnWM=6    # At min, don't bother if n already whole
C_MnDH=1    # At min, only start 1 per hour max
C_MnCD=1    # At min, don't bother if n already going

C_ZWM=0    # At zero, don't bother if 0 already whole
C_ZDH=0    # At zero, only start 0 per hour max
C_ZCD=0    # At zero, don't bother if 0 already going


if [ $queued_sc_slots -le 3 ]; then
  # Very few sc jobs. Max defrag.
  setDefrag $queued_mc_jobs $running_mc_jobs $C_MxCD $C_MxDH $C_MxWM
else
  if [ $queued_mc_slots -le 1 ]; then
    # More than a couple of sc jobs, and almost no mc jobs.
    # No defraging starts,  cancel current defraging
    setDefrag $queued_mc_jobs $running_mc_jobs $C_ZCD $C_ZDH $C_ZWM
    cancel_draining_nodes
  else
    # More than a couple of sc jobs, and mc jobs 
    RATIO=`echo "$queued_mc_slots / $queued_sc_slots" | bc -l`
    RESULT=$(echo "${RATIO} > ${P_SETPOINT}" | bc -l )
    
    if [ $RESULT -eq 1 ]; then
      # Surplus of MC over SC, lots of defrag. 
      setDefrag $queued_mc_jobs $running_mc_jobs $C_MxCD $C_MxDH $C_MxWM    
    else
      # Not More MC than SC, little of defrag
      setDefrag $queued_mc_jobs $running_mc_jobs $C_MnCD $C_MnDH $C_MnWM    
    fi
  fi
fi

# Raise priority of MC jobs
/root/scripts/condor_q_cores.pl > /tmp/c

# Put all the MC records in one file, with I jobs only
grep ^MC /tmp/c | grep ' I ' > /tmp/mc.c

# Go over those queued multicore jobs and raise their priority
for j in `cat /tmp/mc.c | sed -e "s/\S*\s//" -e "s/ .*//"`; do condor_prio -p 6 $j; done
rm /tmp/c /tmp/mc.c


exit

This cron job runs the script periodically.

  • Cron: defrag
  • Purpose: Sets the defrag parameters dynamically
  • Puppet stanza:
 cron { "set_defrag_parameters.sh":
   command => "/root/scripts/set_defrag_parameters.sh >> /var/log/set_defrag_parameters.log",
   require => File["/root/scripts/set_defrag_parameters.sh"],
   user => root,
   minute   => "*/5",
   hour     => "*",
   monthday => "*",
   month    => "*",
   weekday  => "*",
 }

Further Work

(To be completed.)


Also see