ARC CE Hints

Mapping to pool accounts

Argus in combination with lcmaps can be used to map DNs to pool accounts. In the [gridftpd] section of /etc/arc.conf include the following:

allowunknown="yes"
unixmap="* lcmaps liblcmaps.so /usr/lib64 /etc/lcmaps/lcmaps.db voms"
unixmap="nobody:nobody all"

The final unixmap line ensures that DNs which are not mapped successfully are mapped to nobody; the LRMS can then be configured to reject jobs from nobody. The file /etc/lcmaps/lcmaps.db should contain:

path = /usr/lib64/lcmaps
verify_proxy = "lcmaps_verify_proxy.mod" "-certdir /etc/grid-security/certificates" "--discard_private_key_absence" "--allow-limited-proxy"
pepc = "lcmaps_c_pep.mod" "--pep-daemon-endpoint-url https://argus.domain:8154/authz" "--resourceid http://authz-interop.org/xacml/resource/resource-type/arc" "--actionid http://glite.org/xacml/action/execute" "--capath /etc/grid-security/certificates/" "--certificate /etc/grid-security/hostcert.pem" "--key /etc/grid-security/hostkey.pem"
# Policies:
arc:
verify_proxy -> pepc

where argus.domain should be replaced with the hostname of your Argus server. The Argus default policy should contain an appropriate section for the ARC CE, for example:

resource "http://authz-interop.org/xacml/resource/resource-type/arc" {
      obligation "http://glite.org/xacml/obligation/local-environment-map" {}
       action ".*" {
         rule permit { pfqan="/cms/Role=pilot/Capability=NULL" }
         rule permit { pfqan="/cms/Role=pilot" }
         rule permit { pfqan="/cms/Role=lcgadmin/Capability=NULL" }
         rule permit { pfqan="/cms/Role=lcgadmin" }
         rule permit { pfqan="/cms/Role=production/Capability=NULL" }
         rule permit { pfqan="/cms/Role=production" }
         rule permit { pfqan="/cms/Role=t1production/Capability=NULL" }
         rule permit { pfqan="/cms/Role=t1production" }
         rule permit { pfqan="/cms/Role=t1access/Capability=NULL" }
         rule permit { pfqan="/cms/Role=t1access" }
     }
}
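
If the Argus PAP is managed with the pap-admin command-line client, a policy file containing a section like the one above can be loaded and then inspected roughly as follows (a sketch to be run on the Argus server; the file name arc-policy.txt is just an example):

pap-admin add-policies-from-file arc-policy.txt
pap-admin list-policies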

How to get EMI WMS jobs to work

Create an empty file on all worker nodes called /usr/etc/globus-user-env.sh
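
For example, run the following on each worker node (or push it out with your configuration management system):

mkdir -p /usr/etc
touch /usr/etc/globus-user-env.sh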

Changes required for DIRAC

DIRAC is unable to specify runtime environments. Since it is common for environment variables required by grid jobs to be set up in a runtime environment, e.g. ENV/GLITE, we need to force the ARC CE to use a specified runtime environment by default. In the [grid-manager] section of /etc/arc.conf include the following line:

authplugin="PREPARING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/default_rte_plugin.py %S %C %I ENV/GLITE"

where default_rte_plugin.py can be downloaded from https://raw.githubusercontent.com/alahiff/ral-arc-ce-plugins/master/default_rte_plugin.py. Replace ENV/GLITE with the name of the runtime environment that you want to be the default.
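
A possible way to install the plugin so that it matches the path used in the authplugin line above (the grid manager must be able to execute it):

wget -O /usr/local/bin/default_rte_plugin.py \
    https://raw.githubusercontent.com/alahiff/ral-arc-ce-plugins/master/default_rte_plugin.py
chmod +x /usr/local/bin/default_rte_plugin.py

Changes to /etc/arc.conf normally only take effect after the A-REX service has been restarted.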

Changes required for LHCb

LHCb require an environment variable NORDUGRID_ARC_QUEUE to be defined which specifies the name of the queue. At the RAL Tier-1 we set this in our ENV/GLITE runtime environment, which can be found here: https://raw.githubusercontent.com/alahiff/ral-arc-ce-rte/master/GLITE
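
A minimal sketch of how such a runtime environment script could set the variable, assuming a site with a single queue (the queue name gridQ below is hypothetical; the RAL version linked above should be used as the reference):

case $1 in
 1) # stage 1 runs on the worker node just before the job payload starts
    export NORDUGRID_ARC_QUEUE="gridQ"   # hypothetical queue name
    ;;
esac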

Changes required for ATLAS

By default ATLAS jobs specify the ENV/PROXY runtime environment. At the RAL Tier-1 we just have an empty file on the ARC CEs and worker nodes called /etc/arc/runtime/ENV/PROXY
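
For example, on the CEs and worker nodes:

mkdir -p /etc/arc/runtime/ENV
touch /etc/arc/runtime/ENV/PROXY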

ARC/Slurm Job Proxy Renewal

For jobs run outside pilot frameworks, the arcrenew tool does not renew proxies within running jobs when the ARC CE has a SLURM backend.

The GLITE runtime environment script can have the following added so that the proxy is left in the shared session directory, where it can be renewed:

case $1 in
 0) # Stage 0: runs on the ARC CE when the job script is generated;
    # copy the proxy from the control directory into the session directory
    cat ${joboption_controldir}/job.${joboption_gridid}.proxy >$joboption_directory/user.proxy
    ;;
 1) # Stage 1: runs on the worker node before the job payload starts
    export X509_USER_PROXY=$RUNTIME_JOB_DIR/user.proxy
    export X509_USER_CERT=$RUNTIME_JOB_DIR/user.proxy
    # Multicore jobs stay in the shared directory and don't need to be moved back
    if [ -f $SLURM_SUBMIT_DIR/user.proxy ]; then
       touch $RUNTIME_JOB_DIR/MC_JOB # Just for testing, can be removed
    else
       mv $X509_USER_PROXY $SLURM_SUBMIT_DIR/user.proxy
       ln -s $SLURM_SUBMIT_DIR/user.proxy $X509_USER_PROXY
    fi
    ;;
 2) # Stage 2: runs on the worker node after the job payload finishes; nothing to do
    :
    ;;
esac

The /usr/share/arc/scan-SLURM-job script then needs to be modified to compare the CE-held proxy with the proxy in the running job's session directory, replacing the latter if the CE copy is newer. At line 331 in ARC5:

       PENDING|SUSPENDED|COMPLETING)
       #Job is not finished yet, nothing to do.
           ;;
       RUNNING)
           jobfile="${basenames[$localid]}.local"
           sessiondir=`grep -h '^sessiondir=' $jobfile | sed 's/^sessiondir=\(.*\)/\1/'`
           liveproxy="${sessiondir}/user.proxy"
           ceproxy="${basenames[$localid]}.proxy"
           if [ "$liveproxy" -ot "$ceproxy" ]; then
               cp -f "$ceproxy" "$liveproxy"
           fi
           ;;
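
With both changes in place, renewing a proxy from the client side might look roughly like this (a sketch; the VO and job URL are hypothetical, and arcrenew must be run by the owner of the job):

arcproxy -S cms
arcrenew gsiftp://arc-ce.example.com:2811/jobs/aBcDeF12345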