ARC CE Hints
Mapping to pool accounts
Argus in combination with lcmaps can be used to map DNs to pool accounts. In the [gridftpd] section of /etc/arc.conf include the following:
allowunknown="yes"
unixmap="* lcmaps liblcmaps.so /usr/lib64 /etc/lcmaps/lcmaps.db voms"
unixmap="nobody:nobody all"
The second unixmap line above ensures that DNs which are not mapped successfully by lcmaps are mapped to nobody. The LRMS can be configured to not accept jobs from nobody (see the SLURM sketch at the end of this section). The file /etc/lcmaps/lcmaps.db is:
path = /usr/lib64/lcmaps
verify_proxy = "lcmaps_verify_proxy.mod" "-certdir /etc/grid-security/certificates" "--discard_private_key_absence" "--allow-limited-proxy"
pepc = "lcmaps_c_pep.mod" "--pep-daemon-endpoint-url https://argus.domain:8154/authz" "--resourceid http://authz-interop.org/xacml/resource/resource-type/arc" "--actionid http://glite.org/xacml/action/execute" "--capath /etc/grid-security/certificates/" "--certificate /etc/grid-security/hostcert.pem" "--key /etc/grid-security/hostkey.pem"
# Policies:
arc:
verify_proxy -> pepc
where argus.domain should be replaced with the hostname of your Argus server. The Argus default policy should contain an appropriate section for the ARC CE, for example:
resource "http://authz-interop.org/xacml/resource/resource-type/arc" { obligation "http://glite.org/xacml/obligation/local-environment-map" {} action ".*" { rule permit { pfqan="/cms/Role=pilot/Capability=NULL" } rule permit { pfqan="/cms/Role=pilot" } rule permit { pfqan="/cms/Role=lcgadmin/Capability=NULL" } rule permit { pfqan="/cms/Role=lcgadmin" } rule permit { pfqan="/cms/Role=production/Capability=NULL" } rule permit { pfqan="/cms/Role=production" } rule permit { pfqan="/cms/Role=t1production/Capability=NULL" } rule permit { pfqan="/cms/Role=t1production" } rule permit { pfqan="/cms/Role=t1access/Capability=NULL" } rule permit { pfqan="/cms/Role=t1access" } } }
How to get EMI WMS jobs to work
Create an empty file on all worker nodes called /usr/etc/globus-user-env.sh
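For example, on each worker node (the mkdir is only needed if /usr/etc does not already exist):
mkdir -p /usr/etc
touch /usr/etc/globus-user-env.sh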
Changes required for DIRAC
DIRAC is unable to specify runtime environments. Since it is common for environment variables required by grid jobs to be set up in a runtime environment, e.g. ENV/GLITE, we need to force the ARC CE to use a specified runtime environment by default. In the [grid-manager] section of /etc/arc.conf include the following line:
authplugin="PREPARING timeout=60,onfailure=pass,onsuccess=pass /usr/local/bin/default_rte_plugin.py %S %C %I ENV/GLITE"
where default_rte_plugin.py can be found at https://raw.githubusercontent.com/alahiff/ral-arc-ce-plugins/master/default_rte_plugin.py. Replace ENV/GLITE with the name of the runtime environment that you want to be the default.
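The plugin then needs to be present at the path used in the authplugin line and be executable. A minimal sketch of installing it on the CE (the target path simply matches the example above):
wget -O /usr/local/bin/default_rte_plugin.py https://raw.githubusercontent.com/alahiff/ral-arc-ce-plugins/master/default_rte_plugin.py
chmod +x /usr/local/bin/default_rte_plugin.py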
Changes required for LHCb
LHCb requires an environment variable NORDUGRID_ARC_QUEUE to be defined which specifies the name of the queue. At the RAL Tier-1 we set this up in our ENV/GLITE runtime environment, which can be found at https://raw.githubusercontent.com/alahiff/ral-arc-ce-rte/master/GLITE
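As an illustration only, and not a copy of the RAL script, an RTE fragment could export the variable at stage 1 (which runs on the worker node); the queue name gridQ is a hypothetical placeholder:
case $1 in
    1)
        # Assumption: the queue name is known to the site and hard-coded here
        export NORDUGRID_ARC_QUEUE="gridQ"
        ;;
esac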
Changes required for ATLAS
By default ATLAS jobs specify the ENV/PROXY runtime environment. At the RAL Tier-1 we just have an empty file called /etc/arc/runtime/ENV/PROXY on the ARC CEs and worker nodes.
ARC/Slurm Job Proxy Renewal
For jobs run outside pilot frameworks, the arcrenew tool does not renew proxies within running jobs when ARC CEs have a SLURM backend.
The following can be added to the GLITE runtime environment script to keep the proxy in the shared session directory so that renewal is possible:
case $1 in
    0)
        cat ${joboption_controldir}/job.${joboption_gridid}.proxy >$joboption_directory/user.proxy
        ;;
    1)
        export X509_USER_PROXY=$RUNTIME_JOB_DIR/user.proxy
        export X509_USER_CERT=$RUNTIME_JOB_DIR/user.proxy
        #Multicore jobs stay in the shared directory and don't need to be moved back
        if [ -f $SLURM_SUBMIT_DIR/user.proxy ]; then
            touch $RUNTIME_JOB_DIR/MC_JOB #Just for testing, can be removed
        else
            mv $X509_USER_PROXY $SLURM_SUBMIT_DIR/user.proxy
            ln -s $SLURM_SUBMIT_DIR/user.proxy $X509_USER_PROXY
        fi
        ;;
    2)
        :
        ;;
esac
The /usr/share/arc/scan-SLURM-job script then needs to be modified to compare the CE-held proxy with the proxy in the running job's session directory, and to replace the latter if the CE copy is newer. At line 331 in ARC5:
PENDING|SUSPENDE|COMPLETING)
    #Job is running, nothing to do.
    ;;
RUNNING)
    jobfile="${basenames[$localid]}.local"
    sessiondir=`grep -h '^sessiondir=' $jobfile | sed 's/^sessiondir=\(.*\)/\1/'`
    liveproxy="${sessiondir}/user.proxy"
    ceproxy="${basenames[$localid]}.proxy"
    if [ "$liveproxy" -ot "$ceproxy" ]; then
        cp -f "$ceproxy" "$liveproxy"
    fi
    ;;
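With both changes in place, the proxy of a running job can then be refreshed from the client side with the standard ARC tools; the CE hostname and job ID below are hypothetical placeholders:
arcproxy    # create a fresh proxy locally
arcrenew gsiftp://arc-ce.example.org:2811/jobs/123456789    # push it to the running job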