Glideinwms with arcce

From GridPP Wiki

How to create a "Stealth Cloud"

(or how to get your ARCCE to submit to a cloud via a glideinWMS)

  1. Set up a glideinWMS, except for the Submit module, which must be hosted on the ARCCE.
  2. Set up an ARC CE, e.g. as described here.
  3. On the ARCCE open the necessary ports (and restart iptables)
  4. Make a condor user:
    [root@cetest02 opt]# groupadd condor
    [root@cetest02 opt]# useradd -m -g condor condor
    [root@cetest02 opt]# passwd -l condor
    
  5. Set up the Submit module from the glideinWMS on the ARCCE:
    Note: The versions used on the ARCCE and the glideinWMS have to match ...
    Currently I use condor-8.0.7-x86_64_RedHat6-unstripped.tar.gz and glideinWMS_v3_2_5.tgz
    (as root) Unpack the glideinWMS tarball (glideinWMS_v3_2_5.tgz) and set its ownership to something sensible:
    [root@cetest02 opt]# tar -zxvf /opt/tarballs/glideinWMS_v3_2_5.tgz; chown -R root:root glideinwms
    

    (as root) /opt/glideinwms/install/manage-glideins --install submit --ini /opt/glideinwms-conf/condor.ini

  6. Edit /opt/condor-submit/config.d/03_gwms_local.config so that an x509 proxy is kept on the worker node (WN). At the end of the file, add:
    use_x509userproxy = True
    SUBMIT_EXPRS = $(SUBMIT_EXPRS) use_x509userproxy
    DELEGATE_JOB_GSI_CREDENTIALS = False
    

    source /opt/condor-submit/condor.sh
    condor_reconfig

  7. Note the comments on /usr/share/arc/Condor.pm on the ARCCE page.
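
To confirm that the settings from step 6 were picked up after condor_reconfig, the running configuration can be queried with condor_config_val. The following is only a minimal sketch and not part of the original setup: the helper name has_x509_expr is hypothetical, and it assumes /opt/condor-submit/condor.sh has already been sourced so that the HTCondor tools are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given SUBMIT_EXPRS value lists
# use_x509userproxy as one of its space- or comma-separated entries.
has_x509_expr() {
  for expr in $(echo "$1" | tr ',' ' '); do
    [ "$expr" = "use_x509userproxy" ] && return 0
  done
  return 1
}

# Only query HTCondor when its tools are available (assumes condor.sh
# has been sourced beforehand).
if command -v condor_config_val >/dev/null 2>&1; then
  if has_x509_expr "$(condor_config_val SUBMIT_EXPRS)"; then
    echo "SUBMIT_EXPRS includes use_x509userproxy"
  else
    echo "SUBMIT_EXPRS is missing use_x509userproxy"
  fi
  # Print the value so it can be compared with the edit from step 6.
  condor_config_val DELEGATE_JOB_GSI_CREDENTIALS
fi
```

If the first message reports the entry as missing, the edited file was probably not read; re-check its path and run condor_reconfig again.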

Testing and maintenance

Check if the cloud is running jobs:

deathstar:root :~] cat is_the_cloud_working.sh 
#!/bin/sh
source /opt/condor-submit/condor.sh
condor_q -global

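# Column 6 is the job state (ST); match "R" exactly so that the
# header's "PRI" column title is not counted as a running job.
NUM_JOBS=$(condor_q -global | awk '$6 == "R" { n++ } END { print n + 0 }')
if [ "$NUM_JOBS" -ge 1 ]; then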
  echo "YES! Jobs are running."
else
  echo "No, sorry, it isn't working."
fi
exit 0 # Don't source this script ;-)
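
The check above depends on the column layout of the human-readable condor_q listing. A possible alternative, sketched here under assumptions, is to ask condor_q for the JobStatus attribute directly (the value 2 means "running") and count the matches. The helper name count_running is hypothetical, and whether -format combines cleanly with -global on this Condor version has not been verified.

```shell
#!/bin/sh
# Hypothetical helper: reads one JobStatus code per line on stdin
# and prints how many of them equal 2 (the code for "running").
count_running() {
  awk '$1 == 2 { n++ } END { print n + 0 }'
}

# Only query HTCondor when its tools are available (assumes
# /opt/condor-submit/condor.sh has been sourced beforehand).
if command -v condor_q >/dev/null 2>&1; then
  NUM_RUNNING=$(condor_q -global -format "%d\n" JobStatus | count_running)
  echo "Running jobs: $NUM_RUNNING"
fi
```

Keeping the counting in its own function means it can be tried on canned input, e.g. `printf '2\n5\n2\n' | count_running`, without a live pool.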

Clean up jobs in 'held' state:

deathstar:root :~] cat clean_held_jobs.sh 
#!/bin/bash

# IDs of held jobs whose condor_q line contains "cld"
HELD_JOBS=$(condor_q -hold | grep cld | awk '{print $1}')
for JOB_ID in $HELD_JOBS
do
  echo "Removing job: $JOB_ID"
  condor_rm "$JOB_ID"
  # sleep 2
  # condor_rm -forcex "$JOB_ID"
done
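
Instead of looping over job IDs, condor_rm can also take a ClassAd constraint and remove all matching jobs in one call. The following is a sketch under assumptions: it supposes that the "cld" string matched by grep above appears in the job Owner, which the original script does not state, and both the helper name held_constraint and the REALLY_REMOVE switch are hypothetical. It only prints the constraint unless REALLY_REMOVE=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: builds a ClassAd constraint that selects held
# jobs (JobStatus == 5) whose Owner matches the given pattern.
held_constraint() {
  printf 'JobStatus == 5 && regexp("%s", Owner)\n' "$1"
}

CONSTRAINT=$(held_constraint "cld")
echo "Would remove jobs matching: $CONSTRAINT"

# Dry run by default; set REALLY_REMOVE=1 to act (hypothetical switch).
if [ "${REALLY_REMOVE:-0}" = "1" ] && command -v condor_rm >/dev/null 2>&1; then
  condor_rm -constraint "$CONSTRAINT"
fi
```

The same constraint can be passed to condor_q -constraint first to see which jobs would be affected.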

Restart the ARC-CE:

#!/bin/sh

for i in gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii; do
  /etc/init.d/${i} stop
done

sleep 2

for i in gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii; do
  /etc/init.d/${i} start
  sleep 1 
done
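
After a restart it can be useful to check that every service actually came back. The following sketch assumes the init scripts accept a "status" argument and return 0 when the service is running, which is the usual convention but is not confirmed by this page. The helper name failed_services and the INIT_DIR override are hypothetical.

```shell
#!/bin/sh
# Same service list as the restart script above.
SERVICES="gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii"
# Directory holding the init scripts; overridable for testing.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Hypothetical helper: prints the names of services whose init script
# does not return 0 for "status", separated by spaces.
failed_services() {
  failed=""
  for svc in $SERVICES; do
    if ! "$INIT_DIR/$svc" status >/dev/null 2>&1; then
      failed="$failed $svc"
    fi
  done
  echo "${failed# }"
}

# Report only on a host that actually has the ARC init scripts.
if [ -x "$INIT_DIR/a-rex" ]; then
  FAILED=$(failed_services)
  if [ -n "$FAILED" ]; then
    echo "Not running: $FAILED"
  else
    echo "All ARC services report a running status."
  fi
fi
```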

