Difference between revisions of "Glideinwms with arcce"
From GridPP Wiki
(3 intermediate revisions by one user not shown)
Line 30:
  </li>
+ <li> Edit /opt/condor-submit/config.d/03_gwms_local.config to keep an x509 proxy on the WN. At the end add: <br>
+ <pre>
+ use_x509userproxy = True
+ SUBMIT_EXPRS = $(SUBMIT_EXPRS) use_x509userproxy
+ DELEGATE_JOB_GSI_CREDENTIALS = False
+ </pre>
+ source /opt/condor-submit/condor.sh
+ condor_reconfig
+ </li>
+
+ <li>
+ Note the comments on /usr/share/arc/Condor.pm on the ARCCE [https://www.gridpp.ac.uk/wiki/Imperial_arc_ce_for_cloud page].
+ </li>
+ </ol>
+
+ = Testing and maintenance =
+
+ Check if the cloud is running jobs: <br>
+ <pre>
+ deathstar:root :~] cat is_the_cloud_working.sh
+ #!/bin/sh
+ source /opt/condor-submit/condor.sh
+ condor_q -global
+
+ NUM_JOBS=`condor_q -global | awk '{ print $6; }' | grep -c 'R'`
+ if [ $NUM_JOBS -gt 1 ]; then
+ echo "YES! Jobs are running."
+ else
+ echo "No, sorry, it isn't working."
+ fi
+ exit 0 # Don't source this script ;-)
+ </pre>
+
+ Clean up jobs in 'held' state: <br>
+ <pre>
+ deathstar:root :~] cat clean_held_jobs.sh
+ #!/bin/bash
+
+ CONDOR_CRAP=`condor_q -hold | grep cld | awk '{print $1}'`
+ for JOB_ID in $CONDOR_CRAP
+ do
+ echo "Removing job: " $JOB_ID
+ condor_rm $JOB_ID
+ # sleep 2
+ # condor_rm -forcex $JOB_ID
+ done
+ </pre>
+
+ Restart the ARC-CE: <br>
+ <pre>
+ #!/bin/sh
+
+ for i in gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii; do
+ /etc/init.d/${i} stop
+ done
+
+ sleep 2
+
+ for i in gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii; do
+ /etc/init.d/${i} start
+ sleep 1
+ done
+ </pre>
  <hr><br>
  Return to glidein [https://www.gridpp.ac.uk/wiki/Cloud_Work_at_Imperial overview page]. <br>
Latest revision as of 13:44, 12 June 2014
How to create a "Stealth Cloud"
(or how to get your ARCCE to submit to a cloud via a glideinWMS)
- Set up a glideinWMS, apart from the Submit module, which needs to be hosted by the ARCCE.
- Set up an ARC CE, e.g. as described here.
- On the ARCCE open the necessary ports (and restart iptables)
- Make a condor user:
[root@cetest02 opt]# groupadd condor
[root@cetest02 opt]# useradd -m -g condor condor
[root@cetest02 opt]# passwd -l condor
- Set up the Submit module from the glideinWMS on the ARCCE:
Note: The versions used on the ARCCE and the glideinWMS have to match ...
Currently I use condor-8.0.7-x86_64_RedHat6-unstripped.tar.gz and glideinWMS_v3_2_5.tgz
(as root) glideinWMS_v3_2_5.tgz
Unpack the glideinwms tarball and set its ownership to something sensible:
[root@cetest02 opt]# tar -zxvf /opt/tarballs/glideinWMS_v3_2_5.tgz; chown -R root:root glideinwms
(as root) /opt/glideinwms/install/manage-glideins --install submit --ini /opt/glideinwms-conf/condor.ini
- Edit /opt/condor-submit/config.d/03_gwms_local.config to keep an x509 proxy on the WN. At the end add:
use_x509userproxy = True
SUBMIT_EXPRS = $(SUBMIT_EXPRS) use_x509userproxy
DELEGATE_JOB_GSI_CREDENTIALS = False
Then reload the Condor configuration:
source /opt/condor-submit/condor.sh
condor_reconfig
- Note the comments on /usr/share/arc/Condor.pm on the ARCCE page.
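A quick way to confirm that the three x509 proxy settings from the steps above are present in a config file is a grep-based check. The following is only a sketch: `missing_proxy_settings` is a hypothetical helper (not part of glideinWMS or HTCondor), and the demo runs against a temporary sample file rather than the real /opt/condor-submit/config.d/03_gwms_local.config.

```sh
#!/bin/sh
# Hypothetical helper: print the name of every expected setting that has
# no "name = value" line in the given config file.
missing_proxy_settings() {
    cfg="$1"
    for key in use_x509userproxy SUBMIT_EXPRS DELEGATE_JOB_GSI_CREDENTIALS; do
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$cfg" || echo "$key"
    done
}

# Demo on a temporary sample file holding the fragment from this page.
# The quoted 'EOF' keeps $(SUBMIT_EXPRS) literal instead of running it.
sample=$(mktemp)
cat > "$sample" <<'EOF'
use_x509userproxy = True
SUBMIT_EXPRS = $(SUBMIT_EXPRS) use_x509userproxy
DELEGATE_JOB_GSI_CREDENTIALS = False
EOF

missing=$(missing_proxy_settings "$sample")
if [ -z "$missing" ]; then
    echo "all three settings present"
else
    echo "missing: $missing"
fi
rm -f "$sample"
```

Pointing the function at the real config file instead of the sample would list any setting that still needs to be added. It only checks that a line exists, not that Condor has picked it up.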
Testing and maintenance
Check if the cloud is running jobs:
deathstar:root :~] cat is_the_cloud_working.sh
#!/bin/sh
source /opt/condor-submit/condor.sh
condor_q -global

NUM_JOBS=`condor_q -global | awk '{ print $6; }' | grep -c 'R'`
if [ $NUM_JOBS -gt 1 ]; then
    echo "YES! Jobs are running."
else
    echo "No, sorry, it isn't working."
fi
exit 0 # Don't source this script ;-)
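The check above depends on the sixth column of the queue listing. As a sketch of a stricter count, the hypothetical helper `count_running` below counts only lines whose sixth field is exactly `R`. The sample listing is illustrative and its layout is an assumption; real `condor_q` output may differ between versions.

```sh
#!/bin/sh
# Hypothetical helper: count lines on stdin whose sixth field is exactly "R".
count_running() {
    awk '$6 == "R" { n++ } END { print n + 0 }'
}

# Illustrative sample only; real condor_q output may be laid out differently.
count_running <<'EOF'
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
 12.0    cld001   6/12 13:01   0+00:10:02 R  0   0.0  glidein_startup.sh
 13.0    cld001   6/12 13:02   0+00:00:00 I  0   0.0  glidein_startup.sh
 14.0    cld002   6/12 13:05   0+00:03:41 R  0   0.0  glidein_startup.sh
EOF
```

On this sample the helper prints 2. The header's sixth field is `PRI`, which a plain `grep -c 'R'` would also count, so an exact match avoids depending on how many header lines appear. In real use the function would read from `condor_q -global`.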
Clean up jobs in 'held' state:
deathstar:root :~] cat clean_held_jobs.sh
#!/bin/bash

CONDOR_CRAP=`condor_q -hold | grep cld | awk '{print $1}'`
for JOB_ID in $CONDOR_CRAP
do
    echo "Removing job: " $JOB_ID
    condor_rm $JOB_ID
    # sleep 2
    # condor_rm -forcex $JOB_ID
done
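Before removing anything, it can help to see which job IDs the `grep cld` filter would select. The sketch below uses a hypothetical helper, `held_job_ids`, that keeps the same line-matching idea but only accepts a first field that looks like a cluster.proc ID. The sample listing, including the host name and address, is invented for illustration, and the real `condor_q -hold` layout may differ.

```sh
#!/bin/sh
# Hypothetical helper: from condor_q-style text on stdin, print the first
# field of lines that match the pattern and start with a cluster.proc ID.
held_job_ids() {
    awk -v pat="$1" '$0 ~ pat && $1 ~ /^[0-9]+\.[0-9]+$/ { print $1 }'
}

# Illustrative sample only; host name, address and reasons are made up.
held_job_ids cld <<'EOF'
-- Schedd: cld-submit.example.org : <192.0.2.10:9615>
 ID      OWNER    HELD_SINCE  HOLD_REASON
 21.0    cld001   6/12 09:14  Error from glidein
 22.3    alice    6/12 09:20  Error from starter
 23.0    cld002   6/12 10:02  Error from glidein
EOF
```

On this sample the helper prints 21.0 and 23.0. The schedd line also contains "cld", but its first field is not a job ID, so it is skipped. In real use the function would read from `condor_q -hold`, and each printed ID could be reviewed before it is passed to `condor_rm`.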
Restart the ARC-CE:
#!/bin/sh

for i in gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii; do
    /etc/init.d/${i} stop
done

sleep 2

for i in gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii; do
    /etc/init.d/${i} start
    sleep 1
done
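The restart sequence can be previewed without touching any service. The sketch below wraps each command in a hypothetical `run` function that only prints it unless `DRY_RUN=0` is set. The `DRY_RUN` variable and both function names are assumptions made for this example. The service names and their order are taken from the script above.

```sh
#!/bin/sh
# Service names and order as in the restart script on this page.
SERVICES="gridftpd a-rex nordugrid-arc-slapd nordugrid-arc-bdii"

# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Stop every service, pause, then start them again one by one.
restart_arc() {
    for s in $SERVICES; do
        run "/etc/init.d/$s" stop
    done
    run sleep 2
    for s in $SERVICES; do
        run "/etc/init.d/$s" start
        run sleep 1
    done
}

restart_arc
```

Run as-is, this only lists the commands in the order they would execute. Setting `DRY_RUN=0` (as root) would actually run them.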
Return to glidein overview page.