Imperial ARC CE for cloud

Setting up an ARC-CE as a cloud frontend

== Chapter 1: Installing an ARC-CE ==

0) Find and read the "ARC Computing Element System Administrator Guide".
1) Install a base CentOS6 machine.
2) Enable puppet on the machine; I'm using the CREAM CE profile for now...

   - service autofs restart (or a reboot) is probably a good idea now.
   - yum update

3) Ensure the machine can submit to the batch system & has all of the users.
4) Enable the required repos

   - cd /etc/yum.repos.d/
   - wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo
   - Check the file looks reasonable! (cat EGI-trustanchors.repo)
   - yum -y install http://www.mirrorservice.org/sites/dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
   - yum -y install http://download.nordugrid.org/packages/nordugrid-release/releases/13.11/centos/el6/x86_64/nordugrid-release-13.11-1.el6.noarch.rpm
   - yum -y install http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6/x86_64/base/emi-release-3.0.0-2.el6.noarch.rpm
   - Edit the nordugrid* & epel* repo files and give them all priority=98
   - Edit the emi* repo files and remove all protect=1 lines
   - cd
   - yum update (Nothing should happen, be wary of anything being updated)

5) Install the CA & machine hostcert

   - yum -y install ca-policy-egi-core fetch-crl
   - chkconfig fetch-crl-cron on
   - service fetch-crl-cron start
   - mv host{cert,key}.pem /etc/grid-security/
   - restorecon -v /etc/grid-security/*.pem

6) Install the Argus client plugin and the LCAS & LCMAPS modules

   - yum -y install lcmaps-plugins-c-pep lcas-plugins-voms lcas lcmaps lcmaps-plugins-verify-proxy lcmaps-plugins-basic lcas-plugins-basic lcmaps-plugins-voms
   - Copy in the LCMAPS & LCAS config files from a CREAM-CE to /etc/lcmaps/lcmaps.db & /etc/lcas/lcas.db (a minimal lcas.db sketch follows this list; Appendix B has an example lcmaps.db).
   - Copy /etc/grid-security/{groupmapfile,grid-mapfile} from another CE, updating as appropriate.
   - touch /etc/lcas/ban_users.db
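
If there is no CREAM-CE to copy from, a minimal lcas.db along the following lines should be enough to get started. This is a sketch rather than our production file; it enables only the user-ban plugin, which reads the ban_users.db touched above:

# lcas.db
pluginname=lcas_userban.mod,pluginargs=ban_users.db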

7) Actually install the CE package

   - yum -y install nordugrid-arc-compute-element

8) Set up the file systems

   - /var/spool/arc should probably be its own volume.
   - mkdir /var/spool/arc, mount the volume, ...
   - mkdir -p /var/spool/arc/{control,session}

9) Configure the CE

   - Edit /etc/arc.conf (example in Appendix A at the end of this document)

10) Set up the VOMS LSC files for the supported VOs and other authentication

   - e.g. mkdir -p /etc/grid-security/vomsdir/dteam
   -      Create /etc/grid-security/vomsdir/dteam/voms.hellasgrid.gr.lsc ... (an example .lsc file is sketched below)
   - touch /etc/grid-security/grid-mapfile
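
An .lsc file just contains the DN of the VOMS server's host certificate followed by the DN of the CA that issued it, one per line. The DNs below are illustrative; take the current values from the VO's ID card:

# /etc/grid-security/vomsdir/dteam/voms.hellasgrid.gr.lsc
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006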

11) Create some users for the CE to use

   - Create a users.conf and groups.conf (example line formats are sketched after this step).
   - yum -y install glite-yaim-core
   - Create a simple siteinfo.def (example below).
   - /opt/glite/yaim/bin/yaim -s siteinfo.def -f config_users -r
   - mkdir /etc/grid-security/gridmapdir
   - for USR in `getent passwd | grep '^cld' | awk -F: '{ print $1 }'`; do touch /etc/grid-security/gridmapdir/${USR}; done
# siteinfo.def
CONFIG_USERS=yes
USERS_CONF=users.conf
GROUPS_CONF=groups.conf
VOS="cms dteam ops vo.londongrid.ac.uk"

12) Populate /etc/grid-security/vomsdir with required VOs (it's easiest to copy them from another CE).

13) Fix SELinux for the BDII

   - semanage fcontext -a -t slapd_db_t "/var/run/arc/bdii(/.*)?"; restorecon -vR /var/run/arc
   - semanage fcontext -a -t slapd_db_t "/var/run/bdii(/.*)?"; restorecon -vR /var/run/bdii/

14) Set the services going

   - chkconfig a-rex on; chkconfig gridftpd on
   - chkconfig nordugrid-arc-slapd on; chkconfig nordugrid-arc-bdii on
   - service a-rex start
   - service gridftpd start
   - service nordugrid-arc-slapd start
   - service nordugrid-arc-bdii start

15) Open some firewall ports.

   - The example config needs TCP ports 2811, 2135 & 8443 to be opened (see the iptables sketch below).
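
With plain iptables on an EL6 host something like the following would do; adapt it to however the site firewall is actually managed. The last rule also opens the globus_tcp_port_range from the example arc.conf for the gridftp data channels:

iptables -I INPUT -p tcp --dport 2811 -j ACCEPT          # gridftpd (job submission)
iptables -I INPUT -p tcp --dport 2135 -j ACCEPT          # LDAP information system
iptables -I INPUT -p tcp --dport 8443 -j ACCEPT          # A-REX web service interface
iptables -I INPUT -p tcp --dport 20000:24999 -j ACCEPT   # globus_tcp_port_range (data channels)
service iptables save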

16) Create some RTEs:

   - mkdir -p /srv/rte/ENV
   - touch /srv/rte/ENV/{GLITE,PROXY}
   - chmod +x /srv/rte/ENV/*
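
The files above are left empty on purpose; they only make ENV/GLITE and ENV/PROXY appear in the information system. For reference, if an RTE ever needs to do real work, A-REX sources it with a stage argument (0 during job preparation on the CE, 1 on the node just before the job, 2 just after), so a hypothetical non-empty ENV/GLITE might look like:

#!/bin/sh
# Hypothetical RTE content: pull in the glite WN environment just before the job runs.
case "$1" in
  1) . /vols/grid/wn/3.2.11-0/external/etc/profile.d/grid-env.sh ;;
esac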

17) Enable the accounting:

   - yum -y install apel-client
   - Edit /usr/libexec/arc/ssmsend: on line 136 (the Ssm2 constructor call), add a parameter use_ssl=_use_ssl.
   - echo "10 6 * * * root /usr/libexec/arc/jura /srv/localstage/jobstatus &> /var/log/arc/jura.log" > /etc/cron.d/jura

== Chapter 2: Cloud-specific tweaks ==

A few changes to the ARC-CE are required to cope with the peculiarities of the cloud; in particular the BDII needs some adjustments to /usr/share/arc/Condor.pm.

The number of available slots goes up and down with the number of VMs on-line, and if/when it reaches zero the BDII crashes. This is fixed by changing Condor.pm so that the slot calculation can no longer divide by zero (reported here):

 - Condor.pm:243 my $qfactor = (condor_queue_get_nodes() + 1) / (condor_cluster_totalcpus() + 1);

Sometimes NumJobs > MaxQueuable, which causes the WMS list-match to fail. This is easy to fix by setting the maximum to 0, which disables the check:

 - Condor.pm:491 $lrms_queue{maxqueuable} = 0;

== Chapter 3: Testing an ARC-CE ==

0) Find an ARC-UI (source /vols/grid/ui/arc/arcui.sh on our lx machines)
1) Create the files for a job:

   - echo "echo Hello World 
hostname
env" > hello.sh
- chmod +x hello.sh
    - echo '&( executable = "hello.sh" ) <br/>
             ( stdout = "stdout" ) <br/>
             ( stderr = "stderr" ) <br/>
             ( inputfiles = ( "hello.sh" "" ) ) <br/>
             ( outputfiles = ( "stdout" "" ) ( "stderr" "" ) ) <br/>
             ( queue = "grid.q" ) <br/>
             ( jobname = "RSL Hello World job" )' > hello.rsl <br/>
    

2) Submit the job:

   - arcproxy -S dteam
   - arcsub -c ldap://cetest01.grid.hep.ph.ic.ac.uk hello.rsl

3) Check the logs:

   - All the ARC logs are in /var/log/arc.
   - Apart from the per-job ones in /var/spool/arc/control (i.e. whatever controldir is set to in arc.conf).

4) Wait for it to run & finish:

   - arcstat -a 

5) Get the output:

   - arcget -a 


== Appendix A: Example arc.conf ==


# Cloud Condor

[common]
x509_user_key="/etc/grid-security/hostkey.pem"
x509_user_cert="/etc/grid-security/hostcert.pem"
x509_cert_dir="/etc/grid-security/certificates"
gridmap="/etc/grid-security/grid-mapfile"
lrms="condor"
condor_bin_path="/opt/condor-submit/bin"
condor_config="/opt/condor-submit/etc/condor_config"
controldir="/srv/localstage/jobstatus"
shared_filesystem="no"

[grid-manager]
user="root"
controldir="/srv/localstage/jobstatus"
sessiondir="/srv/localstage/session"
debug="5"
logfile="/var/log/arc/grid-manager.log"
pidfile="/var/log/arc/grid-manager.pid"
mail="lcg-site-admin@ic.ac.uk"
joblog="/var/log/arc/gm-jobs.log"
arex_mount_point="https://cetest02.grid.hep.ph.ic.ac.uk:8443/arex"
runtimedir="/srv/rte"

# gridftp server config
[gridftpd]
user="root"
debug="5"
logfile="/var/log/arc/gridftpd.log"
pidfile="/var/run/gridftpd.pid"
port="2811"
allowunknown="yes"
globus_tcp_port_range="20000,24999"
globus_udp_port_range="20000,24999"
unixmap="* lcmaps liblcmaps.so /usr/lib64 /etc/lcmaps/lcmaps.db withvoms"

[group/users]
plugin="5 /usr/libexec/arc/arc-lcas %D %P liblcas.so /usr/lib64 /etc/lcas/lcas.db"

# job submission interface via gridftp
[gridftpd/jobs]
path="/jobs"
plugin="jobplugin.so"
allownew="yes"
groupcfg="users"

# openldap server config
[infosys]
user="root"
overwrite_config="yes"
port="2135"
debug="1"
registrationlog="/var/log/arc/inforegistration.log"
providerlog="/var/log/arc/infoprovider.log"
provider_loglevel="2"
infosys_glue12="enable"
infosys_glue2_ldap="enable"

[infosys/glue12]
resource_location="London, UK"
resource_longitude="-0.17897"
resource_latitude="51.49945"
glue_site_web="http://www.hep.ph.ic.ac.uk/e-science/"
glue_site_unique_id="UKI-LT2-IC-HEP"
cpu_scaling_reference_si00="2160"
processor_other_description="Cores=4,Benchmark=8.65-HEP-SPEC06"
provide_glue_site_info="false"

[infosys/admindomain]
name="UKI-LT2-IC-HEP"

# infosys view of the computing cluster (service)
[cluster]
cluster_alias="cetest02 (UKI-LT2-IC-HEP)"
comment="UKI-LT2-IC-HEP Main Grid Cluster"
homogeneity="True"
nodecpu="xeon"
architecture="x86_64"
nodeaccess="inbound"
nodeaccess="outbound"
opsys="CentOS"
nodememory="6000"
authorizedvo="cms"
authorizedvo="ops"
authorizedvo="dteam"
authorizedvo="vo.londongrid.ac.uk"
benchmark="SPECINT2000 2075"
benchmark="SPECFP2000 2075"
totalcpus=4

[queue/condor]
name="condor"

== Appendix B: Example lcmaps.db ==

path = /usr/lib64/lcmaps 

good = "lcmaps_dummy_good.mod"

pepc        = "lcmaps_c_pep.mod"
              "--pep-daemon-endpoint-url https://lt2argus00.grid.hep.ph.ic.ac.uk:8154/authz"
              "--resourceid http://authz-interop.org/xacml/resource/resource-type/wn"
              "--actionid http://glite.org/xacml/action/execute"
              "--capath /etc/grid-security/certificates"
              "--certificate /etc/grid-security/hostcert.pem"
              "--key /etc/grid-security/hostkey.pem"
              "--pep-certificate-mode explicit"

argus:
pepc -> good


== Appendix C: Example arcsh.sh ==

#!/bin/sh
source /vols/grid/wn/3.2.11-0/external/etc/profile.d/grid-env.sh
exec /bin/sh "$@"

== Appendix D: Example argus_mapper.sh ==

#!/bin/sh
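# Ask the Argus PEP daemon for an authorisation decision on the credential passed
# as $1; if the decision is Permit, print the mapped local account as "user:group".
# Exits non-zero if pepcli fails, the request is denied, or no mapping comes back.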
OUTPUT=`/usr/bin/pepcli -t 20 -p https://lt2argus00.grid.hep.ph.ic.ac.uk:8154/authz -k "$1" -r http://authz-interop.org/xacml/resource/resource-type/wn -a http://glite.org/xacml/action/execute --cert /etc/grid-security/hostcert.pem --key /etc/grid-security/hostkey.pem --capath /etc/grid-security/certificates`
RES=$?
if [ "$RES" -ne "0" ]; then exit $RES; fi

echo "$OUTPUT" | grep -q '^Decision: Permit$'
if [ "$?" -ne "0" ]; then
  exit 1
fi
LUSER=`echo "$OUTPUT" | grep '^Username' | cut -d" " -f2`
if [ -z "$LUSER" ]; then
  exit 2
fi
LGROUP=`echo "$OUTPUT" | grep '^Group' | cut -d" " -f2`
if [ -z "$LGROUP" ]; then
  exit 3
fi
echo $LUSER:$LGROUP
exit 0

== Bonus Appendix: Building an ARC-UI package on EL5 & 6 ==


TOP=/path/to/tree
mkdir -p ${TOP}/src
cd ${TOP}/src
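# Fetch and build the Globus Toolkit 5.2.3 from source (installed under ${TOP}/arc)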
wget "http://www.globus.org/ftppub/gt5/5.2/5.2.3/installers/src/gt5.2.3-all-source-installer.tar.gz"
tar zxmvf gt5.2.3-all-source-installer.tar.gz
cd gt5.2.3-all-source-installer
./configure --prefix=${TOP}/arc
make
make install
cd ..
export LD_LIBRARY_PATH=${TOP}/lib64
export PATH=$PATH:${TOP}/bin
export PKG_CONFIG_PATH=${TOP}/lib64/pkgconfig
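# Fetch the NorduGrid ARC 2.0.1 source RPM and build just the HED libraries and client tools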
wget http://download.nordugrid.org/repos/12.05/redhat/el6/source/updates/SRPMS/nordugrid-arc-2.0.1-1.el6.src.rpm
rpm2cpio nordugrid-arc-2.0.1-1.el6.src.rpm | cpio -id
tar zxmvf nordugrid-arc-2.0.1.tar.gz
cd nordugrid-arc-2.0.1
./configure --disable-all --enable-all-clients --enable-hed --prefix=${TOP} --libdir=${TOP}/lib64
make
make install
cd ..



Return to glidein overview page.