Imperial ARC-CE for Cloud
Setting up an ARC-CE as a cloud frontend
== Chapter 1: Installing an ARC-CE ==
0) Find and read the "ARC Computing Element System Administrator Guide".
1) Install a base CentOS 6 machine.
2) Enable puppet on the machine; I'm using the CREAM CE profile for now...
- service autofs restart / reboot is probably good now.
- yum update
3) Ensure the machine can submit to the batch system & has all of the users.
4) Enable the required repos
- cd /etc/yum.repos.d/
- wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo
- Check the file looks reasonable! (cat EGI-trustanchors.repo)
- yum -y install http://www.mirrorservice.org/sites/dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
- yum -y install http://download.nordugrid.org/packages/nordugrid-release/releases/13.11/centos/el6/x86_64/nordugrid-release-13.11-1.el6.noarch.rpm
- yum -y install http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6/x86_64/base/emi-release-3.0.0-2.el6.noarch.rpm
- Edit nordugrid* & epel* and give them all priority=98 (a scripted version is sketched below)
- Edit emi* and remove all protect=1 lines
- cd
- yum update (nothing should happen; be wary of anything being updated)
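The repo-file edits can be scripted with GNU sed; a sketch, assuming none of the files already carry priority= lines (check the results by hand afterwards):

# add priority=98 under every [section] header of the NorduGrid & EPEL repo files
sed -i '/^\[/a priority=98' /etc/yum.repos.d/nordugrid*.repo /etc/yum.repos.d/epel*.repo
# strip the protect=1 lines from the EMI repo files
sed -i '/^protect=1/d' /etc/yum.repos.d/emi*.repo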
5) Install the CA & machine hostcert
- yum -y install ca-policy-egi-core fetch-crl
- chkconfig fetch-crl-cron on
- service fetch-crl-cron start
- mv host{cert,key}.pem /etc/grid-security/
- restorecon -v /etc/grid-security/*.pem
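The gridftpd and A-REX are fussy about key permissions; the usual ownership and modes (worth double-checking against your local policy) are:

chown root:root /etc/grid-security/host{cert,key}.pem
chmod 644 /etc/grid-security/hostcert.pem
chmod 400 /etc/grid-security/hostkey.pem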
6) Install the Argus client LCAS & LCMAPS modules
- yum -y install lcmaps-plugins-c-pep lcas-plugins-voms lcas lcmaps lcmaps-plugins-verify-proxy lcmaps-plugins-basic
- Copy in the LCMAPS config file to /etc/lcmaps/lcmaps.db (see Appendix B)
7) Actually install the CE package
- yum -y install nordugrid-arc-compute-element
8) Set-up the file systems
- /var/spool/arc should probably be its own volume.
- mkdir /var/spool/arc / mount / ...
- mkdir -p /var/spool/arc/{control,session}
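For example, assuming a dedicated volume (/dev/vg0/arcspool is a hypothetical name; substitute your own device):

mkfs.ext4 /dev/vg0/arcspool
mkdir /var/spool/arc
echo '/dev/vg0/arcspool /var/spool/arc ext4 defaults 0 2' >> /etc/fstab
mount /var/spool/arc
mkdir -p /var/spool/arc/{control,session}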
9) Configure the CE
- Edit /etc/arc.conf (Example at the end of this document)
10) Set up the VOMS trust files for the supported VOs and other authentication
- e.g. mkdir -p /etc/grid-security/vomsdir/dteam
- Create /etc/grid-security/vomsdir/dteam/voms.hellasgrid.gr.lsc ...
- touch /etc/grid-security/grid-mapfile
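An .lsc file holds the subject DN of the VOMS server certificate followed by the DN of its issuing CA; for voms.hellasgrid.gr it should look something like this (DNs quoted from memory, so verify them against the EGI operations portal):

/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006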
11) Create some users for the CE to use
- Create a users.conf/groups.conf (format sketched after this step).
- yum -y install glite-yaim-core
- Create a simple siteinfo.def (example below).
- /opt/glite/yaim/bin/yaim -s siteinfo.def -f config_users -r
- mkdir /etc/grid-security/gridmapdir
- for USR in `getent passwd | grep '^cld' | awk -F: '{ print $1 }'`; do touch /etc/grid-security/gridmapdir/${USR}; done
# siteinfo.def
CONFIG_USERS=yes
USERS_CONF=users.conf
GROUPS_CONF=groups.conf
VOS="cms dteam ops vo.londongrid.ac.uk"
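YAIM's users.conf takes one account per line (UID:LOGIN:GID:GROUP:VO:FLAG:) with the matching FQANs in groups.conf; illustrative dteam entries (made-up UIDs/GIDs, account names chosen to match the cld prefix used above):

# users.conf
60001:cld001:6000:dteam:dteam::
60002:cld002:6000:dteam:dteam::

# groups.conf
"/dteam"::::
"/dteam/ROLE=lcgadmin":::sgm: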
12) Fix SELinux for the BDII
- semanage fcontext -a -t slapd_db_t "/var/run/arc/bdii(/.*)?"; restorecon -vR /var/run/arc
- semanage fcontext -a -t slapd_db_t "/var/run/bdii(/.*)?"; restorecon -vR /var/run/bdii/
13) Start the services (quick sanity checks below)
- chkconfig a-rex on; chkconfig gridftpd on; chkconfig grid-infosys on
- service a-rex start
- service gridftpd start
- service grid-infosys start
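A few quick sanity checks once everything is up (ports from the example config; the LDAP base assumes the GLUE 1.2 schema enabled in Appendix A):

service a-rex status
netstat -ltnp | egrep ':(2811|2135|8443)'
# the BDII can take a minute or two to populate
ldapsearch -x -h localhost -p 2135 -b 'o=grid' | head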
14) Open some firewall ports.
- The example config needs TCP ports 2811 (gridftpd), 2135 (LDAP infosys) & 8443 (A-REX) opening.
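A minimal iptables sketch matching the example config, including the globus_tcp_port_range from Appendix A for the gridftp data channels:

iptables -A INPUT -p tcp --dport 2811 -j ACCEPT          # gridftpd control channel
iptables -A INPUT -p tcp --dport 2135 -j ACCEPT          # LDAP information system
iptables -A INPUT -p tcp --dport 8443 -j ACCEPT          # A-REX web service interface
iptables -A INPUT -p tcp --dport 20000:24999 -j ACCEPT   # globus_tcp_port_range
service iptables save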
== Chapter 2: Cloud-specific tweaks ==
A few changes to the ARC-CE are required to cope with the peculiarities of the cloud.
The BDII needs some adjustments to /usr/share/arc/Condor.pm:
The number of available slots goes up and down with the number of VMs on-line. If/when this reaches zero the BDII crashes; this is fixed by changing Condor.pm (reported upstream).
- Condor.pm:243 my $qfactor = (condor_queue_get_nodes() + 1) / (condor_cluster_totalcpus() + 1);
Sometimes NumJobs > MaxQueuable, which causes the WMS list-match to fail. This is easy to fix by setting the maximum to 0, which disables the check:
- Condor.pm:491 $lrms_queue{maxqueuable} = 0;
== Chapter 3: Testing an ARC-CE ==
0) Find an ARC-UI (source /vols/grid/ui/arc/arcui.sh on our lx machines)
1) Create the files for a job:
- echo "echo Hello World
hostname
env" > hello.sh
- chmod +x hello.sh
- echo '&( executable = "hello.sh" ) <br/> ( stdout = "stdout" ) <br/> ( stderr = "stderr" ) <br/> ( inputfiles = ( "hello.sh" "" ) ) <br/> ( outputfiles = ( "stdout" "" ) ( "stderr" "" ) ) <br/> ( queue = "grid.q" ) <br/> ( jobname = "RSL Hello World job" )' > hello.rsl <br/>
2) Submit the job:
- arcproxy -S dteam
- arcsub -c ldap://cetest01.grid.hep.ph.ic.ac.uk hello.rsl
3) Check the logs:
- All the ARC logs are in /var/log/arc, apart from the per-job ones in /var/spool/arc/control (the controldir specified in arc.conf).
4) Wait for it to run & finish:
- arcstat -a
5) Get the output:
- arcget -a
== Appendix A: Example arc.conf ==
# Cloud Condor
[common]
x509_user_key="/etc/grid-security/hostkey.pem"
x509_user_cert="/etc/grid-security/hostcert.pem"
x509_cert_dir="/etc/grid-security/certificates"
gridmap="/etc/grid-security/grid-mapfile"
lrms="condor"
condor_bin_path="/opt/condor-submit/bin"
condor_config="/opt/condor-submit/etc/condor_config"
controldir="/srv/localstage/jobstatus"
shared_filesystem="no"

[grid-manager]
user="root"
controldir="/srv/localstage/jobstatus"
sessiondir="/srv/localstage/session"
debug="5"
logfile="/var/log/arc/grid-manager.log"
pidfile="/var/log/arc/grid-manager.pid"
mail="lcg-site-admin@ic.ac.uk"
joblog="/var/log/arc/gm-jobs.log"
arex_mount_point="https://cetest02.grid.hep.ph.ic.ac.uk:8443/arex"
runtimedir="/srv/rte"

# gridftp server config
[gridftpd]
user="root"
debug="5"
logfile="/var/log/arc/gridftpd.log"
pidfile="/var/run/gridftpd.pid"
port="2811"
allowunknown="yes"
globus_tcp_port_range="20000,24999"
globus_udp_port_range="20000,24999"
unixmap="* lcmaps liblcmaps.so /usr/lib64 /etc/lcmaps/lcmaps.db withvoms"

[group/users]
plugin="5 /usr/libexec/arc/arc-lcas %D %P liblcas.so /usr/lib64 /etc/lcas/lcas.db"

# job submission interface via gridftp
[gridftpd/jobs]
path="/jobs"
plugin="jobplugin.so"
allownew="yes"
groupcfg="users"

# openldap server config
[infosys]
user="root"
overwrite_config="yes"
port="2135"
debug="1"
registrationlog="/var/log/arc/inforegistration.log"
providerlog="/var/log/arc/infoprovider.log"
provider_loglevel="2"
infosys_glue12="enable"
infosys_glue2_ldap="enable"

[infosys/glue12]
resource_location="London, UK"
resource_longitude="-0.17897"
resource_latitude="51.49945"
glue_site_web="http://www.hep.ph.ic.ac.uk/e-science/"
glue_site_unique_id="UKI-LT2-IC-HEP"
cpu_scaling_reference_si00="2160"
processor_other_description="Cores=4,Benchmark=8.65-HEP-SPEC06"
provide_glue_site_info="false"

[infosys/admindomain]
name="UKI-LT2-IC-HEP"

# infosys view of the computing cluster (service)
[cluster]
cluster_alias="cetest02 (UKI-LT2-IC-HEP)"
comment="UKI-LT2-IC-HEP Main Grid Cluster"
homogeneity="True"
nodecpu="xeon"
architecture="x86_64"
nodeaccess="inbound"
nodeaccess="outbound"
opsys="CentOS"
nodememory="6000"
authorizedvo="cms"
authorizedvo="ops"
authorizedvo="dteam"
authorizedvo="vo.londongrid.ac.uk"
benchmark="SPECINT2000 2075"
benchmark="SPECFP2000 2075"
totalcpus=4

[queue/condor]
name="condor"
== Appendix B: Example lcmaps.db ==
path = /usr/lib64/lcmaps

good = "lcmaps_dummy_good.mod"

pepc = "lcmaps_c_pep.mod"
       "--pep-daemon-endpoint-url https://lt2argus00.grid.hep.ph.ic.ac.uk:8154/authz"
       "--resourceid http://authz-interop.org/xacml/resource/resource-type/wn"
       "--actionid http://glite.org/xacml/action/execute"
       "--capath /etc/grid-security/certificates"
       "--certificate /etc/grid-security/hostcert.pem"
       "--key /etc/grid-security/hostkey.pem"
       "--pep-certificate-mode explicit"

argus:
pepc -> good
== Appendix C: Example arcsh.sh ==
#!/bin/sh
source /vols/grid/wn/3.2.11-0/external/etc/profile.d/grid-env.sh
exec /bin/sh "$@"
== Appendix D: Example argus_mapper.sh ==
#!/bin/sh
# Ask the Argus PEP daemon for an authorisation decision on the proxy in $1
OUTPUT=`/usr/bin/pepcli -t 20 -p https://lt2argus00.grid.hep.ph.ic.ac.uk:8154/authz \
  -k "$1" \
  -r http://authz-interop.org/xacml/resource/resource-type/wn \
  -a http://glite.org/xacml/action/execute \
  --cert /etc/grid-security/hostcert.pem \
  --key /etc/grid-security/hostkey.pem \
  --capath /etc/grid-security/certificates`
RES=$?
if [ "$RES" -ne "0" ]; then
  exit $RES
fi
# Only proceed if Argus said Permit
echo "$OUTPUT" | grep -q '^Decision: Permit$'
if [ "$?" -ne "0" ]; then
  exit 1
fi
# Extract the mapped local user and group from the pepcli output
LUSER=`echo "$OUTPUT" | grep '^Username' | cut -d" " -f2`
if [ "$?" -ne "0" ]; then
  exit 2
fi
LGROUP=`echo "$OUTPUT" | grep '^Group' | cut -d" " -f2`
if [ "$?" -ne "0" ]; then
  exit 3
fi
echo $LUSER:$LGROUP
exit 0
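For reference, the script takes a proxy file as its single argument and prints the mapped account; a hypothetical invocation:

# /tmp/x509up_u500 is a hypothetical proxy path; expect output like cld001:dteam
./argus_mapper.sh /tmp/x509up_u500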
== Bonus Appendix: Building an ARC-UI package on EL5 & 6 ==
TOP=/path/to/tree
mkdir -p ${TOP}/src
cd ${TOP}/src
wget "http://www.globus.org/ftppub/gt5/5.2/5.2.3/installers/src/gt5.2.3-all-source-installer.tar.gz"
tar zxmvf gt5.2.3-all-source-installer.tar.gz
cd gt5.2.3-all-source-installer
./configure --prefix=${TOP}/arc
make
make install
cd ..
export LD_LIBRARY_PATH=${TOP}/lib64
export PATH=$PATH:${TOP}/bin
export PKG_CONFIG_PATH=${TOP}/lib64/pkgconfig
wget http://download.nordugrid.org/repos/12.05/redhat/el6/source/updates/SRPMS/nordugrid-arc-2.0.1-1.el6.src.rpm
rpm2cpio nordugrid-arc-2.0.1-1.el6.src.rpm | cpio -id
tar zxmvf nordugrid-arc-2.0.1.tar.gz
cd nordugrid-arc-2.0.1
./configure --disable-all --enable-all-clients --enable-hed --prefix=${TOP} --libdir=${TOP}/lib64
make
make install
cd ..
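A small environment script in the spirit of Appendix C makes the resulting client tree usable from a clean shell (a sketch mirroring the exports above; ARC_LOCATION helps the client tools find their plugins):

#!/bin/sh
# arcui-env.sh - source this to pick up the locally built ARC client
TOP=/path/to/tree
export ARC_LOCATION=${TOP}
export PATH=${TOP}/bin:${PATH}
export LD_LIBRARY_PATH=${TOP}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}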