Difference between revisions of "ARC HTCondor Basic Install"

From GridPP Wiki

Revision as of 20:47, 8 May 2014

This page explains how to set up a minimal ARC CE and HTCondor pool. To keep things as simple as possible, the CE, HTCondor central manager and worker node are all set up on a single machine.

Prerequisites

Prepare a Scientific Linux 6 (SL6) VM with a host certificate.

ARC CE installation

YUM repository configuration for EPEL and NorduGrid:

rpm -Uvh https://anorien.csc.warwick.ac.uk/mirrors/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -Uvh http://download.nordugrid.org/packages/nordugrid-release/releases/13.11/centos/el6/x86_64/nordugrid-release-13.11-1.el6.noarch.rpm

Install the ARC CE meta-package:

yum install nordugrid-arc-compute-element

HTCondor installation

Set up the YUM repository:

cd /etc/yum.repos.d/
wget http://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel6.repo

Install the most recent stable version of HTCondor:

yum install condor

HTCondor configuration

Configure HTCondor to use partitionable slots. Create a file /etc/condor/config.d/00-slots containing the following:

NUM_SLOTS = 1
SLOT_TYPE_1               = cpus=100%,mem=100%,auto
NUM_SLOTS_TYPE_1          = 1
SLOT_TYPE_1_PARTITIONABLE = TRUE

Start HTCondor by running:

service condor start

Check that HTCondor is working correctly:

[root@lcgvm21 ~]# condor_status -any
MyType             TargetType         Name                                     
Collector          None               Personal Condor at lcgvm21.gridpp.rl.ac.u
Scheduler          None               lcgvm21.gridpp.rl.ac.uk                  
DaemonMaster       None               lcgvm21.gridpp.rl.ac.uk                  
Negotiator         None               lcgvm21.gridpp.rl.ac.uk                  
Machine            Job                slot1@lcgvm21.gridpp.rl.ac.uk

Usually the 'Collector' and 'Negotiator' would be running on a machine designated as the central manager, and the 'Scheduler' would be running on the CE. Here 'Machine' corresponds to a resource able to run jobs, i.e. a worker node. In addition, every machine running HTCondor has a 'Master' daemon (listed above as 'DaemonMaster'), which takes care of all the other HTCondor daemons running on it.
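The check above can be automated. The following is a minimal sketch, not part of the original instructions: the function name missing_daemons is hypothetical, and it assumes the first column of the condor_status -any output is the MyType value, as in the listing above. It prints any expected daemon type that is absent and returns a non-zero status if something is missing.

```shell
# Hypothetical helper: reads `condor_status -any` output on stdin and
# reports which of the ad types shown in the listing above are missing.
missing_daemons() {
    awk '
        BEGIN { n = split("Collector Scheduler DaemonMaster Negotiator Machine", want, " ") }
        { seen[$1] = 1 }                      # first column is MyType
        END {
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) { print want[i]; missing = 1 }
            exit (missing ? 1 : 0)
        }
    '
}
```

It could be used as condor_status -any | missing_daemons, which prints nothing on the single-machine setup described here once all daemons have started.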

ARC CE configuration

Create the required control and session directories:

mkdir -p /var/spool/arc/jobstatus
mkdir -p /var/spool/arc/grid

Create a simple grid-mapfile for testing, /etc/grid-security/grid-mapfile, containing for example:

"/C=UK/O=eScience/OU=CLRC/L=RAL/CN=andrew lahiff" pcms001

Replace the DN and user id here as necessary.
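A malformed entry (for instance a DN without quotes) is an easy mistake to make here. As a rough sketch, the hypothetical function check_gridmap below flags any non-blank line that does not match the simple form shown above: a double-quoted DN starting with "/", followed by a local user name. Real grid-mapfiles may contain other constructs, so treat this only as a sanity check for the minimal test file.

```shell
# Hypothetical helper: prints lines of a grid-mapfile that do not look like
#   "<DN starting with />" <local user>
# and returns non-zero if any such line is found.
check_gridmap() {
    awk '
        /^[ \t]*$/ { next }                   # ignore blank lines
        !/^"\/[^"]+"[ \t]+[A-Za-z_][A-Za-z0-9_.-]*[ \t]*$/ {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, check_gridmap /etc/grid-security/grid-mapfile prints nothing and succeeds when every entry has the expected shape.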

Create a minimal configuration file /etc/arc.conf:

[common]
x509_user_key="/etc/grid-security/hostkey.pem"
x509_user_cert="/etc/grid-security/hostcert.pem"
x509_cert_dir="/etc/grid-security/certificates"
gridmap="/etc/grid-security/grid-mapfile"
lrms="condor"

[grid-manager]
user="root"
controldir="/var/spool/arc/jobstatus"
sessiondir="/var/spool/arc/grid"
runtimedir="/etc/arc/runtime"
logfile="/var/log/arc/grid-manager.log"
pidfile="/var/run/grid-manager.pid"
joblog="/var/log/arc/gm-jobs.log"
shared_filesystem="no"

[gridftpd]
user="root"
logfile="/var/log/arc/gridftpd.log"
pidfile="/var/run/gridftpd.pid"
port="2811"
allowunknown="no"

[gridftpd/jobs]
path="/jobs"
plugin="jobplugin.so"
allownew="yes"

[infosys]
user="root"
overwrite_config="yes"
port="2135"
registrationlog="/var/log/arc/inforegistration.log"
providerlog="/var/log/arc/infoprovider.log"

[cluster]
cluster_alias="MINIMAL Computing Element"
comment="This is a minimal out-of-box CE setup"
homogeneity="True"
architecture="adotf"
nodeaccess="outbound"
authorizedvo="cms"

[queue/grid]
name="grid"
homogeneity="True"
comment="Default queue"
nodecpu="adotf"
architecture="adotf"
defaultmemory="1000"
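Before starting the services it can be useful to confirm that the directories named in the configuration are the ones created earlier. The sketch below is an illustration only: arc_conf_get is a hypothetical helper, and it assumes the file has one key="value" option per line under [section] headers.

```shell
# Hypothetical helper: print the value of KEY in [SECTION] of an
# arc.conf-style file, with surrounding double quotes removed.
# usage: arc_conf_get FILE SECTION KEY
arc_conf_get() {
    awk -v section="[$2]" -v key="$3" '
        { sub(/[ \t\r]+$/, "") }              # drop trailing whitespace
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == key) {
                value = substr($0, eq + 1)
                gsub(/^"|"$/, "", value)      # strip the enclosing quotes
                print value
                exit
            }
        }
    ' "$1"
}
```

For instance, test -d "$(arc_conf_get /etc/arc.conf grid-manager controldir)" succeeds only if the configured control directory exists.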

Start the GridFTP server, A-REX service and LDAP information system:

service gridftpd start
service a-rex start
service nordugrid-arc-ldap-infosys start

The ARC CE and HTCondor pool are now ready.

Testing

From a standard UI, check the status of the newly-installed ARC CE:

-bash-4.1$ arcinfo -c lcgvm21.gridpp.rl.ac.uk
Computing service: MINIMAL Computing Element (production)
  Information endpoint: ldap://lcgvm21.gridpp.rl.ac.uk:2135/Mds-Vo-Name=local,o=grid
  Information endpoint: ldap://lcgvm21.gridpp.rl.ac.uk:2135/o=glue
  Submission endpoint: gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs (status: ok, interface: org.nordugrid.gridftpjob)

Note that it may take a few minutes for the information system to be available.
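Because of this delay, a script may want to wait until the CE reports a working submission endpoint. A minimal sketch, assuming the arcinfo output has the same layout as the example above (the function name submission_ready is hypothetical):

```shell
# Hypothetical helper: reads arcinfo output on stdin and succeeds if a
# submission endpoint is reported with "status: ok".
submission_ready() {
    grep -q 'Submission endpoint:.*status: ok'
}
```

A possible use is: until arcinfo -c lcgvm21.gridpp.rl.ac.uk | submission_ready; do sleep 30; done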

Try submitting a test job using arctest, which creates and submits predefined test jobs:

-bash-4.1$ arctest -c lcgvm21.gridpp.rl.ac.uk -J 1
Test submitted with jobid: gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/Um0NDmEkj2jnvMODjqAWcw5nABFKDmABFKDmOpOKDmABFKDmTCef4m

Check the status of the job. If you do this before the information system has been updated, you will see a response like this:

-bash-4.1$ arcstat gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/Um0NDmEkj2jnvMODjqAWcw5nABFKDmABFKDmOpOKDmABFKDmTCef4m
WARNING: Job information not found in the information system: gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/Um0NDmEkj2jnvMODjqAWcw5nABFKDmABFKDmOpOKDmABFKDmTCef4m
WARNING: This job was very recently submitted and might not yet have reached the information system
No jobs

When the job has finished running you should see this:

-bash-4.1$ arcstat gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/Um0NDmEkj2jnvMODjqAWcw5nABFKDmABFKDmOpOKDmABFKDmTCef4m
Job: gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/Um0NDmEkj2jnvMODjqAWcw5nABFKDmABFKDmOpOKDmABFKDmTCef4m
 Name: arctest1
 State: Finished (FINISHED)
 Exit Code: 0
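Rather than re-running arcstat by hand, the two cases above can be handled in a small polling loop. This is a sketch under stated assumptions: the function names job_state and wait_for_job are hypothetical, the parsing relies on the "State:" line format shown above, and the set of states treated as final (Finished, Failed, Killed, Deleted) is an assumption about what arcstat reports.

```shell
# Hypothetical helper: reads arcstat output on stdin and prints the job
# state (e.g. "Finished"); prints nothing if no State: line is present,
# as in the "No jobs" response shown earlier.
job_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# Hypothetical helper: poll arcstat until the job reaches a final state.
# usage: wait_for_job JOBID [MAX_TRIES]
wait_for_job() {
    jobid=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        state=$(arcstat "$jobid" 2>/dev/null | job_state)
        case "$state" in
            Finished|Failed|Killed|Deleted)   # assumed final states
                echo "$state"
                return 0 ;;
        esac
        sleep 10
        tries=$((tries - 1))
    done
    return 1
}
```

Calling wait_for_job with the job ID returned by arctest prints the final state once the information system has caught up, or returns non-zero after the given number of attempts.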

The job's output can also be obtained easily:

-bash-4.1$ arcget gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/Um0NDmEkj2jnvMODjqAWcw5nABFKDmABFKDmOpOKDmABFKDmTCef4m
Results stored at: Um0NDmEkj2jnvMODjqAWcw5nABFKDmABFKDmOpOKDmABFKDmTCef4m
Jobs processed: 1, successfully retrieved: 1, successfully cleaned: 1

Next, submit a custom job. Create a job description file, e.g. test.xrsl, containing:

&(executable="test.sh")
(stdout="test.out")
(stderr="test.err")
(jobname="ARC-HTCondor test")

and create an executable test.sh, for example:

#!/bin/sh
printenv
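The two files above could be created from the shell as follows. This is only a convenience sketch: the function name write_test_job and its arguments are hypothetical, and the file contents are exactly those listed above. Note the chmod, which makes test.sh executable.

```shell
# Hypothetical helper: write the job description and the test script
# into DIR (default: current directory).
# usage: write_test_job [DIR] [XRSL_NAME]
write_test_job() {
    dir=${1:-.}
    xrsl=${2:-test.xrsl}

    cat > "$dir/$xrsl" <<'EOF'
&(executable="test.sh")
(stdout="test.out")
(stderr="test.err")
(jobname="ARC-HTCondor test")
EOF

    cat > "$dir/test.sh" <<'EOF'
#!/bin/sh
printenv
EOF
    chmod +x "$dir/test.sh"
}
```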

Submit the job:

-bash-4.1$ arcsub -c lcgvm21.gridpp.rl.ac.uk test.xrsl
Job submitted with jobid: gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/D7rNDmOKk2jnvMODjqAWcw5nABFKDmABFKDmoqQKDmABFKDmvdxuEn

The job should soon finish:

-bash-4.1$ arcstat gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/D7rNDmOKk2jnvMODjqAWcw5nABFKDmABFKDmoqQKDmABFKDmvdxuEn
Job: gsiftp://lcgvm21.gridpp.rl.ac.uk:2811/jobs/D7rNDmOKk2jnvMODjqAWcw5nABFKDmABFKDmoqQKDmABFKDmvdxuEn
 Name: ARC-HTCondor test
 State: Finished (FINISHED)
 Exit Code: 0