Imperial Condor Log
Revision as of 13:24, 6 June 2016
Plain condor with GPU node as WN
- Install plain condor on cetest03 and lt2gpu00
wget https://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel6.repo
rpm --import http://research.cs.wisc.edu/htcondor/yum/RPM-GPG-KEY-HTCondor
yum install condor
(run the wget from /etc/yum.repos.d/ so yum actually picks the repo file up)
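- Quick sanity check that the install worked (condor_version ships with the condor RPM, so run it on both machines):
condor_version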
- Open the relevant ports on both machines (wiki not secret enough to list here, I think)
- Make some users (same uid/gid on both machines).
(Because I am no good at remembering options, here are two samples:)
useradd -m -d /srv/localstage/user004 user004
useradd -m -d /srv/localstage/user002 -g user002 -u 502 user002
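- Worth checking the uid/gid really do match on both machines before going further (user002 here is just one of the sample users above):
id user002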
- All configurations go in /etc/condor/condor_config.local.
We'll try to keep the configurations as identical as possible on both nodes, even if not every option is needed by every node.
- After changing the configuration, condor needs to be restarted to reload the config file:
service condor restart
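(For most changes condor_reconfig is enough to get the running daemons to re-read the config file without a full restart; changing DAEMON_LIST is one of the cases that does need a restart.)
condor_reconfig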
- These basic config files work:
On cetest03:
CONDOR_HOST = cetest03.grid.hep.ph.ic.ac.uk
# this makes it a scheduler
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
SEC_PASSWORD_FILE = /etc/condor/pool_password
ALLOW_WRITE = *.grid.hep.ph.ic.ac.uk
UID_DOMAIN = cetest03.grid.hep.ph.ic.ac.uk
use feature : GPUs
# stop the emails
SCHEDD_RESTART_REPORT =
On lt2gpu00:
CONDOR_HOST = cetest03.grid.hep.ph.ic.ac.uk
# this makes it a WN
DAEMON_LIST = MASTER, STARTD
# get server and WN to talk to each other
SEC_PASSWORD_FILE = /etc/condor/pool_password
ALLOW_WRITE = *.grid.hep.ph.ic.ac.uk
# I don't want to be nobody: keep same user name throughout
UID_DOMAIN = cetest03.grid.hep.ph.ic.ac.uk
use feature : GPUs
SCHEDD_RESTART_REPORT =
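- The pool_password file referenced by SEC_PASSWORD_FILE has to exist, with the same password, on both nodes before they will trust each other. condor_store_cred can write it (this assumes password authentication is actually enabled, which the snippets above don't show):
condor_store_cred -f /etc/condor/pool_password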
- To see what's going on:
- On lt2gpu00: condor_status
- On cetest03: condor_q, condor_status -long
- The other helpful command to see what's going on: condor_config_val -dump
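- To confirm the GPU was actually detected on the WN ("use feature : GPUs" should add attributes like TotalGPUs and CUDADeviceName to the machine ad):
condor_status -long lt2gpu00 | grep -i gpu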
- Submit a test job (as user001 on cetest03) -- needs editing
[user001@cetest03 ~]$ cat test.submit
Universe = vanilla
Executable = hello_world.sh
input = /dev/null
output = hello.out.$(Cluster)
error = hello.error.$(Cluster)
request_GPUs = 1
Queue
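The hello_world.sh it runs isn't shown above; a minimal stand-in (the nvidia-smi call assumes the NVIDIA driver is installed on the GPU node) could be:
[user001@cetest03 ~]$ cat hello_world.sh
#!/bin/bash
# report where the job landed and which GPU it was given
echo "hello world from $(hostname)"
nvidia-smi
Then submit and watch it with:
condor_submit test.submit
condor_q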