Imperial Condor Log
From GridPP Wiki
Revision as of 13:28, 6 June 2016
Plain condor with GPU node as WN
- Install plain condor on cetest03 and lt2gpu00
wget https://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel6.repo
rpm --import http://research.cs.wisc.edu/htcondor/yum/RPM-GPG-KEY-HTCondor
yum install condor
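(The repo file is assumed to have been downloaded into /etc/yum.repos.d/ so that yum can see it. A quick sanity check after the install, not from the original log:)
condor_version    # prints the installed HTCondor version and platform string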
- Open the relevant ports on both machines (wiki not secret enough to list here, I think)
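(For reference only, since the actual site rules are withheld: the collector listens on port 9618 by default, and the other daemons pick ephemeral ports unless pinned down with LOWPORT/HIGHPORT. A hypothetical iptables sketch, not the real firewall config:)
iptables -I INPUT -p tcp --dport 9618 -j ACCEPT    # collector default port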
- Make some users (same uid/gid on both machines).
(Because I am no good at remembering options, here are two samples:)
useradd -m -d /srv/localstage/user004 user004
useradd -m -d /srv/localstage/user002 -g user002 -u 502 user002
- All configurations go in /etc/condor/condor_config.local. We'll try to keep the configurations on both nodes as identical as possible, even if not every option is needed by every node.
- After changing the configuration, condor needs to be restarted to reload the config file:
service condor restart
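(Not in the original log: for most settings condor_reconfig is enough to re-read the config files without restarting the running daemons; a change to DAEMON_LIST does need the full restart, though.)
condor_reconfig    # re-reads the config on the local node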
- These basic config files work:
On cetest03:
CONDOR_HOST = cetest03.grid.hep.ph.ic.ac.uk
# this makes it a scheduler
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
SEC_PASSWORD_FILE = /etc/condor/pool_password
ALLOW_WRITE = *.grid.hep.ph.ic.ac.uk
UID_DOMAIN = cetest03.grid.hep.ph.ic.ac.uk
use feature : GPUs
# stop the emails
SCHEDD_RESTART_REPORT =
On lt2gpu00:
CONDOR_HOST = cetest03.grid.hep.ph.ic.ac.uk
# this makes it a WN
DAEMON_LIST = MASTER, STARTD
# get server and WN to talk to each other
SEC_PASSWORD_FILE = /etc/condor/pool_password
ALLOW_WRITE = *.grid.hep.ph.ic.ac.uk
# I don't want to be nobody: keep same user name throughout
UID_DOMAIN = cetest03.grid.hep.ph.ic.ac.uk
use feature : GPUs
SCHEDD_RESTART_REPORT =
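(Both configs point at a pool password file, which has to exist on each node, root-owned and readable only by root, before the daemons can authenticate with it. One way to create it, assuming the standard PASSWORD method is what's in play here:)
condor_store_cred -f /etc/condor/pool_password    # prompts for the pool password and writes the file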
- To see what's going on:
- On lt2gpu00: condor_status
- On cetest03: condor_q, condor_status -long
- The other helpful command to see what's going on: condor_config_val -dump
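(A hedged extra, not from the original log: to confirm the GPUs were discovered and advertised by the startd; the exact attribute names come from condor_gpu_discovery:)
condor_status -long lt2gpu00.grid.hep.ph.ic.ac.uk | grep -i -e gpu -e cuda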
- Submit a test job (as user001 on cetest03):
[user001@cetest03 ~]$ cat test.submit
Universe = vanilla
Executable = hello_world.sh
input = /dev/null
output = hello.out.$(Cluster)
error = hello.error.$(ClusterId)
request_GPUs = 1
Queue
[user001@cetest03 ~]$ cat hello_world.sh
#!/bin/bash

echo "Hello World"
echo "Today is: " `date`
echo "I am running on: " `hostname`
echo "I am " `whoami`

env | sort

echo "+++++++++++++++++++++++++++++++++++"

/srv/localstage/sf105/samples/NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release/deviceQuery

sleep 30
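The submission itself isn't shown in the log; with both files in place it would be the usual:
[user001@cetest03 ~]$ condor_submit test.submit    # prints the assigned cluster id
[user001@cetest03 ~]$ condor_q                     # watch the job; output lands in hello.out.<cluster>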
The two different GPUs can be distinguished by their Bus ID: 4 and 10.
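(Also worth noting, assuming the usual behaviour of the GPUs feature: the env | sort in the script should show which card condor handed to the job, via variables like CUDA_VISIBLE_DEVICES / GPU_DEVICE_ORDINAL:)
env | grep -i -e cuda -e gpu    # shows the device(s) assigned to this slot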