Difference between revisions of "Imperial Condor Log"
From GridPP Wiki
Revision as of 16:48, 3 June 2016
Plain condor with GPU node as WN
- Install plain condor on cetest03 and lt2gpu00
wget https://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel6.repo
rpm --import http://research.cs.wisc.edu/htcondor/yum/RPM-GPG-KEY-HTCondor
yum install condor
- Open the relevant ports on both machines (wiki not secret enough to list here, I think)
- Make some users (same uid/gid on both machines).
(Because I am no good at remembering options, here are two samples:)
useradd -m -d /srv/localstage/user004 user004
useradd -m -d /srv/localstage/user002 -g user002 -u 502 user002
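Since the uid/gid must match on both machines, a quick comparison helps. The snippet below is only a sketch: the helper name show_ids is made up for illustration and is not part of the original log. It prints one "name uid gid" line per user, so the output from cetest03 and lt2gpu00 can be compared side by side.

```shell
#!/bin/sh
# show_ids is a hypothetical helper (not from the original log).
# It prints "name uid gid" for each user name passed to it.
show_ids() {
    for u in "$@"; do
        printf '%s %s %s\n' "$u" "$(id -u "$u")" "$(id -g "$u")"
    done
}

# Example call; root is used here only because it exists on every machine.
# On the two nodes one would pass the real accounts, e.g. user002 user004.
show_ids root
```

Running the same call on both nodes and comparing the lines should show any mismatch straight away.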
- All configurations go in /etc/condor/condor_config.local.
We'll try and keep the configurations as identical on both nodes as possible, even if not every option is needed by every node.
- After changing the configuration, condor needs to be restarted to reload the config file:
service condor restart
- This basic config file works:
CONDOR_HOST = cetest03.grid.hep.ph.ic.ac.uk
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
SEC_PASSWORD_FILE = /etc/condor/pool_password
ALLOW_WRITE = *.grid.hep.ph.ic.ac.uk
UID_DOMAIN = cetest03.grid.hep.ph.ic.ac.uk
use feature : GPUs
# stop the emails
SCHEDD_RESTART_REPORT =
- To see what's going on:
  - On lt2gpu00: condor_status
  - On cetest03: condor_q, condor_status -long
- The other helpful command to see what's going on: condor_config_val -dump
- Submit a test job (as user001)
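The log stops before showing the test job itself, so the following is only a guess at what a minimal GPU test submission could look like, not the job that was actually used. The file names (gpu_test.sub and friends) are illustrative, and it assumes nvidia-smi is installed at /usr/bin/nvidia-smi on lt2gpu00. transfer_executable is switched off so that the binary already on the worker node is used.

```shell
#!/bin/sh
# Write a minimal submit description file for a GPU test job.
# All file names here are illustrative assumptions, not from the log.
cat > gpu_test.sub <<'EOF'
universe            = vanilla
# Assumed location of nvidia-smi on the GPU node
executable          = /usr/bin/nvidia-smi
# Use the copy on the worker node rather than shipping one from the submit host
transfer_executable = false
# Ask for one GPU so the job only matches slots that advertise GPUs
request_gpus        = 1
output              = gpu_test.out
error               = gpu_test.err
log                 = gpu_test.log
queue
EOF

# Then, as user001 on cetest03:
#   condor_submit gpu_test.sub
#   condor_q
```

If the job runs, gpu_test.out should contain the nvidia-smi listing from the GPU node. If it stays idle, condor_q and condor_status -long are the first places to look.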