Imperial Condor Log

From GridPP Wiki

Plain condor with GPU node as WN

  • Install plain condor on cetest03 and lt2gpu00
   # the .repo file needs to end up in /etc/yum.repos.d/ so yum can find it
   wget https://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel6.repo
   rpm --import http://research.cs.wisc.edu/htcondor/yum/RPM-GPG-KEY-HTCondor
   yum install condor
  • Open the relevant ports on both machines (wiki not secret enough to list here, I think)
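    (A rough sketch, assuming iptables on SL6 and HTCondor's default collector port 9618; the exact rules depend on the site firewall and on whether a fixed port range is configured for the daemons:)
   # sketch only - adapt to the real firewall setup
   iptables -I INPUT -p tcp --dport 9618 -j ACCEPT
   service iptables save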
  • Make some users (same uid/gid on both machines).
    (Because I am no good at remembering options, here are two samples:)
   useradd -m -d /srv/localstage/user004 user004
   useradd -m -d /srv/localstage/user002 -g user002 -u 502 user002  
  • All configurations go in /etc/condor/condor_config.local.
    We'll try to keep the configurations as identical as possible on both nodes, even if not every option is needed by every node.
  • After changing the configuration, condor needs to be restarted to reload the config file:
  service condor restart 
  • These basic config files work:

On cetest03:

  CONDOR_HOST = cetest03.grid.hep.ph.ic.ac.uk
  # this makes it a scheduler
  DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
  SEC_PASSWORD_FILE = /etc/condor/pool_password
  ALLOW_WRITE = *.grid.hep.ph.ic.ac.uk
  UID_DOMAIN = cetest03.grid.hep.ph.ic.ac.uk
  use feature : GPUs
  # stop the emails
  SCHEDD_RESTART_REPORT = 
  

On lt2gpu00:
  CONDOR_HOST = cetest03.grid.hep.ph.ic.ac.uk
  # this makes it a WN
  DAEMON_LIST = MASTER, STARTD
  # get server and WN to talk to each other
  SEC_PASSWORD_FILE = /etc/condor/pool_password
  ALLOW_WRITE = *.grid.hep.ph.ic.ac.uk
  # I don't want to be nobody: keep same user name throughout
  UID_DOMAIN = cetest03.grid.hep.ph.ic.ac.uk
  use feature : GPUs
  SCHEDD_RESTART_REPORT = 
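    (Both configs point at /etc/condor/pool_password, so presumably that file needs creating on each machine first. A sketch, assuming the standard condor_store_cred route:)
   # store the pool password in the file named by SEC_PASSWORD_FILE (run as root, type a real password when prompted)
   condor_store_cred -f /etc/condor/pool_password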
  • To see what's going on:
    • On lt2gpu00: condor_status
    • On cetest03: condor_q, condor_status -long
  • Another helpful command to see what's going on: condor_config_val -dump
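    (It also takes a specific setting, e.g. to check which daemons a node runs:)
   condor_config_val DAEMON_LIST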
  • Submit a test job (as user001), e.g. with the sketch below:
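    (A minimal sketch of a submit file, written as user001 on cetest03 since that is where the schedd runs; the file name test.sub is made up for illustration, and request_GPUs matches the "use feature : GPUs" setting above:)
   universe     = vanilla
   executable   = /bin/hostname
   request_GPUs = 1
   output       = test.$(Cluster).out
   error        = test.$(Cluster).err
   log          = test.$(Cluster).log
   queue
    Submit it with:
   condor_submit test.sub
    and follow it with condor_q on cetest03.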