A quick guide to HTCondor

== Basic commands ==

First, some basic HTCondor commands. To submit a job, type:

<pre>
condor_submit <file>
</pre>

To list your running and idle jobs, type:

<pre>
condor_q
</pre>

To list completed jobs, type:

<pre>
condor_history
</pre>
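Both <code>condor_q</code> and <code>condor_history</code> can be restricted to a single cluster or job by giving the ID as an argument, for example (using the job ID from the example below):

<pre>
condor_q 91959.0
condor_history 91959
</pre>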

== Job submission ==

Create a file called <code>simplejob.sub</code> containing:

<pre>
cmd=script.sh
arguments=10
output=job.$(cluster).$(process).out
error=job.$(cluster).$(process).err
log=job.$(cluster).$(process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
RequestMemory=100
queue
</pre>

and a script called <code>script.sh</code> containing:

<pre>
#!/bin/sh
sleep $1
hostname
</pre>

Make sure it's executable:

<pre>
chmod 755 script.sh
</pre>
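Before submitting, you can check that the script runs locally (a quick sanity check; the short sleep used here is arbitrary):

<pre>
./script.sh 2
</pre>

After sleeping for 2 seconds it should print the hostname of the machine you ran it on.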


You can now submit the job and should see:

<pre>
-bash-4.1$ condor_submit simplejob.sub
Submitting job(s).
1 job(s) submitted to cluster 91959.
</pre>

The job will:
* run the script <code>script.sh</code> with argument "10"
* request 100 MB of memory
* have its executable (<code>script.sh</code> in this case) automatically transferred to the worker node
* write its stdout to the file <code>job.91959.0.out</code> (once it completes)
* write its stderr to the file <code>job.91959.0.err</code> (once it completes)
* record status information in the log file <code>job.91959.0.log</code>
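The <code>$(cluster)</code> and <code>$(process)</code> macros expand to the cluster ID assigned at submission time and the process number within that cluster, so the files of different jobs never clash. This makes it easy to submit many similar jobs from a single file; a minimal sketch (the choice of argument here is just illustrative):

<pre>
cmd=script.sh
arguments=$(process)
output=job.$(cluster).$(process).out
error=job.$(cluster).$(process).err
log=job.$(cluster).$(process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
RequestMemory=100
queue 3
</pre>

Here <code>queue 3</code> submits three jobs (process numbers 0, 1 and 2) in a single cluster, each with its own output, error and log files.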

Checking the status:

<pre>
-bash-4.1$ condor_q

-- Schedd: lcgui03.gridpp.rl.ac.uk : <130.246.180.41:33754?...
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
91959.0   alahiff         2/26  13:08   0+00:00:07 R  0   0.0  script.sh 10

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
</pre>

The R in the ST column shows the job is running (I would mean idle, H held).

Once the job has completed:

<pre>
-bash-4.1$ ls -lt *91959*
-rw-r--r-- 1 alahiff esc 1032 Feb 26 13:09 job.91959.0.log
-rw-r--r-- 1 alahiff esc   24 Feb 26 13:09 job.91959.0.out
-rw-r--r-- 1 alahiff esc    0 Feb 26 13:08 job.91959.0.err
</pre>

The <code>.out</code> file contains the hostname of the worker node; the <code>.err</code> file is empty because the script writes nothing to stderr.

Note that with this example job description file, all files generated by the job will be automatically transferred back to the machine where you submitted the job (i.e. lcgui03 or lcgui04). You can prevent this from happening by adding:

<pre>
+TransferOutput=""
</pre>

(The leading <code>+</code> inserts the attribute directly into the job's ClassAd.)

If the job needs additional files, you can add a line like this:

<pre>
transfer_input_files = input1.dat,input2.dat
</pre>

and they will be copied to the job's scratch directory on the worker node, which is also the job's initial working directory.
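For example, a minimal sketch of a script that uses the two input files above (<code>combined.dat</code> is an illustrative name; as a file generated by the job, it will be transferred back automatically):

<pre>
#!/bin/sh
# The transferred input files are in the job's working directory when it starts
cat input1.dat input2.dat > combined.dat
</pre>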

The official documentation is here:

http://research.cs.wisc.edu/htcondor/manual/v8.4/index.html

Regards, Andrew.
