A quick guide to HTCondor
Revision as of 14:05, 26 February 2016
Basic commands
Some basic HTCondor commands are as follows. To submit a job, type:
condor_submit <file>
To list your running and idle jobs, type:
condor_q
To list completed jobs, type:
condor_history
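Both condor_q and condor_history also accept a cluster id, which is handy once you have more than a few jobs. The sketch below wraps the two calls in a small helper; show_job is a hypothetical name chosen for illustration, not an HTCondor command, and the guard simply reports when HTCondor is not installed on the machine.

```shell
#!/bin/sh
# show_job is a hypothetical helper (not part of HTCondor): it looks up one
# cluster id first in the queue and then in the history.
show_job() {
    cluster=$1
    if ! command -v condor_q >/dev/null 2>&1; then
        echo "HTCondor commands not found on this machine"
        return 1
    fi
    condor_q "$cluster"         # jobs from this cluster still in the queue
    condor_history "$cluster"   # jobs from this cluster that have left the queue
}
```

For example, show_job 91959 would look up the job submitted later in this guide.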
Job submission
Create a file called simplejob.sub containing:
cmd=script.sh
arguments=10
output=job.$(cluster).$(process).out
error=job.$(cluster).$(process).err
log=job.$(cluster).$(process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
RequestMemory=100
queue
and a script called script.sh containing:
#!/bin/sh
sleep $1
hostname
Make sure it's executable:
chmod 755 script.sh
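Before submitting, it can be worth running the script by hand to catch simple mistakes such as a missing execute bit. The following is a minimal sketch that recreates script.sh in a scratch directory (so it does not touch your own copy) and runs it with a one-second sleep; the scratch directory and the variable names are assumptions made for illustration.

```shell
#!/bin/sh
# Recreate script.sh in a scratch directory so this sketch is self-contained.
workdir=$(mktemp -d)
cat > "$workdir/script.sh" <<'EOF'
#!/bin/sh
sleep $1
hostname
EOF
chmod 755 "$workdir/script.sh"

# Run it locally with a short sleep; it should print this machine's hostname.
local_output=$("$workdir/script.sh" 1)
echo "script.sh printed: $local_output"
```

When the job runs under HTCondor the same script prints the worker node's hostname instead, which is what ends up in the job's output file.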
Submit the job:
-bash-4.1$ condor_submit simplejob.sub
Submitting job(s).
1 job(s) submitted to cluster 91959.
Checking the status of the job:
-bash-4.1$ condor_q

-- Schedd: lcgui03.gridpp.rl.ac.uk : <130.246.180.41:33754?...
 ID       OWNER     SUBMITTED    RUN_TIME    ST PRI SIZE CMD
 91959.0  alahiff   2/26 13:08   0+00:00:07  R  0   0.0  script.sh 10

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Explanation of the content of simplejob.sub:
- cmd=script.sh: the job will execute the script script.sh
- arguments=10: the argument 10 will be passed to the executable when it is run
- RequestMemory=100: request 100 MB of memory for the job
- should_transfer_files = YES: tells HTCondor to transfer files to and from the worker node
- when_to_transfer_output = ON_EXIT: tells HTCondor to transfer any output files only when the job has completed
- output=job.$(cluster).$(process).out: tells HTCondor the path (and name) of the file on the submit machine that will contain the job's stdout
- error=job.$(cluster).$(process).err: tells HTCondor the path (and name) of the file on the submit machine that will contain the job's stderr
- log=job.$(cluster).$(process).log: tells HTCondor the path (and name) of the file where the job event log is written
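To see why the file names contain $(cluster) and $(process), it may help to mimic the substitution by hand. The sketch below is plain shell, not HTCondor itself: it assumes the cluster id 91959 from the example above and a hypothetical variant of the submit file ending in "queue 3", which would create three jobs with process numbers 0, 1 and 2.

```shell
#!/bin/sh
# Sketch only: mimic how $(cluster) and $(process) expand in file names.
# 'cluster' is the id reported by condor_submit; 'njobs' is a hypothetical
# count, as if the submit file ended with "queue 3" instead of "queue".
cluster=91959
njobs=3

expand_names() {
    process=0
    while [ "$process" -lt "$njobs" ]; do
        # Each job in the cluster gets its own process number, starting at 0.
        echo "job.$cluster.$process.out job.$cluster.$process.err job.$cluster.$process.log"
        process=$((process + 1))
    done
}

expand_names
```

Because every job gets a distinct process number, the jobs in one cluster never overwrite each other's output, error or log files.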
Once the job has completed there will be 3 files visible, containing the log, stdout and stderr:
-bash-4.1$ ls -lt *91959*
-rw-r--r-- 1 alahiff esc 1032 Feb 26 13:09 job.91959.0.log
-rw-r--r-- 1 alahiff esc   24 Feb 26 13:09 job.91959.0.out
-rw-r--r-- 1 alahiff esc    0 Feb 26 13:08 job.91959.0.err
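A quick way to check a finished job is to look at whether its stderr file is empty, as it is in the listing above (size 0). The helper below is a minimal sketch; check_job is a hypothetical name, not an HTCondor command, and it assumes the file naming used in this guide.

```shell
#!/bin/sh
# Sketch: summarise a finished job from its output and error files.
# check_job is a hypothetical helper, not an HTCondor command.
check_job() {
    prefix=$1                       # e.g. job.91959.0
    if [ -s "$prefix.err" ]; then   # -s is true when the file is non-empty
        echo "stderr has content:"
        cat "$prefix.err"
        return 1
    fi
    echo "stderr is empty; stdout was:"
    cat "$prefix.out"
}
```

For the job above you would call check_job job.91959.0, which should print the hostname of the worker node that ran the job.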
Input/output files
Note that with this example job description file, all files generated by the job will be automatically transferred back to the machine where you submitted the job (i.e. lcgui03 or lcgui04). You can prevent this from happening by adding:
+TransferOutput=""
If the job needs additional input files, you can add a line like this:
transfer_input_files = input1.dat,input2.dat
and they will be copied to the worker node.
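Putting the transfer settings together, the sketch below writes a variant of the submit file from a shell here-document. The file name transferjob.sub and the output file result.dat are hypothetical, and transfer_output_files is a condor_submit command not used elsewhere in this guide: it limits what is brought back to the named files, so the job itself must create result.dat (the example script.sh does not).

```shell
#!/bin/sh
# Sketch: write a submit file that sends two inputs to the worker node and
# brings back one named output. transferjob.sub and result.dat are
# hypothetical names chosen for illustration.
subfile=transferjob.sub
cat > "$subfile" <<'EOF'
cmd=script.sh
arguments=10
output=job.$(cluster).$(process).out
error=job.$(cluster).$(process).err
log=job.$(cluster).$(process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = input1.dat,input2.dat
transfer_output_files = result.dat
RequestMemory=100
queue
EOF

# Show the transfer-related lines that were written.
grep '^transfer_' "$subfile"
```

The resulting file would be submitted in the usual way with condor_submit transferjob.sub.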
Official documentation
http://research.cs.wisc.edu/htcondor/manual/v8.4/index.html
Information about submitting jobs: http://research.cs.wisc.edu/htcondor/manual/v8.4/condor_submit.html#man-condor-submit
Regards, Andrew.