Glasgow Job Submission Quickstart Guide

From GridPP Wiki

Introduction

This quickstart guide assumes that you have completed the Glasgow Local Users Getting Started Guide and therefore:

  1. Have obtained a grid certificate
  2. Have joined a VO (assumed to be gridpp in this documentation)
  3. Have local access to the cluster and so can log in to svr020.gla.scotgrid.ac.uk on port 2222
  4. Store your files in the standard shared area
  5. Have copied your grid certificate onto the cluster (you need this to actually run jobs)

If any item in this list is unfamiliar, please work through the Glasgow Local Users Getting Started Guide, which should explain everything you need to get started.

Assumptions

This quickstart guide makes the following assumptions:

  1. Your username is gla019
  2. Your shared data area is /cluster/share/gla019/

You should substitute your own username and data area where appropriate.

Creating and Submitting a Basic Job

Creating

Accessing the System

  • If you logged in using plain ssh, initialise a grid proxy with:

   $ grid-proxy-init
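
If you submit jobs over a long session, it can be handy to renew the proxy only when it is close to expiring. The following is a minimal sketch, not part of the grid tools: ensure_proxy and proxy_needs_renewal are hypothetical helper names, and the one-hour threshold is an arbitrary assumption. It relies on grid-proxy-info -timeleft, which reports the remaining proxy lifetime in seconds.

```shell
# Hypothetical helpers: renew the grid proxy only when it is about to expire.

# Succeeds (exit 0) if the remaining lifetime is below the threshold.
proxy_needs_renewal() {
    local seconds_left=$1
    local min_seconds=${2:-3600}   # assumed threshold: one hour
    [ "$seconds_left" -lt "$min_seconds" ]
}

# Ask grid-proxy-info for the remaining lifetime and renew if needed.
ensure_proxy() {
    local left
    left=$(grid-proxy-info -timeleft 2>/dev/null) || left=0
    # Treat a missing or non-numeric answer as "no valid proxy".
    case $left in
        ''|*[!0-9-]*) left=0 ;;
    esac
    if proxy_needs_renewal "$left" "${1:-3600}"; then
        grid-proxy-init
    fi
}
```

After pasting these functions into your shell, running ensure_proxy before a submission should prompt for your passphrase only when the proxy is missing or nearly expired.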

Preparing the Job

For each job you need a "JDL file" (JDL is the Job Description Language). To submit the simplest "hello world" job, create a file called hello.jdl with this content:

Executable = "/bin/echo";
Arguments = "Hello World";
StdOutput = "hw.out";
StdError = "hw.err";
OutputSandbox = {"hw.out", "hw.err"};
VirtualOrganisation = "gridpp";
Requirements = other.GlueCEUniqueID == "svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp";

If your VO is not gridpp, replace gridpp with your own VO name in the final two lines. In this example hello.jdl is stored in /cluster/share/gla019/.
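
Since the VO name appears in both of the final two lines, one way to avoid editing them by hand is to generate the file. This is only a sketch: write_hello_jdl is a hypothetical helper, and it assumes that your VO's queue follows the same jobmanager-lcgpbs-<VO> naming pattern as the gridpp queue in the example.

```shell
# Hypothetical helper: write the "hello world" JDL for a given VO.
# Assumption: the queue name ends in the VO name, as in the gridpp example.
write_hello_jdl() {
    local vo=$1
    local outfile=${2:-hello.jdl}
    cat > "$outfile" <<EOF
Executable = "/bin/echo";
Arguments = "Hello World";
StdOutput = "hw.out";
StdError = "hw.err";
OutputSandbox = {"hw.out", "hw.err"};
VirtualOrganisation = "$vo";
Requirements = other.GlueCEUniqueID == "svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-$vo";
EOF
}
```

For example, write_hello_jdl gridpp /cluster/share/gla019/hello.jdl should reproduce the file shown above.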

For a more detailed guide you are referred to

Submitting

Check that everything is working with

   $ edg-job-list-match hello.jdl

which should produce a list of sites to which the job may be sent. Because we restricted the job to Glasgow, only the Glasgow cluster should be listed.

Then submit the job:

   $ edg-job-submit -o /tmp/hello.jid /cluster/share/gla019/hello.jdl

Here /tmp/hello.jid is created automatically and stores the job identifier, and /cluster/share/gla019/hello.jdl is the path to the hello.jdl file that you have just created. You can check the status of your job using:

   $ edg-job-status -i /tmp/hello.jid

If you are quick enough you will see output like:

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://svr023.gla.scotgrid.ac.uk:9000/YbncJC-vh7fYU7WYQ1u5JA
Current Status:     Scheduled 
Status Reason:      Job successfully submitted to Globus
Destination:        svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp
reached on:         Tue Jun 12 10:59:40 2007
*************************************************************

which tells you that the job is scheduled to run. Eventually you will see

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://svr023.gla.scotgrid.ac.uk:9000/YbncJC-vh7fYU7WYQ1u5JA
Current Status:     Done (Success)
Exit code:          0
Status Reason:      Job terminated successfully
Destination:        svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp
reached on:         Tue Jun 12 11:01:11 2007
*************************************************************

when the job has successfully completed.
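
Rather than re-running the status command by hand, you can pull the "Current Status" field out of the output and poll until the job finishes. The sketch below is for a single job only. job_state and wait_for_job are hypothetical helper names, the 60 second interval is arbitrary, and the list of final states (Done, Aborted, Cancelled, Cleared) is our assumption about the state names the middleware reports.

```shell
# Hypothetical helpers for watching one job.

# Read edg-job-status output on stdin and print the "Current Status" value.
job_state() {
    sed -n '/^Current Status:/{
        s/^Current Status:[[:space:]]*//
        s/[[:space:]]*$//
        p
    }'
}

# Poll the job recorded in the given job id file until it reaches a
# final state. Assumes the file holds exactly one job id.
wait_for_job() {
    local jidfile=$1
    local interval=${2:-60}   # seconds between polls (arbitrary default)
    local state
    while true; do
        state=$(edg-job-status -i "$jidfile" | job_state)
        if [ -z "$state" ]; then
            echo "could not read job status" >&2
            return 1
        fi
        echo "$(date '+%H:%M:%S') $state"
        case $state in
            Done*|Aborted*|Cancelled*|Cleared*) return 0 ;;
        esac
        sleep "$interval"
    done
}
```

With these functions pasted into your shell, wait_for_job /tmp/hello.jid prints a timestamped status line once a minute and returns when the job is no longer queued or running.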

When the job is complete you can retrieve the output (the stdout and stderr files in this case) with

   $ edg-job-get-output -i /tmp/hello.jid

Typically the output is stored under /tmp, but you can choose another directory with the --dir option.
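
Once the output has been retrieved you will usually want to look at it straight away. The sketch below assumes that the retrieved files end up in a per-job subdirectory of the directory you chose, so it simply searches for the file by name. show_job_output is a hypothetical helper name.

```shell
# Hypothetical helper: print every retrieved copy of a given output file
# (hw.out by default) found anywhere below the chosen directory.
show_job_output() {
    local dir=$1
    local name=${2:-hw.out}
    find "$dir" -type f -name "$name" -exec cat {} +
}
```

A possible session, using an output directory in the shared area:

   $ mkdir -p /cluster/share/gla019/output
   $ edg-job-get-output --dir /cluster/share/gla019/output -i /tmp/hello.jid
   $ show_job_output /cluster/share/gla019/output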

Executing Pre-Prepared Binaries

It's unlikely your research involves printing hello world, so /bin/echo is of limited use.

More likely you wish to execute a binary or wrapper script you have prepared in your CLUSTER_SHARED area. In that case give the full path to this binary as the Executable:

Executable = "/cluster/share/gla019/bin/myapp";
Arguments = "--input=someData --output=/cluster/share/gla019/output/someProcessedData";
StdOutput = "stdout";
StdError = "stderr";
OutputSandbox = {"stdout", "stderr"};
VirtualOrganisation = "gridpp";
Requirements = other.GlueCEUniqueID == "svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp";

Then proceed as above.
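
A job like this fails quickly if the path in Executable is mistyped or the file is not executable, so a check before submitting can save a round trip. The sketch below assumes the shared area is visible at the same path on the machine where you run the check. jdl_executable and check_jdl_executable are hypothetical helper names.

```shell
# Hypothetical helpers: sanity-check the Executable named in a JDL file.

# Print the value of the Executable attribute of the given JDL file.
jdl_executable() {
    sed -n 's/^[[:space:]]*Executable[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Return 0 if the Executable exists and has the x bit set, 1 if it does
# not, and 2 if the JDL file has no Executable line at all.
check_jdl_executable() {
    local exe
    exe=$(jdl_executable "$1")
    if [ -z "$exe" ]; then
        echo "no Executable line found in $1" >&2
        return 2
    fi
    if [ -f "$exe" ] && [ -x "$exe" ]; then
        echo "ok: $exe"
    else
        echo "missing or not executable: $exe" >&2
        return 1
    fi
}
```

This check only makes sense for executables given by an absolute path on the shared filesystem. It does not apply to files shipped with the job.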

Submitting Wrapper Scripts and Other Files

If you can submit to sites other than the Glasgow cluster, you'll probably need scripts and other files to be shipped with the job. These are sent in the so-called input sandbox. As a simple example, create a file called hello2.txt containing the string "Hello World". Then create a file hello2.sh:

#!/bin/bash

# Print the contents of the file named by the first argument
cat "$1"

and create a new JDL file hello2.jdl:

Executable = "hello2.sh";
Arguments = "hello2.txt";
InputSandbox = {"hello2.sh", "hello2.txt"};
StdOutput = "hw.out";
StdError = "hw.err";
OutputSandbox = {"hw.out", "hw.err"};
VirtualOrganisation = "gridpp";

Change "gridpp" to your own VO if necessary, then submit it as before.

Note that we dropped the Requirements stanza which previously limited the job to running at Glasgow - edg-job-list-match should now return more matching sites.

The files hello2.txt and hello2.sh will be submitted along with the job. hello2.sh will be made executable and "hello2.sh hello2.txt" will be run, printing the string "Hello World" to stdout.

One thing to note is that permissions on the sandbox files are not preserved. A file named in the Executable field in the JDL will have the x bit set, but any other files will have it cleared.
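
Before submitting, you can get a rough idea of what the job will see by imitating the sandbox locally. The sketch below is not how the middleware is implemented. It only approximates the behaviour described above: the listed files are copied into an empty scratch directory, all x bits are cleared, the x bit is set on the executable alone, and the executable is run from that directory. simulate_sandbox_job is a hypothetical helper name, and it passes the JDL Arguments string as a single argument for simplicity.

```shell
# Hypothetical helper: local dry run of a job that uses an input sandbox.
# Usage: simulate_sandbox_job EXECUTABLE ARGUMENT FILE...
simulate_sandbox_job() {
    local executable=$1
    local argument=$2
    shift 2
    local scratch f
    scratch=$(mktemp -d) || return 1
    for f in "$@"; do
        cp "$f" "$scratch/" || return 1
        # Permissions are not preserved: clear the x bit on every file.
        chmod a-x "$scratch/$(basename "$f")"
    done
    # Only the file named as Executable gets the x bit.
    chmod u+x "$scratch/$executable"
    ( cd "$scratch" && "./$executable" "$argument" )
    local rc=$?
    rm -rf "$scratch"
    return $rc
}
```

Running simulate_sandbox_job hello2.sh hello2.txt hello2.sh hello2.txt in the directory holding the two files should print the contents of hello2.txt. If your script calls a second script from the sandbox, this dry run will fail in the same way the real job would, because that second script has lost its x bit.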

You should also be aware that sandboxes are for small files, up to a few MB - resource brokers will generally limit the maximum size. Larger files should be accessed via the data management system.
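
As a quick check that a sandbox stays small, you can add up the sizes of the files you list in InputSandbox. The sketch below takes the limit as a parameter because the actual maximum depends on the resource broker. sandbox_bytes and check_sandbox are hypothetical helper names.

```shell
# Hypothetical helpers: total size of the files in an input sandbox.

# Print the combined size in bytes of the given files.
sandbox_bytes() {
    local total=0 size f
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1
        total=$((total + size))
    done
    echo "$total"
}

# Usage: check_sandbox LIMIT_BYTES FILE...
# Returns 1 if the files together exceed the limit.
check_sandbox() {
    local limit=$1
    shift
    local total
    total=$(sandbox_bytes "$@") || return 2
    if [ "$total" -gt "$limit" ]; then
        echo "sandbox is $total bytes, over the $limit byte limit" >&2
        return 1
    fi
    echo "sandbox is $total bytes"
}
```

For example, check_sandbox 2000000 hello2.sh hello2.txt tests the two example files against an assumed limit of about 2 MB.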

The examples above have jobs which take a very short time to run. However, for real jobs you need to take into account that batch queues have time limits. This can be managed by adding a Requirements expression to the JDL, which specifies constraints on the site and queue used to run the job. This can be quite complex and you should consult the User Guide for full details. However, as a simple example you can ask for a queue that allows more than an hour of CPU time (the value is given in minutes) with a JDL line like:

Requirements = other.GlueCEPolicyMaxCPUTime > 60;
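
Taking the line above as the reference (one hour corresponds to 60, so the value is in minutes), the requirement for other run times can be worked out as in the sketch below. cpu_time_requirement is a hypothetical helper name. Since a JDL file has a single Requirements expression, the optional second argument shows how a CE restriction could be combined with the time limit using &&.

```shell
# Hypothetical helper: print a Requirements line asking for a queue whose
# CPU time limit exceeds the given number of hours, optionally restricted
# to one CE. Assumes the limit is expressed in minutes, as in the example.
cpu_time_requirement() {
    local hours=$1
    local ce=${2:-}
    local expr="other.GlueCEPolicyMaxCPUTime > $((hours * 60))"
    if [ -n "$ce" ]; then
        expr="other.GlueCEUniqueID == \"$ce\" && $expr"
    fi
    echo "Requirements = $expr;"
}
```

For instance, cpu_time_requirement 12 asks for more than 720 minutes, and passing the Glasgow CE name from the earlier examples as a second argument restricts the match to that queue as well.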

Scaling Up To Many Jobs

It's possible to write your own wrappers around multiple job submissions, status polling and retrieval of outputs. However, this is reinventing the wheel and we don't recommend it. Instead we'd suggest the Glasgow Ganga Quickstart Guide, which describes how to use the ganga package to control multiple jobs on the grid.