Submitting jobs to Glasgow

From GridPP Wiki
Revision as of 06:34, 22 June 2007 by Graeme stewart (Talk | contribs)

The Glasgow cluster is open to all researchers at the University using grid methods. In addition, the Glasgow cluster is part of ScotGrid, the EGEE project and the WLCG grid, so if you are a member of a supported virtual organisation (VO) anywhere in the world you will be able to use our cluster.

Prerequisite: Get a Grid Certificate

You won't be able to use the cluster at all unless you have a grid certificate. Within the UK certificates are issued by the UK eScience CA: http://www.grid-support.ac.uk/ca/.

You will have to show photo ID to the registrar - currently at Glasgow this is John Watt in NeSC (Kelvin Building).

Glasgow Users not in a VO

If you are not yet in a VO then you can get access to the cluster via (gsi)ssh in order to prepare a suitable environment for your jobs - placing data files, compiling binaries and libraries, etc.

Then you have a choice of grid methods which you can use to run your jobs:

  1. Directly, using globus job submission.
  2. Join a test VO, which allows submission via a resource broker.

Direct Globus Submission

Once you have a recognised DN, then you can submit directly into the cluster using the globus gatekeeper interface.

  1. Email the DN of your certificate to [1] requesting access to the cluster. Important: You must agree to be bound by the latest version of the JSPG Grid Acceptable Use Policy document (reading VO as my research project). Please state this clearly as part of your request. Access will not be granted otherwise.
  2. Once you hear that you've been granted access use a gsissh client to login to svr020.gla.scotgrid.ac.uk:2222.
  3. Compile or seed your application on svr020 - this will be visible to the batch worker nodes when you submit your jobs.
  4. Use globus-job-submit to submit a job into the local batch system. The resource name is svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs.
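The steps above can be sketched as a small Python helper that only assembles the command lines and does not run them. It assumes the standard Globus clients (grid-proxy-init, gsissh, globus-job-submit) are installed on the machine you work from. The function names and the /bin/hostname test executable are illustrative and are not part of the Glasgow setup.

```python
import shlex

# Host names and ports are taken from the steps above.
LOGIN_HOST = "svr020.gla.scotgrid.ac.uk"
LOGIN_PORT = 2222
GATEKEEPER = "svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs"


def login_command(host=LOGIN_HOST, port=LOGIN_PORT):
    """Return the gsissh command line for logging in (hypothetical helper)."""
    return ["gsissh", "-p", str(port), host]


def submit_command(executable, *args, contact=GATEKEEPER):
    """Return a globus-job-submit command line: contact, executable, arguments."""
    return ["globus-job-submit", contact, executable, *args]


if __name__ == "__main__":
    # A valid proxy is needed first; grid-proxy-init asks for the
    # passphrase protecting your certificate's private key.
    print("grid-proxy-init")
    print(shlex.join(login_command()))
    # /bin/hostname is only a harmless example job.
    print(shlex.join(submit_command("/bin/hostname", "-f")))
```

globus-job-submit prints a job contact string on success. That string can then be passed to globus-job-status and globus-job-get-output to poll the job and fetch its output.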

N.B. There are some known problems with globus submission. On its own it's not a scalable way of submitting jobs - you'll need another workflow manager to do this (people usually end up lashing together some bash scripts, which is troublesome and reinvents the wheel). There is also a known bug where the gatekeeper will, for a period of some minutes, claim you cannot poll your jobs because it thinks you are someone else. This error is normally transitory and will go away if you retry.
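Since the polling error is transitory, a simple retry loop is usually enough to ride it out. The following is a minimal sketch, not a recommended tool: the retry and poll_status names, the attempt count and the delay are all assumptions, and it assumes globus-job-status exits with a non-zero status when polling fails.

```python
import subprocess
import time


def retry(operation, attempts=5, delay_seconds=60.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    operation must raise an exception on failure. The exception from the
    final attempt is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay_seconds)


def poll_status(job_contact):
    """Run globus-job-status once and return its output (hypothetical helper).

    check=True raises CalledProcessError on a non-zero exit status, which is
    assumed here to be how a failed poll shows up.
    """
    result = subprocess.run(
        ["globus-job-status", job_contact],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

A call such as retry(lambda: poll_status(job_contact)) would then poll up to five times, one minute apart, before giving up.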

Joining a VO for Submission

This method uses the standard EGEE job submission tools (edg-job-submit), which submit your job to a resource broker (RB). In theory a resource broker could submit your job to any site on the grid, but in practice, if only Glasgow is set up to receive your jobs, you should restrict your job so that it only runs on our cluster.

Although this seems somewhat circuitous, the RB is quite a bit more robust than direct globus submission - it's the submission method used to process tens of thousands of jobs a day on the EGEE grid.

However, for this to work you do have to join a VO. If your project does not yet have a VO then you may join the gridpp VO for testing purposes (use the browser which has your certificate in it). When you've verified that this method works we will help you to set up a real VO for your project.

  1. Email the DN of your certificate to [2] requesting access to the cluster. Important: You must agree to be bound by the latest version of the JSPG Grid Acceptable Use Policy document. Please state this clearly as part of your request. Access will not be granted otherwise.
  2. Once you hear that you've been granted access use a gsissh client to login to svr020.gla.scotgrid.ac.uk:2222.
  3. Compile or seed your application into the data area you've been given on svr020, which will be visible to the batch worker nodes when your job runs.
  4. Use edg-job-submit to submit a job into the local resource broker. This will manage submission to the local cluster and the recovery of output. See below for how to specify necessary job parameters. (There's a good overview of the process here: http://www.gridpp.ac.uk/deployment/users/ and particularly http://www.gridpp.ac.uk/deployment/users/submit.html.)
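The last step above can be sketched in Python as follows. The helper only builds the command lines for submitting, polling and retrieving output; it does not run them. The rb_commands name and the jobids.txt file name are illustrative choices, and the sketch assumes the EDG/LCG client tools are installed on your UI and that you already hold a valid proxy.

```python
import shlex


def rb_commands(jdl_file, vo="gridpp", jobid_file="jobids.txt"):
    """Return submit, status and output-retrieval command lines for one JDL file.

    The job identifier written by edg-job-submit (-o) is read back by the
    other two commands (-i), so no job ID has to be copied by hand.
    """
    return [
        ["edg-job-submit", "--vo", vo, "-o", jobid_file, jdl_file],
        ["edg-job-status", "-i", jobid_file],
        ["edg-job-get-output", "-i", jobid_file],
    ]


if __name__ == "__main__":
    for command in rb_commands("test.jdl"):
        print(shlex.join(command))
```

Keeping the job identifiers in a file is what makes it practical to check on more than a handful of jobs, although it does not remove the need for a proper job manager at larger scales.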

Using edg-job-submit manually suffers from the same scaling problems as the globus-job-submit method - it's much too tedious to manage large numbers of jobs this way.

Some experience has been gained locally in using a package called Ganga, which is great for managing large numbers of jobs on the grid. We have prepared a Glasgow Ganga Quickstart Guide.

Users in a Supported VO

See the GridPP User Area and the gLite User Guide (under Workload Management) for a more comprehensive introduction to job submission. An example JDL file for submitting to Glasgow would be:

Executable = "test.sh";
Arguments = "some job parameters";
InputSandbox = {"test.sh"};
StdOutput = "test.out";
StdError = "test.err";
OutputSandbox = {"test.out", "test.err"};
VirtualOrganisation = "atlas";
Requirements = other.GlueCEUniqueID == "svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-atlas";

You should change both atlas parts (the VirtualOrganisation value and the queue suffix in Requirements) to whatever your actual VO is.
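Swapping the VO by hand is easy to get half right, so the substitution can be scripted. The sketch below regenerates the example JDL for a given VO. The render_jdl name is hypothetical, and it assumes that the queue for your VO follows the same jobmanager-lcgpbs-<vo> naming as the atlas example, which you should confirm for your own VO.

```python
# Template mirroring the example JDL above. Doubled braces are literal
# braces in str.format; single braces are placeholders.
JDL_TEMPLATE = """\
Executable = "{executable}";
Arguments = "{arguments}";
InputSandbox = {{"{executable}"}};
StdOutput = "test.out";
StdError = "test.err";
OutputSandbox = {{"test.out", "test.err"}};
VirtualOrganisation = "{vo}";
Requirements = other.GlueCEUniqueID == "svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-{vo}";
"""


def render_jdl(vo, executable="test.sh", arguments="some job parameters"):
    """Return JDL text pinned to the Glasgow CE, with the VO set in both places."""
    return JDL_TEMPLATE.format(vo=vo, executable=executable, arguments=arguments)


if __name__ == "__main__":
    # Write a JDL file for the gridpp test VO.
    with open("test.jdl", "w") as handle:
        handle.write(render_jdl("gridpp"))
```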

User Interface

All of the above methods for job submission require a set of user grid software that, when installed on a machine, turns it into a grid User Interface (UI). If your project does not have a UI then you can ask for access to one supplied by ScotGrid. This is currently included in the local access package above (svr020.gla.scotgrid.ac.uk is a UI).

However, if you don't even have gsissh then we can enable vanilla ssh for you (with some restrictions). Please email your DN, an ssh v2 key and a hostname to [3] and we will grant access via normal non-gsi ssh. Access via normal ssh will only be allowed from a specific host (or small number of hosts), which is why we need a hostname from which you will access the cluster.

Security

We know you'd rather not think about it, but it's important.

As part of joining a VO you will be required to sign that VO's Acceptable Use Policy (AUP). We will only enable VOs whose AUPs are acceptable to us. If you want login access to the cluster, then you must also agree to the JSPG AUP as stated above (read VO as my research project where necessary).

The two points of the AUP we wish to draw particular attention to are:

  1. You shall [...] protect your GRID credentials (e.g. private keys, passwords) i.e. you must use suitable passphrases on grid certificates and ssh keys.
  2. You shall immediately report any known or suspected security breach, which also includes informing us as a site. If there is a security emergency please inform the email addresses listed here.