Glasgow Ganga Quickstart Guide

Ganga Links

Also useful are introductions to python, e.g., the python tutorial. The iPython Documentation points out where iPython syntax differs from normal python scripts.

HOWTO for Glasgow

Getting Started

  1. Login to svr020 using gsissh or ssh (see the example session below)
  2. Type ganga
    1. Say yes to set up the standard config files
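
A minimal example session (assuming the fully qualified hostname svr020.gla.scotgrid.ac.uk; use gsissh in place of ssh if that's how you normally log in):

 $ ssh svr020.gla.scotgrid.ac.uk
 svr020:~$ ganga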

Your First Ganga Job

In [3]:j1 = Job(application=Executable(exe='/bin/echo',args=['Hello, World']))

In [4]:j1.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 0
Ganga.GPIDev.Adapters              : INFO     submitting job 0 to Local backend
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "submitted"
Out[4]: 1

In [5]:
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "running"
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "completed"

In [5]:print file(j1.outputdir+'stdout').read()
Hello, World

Note this job ran locally on the UI, which is not too interesting.
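
You can inspect a job's attributes at any time; a minimal sketch, continuing the session above (the exact output directory will differ for your account):

In [6]:j1.status
Out[6]: 'completed'

In [7]:j1.outputdir
Out[7]: '/clusterhome/home/gla012/gangadir/workspace/Local/0/output/'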

Your First Ganga Grid Job

Prequel

You should quit ganga and edit the VirtualOrganisation stanza in .gangarc to your VO, e.g.,

 VirtualOrganisation = gridpp

You should also ensure that ganga maintains the validity of your grid proxy, so in the [GridProxy_Properties] section uncomment the lines validityAtCreation and minValidity, putting, e.g.,

 validityAtCreation = 36:00
 minValidity = 24:00

(See also the later section on certificates.)

LCG Backend

Running jobs on the grid is easy - just change the job's backend to LCG:

In [2]:gridJob=Job(backend=LCG(), application=Executable(exe='/bin/echo',args=['Hello, World']))
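
Alternatively, you can take a copy of an existing job and just change its backend; a minimal sketch, assuming job 0 is the local 'Hello, World' job from the first example and that your version of ganga provides copy():

In [3]:gridJob2 = jobs(0).copy()

In [4]:gridJob2.backend = LCG()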

Targeting Glasgow

The above job can run anywhere your VO is supported. However, if you are preparing an environment to specifically target Glasgow, then you need to tell ganga not to send the job anywhere else. Do this by adding the CE's queue name to the job:

In [5]:gridJob.backend.CE='svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp'

(Change gridpp to the name of your VO, or to a queue which your VO can access - you can check the available queues in the information system monitor.)
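
One way to check which CEs and queues support your VO from the command line (assuming the lcg-infosites tool is available on the UI) is:

 $ lcg-infosites --vo gridpp ce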

Submitting The Job

Now run the job:

In [6]: gridJob.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 2
Ganga.GPIDev.Adapters              : INFO     submitting job 2 to LCG backend
Ganga.GPIDev.Lib.Job               : INFO     job 2 status changed to "submitted"

Ganga will submit the job to our resource broker, then poll it for status changes for you. When the job is done the output is retrieved and stored in the ganga work directory.

Ganga.GPIDev.Lib.Job               : INFO     job 2 status changed to "running"
Ganga.GPIDev.Lib.Job               : INFO     job 2 status changed to "completing"
Ganga.GPIDev.Lib.Job               : INFO     job 2 status changed to "completed"

This is much more convenient than having to poll edg-job-status by hand.

Job Output

Each job has an outputdir, and all output from the job will be stored here. You can process this inside ganga, using standard python, or (more likely), process the output offline with other tools.

By default, ganga will store jobs' outputs in ~/gangadir/workspace/Local/JOB_ID/output, where JOB_ID is a sequential job number.
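
For example, once gridJob has completed you can list and read its retrieved files from inside ganga; a minimal sketch (the exact files present depend on the job and the backend):

In [7]:import os

In [8]:os.listdir(gridJob.outputdir)
Out[8]: ['stdout', 'stderr']

In [9]:print file(gridJob.outputdir+'stdout').read()
Hello, World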

Wrapper Scripts and Sandboxes

Wrapper to Start a Prepared Binary

When the job wakes up in the batch system it's probably not in the working directory you expect - it will usually be in a scratch directory for the job.

If you have prepared binaries in your $CLUSTER_SHARED area, and perhaps some input files and output directories, you might want to use a wrapper script that moves to the right directory, then starts up the correct code.

Here's an example, which uses some environment variables to make sure the job is running in a unique directory:

#! /bin/bash
#
# Make a structured directory to run the job in - the job's output files should go somewhere sensible
BASE_DIR=$CLUSTER_SHARED/sieve/run
cd "$BASE_DIR" || exit 1
JOB_DIR="$(date +'%Y-%m-%d')/$PBS_JOBID"
mkdir -p "$JOB_DIR" || exit 1
cd "$JOB_DIR" || exit 1

# Now invoke the program, passing through any arguments given to the wrapper
BINARY=$CLUSTER_SHARED/sieve/sieve
echo "Invoking $BINARY $@"
"$BINARY" "$@"
# Capture the exit status immediately - testing $? a second time would give the status of the test itself
STATUS=$?
if [ "$STATUS" -eq 0 ]; then
    echo "All done. Make tea..."
else
    echo "$BINARY failed with status $STATUS. Oh dear..."
fi

If this wrapper is in, say, $CLUSTER_SHARED/wrappers/sievewrap.sh then the ganga job can be defined as:

In [53]: import os

In [54]:sieveJob=Job(backend=LCG(CE='svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp'),
   ....: application=Executable(exe=os.environ['CLUSTER_SHARED']+'/wrappers/sievewrap.sh',args=['-s', '1000', '-e', '1000000000']))

A Little Python Aside

iPython is a fully functioning python shell, so it takes all the normal python commands. In the last example we imported the os module, which lets us access environment variables such as CLUSTER_SHARED from within python using os.environ.
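
For example (the path shown is purely illustrative):

In [55]:os.environ['CLUSTER_SHARED']
Out[55]: '/cluster/share/gla012'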

Sandboxes

If you are just running on the Glasgow cluster then you probably don't need sandboxes (sets of files copied to/from the batch system with the job) - just work in the CLUSTER_SHARED directory. However, they can be useful, so here's how to use them:

Input Sandboxes

When a job's defined as

 Executable(exe='/bin/echo', ...)

then it's the binary on the remote system which is executed. If you want to send a wrapper script with the job, then tell ganga the exe is a File:

 Executable(exe=File('~/wrappers/sievewrap.sh'), ...)

Then the sievewrap.sh script is parceled up with the job and sent along with it. (In standard EGEE speak the file becomes part of the job's input sandbox.)

You can add other files to the job's sandbox using, e.g.,

 gridJob.inputsandbox=[File('~/inputs/myJobInputs.dat')]

Again, these files will be in the job's working directory when the job starts.

Output Sandboxes

Output sandboxes are files which will be retrieved from the batch system once a job has run. They will be passed back to you as files in the output directory of that job.

 gridJob.outputsandbox=['someOutput.txt', 'jobLogs.*']
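
Putting the pieces together, a complete grid job shipping a wrapper script and an extra input file, and naming some outputs to bring back, might look like this (a sketch reusing the hypothetical file names from the examples above):

In [10]:sandboxJob=Job(backend=LCG(CE='svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp'), \
   ....:          application=Executable(exe=File('~/wrappers/sievewrap.sh'), args=['-s', '1000', '-e', '1000000000']))

In [11]:sandboxJob.inputsandbox=[File('~/inputs/myJobInputs.dat')]

In [12]:sandboxJob.outputsandbox=['someOutput.txt', 'jobLogs.*']

In [13]:sandboxJob.submit()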

Grid Certificates in Ganga

Controlling Certificate Lifetime

Grid jobs need to have a valid proxy certificate for the entire lifetime of the job - and this has to include any queuing time. You can ensure that ganga only submits jobs when your proxy has a suitable lifetime by changing the parameters in .gangarc. E.g., if your job takes 24 hours to run, and you want to allow for 24 hours of time in the queue, then you might set:

 [GridProxy_Properties]
 
 # Proxy validity at creation (hh:mm)
 validityAtCreation = 72:00
 
 # Minimum proxy validity (hh:mm), below which new proxy needs to be created
 minValidity = 48:00

Ganga will now refuse to submit jobs unless a proxy with at least 48 hours of validity exists. Use

 gridProxy.renew()

to renew your proxy. gridProxy.info() will tell you how much time is left.

Using a MyProxy Server

The above method is quite risky, in that it exposes long-lived proxies on sites. Much better is to upload a long-lived proxy to a MyProxy server; the Glasgow resource broker will then renew the proxies of jobs which are running short. The default MyProxy server on svr020 is hosted at the RAL Tier-1, and the command to upload a proxy certificate to it is:

 $ myproxy-init -d -n

The default lifetime of the proxy is 7 days. This can be increased, but it's better to just renew it as necessary.
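
If you do want a longer-lived credential, myproxy-init takes a lifetime option; e.g., something like the following should request a two-week credential (the -c option gives the credential lifetime in hours - check myproxy-init --help on your UI):

 $ myproxy-init -d -n -c 336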

To get information about your uploaded proxy use myproxy-info -d; to delete an uploaded proxy use myproxy-destroy -d.

For more details see https://edms.cern.ch/file/722398/1.1/gLite-3-UserGuide.html (Section on Proxy Renewal).

Bulk Job Submission

Ganga includes a very simple job splitter, which can be used to take an array of jobs, each with different input parameters, and then submit them in bulk to the cluster.

It's easiest to illustrate with a simple example:

In [87]:import os

In [88]:jobArray = list()

In [89]:for n in range(20):
   ....:     jobArray.append(Executable(exe=os.environ['CLUSTER_SHARED']+'/bin/myapp', args=["--verbose", "--logFile=runZ%03d.log" % n]))
   ....:     

In [90]:jobArray[1]
Out[90]: Executable (
 exe = '/cluster/share/gla012/bin/myapp' ,
 env = {} ,
 args = ['--verbose', '--logFile=runZ001.log'] 
 ) 

Note we:

  1. Need to import the os module to get access to the environment list.
  2. Use the string % operator to zero-pad the log file name, as illustrated below. (See the python manual.)
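
For example:

In [91]:"runZ%03d.log" % 7
Out[91]: 'runZ007.log'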

Now we use an ExeSplitter to define our multi-part job:

In [92]:bulkGridJob=Job(splitter=ExeSplitter(apps=jobArray), \
          backend=LCG(CE='svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp'))

In [93]:bulkGridJob.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 19
Ganga.GPIDev.Adapters              : INFO     submitting job 19.0 to LCG backend
Ganga.GPIDev.Lib.Job               : INFO     job 19.0 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 19.1 to LCG backend
Ganga.GPIDev.Lib.Job               : INFO     job 19.1 status changed to "submitted"
Ganga.GPIDev.Adapters              : INFO     submitting job 19.2 to LCG backend
...

The submission of each sub-job is done separately, which can take a little time. As usual ganga will take care of polling the status of each job and retrieving the output when it becomes available.

In this way ganga can control the submission of several hundred jobs quite easily.

Note that the output of each subjob will be found in a numbered subdirectory of the main controlling job (in this case, job 19):

In [97]:bulkGridJob.subjobs[1].outputdir
Out[97]: /clusterhome/home/gla012/gangadir/workspace/Local/19/1/output/

And all the other job parameters can be queried in the same way:

In [99]:bulkGridJob.subjobs[1].backend       
Out[99]: LCG (
 status = 'Scheduled' ,
 reason = 'Job successfully submitted to Globus' ,
 iocache = '' ,
 CE = 'svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp' ,
 middleware = 'EDG' ,
 actualCE = 'svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-gridpp' ,
 id = 'https://svr023.gla.scotgrid.ac.uk:9000/S_RBomCRMwFN0kG_rUp7Gg' ,
 jobtype = 'Normal' ,
 exitcode = None ,
 requirements = LCGRequirements (
    other = [] ,
    nodenumber = 1 ,
    memory = None ,
    software = [] ,
    ipconnectivity = 0 ,
    cputime = None ,
    walltime = None 
    ) 
 ) 
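
You can also loop over the subjobs with ordinary python, e.g., to summarise their statuses; a minimal sketch (the statuses shown are illustrative):

In [100]:for sj in bulkGridJob.subjobs:
    .....:     print sj.id, sj.status
    .....:
0 completed
1 completed
2 running
...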

Disconnecting and Reconnecting

Starting Up Again

Ganga keeps all state about your jobs in ~/gangadir. When you restart ganga it will reread the last state and take appropriate actions (querying running job statuses, downloading outputs, etc.). However, it will have forgotten the local names you gave your jobs; you can recover them using the jobs object, which contains all of your jobs.

svr020:~$ ganga

*** Welcome to Ganga ***
Version: Ganga-4-4-1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 3 jobs in jobs
Ganga.GPIDev.Lib.JobRegistry       : INFO     Found 0 jobs in templates


In [1]:jobs
Out[1]: Statistics: 3  jobs
--------------
#   id      status        name   subjobs      application          backend                               backend.actualCE  
#    0      failed                             Executable              LCG  svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcg  
#    1   completed                             Executable              LCG  svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcg  
#    2         new                             Executable              LCG                                                 

In [2]: myJob=jobs(2)

In [3]: myJob.submit()

...
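
If an old job has failed (like job 0 in the listing above) you can also resubmit it; a minimal sketch, assuming your version of ganga provides resubmit():

In [4]: jobs(0).resubmit()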


Screen

It's also possible to run your ganga session in screen, which allows you to disconnect and log out while ganga keeps running. You can then reconnect when you log back in (possibly from a different machine); there are plenty of good screen tutorials online. N.B. To reattach to a screen running on svr020 use:

 screen -r
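
For example, a typical workflow is to start a named session, run ganga inside it, and detach before logging out (standard screen usage, not specific to ganga):

 screen -S ganga
 ganga
 (work as normal, then press Ctrl-a followed by d to detach)

After logging back in to svr020:

 screen -r ganga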