Guide to Ganga

From GridPP Wiki

Introduction

This is a guide to installing and configuring the Ganga Job Management tool for use with both local batch systems and the DIRAC workload management system. It's maintained by Mark Slater (mws<AT>hep.ph.bh.bham.ac.uk) - please email if you have any comments/problems!

For a general overview of the Grid, DIRAC and Ganga, please see this talk.

For more information and more in-depth user guides, please visit the main Ganga website: http://ganga.web.cern.ch/ganga/

Requirements

Before you start using Ganga (assuming you want to use it to submit jobs to the grid rather than just for local batch system submission), there are a few steps you need to go through:

Installation and Configuration

Here are the steps to download and configure Ganga:

  • Download the install script from the Ganga website and make it executable:
wget http://ganga.web.cern.ch/ganga/download/ganga-install
chmod +x ganga-install
  • Run the script, naming the external plugins you want to include, to download and install Ganga in ~/Ganga. Generally, this will be the GangaDirac plugin:
./ganga-install --extern=GangaDirac LAST
  • Now run Ganga with the -g flag to create the default .gangarc file:
/home/<username>/Ganga/install/<version>/bin/ganga -g -o[Configuration]RUNTIME_PATH=GangaDirac
  • To configure Ganga to submit using your DIRAC client installation, set up the DIRAC client and export the environment to a file for Ganga to use:
source ~/dirac/bashrc
env > ~/dirac/envfile
  • Now edit your .gangarc file and set the following options:
[Configuration] RUNTIME_PATH = GangaDirac
[Dirac] DiracEnvFile = /home/<username>/dirac/envfile
[defaults_GridCommand]info = dirac-proxy-info
[defaults_GridCommand]init = dirac-proxy-init -g <dirac user group>
  • Now set up the DIRAC client (as you should every time before running Ganga if you want to use it) and then run Ganga. It should ask you to generate a proxy and then leave you at the IPython prompt:
source ~/dirac/bashrc
/home/<username>/Ganga/install/<version>/bin/ganga
  • To test that everything is working, try submitting a basic job to the local machine you're running on and then to DIRAC:
Job().submit()
Job( backend=Dirac() ).submit()
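
If the test jobs fail, it can help to confirm that the options listed above really ended up in your .gangarc. The following is a minimal sketch in plain Python (run outside Ganga), assuming the file parses as a standard INI file with each section header on its own line. The sample text, the user name and the group name are placeholders, and the missing_options helper is a hypothetical name, not part of Ganga:

```python
import configparser

# Sample text standing in for the relevant part of ~/.gangarc; the path and
# group values here are placeholders, not real settings.
SAMPLE_GANGARC = """
[Configuration]
RUNTIME_PATH = GangaDirac

[Dirac]
DiracEnvFile = /home/someuser/dirac/envfile

[defaults_GridCommand]
info = dirac-proxy-info
init = dirac-proxy-init -g some_group
"""

# (section, option) pairs that the steps above ask you to set.
REQUIRED = [
    ("Configuration", "RUNTIME_PATH"),
    ("Dirac", "DiracEnvFile"),
    ("defaults_GridCommand", "info"),
    ("defaults_GridCommand", "init"),
]


def missing_options(text):
    """Return the required (section, option) pairs absent from the INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return [(s, o) for s, o in REQUIRED if not parser.has_option(s, o)]


if __name__ == "__main__":
    # An empty list means every required option was found.
    print(missing_options(SAMPLE_GANGARC))
```

To check your real configuration, pass the contents of your own .gangarc to missing_options instead of the sample text.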

Getting Started

Ganga is a general job management tool that helps with submitting jobs to different systems and with monitoring and manipulating them afterwards. It is based on the idea of plugins that tell a Job what to run (Application), where to run (Backend), how to run (Splitter and PostProcessor) and what data to use (InputFiles and OutputFiles). It is written almost entirely in Python and can be controlled either from the modified IPython prompt or from scripts.

To start, we'll submit a default job that will go to the 'Local' backend (i.e. the machine you are using at present). Start Ganga as above and then enter the following:

j = Job()
j.submit()

The job should submit, start running and then complete almost immediately. By default, the stdout and stderr files are copied back with your job and stored in the Ganga workspace. To view them, you can use the following:

j.peek("stdout", "emacs")    # open any file in j.outputdir with the given command
!emacs $j.outputdir/stdout   # use '!' to run a shell command and '$' to substitute a Python variable
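
If you would rather read the output from Python than open an editor, the file can be read directly from the output directory. This is a minimal sketch using only the standard library. The read_job_file helper is a hypothetical name, and the temporary directory is a stand-in so that the example runs without Ganga:

```python
import os
import tempfile


def read_job_file(outputdir, name="stdout"):
    """Return the text of a file in a job's output directory."""
    with open(os.path.join(outputdir, name)) as handle:
        return handle.read()


if __name__ == "__main__":
    # Stand-in for a real job output directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as fake_outputdir:
        with open(os.path.join(fake_outputdir, "stdout"), "w") as handle:
            handle.write("Hello World\n")
        print(read_job_file(fake_outputdir), end="")
```

At the Ganga prompt you would pass j.outputdir in place of the temporary directory, e.g. read_job_file(j.outputdir).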

This default job object uses the 'Executable' application with the exe set to 'echo' and the arguments set to 'Hello World'. To run your own scripts, do the following:

j = Job()
j.application = Executable()
j.application.exe = '/path/to/script'
j.application.args = [ ... ]
j.submit()
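
For a first trial, any executable script will do. The following is a hypothetical test script (the file name and the describe_args helper are made up for illustration) that simply prints back the arguments it receives, which makes it easy to see that the values in j.application.args reached your script:

```python
#!/usr/bin/env python
"""Hypothetical test script: echo back the arguments it was given."""
import sys


def describe_args(args):
    """Return one line per argument, numbered from 1."""
    return ["arg %d: %s" % (i, a) for i, a in enumerate(args, start=1)]


if __name__ == "__main__":
    # sys.argv[0] is the script itself, so skip it.
    for line in describe_args(sys.argv[1:]):
        print(line)
```

Save it somewhere, make it executable with chmod +x, and set j.application.exe to its path. With two arguments in j.application.args, the job's stdout should then contain two numbered lines.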

To view the jobs that you have created, use the 'jobs' command. This gives a list of the job objects along with their status. You can also use this to access the jobs themselves and view all the information about them, e.g.

jobs
j = jobs(0)    # grab jobs object ID 0
j              # view the object
j.application
j.backend

To get more information about the different objects and plugins, use the 'help' system:

help()
help(Job)
help(Executable)
plugins("applications")
plugins("backends")


Input and Output Data

Submitting to Different Backends

Using Queues to Speed Up Submission

When submitting to some backends, DIRAC included, it can take some time to go through the whole submission process. When you have tens to thousands of jobs to submit, this can become a significant problem. You can greatly speed things up by using the Ganga queues system to submit your jobs in parallel, e.g.:

for i in range(0, 10):
   j = Job( backend = Dirac() )
   queues.add(j.submit)

You can view the threads Ganga knows about by using the 'queues' command. To configure the number of worker threads, set the following configuration option:

[DIRAC] NumWorkerThreads

You can add any function call to the queues system to run in the background. To get more info, use help(queues).
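
To see why handing slow calls to a pool of worker threads helps, here is a generic illustration in plain Python. It is not Ganga's implementation: fake_submit is a stand-in for a slow j.submit() call, and num_workers is a hypothetical parameter playing the role of the worker thread setting above.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_submit(job_id):
    """Stand-in for a slow j.submit() call; returns the job id when done."""
    time.sleep(0.1)  # pretend the backend takes a while to respond
    return job_id


def submit_all(job_ids, num_workers=5):
    """Run fake_submit for every id on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in input order, whichever call finishes first.
        return list(pool.map(fake_submit, job_ids))


if __name__ == "__main__":
    start = time.time()
    done = submit_all(range(10))
    print(done, "in %.2fs" % (time.time() - start))
```

Because each call spends most of its time waiting, running several at once on separate threads finishes much sooner than calling them one after another.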

Using Tasks for Automated Submission

Using Ganga as a Service