VO specific software on the Grid

Note: This document is a DRAFT. The person responsible for the document is Janusz M. If you have any suggestions, please email him. If you find something on this page that is plainly wrong, please just correct it :-)

Introduction

The standard way of distributing software on the grid is via CVMFS. This assumes that the uploaded software is of production quality; the method is not suitable for debugging, which should ideally be done locally.
There might be cases where distribution via CVMFS is not suitable, so we list alternative methods as well. These should, however, be used with caution: it is very easy to inadvertently overload a network link, which can get you banned, possibly at both ends of the connection.

Software distribution via CVMFS

If a site supports a given VO, it will support the CVMFS repositories you need. There are two caveats to this, though:

  • It is the VO's responsibility to let the sites know which repositories it requires, as this is VO specific. Ideally the person looking after your CVMFS area should do this. However, if you (as a user) find that a repository you require is not present at a site, you can ticket the site via GGUS or, if you don't know where to start, email this mailing list. A quick way of checking whether a repository is visible from a job is sketched after this list.
  • A VO should not rely on CVMFS repositories of other experiments that might be incidentally present at some sites. The maintainers of other repositories might remove software without notice.
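
If in doubt, a job can check for the repository before trying to use it. The snippet below is a minimal sketch, assuming the gridpp VO repository used later in this article; substitute your own repository path:

#!/bin/bash
# Check that the VO's CVMFS repository is visible on this worker node.
# Listing the directory also forces autofs to mount the repository on demand.
REPO=/cvmfs/gridpp.egi.eu
if ls "$REPO" > /dev/null 2>&1; then
   echo "CVMFS repository $REPO is available"
else
   echo "CVMFS repository $REPO is NOT available on this worker node"
   exit 1
fi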

How does uploading the software work?

Typically a VO designates a couple of people to upload the software to the CVMFS server from which it is distributed to the sites. These people are assigned a special 'role' on the VOMS server (usually 'lcgadmin') which enables them to upload the software. An example of how to upload software to the servers at RAL under the egi.eu domain is given here: Managing_a_CVMFS_area_at_RAL
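
To act with that role, the software manager creates a proxy that requests it. The commands below are a minimal sketch; the VO name gridpp and the role name lcgadmin are examples, so check with your VO which values actually apply:

# Create a VOMS proxy that carries the software-manager role
# (VO name 'gridpp' and role 'lcgadmin' are examples; use your VO's values).
voms-proxy-init --voms gridpp:/gridpp/Role=lcgadmin

# Check that the role is present in the proxy
voms-proxy-info --all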

Accessing software on CVMFS from a grid job

The only detail a user (client) has to know is how the repository (or repositories) is mapped on the worker nodes. In this article we will use the gridpp VO repository, which is mapped to /cvmfs/gridpp.egi.eu/. A VO software administrator uploaded the following example Python script and saved it as testing/HelloWorld/hello.py:


#!/usr/bin/env python
import sys

print("----------------------")
print("Hello, I'm a snake !  /\/\/o")
print("----------------------")

print(" More info:\n")

print(sys.version)

#
 

It normally takes a few hours before uploaded software becomes available to clients. Now we need to create a job wrapper (run_hello_cvmfs.sh) which will be submitted as a Dirac executable:


#!/bin/bash
#
# Run the Python script.
export GRIDPP_VO_CVMFS_ROOT=/cvmfs/gridpp.egi.eu/testing/HelloWorld
if [ -d "$GRIDPP_VO_CVMFS_ROOT" ]; then
   "$GRIDPP_VO_CVMFS_ROOT"/hello.py
else
   echo "Requested CVMFS directory does not exist: $GRIDPP_VO_CVMFS_ROOT"
   exit 1
fi
#


The last step is to create a Dirac jdl file (hello_cvmfs.jdl):

[
JobName = "Snake_Job_CVMFS";
Executable = "run_hello_cvmfs.sh";
Arguments = "";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"run_hello_cvmfs.sh"};
OutputSandbox = {"StdOut","StdErr"};
]

In the jdl we define the executable (run_hello_cvmfs.sh), which is shipped with the job in the input sandbox. Now we can submit our first CVMFS job:

dirac-wms-job-submit -f logfile hello_cvmfs.jdl

We can check its status; in our case this returned:

dirac-wms-job-status -f logfile
JobID=5213546 Status=Running; MinorStatus=Job Initialization; Site=VAC.UKI-LT2-RHUL.uk;

When the job finishes, we can retrieve the output (dirac-wms-job-get-output -f logfile), which reads:

----------------------
Hello, I'm a snake !  /\/\/o
----------------------
 More info:

2.7.12 (default, Dec 17 2016, 21:07:48) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)]

As stated above, although CVMFS provides easy access to an experiment's software, it is not well suited for rapid software changes: updates typically become visible only a few hours after uploading. It is best suited for distributing well-tested software. In some cases, however, it might be necessary to apply quick patches or test different external libraries. One way of achieving this is to use a code versioning system, e.g. git. We cover this topic in the next section.

Using Code Versioning Systems (example: git)

We'll use git to access our software; this is still the same trivial Python script used above. Clearly, pulling in a few GB of code, building it on every worker node and submitting 1000 'test' jobs is not the use case described here.

Our job wrapper will look like this (run_hello.sh):

#!/bin/bash
#
# Get the Python script from GitHub:
wget https://github.com/martynia/HelloWorld/archive/master.zip
unzip master.zip
cp HelloWorld-master/hello.py .
chmod +x hello.py   # make sure the copied script is executable
./hello.py
#

And the jdl:

[
JobName = "Snake_Job";
Executable = "run_hello.sh";
Arguments = "";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"run_hello.sh"};
OutputSandbox = {"StdOut","StdErr"};
]

This method does not require the VO to host its own git installation; we can simply fetch the zipped software bundle.
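
For reproducibility it is usually better to fetch a fixed tag or commit rather than the moving master branch. The lines below are a sketch of this, assuming a hypothetical tag v1.0 exists in the repository (the tag name is only an illustration):

# Fetch a fixed, tagged version instead of the moving master branch
# (the tag 'v1.0' is hypothetical; use a tag that exists in your repository).
wget https://github.com/martynia/HelloWorld/archive/v1.0.zip
unzip v1.0.zip
cp HelloWorld-*/hello.py .   # the extracted directory name contains the tag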

Alternatively, we could use the git installation available on CERN's CVMFS, which is located (at the time of writing) at /cvmfs/sft.cern.ch/lcg/git-2.9.3/. We would need to replace the wget and unzip lines in the job wrapper above with a git invocation:

/cvmfs/sft.cern.ch/lcg/git-2.9.3/git clone https://github.com/martynia/HelloWorld.git

And submit a job in a usual way.
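
Putting it together, a modified wrapper might look like the sketch below. The wrapper name run_hello_git.sh is just an example, and note that git clone creates a directory called HelloWorld rather than HelloWorld-master:

#!/bin/bash
#
# Get the Python script by cloning the repository with the CVMFS-provided git:
/cvmfs/sft.cern.ch/lcg/git-2.9.3/git clone https://github.com/martynia/HelloWorld.git
cp HelloWorld/hello.py .
chmod +x hello.py   # make sure the script is executable
./hello.py
#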

Storing software on a Storage Element (SE)

If your software is small you can (temporarily) store it on an SE. Assuming you are using DIRAC, you can use the DIRAC data management tools (see e.g. here: Quick_Guide_to_Dirac) to upload your files to an SE that your VO has access to. It is recommended that you replicate your software to multiple SEs, both to avoid overloading a single instance and to have the software present at the site you are running at. Note that you cannot run your software directly from the SE; you will have to copy it to the worker node your job runs on first.
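
As an illustration only, the commands below sketch this workflow with the standard DIRAC data management tools. The LFN path, tarball name and SE names are assumptions and need to be adapted to your VO; see Quick_Guide_to_Dirac for the actual conventions:

# Upload a software tarball to an SE (LFN, file name and SE name are examples)
dirac-dms-add-file /gridpp/user/j/jdoe/software/hello-1.0.tar.gz hello-1.0.tar.gz UKI-LT2-RHUL-disk

# Replicate it to a second SE to spread the load (SE name is an example)
dirac-dms-replicate-lfn /gridpp/user/j/jdoe/software/hello-1.0.tar.gz UKI-NORTHGRID-MAN-disk

# Inside the job wrapper: copy the tarball to the worker node and unpack it
dirac-dms-get-file /gridpp/user/j/jdoe/software/hello-1.0.tar.gz
tar xzf hello-1.0.tar.gz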