VO specific software on the Grid

From GridPP Wiki

Note: This document is a DRAFT.

Introduction

The standard way of distributing software on the grid is via CVMFS. This assumes that the software is of production quality; the method is not suited to debugging or to rapidly changing code.

Software distribution via CVMFS

If a site supports a given VO, it should also provide the CVMFS repositories that VO needs. There are two caveats, though:

  • It is the VO's responsibility to tell the sites which repositories it requires, as this is VO specific. Ideally the person looking after your CVMFS area should do this. However, if you (as a user) find that a repository you need is not present at a site, you can open a GGUS ticket against the site or, if you don't know where to start, email [1].
  • A VO should not rely on CVMFS repositories belonging to other experiments that happen to be present at some sites: their maintainers might remove software without notice.
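Because a missing repository only shows up as a failing job, a wrapper could first check that every repository the VO relies on is reachable on the worker node. The sketch below is a minimal illustration, assuming repositories are mounted under /cvmfs/<repository name> as in the gridpp.egi.eu example; the helper name missing_repositories is hypothetical and not part of CVMFS or Dirac.

```python
import os


def missing_repositories(repos, mount_root="/cvmfs"):
    """Hypothetical helper: return the entries of `repos` not available under `mount_root`.

    CVMFS repositories are typically mounted on demand (autofs), so a plain
    listing of /cvmfs may not show them. Checking each path directly
    triggers the mount instead.
    """
    return [r for r in repos if not os.path.isdir(os.path.join(mount_root, r))]
```

A wrapper could call missing_repositories(["gridpp.egi.eu"]) before doing any work and exit with a non-zero status if the returned list is not empty, which makes a missing repository easy to spot in the job output.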

How does uploading the software work?

Typically a VO designates a couple of people to upload software to the CVMFS server, from which it is distributed to the sites. These people are then assigned a special role on the VOMS server (usually 'lcgadmin'), which allows them to upload the software.

The only detail a user (client) needs to know is where the repositories are mounted on the worker nodes. In this article we use the gridpp VO repository, which is mounted at /cvmfs/gridpp.egi.eu/. Suppose a VO software administrator has uploaded the following example Python script as testing/HelloWorld/hello.py (it must be executable, because the job wrapper runs it directly):


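#!/usr/bin/env python
# print_function gives the same output under Python 2 and Python 3.
from __future__ import print_function
import sys

print("----------------------")
# Raw string, so the backslashes in the ASCII-art snake are printed literally.
print(r"Hello, I'm a snake !  /\/\/o")
print("----------------------")

print(" More info:\n")

# Report the Python version found on the worker node.
print(sys.version)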

It normally takes a few hours for uploaded software to become available to clients. Next, we create a job wrapper (run_hello_cvmfs.sh), which will be submitted as the Dirac executable:


#!/bin/bash
#
# Run the Python script from the gridpp CVMFS repository.
export GRIDPP_VO_CVMFS_ROOT=/cvmfs/gridpp.egi.eu/testing/HelloWorld
if [ -d "$GRIDPP_VO_CVMFS_ROOT" ]; then
   "$GRIDPP_VO_CVMFS_ROOT/hello.py"
else
   # Report the problem on stderr and fail the job.
   echo "Requested CVMFS directory does not exist: $GRIDPP_VO_CVMFS_ROOT" >&2
   exit 1
fi
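If a VO prefers to keep its wrapper logic in Python, the same check-then-run steps as run_hello_cvmfs.sh could be sketched as follows. This is an illustration only, assuming a Python interpreter is available on the worker node (the example already needs one for hello.py); run_from_cvmfs is a hypothetical name.

```python
import os
import subprocess
import sys

# Directory used by run_hello_cvmfs.sh.
DEFAULT_ROOT = "/cvmfs/gridpp.egi.eu/testing/HelloWorld"


def run_from_cvmfs(root=DEFAULT_ROOT, script="hello.py"):
    """Hypothetical Python counterpart of run_hello_cvmfs.sh.

    Returns 1 if `root` does not exist. Otherwise runs root/script, relying
    on its shebang line and execute permission as the shell wrapper does,
    and returns the script's exit status.
    """
    if not os.path.isdir(root):
        sys.stderr.write("Requested CVMFS directory does not exist: %s\n" % root)
        return 1
    return subprocess.call([os.path.join(root, script)])
```

A Python wrapper would end with sys.exit(run_from_cvmfs()) so that the job's exit status reflects that of hello.py.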


The last step is to create a Dirac jdl file (hello_cvmfs.jdl):

[
JobName = "Snake_Job_CVMFS";
Executable = "run_hello_cvmfs.sh";
Arguments = "";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"run_hello_cvmfs.sh"};
OutputSandbox = {"StdOut","StdErr"};
]
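The job wrapper appears twice in the jdl: as the Executable and in the InputSandbox, which is how it gets shipped with the job. When submitting several variants, the jdl text could be generated rather than edited by hand. A minimal sketch that reproduces exactly the fields of hello_cvmfs.jdl; make_jdl is a hypothetical helper, not a Dirac API.

```python
def make_jdl(job_name, wrapper, stdout="StdOut", stderr="StdErr"):
    """Hypothetical helper: build a minimal jdl with the same fields as hello_cvmfs.jdl."""
    lines = [
        "[",
        'JobName = "%s";' % job_name,
        'Executable = "%s";' % wrapper,
        'Arguments = "";',
        'StdOutput = "%s";' % stdout,
        'StdError = "%s";' % stderr,
        # The wrapper must be in the input sandbox so that it is shipped with the job.
        'InputSandbox = {"%s"};' % wrapper,
        'OutputSandbox = {"%s","%s"};' % (stdout, stderr),
        "]",
    ]
    return "\n".join(lines) + "\n"
```

For example, make_jdl("Snake_Job_CVMFS", "run_hello_cvmfs.sh") returns the text of hello_cvmfs.jdl, which can be written to a file and passed to dirac-wms-job-submit.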

In the jdl we define the executable (run_hello_cvmfs.sh) which is shipped with the job in the input sandbox. Now we can submit our first CVMFS job:

dirac-wms-job-submit -f logfile hello_cvmfs.jdl

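The -f option writes the job ID to logfile, so the same file can be passed to the status and output commands. Check the job status, which in our case returned: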

dirac-wms-job-status -f logfile
JobID=5213546 Status=Running; MinorStatus=Job Initialization; Site=VAC.UKI-LT2-RHUL.uk;
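If you poll jobs from a script, the key=value layout of the status line can be turned into a dictionary. A minimal sketch, assuming every line follows the layout of the single example above (the exact format may differ between Dirac versions); parse_status_line is a hypothetical helper.

```python
import re

# A key=value pair. The value ends at ';', at whitespace followed by the next
# "key=", or at the end of the line, so values such as "Job Initialization"
# may contain spaces.
_PAIR = re.compile(r"(\w+)=(.*?)(?=;|\s+\w+=|$)")


def parse_status_line(line):
    """Hypothetical helper: parse one line of dirac-wms-job-status output into a dict."""
    return {key: value.strip() for key, value in _PAIR.findall(line.strip())}
```

Applied to the line above, it gives a dictionary with the keys JobID, Status, MinorStatus and Site, so a script can wait until the Status value is Done or Failed before fetching the output.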

When the job has finished, we can retrieve the output (dirac-wms-job-get-output -f logfile), which reads:

----------------------
Hello, I'm a snake !  /\/\/o
----------------------
 More info:

2.7.12 (default, Dec 17 2016, 21:07:48) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)]

As stated above, although CVMFS provides easy access to an experiment's software, it is not well suited to rapid software changes: updates typically become visible only a few hours after uploading, so it is best suited to distributing well-tested software. In some cases, however, it may be necessary to apply quick patches or to test different external libraries. One way of doing this is to use a version control system, e.g. git, which we cover in the next section.

Using Code Versioning Systems (example: git)

We'll use git to access our software, which is still the same trivial Python script as above. Note that pulling in a few GB of code, building it on every worker node and submitting 1000 jobs as a "test" is not the use case described here.

Our job wrapper (run_hello.sh) looks like this:

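#!/bin/bash
#
# Stop at the first failing command (e.g. a failed download).
set -e
# Get the Python script from GitHub:
wget https://github.com/martynia/HelloWorld/archive/master.zip
unzip master.zip
cp HelloWorld-master/hello.py .
# Make sure the script is executable before running it.
chmod +x hello.py
./hello.py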

And the jdl:

[
JobName = "Snake_Job";
Executable = "run_hello.sh";
Arguments = "";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"run_hello.sh"};
OutputSandbox = {"StdOut","StdErr"};
]

This method does not require the VO to provide its own git installation: we simply download the zipped software bundle.
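If wget or unzip is not available on a worker node, the same download-and-extract step could be done with Python's standard library. A minimal sketch, assuming a Python interpreter is available on the worker node (the example already needs one for hello.py); fetch_script and extract_script are hypothetical names, and the URL and member path are the ones used in run_hello.sh.

```python
import os
import stat
import zipfile

try:  # Python 3
    from urllib.request import urlretrieve
except ImportError:  # Python 2
    from urllib import urlretrieve

ARCHIVE_URL = "https://github.com/martynia/HelloWorld/archive/master.zip"


def extract_script(zip_path, member="HelloWorld-master/hello.py", dest="."):
    """Hypothetical helper: copy one file out of the archive into `dest` and make it executable."""
    target = os.path.join(dest, os.path.basename(member))
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(member)
    with open(target, "wb") as out:
        out.write(data)
    # zipfile does not restore Unix permission bits, so set the execute bits explicitly.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


def fetch_script(url=ARCHIVE_URL, zip_path="master.zip", dest="."):
    """Hypothetical helper: download the archive (like the wget line) and extract hello.py."""
    urlretrieve(url, zip_path)
    return extract_script(zip_path, dest=dest)
```

In a job, fetch_script() would replace the wget, unzip and cp lines; the extracted ./hello.py can then be run as before.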

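Alternatively, we could use the git installation on CERN's CVMFS repository, located (at the time of writing) at /cvmfs/sft.cern.ch/lcg/git-2.9.3/. In the job wrapper above we would then replace the wget and unzip lines with the git invocation below, and copy hello.py from HelloWorld/ rather than HelloWorld-master/, since git clone names the directory after the repository: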

/cvmfs/sft.cern.ch/lcg/git-2.9.3/git clone https://github.com/martynia/HelloWorld.git

Then submit the job in the usual way.
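Since the CVMFS path of git is version specific (git-2.9.3 at the time of writing) and may change, a wrapper might prefer the CVMFS copy when it is present and fall back to a git found on the PATH. A minimal sketch; find_git and clone are hypothetical helpers, not part of any grid tool.

```python
import os
import subprocess

# Path from the text above; it may change as the sft.cern.ch repository is updated.
CVMFS_GIT = "/cvmfs/sft.cern.ch/lcg/git-2.9.3/git"


def find_git(candidates=(CVMFS_GIT,), search_path=None):
    """Hypothetical helper: return the first executable git among `candidates`,
    then among the directories of `search_path` (default: $PATH), else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    paths = list(candidates) + [
        os.path.join(entry, "git") for entry in search_path.split(os.pathsep) if entry
    ]
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def clone(url="https://github.com/martynia/HelloWorld.git", git=None):
    """Hypothetical helper: clone `url` into the current directory; return git's exit status."""
    git = git or find_git()
    if git is None:
        raise RuntimeError("no git executable found")
    return subprocess.call([git, "clone", url])
```

After a successful clone(), the script is in HelloWorld/hello.py, as with the command-line git clone.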