Running BaBar SP Jobs on the Grid

From GridPP Wiki

The SPGrid tools are designed to reuse as many of the existing utilities as possible. Once the submission site has been set up, the process of building, running, checking, merging and exporting runs is therefore very similar to that at a "standard" site submitting jobs to a local batch system.

Building Runs

Building runs for the Grid currently requires spbuild-grid, a slightly modified version of spbuild that knows how to create the JDL files. It adds the --grid option, which turns on JDL creation.

Apart from the --grid option, it is used in exactly the same way as the standard spbuild command.

spbuild-grid --grid --user <username> <run list>
spbuild-grid --grid --user <username> -n <number>

Choosing Where to Send Runs

The --grid option accepts a string that modifies the JDL and can be used to direct the jobs to a specific site, or to sites with a specific "Tag".

Use the CE= form to direct a job to a specific Compute Element.

spbuild-grid --grid CE=lcgce01.gridpp.rl.ac.uk:2119/jobmanager-lcgpbs-babarL700 --user <username> <run list>

This would build a run that is sent only to the babarL700 queue at the RAL Tier 1.

Use the SRTE= form to specify a "Tag" that sites must have to run this job. This is in addition to the standard tags for the software release and CondDB version.

spbuild-grid --grid SRTE=VO-babar-chris --user <username> <run list>
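The two forms of the --grid string can be captured in a small shell helper. The following is a minimal sketch and not part of the SPGrid tools: the function name grid_build_cmd, the user name and the run numbers are hypothetical, and the function only prints the spbuild-grid command line rather than running it, so the result can be inspected first.

```shell
#!/bin/sh
# Print (do not run) an spbuild-grid command line for a given target.
# Usage: grid_build_cmd <CE|SRTE|none> <value> <username> <run list...>
# This helper is illustrative only; it is not shipped with SPGrid.
grid_build_cmd() {
    kind=$1; value=$2; user=$3
    shift 3
    case "$kind" in
        CE|SRTE) grid_arg="--grid $kind=$value" ;;   # restrict by CE or by tag
        none)    grid_arg="--grid" ;;                # no restriction
        *)       echo "unknown target kind: $kind" >&2; return 1 ;;
    esac
    echo "spbuild-grid $grid_arg --user $user $*"
}

# Placeholder user name and run numbers.
grid_build_cmd SRTE VO-babar-chris someuser 1234567 1234568
# prints: spbuild-grid --grid SRTE=VO-babar-chris --user someuser 1234567 1234568
```

Passing none as the first argument (the value is then ignored) gives the plain --grid form shown under Building Runs.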

Choosing a Resource Broker

To locate Resource Brokers supporting BaBar:

ldapsearch -x -H ldap://lcgbdii02.gridpp.rl.ac.uk:2170 -b mds-vo-name=local,o=grid \
'(&(GlueServiceType=ResourceBroker)(GlueServiceAccessControlRule=babar))' GlueServiceEndpoint
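ldapsearch prints its results as LDIF, so each matching entry contains a line of the form "GlueServiceEndpoint: value". A minimal sketch for reducing that output to a plain list of endpoints follows. The function name extract_endpoints and the sample entries (host names, port and dn layout) are made up for illustration, and the sketch assumes the endpoint values are short enough that ldapsearch does not fold them onto continuation lines.

```shell
#!/bin/sh
# Read ldapsearch LDIF output on stdin; print each distinct endpoint once.
extract_endpoints() {
    sed -n 's/^GlueServiceEndpoint: //p' | sort -u
}

# Hypothetical sample of ldapsearch output (entries are invented).
sample='# extended LDIF
dn: GlueServiceUniqueID=rb01.example.org_rb,mds-vo-name=local,o=grid
GlueServiceEndpoint: rb01.example.org:7772

dn: GlueServiceUniqueID=rb02.example.org_rb,mds-vo-name=local,o=grid
GlueServiceEndpoint: rb02.example.org:7772
'
printf '%s\n' "$sample" | extract_endpoints
# prints:
# rb01.example.org:7772
# rb02.example.org:7772
```

In practice the ldapsearch command above would be piped into extract_endpoints in place of the sample text.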

Submitting Runs

Submission to the Grid is exactly the same as for local running:

spsub -y <run list>

It returns the Grid Job ID on the command line.

Monitoring Runs

After the runs have been submitted, spgridjobs can be used to check their status.

spgridjobs [--check] [--update] [--summary] [<run list>]

With no command line options, spgridjobs reads the status of each run from status.txt. The --check option forces it to query the status of each job with edg-job-status, --update updates the status.txt file if necessary, and --summary produces a table showing the number of runs in each state at each site.
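Since jobs take time to complete, the status check is typically repeated. The sketch below shows one way to do that from the shell. It is an assumption rather than part of the SPGrid tools: poll_status is a hypothetical helper, and the example invocation assumes the three options can be combined as the synopsis suggests. The demonstration uses echo in place of the real command so that it runs anywhere.

```shell
#!/bin/sh
# Run a status command a fixed number of times, pausing between runs.
# Usage: poll_status <interval seconds> <repetitions> <command...>
poll_status() {
    interval=$1; count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"                          # the status command to repeat
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"         # no pause after the final check
        fi
    done
}

# Demonstration: echo stands in for the real spgridjobs call.
poll_status 0 2 echo spgridjobs --check --update --summary
```

For real use, drop the echo and choose an interval that does not overload the Resource Broker, for example poll_status 600 6 spgridjobs --check --update --summary.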

Retrieving Runs

Once the run has completed, you need to copy the data from the Storage Element it was written to at the end of the job, and retrieve the standard error and standard output from the Resource Broker. This is done with the spunpack command:

spunpack [--quiet] [--procspec <procspec>] <run list>


Merging and Exporting Runs

Once you have processed enough runs, the merge and export processes are exactly the same as for local production:

spmerge --user <user> [--verbose] [--debug] [--norun] [--number n]
spexport --user <user> [--verbose] [--debug] [--norun] [--number n]
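Taken together, one production cycle uses the commands on this page in a fixed order. The sketch below prints that sequence for a given user and run list as a quick reference. The function name sp_cycle, the user name and the run numbers are hypothetical, and only options documented above are used. It prints the commands rather than executing them, because the jobs must finish on the Grid before spunpack can retrieve anything.

```shell
#!/bin/sh
# Print the documented commands for one production cycle, in order.
# Usage: sp_cycle <username> <run list...>
sp_cycle() {
    user=$1
    shift
    echo "spbuild-grid --grid --user $user $*"   # build runs and JDL files
    echo "spsub -y $*"                           # submit to the Grid
    echo "spgridjobs --check --update $*"        # refresh job status
    echo "spunpack $*"                           # fetch data, stdout and stderr
    echo "spmerge --user $user"                  # merge, as for local production
    echo "spexport --user $user"                 # export, as for local production
}

# Placeholder user name and run numbers.
sp_cycle someuser 1234567 1234568
```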

Chris brew 17:06, 26 Apr 2006 (BST)