CERN@school-frame-reader on the GridPP

From GridPP Wiki

Introduction

This program reads in Timepix detector data output by the Pixelman software and converts it into a set of Comma Separated Value (CSV) files, one per frame, listing the X, Y and counts (C) of each recorded pixel. The CSV files are then packed into a single tar file ready for retrieval from the Grid.

Input

The program should take three files as input:

  • The data file;
  • The DSC file;
  • The index file.

The grid job requires the following parameters:

  • The path and root name of the three input files (e.g. data for data, data.dsc and data.idx);
  • The root name for the output CSV files and tar file.
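Before running anything locally or submitting a job, it can help to check that all three input files are present. The function below is a hypothetical helper (check_inputs is not part of the repository) and assumes the naming used in the example further down: a root name such as data, plus data.dsc and data.idx.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CERNatschool-frame-reader): check that the
# three Pixelman input files exist and are non-empty for a given root name.
# Assumed naming: <root> (frame data), <root>.dsc and <root>.idx.
check_inputs() {
    local root="$1"
    local missing=0
    local f
    for f in "$root" "$root.dsc" "$root.idx"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty input file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_inputs data && echo "all three input files found"`.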

Output

The program should output:

  • A tar file containing the CSV files (one per frame) of pixel X, Y and C values;
  • The standard output and standard error streams, retrievable as readable files (e.g. stdout.txt and stderr.txt, as named in the JDL file below).
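Each CSV file lists the X, Y and C values of the pixels recorded in one frame. As a rough sketch, assuming unquoted rows of the form X,Y,C (a header line, if present, is skipped because its first field is not a number), a frame can be summarised with awk; frame_summary is a hypothetical name, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise one frame CSV, assuming rows of the form
# X,Y,C (comma separated, unquoted). Rows whose first field is not an
# integer, such as a header line, are skipped.
frame_summary() {
    awk -F, '
        $1 ~ /^ *[0-9]+ *$/ && NF >= 3 {
            pixels++          # one row per recorded pixel
            counts += $3      # C column
        }
        END { printf "pixels=%d counts=%d\n", pixels, counts }
    ' "$1"
}
```

For example, `frame_summary dataout0.csv` prints the number of pixel rows and the summed counts for the first frame.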

Getting the Code

Retrieving the source from GitHub

The code is available from GitHub:

git clone https://github.com/twhyntie/CERNatschool-frame-reader.git

Building the code

cd CERNatschool-frame-reader
make

Running the code

The code comes with a shell script, readframes.sh, that runs the program and packs the output CSV files into a single tar file for later retrieval (e.g. from the Grid job output).

./readframes.sh [path-to-data]/[root-name-of-data] [root-output-name]
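The two steps the script performs can be sketched as the hypothetical function below. It is not the repository's readframes.sh: in particular, it assumes that frame-reader accepts the same two arguments as the script and writes <root-output-name>0.csv, <root-output-name>1.csv, ... into the current directory, so check the actual script before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of what readframes.sh does; the real script may differ.
# Assumption: ./frame-reader takes "<data root> <output root>" and writes
# <output root>0.csv, <output root>1.csv, ... into the current directory.
readframes_sketch() {
    local data_root="$1"   # e.g. data (reads data, data.dsc, data.idx)
    local out_root="$2"    # e.g. dataout

    # Step 1: convert the frames to CSV files.
    ./frame-reader "$data_root" "$out_root" || return 1

    # Step 2: bundle every per-frame CSV into a single tar file.
    set -- "${out_root}"[0-9]*.csv
    if [ ! -e "$1" ]; then
        echo "No ${out_root}*.csv files were produced" >&2
        return 1
    fi
    tar -cf "${out_root}.tar" "$@"
}
```

Reusing the positional parameters via `set --` keeps the matched file list intact even if a name contains spaces.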

For example, with the data files data (the frame data), data.dsc (the DSC file) and data.idx (the index file) in the current directory, you would run:

./readframes.sh data dataout

and end up with:

  • dataout[x].csv (where x = 0, 1, ..., n-1 and n is the number of frames)
  • dataout.tar
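When processing these files in frame order, note that a shell glob such as dataout*.csv sorts names as text, so dataout10.csv comes before dataout2.csv. The hypothetical helper below walks the frame indices 0, 1, ... instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print <root>0.csv, <root>1.csv, ... in frame order.
# A glob such as dataout*.csv sorts as text (dataout10.csv before dataout2.csv),
# so this counts up from 0 instead and stops at the first missing index.
list_frames() {
    local root="$1"
    local i=0
    while [ -e "${root}${i}.csv" ]; do
        printf '%s\n' "${root}${i}.csv"
        i=$((i + 1))
    done
}
```

For example, `list_frames dataout | while read -r f; do wc -l "$f"; done` reports the number of rows in each frame in order.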

Getting the Data

A sample dataset has been uploaded to figshare and can be retrieved with the following commands:

wget http://files.figshare.com/977161/data
wget http://files.figshare.com/977163/data.idx
wget http://files.figshare.com/977162/data.dsc

Running on the GridPP

Running using the InputSandbox

This solution uses the InputSandbox to send the shell script, the frame-reader binary and the data files with the job to the worker node. The output, a .tar of the CSV files, is returned via the OutputSandbox along with the standard output and standard error.

Note: for the frame-reader to run, it must be made executable by the shell script (which is the "Executable" specified in the JDL file). This is done here by adding a chmod a+x line to the shell script.
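A slightly more defensive version of that chmod line, written as a hypothetical helper for the top of readframes.sh (ensure_executable is not in the repository), fails with a clear message if the binary did not arrive with the job:

```shell
#!/usr/bin/env bash
# Hypothetical helper for readframes.sh: make a binary shipped in the
# InputSandbox executable, failing clearly if it is not in the working directory.
ensure_executable() {
    local bin="$1"
    if [ ! -f "$bin" ]; then
        echo "Cannot find $bin in $(pwd)" >&2
        return 1
    fi
    chmod a+x "$bin"
}

# In the job script this would precede the call to the binary, e.g.:
#   ensure_executable ./frame-reader || exit 1
```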

First, set up the frame-reader software in your local area and download the test data from figshare into the same directory:

git clone https://github.com/twhyntie/CERNatschool-frame-reader.git
cd CERNatschool-frame-reader
make
wget http://files.figshare.com/977161/data
wget http://files.figshare.com/977163/data.idx
wget http://files.figshare.com/977162/data.dsc

Then prepare your grid environment:

voms-proxy-init --voms cernatschool.org # Enter password when prompted.
lcg-infosites wms                       # Find an available WMS
export GLITE_WMS_WMPROXY_ENDPOINT=https://wms02.grid.hep.ph.ic.ac.uk:7443/glite_wms_wmproxy_server

Create the JDL file (named, for example, frame-reader-test001.jdl) for the job:

#############CERNatschool-frame-reader-test#################
Executable = "readframes.sh";
Arguments = "data dataout";
StdOutput = "stdout.txt";
StdError = "stderr.txt";
InputSandbox = {"readframes.sh", "frame-reader", "data", "data.idx", "data.dsc"};
OutputSandbox = {"stdout.txt", "stderr.txt", "dataout.tar"};
############################################################

Note that the files in the InputSandbox should be in the same directory as your JDL file. The Arguments provided are:

  1. data: the root name of the input files (in this case data, with data.idx and data.dsc alongside it);
  2. dataout: the root name for the output data, so the output files in the example above will be dataout0.csv, dataout1.csv, etc., and the tar file will be called dataout.tar.
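Because the Arguments line and the sandbox lists must use the same names, it can be convenient to generate the JDL from the two root names. write_jdl is a hypothetical helper that reproduces the JDL above; it assumes bare file names rather than paths, since sandbox files are placed in the job's working directory on the worker node, where the arguments are interpreted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JDL equivalent to frame-reader-test001.jdl for
# the given data and output root names. Use bare file names, not paths:
# sandbox files land in the job's working directory on the worker node.
write_jdl() {
    local data_root="$1"   # e.g. data
    local out_root="$2"    # e.g. dataout
    cat <<EOF
Executable = "readframes.sh";
Arguments = "${data_root} ${out_root}";
StdOutput = "stdout.txt";
StdError = "stderr.txt";
InputSandbox = {"readframes.sh", "frame-reader", "${data_root}", "${data_root}.idx", "${data_root}.dsc"};
OutputSandbox = {"stdout.txt", "stderr.txt", "${out_root}.tar"};
EOF
}
```

For example, `write_jdl data dataout > frame-reader-test001.jdl` recreates the JDL used in this walkthrough.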

Submit the job, get the status of the job, and get the output from the job as follows:

glite-wms-job-submit -a -o jobIDfile frame-reader-test001.jdl #submit the job
glite-wms-job-status -i jobIDfile                             #get job status
glite-wms-job-output -i jobIDfile --dir joboutput             #get job output

  • jobIDfile is a file that contains the job information that you'll need to check on its status and retrieve the job output.
  • joboutput is the directory you specify where the job output (i.e. the .tar file and output text files) will be placed when you get the job output.
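Rather than re-running the status command by hand, its output can be parsed. The sketch below assumes that glite-wms-job-status prints a line beginning "Current Status:" followed by the state (for example Running or Done (Success)); job_state is a hypothetical helper, and the polling loop is left as a comment because it needs a valid proxy and a submitted job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read glite-wms-job-status output on stdin and print the
# text after "Current Status:" (assumed output format, e.g. "Done (Success)").
job_state() {
    sed -n 's/^[[:space:]]*Current Status:[[:space:]]*//p' | head -n 1
}

# Example polling loop (needs a valid proxy and the jobIDfile from submission):
# while :; do
#     state=$(glite-wms-job-status -i jobIDfile | job_state)
#     echo "$state"
#     case "$state" in
#         Done*|Aborted*|Cancelled*|Cleared*) break ;;
#     esac
#     sleep 60
# done
```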

All being well, the job should run successfully and you should be able to retrieve and extract dataout.tar to get the comma separated values of the pixels in the three example frames.
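The retrieved tar file can be unpacked with standard tar. The hypothetical helper below searches the output directory for <root>.tar (in case the output lands in a job-specific subdirectory), extracts it next to the tar file and prints how many per-frame CSV files it held, which for the sample data should be three.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find <root>.tar under the job output directory, unpack
# it alongside the tar file and print the number of per-frame CSV files inside.
unpack_output() {
    local outdir="$1"   # e.g. joboutput
    local root="$2"     # e.g. dataout
    local tarfile
    tarfile=$(find "$outdir" -type f -name "${root}.tar" | head -n 1)
    if [ -z "$tarfile" ]; then
        echo "No ${root}.tar found under $outdir" >&2
        return 1
    fi
    tar -xf "$tarfile" -C "$(dirname "$tarfile")" || return 1
    # Count members named <root><n>.csv (optionally under a directory prefix).
    tar -tf "$tarfile" | grep -cE "(^|/)${root}[0-9]+\.csv$"
}
```

For example, `unpack_output joboutput dataout`.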

Requirements for next version

  • Error handling: if too few input arguments are given, the script should exit gracefully.
  • Handle data in formats other than binary [X, Y, C].
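For the error-handling item, one possible approach (a hypothetical sketch, not code from the repository) is to check the argument count at the top of readframes.sh and print the usage line shown earlier before exiting with a non-zero status:

```shell
#!/usr/bin/env bash
# Hypothetical argument check for the top of readframes.sh.
check_args() {
    if [ "$#" -ne 2 ]; then
        echo "Usage: readframes.sh [path-to-data]/[root-name-of-data] [root-output-name]" >&2
        return 1
    fi
}

# In the script itself:
#   check_args "$@" || exit 1
```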

Useful Links