CERN@school-frame-reader on the GridPP
Introduction
This program should read in Timepix detector data output by the Pixelman software and convert it to a set of Comma Separated Value (CSV) files, one per frame, listing the X, Y and count (C) values of the recorded pixels. The CSV files should then be bundled into a tar file ready for retrieval from the Grid.
Input
The program should take three files as input:
- The data file;
- The DSC file;
- The index file.
The grid job requires the following parameters:
- The locations of the three input files;
- The root name of the output CSV files.
Output
The program should output:
- A tarred set of CSV files containing the pixel X, Y, C;
- The standard output and standard error streams retrievable as readable files (i.e. .out, .err).
Getting the Code
Retrieving the source from GitHub
The code is available from GitHub:
git clone https://github.com/twhyntie/CERNatschool-frame-reader.git
Building the code
cd CERNatschool-frame-reader
make
Running the code
The code comes with a shell script that runs the program and bundles the output CSV files into a single tar file for later retrieval (e.g. from the Grid output):
./readframes.sh [path-to-data]/[root-name-of-data] [root-output-name]
For example, if you'd moved the data files data (the frame data), data.dsc (the DSC file for data) and data.idx (the index file) into the current directory, you'd run:
./readframes.sh data dataout
and end up with:
- dataout[x].csv (where x = 0, 1, ..., n-1 and n is the number of frames)
- dataout.tar
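The packaging step can be sketched as follows. This is a minimal illustration, not the contents of readframes.sh: the helper name pack_csvs is hypothetical, and it assumes only the naming scheme described above (CSV files called [root-output-name][x].csv bundled into [root-output-name].tar). The demonstration uses dummy files in a temporary directory rather than real frame data.

```shell
#!/bin/sh
# Hypothetical helper (not taken from readframes.sh): bundle <root><x>.csv
# files found in the current directory into <root>.tar.
pack_csvs() {
    root="$1"
    # Expand the glob into the function's positional parameters.
    set -- "$root"[0-9]*.csv
    # If nothing matched, the pattern is left unexpanded and no such file exists.
    if [ ! -e "$1" ]; then
        echo "no CSV files found for root '$root'" >&2
        return 1
    fi
    tar -cf "$root.tar" "$@"
}

# Demonstration with dummy frames in a scratch directory.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    printf '1,2,3\n' > dataout0.csv
    printf '4,5,6\n' > dataout1.csv
    pack_csvs dataout
)
tar -tf "$demo_dir/dataout.tar"   # lists the archived CSV files
```

Refusing to create an archive when no CSV files exist makes a failed conversion visible in the standard error stream instead of producing an empty tar file.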
Getting the Data
A sample dataset has been uploaded to figshare and may be retrieved with the following commands:
wget http://files.figshare.com/977161/data
wget http://files.figshare.com/977163/data.idx
wget http://files.figshare.com/977162/data.dsc
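After downloading, it may be worth confirming that all three input files are present before running anything. The sketch below is one possible check; the helper name check_inputs is hypothetical, and the only assumption is the naming convention used in this document (a data file plus files with .dsc and .idx appended to its name). The demonstration runs against placeholder files in a temporary directory.

```shell
#!/bin/sh
# Hypothetical helper: confirm that <root>, <root>.dsc and <root>.idx exist.
check_inputs() {
    root="$1"
    missing=0
    for f in "$root" "$root.dsc" "$root.idx"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration with placeholder files (not real Timepix data).
demo_dir=$(mktemp -d)
printf 'x' > "$demo_dir/data"
printf 'x' > "$demo_dir/data.dsc"
check_inputs "$demo_dir/data" || echo "inputs incomplete"
printf 'x' > "$demo_dir/data.idx"
check_inputs "$demo_dir/data" && echo "all inputs present"
```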
Running on the GridPP
Running using the InputSandbox
This solution copies the shell script, the frame-reader binary and the data files to the Storage Element (SE) using the InputSandbox. The output, a .tar archive of the CSV files, is copied back via the OutputSandbox along with the standard output and error.
Note: before the frame-reader binary can run, it must be made executable by the shell script (which is the "Executable" specified in the JDL file). This is done here by adding a chmod a+x line to the shell script.
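The effect of that chmod a+x line can be illustrated with a stand-in file. In this sketch, frame-reader-demo is a hypothetical placeholder script, not the real binary; it starts without execute permission and is made executable before being run.

```shell
#!/bin/sh
# Create a stand-in for the binary (a placeholder, not the real frame-reader).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "frame-reader stand-in ran"\n' > "$demo_dir/frame-reader-demo"

# Start without execute permission.
chmod a-x "$demo_dir/frame-reader-demo"
[ -x "$demo_dir/frame-reader-demo" ] || echo "not executable yet"

# The same kind of line the note describes: grant execute permission to all users.
chmod a+x "$demo_dir/frame-reader-demo"
"$demo_dir/frame-reader-demo"
```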
Firstly, set up the frame-reader software in your local area and download the test data from figshare into the same directory:
git clone https://github.com/twhyntie/CERNatschool-frame-reader.git
cd CERNatschool-frame-reader
make
wget http://files.figshare.com/977161/data
wget http://files.figshare.com/977163/data.idx
wget http://files.figshare.com/977162/data.dsc
Then prepare your grid environment:
voms-proxy-init --voms cernatschool.org # Enter password when prompted.
lcg-infosites wms # Find an available WMS
export GLITE_WMS_WMPROXY_ENDPOINT=https://wms02.grid.hep.ph.ic.ac.uk:7443/glite_wms_wmproxy_server
Create the JDL file (named, for example, frame-reader-test001.jdl) for the job:
#############CERNatschool-frame-reader-test#################
Executable = "readframes.sh";
Arguments = "data dataout";
StdOutput = "stdout.txt";
StdError = "stderr.txt";
InputSandbox = {"readframes.sh", "frame-reader", "data", "data.idx", "data.dsc"};
OutputSandbox = {"stdout.txt", "stderr.txt", "dataout.tar"};
############################################################
Note that the files in the InputSandbox should be in the same directory as your JDL file. The Arguments provided are:
- data: the name of the input data file (in this case, "data");
- dataout: the root name for the output data. So the output files in the example above will be dataout0.csv, dataout1.csv, etc. and the .tar file will be called dataout.tar.
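Because the input root name and the output root name each appear several times in the JDL, it can be convenient to generate the file from those two values. The sketch below is one way to do that; the helper name write_jdl is hypothetical, and the attribute values simply mirror the example JDL above.

```shell
#!/bin/sh
# Hypothetical helper: print a JDL like the example above for a given
# input root name ($1) and output root name ($2).
write_jdl() {
    in_root="$1"
    out_root="$2"
    cat <<EOF
Executable = "readframes.sh";
Arguments = "$in_root $out_root";
StdOutput = "stdout.txt";
StdError = "stderr.txt";
InputSandbox = {"readframes.sh", "frame-reader", "$in_root", "$in_root.idx", "$in_root.dsc"};
OutputSandbox = {"stdout.txt", "stderr.txt", "$out_root.tar"};
EOF
}

# Print the JDL for the example in this document.
write_jdl data dataout
```

Redirecting the output (for example, write_jdl data dataout > frame-reader-test001.jdl) would produce a file equivalent to the one shown above, minus the comment banner lines.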
Submit the job, get the status of the job, and get the output from the job as follows:
glite-wms-job-submit -a -o jobIDfile frame-reader-test001.jdl #submit the job
glite-wms-job-status -i jobIDfile #get job status
glite-wms-job-output -i jobIDfile --dir joboutput #get job output
- jobIDfile is a file that contains the job information that you'll need to check on its status and retrieve the job output.
- joboutput is the directory you specify where the job output (i.e. the .tar file and output text files) will be placed when you get the job output.
All being well, the job should run successfully and you should be able to retrieve and extract dataout.tar to get the comma separated values of the pixels in the three example frames.
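The extraction step might look like the sketch below. It assumes that the retrieved files may sit in a subdirectory of joboutput, so it searches for the archive by name instead of relying on a fixed path. The helper name unpack_output, the subdirectory name job123 and the destination directory frames are all hypothetical; the demonstration builds a dummy archive in place of real job output.

```shell
#!/bin/sh
# Hypothetical helper: find <tar_name> anywhere under <out_dir> and
# extract it into <dest>.
unpack_output() {
    out_dir="$1"
    tar_name="$2"
    dest="$3"
    tar_path=$(find "$out_dir" -type f -name "$tar_name" | head -n 1)
    if [ -z "$tar_path" ]; then
        echo "no $tar_name found under $out_dir" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xf "$tar_path" -C "$dest"
}

# Demonstration: fake a retrieved job output holding three dummy frames.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/joboutput/job123" "$demo_dir/stage"
for i in 0 1 2; do
    printf '1,2,3\n' > "$demo_dir/stage/dataout$i.csv"
done
tar -cf "$demo_dir/joboutput/job123/dataout.tar" -C "$demo_dir/stage" \
    dataout0.csv dataout1.csv dataout2.csv

unpack_output "$demo_dir/joboutput" dataout.tar "$demo_dir/frames"
ls "$demo_dir/frames"
```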
Requirements for next version
- Error handling: if too few input arguments are supplied, the script should exit gracefully.
- Handle data in formats other than binary [X, Y, C].
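The error-handling requirement could be met with an argument-count check along the following lines. This is a sketch of one possible approach, not code from the current readframes.sh: the function name run_checked and the exit status 2 are assumptions, and the usage text simply repeats the invocation shown earlier in this document.

```shell
#!/bin/sh
# Hypothetical sketch: refuse to proceed unless exactly two arguments are given.
run_checked() {
    if [ "$#" -ne 2 ]; then
        echo "usage: readframes.sh [path-to-data]/[root-name-of-data] [root-output-name]" >&2
        return 2
    fi
    # Placeholder for the real work (running the reader, creating the tar file).
    echo "would process '$1' into '$2'"
}

# Correct call: both arguments supplied.
run_checked data dataout

# Incorrect call: prints the usage message and returns a non-zero status.
run_checked data || echo "exit status: $?"
```

Returning a non-zero status means the failure is reflected in the job's exit code as well as in the standard error file.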