A quick guide to CVMFS
==Deploying software with CVMFS==

For more information about CVMFS at RAL, click [[RAL Tier1 CVMFS|here]].

===Overview of the process===

* Prepare your software;
* Deploy your software to the CVMFS repository;
* Prepare for job submission;
* Submit your job(s).

==A worked example==

Here we will demonstrate the full process of deploying and running software with CVMFS, using a Python script and some sample CERN@school data. All of the code is available via the GridPP GitHub repository - please feel free to adapt and modify it for your own needs!

===Prepare your software===

In order to get your software running on the grid, you'll need to bundle it up into a tarball (<code>.tgz</code>) so that it's ready to upload to the RAL CVMFS stratum-0 server. This tarball will need to include the scripts, executables and libraries you need, all compiled to run on a 64-bit SL6 machine. For convenience, we have provided an example using Python in the GridPP GitHub repository [https://github.com/gridpp/cvmfs-test-001-00-00 cvmfs-test-001-00-00]. You can get this with:
<pre>
$ cd $CVMFS_UPLOAD_DIR # choose a suitable location for this.
$ wget https://github.com/gridpp/cvmfs-test-001-00-00/archive/master.zip -O cvmfs-test-001-00-00-master.zip
$ unzip cvmfs-test-001-00-00-master.zip
$ rm cvmfs-test-001-00-00-master.zip
$ tar -czvf cvmfs-test-001-00-00.tgz cvmfs-test-001-00-00-master/
</pre>
This contains:

* <code>process-frame.py</code>: a simple Python script to process a frame of CERN@school Timepix data, either uploaded with the job or retrieved from a Storage Element (SE);
* <code>lib</code>: some pre-compiled Python libraries for the non-standard Python modules used by <code>process-frame.py</code>.

The idea is that <code>process-frame.py</code> will run remotely on the grid, using the non-standard Python libraries supplied with the CVMFS repository. This saves having to install the modules on each Computing Element (CE) every time you want to run a grid job. You will need to compile and supply the libraries you need when assembling your own tarballs.
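Before uploading anything, it's worth checking that the bundled libraries actually import on a 64-bit SL6 machine. The sketch below is one way to do this locally, assuming you are still in <code>$CVMFS_UPLOAD_DIR</code>; the <code>[input frame]</code> argument is just a placeholder - check <code>process-frame.py</code> itself for its actual usage.

<pre>
$ cd $CVMFS_UPLOAD_DIR/cvmfs-test-001-00-00-master
$ export PYTHONPATH=$PWD/lib:$PYTHONPATH # use the bundled libraries rather than system-wide ones
$ python process-frame.py [input frame] # [input frame] is a placeholder - see the script itself for its real arguments
</pre>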
===Deploy your software to the CVMFS repository===

With your tarball prepared, you can now upload it to the RAL CVMFS stratum-0 by generating a grid proxy, <code>gsiscp</code>-ing the tarball over, and unpacking it in your repository:
<pre>
$ voms-proxy-init --voms [your VO name, e.g. cernatschool.org]
$ gsiscp -P 1975 cvmfs-test-001-00-00.tgz cvmfs-upload01.gridpp.rl.ac.uk:./cvmfs_repo/.
$ gsissh -p 1975 cvmfs-upload01.gridpp.rl.ac.uk
$ cd cvmfs_repo/
$ tar -xvf cvmfs-test-001-00-00.tgz
</pre>

Your software has now been deployed. However, it may take up to '''three hours''' for the cron jobs to deploy it to all sites - be patient!
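If you have access to a machine that mounts your VO's CVMFS repository (a local cluster with CVMFS enabled, for example), you can check that the software has arrived. The repository address below is the one for the <code>cernatschool.org</code> VO - substitute your own VO's address - and, once the cron jobs have run, you should see the unpacked <code>cvmfs-test-001-00-00-master</code> directory listed:

<pre>
$ ls /cvmfs/cernatschool.gridpp.ac.uk
</pre>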
  
===Prepare for job submission===

You will need to set up whichever User Interface (UI) you use for submitting grid jobs and generate an appropriate proxy. We will use [[Quick_Guide_to_Dirac|DIRAC]] in this example.

<pre>
$ cd $DIRAC_DIR
$ . bashrc # set the DIRAC environment variables
$ dirac-proxy-init -g [VO name]_user -M
</pre>

For convenience, we have prepared an example JDL file and test data that will run with the software deployed above. You can get this from the GridPP GitHub repository [https://github.com/gridpp/cvmfs-getting-started cvmfs-getting-started]:

<pre>
$ cd $CVMFS_SUBMIT_DIR
$ git clone https://github.com/gridpp/cvmfs-getting-started.git
$ cd cvmfs-getting-started
</pre>

This contains:

* <code>dirac-test.jdl</code>: a job description file for DIRAC users;
* <code>glite-test.jdl</code>: a job description file for gLite users;
* <code>run.sh</code>: the script that sets the environment variables and runs the software in CVMFS.

In a nutshell, the job description file submits the <code>run.sh</code> script and the frame of data with the job. <code>run.sh</code> then sets the environment variables to make sure your CVMFS libraries are available, and runs the Python script <code>process-frame.py</code> remotely. You should look at the contents of these files to understand exactly what's going on.
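For orientation, <code>run.sh</code> boils down to something like the sketch below. This is an illustration only - the repository path and file names are assumptions based on the example above - so do read the real <code>run.sh</code> in <code>cvmfs-getting-started</code> rather than relying on this:

<pre>
#!/bin/bash
# Illustrative sketch only - see the real run.sh in cvmfs-getting-started.
# The path below assumes the cernatschool.org VO and the tarball unpacked earlier.
CVMFS_BASE=/cvmfs/cernatschool.gridpp.ac.uk/cvmfs-test-001-00-00-master
export PYTHONPATH=$CVMFS_BASE/lib:$PYTHONPATH # make the CVMFS-hosted libraries visible to Python
python $CVMFS_BASE/process-frame.py data000.txt # process the frame of data shipped with the job
</pre>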
  
===Submit your job(s)===

To submit a job with DIRAC:

<pre>
$ chmod a+x run.sh
$ dirac-wms-job-submit dirac-test.jdl
JobID = [number]
</pre>

You can monitor the progress of your job using the [https://dirac.grid.hep.ph.ic.ac.uk:8443 DIRAC web interface]. Once it has completed, you can retrieve the output (which consists of a JSON file containing processed information about the frame and a log file) with:

<pre>
$ dirac-wms-job-get-output [number]
$ cat [number]/file-info.json
{"n_pixel": 735, "file_name": "data000.txt", "max_count": 639}
</pre>

And that's it! Congratulations, you've successfully used CVMFS to deploy and run software on the grid.
  
 
==Useful links==

===Internal===

===External===
