The GridPP UserGuide

The DFC Command Line Interface

The DIRAC File Catalog (DFC) Command Line Interface (CLI), a.k.a. the DFC CLI, provides a way of interacting with DIRAC's File Catalog via - you guessed it - the command line. The DFC CLI lets you manually upload files to, and download files from, Storage Elements (SEs), browse the DFC associated with your Virtual Organisation (VO), create and remove directories in the DFC, and manage the replicas associated with each entry in the DFC.

The DFC CLI is great for small-scale tasks such as creating and tweaking test data sets, but ultimately we will want to use scripts to help coordinate large-scale upload operations and manage metadata (i.e. data about the data).

Getting started with the DFC CLI

Accessing the DFC CLI

The DFC CLI is accessed via a DIRAC command, so we'll need to source our DIRAC environment and generate a DIRAC proxy.

$ source /cvmfs/ganga.cern.ch/dirac_ui/bashrc
$ dirac-proxy-init -g gridpp_user -M
Generating proxy... 
Enter Certificate password: # Enter your grid certificate password...
.
. [Proxy information-based output.]
.
If you wish to use a different VO, replace gridpp with the name of the VO in the commands in this section.
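
As a minimal sketch of that substitution, the Python helper below assembles the proxy command for a given VO. It assumes the DIRAC group is always named <VO>_user, which matches the gridpp example above but may not hold for every VO; proxy_init_command is a hypothetical name, not part of DIRAC.

```python
import shlex


def proxy_init_command(vo="gridpp"):
    """Return the dirac-proxy-init command line for the given VO.

    Assumption: the user group is called "<vo>_user", as in the gridpp example.
    """
    args = ["dirac-proxy-init", "-g", f"{vo}_user", "-M"]
    # Quote each argument so the result can be pasted into a shell safely.
    return " ".join(shlex.quote(arg) for arg in args)


print(proxy_init_command())
print(proxy_init_command("cernatschool.org"))
```

The helper only builds the text of the command; you would still run it yourself in a shell where the DIRAC environment has been sourced.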

The DFC CLI is then started with the following DIRAC command:

$ dirac-dms-filecatalog-cli 
Starting FileCatalog client

File Catalog Client $Revision: 1.17 $Date: 

FC:/>
We'll come back to the DIRAC command line tools in the next section, but the dirac-dms- at the start of the command refers to the DIRAC Data Management System tools. All DIRAC commands are grouped in this way which, combined with tab completion, can be very handy for finding the command you're looking for!
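
To illustrate the naming scheme, here is a small Python sketch that groups the commands used in this section by the token that follows dirac-. Only the dms group is explained in the text above; treating proxy as a group in the same way is our reading of the names, and group_by_prefix is a hypothetical helper.

```python
from collections import defaultdict

# The DIRAC commands that appear in this section of the guide.
COMMANDS = [
    "dirac-proxy-init",
    "dirac-proxy-info",
    "dirac-dms-filecatalog-cli",
    "dirac-dms-show-se-status",
]


def group_by_prefix(commands):
    """Group command names by their second hyphen-separated token."""
    groups = defaultdict(list)
    for name in commands:
        parts = name.split("-")
        # parts[0] is "dirac"; parts[1] names the group (e.g. "dms").
        groups[parts[1]].append(name)
    return dict(groups)


for prefix, names in sorted(group_by_prefix(COMMANDS).items()):
    print(f"dirac-{prefix}-*: {', '.join(names)}")
```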

The FC:/> at the command prompt tells you that you're in the DFC CLI. You can now explore the DFC using commands that are very similar to those used with a typical UNIX file system. Let's do this now.

Finding your user space in the DFC

Let's start by listing the root directories in the DFC, which will give us a list of the Virtual Organisations supported by GridPP DIRAC:

FC:/> ls
cernatschool.org
gridpp
vo.londongrid.ac.uk
vo.northgrid.ac.uk
vo.scotgrid.ac.uk
vo.southgrid.ac.uk

We're using GridPP DIRAC as a member of the gridpp VO, so let's move into that VO's user directory.

FC:/> cd gridpp/user

If one hasn't been created for you already, you can create your own user space on the VO's File Catalog like so:

FC:/gridpp/user> cd a
FC:/gridpp/user/a> mkdir ada.lovelace
FC:/gridpp/user/a> chmod 755 ada.lovelace
FC:/gridpp/user/a> ls -la
drwxr-xr-x 0 ada.lovelace gridpp_user 0 2015-12-16 10:24:54 ada.lovelace 
FC:/gridpp/user/a> exit
If you don't know your DIRAC username (which should be used as the name of your user directory), exit the DFC CLI and use the dirac-proxy-info command.
Using the -la option with the ls command works just as it does in a normal UNIX shell, allowing you to see file owners, groups (VOs), permissions, etc.
Don't forget to set the permissions on your files so that other users can't modify them.
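
The directory layout used above can be sketched in Python as follows. This assumes that user directories are always grouped under the first character of the username, as in the ada.lovelace example; user_dir is a hypothetical helper and your VO may organise its user space differently.

```python
import posixpath


def user_dir(username, vo="gridpp"):
    """Return the DFC user directory for a DIRAC username.

    Assumption: directories are grouped by the first character of the
    username, e.g. /gridpp/user/a/ada.lovelace.
    """
    if not username:
        raise ValueError("username must not be empty")
    return posixpath.join("/", vo, "user", username[0], username)


print(user_dir("ada.lovelace"))
```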

You've now got your own space on the GridPP DFC. Let's put some files in it.

Uploading files

Firstly, we'll need a file to upload. Any file will do, but to keep things simple let's create one in a temporary directory:

$ cd ~
$ mkdir tmp; cd tmp
$ vim TEST.md # Or whichever editor you use...
$ cat TEST.md
#Hello Grid!
This is a test **MarkDown file**.

Next we'll need to know which Storage Elements are available to us.

Storage Elements "are physical sites where data are stored and accessed, for example, physical file systems, disk caches or hierarchical mass storage systems. Storage Elements manage storage and enforce authorization policies on who is allowed to create, delete and access physical files. They enforce local as well as Virtual Organization policies for the use of storage resources. They guarantee that physical names for data objects are valid and unique on the storage device(s), and they provide data access. A storage element is an interface for grid jobs and grid users to access underlying storage through the Storage Resource Management protocol (SRM), the Globus Grid FTP protocol, and possibly other interfaces as well."

Credit: Open Science Grid (2012)

We can list the available SEs with the following DIRAC command:

$ dirac-dms-show-se-status 
SE                           ReadAccess WriteAccess RemoveAccess CheckAccess 
=============================================================================
[... more disks ...]
UKI-LT2-QMUL2-disk           Active     Active      Unknown      Unknown     
[... more disks ...]
UKI-NORTHGRID-LIV-HEP-disk   Active     Active      Unknown      Unknown
[... more disks ...]
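
If you capture that output in a script, a rough Python sketch like the one below could pick out the SEs that are active for both reading and writing. It assumes the five-column layout shown above, which may differ between DIRAC versions; writable_ses and SAMPLE are illustrative names only.

```python
def writable_ses(status_output):
    """Return the names of SEs whose ReadAccess and WriteAccess are Active."""
    names = []
    for line in status_output.splitlines():
        fields = line.split()
        # Data rows have exactly five columns; this skips the ruler line,
        # blank lines and anything else, and the header is skipped by name.
        if len(fields) != 5 or fields[0] == "SE":
            continue
        name, read, write = fields[0], fields[1], fields[2]
        if read == "Active" and write == "Active":
            names.append(name)
    return names


# Sample text copied from the listing above (elided rows left out).
SAMPLE = """\
SE                           ReadAccess WriteAccess RemoveAccess CheckAccess
=============================================================================
UKI-LT2-QMUL2-disk           Active     Active      Unknown      Unknown
UKI-NORTHGRID-LIV-HEP-disk   Active     Active      Unknown      Unknown
"""

print(writable_ses(SAMPLE))
```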

While we don't need to know the details of where and how our data will be stored on an SE, we do need to know its name. We'll use the UKI-LT2-QMUL2-disk SE for now. We add the file to the DFC with the add command, which takes the following arguments:

add <LFN> <Local file name> <SE name>

where:

  • <LFN> is the Logical File Name (LFN) of the file in the DFC. This can either be relative to your current position in the DFC (which can be found with the pwd command in the DFC CLI), or made absolute by preceding the name with a slash /;
  • <Local file name> should be the name of the local file you want to upload. Again, this can be relative to wherever you were on your local system when you started the DFC CLI, or the absolute path to the file on your local system;
  • <SE name> is the name of the SE as retrieved from the dirac-dms-show-se-status command.
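
The relative-versus-absolute rule for the LFN can be sketched in Python as below. This is only a model that assumes the DFC CLI resolves paths the way a POSIX shell does; resolve_lfn and add_command are hypothetical helpers that build text, not DIRAC functions.

```python
import posixpath


def resolve_lfn(cwd, lfn):
    """Resolve an LFN against the current DFC directory.

    posixpath.join discards cwd when lfn already starts with a slash,
    which mirrors the absolute/relative rule described above.
    """
    return posixpath.normpath(posixpath.join(cwd, lfn))


def add_command(lfn, local_file, se_name):
    """Build the text of a DFC CLI add command from its three arguments."""
    return f"add {lfn} {local_file} {se_name}"


cwd = "/gridpp/user/a/ada.lovelace/tmp"
print(resolve_lfn(cwd, "TEST.md"))
print(add_command("TEST.md", "TEST.md", "UKI-LT2-QMUL2-disk"))
```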

Let's add our file to the grid now.

$ dirac-dms-filecatalog-cli
Starting FileCatalog client

File Catalog Client $Revision: 1.17 $Date: 

FC:/> cd /gridpp/user/a/ada.lovelace
FC:/gridpp/user/a/ada.lovelace> mkdir tmp
FC:/gridpp/user/a/ada.lovelace> cd tmp
FC:/gridpp/user/a/ada.lovelace/tmp> add TEST.md TEST.md UKI-LT2-QMUL2-disk

File /gridpp/user/a/ada.lovelace/tmp/TEST.md successfully uploaded to the UKI-LT2-QMUL2-disk SE
FC:/gridpp/user/a/ada.lovelace/tmp> ls -la
-rwxrwxr-x 1 ada.lovelace gridpp_user 47 2015-12-16 11:47:28 TEST.md
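
One quick sanity check after an upload is to compare the size in the listing with the size of the local file, and to look at the permission string. The Python sketch below does both for the listing line above. It assumes the eight-column layout shown here; the meaning of the second column is not covered in this guide, so it is simply called count, and all the names are hypothetical.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "mode count owner group size date time name")


def parse_listing_line(line):
    """Split one `ls -la` line from the DFC CLI into named fields."""
    mode, count, owner, group, size, date, time, name = line.split(None, 7)
    return Entry(mode, int(count), owner, group, int(size), date, time, name.strip())


def writable_by_others(mode):
    """True if the group or other write bit is set, e.g. in -rwxrwxr-x."""
    return mode[5] == "w" or mode[8] == "w"


entry = parse_listing_line(
    "-rwxrwxr-x 1 ada.lovelace gridpp_user 47 2015-12-16 11:47:28 TEST.md"
)
# The two-line example file, assuming it ends with a newline.
local_bytes = b"#Hello Grid!\nThis is a test **MarkDown file**.\n"
print(entry.size == len(local_bytes), writable_by_others(entry.mode))
```

If the second check reports that the group can write to the file, that is a prompt to tighten the permissions with chmod, as we did for the user directory.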

And there we go! Your first file has been uploaded to a Storage Element on the grid. Have a biscuit. You've earned it.

Replicating files

Part of the joy of using the grid is being able to distribute computational tasks to different sites. However, if you want to process the same data with different tasks at different sites in an efficient manner, ideally you'd need copies of that data at those sites. This strategy also makes sense from a backup/redundancy perspective. We can achieve this on the grid by using replicas.

A replica is a copy of a given file that is located on a different Storage Element (SE). The file is identified by its Logical File Name (LFN) in the DIRAC File Catalog (DFC). Associated with each LFN entry is a list of SEs where replicas of the file can be found.
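
As a mental model only, the relationship between LFNs and replicas can be sketched as a mapping from each LFN to the list of SEs holding a copy. The ToyCatalog class below is an in-memory illustration written for this guide; it is not the DIRAC API and it moves no data.

```python
class ToyCatalog:
    """In-memory stand-in for the LFN -> replica locations mapping."""

    def __init__(self):
        self._entries = {}  # LFN -> list of SE names

    def add_replica(self, lfn, se_name):
        """Record that a copy of lfn exists on se_name."""
        locations = self._entries.setdefault(lfn, [])
        if se_name not in locations:
            locations.append(se_name)

    def remove_replica(self, lfn, se_name):
        """Forget the copy of lfn held on se_name."""
        self._entries[lfn].remove(se_name)

    def replicas(self, lfn):
        """Return the SEs where copies of lfn can be found."""
        return list(self._entries[lfn])


catalog = ToyCatalog()
lfn = "/gridpp/user/a/ada.lovelace/tmp/TEST.md"
catalog.add_replica(lfn, "UKI-LT2-QMUL2-disk")
catalog.add_replica(lfn, "UKI-NORTHGRID-LIV-HEP-disk")
print(catalog.replicas(lfn))
```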

To list the locations of replicas for a given file catalog entry, we use the replicas command in the DFC CLI:

replicas <LFN>

so continuing with our example:

FC:/gridpp/user/a/ada.lovelace/tmp> replicas TEST.md
lfn: /gridpp/user/a/ada.lovelace/tmp/TEST.md
UKI-LT2-QMUL2-disk srm://se03.esc.qmul.ac.uk:8444/srm/managerv2?SFN=//gridpp/user/a/ada.lovelace/tmp/TEST.md
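
Should you want to process this output in a script, a sketch along these lines would turn it into a Python dictionary. It assumes the format shown above (an lfn: line followed by one "SE name, URL" line per replica); parse_replicas is a hypothetical helper.

```python
def parse_replicas(output):
    """Return (lfn, {SE name: storage URL}) from `replicas` output."""
    lfn = None
    locations = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("lfn:"):
            lfn = line[len("lfn:"):].strip()
        else:
            # Each replica line is "<SE name> <URL>", separated by whitespace.
            se_name, url = line.split(None, 1)
            locations[se_name] = url
    return lfn, locations


SAMPLE_REPLICAS = """\
lfn: /gridpp/user/a/ada.lovelace/tmp/TEST.md
UKI-LT2-QMUL2-disk srm://se03.esc.qmul.ac.uk:8444/srm/managerv2?SFN=//gridpp/user/a/ada.lovelace/tmp/TEST.md
"""

lfn, locations = parse_replicas(SAMPLE_REPLICAS)
print(lfn, sorted(locations))
```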

We replicate files with the replicate command:

replicate <LFN> <SE name>

Let's replicate our test file to the Liverpool disk and check that the replica list has been updated:

FC:/gridpp/user/a/ada.lovelace/tmp> replicate TEST.md UKI-NORTHGRID-LIV-HEP-disk
{'Failed': {},
 'Successful': {'/gridpp/user/a/ada.lovelace/tmp/TEST.md': {'register': 0.7740910053253174,
                                                            'replicate': 107.09606409072876}}}
File /gridpp/user/a/ada.lovelace/tmp/TEST.md successfully replicated to the UKI-NORTHGRID-LIV-HEP-disk SE
FC:/gridpp/user/a/ada.lovelace/tmp> replicas TEST.md
lfn: /gridpp/user/a/ada.lovelace/tmp/TEST.md
UKI-LT2-QMUL2-disk srm://se03.esc.qmul.ac.uk:8444/srm/managerv2?SFN=//gridpp/user/a/ada.lovelace/tmp/TEST.md
UKI-NORTHGRID-LIV-HEP-disk srm://hepgrid11.ph.liv.ac.uk:8446/srm/managerv2?SFN=//gridpp/user/a/ada.lovelace/tmp/TEST.md
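
The dictionary printed by the replicate command has Failed and Successful keys, which makes it easy to check in a script. The Python sketch below reads the text shown above and adds up the two timings for each file; we assume the numbers are seconds, which the output itself does not state, and summarise_replication is a hypothetical name.

```python
import ast

RESULT_TEXT = """\
{'Failed': {},
 'Successful': {'/gridpp/user/a/ada.lovelace/tmp/TEST.md': {'register': 0.7740910053253174,
                                                            'replicate': 107.09606409072876}}}
"""


def summarise_replication(result_text):
    """Return ({LFN: total time}, [failed LFNs]) from the printed result."""
    # literal_eval parses Python literals only, so nothing is executed.
    result = ast.literal_eval(result_text.strip())
    totals = {
        lfn: timings["register"] + timings["replicate"]
        for lfn, timings in result["Successful"].items()
    }
    return totals, sorted(result["Failed"])


totals, failed = summarise_replication(RESULT_TEXT)
for lfn, total in totals.items():
    print(f"{lfn}: {total:.1f} (assumed seconds)")
print("failed:", failed)
```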

Replicas can be removed with the rmreplica command:

rmreplica <LFN> <SE name>

Let's remove the Liverpool disk replica:

FC:/gridpp/user/a/ada.lovelace/tmp> rmreplica TEST.md UKI-NORTHGRID-LIV-HEP-disk
lfn: /gridpp/user/a/ada.lovelace/tmp/TEST.md
Replica at UKI-NORTHGRID-LIV-HEP-disk moved to Trash Bin

Finally, we can remove a file completely using the (somewhat familiar) rm command:

rm <LFN>

Let's tidy up our test file:

FC:/gridpp/user/a/ada.lovelace/tmp> rm TEST.md
lfn: /gridpp/user/a/ada.lovelace/tmp/TEST.md
File /gridpp/user/a/ada.lovelace/tmp/TEST.md removed from the catalog

Downloading files

We can also download files using the DFC CLI with the get command:

get <LFN> [<local directory>]

Note that the local directory argument is optional. Let's download a test file from the gridpp examples directory now:

FC:/> get /gridpp/userguide/WELCOME.md ./.
FC:/> exit
$ cat WELCOME.md
#Welcome to GridPP!

It looks like your download has worked. Congratulations!
$ rm WELCOME.md
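
In the session above the file ends up in the local directory under its own name. A minimal Python sketch of that rule follows. It assumes the file keeps the last component of its LFN and that the current directory is used when no local directory is given; the second point is our assumption, not something stated above, and download_target is a hypothetical helper.

```python
import os
import posixpath


def download_target(lfn, local_dir="."):
    """Return the local path where `get <LFN> [<local directory>]` would
    be expected to place the file, under the assumptions stated above."""
    # LFNs always use forward slashes, so take the basename with posixpath.
    filename = posixpath.basename(lfn)
    return os.path.normpath(os.path.join(local_dir, filename))


print(download_target("/gridpp/userguide/WELCOME.md", "./."))
```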

As we said earlier, the DFC CLI is best suited to small-scale operations. On our way to scaling up, we can start to automate our workflows using scripts. In the next section we'll look at how the DIRAC command line tools can help with this.