User documentation for CASTOR at the RAL Tier-1

CASTOR (CERN Advanced STORage manager) is a software technology used to manage a large tape-based storage system. It was developed at CERN, but it has been customised for the environment at RAL. This page has a brief explanation of the system and some guidance for users, with pointers to more detailed information. The information is especially aimed at non-LHC VOs. It is assumed that the reader has some general knowledge of the Grid (see elsewhere on this web site for pointers to further information).

CASTOR contacts

Allocation of tape and disk resources in CASTOR to an experiment (VO) is via the GridPP User Board, so this should be the first port of call for a VO wanting to start using CASTOR.

There is a weekly meeting for discussion between Tier-1 users and members of the Tier-1 team. This is currently held every Wednesday at 13:30. The meeting is held physically in building R89 (the new computer centre), and also in EVO (in the GridPP community).

For operational problems please submit a ticket, either via GGUS or directly to the Tier-1 helpdesk.

CASTOR components

CASTOR is a complex system with a number of more-or-less separate components. General information about CASTOR can be found on the CERN web site:

CASTOR home page at CERN
CASTOR User Guide

The overall design is summarised in this architectural diagram. However, the configuration at RAL is somewhat different, so this page explains some of the RAL-specific details, as illustrated in this schematic diagram.

Command-line tools

The comments below include references to some CASTOR-specific commands. The Tier-1 has its own User Interface (UI) hosts, but access to these is no longer available to general users. However, it may be possible to negotiate access for a small number of people per experiment to enable debugging or privileged operations. For RAL users it may also be possible to use the Tier-2 UI in PPD, but this operates as a separate system and hence may not fully interoperate (use of RFIO requires UIDs and GIDs to match on the two systems). Access from outside RAL is generally not possible except via the Grid (SRM) interface.

Some environment variables are needed for the commands, as described below. To summarise, the relevant variables are:


CNS_HOST=castorns.ads.rl.ac.uk
STAGE_SVCCLASS=atlasSimStrip
STAGE_HOST=catlasstager.ads.rl.ac.uk
RFIO_USE_CASTOR_V2=YES
with the service class and stager name changed as appropriate.
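
As a rough sketch, the variables summarised above could be set in a shell session with a small helper function. The name castor_env is an assumption made for illustration and is not part of CASTOR; the host and service class values are the ones listed above.

```shell
# castor_env is a hypothetical helper, not a CASTOR command.
# Usage: castor_env <stager host> <service class>
castor_env() {
    export CNS_HOST=castorns.ads.rl.ac.uk   # RAL name server
    export STAGE_HOST="$1"                  # stager for your experiment
    export STAGE_SVCCLASS="$2"              # service class within that stager
    export RFIO_USE_CASTOR_V2=YES           # needed by the RFIO tools
}

# Example values taken from the summary above; change them for your VO.
castor_env catlasstager.ads.rl.ac.uk atlasSimStrip
echo "$STAGE_HOST $STAGE_SVCCLASS"
```

Keeping the settings in one function makes it easy to switch between service classes without retyping all four variables.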

The tape storage system

The Tier-1 has a large robotic tape store with a potential capacity of around 5 PB, and a total of 18 tape drives. This is shared between all the user communities, but some of the drives are reserved for particular VOs to prevent one VO starving another of resources. The mounting and unmounting of tape cartridges in tape drives is managed by a piece of software called the Volume Drive Queue Manager (VDQM), but this is normally transparent to users.

As files are deleted from tapes, gaps are left. A piece of software called Repack is used to move files around on tape to recover this space. In addition, it is possible to group related files together on tape in file families, but this is somewhat complex and is not described here.

The Name Server

The CASTOR name server maintains a single namespace for all files stored in the system at RAL, using an underlying Oracle database. This is a unix-like namespace with a root of /castor/ads.rl.ac.uk/. The name server host is castorns.ads.rl.ac.uk; usually this will be used by default, but if necessary it can be defined using the CNS_HOST environment variable.

Name server commands are prefixed with ns. These are generally rather low-level commands which are unlikely to be needed by most users, but the nsls command may be useful to list files in a similar way to the unix ls. The environment variable CASTOR_HOME can be defined as a prefix used with relative path names.
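
The effect of CASTOR_HOME on relative path names can be illustrated with a short shell sketch. The helper castor_abs_path is hypothetical and only mimics the rule described above (it is not how the CASTOR clients are implemented), and the directory used is an example.

```shell
# castor_abs_path is a hypothetical helper that mimics the rule described
# above: absolute names are used as-is, relative names get CASTOR_HOME prepended.
castor_abs_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "${CASTOR_HOME%/}" "$1" ;;
    esac
}

# Example directory; substitute one that belongs to your VO.
export CASTOR_HOME=/castor/ads.rl.ac.uk/test/atlas

castor_abs_path test.file
castor_abs_path /castor/ads.rl.ac.uk/prod/atlas/other.file

# On a host with the CASTOR clients, these two listings should then
# refer to the same file:
#   nsls -l test.file
#   nsls -l /castor/ads.rl.ac.uk/test/atlas/test.file
```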

The Stagers

A Stager is a software system which manages files on a pool of disk servers, and transfers (stages) those files to and from the tape system. In general CASTOR expects to use the disk servers as a cache, from which files can be deleted if space is needed, as they can subsequently be recalled from tape. However, CASTOR has also recently been enhanced to support the use of disk-only files with no tape copy. Files can only be accessed from disk, so a file which has been migrated to tape and removed from the disk cache has to be staged back to disk before it can be used, which can take some time.

At RAL each of the major LHC experiments (ATLAS, CMS and LHCb) has its own stager to avoid contention. The other experiments all share a single stager. Requests in the stagers are scheduled using the LSF batch scheduler. The disk pools managed by a stager are divided into service classes which have a set of defined properties, for example whether the files are backed up to tape or not.

There are again some command-line tools to interact with the stager. The most useful of these is stager_qry -M <filename>, which gives some information about the status of the given file, in particular whether it's currently staged to disk. There's also stager_qry -s which gives a summary of the total and free space in each service class.

These commands need to know the stager host name, which can be set using the STAGE_HOST environment variable. The current names are catlasstager.ads.rl.ac.uk, ccmsstager.ads.rl.ac.uk, clhcbstager.ads.rl.ac.uk and genstager.ads.rl.ac.uk. It may also be necessary to set the variable STAGE_SVCCLASS to the name of the relevant service class. The service classes are specific to each experiment so in general you will need to ask about which class to use, but the defined class names can be obtained from the information system or from ganglia as described below. There is also some information on the wiki.
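
The choice of stager host could be sketched as follows. The helper stager_for_vo is hypothetical; it assumes lower-case VO names and assumes that genstager is the stager shared by the non-LHC experiments, which seems likely from the host names above but should be checked.

```shell
# stager_for_vo is a hypothetical helper, not a CASTOR command.
stager_for_vo() {
    case $1 in
        atlas) echo catlasstager.ads.rl.ac.uk ;;
        cms)   echo ccmsstager.ads.rl.ac.uk ;;
        lhcb)  echo clhcbstager.ads.rl.ac.uk ;;
        *)     echo genstager.ads.rl.ac.uk ;;   # assumed to be the shared stager
    esac
}

STAGE_HOST=$(stager_for_vo atlas)
export STAGE_HOST
echo "STAGE_HOST=$STAGE_HOST"

# With the CASTOR clients installed and STAGE_SVCCLASS set as well,
# the queries described above would be:
#   stager_qry -M /castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file
#   stager_qry -s
```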

File classes

Each file belongs to a file class, the main purpose of which is to define whether the file will be copied to tape or not. The file class may depend on both the service class and the file name, e.g. all files under a given directory. Files in a given file class will also generally be grouped together when written to tape. File class properties can be listed with the nslistclass command, and the class for a given file can be determined using nsls --class.

RFIO

RFIO (Remote File I/O) is a software protocol which provides Unix-like access to files in the CASTOR namespace. Note that this is not a true Unix filesystem, but a library which mimics the standard POSIX I/O functions, together with a set of command-line tools similar to the standard Unix tools. Particularly useful commands are rfdir (similar to nsls described above) and rfcp (similar to the Unix cp).

The current version of RFIO is not Grid-aware: it maps users according to their local Unix uid/gid, which limits its usefulness in a Grid environment, especially at RAL where users are in general no longer given local accounts. A Grid-aware version of RFIO has been developed and is expected to appear with the next major upgrade to CASTOR, but this may not be until the end of 2010. However, the Grid-aware version is already in use with the DPM disk storage system used at many Grid sites, and its clients are therefore distributed with the standard Grid User Interface software. Unfortunately these clients have the same names as the CASTOR RFIO tools but are not interoperable with them. The DPM clients are generally stored in /opt/lcg/bin and the CASTOR tools in /usr/bin, with the latter usually coming first in the PATH, so it may be necessary to ensure that you refer to them using the full path name. You should also set the environment variable RFIO_USE_CASTOR_V2=YES, as well as the stager variables described above.
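
One way to check which copy of an RFIO tool the shell would pick up is sketched below. The helper rfio_flavour is hypothetical, and the classification assumes, as the paragraph above suggests, that the DPM clients live in /opt/lcg/bin and the CASTOR ones in /usr/bin; a different installation layout would need different patterns.

```shell
# rfio_flavour is a hypothetical helper: it reports where the first copy of a
# tool in PATH lives, so the DPM and CASTOR versions can be told apart.
rfio_flavour() {
    tool=$(command -v "$1" 2>/dev/null || true)
    case $tool in
        '')             echo "$1: not found in PATH" ;;
        /opt/lcg/bin/*) echo "$1: $tool (DPM-style client)" ;;
        /usr/bin/*)     echo "$1: $tool (expected location of the CASTOR client)" ;;
        *)              echo "$1: $tool (unrecognised location)" ;;
    esac
}

rfio_flavour rfcp
rfio_flavour rfdir

# If the wrong version comes first, call the tool by its full path, e.g.
#   export RFIO_USE_CASTOR_V2=YES
#   /usr/bin/rfdir /castor/ads.rl.ac.uk/test/atlas
```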

The SRM

SRM (Storage Resource Manager) is a standard Grid protocol used to communicate with a storage system. The CASTOR implementation is currently a separate software layer on top of the standard CASTOR system, with its own front-end servers and back-end database. The SRM exposes a Grid-enabled web-service interface and can therefore be addressed directly by a client, but in general it's more convenient to use higher-level tools as described below. Each experiment (VO) has its own SRM endpoint called srm-<voname>.gridpp.rl.ac.uk, which may map to several load-balanced hosts for resilience.

The SRM has recently been upgraded to version 2 of the protocol. The main new feature is support for so-called space tokens. These are named storage areas with defined properties, which for CASTOR basically map to service classes (although not all service classes have an associated space token).

The current implementation of the CASTOR SRM is not VOMS-aware; it relies on a static mapping from the DN of the user to a VO-based Unix account via the so-called Grid map file. One consequence of this is that it is not possible for a user to belong to more than one VO with the same certificate (DN) as the user mapping will always be whichever one happens to be found first. Users in multiple VOs should therefore have a separate certificate for each VO. [Update November 2009: it is now supposed to be the case that each SRM front-end has its own Grid map file generated only for the relevant VO, so this restriction should no longer apply.]
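
The "first match wins" behaviour of a static Grid map file can be illustrated with a small sketch. The helper lookup_gridmap is hypothetical and is not the SRM's own code; the DNs and account names are invented, and only the usual grid-mapfile layout (a quoted DN followed by an account name) is assumed.

```shell
# lookup_gridmap is a hypothetical helper: it prints the account from the
# first grid-mapfile line (read from stdin) whose quoted DN equals $1.
lookup_gridmap() {
    awk -F'"' -v dn="$1" '$2 == dn { sub(/^[ \t]+/, "", $3); print $3; exit }'
}

# Invented DN and account names, for illustration only.
lookup_gridmap "/C=UK/O=eScience/CN=example user" <<'EOF'
"/C=UK/O=eScience/CN=example user" atlas001
"/C=UK/O=eScience/CN=another user" cms001
"/C=UK/O=eScience/CN=example user" cms002
EOF
```

Because the search stops at the first matching line, the second mapping for the same DN is never reached, which is the restriction described above for a map file shared between VOs.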

The information system

Information about the CASTOR SRMs is published in the Grid information system in the standard way. Currently each SRM endpoint appears as a separate Storage Element (SE), and each service class is published as a separate GlueSA object, which in turn has an attached VOInfo object for each associated space token (if any).

High-level client tools

The standard way to access CASTOR is via the general Grid clients, i.e. the lcg-utils command-line tools, the GFAL API and the FTS for bulk data movement. General information about data management can be found in the gLite Users' Guide, and there is also a good introduction on the SEEGrid wiki. GFAL and lcg-utils have man pages which can also be found on the web, and FTS has wiki-based documentation.

Note that GFAL and lcg-utils have both C and python APIs.

In general the lcg-utils tools are designed to work with the LFC file catalogue. However, for simple uses this may not be necessary and the tools can be used without the LFC; files can be copied to and from the CASTOR SRM with lcg-cp and deleted with lcg-del. A simple example to copy a local file into CASTOR would be:


lcg-cp file:/etc/group srm://srm-atlas.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/test/atlas/test.file
(after creating your Grid proxy), where srm-atlas should be replaced by the name of the SRM endpoint for your VO, and the file names should be changed as appropriate.
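
A full round trip (copy in, copy back, delete) might look like the sketch below. The helper castor_surl and the local file /tmp/test.file.copy are assumptions for illustration. The commands are printed rather than executed so that the sketch can be tried without a Grid proxy; extra options may be needed in practice, so check the lcg-cp and lcg-del man pages.

```shell
# castor_surl is a hypothetical helper: it builds a SURL from a VO name
# and a CASTOR path, using the endpoint naming described on this page.
castor_surl() {
    printf 'srm://srm-%s.gridpp.rl.ac.uk%s\n' "$1" "$2"
}

surl=$(castor_surl atlas /castor/ads.rl.ac.uk/test/atlas/test.file)

# Printed, not run: remove the "echo" once you have a valid proxy.
echo "lcg-cp file:/etc/group $surl"            # copy a local file into CASTOR
echo "lcg-cp $surl file:/tmp/test.file.copy"   # copy it back out
echo "lcg-del $surl"                           # delete it from CASTOR
```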

SURLs and TURLs

In the Grid world files are referred to using a SURL (Site URL). For CASTOR the standard form for a SURL is something like


srm://srm-atlas.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file
where the path following the hostname is the same as the name known to the CASTOR name server. There is also an extended SURL format of the form

srm://srm-atlas.gridpp.rl.ac.uk:8443/srm/managerv2?SFN=/castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file
but in general the Grid tools fill in the extra information using the information system.
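
The relationship between the two SURL forms can be sketched with shell string handling. The helper name surl_to_extended is hypothetical, and the port 8443 and the /srm/managerv2 path are simply copied from the example above; other endpoints may differ.

```shell
# surl_to_extended is a hypothetical helper that rewrites a short SURL
# into the extended form shown above.
surl_to_extended() {
    rest=${1#srm://}      # host followed by the CASTOR path
    host=${rest%%/*}      # everything before the first "/"
    path=/${rest#*/}      # the CASTOR name, with its leading "/" restored
    printf 'srm://%s:8443/srm/managerv2?SFN=%s\n' "$host" "$path"
}

surl_to_extended srm://srm-atlas.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file
```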

An instance of a file accessible via some specific protocol is referenced using a TURL (Transfer URL). This can be obtained from the SRM using the lcg-gt (get TURL) command. For RFIO it is possible to just use the CASTOR name directly, but the SRM returns a TURL in an extended format like


rfio://catlasstager.ads.rl.ac.uk:9002/?svcClass=atlasFarm&castorVersion=2&path=/castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file
which contains extra information that would otherwise be passed in environment variables. This format should be understood by recent versions of GFAL and the RFIO tools.
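
To see how the extended TURL relates to the environment variables, the sketch below pulls the individual fields out of the example TURL. The helper turl_query_value is hypothetical, and the mapping of the host to STAGE_HOST and of svcClass to STAGE_SVCCLASS is an assumption based on the description above rather than a documented rule.

```shell
# turl_query_value is a hypothetical helper: it prints the value of one
# key from the query part of a TURL.
turl_query_value() {
    query=${1#*\?}                                   # text after the "?"
    printf '%s\n' "$query" | tr '&' '\n' | sed -n "s/^$2=//p"
}

turl='rfio://catlasstager.ads.rl.ac.uk:9002/?svcClass=atlasFarm&castorVersion=2&path=/castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file'

hostport=${turl#rfio://}       # drop the scheme
hostport=${hostport%%/*}       # keep host:port
STAGE_HOST=${hostport%%:*}     # drop the port
STAGE_SVCCLASS=$(turl_query_value "$turl" svcClass)
castor_path=$(turl_query_value "$turl" path)
export STAGE_HOST STAGE_SVCCLASS

echo "STAGE_HOST=$STAGE_HOST"
echo "STAGE_SVCCLASS=$STAGE_SVCCLASS"
echo "path=$castor_path"
```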

The TURL for GridFTP has a format like:


gsiftp://gdss325.gridpp.rl.ac.uk:2811//castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file
which includes the name of the disk server holding the file - this can be useful information for debugging.
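
When debugging, the disk server name can be extracted from a GridFTP TURL with a few lines of shell. This is only a sketch; the helper name turl_host is hypothetical.

```shell
# turl_host is a hypothetical helper: it prints the host name part of a TURL.
turl_host() {
    rest=${1#*://}                     # drop the scheme (gsiftp://)
    hostport=${rest%%/*}               # keep everything before the first "/"
    printf '%s\n' "${hostport%%:*}"    # drop the port number
}

turl_host gsiftp://gdss325.gridpp.rl.ac.uk:2811//castor/ads.rl.ac.uk/prod/atlas/StripDeg/path/to/file
```

The resulting name can then be looked up in the monitoring pages to check the state of that particular disk server.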

CASTOR monitoring

Some information about the configuration and state of disk servers in CASTOR can be obtained from ganglia, for example for the whole system or restricted to a particular service class; edit the URLs or the query boxes at the top of the page for other views. There are also disk accounting views.

Regular tests of all SEs in the Grid are made with the SAM system; the linked page shows the test results for UK sites. Click on a node name to get the test history for that node, and on a test result to get details of that test.

Disk accounting statistics can be seen for the current year, or in more detail for a specific month.

Information about disk space usage can also be obtained from the information system. Queries are not entirely trivial, but an example showing the total and free space in GB in each of the ATLAS service classes is:


ldapsearch -x -h lcgbdii02.gridpp.rl.ac.uk -p 2170 -b o=grid \
   '(&(objectclass=GlueSA)(GlueChunkKey=GlueSEUniqueID=srm*.gridpp.rl.ac.uk)(GlueSAAccessControlBaseRule=*atlas))' \
   GlueSALocalID GlueSATotalOnlineSize GlueSAFreeOnlineSize | grep -A 2 GlueSALocalID:
GlueSALocalID: atlas:atlasStripInput
GlueSATotalOnlineSize: 239950
GlueSAFreeOnlineSize: 37293
--
GlueSALocalID: atlas:atlasStripDeg
GlueSATotalOnlineSize: 17996
GlueSAFreeOnlineSize: 16732
--
GlueSALocalID: atlas:atlasT0Raw
GlueSATotalOnlineSize: 32992
GlueSAFreeOnlineSize: 8935
--
GlueSALocalID: atlas:atlasNonProd
GlueSATotalOnlineSize: 539874
GlueSAFreeOnlineSize: 507082
--
GlueSALocalID: atlas:atlasSimStrip
GlueSATotalOnlineSize: 213914
GlueSAFreeOnlineSize: 80129
--
GlueSALocalID: atlas:atlasFarm
GlueSATotalOnlineSize: 44989
GlueSAFreeOnlineSize: 7218
--
GlueSALocalID: atlas:atlasSimRaw
GlueSATotalOnlineSize: 11997
GlueSAFreeOnlineSize: 9031
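
Output in this form can be condensed to one line per service class with a short awk filter. This is a minimal sketch: the function name summarise_glue_sa is hypothetical, and it assumes the three attributes appear for each record as in the listing above. The sample input is taken from that listing.

```shell
# summarise_glue_sa is a hypothetical helper: it reads ldapsearch output
# (as filtered by grep above) on stdin and prints one line per service
# class, with the used space calculated as total minus free (in GB).
summarise_glue_sa() {
    awk '
        /^GlueSALocalID:/         { id = $2; order[++n] = id }
        /^GlueSATotalOnlineSize:/ { total[id] = $2 }
        /^GlueSAFreeOnlineSize:/  { free[id] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                printf "%s total=%d free=%d used=%d\n", id, total[id], free[id], total[id] - free[id]
            }
        }'
}

summarise_glue_sa <<'EOF'
GlueSALocalID: atlas:atlasStripInput
GlueSATotalOnlineSize: 239950
GlueSAFreeOnlineSize: 37293
--
GlueSALocalID: atlas:atlasStripDeg
GlueSATotalOnlineSize: 17996
GlueSAFreeOnlineSize: 16732
EOF
```

In real use the here-document would be replaced by piping the ldapsearch command into the function.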


Last modified Tue 10 November 2009.