CVMFS configuration at RAL



UPDATES IN PROGRESS 2014/03/10

This describes the current cvmfs setup at RAL; it will surely be refined.


CVMFS client

cvmfs client packages

There are six rpms that must be installed:

  • cvmfs (currently 2.1.17-1)
  • cvmfs-init-scripts (currently version 1.0.20-1)
  • cvmfs-keys (currently version 1.4-1)
  • fuse (should be at latest version from OS)
  • fuse-libs (should be at latest version from OS)
  • autofs (should be at latest version from OS)

The latest versions of cvmfs, cvmfs-keys and cvmfs-init-scripts can be obtained from http://cernvm.cern.ch/portal/downloads or directly from the yum repo at http://cvmrepo.web.cern.ch/cvmrepo/yum/cvmfs/.
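
As a sketch, installation then boils down to something like the following; this assumes you have already added a yum repository file pointing at the CERN repo above (the exact repo setup is a site choice):

# Assumes a repo file for cvmrepo.web.cern.ch has been configured, e.g. /etc/yum.repos.d/cernvm.repo
yum install cvmfs cvmfs-init-scripts cvmfs-keys
# fuse, fuse-libs and autofs come from the standard OS repositories
yum install fuse fuse-libs autofs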

cvmfs os configuration

cvmfs requires a cvmfs account, with its own group, which is also a member of the fuse group. The rpm will create the account (with a low uid), but will accept a pre-existing one. At RAL we create an account with a high uid using Quattor.
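
If you create the account yourself rather than letting the rpm do it, something along these lines should work; the uid/gid of 9000 is purely illustrative, and the fuse group is assumed to already exist (it is created by the fuse rpm):

# Illustrative only - pick uid/gid values that suit your site
groupadd -g 9000 cvmfs
useradd -u 9000 -g cvmfs -G fuse -s /sbin/nologin -M cvmfs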


IMPORTANT: cvmfs requires the following line in /etc/fuse.conf:

user_allow_other


The Quattor Working Group framework now includes support for cvmfs as a standard feature, which can be used by simply including

include { 'features/cvmfs/config'};

in your profile.

In our case we use Quattor to manage automount. If you do use a system configuration tool to manage automount then you must put in an appropriate entry for cvmfs. In the case of Quattor the following snippet from the QWG configuration works:

#
# Configure autofs component, if already included
#
'/software/components' = {
    if (exists('/software/components/autofs/maps')) {
        autofs = SELF['autofs'] ;
        maps = merge(autofs['maps'], nlist('cvmfs', nlist(
            'enabled', true,
            'preserve', true,
            'mapname', '/etc/auto.cvmfs',
            'type', 'program',
            'mountpoint', '/cvmfs',
        )));
        autofs['maps'] = maps;
        SELF['autofs'] = autofs;
    };
    SELF;
};
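
For sites that do not manage automount with Quattor, the equivalent by hand is a program-map entry in /etc/auto.master pointing /cvmfs at /etc/auto.cvmfs (normally cvmfs_config setup from the client package will add this for you):

# /etc/auto.master entry for cvmfs
/cvmfs /etc/auto.cvmfs

# then reload autofs
service autofs reload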

cvmfs client configuration

You then need to customise /etc/cvmfs/default.local. Ours looks like this:

CVMFS_REPOSITORIES=alice.cern.ch,atlas.cern.ch,atlas-condb.cern.ch,cms.cern.ch,geant4.cern.ch,lhcb.cern.ch,sft.cern.ch,lhcb-conddb.cern.ch,mice.gridpp.ac.uk,na62.gridpp.ac.uk,hone.gridpp.ac.uk,wenmr.gridpp.ac.uk,phys-ibergrid.gridpp.ac.uk
CVMFS_CACHE_BASE=/pool/cache/cvmfs2/
CVMFS_QUOTA_LIMIT=50000
CVMFS_MEMCACHE_SIZE=64
CVMFS_HTTP_PROXY="http://lcgsquid07.gridpp.rl.ac.uk:3128;http://lcg0617.gridpp.rl.ac.uk:3128"

The proxy line specifies two proxies; if there are any issues with the one in use, cvmfs will fail over to the other.

The CVMFS_QUOTA_LIMIT=50000 line specifies a shared local cache of 50GB for all repositories.

The key items are the repositories you want to support, the CVMFS server URL and the proxies. (It is not a good idea to run cvmfs on a grid site without going through a squid, although this could be an existing one.)
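
For reference, a sketch of the proxy syntax, assuming the usual CVMFS convention that '|' separates load-balanced proxies within a group and ';' separates fail-over groups (the hostnames here are placeholders):

# Two load-balanced squids, falling back to direct connections only if both fail
# (DIRECT is best avoided on a production grid site)
CVMFS_HTTP_PROXY="http://squid1.example.org:3128|http://squid2.example.org:3128;DIRECT"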

You can enable debugging by uncommenting the debug log line:

CVMFS_DEBUGLOG=/tmp/cvmfs.log
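
Once the client configuration is in place, a quick sanity check can be done with the standard cvmfs_config helpers, for example:

# Check that fuse.conf, autofs and the cvmfs configuration are consistent
cvmfs_config chksetup
# Attempt to mount each configured repository and report OK or failure
cvmfs_config probe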

IMPORTANT

If you want to use a replica other than CERN's (the replica at RAL, for example) you need to create /etc/cvmfs/domain.d/cern.ch.local. Ours now looks like this:

CVMFS_SERVER_URL="http://cernvmfs.gridpp.rl.ac.uk:8000/opt/@org@;http://cvmfs-stratum-one.cern.ch:8000/opt/@org@;http://cvmfs.racf.bnl.gov:8000/opt/@org@"

RAL does not really need to use port 8000 - the replicas listen on both 80 and 8000, but sites with transparent proxies on port 80 should definitely specify 8000.
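
To confirm which replica and proxy a client is actually using for a mounted repository, something like the following should work with the 2.1 client (atlas.cern.ch here is just an example):

# Show connection details, including the stratum-1 host and proxy in use
cvmfs_config stat -v atlas.cern.ch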

Cache location and sizing

For now our cache is located in the /pool directory, at /pool/cache/cvmfs2. Note that it is important that you do not put it in /tmp.

We are now limiting the size of individual caches; our default is 10GB, and in practice 20-50GB should be enough depending on how many VO repositories you support. See Elisa Lanciotti's discussion of client cache usage in PIC's tests (https://twiki.cern.ch/twiki/bin/view/Main/ElisaLanciottiWorkCVMFSTests).

Experience at CERN and at RAL suggests that lhcb.cern.ch requires at least 5GB and atlas.cern.ch makes good use of at least 10GB.

There is no need to use a separate partition, and indeed this may make things less flexible.
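
A quick way to see how much space the cache is actually using, to compare against CVMFS_QUOTA_LIMIT, is to look at the cache directory directly or to ask the running client (the repository name is just an example, and cvmfs_talk is assumed to behave as in the 2.1 client):

# Total space used under the cvmfs cache base
du -sh /pool/cache/cvmfs2
# Cache usage as reported by the running client
cvmfs_talk -i lhcb.cern.ch cache size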

Cache maintenance

We have a cron job on every worker node that runs cvmfs_fsck at 9:00am every day and logs the results centrally:

# Cron to check and fix cvmfs cache integrity
00 09 * * * root (date --iso-8601=seconds --utc; /usr/local/bin/cvmfs_fsck) >> /var/log/cvmfs-fsck.ncm-cron.log 2>&1

This checks the caches for corrupt files (by comparing each file's hash with its file name) and quarantines any corrupt files; the next time they are accessed they will be redownloaded. cvmfs_fsck logs to syslog. The quarantine (new in cvmfs 2.x) allows examination and debugging of the corrupt files.

Squid Proxies

If you are going to run jobs on your cluster using CernVM-FS you should also use squids to cache data at your site.

We have two squids - essentially frontier squids restricted to acting just as caches for the CERN cvmfs repositories, with ACLs that restrict them to proxying for the CVMFS replicas at CERN, RAL and BNL.

It would also be possible to use existing frontier-squids - the cvmfs proxying can happily coexist with other functions.

You can obtain the latest frontier squid packages from http://frontier.cern.ch/dist/rpms/ .

If you are using the standard squid packages it is very important that you increase the value for the largest file cached on the squid. The default is:

maximum_object_size 4096 KB

This would mean that any request for a file larger than 4MB gets forwarded directly to the replica, which is not what we want (writing as the administrator of a replica).

We have:

maximum_object_size 1048576 KB
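
Putting the ACL restriction mentioned above together with the object size setting, a minimal squid.conf sketch for a dedicated cvmfs proxy might look like the following (the source network is illustrative and must be adapted to your site):

# Only proxy for the stratum-1 replicas in use (RAL, CERN and BNL in our case)
acl cvmfs_stratum1 dstdomain cernvmfs.gridpp.rl.ac.uk cvmfs-stratum-one.cern.ch cvmfs.racf.bnl.gov
# Only accept requests from the local worker nodes (replace with your site network)
acl local_nodes src 192.168.0.0/16
http_access allow local_nodes cvmfs_stratum1
http_access deny all

# Cache objects of up to 1GB rather than the 4MB default
maximum_object_size 1048576 KB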

Links

Back to RAL_Tier1_CVMFS