RAL Tier1 CVMFS

What is CVMFS?

CVMFS is a caching, HTTP-based, read-only filesystem optimised for delivering experiment software to (virtual) machines.

It was originally developed as part of the CernVM project, but it is potentially even more promising for physical worker nodes (WNs), particularly since the caches can persist longer on physical machines.

Notable features:

Caching, Read-only
Based on very standard technologies: outgoing HTTP, FUSE, ...
Scales very easily with additional squid caches
File checksums (currently SHA1) are verified against the trusted, signed file catalog
File-based deduplication as a handy side effect of the signed file catalog
Uses FUSE to mount a virtual filesystem (based on a signed catalog downloaded from the server)
Performance is close to that of locally installed software after initial cache population
Jobs run faster than with either NFS or AFS in tests so far

And perhaps most significant:

It removes the need for local software install jobs at every site
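
The checksum verification and deduplication features listed above can be illustrated with a minimal Python sketch. It is not the CVMFS implementation: the function names (sha1_hex, publish, fetch) and the in-memory dictionaries standing in for the catalog and the object store are assumptions made for illustration, and catalog signing is left out entirely.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA1 digest of a bytes object as a hex string."""
    return hashlib.sha1(data).hexdigest()


def publish(files):
    """Build a catalog (path -> digest) and an object store (digest -> content).

    `files` maps paths to bytes. Files with identical content produce the
    same digest and therefore share one stored object, which is where the
    deduplication comes from.
    """
    catalog = {}
    objects = {}
    for path, data in files.items():
        digest = sha1_hex(data)
        catalog[path] = digest
        objects[digest] = data
    return catalog, objects


def fetch(path, catalog, objects):
    """Return the content for `path`, rejecting it if the checksum differs
    from the one recorded in the catalog."""
    digest = catalog[path]
    data = objects[digest]
    if sha1_hex(data) != digest:
        raise ValueError("checksum mismatch for " + path)
    return data
```

In this sketch, publishing two paths with the same content stores a single object, and fetch raises ValueError if a stored object has been altered after publication.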

CVMFS at RAL

We have been testing CVMFS at RAL since summer 2010, mostly with Atlas jobs.

The client is installed on all worker nodes, configured with automount support for Atlas, the Atlas conditions database, CMS and LHCb. We have two squids dedicated to CVMFS.
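
As a rough illustration of what such a client setup involves, the Python sketch below renders the kind of settings that usually go into /etc/cvmfs/default.local. The helper name render_client_config is hypothetical, and the repository names, squid hostnames and quota used in the usage note are assumptions, not the actual RAL values.

```python
def render_client_config(repositories, proxies, quota_mb):
    """Render the text of a CVMFS client configuration fragment.

    repositories: list of repository names to mount
    proxies:      list of squid URLs, treated as one load-balanced group
    quota_mb:     local cache size limit in megabytes
    """
    repo_list = ",".join(repositories)
    # "|" joins proxies into a single load-balanced group
    proxy_group = "|".join(proxies)
    lines = [
        "CVMFS_REPOSITORIES=" + repo_list,
        'CVMFS_HTTP_PROXY="' + proxy_group + '"',
        "CVMFS_QUOTA_LIMIT=" + str(quota_mb),
    ]
    return "\n".join(lines) + "\n"
```

For example, calling render_client_config(["atlas.cern.ch", "atlas-condb.cern.ch", "cms.cern.ch", "lhcb.cern.ch"], ["http://squid01.example.org:3128", "http://squid02.example.org:3128"], 20000) would produce a fragment covering the four repositories and two squids described above. All of these values are placeholders.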

In November 2010, Atlas user analysis jobs were resumed at RAL on an experimental basis, with software for these jobs coming exclusively from CVMFS.

February 2011 RAL deploys test replica server

RAL is now providing a replica of the CERN repositories.

April 2011 LHCb starts testing with all jobs using CVMFS.

May 2011 all Atlas production switches to using CVMFS at RAL.

August 2011 Switched Atlas to using new namespace, which requires much less local configuration.

January 2013 CVMFS is now the preferred method for Atlas and LHCb to distribute software (mandated by April 30th). CMS are using CVMFS. Alice will do so too.

CVMFS configuration at RAL

UK CVMFS Deployment

Small Sites

The ideal situation at sites using CVMFS would be to have two local squids. Although the load on the squids is small, it is much better to have a pair so that jobs can still run if one squid fails. It is possible to run with clients pointing to squids at remote sites, but this tends to result in job failures that are hard to track down.
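
The benefit of a pair of squids can be sketched in a few lines of Python. This is only an illustration of failover in general, not how the CVMFS client selects proxies internally. The function name first_working_proxy and the is_up probe are hypothetical.

```python
def first_working_proxy(proxies, is_up):
    """Return the first proxy for which is_up(proxy) is true, or None.

    proxies: ordered list of proxy URLs
    is_up:   callable taking a proxy URL and returning True if it responds
    """
    for proxy in proxies:
        if is_up(proxy):
            return proxy
    # No proxy responded: with a single squid this is what a squid
    # failure looks like, and jobs cannot fetch software.
    return None
```

With one squid in the list, a single failure leaves the function returning None. With two, the second is returned when the first is down, which is why jobs can keep running.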

One solution is to run squid on one or more worker nodes at the site. The load is very small, and squid just requires enough disk space to provide a useful cache. It will remove at most a couple of cores from overall capacity (since squid is single-threaded).