File Catalog Maintenance

From GridPP Wiki

When experiments move files onto your Grid Storage they will usually register these files in a File Catalog. If your storage becomes permanently unavailable, it is necessary to remove these links from the catalog.

All LHC experiments now use the LCG File Catalog (LFC), and the LCG Utils client tools have been updated to reflect this. These client tools can be used to discover which files on your SE are registered in a VO's file catalog.

N.B. Before taking any action to replicate or delete catalog entries for files that are owned by the VO, rather than by a known user, talk to the VO's data manager first. The VO concerned may have its own procedures for file catalog updates.

The procedures outlined below may help, however.

Finding Registered Files

You can test which files are registered in the file catalog using the lcg-lr command, e.g.,

 export LFC_HOST=lfc.gridpp.rl.ac.uk 
 lcg-lr --vo dteam  srm://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/dteam/generated/2012-02-15/file3ba909c3-ca04-4b86-8e97-9c42d8224e94
 srm://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/dteam/generated/2012-02-15/file3ba909c3-ca04-4b86-8e97-9c42d8224e94

If this command produces a list of SURLs (and exit status 0) then the file is registered in the file catalog (N.B. you must give the correct VO, but you should know this from the SURL). If the command produces lcg_lr: No such file or directory and exit status 1 then the file is not registered in that VO's catalog. If the command produces lcg_lr: Invalid argument and exit status 1, then you may need to define the LFC_HOST variable.
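
The three outcomes above can be told apart in a script. The following is a minimal sketch, assuming the messages appear exactly as quoted above; the function name check_registered and the LCG_LR override variable are hypothetical, introduced here only so the logic can be tried without a grid UI.

```shell
# Command used to query the catalog. Overridable (hypothetical variable)
# so the function can be exercised with a stand-in command.
LCG_LR="${LCG_LR:-lcg-lr}"

# Usage: check_registered VO SURL
# Prints one of: registered, not-registered, check-LFC_HOST, unknown-error
check_registered() {
    # Capture stdout and stderr together; the if tests the exit status.
    if cr_out=$("$LCG_LR" --vo "$1" "$2" 2>&1); then
        echo "registered"
    else
        case $cr_out in
            *"No such file or directory"*) echo "not-registered" ;;
            *"Invalid argument"*)          echo "check-LFC_HOST" ;;
            *)                             echo "unknown-error" ;;
        esac
    fi
}

# Example call (needs LFC_HOST set and a valid proxy):
#   check_registered dteam "$surl"
```

Anything that matches neither message is reported as unknown-error rather than guessed at, so unexpected failures (an expired proxy, say) are not mistaken for an unregistered file.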

Note that if there is more than one replica listed then you don't have the only copy of the data.
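
To check this in bulk, the lcg-lr output can be filtered for replicas that live on some other SE. This is only a sketch: the function name replicas_elsewhere is hypothetical, and it assumes lcg-lr prints one SURL per line, as in the example above.

```shell
# Usage: lcg-lr ... | replicas_elsewhere MY_SE_HOSTNAME
# Reads SURLs on stdin and prints those whose host is not MY_SE_HOSTNAME.
replicas_elsewhere() {
    awk -v host="$1" 'NF {
        rest = $0
        sub(/^[a-z]+:\/\//, "", rest)    # drop the scheme, e.g. srm://
        split(rest, parts, /[:\/]/)      # host ends at the first ":" or "/"
        if (parts[1] != host) print
    }'
}

# Example: report files for which this SE holds the only registered copy.
#   if [ -z "$(lcg-lr --vo dteam "$surl" | replicas_elsewhere hepgrid11.ph.liv.ac.uk)" ]; then
#       echo "only copy is local: $surl"
#   fi
```

The host is compared as a whole string rather than with a pattern, so the dots in a hostname cannot accidentally match other characters.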

Looping over the SRM namespace

Assuming you still have the SRM namespace available (you did back up your namespace databases, didn't you?), you can write a loop over DPNS commands (for DPM) or use pnfs (dCache) to list the files you need to test.

The following hacky bit of Perl does the job for DPM:

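#!/usr/bin/perl

use strict;
use warnings;

# Recursively print every file beneath a DPNS directory.
sub lsdir {
    my ($dir) = @_;
    # The list form of open avoids passing the path through a shell.
    open(my $ls, '-|', 'dpns-ls', '-l', $dir)
        or die "Cannot run dpns-ls on $dir: $!\n";
    my @contents = <$ls>;
    close($ls);
    foreach my $entry (@contents) {
        chomp($entry);
        # Fields: mode, links, uid, gid, size, month, day, time, name.
        # The limit of 9 keeps names containing spaces in one piece.
        my @stat = split(/\s+/, $entry, 9);
        next unless defined $stat[8];
        if ($stat[0] =~ /^d/) {
            lsdir("$dir/$stat[8]");
        } else {
            print "$dir/$stat[8]\n";
        }
    }
}

die "Usage: $0 DPNS_DIRECTORY\n" unless @ARGV;
lsdir($ARGV[0]);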

It will print all the files beneath a DPNS directory. Call it dpns-find and just do, e.g.,

 dpns-find /dpm/scotgrid.ac.uk/home/dteam

You can wrap up the output with the usual prefix to get the requisite SURLs, e.g.,

 for file in $(dpns-find /dpm/scotgrid.ac.uk/home/dteam); do 
   echo srm://se2-gla.scotgrid.ac.uk${file}
 done
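
The for loop above splits its input on whitespace, so a path containing a space would be broken into several bogus SURLs. A line-by-line variant is sketched here; the function name paths_to_surls is hypothetical, and the SE hostname is passed as an argument.

```shell
# Usage: dpns-find DIR | paths_to_surls SE_HOSTNAME
# Reads one namespace path per line and prints the corresponding SURL.
paths_to_surls() {
    while IFS= read -r path; do
        # Skip blank lines; keep everything else intact, spaces included.
        if [ -n "$path" ]; then
            printf 'srm://%s%s\n' "$1" "$path"
        fi
    done
}

# Example:
#   dpns-find /dpm/scotgrid.ac.uk/home/dteam | paths_to_surls se2-gla.scotgrid.ac.uk > surls.txt
```

Writing the SURLs to a file gives you a record of exactly what was tested, which is useful when discussing the list with the VO's data manager.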

Copying Namespace Entries

If this needs to be done for a VO of which you are a member, then you can use lcg-rep to replicate each file to a new SE, followed by lcg-del to delete the old replica, e.g.,

for surl in MY_LIST_OF_FILES_TO_BE_REPLICATED; do
  # Only delete the old replica if the replication succeeded
  if lcg-rep --vo MY_VO -d dcache.gridpp.rl.ac.uk "$surl"; then
    lcg-del --vo MY_VO "$surl"
  else
    echo "Error replicating SURL $surl" >&2
  fi
done

Remember, you can only do this for VOs of which you are a member (neither the SRMs nor the file catalog allow entries to be updated using a foreign VO's grid certificate).

Deleting File Catalog Entries

If you need to delete catalog links without copying files (say the files behind the SURLs have been lost in a hardware failure), then lcg-uf can be used to delete the corresponding catalog replica entries, but you need to use lcg-lg to look up the associated GUID first:

for surl in MY_LIST_OF_FILES_TO_BE_DELETED; do
  # Look up the GUID; only unregister the replica if the lookup succeeded
  if guid=$(lcg-lg --vo MY_VO "$surl"); then
    lcg-uf --vo MY_VO "$guid" "$surl"
  else
    echo "Error obtaining GUID for $surl" >&2
  fi
done