Draining a dCache pool

From GridPP Wiki

If you need to rearrange your disk pool setup (e.g. to give VOs that have been sharing a disk pool their own dedicated disk), you can use the dCache Copy Module. The module copies a subset of the files on a pool to one or more destination pools.

Important: The Copy Module is no longer supported in recent versions of dCache and is known to have issues with dCache v1.8.0 onwards. The migration module, part of the new pool code, provides a superset of the Copy Module functionality. The command "help migration copy" in the admin interface should give a description of the available options.

Copy Module

First of all, set the source pool to read-only to ensure consistency during the copy process.

(PoolManager) admin > psu set pool <source_pool> rdonly

Then create an instance of the copy task, or attach to one that is already present:

(maintenance) admin > create task my-copy-task copy-module
(maintenance) admin > attach my-copy-task

The next step requires you to allow the copy task to see the contents of the source pool:

(maintenance) admin > load pool <source_pool>

The contents of the pool are summarised by running

(maintenance) admin > ls stat
< --- Mon Feb 12 11:45:52 GMT 2007 --- >>

            Class                   File Count          Bytes
            pinned                          0              0
             bad                            0              0
           precious                         4     3000017371
            cached                          0              0
            locked                          0              0
         dteam:STATIC                       3     3000000000
       dteam:GENERATED                      1          17371
            TOTAL                           4     3000017371

The list of pnfsids that will be copied by the process can be listed using

(maintenance) admin > ls files

However, this could be a large list if the pool contains a lot of files and no filtering rules (see next) have been applied. The subset of files is selected by creating a filter by running commands like:

(maintenance) admin > exclude <fileClass>

where <fileClass> may be one of 'cached', 'precious', 'pinned', 'locked' or 'bad', or a storage class such as 'cms:generated@osm'.

(maintenance) admin > keeponly <fileClass>

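excludes all file classes except for the specified one. Both 'exclude' and 'keeponly' may be used repeatedly until the repository listing fits your needs. The copy process is started by running one of the following, depending on whether the destination is a list of pools or a pool group: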

(maintenance) admin > copyto pools <pool_name>
(maintenance) admin > copyto group <group_name>

and the progress of the task is followed using:

(maintenance) admin > task info
(maintenance) admin > ls stat

Once you are done you will need to add the new pools to the relevant pool groups and make sure that all of the links are in order.
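The sequence of steps above can be collected into a small helper that only prints the admin interface commands in order, so they can be reviewed before being typed or pasted into a session. This is a minimal sketch: the function name build_copy_commands is hypothetical, the cell names PoolManager and maintenance are assumed from the prompts shown above, and it is an assumption that "cd" and ".." can be used to move between these cells in the same way as for a pool cell.

```shell
#!/bin/bash
# Sketch only: prints the Copy Module commands from the steps above.
# It does not connect to dCache. build_copy_commands is a hypothetical name.

build_copy_commands() {
  local source_pool="$1"
  local dest_pool="$2"
  local keep_class="$3"   # optional: file class or storage class to keep

  # Step 1: make the source pool read-only (PoolManager cell, assumed name).
  echo "cd PoolManager"
  echo "psu set pool $source_pool rdonly"
  echo ".."

  # Step 2: create and attach to the copy task (maintenance cell, assumed name).
  echo "cd maintenance"
  echo "create task my-copy-task copy-module"
  echo "attach my-copy-task"

  # Step 3: load the pool contents and optionally filter them.
  echo "load pool $source_pool"
  if [ -n "$keep_class" ]; then
    echo "keeponly $keep_class"
  fi
  echo "ls stat"

  # Step 4: start the copy to the destination pool.
  echo "copyto pools $dest_pool"
  echo ".."
  echo "logoff"
}

# Example with placeholder pool names: keep only precious files.
build_copy_commands pool1_01 pool2_01 precious
```

Printing the commands rather than sending them keeps the sketch safe to run anywhere. Progress would still be followed by hand with "task info" and "ls stat".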

Checking files have been replicated

You can check that the files have been replicated with the following query against the companion database. It selects the pnfsids that belong to a particular VO database (000E and 000F here) and reside on a particular source pool (pool1_01 here), searches for these pnfsids across all pools, and prints those that appear only once. Such files are not on two or more pools, meaning that they have not been copied successfully. A copy may fail because the file is in some unusual state, such as being orphaned (present on disk and in the pool, but not in the PNFS namespace).

select pnfsid, count(pool) from cacheinfo where pnfsid in    \
( select pnfsid from  cacheinfo where pool = 'pool1_01' and  \
( substr(pnfsid,1,4)='000E' or substr(pnfsid,1,4)='000F' ) ) \
group by pnfsid having count(pool)=1;
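To reuse this check for other pools or VO databases, the query can be generated from parameters. The following is a minimal sketch: build_unreplicated_query is a hypothetical helper name, and it assumes the pool name and prefixes are trusted input, since they are placed into the SQL text without any escaping.

```shell
#!/bin/bash
# Sketch only: prints the replication check query for a given source pool
# and one or more 4-character pnfsid prefixes.
# build_unreplicated_query is a hypothetical helper name.

build_unreplicated_query() {
  local pool="$1"
  shift
  local cond=""
  local prefix
  # Join one substr() test per prefix with "or".
  for prefix in "$@"; do
    if [ -n "$cond" ]; then
      cond="$cond or "
    fi
    cond="${cond}substr(pnfsid,1,4)='$prefix'"
  done
  echo "select pnfsid, count(pool) from cacheinfo where pnfsid in" \
       "( select pnfsid from cacheinfo where pool = '$pool' and ( $cond ) )" \
       "group by pnfsid having count(pool)=1;"
}

# Print the query for the pool and prefixes used in the text above.
build_unreplicated_query pool1_01 000E 000F
```

To run the generated query, pipe it into psql as the postgres user, for example: build_unreplicated_query pool1_01 000E 000F | psql companion -t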

Removing the source files

Once you are convinced that all of the source files have been replicated, you can remove them from the source pool by running:

(<source_pool>) admin > rep rm -force <pnfsid>

in the source pool cell. A script like the one below can speed up such operations. In this case, the psql query looks for files that belong to a particular VO (i.e. have pnfsids starting with 000E or 000F). If you want to remove only a subset of the files in a particular VO database, a more advanced query or use of the PNFS metadata may be required.

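#!/bin/bash
# Run this as the postgres user.
# Usage: <this script> <source_pool>

if [ -z "$1" ]; then
  echo "Usage: $0 <source_pool>" >&2
  exit 1
fi

# Print the pnfsids held by pool $1 that belong to VO databases 000E and 000F.
getPnfsIds() {
  psql companion -t -c "select pnfsid from cacheinfo where pool='$1' \
                        and ( substr(pnfsid,1,4)='000E' \
                           or substr(pnfsid,1,4)='000F')"
}

. /usr/etc/pnfsSetup
export PATH=$PATH:$pnfs/tools

DELETE=/tmp/pnfsids-to-delete.txt

# Build the list of admin interface commands: enter the pool cell,
# remove each file, leave the cell and log off.
echo "set timeout 120" > "$DELETE"
echo "cd $1" >> "$DELETE"

for i in $(getPnfsIds "$1")
do
  echo "rep rm -force $i" >> "$DELETE"
done

echo ".." >> "$DELETE"
echo "logoff" >> "$DELETE"

cd

# Show the commands, then feed them to the admin interface.
cat "$DELETE"

ssh -l admin -p 22223 -c blowfish localhost < "$DELETE"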
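Because "rep rm -force" is destructive, it may be worth excluding from the delete list any pnfsid that the replication check still reports as existing on only one pool. The following is a minimal sketch of such a filter: filter_replicated is a hypothetical helper, both inputs are assumed to be text files with the pnfsid as the first whitespace-separated field on each line (as produced by psql -t), and the ids in the example are made-up placeholders rather than real pnfsids.

```shell
#!/bin/bash
# Sketch only: print the pnfsids from the first file that do NOT appear in
# the second file (the list of ids found on a single pool only).
# filter_replicated is a hypothetical helper name.

filter_replicated() {
  local all_ids="$1"
  local unreplicated_ids="$2"
  # The first awk input file fills the skip list; the second is filtered.
  # Comparing FILENAME avoids the NR==FNR pitfall when the skip list is empty.
  awk 'FILENAME == ARGV[1] { if (NF) skip[$1]; next }
       NF && !($1 in skip) { print $1 }' "$unreplicated_ids" "$all_ids"
}

# Example with placeholder ids written to temporary files.
all=$(mktemp)
single=$(mktemp)
printf '000E0001\n000E0002\n000F0003\n' > "$all"
printf ' 000E0002 | 1\n' > "$single"
filter_replicated "$all" "$single"
rm -f "$all" "$single"
```

In the example, 000E0002 is listed as unreplicated, so only the other two ids are printed. The filtered list could then be used to generate the "rep rm -force" lines in place of the unfiltered query output.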