DPM-admin-tools

GridPP DPM administration toolkit

GridPP have put together a collection of handy utilities for easing the management of DPM. The toolkit should help sites running a DPM to manage the installation and to manage (or recover from) common problems such as disk failures and pool draining. The tools are written using the DPM python API, provided by the DPM-interfaces package. Each tool is focussed on performing a single task, so to get the result you want you may need to combine them with the standard DPM command line utilities or standard shell tools (which I think is the best approach).

Author: Greig A Cowan, University of Edinburgh
Date: May 2008
Amendments: Sam Skipsey, University of Glasgow. Wahid Bhimji, University of Edinburgh 
License: EGEE

As of 12 August 2010, the license for the toolkit will be the ISC ("NetBSD") License, which is compatible with the EGEE license (in both extant forms), but less ambiguously stated.

Update

The release of v2 of the toolkit introduced a new naming convention for the tools (gridpp_* -> dpm-*, dpns-*) and the tools now appear in /opt/lcg/bin rather than /usr/bin. This places them in the same location as the other native DPM client tools. This version of the toolkit also cleans out some existing tools that are now supported by the native DPM client.

Update 2

Due to packaging changes within DPM itself, the rpms for release 2.6.5 come in two versions. Suffix DPM173 is rpm-dependency compatible with DPM versions <= 1.7.3, whilst suffix DPM174 is compatible with DPM 1.7.4 and above. There is no loss of data in replacing one with the other, and subsequent releases will only be rpm-dependency compatible with DPM 1.7.4 and above.

Installation

The tools are probably best installed on the DPM head node, but should work on a grid UI with the DPM-interfaces package installed. You need to add this yum repository to your configuration:

[sys-man]
name=Systems Manager Storage repository
baseurl=http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage
gpgcheck=0
enabled=1

And then install the package via:

yum install gridpp-dpm-tools

The tools will be installed in /opt/lcg/bin (releases before v2 of the toolkit installed them in /usr/bin). We will soon provide an rpm containing the above repository.

rpm -ivh http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/sys-man-repo-1.0.0.rpm

SQL-based tools (those whose names start with dpm-sql) also require the MySQL-python rpm to be installed. They attempt to locate the DPM MySQL instance by parsing /opt/lcg/etc/DPMINFO, which will probably not be available by default outside of the SE.
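As a rough illustration of that lookup, the sketch below parses a connection string of the form user/password@host. That layout is an assumption, since this page does not state the format of DPMINFO. The names parse_dpminfo and read_dpminfo are hypothetical, and the toolkit's own parser may well differ.

```python
import re

# ASSUMPTION: DPMINFO holds one line shaped like "user/password@host".
# This page does not document the format; check your own file first.
_DPMINFO_RE = re.compile(
    r"^(?P<user>[^/\s]+)/(?P<password>[^@\s]*)@(?P<host>\S+)$"
)


def parse_dpminfo(text):
    """Return (user, password, host) from the first non-blank line of text."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _DPMINFO_RE.match(line)
        if match is None:
            raise ValueError("unrecognised DPMINFO line")
        return match.group("user"), match.group("password"), match.group("host")
    raise ValueError("DPMINFO is empty")


def read_dpminfo(path="/opt/lcg/etc/DPMINFO"):
    """Read and parse the file; raises an I/O error if it is missing."""
    with open(path) as handle:
        return parse_dpminfo(handle.read())
```

On a machine without the file, read_dpminfo raises an I/O error, which matches the caveat above about running the SQL tools outside of the SE.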

User Accounting

User-level accounting is now possible using the toolset. Versions of the toolkit >=2.3.9 copy two helper files to /opt/lcg/etc which you will need to use to set up your system to enable this.

/opt/lcg/etc/accountingdb.sql 

is a set of SQL commands which should be run against your MySQL instance on your head node (or wherever your DPM database is) to create a new database for accounting purposes. The final line, which grants access to the dpminfo user, should be altered to whatever user is listed in your /opt/lcg/etc/DPMINFO file. (NOTE: There is a typo in the first line of the script - you need to alter the current version to make a Primary Key on entry_date, gid, uid, rather than date, uid, gid.)

/opt/lcg/etc/usage_accounting 

is a cron job specification, which should be copied into /etc/cron.d/ . It calls the /opt/lcg/bin/dpm-sql-usage-by-vo-user command with the semi-secret "--es" option to write daily logs of the usage of the DPM, by user and group, into the database you just created.

An extension to the DPM_Monitoring tool exists to allow plotting of useful information from this database, and is documented on that page.

Environment

Since the tools use the dpm python module, it is essential that you have the correct PYTHONPATH:

export PYTHONPATH=$PYTHONPATH:/opt/lcg/lib/python

On a 64-bit machine this will be:

export PYTHONPATH=$PYTHONPATH:/opt/lcg/lib64/python

You also need to ensure that the following variables point at your head node:

export DPM_HOST=dpm-head-node.domain.ac.uk
export DPNS_HOST=dpm-head-node.domain.ac.uk
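The settings above can be sanity-checked before running any tool. The following is a minimal sketch, not part of the toolkit: the function name check_dpm_environment is hypothetical, and it only tests that the variables named in this section are present, not that the host is reachable.

```python
import os

REQUIRED_HOST_VARS = ("DPM_HOST", "DPNS_HOST")
# Both locations named in this section; which applies depends on the machine.
LCG_PYTHON_DIRS = ("/opt/lcg/lib/python", "/opt/lcg/lib64/python")


def check_dpm_environment(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_HOST_VARS:
        if not env.get(name):
            problems.append("%s is not set" % name)
    entries = [
        entry.rstrip("/")
        for entry in env.get("PYTHONPATH", "").split(os.pathsep)
        if entry
    ]
    if not any(lcg_dir in entries for lcg_dir in LCG_PYTHON_DIRS):
        problems.append("PYTHONPATH does not include an /opt/lcg python directory")
    return problems
```

Calling check_dpm_environment() with no argument inspects the current shell environment; passing a dictionary makes it easy to try out a proposed configuration.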

Mixing a 32-bit python with a 64-bit DPM-interfaces rpm, or vice versa, will result in python being unable to load the (compiled C) _dpm.so library. In a future release, this case will be detected and result in graceful failure with a useful error message. Until then, you can work around the issue by forcing the correct python to be used to run the script, either by calling the tool with the right python:

python32 this_dpm_script

or by editing the header of the script to explicitly call the correct python.
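To see which case applies on a given machine, the width of the running interpreter can be read from the standard struct module. The sketch below is an illustration only and is not how the toolkit behaves today; the function names are hypothetical, and the mapping to a library directory simply follows the two PYTHONPATH values given in this section.

```python
import struct
import sys


def interpreter_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def expected_lcg_python_dir(bits):
    """Map interpreter width to the PYTHONPATH entry given in this section."""
    return "/opt/lcg/lib64/python" if bits == 64 else "/opt/lcg/lib/python"


def import_dpm_or_explain():
    """Try to import the dpm module; on failure, exit with a readable hint."""
    try:
        import dpm  # provided by DPM-interfaces; absent on other machines
    except ImportError as exc:
        bits = interpreter_bits()
        sys.exit(
            "Cannot import dpm (%s). This is a %d-bit python; check that "
            "DPM-interfaces matches and that PYTHONPATH contains %s."
            % (exc, bits, expected_lcg_python_dir(bits))
        )
    return dpm
```

A script could call import_dpm_or_explain() in place of a bare import, so that a mismatch produces a one-line hint rather than a traceback.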

Available tools

dpm-disk-to-dpns

usage: dpm-disk-to-dpns [options]

    Find the mappings between the files on a pool
    and the LFN in the DPNS namespace. If you want to analyse all
    server:filesystems on a pool, you can use the -p option. i.e.,

    $ dpm-disk-to-dpns -p poolname 

    To restrict to a particular server:filesystem combination, use the -s
    option. i.e.,

    $ dpm-disk-to-dpns -s pool1.glite.ecdf.ed.ac.uk:/grid01
 
options:
  -h, --help            show this help message and exit
  -d, --debug           Use debug flag only for testing.
  -sSERVERFS, --serverfs=SERVERFS
                        Specify which server:filesystem to be analysed.
  -pPOOL, --pool=POOL   Specify which pool to be analysed.
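When several filesystems need checking (for example after a disk failure), the tool can be driven from a small wrapper. The sketch below uses only the options listed above. The function names are hypothetical, and insisting on exactly one of pool or serverfs is a choice made for this sketch rather than documented behaviour. Nothing is assumed about the output format, which is returned as plain text.

```python
import subprocess


def disk_to_dpns_command(pool=None, serverfs=None, debug=False):
    """Build the argument list for dpm-disk-to-dpns from the listed options."""
    # Sketch-level restriction: require exactly one selector.
    if (pool is None) == (serverfs is None):
        raise ValueError("give exactly one of pool or serverfs")
    argv = ["dpm-disk-to-dpns"]
    if debug:
        argv.append("--debug")
    if pool is not None:
        argv.append("--pool=%s" % pool)
    else:
        argv.append("--serverfs=%s" % serverfs)
    return argv


def run_for_filesystems(serverfs_list):
    """Run the tool once per server:filesystem; return {serverfs: output}."""
    results = {}
    for serverfs in serverfs_list:
        argv = disk_to_dpns_command(serverfs=serverfs)
        results[serverfs] = subprocess.check_output(argv).decode()
    return results
```

For instance, run_for_filesystems(["pool1.glite.ecdf.ed.ac.uk:/grid01"]) would run the second example command above and collect its output.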

dpm-dpns-to-disk

usage: dpm-dpns-to-disk /dpm/path/to/file [-d DIRECTORY][-vz]

options:
  -h, --help            show this help message and exit
  -d,--directory        Analyse files in this directory
  -v,--verbose          See information about namespace entries without replicas
  -z,--zero             Only print out files with zero size.

dpns-du

usage: dpns-du /dpm/path/to/directory

options:
  -h, --help            show this help message and exit
  -si                   Print with decimal, not binary prefixes
  -x, --exclude         Ignore this directory
  -z, --zero            Only print out files in DPNS that have zero size.
  
  -s, --summary         Print a summary for each argument
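The difference between decimal and binary prefixes matters when comparing totals with other accounting sources. The following sketch shows the two conventions side by side. It is not the tool's own formatting code, and the function name and unit labels are illustrative assumptions.

```python
def format_size(num_bytes, si=False):
    """Format a byte count with decimal (si=True) or binary prefixes."""
    if si:
        base = 1000.0
        units = ["B", "kB", "MB", "GB", "TB", "PB"]
    else:
        base = 1024.0
        units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    # Step up one unit at a time until the value fits below the base.
    for unit in units[:-1]:
        if abs(value) < base:
            return "%.1f %s" % (value, unit)
        value /= base
    return "%.1f %s" % (value, units[-1])
```

The same number of bytes therefore prints as a smaller figure with binary prefixes: 1536 bytes is 1.5 KiB, while 1500 bytes is 1.5 kB.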

dpns-find

This tool does not attempt to emulate everything that UNIX find can do. It is just a simple tool to help people find the paths of the files they are interested in.

usage: dpns-find /dpm/path/dir filename

options:
  -h, --help            show this help message and exit
  -xDIRECTORY, --exclude=DIRECTORY
                        exclude all files in this dir.

dpm-list-disk

usage: dpm-list-disk [options]

    This allows you to list the replicas on disk from the DPM head node
    without having to log onto the pool nodes. You can use the command line
    options to pick out the filesystem you are interested in. 

options:
  -h, --help            show this help message and exit
  -fFS, --fs=FS         Specify filesystem of interest.
  -sSERVER, --server=SERVER
                        Specify server of interest.
  -pPOOL, --pool=POOL   Specify pool of interest.

dpm-sql-spacetoken-list-files

usage: dpm-sql-spacetoken-list-files [options]

    This allows you to list the files in a given spacetoken. For performance, 
    it does this by performing SQL queries against the dpm_db database.

options:
  --st                  specify a spacetoken

dpm-sql-spacetoken-usage

usage: dpm-sql-spacetoken-usage [options]

    This allows you to list spacetokens and their usage. For performance, 
    it does this by performing SQL queries against the dpm_db database.

dpm-sql-usage-by-vo-user

usage: dpm-sql-usage-by-vo-user [options]
    
    This allows you to list the usage of the DPM broken down by user (DN) and VO. For performance
    it does this by performing SQL queries against the cns_db database.

options:
  --vo                   specify a VO to limit the query to
  -s, --si               Use powers of 1000 not powers of 1024
  --es                   Update records to MySQL database for user accounting

dpm-sql-list-hotfiles

usage: dpm-sql-list-hotfiles --days N --num M [--implicit-suffix K][--surls]
    This allows you to list the M most "hot" files, sampled over the last N days of requests to the
    DPM. This involves a slightly intensive SQL query against the dpm_db and cns_db databases, the 
    latter to retrieve file sizes for files still on the DPM.

options:
   --days                Number of days before the present to sample for.
   --num                 Length of list to return.
   --implicit-suffix K   Use 'K' as the implicit SI suffix for filesize output (this should be an 
                         upper-case letter corresponding to the standard SI symbol)
   --surls               Output the surl for the file, rather than the pfn (that is, the name of the
                         file in the DPM namespace, rather than the "real" filename on the pool node)
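The ranking idea behind --days and --num can be sketched in memory as follows. The real tool works through SQL queries against dpm_db and cns_db. The (timestamp, filename) pairs used here are a hypothetical stand-in for request records, and the function name is invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def hottest_files(requests, days, num, now=None):
    """Return the `num` most requested files within the last `days` days.

    `requests` is an iterable of (timestamp, filename) pairs; this shape
    is hypothetical and does not reflect the DPM database schema.
    """
    now = datetime.now() if now is None else now
    cutoff = now - timedelta(days=days)
    counts = Counter(name for stamp, name in requests if stamp >= cutoff)
    return counts.most_common(num)
```

The result is a list of (filename, request count) pairs, most requested first; requests older than the cutoff are ignored however numerous they are.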

dpm-sql-spacetoken-replicate-hotfiles

usage: dpm-sql-spacetoken-replicate-hotfiles --st SPACETOKEN --nreps N(=2)
    This allows you to replicate files in a given spacetoken.
options:
 -h, --help     show this help message and exit
 --st=ST        Specify a space token description
 --nreps=NREPS  Specify the number of copies required. Default 2.
 --del          Delete excess replicas (above amount specified in
                nreps)
 --list         Just list replicas. No action taken.
 --verbose      Print more output
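The interplay of --nreps and --del can be summarised per file as a small decision. The sketch below is an assumption about the intended logic based on the option descriptions above, not the tool's source; the function name is hypothetical.

```python
def replica_action(current, nreps=2, delete_excess=False):
    """Return ("add", n), ("delete", n) or ("none", 0) for one file.

    current       -- number of replicas the file has now
    nreps         -- number of copies required (default 2, as for --nreps)
    delete_excess -- whether surplus replicas may be removed (as for --del)
    """
    if current < nreps:
        return ("add", nreps - current)
    if current > nreps and delete_excess:
        return ("delete", current - nreps)
    return ("none", 0)
```

Without delete_excess a file with surplus replicas is left alone, which mirrors --del being an explicit opt-in.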

dpm-sql-pfn-to-dpns

usage: ./dpm-sql-pfn-to-dpns server:filepath [server:filepath]
  Gives DPNS name for a single physical filename     
options:
 -h, --help     show this help message and exit

dpm-sql-files-by-vo-user

usage: ./dpm-sql-files-by-vo-user [--vo VO]
 
Gives a list of all SURLS on the DPM (for a particular VO if specified)

dpm-sql-diskfs-to-dpns-chk

This tool performs a reverse lookup for the contents of a disk or filesystem against the DPNS. Optionally does adler32 checksumming. This tool flags up files on disk which are not in the DPNS ("dark data").

dpm-sql-dpns-to-diskfs-chk

Perform a consistency check between the DPNS records for a given disk / filesystem and the actual resulting records on disk. Optionally, checksums can be calculated for files present, using adler32. This tool will flag files in the DPNS which do not exist on disk.
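The two checks are mirror images of each other, and both can optionally compare adler32 checksums. The sketch below shows the set comparison and a chunked adler32 calculation using Python's standard zlib module. It is an illustration of the idea only: the function names are hypothetical, and the real tools obtain their listings from the disk servers and the DPNS database.

```python
import zlib


def adler32_of_chunks(chunks):
    """Fold an iterable of byte strings into an 8-digit hex adler32."""
    value = 1  # adler32 starting value
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    return "%08x" % (value & 0xFFFFFFFF)


def _read_chunks(handle, chunk_size):
    """Yield successive blocks from an open binary file."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            return
        yield chunk


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Checksum a file without loading it into memory in one go."""
    with open(path, "rb") as handle:
        return adler32_of_chunks(_read_chunks(handle, chunk_size))


def compare_listings(on_disk, in_dpns):
    """Return (dark_data, missing_from_disk) as sorted lists of paths."""
    on_disk, in_dpns = set(on_disk), set(in_dpns)
    return sorted(on_disk - in_dpns), sorted(in_dpns - on_disk)
```

Here dark_data corresponds to what dpm-sql-diskfs-to-dpns-chk flags, and missing_from_disk to what dpm-sql-dpns-to-diskfs-chk flags.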

Discontinued tools

dpm-listspaces

This tool is discontinued in the dpm-tools package because a native version is available in DPM itself.

usage: dpm-listspaces [options]

options:
  -h, --help            show this help message and exit
  -dDPM_DOMAIN, --domain=DPM_DOMAIN
                        Set DPM domain (default: local domain)
  -g, --gip             Use as a GIP provider and produce Glue LDIF output
  -L, --legacy          Build a Glue 1.2 compatible SA in addition to standard
                        ones (requires --gip)
  -l, --long            Detailed information on pools and reservations
  -pPOOLS, --pool=POOLS
                        Pool to display
  -rRESERVATIONS, --reservation=RESERVATIONS
                        Reservation to display
  -v, --debug           Increase verbosity level for debugging (on stderr)


gridpp_dpm_find_dpns_zero_size_files

This tool has been superseded by gridpp_dpm_dpns_to_disk with the -z option.

usage: gridpp_dpm_find_dpns_zero_size_files dpns-listing 

    The dpns-listing should be a text file containing the output
    of a dpns-ls command. i.e.,

    $ dpns-ls -lR /dpm/ecdf.ed.ac.uk/home/lhcb/ > /tmp/dpns.txt
    $ gridpp_dpm_find_dpns_zero_size_files /tmp/dpns.txt

options:
  -h, --help  show this help message and exit
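The kind of scan this tool performed can be sketched as below. The sketch assumes that the listing looks like the output of ls -lR (a "directory:" header line followed by entries with the size in the fifth column). That layout is an assumption about dpns-ls -lR output and should be checked against a real listing. The function name is hypothetical.

```python
def zero_size_files(listing_lines):
    """Yield full paths of regular files whose size column is 0.

    ASSUMPTION: lines follow the `ls -lR` layout, i.e.
    mode, links, owner, group, size, month, day, time, name.
    """
    current_dir = ""
    for raw in listing_lines:
        line = raw.rstrip("\n")
        # A lone token ending in ":" is treated as a directory header.
        if line.endswith(":") and len(line.split()) == 1:
            current_dir = line[:-1]
            continue
        fields = line.split(None, 8)
        # Skip blank lines, directories and anything not shaped as expected.
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        if fields[4] == "0":
            if current_dir:
                yield current_dir.rstrip("/") + "/" + fields[8]
            else:
                yield fields[8]
```

It can be fed an open file directly, for example zero_size_files(open("/tmp/dpns.txt")) for the listing produced in the example above.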

gridpp_dpm_get_group_map

usage: gridpp_dpm_get_group_map  

    List all groups known to the DPM and their corresponding virtual gids.
    Requires DPM >= 1.6.10.

    $ gridpp_dpm_get_group_map

options:
  -h, --help  show this help message and exit

gridpp_dpm_get_user_map

usage: gridpp_dpm_get_user_map

    List all users known to the DPM and their corresponding virtual uids.
    Requires DPM >= 1.6.10. 

    $ gridpp_dpm_get_user_map

options:
  -h, --help  show this help message and exit

gridpp_dpm_list_space_tokens

usage: gridpp_dpm_list_space_tokens [options] 

    List all defined space tokens in the DPM. If you want to limit the search,
    please specify a regular expression. i.e.,

    $ gridpp_dpm_list_space_tokens -r ATLAS

options:
  -h, --help            show this help message and exit
  -rREGEXP, --regexp=REGEXP
                        If required, you can specify a regular expression for
                        the token desc.


Bugs and support

Please submit bugs to:

http://savannah.cern.ch/projects/srmsupportuk/

Questions can always be asked on:

gridpp-storage AT jiscmail.ac.uk
dpm-users-forum AT cern.ch

Announcements and updates

Updates and changes will be announced via the blog and the above mailing lists.

Acknowledgments

  • Remi Mollon, Jean-Philippe Baud, Lana Abadie (CERN) for help with the DPM API.
  • Ewan McMahon (University of Oxford) for writing the rpm spec file.

Other contributions always welcome!