DPM-admin-tools
GridPP DPM administration toolkit
GridPP have put together a collection of handy utilities for easing the management of DPM. This toolkit should help sites running a DPM to manage their installation and to handle (or recover from) common problems such as disk failures and pool draining. The tools are written using the DPM python API, provided by the DPM-interfaces package. Each tool focuses on performing a single task, so to get the result you want you may need to use them in conjunction with the standard DPM command line utilities or standard shell tools (which I think is the best approach).
Author: Greig A Cowan, University of Edinburgh
Date: May 2008
Amendments: Sam Skipsey, University of Glasgow; Wahid Bhimji, University of Edinburgh
License: EGEE
Update
The release of v2 of the toolkit introduced a new naming convention for the tools (gridpp_* -> dpm-*, dpns-*) and the tools now appear in /opt/lcg/bin rather than /usr/bin. This places them in the same location as the other native DPM client tools. This version of the toolkit also cleans out some existing tools that are now supported by the native DPM client.
Installation
The tools are probably best installed on the DPM head node, but should work on a grid UI with the DPM-interfaces package installed. You need to add this yum repository to your configuration:
[sys-man]
name=Systems Manager Storage repository
baseurl=http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage
gpgcheck=0
enabled=1
And then install the package via:
yum install gridpp-dpm-tools
As of v2 the tools are installed in /opt/lcg/bin (earlier releases used /usr/bin). We will soon provide an rpm containing the above repository:
rpm -ivh http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/sys-man-repo-1.0.0.rpm
SQL-based tools (whose names start with dpm-sql) also require the MySQL-python rpm to be installed. They attempt to locate the DPM MySQL instance by parsing /opt/lcg/etc/DPMINFO, which will probably not be available by default outside of the SE.
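As a rough illustration of what locating the MySQL instance involves, the sketch below parses a connection string of the form user/password@host, optionally followed by /database. That layout is an assumption about the contents of DPMINFO, and parse_dpminfo is a hypothetical helper rather than a function from the toolkit, so check your own file before relying on it.

```python
def parse_dpminfo(line):
    """Split 'user/password@host[/database]' into its parts.

    The format is assumed for illustration only; it is not taken
    from the toolkit's own parser.
    """
    line = line.strip()
    # Split on the last '@' so that a password containing '@' survives.
    credentials, _, location = line.rpartition("@")
    if not credentials or not location:
        raise ValueError("expected user/password@host")
    user, sep, password = credentials.partition("/")
    if not sep:
        raise ValueError("expected user/password before '@'")
    host, _, database = location.partition("/")
    return {"user": user, "password": password,
            "host": host, "database": database or None}
```

The returned dictionary could then be handed to whichever MySQL client library is in use.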
User Accounting
User-level accounting is now possible using the toolset. Versions of the toolkit >=2.3.9 copy two helper files to /opt/lcg/etc which you will need to use to set up your system to enable this.
/opt/lcg/etc/accountingdb.sql
is a set of SQL commands which should be run against your MySQL instance on your head node (or wherever your DPM database is) to create a new database for accounting purposes. The final line, which grants access to the dpminfo user, should be altered to whatever user is listed in your /opt/lcg/etc/DPMINFO file. (NOTE: There is a typo in the first line of the script - you need to alter the current version to make a Primary Key on entry_date, gid, uid, rather than date, uid, gid.)
/opt/lcg/etc/usage_accounting
is a cron job specification, which should be copied into /etc/cron.d/ . It calls the /opt/lcg/bin/dpm-sql-usage-by-vo-user command with the semi-secret "--es" option to write daily logs of the usage of the DPM, by user and group, into the database you just created.
An extension to the DPM_Monitoring (http://www.gridpp.ac.uk/wiki/DPM_Monitoring) tool exists to allow plotting of useful information from this database, and is documented on that page.
Environment
Since the tools use the dpm python module, it is essential that you have the correct PYTHONPATH:
export PYTHONPATH=$PYTHONPATH:/opt/lcg/lib/python
On a 64-bit machine this will be:
export PYTHONPATH=$PYTHONPATH:/opt/lcg/lib64/python
You also need to set:
export DPM_HOST=dpm-head-node.domain.ac.uk
export DPNS_HOST=dpm-head-node.domain.ac.uk
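A quick way to confirm the two host variables are present before running any of the tools is sketched below. missing_settings is a hypothetical helper written for this page, not part of the toolkit; it only checks that DPM_HOST and DPNS_HOST are set and non-empty, not that they point at a working head node.

```python
import os

# The two variables named in this section.
REQUIRED = ("DPM_HOST", "DPNS_HOST")


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Please export: " + ", ".join(missing))
```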
Mixing a 32-bit python with a 64-bit DPM-interfaces rpm, or vice versa, will result in python being unable to load the (compiled C) _dpm.so library. In a future release, this case will be detected and result in graceful failure with a useful error message. Until then, one can work around the issue by forcing the correct python to be used to run the script, either by calling the tool with the right python:
python32 this_dpm_script
or by editing the header of the script to explicitly call the correct python.
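To see which case applies on a given machine, the word size of the running interpreter can be checked from python itself. The sketch below is an illustration, not toolkit code: interpreter_bits and expected_module_dir are hypothetical names, and the mapping simply repeats the two PYTHONPATH directories given above.

```python
import struct


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this python."""
    return struct.calcsize("P") * 8


def expected_module_dir(bits):
    """Map interpreter word size to the directory named in this section."""
    if bits == 64:
        return "/opt/lcg/lib64/python"
    if bits == 32:
        return "/opt/lcg/lib/python"
    raise ValueError("unexpected word size: %r" % bits)


if __name__ == "__main__":
    print(expected_module_dir(interpreter_bits()))
```

If the directory printed does not match the architecture of the installed DPM-interfaces rpm, the import of _dpm.so can be expected to fail as described.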
Available tools
dpm-disk-to-dpns
usage: dpm-disk-to-dpns [options]
Find the mappings between the files on a pool
and the LFN in the DPNS namespace. If you want to analyse all
server:filesystems on a pool, you can use the -p option. i.e.,
$ dpm-disk-to-dpns -p poolname
To restrict to a particular server:filesystem combination, use the -s
option. i.e.,
$ dpm-disk-to-dpns -s pool1.glite.ecdf.ed.ac.uk:/grid01
options:
-h, --help show this help message and exit
-d, --debug Use debug flag only for testing.
-sSERVERFS, --serverfs=SERVERFS
Specify which server:filesystem to be analysed.
-pPOOL, --pool=POOL Specify which pool to be analysed.
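The -s argument combines a server name and a filesystem path in one string. A minimal sketch of how such a value can be split is shown below; split_serverfs is a hypothetical helper for illustration and is not claimed to be how the tool itself parses the option.

```python
def split_serverfs(serverfs):
    """Split 'server:/filesystem' into (server, filesystem)."""
    server, sep, filesystem = serverfs.partition(":")
    if not sep or not server or not filesystem.startswith("/"):
        raise ValueError("expected server:/filesystem, got %r" % serverfs)
    return server, filesystem
```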
dpm-dpns-to-disk
usage: dpm-dpns-to-disk /dpm/path/to/file [-d DIRECTORY][-vz]
options:
-h, --help show this help message and exit
-d, --directory Analyse files in this directory
-v, --verbose See information about namespace entries without replicas
-z, --zero Only print out files with zero size.
dpns-du
usage: dpns-du /dpm/path/to/directory
options:
-h, --help show this help message and exit
-si Print with decimal, not binary prefixes
-x, --exclude Ignore this directory
-z, --zero Only print out files in DPNS that have zero size.
-s, --summary Print a summary for each argument
dpns-find
This tool does not attempt to emulate everything that UNIX find can do. It is just a simple tool to help people find the paths of the files they are interested in.
usage: dpns-find /dpm/path/dir filename
options:
-h, --help show this help message and exit
-xDIRECTORY, --exclude=DIRECTORY
exclude all files in this dir.
dpm-list-disk
usage: dpm-list-disk [options]
This allows you to list the replicas on disk from the DPM head node
without having to log onto the pool nodes. You can use the command line
options to pick out the filesystem you are interested in.
options:
-h, --help show this help message and exit
-fFS, --fs=FS Specify filesystem of interest.
-sSERVER, --server=SERVER
Specify server of interest.
-pPOOL, --pool=POOL Specify pool of interest.
dpm-sql-spacetoken-list-files
usage: dpm-sql-spacetoken-list-files [options]
This allows you to list the files in a given spacetoken. For performance,
it does this by performing SQL queries against the dpm_db database.
options:
--st specify a spacetoken
dpm-sql-spacetoken-usage
usage: dpm-sql-spacetoken-usage [options]
This allows you to list spacetokens and their usage. For performance,
it does this by performing SQL queries against the dpm_db database.
dpm-sql-usage-by-vo-user
usage: dpm-sql-usage-by-vo-user [options]
This allows you to list the usage of the DPM broken down by user (DN) and VO. For performance,
it does this by performing SQL queries against the cns_db database.
options:
--vo specify a VO to limit the query to
-s, --si Use powers of 1000 not powers of 1024
--es Update records to MySQL database for user accounting
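The difference between the two prefix conventions selected by --si can be seen with a small formatter. This is a sketch only: format_size and its unit labels are choices made for this example and do not claim to reproduce the exact output of the tool.

```python
def format_size(nbytes, si=False):
    """Format a byte count using powers of 1000 (si) or powers of 1024."""
    if si:
        base, units = 1000.0, ["B", "kB", "MB", "GB", "TB", "PB"]
    else:
        base, units = 1024.0, ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(nbytes)
    # Divide down until the value fits under one step of the base.
    for unit in units[:-1]:
        if abs(value) < base:
            return "%.1f %s" % (value, unit)
        value /= base
    return "%.1f %s" % (value, units[-1])
```

For large pools the two conventions diverge noticeably, so it is worth stating which one was used when quoting usage figures.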
dpm-sql-list-hotfiles
usage: dpm-sql-list-hotfiles --days N --num M [--implicit-suffix K][--surls]
This allows you to list the M most "hot" files, sampled over the last N days of requests to the
DPM. This involves a slightly intensive SQL query against the dpm_db and cns_db databases, the
latter to retrieve file sizes for files still on the DPM.
options:
--days Number of days before the present to sample for.
--num Length of list to return.
--implicit-suffix K Use 'K' as the implicit SI suffix for filesize output (this should be an
upper-case letter corresponding to the standard SI symbol)
--surls Output the surl for the file, rather than the pfn (that is, the name of the
file in the DPM namespace, rather than the "real" filename on the pool node)
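The idea of the M hottest files over the last N days can be sketched in a few lines of python. The real tool does this with SQL against the DPM databases; the version below works on an in-memory list of (timestamp, filename) pairs, which is an assumed input format, and hottest_files is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def hottest_files(requests, days, num, now=None):
    """Return the `num` most requested files in the last `days` days.

    `requests` is an iterable of (timestamp, filename) pairs; the
    result is a list of (filename, request_count) tuples.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    counts = Counter(name for stamp, name in requests if stamp >= cutoff)
    return counts.most_common(num)
```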
dpm-sql-spacetoken-replicate-hotfiles
usage: dpm-sql-spacetoken-replicate-hotfiles --st SPACETOKEN --nreps N(=2)
This allows you to replicate files in a given spacetoken.
options:
-h, --help show this help message and exit
--st=ST Specify a space token description
--nreps=NREPS Specify the number of copies required. Default 2.
--del Delete excess replicas (above amount specified in
nreps)
--list Just list replicas. No action taken.
--verbose Print more output
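How --nreps and --del interact can be expressed as a small calculation per file. The function below is an illustrative sketch under that reading of the options, not the tool's implementation; replica_actions is a hypothetical name.

```python
def replica_actions(current, nreps=2, delete_excess=False):
    """Return (to_add, to_delete) replica counts for one file.

    `current` is the number of replicas the file has now. Excess
    replicas are only counted for deletion when delete_excess is set,
    mirroring the --del option.
    """
    if nreps < 1:
        raise ValueError("nreps must be at least 1")
    to_add = max(nreps - current, 0)
    to_delete = max(current - nreps, 0) if delete_excess else 0
    return to_add, to_delete
```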
Discontinued tools
dpm-listspaces
This tool is discontinued in the dpm-tools package because a native version is available in DPM itself.
usage: dpm-listspaces [options]
options:
-h, --help show this help message and exit
-dDPM_DOMAIN, --domain=DPM_DOMAIN
Set DPM domain (default: local domain)
-g, --gip Use as a GIP provider and produce Glue LDIF output
-L, --legacy Build a Glue 1.2 compatible SA in addition to standard
ones (requires --gip)
-l, --long Detailed information on pools and reservations
-pPOOLS, --pool=POOLS
Pool to display
-rRESERVATIONS, --reservation=RESERVATIONS
Reservation to display
-v, --debug Increase verbosity level for debugging (on stderr)
gridpp_dpm_find_dpns_zero_size_files
This tool has been superseded by dpm-dpns-to-disk (formerly gridpp_dpm_dpns_to_disk) with the -z option.
usage: gridpp_dpm_find_dpns_zero_size_files dpns-listing
The dpns-listing should be a text file containing the output
of a dpns-ls command. i.e.,
$ dpns-ls -lR /dpm/ecdf.ed.ac.uk/home/lhcb/ > /tmp/dpns.txt
$ gridpp_dpm_find_dpns_zero_size_files /tmp/dpns.txt
options:
-h, --help show this help message and exit
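For reference, the kind of scan this tool performed can be sketched as follows. The code assumes the listing follows the usual ls -l column layout (permissions, link count, owner, group, size, three date fields, name), which is an assumption about the dpns-ls -l output; zero_size_entries is a hypothetical name.

```python
def zero_size_entries(listing_lines):
    """Yield names of non-directory entries whose size field is 0."""
    for line in listing_lines:
        fields = line.split(None, 8)
        # Skip blank lines, "directory:" headers from -R, and directories.
        if len(fields) < 9 or fields[0].startswith("d"):
            continue
        if fields[4] == "0":
            yield fields[8].rstrip()
```

It could be applied to a saved listing with something like list(zero_size_entries(open("/tmp/dpns.txt"))).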
gridpp_dpm_get_group_map
usage: gridpp_dpm_get_group_map
List all groups known to the DPM and their corresponding virtual gids.
Requires DPM >= 1.6.10.
$ gridpp_dpm_get_group_map
options:
-h, --help show this help message and exit
gridpp_dpm_get_user_map
usage: gridpp_dpm_get_user_map
List all users known to the DPM and their corresponding virtual uids.
Requires DPM >= 1.6.10.
$ gridpp_dpm_get_user_map
options:
-h, --help show this help message and exit
gridpp_dpm_list_space_tokens
usage: gridpp_dpm_list_space_tokens [options]
List all defined space tokens in the DPM. If you want to limit the search,
please specify a regular expression. i.e.,
$ gridpp_dpm_list_space_tokens -r ATLAS
options:
-h, --help show this help message and exit
-rREGEXP, --regexp=REGEXP
If required, you can specify a regular expression for
the token desc.
Bugs and support
Please submit bugs to:
http://savannah.cern.ch/projects/srmsupportuk/
Questions can always be asked on:
gridpp-storage AT jiscmail.ac.uk
dpm-users-forum AT cern.ch
Announcements and updates
Updates and changes will be announced via the blog (http://gridpp-storage.blogspot.com) and the above mailing lists.
Acknowledgments
- Remi Mollon, Jean-Philippe Baud, Lana Abadie (CERN) for help with the DPM API.
- Ewan McMahon (University of Oxford) for writing the rpm spec file.
Other contributions always welcome!
