DPM Monitoring

From GridPP Wiki

GridPP has developed some monitoring tools to give DPM sites a simple way of understanding what their DPM is doing. The aim is to give sites different views of the information stored in the DPM database so that they can see what data is being transferred, which users are accessing it, which protocols are being used, which client hosts are initiating the transfers, how many transfer errors are reported, and on which pools those errors occur.

The monitoring is packaged as an rpm (GridppDpmMonitor) and uses Brian Bockelman's GraphTool package, which provides a framework for querying a database, plotting the results and displaying them on a webpage. The GridppDpmMonitor package is strongly influenced by Brian's own dCache billing graph package. Essentially, GridppDpmMonitor is a repackaged version of it, with queries constructed for DPM's MySQL database.

Installation

You need an SL4 node (32- or 64-bit) to install GridppDpmMonitor.

Base packages

You will need to add a couple of yum repositories in order to get the required packages:

[sys-man]
name=Systems Manager Storage repository
baseurl=http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage
gpgcheck=0
enabled=1
[nebraska]
name=Nebraska GraphTool RPMs
baseurl=http://t2.unl.edu/store/rpms/SL4/$basearch
enabled=1
gpgcheck=0

Install the monitoring package with:

yum install GridppDpmMonitor

This should pull in all the GraphTool-related dependencies. The package can be installed on the DPM head node without any trouble. It can also be installed on a remote machine, but you will then need to configure the DPM MySQL database to accept connections from that machine.
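For a remote installation, it can help to confirm that the monitoring machine can reach the MySQL port on the head node before touching the GraphTool configuration. The following is a minimal sketch, not part of GridppDpmMonitor: the function name `can_connect` is hypothetical, and it assumes a modern Python 3 interpreter rather than the Python 2.3 shipped with SL4. It only tests TCP reachability; it says nothing about whether the MySQL user is permitted to log in from that host.

```python
import socket


def can_connect(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `can_connect("dpm-head-node.domain.ac.uk")` from the monitoring machine (the hostname is a placeholder) returns False if a firewall or the MySQL bind address is blocking the connection.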

Configuration

Apache

Apache is optional. If you do not install it, all of the monitoring plots will appear here:

http://dpm-head-node:8098/dpm/xml/

They are served by the CherryPy instance running on the server.

Alternatively, install Apache on the same machine as GridppDpmMonitor. In this case, make the following changes to /etc/httpd/conf/httpd.conf and then restart Apache. The [P] flag in the rewrite rules proxies the request, so mod_proxy must be enabled as well as mod_rewrite.

<VirtualHost *:80>
 # Change the alias to reflect your hostname
 ServerAlias   dpm-head-node.domain.ac.uk
 ServerAdmin   webmaster@example.org
 ServerSignature On

 LogLevel warn 

 <IfModule mod_rewrite.c>
   RewriteEngine On
   RewriteRule ^/dpm/(.*) http://localhost:8098/dpm/$1 [L,P]
   RewriteRule ^/billing/(.*) http://localhost:8098/billing/$1 [L,P]
   RewriteRule ^/content/(.*) http://localhost:8098/static/content/$1 [L,P]
 </IfModule>
</VirtualHost>

A file containing these changes is provided as part of the distribution. With Apache configured this way, all of the monitoring plots will appear at:

http://dpm-head-node/dpm/xml/
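To see what the three RewriteRule lines do, the sketch below mimics them with Python regular expressions. It is an illustration only and simplifies mod_rewrite (Apache handles query strings separately, for example); the names `RULES` and `proxy_target` are hypothetical.

```python
import re

# The three RewriteRule patterns from the VirtualHost block, in order.
RULES = [
    (re.compile(r"^/dpm/(.*)"), r"http://localhost:8098/dpm/\1"),
    (re.compile(r"^/billing/(.*)"), r"http://localhost:8098/billing/\1"),
    (re.compile(r"^/content/(.*)"), r"http://localhost:8098/static/content/\1"),
]


def proxy_target(path):
    """Return the backend URL a request path is proxied to, or None."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # [L] stops at the first matching rule; [P] proxies the request.
            return match.expand(template)
    return None


print(proxy_target("/dpm/xml/"))           # http://localhost:8098/dpm/xml/
print(proxy_target("/content/style.css"))  # http://localhost:8098/static/content/style.css
```

Paths that match none of the rules are not proxied and are handled by Apache itself.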

DBSrm.xml

Modify /etc/DBSrm.xml to suit your installation:

<graphtool-config>

 <import module="graphtool.database.connection_manager"> ConnectionManager </import>

  <class name="DpmConnMan" type="ConnectionManager" default="dpm">

    <attribute name="default"> dpm </attribute>

    <connection name="dpm">
      <attribute name="Interface"> MySQL </attribute>
      <attribute name="Database"> dpm_db </attribute>
      <attribute name="Host"> DPM_HEADNODE_HOSTNAME </attribute>
      <attribute name="Port"> 3306 </attribute>
      <attribute name="AuthDBUsername"> THE_DPM_DATABASE_USER (check site-info.def or /opt/lcg/etc/DPMINFO) </attribute>
      <attribute name="AuthDBPassword"> ??????? </attribute>
    </connection> 

  </class>

</graphtool-config>
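A quick way to catch a half-edited DBSrm.xml is to parse it and list any settings that still hold the template placeholders. The sketch below is an assumption-laden helper, not part of GraphTool: `read_connection` and `unfilled` are hypothetical names, the placeholder check is a simple heuristic based on the template shown above, and it needs a modern Python 3 interpreter.

```python
import xml.etree.ElementTree as ET

# Prefixes of the placeholder values used in the template above.
PLACEHOLDER_PREFIXES = ("DPM_HEADNODE", "THE_DPM")


def read_connection(xml_text, name="dpm"):
    """Return the attributes of the named <connection> as a dict."""
    root = ET.fromstring(xml_text)
    for conn in root.iter("connection"):
        if conn.get("name") == name:
            return {
                attr.get("name"): (attr.text or "").strip()
                for attr in conn.findall("attribute")
            }
    raise KeyError("no connection named %r" % name)


def unfilled(settings):
    """List settings that are empty or still look like template placeholders."""
    return sorted(
        key for key, value in settings.items()
        if not value
        or set(value) == {"?"}
        or value.startswith(PLACEHOLDER_PREFIXES)
    )
```

For example, `unfilled(read_connection(open("/etc/DBSrm.xml").read()))` should return an empty list once the host, user and password have all been set.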

prod.conf

/usr/lib/python2.3/site-packages/GridppDpmMonitor/billing/config/prod.conf contains the cherrypy configuration.

Add this line:

server.socket_host="hostname"

to prod.conf to make cherrypy listen on hostname rather than on localhost (the default).

Starting and stopping

An init.d script for GridppDpmMonitor will be placed into /etc/init.d on your system. The usual start|stop|restart|status commands apply. Once the service is running, you can view all of the monitoring graphs at:

http://hostname.of.node.running.monitoring/dpm/xml/

Summary webpage

It is recommended that each site create its own summary webpage (which can be hosted wherever is suitable). This presents all (or a subset) of the available plots in a single location that site administrators (or users) can use for routine monitoring. Since the monitoring contains potentially sensitive information (like user DNs), access to it should be restricted to within the site. A simple structure like this will do for a start:


 <html>
 <head>
 <title>GOC site name DPM monitoring</title>
 </head>
 <body>

 <!-- Replace hostname.of.node.running.monitoring with the host serving the plots -->
 <h1>GOC site name DPM monitoring</h1>

 <p><img src="http://hostname.of.node.running.monitoring/dpm/bar_graphs/host_transfer_quality?DN=%25&amp;Request_type=%25&amp;client=%25"></p>
 <p><img src="http://hostname.of.node.running.monitoring/dpm/bar_graphs/dn_success?DN=%25&amp;Request_type=%25"></p>
 <p><img src="http://hostname.of.node.running.monitoring/dpm/bar_graphs/dn_failures?DN=%25&amp;Request_type=%25"></p>
 <p><img src="http://hostname.of.node.running.monitoring/dpm/bar_graphs/dn_transfer_quality?DN=%25&amp;Request_type=%25"></p>
 <p><img src="http://hostname.of.node.running.monitoring/dpm/bar_graphs/group_transfer_quality?DN=%25&amp;Request_type=%25"></p>
 <p><img src="http://hostname.of.node.running.monitoring/dpm/bar_graphs/pool_transfer_quality?DN=%25&amp;Request_type=%25"></p>
 <p><img src="http://hostname.of.node.running.monitoring/dpm/pie_graphs/dpm_dn_query"></p>

 </body>
 </html>
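Rather than editing the hostname into every line by hand, the page can be generated. The sketch below is one possible approach, assuming a modern Python 3 interpreter; `summary_page` and `PLOTS` are hypothetical names, and the plot paths are copied from the example page above.

```python
from html import escape

# Plot paths from the example page; %25 is a URL-encoded percent sign.
PLOTS = [
    "bar_graphs/host_transfer_quality?DN=%25&Request_type=%25&client=%25",
    "bar_graphs/dn_success?DN=%25&Request_type=%25",
    "bar_graphs/dn_failures?DN=%25&Request_type=%25",
    "bar_graphs/dn_transfer_quality?DN=%25&Request_type=%25",
    "bar_graphs/group_transfer_quality?DN=%25&Request_type=%25",
    "bar_graphs/pool_transfer_quality?DN=%25&Request_type=%25",
    "pie_graphs/dpm_dn_query",
]


def summary_page(base_url, site_name, plots=PLOTS):
    """Build a summary page with one <img> per plot under base_url."""
    title = escape("%s DPM monitoring" % site_name)
    lines = ["<html>", "<head>", "<title>%s</title>" % title, "</head>",
             "<body>", "<h1>%s</h1>" % title]
    for plot in plots:
        # Escape "&" so the URL is valid inside an HTML attribute.
        src = escape("%s/%s" % (base_url.rstrip("/"), plot), quote=True)
        lines.append('<p><img src="%s"></p>' % src)
    lines += ["</body>", "</html>"]
    return "\n".join(lines)
```

For instance, `summary_page("http://hostname.of.node.running.monitoring/dpm", "GOC site name")` returns the page as a string, which can then be written to wherever the site hosts its summary page.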

Debugging

If the monitoring doesn't work for some reason, try running the following command in the shell and look at the resulting output:

billing_web
  • If you start the system using the billing_web command, make sure that you kill the process once you have finished debugging. You will need to open a new terminal to do this. If billing_web is still running when you run service GridppDpmMonitor start, the service will not start.
  • Note that GridppDpmMonitor runs as the daemon user, so also check things like file permissions.

cherrypy

You may get an error saying that you do not have _cpengine installed. It looks like something has broken in the packaging of cherrypy. As an interim fix, download this file and place it in /usr/lib/python2.3/site-packages/cherrypy/.

Bugs and support

Please submit bugs to:

http://savannah.cern.ch/projects/srmsupportuk/

Questions can always be asked on:

gridpp-storage AT jiscmail.ac.uk
dpm-users-forum AT cern.ch

Sites using GridppDpmMonitor

  • Glasgow
  • Edinburgh
  • Durham
  • Cambridge
  • Lancaster
  • Oxford
  • RHUL
  • GRIF
  • Somewhere in Poland

There may be others!

Acknowledgments

  • Big big thanks to Brian Bockelman (Nebraska).

Example Plots

  • [Figure: Group transfer quality]
  • [Figure: Pool transfer successes]
  • [Figure: Host transfer quality]
  • [Figure: Dn success] You should be careful about who can see this plot of user DNs!