MonAMI DPM plugin

From GridPP Wiki

User:Graeme_stewart and User:Paul_millar (MonAMI's creator) have written a DPM monitoring target for MonAMI. Together with other plugins, this allows MonAMI to monitor a DPM service. Details are given below.

Monitoring DPM

MonAMI has a DPM plugin for extracting DPM-specific information, such as the current disk-pool usage (in total and per-group). MonAMI also has a number of non-DPM-specific plugins that can gather further information.

The DPM head-node stores much of its current status information in a MySQL database. DPM itself does not provide any method of obtaining statistics, so the DPM plugin queries this database directly.

We might also want to monitor the MySQL database itself and network activity. The MySQL plugin allows monitoring of a MySQL database, but produces copious information: one must be selective! Most network connections to the head-node are so brief that the tcp plugin is unlikely to catch them in the ESTABLISHED state.

The TIME_WAIT state of a TCP connection is a final state before the connection is considered closed. Since connections linger in this state for a short time, monitoring the number of connections in this state gives an impression of the number of "recent" connections.

Sample configuration

The following configuration gives a starting point for monitoring a DPM service. The file should be saved in the /etc/monami.d directory and MonAMI restarted.

Note that two of the monitoring targets (dpm and mysql) require an account within the MySQL database. The dpm plugin requires a MySQL account with SELECT privileges on the DPM tables. The mysql plugin requires only that a MySQL account exists: no privileges are needed. The same account can be used for both plugins. For further details see the MonAMI users manual.

##
##  Input
##

[dpm]
 user = monami-dpm
 password = monami-secret

[mysql]
 user = monami-mysql
 password = monami-secret

[tcp]
 count = dpm [local_port=5015, state=ESTABLISHED]
 count = dpm-timewait [local_port=5015, state=TIME_WAIT]
 count = dpns [local_port=5010, state=ESTABLISHED]
 count = dpns-timewait [local_port=5010, state=TIME_WAIT]

##
##  Samples
##
##      NB. each sample section has its own time interval.  Sample
##          sections can be merged if their intervals are the same,
##          which leads to a simpler configuration file.
##

[sample]
 interval = 1m
 read = dpm
 write = ganglia

[sample]
 interval = 1m
 read = mysql.Network.Connections.current, \
  mysql.Execution.Threads.running, \
  mysql.Execution.Threads.cached, \
  mysql.Execution.Temporary storage.files, \
  mysql.Execution.Temporary storage.Tables.disk, \
  mysql.Execution.Temporary storage.Tables.memory, \
  mysql.Execution.Open.tables.current
 write = ganglia

[sample]
 interval = 10s
 read = tcp
 write = ganglia

##
##  Output
##

[ganglia]
 # you may need to configure this target (see manual).
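
As the note in the Samples part of the file says, sample sections that share an interval can be merged. The sketch below merges the two 1m sections from the configuration above into one. It assumes that a single read list may mix a whole target (dpm) with individual mysql metrics. That follows from the comma-separated list syntax already used above, but the MonAMI manual is the authority on whether it is permitted.

```
##  Assumption: a whole target and individual metrics can share one read list.
[sample]
 interval = 1m
 read = dpm, \
  mysql.Network.Connections.current, \
  mysql.Execution.Threads.running, \
  mysql.Execution.Threads.cached, \
  mysql.Execution.Temporary storage.files, \
  mysql.Execution.Temporary storage.Tables.disk, \
  mysql.Execution.Temporary storage.Tables.memory, \
  mysql.Execution.Open.tables.current
 write = ganglia
```

The tcp sample section keeps its own entry because it uses a different interval (10s).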

TODO List

  • Improve DPM itself and DPM plugin to gather more statistics.

Useful links