MonAMI dCache plugin

From GridPP Wiki

Greig Cowan and Paul Millar (the creator of MonAMI) have written a dCache monitoring target for MonAMI.

List of currently monitored targets

  • The number of srmPut(), srmGet() and srmCopy() requests that a dCache instance receives. The information is gathered by an SQL query against the dCache PostgreSQL database, run as the monami database user (this user has to be added manually).

With the MonAMI framework, the gathered information can easily be displayed in Ganglia (already run at many sites), which provides historical plots. Using MonAMI's Nagios plugin, it will be possible to set thresholds on the number of SRM requests above which an alarm is raised, drawing the operators' attention to the state of dCache.
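The threshold idea above can be sketched in Python. This is not the code of MonAMI's Nagios plugin; it is a minimal illustration assuming two hypothetical, site-chosen limits (`warn` and `crit`) and the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN), with optional performance data after a `|`. The function name `check_srm_requests` is made up for this example.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_srm_requests(count, warn, crit):
    """Map an SRM request count onto a Nagios state and status line.

    `warn` and `crit` are hypothetical site-chosen thresholds; the state
    only changes once the count goes strictly above a threshold.
    """
    if count is None or count < 0:
        return UNKNOWN, "UNKNOWN - no valid SRM request count"
    if count > crit:
        state = CRITICAL
    elif count > warn:
        state = WARNING
    else:
        state = OK
    # Text after "|" is Nagios performance data: label=value;warn;crit
    message = f"{LABELS[state]} - {count} SRM requests|srm_requests={count};{warn};{crit}"
    return state, message
```

A standalone check script would print the message and pass the state to `sys.exit()`, which is how Nagios learns the result; keeping the decision in a pure function makes the thresholds easy to test.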

To-Do list

The following dCache components should be monitored. Some are only suggestions, as the technical details of how the information would be obtained have still to be determined.

  • Amount of available and used space.
  • Number of movers on each pool.
  • Number of flushes/restores to/from an attached HSM.
  • Number of files per VO.
  • ...

Other monitoring targets

The dCache plugin has been developed to monitor dCache-specific components (i.e. the PostgreSQL database). In addition, existing MonAMI plugins can be used to monitor the dCache Java processes and TCP connections. MonAMI can be configured to display this information in Ganglia, Nagios or some other system.

For example, the monami.conf file below can be used on a dCache head node to monitor several components:

  • Check that the dCache processes are listening on the correct ports.
  • Monitor the details of each Java process (e.g. memory used).
  • Check the status of TCP connections (look out for connections that have moved into the CLOSE_WAIT state).
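The first and third checks rely on the kernel's table of TCP sockets. As a rough sketch of what such a check involves (not how MonAMI's tcp plugin is implemented), the Python below parses the Linux `/proc/net/tcp` format, where the local port and the connection state are stored in hexadecimal, and counts sockets by port and state. The function names and the `SAMPLE` text are hypothetical. The sketch only knows the raw kernel states, so aggregate states such as CONNECTING and DISCONNECTING are not covered. Java services may listen on IPv6 sockets, which appear in `/proc/net/tcp6` with the same column layout.

```python
from collections import Counter

# Linux TCP state codes as they appear (in hex) in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def parse_proc_net_tcp(text):
    """Yield (local_port, state_name) for each socket line of /proc/net/tcp."""
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state_hex = fields[1], fields[3]
        # local_address is "HEXIP:HEXPORT"; the port follows the last colon.
        port = int(local_address.rsplit(":", 1)[1], 16)
        yield port, TCP_STATES.get(state_hex.upper(), "UNKNOWN")


def count_sockets(text, local_port=None, state=None):
    """Count sockets matching an optional local port and/or state."""
    return sum(
        1
        for port, st in parse_proc_net_tcp(text)
        if (local_port is None or port == local_port)
        and (state is None or st == state)
    )


def states_histogram(text):
    """Count sockets in each TCP state, e.g. to spot a build-up of CLOSE_WAIT."""
    return Counter(st for _, st in parse_proc_net_tcp(text))


# Hypothetical sample (ports 0x20FB = 8443, 0x0AFB = 2811); on a real
# Linux node, use open("/proc/net/tcp").read() instead.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:20FB 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001
   1: 00000000:0AFB 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1002
   2: 0100007F:0AFB 0100007F:9C40 08 00000000:00000000 00:00000000 00000000     0        0 1003
"""
```

For example, `count_sockets(SAMPLE, local_port=8443, state="LISTEN")` counts listeners on the SRM port, and a CLOSE_WAIT count in `states_histogram(...)` that keeps growing between samples is the symptom the list above warns about.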

See the comments in the monami.conf file below for details.

# Create a dCache target called "dcache".
[dcache]
name = dcache
# Specify the port that the PostgreSQL DB runs on.
port = 5432
user = monami
password = ****

# Create a ganglia target. Obviously you need to have gmond running on the node.
[ganglia]
name = requests

# Create a target to monitor TCP connections.
# Check that the dCache SRM, http, admin, dcap and gridftp
# processes are in the LISTEN state on the correct ports.
[tcp]
name = dcache-listening
count = srm [local_port=8443, state=LISTEN]
count = http [local_port=2288, state=LISTEN]
count = admin [local_port=22223, state=LISTEN]
count = dcap [local_port=22125, state=LISTEN]
count = gridftp [local_port=2811, state=LISTEN]

# Create a TCP target that counts the connections to the node
# in each state. Useful for dCache door nodes.
[tcp]
name = dcache-gridftp-ctrl
count = established [state=ESTABLISHED]
count = connecting [state=CONNECTING]
count = disconnecting [state=DISCONNECTING]
count = close_wait [state=CLOSE_WAIT]

# Create a process target that looks at all java processes.
# Improvements will be made in future releases when support for reg-exps
# is added (meaning that you can specify the process name(s) more precisely).
[process]
name = dcache-java
watch = java [uid = root, gid = root, state = R]

# Send all data from dcache, dcache-listening
# and dcache-gridftp-ctrl to ganglia, updating every 60s.
[sample]
interval = 60s
read = dcache, dcache-listening, dcache-gridftp-ctrl
write = requests
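The `dcache-java` process target selects processes by name, owner, group and state. The Python below is a hypothetical, Linux-only sketch of that kind of selection, not MonAMI's process plugin: it reads `/proc/<pid>/status`, applies filters analogous to `watch = java [uid = root, gid = root, state = R]` (with root given numerically as uid and gid 0), and reports each match's resident memory from the `VmRSS` line. All function names are made up for illustration.

```python
import os


def parse_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value.strip()
    return info


def matches(info, name, uid=None, gid=None, state=None):
    """Hypothetical filter mirroring 'watch = java [uid=..., gid=..., state=...]'.

    uid and gid are compared numerically with the real IDs (first column);
    state is compared with the single-letter code, e.g. R or S.
    """
    if info.get("Name") != name:
        return False
    if uid is not None and int(info["Uid"].split()[0]) != uid:
        return False
    if gid is not None and int(info["Gid"].split()[0]) != gid:
        return False
    if state is not None and info["State"].split()[0] != state:
        return False
    return True


def rss_kib(info):
    """Resident memory in KiB from the 'VmRSS: <n> kB' line (0 if absent)."""
    return int(info.get("VmRSS", "0 kB").split()[0])


def watched_processes(name, **filters):
    """Scan /proc (Linux only) and return {pid: rss_kib} for matching processes."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as fh:
                info = parse_status(fh.read())
        except OSError:  # the process exited while we were scanning
            continue
        if matches(info, name, **filters):
            result[int(entry)] = rss_kib(info)
    return result
```

On a head node, `watched_processes("java", uid=0, gid=0)` would return a mapping from each matching PID to its resident memory in KiB; keeping the parsing separate from the `/proc` scan lets the filter logic be checked against saved status text.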

Useful links