MonAMI dCache plugin
List of currently monitored targets
- Number of srmPut(), srmGet() and srmCopy() requests that a dCache instance receives. The information is gathered by an SQL query against the dCache PostgreSQL database, run as the monami database user (this user has to be added manually).
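As a rough illustration of the kind of counting query described above, here is a minimal Python sketch. It is not the plugin's own code. The table names (putrequests, getrequests, copyrequests) are assumptions and should be checked against the schema of your own dCache database. An in-memory SQLite database stands in for PostgreSQL so that the example runs anywhere.

```python
import re
import sqlite3  # stand-in for a PostgreSQL driver so the sketch is runnable

# Hypothetical table names: verify them against your own dCache schema.
REQUEST_TABLES = {
    "put": "putrequests",
    "get": "getrequests",
    "copy": "copyrequests",
}


def count_requests(conn, tables=REQUEST_TABLES):
    """Return {label: row count} for each request table."""
    counts = {}
    cur = conn.cursor()
    for label, table in tables.items():
        # Table names cannot be bound as query parameters, so validate them.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
            raise ValueError(f"unsafe table name: {table!r}")
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[label] = cur.fetchone()[0]
    return counts


def demo_connection():
    """Build an in-memory database holding the assumed tables."""
    conn = sqlite3.connect(":memory:")
    for table in REQUEST_TABLES.values():
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO putrequests (id) VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO getrequests (id) VALUES (1)")
    return conn


if __name__ == "__main__":
    print(count_requests(demo_connection()))
```

Against a real instance, the same function could be given a connection opened as the monami user by any Python DB-API driver for PostgreSQL.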
Using the MonAMI framework, it is simple to display the gathered information in Ganglia (which is already operated at many sites), allowing historical plots to be created. Using the Nagios plugin, it will be possible to set limits on the number of SRM requests; when a limit is exceeded, an alarm can be raised to draw the operator's attention to the dCache state.
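The idea of raising an alarm above a limit can be sketched in a few lines of Python. This is only an illustration of threshold logic using the standard Nagios status codes (0 = OK, 1 = WARNING, 2 = CRITICAL). It does not show how the MonAMI Nagios plugin is configured, and the function name and the threshold values are hypothetical.

```python
# Standard Nagios status codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_srm_requests(count, warn, crit):
    """Map a request count onto a Nagios status code and a status message."""
    if count >= crit:
        return CRITICAL, f"CRITICAL - {count} SRM requests (limit {crit})"
    if count >= warn:
        return WARNING, f"WARNING - {count} SRM requests (limit {warn})"
    return OK, f"OK - {count} SRM requests"


if __name__ == "__main__":
    # Hypothetical thresholds: warn at 100 requests, go critical at 500.
    status, message = check_srm_requests(120, warn=100, crit=500)
    print(status, message)
```

Using two limits rather than one gives operators an early warning before the critical level is reached.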
This is a list of dCache components that should be monitored. Some are only suggestions for potential monitoring targets, as the technical details of how such information would be obtained have yet to be determined.
- Amount of available and used space.
- Number of movers on each pool.
- Number of flushes/restores to/from an attached HSM.
- Number of files per VO.
Other monitoring targets
The dCache plugin has been developed to monitor dCache-specific components (i.e. the PostgreSQL database). In addition, existing MonAMI plugins can be used to monitor dCache Java processes and TCP connections. MonAMI can be configured to display this information in Ganglia, Nagios or some other system.
For example, the monami.conf file below can be used on a dCache head node to monitor a number of different components.
- Check that dCache processes are listening on the correct ports.
- Monitor the details of each of the Java processes (e.g. memory used).
- Check the status of TCP connections (look out for connections that have moved into the CLOSE_WAIT state).
See the comments in the .conf file for details.
# Create a dCache target called "dcache".
# Specify the port that the postgres DB will run on.
[dcache]
port = 5432
user = monami
password = ****

# Create a Ganglia target. Obviously you need to have gmond running on the node.
[ganglia]
name = requests

# Create a target to monitor TCP connections.
# Look to see if the dCache SRM, http, admin, dcap and gridftp
# processes are LISTEN-ing on the correct ports.
[tcp]
name = dcache-listening
count = srm [local_port=8443, state=LISTEN]
count = http [local_port=2288, state=LISTEN]
count = admin [local_port=22223, state=LISTEN]
count = dcap [local_port=22125, state=LISTEN]
count = gridftp [local_port=2811, state=LISTEN]

# Create a TCP target to look for the different states of
# connections to a node. Useful for dCache door nodes.
[tcp]
name = dcache-gridftp-ctrl
count = established [state=ESTABLISHED]
count = connecting [state = CONNECTING]
count = disconnecting [state = DISCONNECTING]
count = close_wait [state = CLOSE_WAIT]

# Create a process target that is looking at all java processes.
# Improvements will be made in future releases when support for reg-exps
# is added (meaning that you can specify the process name(s) more precisely).
[process]
name = dcache-java
watch = java [uid = root, gid = root, state = R]

# Take all data from dcache, dcache-listening
# and dcache-gridftp-ctrl to Ganglia, updating every 60s.
[sample]
interval = 60s
read = dcache, dcache-listening, dcache-gridftp-ctrl
write = requests
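To make the [tcp] targets above more concrete, the following Python sketch counts connections by state and local port, in the spirit of lines such as "count = srm [local_port=8443, state=LISTEN]". It is not how the MonAMI tcp plugin is implemented. It assumes a Linux host, parses text in the /proc/net/tcp layout, and the function name and sample data are made up for illustration.

```python
# Hex state codes used in /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_connections(proc_net_tcp, state, local_port=None):
    """Count sockets in `state`, optionally restricted to one local port."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address is "HEXIP:HEXPORT"; the port is the part after ':'.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if TCP_STATES.get(fields[3].upper()) != state:
            continue
        if local_port is not None and port != local_port:
            continue
        count += 1
    return count


# Made-up sample: a listener on port 8443 (0x20FB) and one connection
# on port 2811 (0x0AFB) that is stuck in CLOSE_WAIT.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:20FB 00000000:0000 0A 00000000:00000000
   1: 0100007F:0AFB 0100007F:C350 08 00000000:00000000
"""

if __name__ == "__main__":
    print("srm listening:", count_connections(SAMPLE, "LISTEN", 8443))
    print("close_wait:", count_connections(SAMPLE, "CLOSE_WAIT"))
```

On a live node the input would be the contents of /proc/net/tcp. IPv6 sockets are listed separately in /proc/net/tcp6, which uses the same column layout.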