MonAMI by example

This page is a quick, step-by-step tutorial on MonAMI. It consists of a number of examples, each centred on a short, self-contained configuration file. Together they give a gentle introduction to monitoring with MonAMI, walking you through various monitoring scenarios that illustrate its features.

It's worth pointing out that this tutorial is not supposed to be definitive. See the users guide (HTML, HTML (single page), A4 PDF, Letter PDF) for a definitive guide to using MonAMI.

Before you start...

Prerequisites

For this tutorial, you need:

  • a computer,
  • a fresh install of MonAMI,
  • the ability to edit the MonAMI configuration files (e.g. root access),
  • about 5 to 10 minutes of spare time per example,
  • KSysGuard (a standard part of the KDE desktop), for the fifth example only.

Location of configuration files

Configuration is held in the /etc/monami.d/ directory. In the following examples, you will be creating a file /etc/monami.d/example. Each example is self-contained, so you should overwrite this file for each example.

Starting and stopping MonAMI

For the following examples you should run MonAMI in a non-detaching, verbose mode. To do this, run the MonAMI executable directly, using the command /usr/bin/monamid -fv. You can stop MonAMI with Control+C.

When starting MonAMI, you will see output like:

 Loading configuration file /etc/monami.conf
         plugin apache loaded
         plugin amga loaded
 [... many more lines like this ...]
         plugin tcp loaded
         plugin tomcat loaded
 Starting up...

When MonAMI is shutting down, you will see:

 Waiting for activity to stop...
 Shutting down threads...

If MonAMI is busy when it is told to shut down, you may see a delay (of, at most, a few seconds) between the "Waiting for activity to stop..." message and the "Shutting down threads..." message. This is normal.

More detailed information on starting, running and stopping MonAMI is available within the MonAMI Users Guide.

The first example

In this example, MonAMI will record the current state of the root filesystem and store the result within the file /tmp/monami-filesystem.

The configuration file

The first monitoring example looks at the available space on the root partition. Copy the following text and store it as the /etc/monami.d/example file.

 ##
 ## Monitoring targets
 ##
 
 # Our root filesystem
 [filesystem]
  name = root-fs
  location = /
 
 ##
 ## Sample sections
 ##
 
 # Record latest f/s stats every two seconds
 [sample]
  read = root-fs
  write = snapshot
  interval = 2
 
 ##
 ## Reporting targets
 ##
 
 # The current filesystem statistics
 [snapshot]
  filename = /tmp/monami-filesystem

This configuration will measure the current status of the root filesystem every two seconds and store the output in a file /tmp/monami-filesystem.

Running the example

Make sure you run MonAMI for at least two seconds. MonAMI attempts to spread its work evenly to reduce the impact of monitoring. It does this by starting the timed monitoring with a random fraction of the interval time, so it can take up to two seconds before MonAMI will monitor the root filesystem.
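
The start-up behaviour described above can be illustrated with a short Python sketch. This is only a model of the idea, not MonAMI's actual scheduler: the function name first_run_delays and the use of a uniformly distributed random fraction are assumptions made for illustration.

```python
import random


def first_run_delays(intervals, rng=None):
    """Return a hypothetical initial delay (in seconds) for each timed sample.

    Each delay is a random fraction of that sample's interval, so samples
    that share an interval are unlikely to fire at the same instant.
    """
    rng = rng or random.Random()
    return [rng.random() * interval for interval in intervals]


# Two samples, both configured with "interval = 2" (seconds).
delays = first_run_delays([2, 2], rng=random.Random(0))
for delay in delays:
    print(f"first measurement after {delay:.2f} s, then every 2 s")
```

Because every delay is strictly less than the interval, waiting for one full interval is enough to be sure the first measurement has taken place.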

You should see a file /tmp/monami-filesystem. Depending on your local filesystem (and which version of MonAMI you are using) you should see output similar to the following:

"root-fs.fragment size" "4096" (B) [every 0s]
"root-fs.blocks.size"   "4096" (B) [every 0s]
"root-fs.blocks.total"  "11575120" (blocks) [every 0s]
"root-fs.blocks.free"   "2236238" (blocks) [every 0s]
"root-fs.blocks.available"      "1648254" (blocks) [every 0s]
"root-fs.capacity.total"        "45215.312500" (MiB) [every 0s]
"root-fs.capacity.free" "8735.304688" (MiB) [every 0s]
"root-fs.capacity.available"    "6438.492188" (MiB) [every 0s]
"root-fs.capacity.used" "36480.007812" (MiB) [every 0s]
"root-fs.files.used"    "5881856" (files) [every 0s]
"root-fs.files.free"    "5524442" (files) [every 0s]
"root-fs.files.available"       "5524442" (files) [every 0s]
"root-fs.flag"  "0" () [every 0s]
"root-fs.namemax"       "255" () [every 0s]

Points of interest

The following are some points worth noting from this example:

Configuration file semantics

The configuration files have a particular format.

Notice how:

  • Comment lines start with a hash (#) symbol.
  • The configuration is split into different sections. Each section starts with a name in square brackets.

Data and datatrees

Monitoring targets will often give lots of information about their current status. This information is held in a tree structure, much like a filesystem: metrics correspond to files, and branches correspond to directories ("folders") that contain metrics or further branches. Where a filesystem path uses a slash ("/" or "\") to separate its elements, a metric's path uses a dot.

In the above example, there are three branches ("blocks", "capacity" and "files"), each containing multiple metrics.
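
To make the tree structure concrete, the following Python sketch rebuilds a nested dictionary from a few lines of the snapshot output and then walks it. It is an illustration only: the function name parse_snapshot and the regular expression are assumptions based on the sample output shown earlier, not a documented file format, and splitting on dots assumes that metric names themselves contain no dots. The final loop cross-checks the "capacity" branch against the "blocks" branch; that relationship is inferred from the sample numbers rather than taken from the MonAMI documentation.

```python
import re

# A snapshot line looks like: "root-fs.blocks.size"   "4096" (B) [every 0s]
LINE = re.compile(r'^\s*"([^"]+)"\s+"([^"]*)"\s+\(([^)]*)\)')


def parse_snapshot(text):
    """Build a nested dict (a datatree) from snapshot-style output."""
    tree = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        path, value, unit = match.groups()
        *branches, leaf = path.split(".")
        node = tree
        for branch in branches:
            node = node.setdefault(branch, {})
        node[leaf] = (value, unit)
    return tree


sample = '''
"root-fs.fragment size" "4096" (B) [every 0s]
"root-fs.blocks.size"   "4096" (B) [every 0s]
"root-fs.blocks.total"  "11575120" (blocks) [every 0s]
"root-fs.blocks.free"   "2236238" (blocks) [every 0s]
"root-fs.capacity.total"        "45215.312500" (MiB) [every 0s]
"root-fs.capacity.free" "8735.304688" (MiB) [every 0s]
"root-fs.namemax"       "255" () [every 0s]
'''

root = parse_snapshot(sample)["root-fs"]
print("top level:", sorted(root))

# Cross-check: block count x block size, converted from bytes to MiB.
block_size = int(root["blocks"]["size"][0])
computed = {}
for key in ("total", "free"):
    mib = int(root["blocks"][key][0]) * block_size / (1024 * 1024)
    computed[key] = f"{mib:.6f}"
    print(key, "computed:", computed[key], "reported:", root["capacity"][key][0])
```

Note that the top level of the tree holds both branches ("blocks", "capacity") and plain metrics ("fragment size", "namemax"), just as a directory can hold both sub-directories and files.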

Plugins and Targets

The configuration is split into different sections, each starting with a name in square brackets. Each of these sections defines a new target. The most fundamental information about a target is which plugin it uses. The plugin name is defined within the square brackets at the beginning of a target's definition.

A target is a configured plugin: one with sufficient information for it to work. For example, if we wanted to monitor several partitions, the configuration could contain multiple [filesystem] sections, one for each partition. Although we would have many filesystem targets, there is only ever the one filesystem plugin.

Every target will have a name, which must be unique. By default they take the name of their plugin, but this can be changed with the "name" attribute. In the above example, the filesystem-plugin target is named "root-fs" whereas the snapshot-plugin target takes the default name: "snapshot".
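
The naming rule can be sketched in Python as a small check for clashing target names. This is a simplified stand-in for whatever MonAMI does internally: the functions target_names and duplicate_names are hypothetical, the parser handles only the section and "key = value" syntax used in this tutorial, and the /var location is made up for the demonstration. Unnamed [sample] sections are skipped, since they are handled specially.

```python
from collections import Counter


def target_names(config_text):
    """Return the name of every target defined in a MonAMI-style config.

    A target's name is its "name" attribute if present, otherwise the
    plugin name from the section header. Unnamed [sample] sections are
    skipped by this sketch.
    """
    sections = []  # list of (plugin, attributes) pairs, in file order
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    names = []
    for plugin, attrs in sections:
        if plugin == "sample" and "name" not in attrs:
            continue
        names.append(attrs.get("name", plugin))
    return names


def duplicate_names(config_text):
    """Return the sorted list of target names that are used more than once."""
    counts = Counter(target_names(config_text))
    return sorted(name for name, n in counts.items() if n > 1)


config = """
[filesystem]
 name = root-fs
 location = /

# The next two targets both fall back to the default name "filesystem".
[filesystem]
 location = /home

[filesystem]
 location = /var

[snapshot]
 filename = /tmp/monami-filesystem
"""

print(target_names(config))
print(duplicate_names(config))
```

Giving each of the two unnamed [filesystem] sections its own name attribute would resolve the clash.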

Some plugins provide information; these are monitoring plugins, and filesystem is one of them. The MonAMI users guide describes many other monitoring plugins. Targets based on a monitoring plugin are monitoring targets; the root-fs target above is an example.

Other plugins accept information so it can be stored or sent to external sources: these are reporting plugins; targets based on reporting plugins are reporting targets. The snapshot plugin is an example of a reporting plugin, although there are many others.

There are yet other plugins that lead a more complex life, such as "sample" and "dispatch". Don't worry about these just yet: they'll become clear as we go along.

The second example

In this example, we get MonAMI to monitor two different parts of the filesystem independently.

The configuration file

Copy the following text and store it as the /etc/monami.d/example file.

 ##
 ## Monitoring targets
 ##
 
 # Our root filesystem
 [filesystem]
  name = root-fs
  location = /
 
 # Our /home filesystem
 [filesystem]
  name = home-fs
  location = /home
   
 ##
 ## Sample sections
 ##
 
 # Record latest root-fs stats every two seconds
 [sample]
  interval = 2s
  read = root-fs
  write = root-snapshot
 
 # Record latest home-fs stats every two seconds
 [sample]
  interval = 2s
  read = home-fs
  write = home-snapshot
 
 ##
 ## Reporting targets
 ##
 
 # The current root filesystem statistics
 [snapshot]
  name = root-snapshot
  filename = /tmp/monami-root-filesystem
 
 # The current /home filesystem statistics
 [snapshot]
  name = home-snapshot
  filename = /tmp/monami-home-filesystem

This configuration will measure the current status of the two filesystems. This happens independently. For each monitoring activity, the output is stored in one of two files in /tmp: monami-root-filesystem and monami-home-filesystem.

Running the example

Make sure you run the example for at least two seconds. You should see the two files appear in /tmp: monami-root-filesystem and monami-home-filesystem.

The exact contents of the output files will depend on your version of MonAMI and how your filesystems are configured. The file /tmp/monami-root-filesystem should look like:

 "root-fs.fragment size" "4096" (B) [every 0s]
 "root-fs.blocks.size"   "4096" (B) [every 0s]
 "root-fs.blocks.total"  "11575120" (blocks) [every 0s]
 "root-fs.blocks.free"   "2235855" (blocks) [every 0s]
 "root-fs.blocks.available"      "1647871" (blocks) [every 0s]
 "root-fs.capacity.total"        "45215.312500" (MiB) [every 0s]
 "root-fs.capacity.free" "8733.808594" (MiB) [every 0s]
 "root-fs.capacity.available"    "6436.996094" (MiB) [every 0s]
 "root-fs.capacity.used" "36481.503906" (MiB) [every 0s]
 "root-fs.files.used"    "5881856" (files) [every 0s]
 "root-fs.files.free"    "5524559" (files) [every 0s]
 "root-fs.files.available"       "5524559" (files) [every 0s]
 "root-fs.flag"  "0" () [every 0s]
 "root-fs.namemax"       "255" () [every 0s]

and the file /tmp/monami-home-filesystem should have contents like:

 "home-fs.fragment size" "32768" (B) [every 0s]
 "home-fs.blocks.size"   "32768" (B) [every 0s]
 "home-fs.blocks.total"  "4502722" (blocks) [every 0s]
 "home-fs.blocks.free"   "3601010" (blocks) [every 0s]
 "home-fs.blocks.available"      "3372285" (blocks) [every 0s]
 "home-fs.capacity.total"        "140710.062500" (MiB) [every 0s]
 "home-fs.capacity.free" "112531.562500" (MiB) [every 0s]
 "home-fs.capacity.available"    "105383.906250" (MiB) [every 0s]
 "home-fs.capacity.used" "28178.500000" (MiB) [every 0s]
 "home-fs.files.used"    "18300928" (files) [every 0s]
 "home-fs.files.free"    "17981924" (files) [every 0s]
 "home-fs.files.available"       "17981924" (files) [every 0s]
 "home-fs.flag"  "0" () [every 0s]
 "home-fs.namemax"       "255" () [every 0s]

Points of interest

Notice how...

Independent monitoring

The two files in the /tmp directory are updated independently, so they should have different time-stamps. This is deliberate: whenever possible, MonAMI will try to spread the load, so there isn't a sudden peak in monitoring activity.

If you don't want independent monitoring, the following example will show how to monitor the two filesystems concurrently.

Naming of reporting targets

In this example, there are two snapshot-plugin targets. Since target names must be unique, we need to give them unique names.

The third example

Here we look at both selecting a subset of available information and merging different datatrees together. This allows us to monitor different targets concurrently.

The configuration file

Copy the following text and store it as the /etc/monami.d/example file, replacing the previous example's file.

 ##
 ## Monitoring targets
 ##
 
 # Our root filesystem
 [filesystem]
  name = root-fs
  location = /
 
 # Our /home filesystem
 [filesystem]
  name = home-fs
  location = /home
 
 ##
 ## Sample sections
 ##
 
 # Record latest f/s stats every two seconds
 [sample]
  read = root-fs.blocks.available, home-fs.blocks.available
  write = snapshot
  interval = 2
 
 ##
 ## Reporting targets
 ##
 
 # The current filesystem statistics
 [snapshot]
  filename = /tmp/monami-filesystem

A configuration with these four sections will measure the currently available blocks of the root and /home filesystems every two seconds and store the latest values in the file /tmp/monami-filesystem.

Running the example

As with the previous example, you should make sure MonAMI is running for at least two seconds to guarantee that the file /tmp/monami-filesystem has been updated.

You should see a file /tmp/monami-filesystem. Depending on your filesystems this file should contain something like the following:

 "root-fs.blocks.available"      "1647882" (blocks) [every 0s]
 "home-fs.blocks.available"      "3372298" (blocks) [every 0s]

Points of interest

The points of interest with this example are selection of subsets of available data and merging multiple datatrees.

Selecting subsets of a datatree

In the above example, we select just one metric within each datatree. From the root-fs and home-fs targets, we select only the "blocks.available" metric.

When selecting parts of a datatree, we can select several parts just by listing them, separated by commas. For example, specifying "read = root-fs.blocks.available, root-fs.files.available" would select the following metrics (shown here without their values):

 root-fs.blocks.available
 root-fs.files.available

We can also specify the name of a branch to include all data within that branch. In the above example, using "read = root-fs.blocks" would provide the following metrics:

 root-fs.blocks.size
 root-fs.blocks.total
 root-fs.blocks.free
 root-fs.blocks.available

Sometimes it's easier to say what data you don't want to include. Specific metrics or branches can be excluded by listing them prefixed with an exclamation mark. For example, "read = root-fs.blocks, !root-fs.blocks.free" will select the following metrics:

 root-fs.blocks.size
 root-fs.blocks.total
 root-fs.blocks.available
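
The selection rules above can be modelled with a few lines of Python. This is a sketch of the semantics as described in this section, not MonAMI's implementation: the function select is hypothetical, and it assumes that exclusions always win, regardless of where they appear in the list.

```python
def select(metrics, spec):
    """Apply a read-style selection to a list of dotted metric paths.

    `spec` is a comma-separated list. A plain entry selects that metric,
    or everything beneath that branch; an entry prefixed with "!" removes
    the matching metric or branch again.
    """
    def matches(path, entry):
        return path == entry or path.startswith(entry + ".")

    entries = [e.strip() for e in spec.split(",") if e.strip()]
    includes = [e for e in entries if not e.startswith("!")]
    excludes = [e[1:].strip() for e in entries if e.startswith("!")]
    return [
        m for m in metrics
        if any(matches(m, e) for e in includes)
        and not any(matches(m, e) for e in excludes)
    ]


metrics = [
    "root-fs.fragment size",
    "root-fs.blocks.size",
    "root-fs.blocks.total",
    "root-fs.blocks.free",
    "root-fs.blocks.available",
    "root-fs.files.used",
    "root-fs.files.free",
    "root-fs.files.available",
]

for spec in (
    "root-fs.blocks.available, root-fs.files.available",
    "root-fs.blocks",
    "root-fs.blocks, !root-fs.blocks.free",
):
    print(spec, "->", select(metrics, spec))
```

Matching on "entry plus a dot" rather than on a bare prefix keeps a branch named "root-fs.blocks" from accidentally matching a sibling whose name merely starts with the same characters.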

Combining different datatrees

Sample sections can combine different datatrees by specifying them as a comma-separated list of sources. Specifying read = root-fs, home-fs would combine all data from the root-fs and home-fs targets. Depending on the state of the filesystems (and on which version of MonAMI you are running), you should see output like:

"root-fs.fragment size" "4096" (B) [every 0s]
"root-fs.blocks.size"   "4096" (B) [every 0s]
"root-fs.blocks.total"  "11575120" (blocks) [every 0s]
"root-fs.blocks.free"   "2235855" (blocks) [every 0s]
"root-fs.blocks.available"      "1647871" (blocks) [every 0s]
"root-fs.capacity.total"        "45215.312500" (MiB) [every 0s]
"root-fs.capacity.free" "8733.808594" (MiB) [every 0s]
"root-fs.capacity.available"    "6436.996094" (MiB) [every 0s]
"root-fs.capacity.used" "36481.503906" (MiB) [every 0s]
"root-fs.files.used"    "5881856" (files) [every 0s]
"root-fs.files.free"    "5524559" (files) [every 0s]
"root-fs.files.available"       "5524559" (files) [every 0s]
"root-fs.flag"  "0" () [every 0s]
"root-fs.namemax"       "255" () [every 0s]
"home-fs.fragment size" "32768" (B) [every 0s]
"home-fs.blocks.size"   "32768" (B) [every 0s]
"home-fs.blocks.total"  "4502722" (blocks) [every 0s]
"home-fs.blocks.free"   "3601010" (blocks) [every 0s]
"home-fs.blocks.available"      "3372285" (blocks) [every 0s]
"home-fs.capacity.total"        "140710.062500" (MiB) [every 0s]
"home-fs.capacity.free" "112531.562500" (MiB) [every 0s]
"home-fs.capacity.available"    "105383.906250" (MiB) [every 0s]
"home-fs.capacity.used" "28178.500000" (MiB) [every 0s]
"home-fs.files.used"    "18300928" (files) [every 0s]
"home-fs.files.free"    "17981924" (files) [every 0s]
"home-fs.files.available"       "17981924" (files) [every 0s]
"home-fs.flag"  "0" () [every 0s]
"home-fs.namemax"       "255" () [every 0s]

Combining datatrees is useful when bringing together monitoring results from different targets. In this example, we merge two datatrees (one metric from each), but we can combine any number of datatrees.

The fourth example

In this example, we show caching and named samples. Caching allows you to make sure you never overload a service through monitoring. Named samples allow logical grouping of related monitoring from different targets.

The configuration file

##
## Monitoring targets
##

# Our root filesystem
[filesystem]
 name = root-fs
 location = /

# Our /home filesystem
[filesystem]
 name = home-fs
 location = /home

##
## Sample sections
##

# Bring together information about the two partitions
[sample]
 name = partitions
 read = root-fs, home-fs
 cache = 10
 
# Update our snapshot every ten seconds
[sample]
 read = partitions
 write = snapshot
 interval = 10
 
# Once a minute, send data to a log file.
[sample]
 read = partitions.root-fs.blocks.available, partitions.home-fs.blocks.available
 write = filelog
 interval = 1m
 
##
## Reporting targets
##
 
# The current filesystem statistics
[snapshot]
 filename = /tmp/monami-fs-current
 
#  A log file for filesystem statistics
[filelog]
 filename = /tmp/monami-fs-log

Running the example

With this example, you should leave MonAMI running for a few minutes. Whilst it is running, you can check that data is being appended to the log file (/tmp/monami-fs-log) correctly.

Depending on your version of MonAMI and the current state of your partitions, the file /tmp/monami-fs-current should look like:

 "partitions.root-fs.fragment size"      "4096" (B) [every 0s]
 "partitions.root-fs.blocks.size"        "4096" (B) [every 0s]
 "partitions.root-fs.blocks.total"       "11575120" (blocks) [every 0s]
 "partitions.root-fs.blocks.free"        "2083150" (blocks) [every 0s]
 "partitions.root-fs.blocks.available"   "1495166" (blocks) [every 0s]
 "partitions.root-fs.capacity.total"     "45215.312500" (MiB) [every 0s]
 "partitions.root-fs.capacity.free"      "8137.304688" (MiB) [every 0s]
 "partitions.root-fs.capacity.available" "5840.492188" (MiB) [every 0s]
 "partitions.root-fs.capacity.used"      "37078.007812" (MiB) [every 0s]
 "partitions.root-fs.files.used" "5881856" (files) [every 0s]
 "partitions.root-fs.files.free" "5518204" (files) [every 0s]
 "partitions.root-fs.files.available"    "5518204" (files) [every 0s]
 "partitions.root-fs.flag"       "0" () [every 0s]
 "partitions.root-fs.namemax"    "255" () [every 0s]
 "partitions.home-fs.fragment size"      "32768" (B) [every 0s]
 "partitions.home-fs.blocks.size"        "32768" (B) [every 0s]
 "partitions.home-fs.blocks.total"       "4502722" (blocks) [every 0s]
 "partitions.home-fs.blocks.free"        "3736679" (blocks) [every 0s]
 "partitions.home-fs.blocks.available"   "3507953" (blocks) [every 0s]
 "partitions.home-fs.capacity.total"     "140710.062500" (MiB) [every 0s]
 "partitions.home-fs.capacity.free"      "116771.218750" (MiB) [every 0s]
 "partitions.home-fs.capacity.available" "109623.531250" (MiB) [every 0s]
 "partitions.home-fs.capacity.used"      "23938.843750" (MiB) [every 0s]
 "partitions.home-fs.files.used" "18300928" (files) [every 0s]
 "partitions.home-fs.files.free" "17983164" (files) [every 0s]
 "partitions.home-fs.files.available"    "17983164" (files) [every 0s]
 "partitions.home-fs.flag"       "0" () [every 0s]
 "partitions.home-fs.namemax"    "255" () [every 0s]


The file /tmp/monami-fs-log should look like:

 #       time            partitions.root-fs.blocks.available     partitions.home-fs.blocks.available
 2007-04-02 18:37:01     1495311 3507970
 2007-04-02 18:38:01     1495240 3507970
 2007-04-02 18:39:02     1495240 3507970
 2007-04-02 18:40:01     1495241 3507970
 2007-04-02 18:41:02     1495241 3507970
 2007-04-02 18:42:01     1495241 3507970
 2007-04-02 18:43:01     1495240 3507969
 2007-04-02 18:44:01     1495233 3507969
 2007-04-02 18:45:02     1495117 3507969

Points of interest

The two points of interest here are named sample sections and caching.

Named sample sections

A named sample is a [sample] section that has a name attribute. Those without a name attribute are "anonymous samples". In fact, anonymous sample sections are assigned a name automatically when MonAMI starts, but you never use this name or need to know it. If you find you need to collect data from an anonymous sample, simply give the sample section a name.

Named samples allow grouping of monitoring data. Suppose you wanted to monitor multiple attributes of a service; for example, count active TCP connections, watch the application's use of the database, and count the number of daemons running. It is often useful to group these related metrics together into a single monitoring target. A named sample allows you to do this. Further (perhaps anonymous) sample sections can request monitoring data from the named sample, as in the above example.
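
One visible effect of reading through a named sample is that each metric's path gains the sample's name as a prefix, as in "partitions.root-fs.blocks.available" in the output above. The Python sketch below mimics that behaviour as observed in the sample output; the function read_via_named_sample and the flat dictionaries it uses are illustrative assumptions, not MonAMI data structures.

```python
def read_via_named_sample(sample_name, sources):
    """Merge several targets' metrics and prefix them with the sample name.

    `sources` maps a target name to a dict of its metrics (path -> value).
    """
    merged = {}
    for target, metrics in sources.items():
        for path, value in metrics.items():
            merged[f"{sample_name}.{target}.{path}"] = value
    return merged


# Values taken from the snapshot output of this example.
sources = {
    "root-fs": {"blocks.available": 1495166, "files.available": 5518204},
    "home-fs": {"blocks.available": 3507953, "files.available": 17983164},
}

grouped = read_via_named_sample("partitions", sources)
for path, value in grouped.items():
    print(path, value)
```

This is why the filelog sample in the configuration reads "partitions.root-fs.blocks.available" rather than "root-fs.blocks.available".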

Note that, although not illustrated in the above example, named samples will accept the interval attribute, allowing them to trigger monitoring as if they were anonymous samples.

Caching

Monitoring always incurs some computational cost. Sometimes this cost is sufficiently high that we might want to rate-limit queries so that, for example, a service is never monitored more than once every 20 seconds.

Within MonAMI, this is achieved with the cache attribute. You can configure any monitoring (or monitoring-like) target to cache its results for a number of seconds. In the above example, results from the "partitions" named sample are cached for ten seconds. If the anonymous samples were to request data more than once every ten seconds, the extra requests would not trigger any gathering of fresh data; they would receive the previous result until the ten-second cache had expired.
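
The effect of a ten-second cache can be sketched in Python. The class CachedSource below is a hypothetical model of time-based caching in general, not MonAMI code; it uses an injectable clock so that the demonstration runs instantly instead of waiting for real time to pass.

```python
import time


class CachedSource:
    """Wrap a data-gathering function with a time-based cache."""

    def __init__(self, gather, cache_seconds, clock=time.monotonic):
        self._gather = gather
        self._cache_seconds = cache_seconds
        self._clock = clock
        self._expires = None  # no measurement taken yet
        self._value = None

    def read(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._gather()  # take a fresh measurement
            self._expires = now + self._cache_seconds
        return self._value  # otherwise this is the cached result


# Demonstration with a fake clock.
fake_now = [0.0]
calls = []  # times at which a real measurement was taken


def gather():
    calls.append(fake_now[0])
    return f"measured at t={fake_now[0]:.0f}s"


partitions = CachedSource(gather, cache_seconds=10, clock=lambda: fake_now[0])
results = []
for t in (0, 3, 9, 10, 15, 21):
    fake_now[0] = float(t)
    results.append((t, partitions.read()))
    print(t, results[-1][1])
```

Six reads are requested, but the underlying measurement is only taken when the previous result is at least ten seconds old.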

Some monitoring targets report a different set of metrics at different times. Most often this happens when the service being monitored goes down, although some services report more metrics once the service has stabilised (e.g. Apache). Once this is detected, MonAMI will invalidate its internal cache: the new data structure will propagate to all reporting targets independent of caching.

Note that the cache attribute works for named samples, as demonstrated in the above example. Caching a named sample allows a conservative level of caching for the bulk of the monitoring activity whilst retaining the possibility of adding more frequent monitoring later.


The fifth example

This example demonstrates on-demand monitoring. On-demand monitoring is where MonAMI will not trigger monitoring automatically, but rather it will gather data only when requested by a user. This allows a user to select which items they are interested in, rather than configuring the selection of information within a MonAMI configuration.

The configuration file

Use the following configuration file:


##
## Monitoring targets
##

# Our root filesystem
[filesystem]
 name = root-fs
 location = /

# Our /home filesystem
[filesystem]
 name = home-fs
 location = /home

##
## Sample sections
##

# Bring together all information we want KSysGuard to see
[sample]
 name = ksysguard-sample
 read = root-fs, home-fs
 cache = 10

##
## Reporting targets
##
 
#  Our ksysguard connection
[ksysguard]
 read = ksysguard-sample

Running the example

Start MonAMI as before (monamid -fv). Whilst MonAMI is running, it will listen on port 3112 for incoming connections. This is the default port to which KSysGuard will attempt to connect.

With MonAMI running, start the KSysGuard GUI (the program "ksysguard"). From the menus, select "File", "Connect to host...". Within the dialogue box, enter the name of the host MonAMI is running on, and ensure that "Daemon" is selected under Connection Type.

Clicking on "Ok" should result in the hostname appearing in the "Sensor browser" pane. If this doesn't work, check that MonAMI is running, that you entered the host name correctly and that there is no firewall blocking connections.
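
If the connection fails, a quick way to separate network problems from KSysGuard problems is to test whether the port accepts TCP connections at all. The Python sketch below does only that; the function port_open is a hypothetical helper, "localhost" is a placeholder for the host MonAMI is running on, and the check says nothing about whether the service answering on the port is actually MonAMI.

```python
import socket


def port_open(host, port=3112, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-lookup failures.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; use the host MonAMI is running on.
    host = "localhost"
    state = "reachable" if port_open(host) else "not reachable"
    print(f"{host}:3112 is {state}")
```

A "not reachable" result with MonAMI running points towards a wrong host name or a firewall rather than a problem within KSysGuard itself.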

Once the hostname appears on the sensor browser, click on the "+" symbol (next to the hostname) to see the available sensors. If MonAMI is reporting many metrics, this can be a slow process: be patient.

To begin monitoring, drag and drop a metric to a display area. It is often easiest to create a new worksheet (menu items "File", "New worksheet...") for a collection of new metrics.

NB. You may notice that some of the metrics are labelled "Free Memory". This is due to a bug in KSysGuard where it assumes "free" refers to memory.

Points of interest

The two main points of interest are:

On-demand monitoring

KSysGuard is an example of on-demand monitoring. Here, we do not specify which of the available metrics to monitor within the MonAMI configuration. Rather, the user (through the KSysGuard GUI) chooses what to monitor. When the user adds monitoring within the KSysGuard GUI, MonAMI will gather the requested information. If no data is requested, then MonAMI will gather no data, and so places no burden on the monitored system.

Mixed monitoring

One can configure MonAMI to do periodic monitoring whilst allowing on-demand monitoring. The following configuration can be added to the above example:

 [sample]
  read = root-fs.blocks.available
  write = root-fs-log
  interval = 1m
 
 [filelog]
  name = root-fs-log
  filename = /tmp/monami-root-fs-log

MonAMI will record data to the root-fs-log target every minute.

This file logging of data is independent of, and concurrent with, any KSysGuard monitoring, except that MonAMI will honour the cache settings. Each of the two filesystem monitoring targets will cache results for two seconds, irrespective of whether the request came from KSysGuard or from the anonymous sample section.