Difference between revisions of "MonitoringTools"

From GridPP Wiki
 
(2 intermediate revisions by one user not shown)
Line 1: Line 1:
===Graphite===
+
=A clearing house of common monitoring tools with useful links=
  
= Motivation =
+
A set of common monitoring tools that people can use to look at their sites, organised by type of monitoring.
  
The motivation behind looking at Graphite was to find a common data store for monitoring metric data, extending from the use of Ganglia as a systems monitoring solution. At Glasgow we have settled on using Ganglia as a core system monitor for load, memory, disk and network usage. Graphite's job is to act as a metric aggregator and visualiser.
+
==Alerting/ Status==
  
 +
<ul>
 +
<li>Nagios http://www.nagios.org</li>
 +
</ul>
  
= Installation =
+
====Nagios compatible/ Nagios forks====
  
http://graphite.readthedocs.org
+
<ul>
 +
<li>Icinga https://www.icinga.org</li>
 +
<li>Naemon http://www.naemon.org</li>
 +
<li>Shinken http://www.shinken-monitoring.org</li>
 +
</ul>
  
 +
==Time series collection==
  
Install from the EPEL repo:
+
<ul>
 +
<li>Ganglia http://ganglia.info</li>
 +
<li>Collectd https://collectd.org</li>
 +
<li>Graphite http://graphite.readthedocs.org/en/latest/</li>
 +
<li>Statsd http://statsd.readthedocs.org/en/latest/</li>
 +
</ul>
  
yum  install graphite-web python-carbon python-whisper
+
==Dashboards==
  
python /usr/lib/python2.6/site-packages/graphite/manage.py syncdb
+
<ul>
 +
<li>Grafana http://grafana.org</li>
 +
<li>Dashing http://dashing.io</li>
 +
<li>PyDashie https://github.com/evolvedlight/pydashie (Python Dashing port)</li>
 +
</ul>
  
yum install -y liberation*
+
==Network Monitoring==
fc-cache
+
  
chkconfig carbon-cache on
+
<ul>
chkconfig httpd on
+
<li>OpenNMS http://www.opennms.org</li>
 +
<li>Cacti http://www.cacti.net</li>
 +
<li>Observium http://www.observium.org</li>
 +
<li>Munin http://munin-monitoring.org</li>
 +
</ul>
  
 
+
{{KeyDocs|responsible=Federico Melaccio and David Crooks|reviewdate=2015-07-07|accuratedate=2014-09-17|percentage=90}}
The version should be 0.9.12, which is needed for improved rendering, font usage (in our case at least) and JSON dashboard editing.
+
 
+
= Message format =
+
 
+
<metric> <value> <timestamp>
+
cluster.node.temperature 25 1369827513
+
 
+
echo "cluster.node.temperature 25 1369827513" | nc <carbon-server> <port>
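The same plaintext message can be built and sent from a script rather than with `echo` and `nc`. The following is a minimal Python sketch, not the collectors used at Glasgow: the function names `format_metric` and `send_metric` are hypothetical, and the default host and port (127.0.0.1, 2003) are assumptions taken from the httpJsonStats config shown later on this page.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: '<metric> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        # Default to the current time in whole seconds since the epoch.
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_metric(line, host="127.0.0.1", port=2003, timeout=5.0):
    """Send one already-formatted line to a carbon server over TCP.

    host and port are assumptions; use the values for your carbon-cache.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


if __name__ == "__main__":
    # Only formats and prints the line; call send_metric() to deliver it.
    print(format_metric("cluster.node.temperature", 25, 1369827513), end="")
```

Each line must end with a newline, which `echo` adds implicitly in the shell version.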
+
 
+
 
+
(Metrics can also be packaged using pickle and sent to port 2004; we currently don't use this.)
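For reference, the pickle format batches several metrics into one message: a list of `(path, (timestamp, value))` tuples is pickled and prefixed with a 4-byte big-endian length header. The sketch below only builds such a message; the helper name `pack_metrics` is hypothetical, and since this route is not used here the code is illustrative rather than something run in production.

```python
import pickle
import struct


def pack_metrics(metrics):
    """Package (path, value, timestamp) triples for carbon's pickle receiver.

    Returns the bytes to write to the pickle port: a 4-byte length header
    followed by the pickled list of (path, (timestamp, value)) tuples.
    """
    tuples = [(path, (timestamp, value)) for path, value, timestamp in metrics]
    payload = pickle.dumps(tuples, protocol=2)
    header = struct.pack("!L", len(payload))
    return header + payload


if __name__ == "__main__":
    message = pack_metrics([("cluster.node.temperature", 25, 1369827513)])
    print("message is %d bytes" % len(message))
```

The resulting bytes would be written to a TCP socket connected to port 2004 in the same way as a plaintext line is written to the plaintext port.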
+
 
+
= Test case =
+
 
+
Pandmon data
+
 
+
= Scripts =
+
 
+
/usr/local/bin/collectors
+
 
+
 
+
 
+
/etc/cron.d
+
 
+
/usr/share/graphite/webapp/content/css/dashboard-default.css: black -> white
+
 
+
0.9.12 "Edit Dashboard"
+
create JSON versions of dashboards, edit sizes, metrics straightforwardly
+
 
+
= Security =
+
 
+
iptables
+
 
+
x509 through Apache
+
 
+
= Batch monitoring =
+
 
+
Currently we use monami (->Ganglia -> Graphite)
+
 
+
= JSON =
+
 
+
We use a cron version of httpjsonstats (https://github.com/sverma/httpJsonStats) to read in JSON data from external sources. A typical config file, with one external source consisting of hypothetical f2f users, is:
+
 
+
{
+
  "global": {
+
    "GRAPHITE_SERVER" : "127.0.0.1",
+
    "GRAPHITE_PORT"  : 2003,
+
    "INTERVAL"      : 600,
+
    "LOG_FILE"      : "/var/log/httpJsonStats.log",
+
    "ERR_LOG_FILE"  : "/var/log/httpJsonStats.log",
+
    "PID_FILE"      : "/var/run/httpJsonStats.pid"
+
  },
+
  "ukhep":
+
  {
+
      "host": "<JSON host>",
+
      "port": "<port>",
+
      "groups": {
+
          "scotgrid": {
+
              "URN": "/json/f2fusers.json"
+
          }
+
 
+
      }
+
  }
+
}
+
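To illustrate how a config of this shape fits together, the sketch below walks the non-global sections and builds one URL per group from `host`, `port` and `URN`. It is not taken from httpJsonStats itself: the function name `source_urls`, the `source.group` key naming, and the example host and port (json.example.org, 8080) standing in for the placeholders are all assumptions made for illustration.

```python
import json

# Example config with the placeholders filled by hypothetical values.
EXAMPLE_CONFIG = """
{
  "global": {
    "GRAPHITE_SERVER": "127.0.0.1",
    "GRAPHITE_PORT": 2003,
    "INTERVAL": 600
  },
  "ukhep": {
    "host": "json.example.org",
    "port": "8080",
    "groups": {
      "scotgrid": {
        "URN": "/json/f2fusers.json"
      }
    }
  }
}
"""


def source_urls(config):
    """Map 'source.group' to the URL that group's JSON would be read from."""
    urls = {}
    for source, settings in config.items():
        if source == "global":
            # The global section holds Graphite and logging settings only.
            continue
        for group, spec in settings["groups"].items():
            key = "%s.%s" % (source, group)
            urls[key] = "http://%s:%s%s" % (
                settings["host"], settings["port"], spec["URN"])
    return urls


if __name__ == "__main__":
    for key, url in sorted(source_urls(json.loads(EXAMPLE_CONFIG)).items()):
        print(key, url)
```

Parsing the file with `json.loads` is also a quick way to catch an unbalanced brace before deploying a config.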
 
+
 
+
{{KeyDocs|responsible=David Crooks|reviewdate=2013-11-06|accuratedate=2013-11-06|percentage=0}}
+

Revision as of 09:08, 7 July 2015

A clearing house of common monitoring tools with useful links

A set of common monitoring tools that people can use to look at their sites, organised by type of monitoring.

Alerting/ Status

Nagios compatible/ Nagios forks

Time series collection

Dashboards

Network Monitoring

This page is a Key Document, and is the responsibility of Federico Melaccio and David Crooks. It was last reviewed on 2015-07-07 when it was considered to be 90% complete. It was last judged to be accurate on 2014-09-17.