MonitoringTools
Graphite
Motivation
We looked at Graphite as a common data store for monitoring metrics, building on our use of Ganglia for systems monitoring. At Glasgow we have settled on Ganglia as the core system monitor for load, memory, disk and network usage; Graphite's role is to aggregate and visualise metrics.
Installation
http://graphite.readthedocs.org
Install from the EPEL repository:
yum install graphite-web python-carbon python-whisper
python /usr/lib/python2.6/site-packages/graphite/manage.py syncdb
yum install -y liberation*
fc-cache
chkconfig carbon-cache on
chkconfig httpd on
The installed version should be 0.9.12, which we need for improved rendering, font usage (in our case at least) and JSON dashboard editing.
Message format
<metric> <value> <timestamp>
cluster.node.temperature 25 1369827513
echo "cluster.node.temperature 25 1369827513" | nc <carbon-server> <port>
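The same plaintext message can be built and sent from a script. The following is a minimal Python sketch, not the collectors used at Glasgow; the function names `format_metric` and `send_metric` are hypothetical, and the host and port must be replaced with your own carbon server details.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: '<metric> <value> <timestamp>' plus newline."""
    if timestamp is None:
        # Default to the current time in whole seconds since the epoch.
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_metric(line, host, port, timeout=5.0):
    """Open a TCP connection to the carbon server and send a single line."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


# Example use (host and port are placeholders, as in the nc command above):
#   send_metric(format_metric("cluster.node.temperature", 25),
#               "<carbon-server>", <port>)
```

Each line must end with a newline, which `echo` adds implicitly in the shell version.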
(Metrics can also be packaged using pickle and sent to port 2004; we do not currently use this.)
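For reference, a pickle message is a list of (path, (timestamp, value)) tuples, pickled and prefixed with a 4-byte big-endian length header. The sketch below only builds the message bytes; `build_pickle_message` is a hypothetical helper name, and we have not run this against our own setup since we do not use the pickle port.

```python
import pickle
import struct


def build_pickle_message(datapoints):
    """Pack (path, timestamp, value) triples into one carbon pickle message."""
    # Carbon expects a list of (path, (timestamp, value)) tuples.
    tuples = [(path, (timestamp, value)) for path, timestamp, value in datapoints]
    payload = pickle.dumps(tuples, protocol=2)
    # The payload is preceded by its length as an unsigned 32-bit
    # integer in network byte order.
    header = struct.pack("!L", len(payload))
    return header + payload
```

The resulting bytes would be written to a TCP connection on the pickle port in the same way as a plaintext line. Batching many datapoints into one message is the main reason to prefer this over the plaintext format.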
Test case
Pandmon data
Scripts
/usr/local/bin/collectors
/etc/cron.d
/usr/share/graphite/webapp/content/css/dashboard-default.css: black -> white
In 0.9.12, "Edit Dashboard" creates JSON versions of dashboards, which makes it straightforward to edit sizes and metrics.
Security
iptables
x509 through Apache
Batch monitoring
Currently we use MonAMI (-> Ganglia -> Graphite).
JSON
We use a cron version of httpJsonStats (https://github.com/sverma/httpJsonStats) to read in JSON data from external sources. A typical config file, with one external source providing data on hypothetical f2f users, is:
{
  "global": {
    "GRAPHITE_SERVER": "127.0.0.1",
    "GRAPHITE_PORT": 2003,
    "INTERVAL": 600,
    "LOG_FILE": "/var/log/httpJsonStats.log",
    "ERR_LOG_FILE": "/var/log/httpJsonStats.log",
    "PID_FILE": "/var/run/httpJsonStats.pid"
  },
  "ukhep": {
    "host": "<JSON host>",
    "port": "<port>",
    "groups": {
      "scotgrid": {
        "URN": "/json/f2fusers.json"
      }
    }
  }
}
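To illustrate how a fetched JSON document could become Graphite metrics, here is a minimal Python sketch. It is not taken from httpJsonStats and may not match how that tool names metrics; the functions `flatten` and `to_graphite_lines` and the `ukhep.scotgrid` prefix (borrowed from the section and group names in the config above) are assumptions for illustration only.

```python
import json


def flatten(node, prefix):
    """Yield (dotted_path, number) pairs for each numeric leaf of a JSON tree."""
    if isinstance(node, dict):
        for key in sorted(node):
            for item in flatten(node[key], "%s.%s" % (prefix, key)):
                yield item
    elif isinstance(node, bool):
        # bool is a subclass of int in Python, so skip it explicitly.
        return
    elif isinstance(node, (int, float)):
        yield prefix, node
    # Strings, lists and nulls are ignored in this sketch.


def to_graphite_lines(document, prefix, timestamp):
    """Turn a JSON string into '<metric> <value> <timestamp>' lines."""
    tree = json.loads(document)
    return ["%s %s %d" % (path, value, timestamp)
            for path, value in flatten(tree, prefix)]
```

For example, `to_graphite_lines('{"users": {"active": 3}}', "ukhep.scotgrid", 1369827513)` gives the single line `ukhep.scotgrid.users.active 3 1369827513`, which is in the plaintext message format described earlier.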
This page is a Key Document, and is the responsibility of David Crooks. It was last reviewed on 2013-11-06 when it was considered to be 0% complete. It was last judged to be accurate on 2013-11-06.