Difference between revisions of "Site monitoring status"

Revision as of 09:16, 4 July 2014

This page is intended to gather together the tools that sites are currently using to monitor their local sites. Please fill in the details for your site with the following pieces of information:

1) Current solution(s): What tools are currently used at the site and for what purpose?

2) Future plans: What plans (if any) does your site have for future monitoring?

3) Notes: Any other information you think might be useful.


Site	Current solution(s)	Future plans	Notes
RAL Tier-1
UKI-LT2-Brunel
UKI-LT2-IC-HEP	Nagios and Cacti
UKI-LT2-QMUL	OpenNMS	Extend monitoring: in particular, improve room temperature monitoring using the APC NetBotz solution and investigate the Dell PowerEdge C SNMP agent for C6100-type servers.	OpenNMS primarily monitors SNMP. The snmpd.conf logmatch directive is a very easy way to monitor the number of log entries matching a regexp. Dell OpenManage is used on Dell servers to provide extended information and SNMP traps. OpenNMS also monitors syslogs for messages of severity ERROR and higher.
UKI-LT2-RHUL
UKI-LT2-UCL-HEP
UKI-NORTHGRID-LANCS-HEP	Elasticsearch/Logstash/Kibana deployed widely on the local cluster (as opposed to grid farms). Usual suspects: Icinga, Ganglia, Graphite on both local and grid machines.	Perhaps deploy Logstash on grid farms. Currently deploying Dashing for dashboards.	Logstash is easy to deploy, powerful, and interesting for exploring the rich content. The usual suspects are solid stuff; unlikely to replace these.
UKI-NORTHGRID-LIV-HEP
UKI-NORTHGRID-MAN-HEP
UKI-NORTHGRID-SHEF-HEP
UKI-SCOTGRID-DURHAM
UKI-SCOTGRID-ECDF
UKI-SCOTGRID-GLASGOW	Naemon (status and alerting), Ganglia/Graphite (metric & time series graphing), Cacti (network monitoring)	Dashboards (Dashing, Grafana), reconsidering network monitoring	We currently use Ganglia for systems metrics, Graphite for a higher cluster level view, nagios plugin to check graphite thresholds
UKI-SOUTHGRID-BHAM-HEP
UKI-SOUTHGRID-BRIS	Older cluster uses Ganglia & Pakiti; newer cluster uses Nagios	Going to ditch Ganglia & Pakiti for Nagios on the older cluster.	Wish Munin scaled well!
UKI-SOUTHGRID-CAM-HEP
UKI-SOUTHGRID-OX-HEP
UKI-SOUTHGRID-RALPP
UKI-SOUTHGRID-SUSX
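
The Glasgow notes mention a Nagios plugin that checks Graphite thresholds. As an illustration only, a minimal Python sketch of that kind of check is given below. It is not the plugin any site actually runs: the metric name, sample payload and threshold values are hypothetical, and it assumes the input is shaped like the JSON returned by Graphite's render API (a list of series, each with [value, timestamp] datapoints). The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import json

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def latest_value(render_json):
    """Return the most recent non-null value of the first series, or None."""
    series = json.loads(render_json)
    if not series:
        return None
    # Graphite datapoints are [value, timestamp] pairs; value may be null.
    values = [v for v, _ts in series[0]["datapoints"] if v is not None]
    return values[-1] if values else None


def check_threshold(value, warn, crit):
    """Map a value to an (exit_code, message) pair; higher values are worse."""
    if value is None:
        return UNKNOWN, "UNKNOWN - no datapoints"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value} >= {crit}"
    if value >= warn:
        return WARNING, f"WARNING - value {value} >= {warn}"
    return OK, f"OK - value {value}"


# Hypothetical payload shaped like a render-API JSON response.
sample = (
    '[{"target": "cluster.load.avg", "datapoints": '
    "[[3.2, 1404461700], [7.5, 1404461760], [null, 1404461820]]}]"
)
code, message = check_threshold(latest_value(sample), warn=5.0, crit=10.0)
print(message)
```

In a real plugin the JSON would be fetched from the site's Graphite server rather than hard-coded, and the script would finish by exiting with the computed code so that Nagios (or Naemon, Icinga) can pick up the status.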

@@ Line 101: / Line 101: @@
 |UKI-SCOTGRID-GLASGOW
 |Naemon (status and alerting), Ganglia/Graphite (metric & time series graphing), Cacti (network monitoring)
-|Dashboards (Dashing or similar), reconsidering network monitoring
+|Dashboards (Dashing, Grafana), reconsidering network monitoring
-|We currently use Ganglia for systems metrics, Graphite for a higher cluster level view
+|We currently use Ganglia for systems metrics, Graphite for a higher cluster level view, nagios plugin to check graphite thresholds
 |-
