Difference between revisions of "Site monitoring status"
From GridPP Wiki
Revision as of 11:31, 5 January 2016
This page is intended to gather together the tools that sites are currently using to monitor their local sites. Please fill in the details for your site with the following pieces of information:
1) Current solution(s): What tools are currently used at the site and for what purpose?
2) Future plans: What plans (if any) does your site have for future monitoring?
3) Notes: Any other information you think might be useful.
Site | Current solution(s) | Future plans | Notes |
RAL Tier-1 | Nagios/Icinga, ganglia, cacti for networking, home grown dashboard - mimic | Starting to look at elasticsearch | Use Thruk interface to Nagios which provides useful additional views. |
UKI-LT2-Brunel | |||
UKI-LT2-IC-HEP | Nagios and Cacti | ||
UKI-LT2-QMUL | OpenNMS, APC NetBotz 550 | Extend monitoring: investigate the Dell PowerEdge C SNMP agent for C6100-type servers; syslog analysis. | OpenNMS primarily monitors SNMP. The snmpd.conf logmatch directive is a very easy way to monitor the number of log entries matching a regexp. Use Dell OpenManage on Dell servers to provide extended information and SNMP traps. Use OpenNMS to monitor syslogs for ERROR and higher messages. Room monitoring uses the APC NetBotz solution. |
UKI-LT2-RHUL | |||
UKI-LT2-UCL-HEP | |||
UKI-NORTHGRID-LANCS-HEP | elasticsearch/logstash/kibana deployed widely on local cluster (as opposed to grid farms). Usual suspects: icinga, ganglia, graphite on both local+grid machines | Perhaps deploy logstash on grid farms. Currently deploying Dashing for dashboards | Logstash is easy to deploy, powerful, and interesting for exploring the rich content. The usual suspects are solid stuff, unlikely to replace these |
UKI-NORTHGRID-LIV-HEP | Nagios, Ganglia, Cacti | Replace Nagios with Icinga, upgrade Ganglia, investigate ELK, Graphite | |
UKI-NORTHGRID-MAN-HEP | |||
UKI-NORTHGRID-SHEF-HEP | |||
UKI-SCOTGRID-DURHAM | |||
UKI-SCOTGRID-ECDF | |||
UKI-SCOTGRID-GLASGOW | Naemon (status and alerting), Ganglia/Graphite (metric & time series graphing), Cacti (network monitoring) | Dashboards (Dashing, Grafana), reconsidering network monitoring | We currently use Ganglia for systems metrics, Graphite for a higher cluster-level view, and a Nagios plugin to check Graphite thresholds. Deploying Grafana as a unified frontend. |
UKI-SOUTHGRID-BHAM-HEP | |||
UKI-SOUTHGRID-BRIS | Older cluster uses Ganglia & Pakiti; newer uses Nagios | Going to ditch Ganglia & Pakiti for Nagios on older cluster. | Wish Munin scaled well! |
UKI-SOUTHGRID-CAM-HEP | Nagios and Ganglia | ||
UKI-SOUTHGRID-OX-HEP | Nagios and Ganglia | Some testing of ELK | |
UKI-SOUTHGRID-RALPP | Nagios, Ganglia, Cacti, Pakiti, Dashing | ELK, new network monitoring (Observium?) | |
UKI-SOUTHGRID-SUSX |
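
Two of the notes above describe simple checks: counting log entries that match a regexp (QMUL) and comparing a metric against thresholds from a Nagios plugin (Glasgow). The Python sketch below illustrates the general idea only; it is not the configuration used at either site, and the function names, sample log lines, and threshold values are all hypothetical. The only convention taken as given is the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import re
from typing import Iterable

# Nagios plugin exit-code convention: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2


def count_matches(lines: Iterable[str], pattern: str) -> int:
    """Count the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


def threshold_status(value: float, warn: float, crit: float) -> int:
    """Map a metric value to a Nagios-style status code (higher is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


# Hypothetical sample lines, invented for illustration only.
sample_log = [
    "Jan  5 11:30:01 node01 kernel: ERROR disk sda timeout",
    "Jan  5 11:30:02 node01 sshd[123]: Accepted publickey for user",
    "Jan  5 11:30:03 node02 kernel: CRITICAL temperature above limit",
]

# Count lines at ERROR or CRITICAL level, then grade the count
# against assumed thresholds (warn at 1 match, critical at 5).
error_count = count_matches(sample_log, r"\b(ERROR|CRITICAL)\b")
status = threshold_status(error_count, warn=1, crit=5)
```

In a real plugin the status would be passed to `sys.exit()` together with a one-line message on standard output, and the input would come from the log file or metric store rather than an in-memory list.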