Nagios
Latest revision as of 11:43, 2 February 2010
Nagios is a network and host monitoring package available under the GPL. See either the Wikipedia summary or the product homepage for more details.
GridPP operates a UK-wide Nagios; information is here: http://www.gridpp.ac.uk/wiki/UKI_Regional_Nagios
Although not primarily designed as one of the Monitoring_Tools_for_LCG, it can alert administrators to failing services and potentially restart them, as well as provide availability statistics.
Monitoring Plugins
Monitoring plugins are documented on a separate page.
Remote Hosts
Because Nagios runs on a central server, it can only interrogate the remote state of machines if they are somehow accessible over the network. This means that it can run any monitor on localhost but is restricted to the following for remote ones:
- Network services (e.g. check_ssh, used to see whether there is an sshd service on the target host)
- 'Polled' local scripts sending back over a secure pipe (NRPE)
- 'Pushed' results of passive / active checks back to nagios server (NSCA)
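The three remote approaches above can be sketched from the command line roughly as follows. This is an illustration under assumptions, not a tested setup: the plugin directory, the host names node001 and nagios.example.org, the service name and the send_nsca config path are all placeholders, and nsca_line is a hypothetical helper name. Only the tab-separated input format that send_nsca reads is exercised here; the commands that need a running Nagios are left as comments.

```shell
#!/bin/sh
# Sketch only: host names, paths and the service name are placeholders.

# Build one passive-check result in the tab-separated form send_nsca reads:
#   host <TAB> service description <TAB> return code <TAB> plugin output
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# 1. Network service check, run on the Nagios server:
#      /usr/lib/nagios/plugins/check_ssh node001
# 2. Polled local check via the NRPE daemon on the remote host
#    (check_load must be defined in that host's nrpe.cfg):
#      /usr/lib/nagios/plugins/check_nrpe -H node001 -c check_load
# 3. Pushed result, run on the remote host and sent to the server via NSCA:
#      nsca_line node001 "disk usage" 0 "OK - 42% used" | \
#          send_nsca -H nagios.example.org -c /etc/send_nsca.cfg
```

The return code follows the usual plugin convention: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.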
Configuration Tips
- See what others are doing, e.g. RALPP_Work_List_Nagios
- Generate templates automatically to make repetitive groups simple. For example, Andrew Elwell has a set of shell scripts for each type of node (worker, server, disk) that contain loops such as:
    # Append one host definition per worker node (node001 .. node140) to $CFG
    for i in `seq 1 140` ; do
        h=`printf "%03d" $i`
        cat <<EOF >> "$CFG"
    define host {
        host_name   node$h
        alias       Worker Node $h
        address     10.141.0.$i
        use         wn_template
    }
    EOF
    done
Rather than defining each service on each node individually, you can then add it to a group at once:
    define hostgroup{
        alias           Worker Nodes
        hostgroup_name  workernodes
    }

    define host{
        name        wn_template
        use         linux-server
        hostgroups  workernodes
        register    0
    }

    define service{
        hostgroup_name       workernodes
        service_description  sshd
        check_command        check_ssh
        servicegroups        sshservers
        use                  local-service
    }
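The service definition above refers to a servicegroup called sshservers, which has to be defined somewhere as well. A minimal sketch, in the same generate-from-shell style, might look like this; emit_servicegroup is a hypothetical helper name and the alias text is a placeholder.

```shell
#!/bin/sh
# Sketch only: the alias text is a placeholder.

# Print a servicegroup definition; $1 = servicegroup_name, $2 = alias.
emit_servicegroup() {
    cat <<EOF
define servicegroup{
    servicegroup_name  $1
    alias              $2
}
EOF
}

# To append it to the generated config instead of printing it:
#   emit_servicegroup sshservers "SSH Servers" >> "$CFG"
emit_servicegroup sshservers "SSH Servers"
```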
- Group all the services together using servicegroups
- If you already restrict access to the web server that Nagios runs under (htaccess or SSL/X.509), you can set the authorization entries in cgi.cfg to allow user * and Nagios will then use $REMOTE_USER as the user name.
- An example SSL configuration. This is for Apache 2, and also includes an example of how to apply basic certificate ACLs for the /nagios location.
    SSLEngine on
    # Legacy cipher list: it still permits weak suites (SSLv2, export, NULL);
    # tighten it for a current deployment.
    SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
    SSLCertificateFile /etc/apache2/ssl/nagios-hostcert.pem
    SSLCertificateKeyFile /etc/apache2/ssl/nagios-hostkey.pem
    SSLCACertificatePath /etc/grid-security/certificates
    SSLCACertificateFile /etc/apache2/ssl/cacert.crt
    SSLOptions +ExportCertData +CompatEnvVars +StdEnvVars
    SSLVerifyClient require
    SSLVerifyDepth 2
    SSLUserName SSL_CLIENT_S_DN
    <Location /nagios>
        SSLRequire %{SSL_CLIENT_S_DN} eq "/C=UK/O=eScience/OU=Manchester/L=HEP/CN=colin morey" \
                or %{SSL_CLIENT_S_DN} eq "/C=UK/O=eScience/OU=Manchester/L=HEP/CN=Someone Else"
    </Location>
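For the cgi.cfg tip above, a minimal sketch of the relevant lines could be generated like this. It assumes the standard authorized_for_* options in cgi.cfg; emit_cgi_auth is a hypothetical helper name. With SSLUserName SSL_CLIENT_S_DN as in the Apache example, the user name Nagios sees is the certificate DN.

```shell
#!/bin/sh
# Sketch only: grants every authenticated web user full access in the CGIs.
# Only sensible when the web server already restricts who can connect.

# Print cgi.cfg authorization lines; $1 = user list (default "*").
emit_cgi_auth() {
    users="${1:-*}"
    echo "use_authentication=1"
    for perm in system_information configuration_information \
                system_commands all_services all_hosts \
                all_service_commands all_host_commands; do
        echo "authorized_for_${perm}=${users}"
    done
}

emit_cgi_auth
```

Passing a comma-separated list of user names instead of the default * narrows access to those users.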
Notifications
By default Nagios comes with email notifications, but it can easily be extended to notify via pagers, SMS or even Jabber.
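Extending notifications usually comes down to defining a command that hands the alert text to some other program. The sketch below shows the idea under assumptions: send_sms and the script path are hypothetical stand-ins for whatever pager, SMS or Jabber client is available locally, and format_alert is an illustrative helper name.

```shell
#!/bin/sh
# Sketch only: "send_sms" and the script path are hypothetical placeholders
# for whatever pager, SMS or Jabber client is available locally.

# Build a one-line alert; arguments are the values Nagios substitutes for the
# $NOTIFICATIONTYPE$, $HOSTNAME$, $SERVICEDESC$ and $SERVICESTATE$ macros.
format_alert() {
    printf '%s: %s on %s is %s\n' "$1" "$3" "$2" "$4"
}

# A matching command definition in the Nagios config might look like:
#   define command{
#       command_name  notify-service-by-sms
#       command_line  /usr/local/bin/notify-sms.sh "$CONTACTPAGER$" "$NOTIFICATIONTYPE$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$"
#   }
# and the script body would then be roughly:
#   pager=$1; shift
#   format_alert "$@" | send_sms "$pager"
```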