Monitoring Resource Usage of Jobs with cAdvisor
Google's cAdvisor (https://github.com/google/cadvisor) reports the resources used by containers. It exposes a web UI at http://hostname:port/ and can also export data to a central database (InfluxDB in the setup described here).
Installing InfluxDB
Download and install the rpm:
wget https://s3.amazonaws.com/influxdb/influxdb-latest-1.x86_64.rpm
rpm -ivh influxdb-latest-1.x86_64.rpm
Then start the service:
service influxdb start
In a browser, go to http://hostname:8083/ and log in using the default username (root) and password (root). To create a database for cAdvisor, enter a database name in the 'Database Details' part of 'Create a Database' and click 'Create Database'. Once the database has been created, click on the database name and create a user by entering a username and password in the 'Create a New Database User' section.
More information is available at http://influxdb.com
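The same database and user can probably be created without the browser. The following is a minimal sketch that assumes the installed release exposes the InfluxDB 0.8-style HTTP API on port 8086 (this will not work on later InfluxDB versions). The database name, username and password are hypothetical placeholders, and the helper function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: InfluxDB 0.8-style HTTP API listening on port 8086.
INFLUXDB_URL="http://hostname:8086"
ADMIN_USER="root"          # default admin credentials from the text above
ADMIN_PASSWORD="root"
DB_NAME="cadvisor"         # hypothetical database name
DB_USER="cadvisor"         # hypothetical database user
DB_PASSWORD="changeme"     # hypothetical password

# JSON request bodies (values must not contain double quotes).
db_payload() {
    printf '{"name": "%s"}' "$DB_NAME"
}

user_payload() {
    printf '{"name": "%s", "password": "%s"}' "$DB_USER" "$DB_PASSWORD"
}

# Create the database as the admin user.
create_database() {
    curl -X POST "${INFLUXDB_URL}/db?u=${ADMIN_USER}&p=${ADMIN_PASSWORD}" \
        -d "$(db_payload)"
}

# Create a user inside that database.
create_db_user() {
    curl -X POST "${INFLUXDB_URL}/db/${DB_NAME}/users?u=${ADMIN_USER}&p=${ADMIN_PASSWORD}" \
        -d "$(user_payload)"
}
```

After sourcing the script, run `create_database && create_db_user` on a machine that can reach the InfluxDB host.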
Installing Grafana
Download and install the rpm:
rpm -ivh https://grafanarel.s3.amazonaws.com/builds/grafana-2.0.2-1.x86_64.rpm
Then start the service:
service grafana-server start
Running cAdvisor
Example usage on an HTCondor worker node:
/usr/local/bin/cadvisor -storage_driver=influxdb -storage_driver_host=hostname:8086 -storage_driver_db=database_name \
    -storage_driver_password=password -storage_driver_user=user -storage_driver_secure=false -storage_driver_table=stats
where the InfluxDB hostname, database name, username and password should be changed as appropriate.
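To keep the site-specific values in one place, the command line can be assembled from variables. The sketch below only prints the command (a dry run). The variable names and the `cadvisor_args` helper are hypothetical, and the flags are the ones shown above.

```shell
#!/bin/sh
# Site-specific settings: placeholders, change as appropriate.
INFLUXDB_HOST="hostname:8086"
INFLUXDB_DB="database_name"
INFLUXDB_USER="user"
INFLUXDB_PASSWORD="password"

# Emit one cAdvisor flag per line.
cadvisor_args() {
    printf '%s\n' \
        "-storage_driver=influxdb" \
        "-storage_driver_host=${INFLUXDB_HOST}" \
        "-storage_driver_db=${INFLUXDB_DB}" \
        "-storage_driver_user=${INFLUXDB_USER}" \
        "-storage_driver_password=${INFLUXDB_PASSWORD}" \
        "-storage_driver_secure=false" \
        "-storage_driver_table=stats"
}

# Dry run: print the full command line for inspection.
echo "/usr/local/bin/cadvisor $(cadvisor_args | tr '\n' ' ')"
```

Once the printed command looks right, it can be pasted into a shell or used in whatever mechanism starts cAdvisor on the worker node.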