[![PyPI version](https://badge.fury.io/py/monitor-exporter.svg)](https://badge.fury.io/py/monitor-exporter)
monitor-exporter
-----------------------
- [Overview](#overview)
- [Metrics naming](#metrics-naming)
* [Service performance data](#service-performance-data)
* [Host performance data](#host-performance-data)
* [State](#state)
* [Metric labels](#metric-labels)
* [Performance metrics name to labels](#performance-metrics-name-to-labels)
- [Configuration](#configuration)
* [monitor-exporter](#monitor-exporter-1)
- [Using Redis cache](#using-redis-cache)
- [Logging](#logging)
- [Prometheus configuration](#prometheus-configuration)
* [Static config](#static-config)
* [File discovery config for usage with `monitor-promdiscovery`](#file-discovery-config-for-usage-with--monitor-promdiscovery-)
- [Installing](#installing)
- [Running](#running)
* [Development with Quart built in webserver](#development-with-quart-built-in-webserver)
* [Production deployment](#production-deployment)
+ [Deploying with gunicorn](#deploying-with-gunicorn)
* [Test the connection](#test-the-connection)
- [System requirements](#system-requirements)
- [License](#license)
# Overview
The monitor-exporter utilises the ITRS (formerly OP5) Monitor API to fetch host- and service-based performance data and
publish it in a way that lets Prometheus scrape the performance data and state as metrics.
Benefits:
- Enable advanced queries and aggregation on time series
- Prometheus based alerting rules
- Grafana graphing
- Take advantage of metrics already collected by Monitor, without rerunning checks
- Collect host and service performance data and state and translate them to Prometheus metrics
This solution is a perfect gateway for any Monitor user who would like to start using Prometheus and Grafana.
# Metrics naming
## Service performance data
Metrics that are scraped with the monitor-exporter will have the following naming structure:
```
monitor_<check_command>_<perfname>_<unit>
```
> The unit is only added if it exists for the performance data
For example the check command `check_ping` will result in two metrics:
```
monitor_check_ping_rta_seconds
monitor_check_ping_pl_ratio
```
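The unit in the metric name is normalised to Prometheus base units, which is why `rta`, reported by the plugin in milliseconds, ends up with a `_seconds` suffix and the packet loss percentage with `_ratio`. Below is a minimal sketch of that naming scheme; the `metric_name` helper and `UNIT_MAP` are illustrative assumptions, not the exporter's actual code:

```python
# Illustrative sketch of the naming scheme described above, not the
# exporter's actual code. Units are normalised to Prometheus base units.
UNIT_MAP = {"ms": "seconds", "s": "seconds", "%": "ratio", "B": "bytes"}

def metric_name(check_command: str, perfname: str, unit: str = "") -> str:
    parts = ["monitor", check_command, perfname]
    if unit:  # the unit suffix is only added when the perfdata carries one
        parts.append(UNIT_MAP.get(unit, unit))
    return "_".join(parts)

print(metric_name("check_ping", "rta", "ms"))  # monitor_check_ping_rta_seconds
print(metric_name("check_ping", "pl", "%"))    # monitor_check_ping_pl_ratio
```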
## Host performance data
In Monitor the host also has a check to verify its state. The metric name is always `monitor_check_host_alive`.
If this check has multiple performance values, they are reported as individual metrics, e.g.
```
monitor_check_host_alive_pkt{hostname="foo.com", environment="production", service="isalive"} 1
monitor_check_host_alive_rta{hostname="foo.com", environment="production", service="isalive"} 2.547
monitor_check_host_alive_pl_ratio{hostname="foo.com", environment="production", service="isalive"} 0.0
```
> The service label will always be `isalive`
## State
State metrics are reported for both hosts and services.
State metrics are reported as the values 0 (okay), 1 (warning), 2 (critical) and 4 (unknown).
For hosts the metric name is:
```
monitor_host_state
```
For services the metric name is:
```
monitor_service_state
```
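For example, a host that is up and one of its services that is critical would be exposed like this (label set abbreviated here; the full set of labels is described in the next section):

```
monitor_host_state{hostname="foo.com", environment="production"} 0
monitor_service_state{hostname="foo.com", environment="production", service="PING"} 2
```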
## Metric labels
The monitor-exporter adds a number of labels to each metric:
- **hostname** - the `host_name` in Monitor
- **service** - the `service_description` in Monitor
- **downtime** - whether the host or service is currently in a downtime period - true/false. If the host is in downtime, its services are also in downtime. **Attention: downtime is only supported if monitor-exporter is running in cache mode.**
- **address** - the host's real address
- **acknowledged** - applicable if a host or service is in warning or critical and has been acknowledged by operations - 0/1 where 1 is acknowledged.
Optionally, the monitor-exporter can be configured to pass all or specific custom variables configured in Monitor as
Prometheus labels.
> Any host-based custom variable that is used as a label is also set for the host's services.
> Labels created from custom variables are all transformed to lowercase.
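As a rough illustration of the translation, here is a hedged sketch; the `custom_var_labels` helper and the exact lowercasing behaviour are assumptions, not the exporter's actual code:

```python
# Hedged sketch of translating Monitor custom variables to Prometheus
# labels; not the exporter's actual code. Assumes the lowercasing
# described above applies to the resulting label names.
def custom_var_labels(custom_vars: dict, mapping: dict) -> dict:
    # mapping mirrors host_custom_vars in config.yml, e.g. {"env": "environment"}
    labels = {}
    for var, value in custom_vars.items():
        label_name = mapping.get(var.lower())
        if label_name is not None:
            labels[label_name.lower()] = value
    return labels

print(custom_var_labels({"ENV": "production"}, {"env": "environment"}))
# {'environment': 'production'}
```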
## Performance metrics name to labels
As described above, the default naming of a Prometheus metric is:

```
monitor_<check_command>_<perfname>_<unit>
```
For some check commands this does not work well, e.g. the `self_check_by_snmp_disk_usage_v3` check command, where the
perfnames are the unique mount paths.
For checks where the perfname depends on what is being checked, you can configure the exporter so that the perfname
becomes a label instead.
This is defined in the configuration like:
```yaml
perfnametolabel:
  # The command name
  self_check_by_snmp_disk_usage_v3:
    # the label name to be used
    label_name: disk
  check_disk_local_mb:
    label_name: local_disk
```
So if the check command is `self_check_by_snmp_disk_usage_v3`, the Prometheus metrics will have a format like:
```
monitor_self_check_by_snmp_disk_usage_v3_bytes{hostname="monitor", service="Disk usage /", disk="/_used"} 48356130816.0
```
If we did not make this transformation, we would get the following:
```
monitor_self_check_by_snmp_disk_usage_v3_slash_used_bytes{hostname="monitor", service="Disk usage /"} 48356130816.0
```
This is bad, since the perfname leaks into the metric name, creating a unique metric for every mount path.
> Please be aware of naming conventions for perfnames and services, especially when they include a name that depends on
> what is checked, like a mountpoint or disk name.
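To make the transformation concrete, here is a hedged sketch of the perfname-to-label logic; it is illustrative only, not the exporter's actual code, with the `perfnametolabel` dict mirroring the configuration above:

```python
# Illustrative sketch of the perfname-to-label transformation, not the
# exporter's actual code. perfnametolabel mirrors the config section above.
perfnametolabel = {
    "self_check_by_snmp_disk_usage_v3": {"label_name": "disk"},
}

def metric_name_and_labels(check_command: str, perfname: str, unit: str = ""):
    name = f"monitor_{check_command}"
    labels = {}
    rule = perfnametolabel.get(check_command)
    if rule:
        # the perfname becomes a label value instead of part of the metric name
        labels[rule["label_name"]] = perfname
    else:
        # without a rule the perfname stays in the metric name (in reality
        # characters like "/" must also be sanitised, cf. the slash_used
        # example above)
        name += f"_{perfname}"
    if unit:
        name += f"_{unit}"
    return name, labels

print(metric_name_and_labels("self_check_by_snmp_disk_usage_v3", "/_used", "bytes"))
# ('monitor_self_check_by_snmp_disk_usage_v3_bytes', {'disk': '/_used'})
```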
# Configuration
## monitor-exporter
All configuration is made in the config.yml file.
Example:
```yaml
# Port can be overridden by using -p if running the development server
# This is the default port assigned at https://github.com/prometheus/prometheus/wiki/Default-port-allocations
#port: 9631

op5monitor:
  # The url to the Monitor server
  url: https://monitor.example.com
  user: monitor
  passwd: monitor
  metric_prefix: monitor
  # Example of custom vars that should be added as labels and how they are translated
  host_custom_vars:
    # Specify which custom_vars to extract from Monitor
    - env:
        # Name of the label in Prometheus
        label_name: environment
    - site:
        label_name: dc
  # This section enables, for specific check commands, moving the perfdata name out of the
  # Prometheus metric name and into a label.
  # E.g. for the self_check_by_snmp_disk_usage_v3 command the perfdata name will be set to the label disk like:
  # monitor_self_check_by_snmp_disk_usage_v3_bytes{hostname="monitor", service="Disk usage /", disk="/_used"}
  perfnametolabel:
    # The command name
    self_check_by_snmp_disk_usage_v3:
      label_name: disk

logger:
  # Path and name for the log file. If not set, logs go to stdout
  logfile: /var/tmp/monitor-exporter.log
  # Log level
  level: INFO
```
> When running with gunicorn the port is defined by gunicorn
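A quick, hedged way to sanity-check that your `config.yml` parses as expected (assumes PyYAML is installed; this script is not part of monitor-exporter):

```python
# Hedged helper script, not part of monitor-exporter: load config.yml with
# PyYAML and print a few of the settings described above.
import yaml

with open("config.yml") as f:
    config = yaml.safe_load(f)

print("port:", config.get("port", 9631))  # 9631 is the documented default
print("monitor url:", config["op5monitor"]["url"])
print("custom vars:", config["op5monitor"].get("host_custom_vars", []))
```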
# Using Redis cache
If you have a large Monitor configuration, the load on the Monitor server can get high when collecting host and service data over the API at a high rate.
We strongly recommend that you instead collect host and service data in a batch and store it in a Redis cache.
The batch collection interval is configurable, but considering that most service checks in Monitor run at a 5 minute interval,
collecting every minute should be more than enough.
To use caching just add this to your `config.yml`:
```yaml
cache:
  # Use redis for cache - future versions may support others
  # Values below are the defaults
  redis:
    # redis host
    host: localhost
    # redis port
    port: 6379
    # the auth string used in redis
    #auth: secretstuff
    # the redis db to use
    db: 0
  # The interval to collect data from Monitor in seconds
  interval: 60
  # The time to live for the stored Monitor objects in the redis cache
  ttl: 300
```
> Redis must be installed on a host on the network and be accessible from the server running monitor-exporter
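Conceptually, cache mode works like a periodic batch job that writes Monitor objects into Redis with a TTL, while the scrape path reads from the cache. Below is a rough sketch of that pattern, NOT the exporter's actual implementation; the redis-py client is assumed and `fetch_from_monitor_api()` is a hypothetical, stubbed helper:

```python
# Conceptual sketch of the batch-and-cache pattern described above, NOT
# the exporter's actual implementation. Assumes the redis-py client.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

INTERVAL = 60  # collect from Monitor every 60 s (config: cache interval)
TTL = 300      # cached objects expire after 300 s (config: cache ttl)

def fetch_from_monitor_api():
    # Hypothetical helper, stubbed for illustration; the real exporter
    # talks to the Monitor REST API configured under op5monitor.
    return [{"host_name": "foo.com", "service_description": "PING"}]

def collect_loop():
    while True:
        for obj in fetch_from_monitor_api():
            key = f"monitor:{obj['host_name']}:{obj.get('service_description', '')}"
            # store with a TTL so stale entries age out on their own
            r.set(key, json.dumps(obj), ex=TTL)
        time.sleep(INTERVAL)
```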
# Logging
The log stream is configured in the above config. If `logfile` is not set, the logs will go to stdout.
Logs are formatted as JSON, so they are easy to parse and ship to a log management system.