Graphite Alerts
==============
Graphite Alerts is a small application to send PagerDuty alerts based on
Graphite metrics. This makes it easy to be paged about what's happening in
your system.
## Background
Graphite is a great tool for recording metrics but it isn't easy to get paged
when a metric passes a certain threshold.
Graphite-Alerts is an easy-to-use alerting tool for Graphite that sends
PagerDuty alerts when a metric reaches a warning or critical level.
## Requirements
* Graphite
## Notifiers
Notifiers communicate with your preferred alerting service. Currently,
PagerDuty, HipChat, and Email notifiers exist.
More notifiers are easy to write; file an issue if there is something you would like!
## Installation
At the moment, the easiest way to install Graphite-Alerts is directly from the git repo:
1. Install the package with pip:
   `pip install -e git://github.com/ybrs/graphite-alerts.git#egg=graphitealerts`
2. Copy `config-sample.yml` and change it as you like.
3. Run `graphite-alerts`:
   `graphite-alerts --config config.yml`
The file `config.yml` uses the format described below.
## Configuration of Alerts
Alerts are configured in a YAML file.
### Settings
At a minimum, you need to set `redisurl` and `graphite_url`; the other settings are optional.
```
settings:
  hipchat_key: ''
  pagerduty_key: ''
  graphite_url: 'http://localhost:8080'
  graphite_auth_user: foo
  graphite_auth_password: bar
  redisurl: 'redis://localhost:6379'
  log_file: '/var/log/graphite-alerts.log'
  log_level: debug
```
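
As an illustration of the "mandatory versus optional" split, here is a minimal sketch that reports which required keys are missing from an already-parsed config. The helper `missing_settings` and the constant `REQUIRED_SETTINGS` are hypothetical names for this example and are not part of graphite-alerts; the config is a plain dict, as a YAML loader would typically produce.

```python
# Hypothetical helper for illustration; graphite-alerts may validate differently.
REQUIRED_SETTINGS = ("graphite_url", "redisurl")


def missing_settings(config):
    """Return the mandatory keys that are absent or empty under 'settings'."""
    settings = config.get("settings") or {}
    return [key for key in REQUIRED_SETTINGS if not settings.get(key)]


# A parsed config that forgot redisurl.
config = {
    "settings": {
        "graphite_url": "http://localhost:8080",
        "pagerduty_key": "",
    }
}

print(missing_settings(config))  # ['redisurl']
```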
## Alert Format
Alerts have a simple configuration: you give a target first (the source metric in Graphite), then add some rules.
Simple example:
```
alerts:
  - target: servers.worker-1.system.load.load
    name: system load
    rules:
      - greater than 5:
          warning
      - greater than 10:
          critical
```
Rules are checked in order. Checking stops at the first rule that triggers an alert, and the remaining rules are not checked.
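
The first-match behaviour can be sketched as follows. This is not the project's parser: the `(comparison, threshold, level)` tuples and the function `first_matching_level` are assumptions made for the example.

```python
import operator

# Map the rule wording used in the config to comparison functions.
OPS = {"greater than": operator.gt, "less than": operator.lt}


def first_matching_level(value, rules):
    """Return the level of the first rule that matches, or None if none do.

    rules is a list of (comparison, threshold, level) tuples, checked in order.
    """
    for comparison, threshold, level in rules:
        if OPS[comparison](value, threshold):
            return level
    return None


rules = [("greater than", 10, "critical"), ("greater than", 5, "warning")]

print(first_matching_level(12, rules))  # critical
print(first_matching_level(7, rules))   # warning
print(first_matching_level(3, rules))   # None
```

Under this reading, a rule listed earlier shadows the ones after it, so the order of the rules matters: with the two rules above swapped, a value of 12 would match `greater than 5` first and return `warning`.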
You can combine `greater than` and `less than` rules in some situations. Suppose a metric such as hourly page views is usually around 10000:
you want to be alerted if it goes over 50k, but also if it drops below 1000, because that probably means
you have a problem.
The same idea applied to system load:
```
alerts:
  - target: servers.worker-1.system.load.load
    name: system load
    rules:
      - greater than 5:
          warning
      - greater than 10:
          critical
      - less than 0.1: # probably nothing is working on the server, heads up
          warning
```
Optionally, you can add a `from` field and a `check_method`:
```
from: -10min
check_method: average
```
* `from`: the Graphite `from` parameter, which sets how far back to query, e.g. `-10min`. The default is `-1min`.
* `check_method`: `latest` or `average`. `average` is the default and takes the average of the values that are not None,
  but sometimes you might want `latest` instead.
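
To make the two check methods concrete, here is a small sketch of how a list of datapoints could be reduced to a single value. The function name `reduce_datapoints` is hypothetical, and treating `latest` as "the most recent value that is not None" is an assumption; the text above only states how `average` handles None values.

```python
def reduce_datapoints(values, check_method="average"):
    """Collapse a series of datapoints into the single value to compare against rules."""
    # Graphite returns None for intervals without data; ignore those.
    present = [v for v in values if v is not None]
    if not present:
        return None
    if check_method == "latest":
        # Assumption: "latest" means the most recent datapoint that has a value.
        return present[-1]
    return sum(present) / len(present)


datapoints = [1.0, None, 3.0, None]

print(reduce_datapoints(datapoints))            # 2.0
print(reduce_datapoints(datapoints, "latest"))  # 3.0
```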
### Alerts based on history
Sometimes you want alert thresholds that are not hard coded but based on history. Suppose some of your servers work under
high load (converting mp4s, maybe) while others have really low loads (just a chef/salt/puppet master).
If you have a couple of servers, it's easy to hard code limits per server, but with more than a few
it becomes a pain. This is where historical alerts come in.
```
alerts:
  - target: servers.*.system.load.load
    name: system load
    from: -10min
    check_method: historical
    rules:
      - greater than historical * 2:
          critical
      - greater than historical * 1.2:
          warning
```
This fetches the historical data and computes the hourly average over the last 2 days. It then issues a warning
if the current load is more than 1.2 times the usual load, and a critical alert if it is more than 2 times the usual load.
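
The arithmetic behind these two rules can be sketched like this. The function `historical_level` is a hypothetical name, and the multipliers are simply the ones from the example config above.

```python
def historical_level(current, historical):
    """Compare the current value against multiples of the historical average."""
    if current > historical * 2:
        return "critical"
    if current > historical * 1.2:
        return "warning"
    return None


usual_load = 1.5  # e.g. the hourly average over the last 2 days

print(historical_level(3.5, usual_load))  # critical (3.5 > 3.0)
print(historical_level(2.0, usual_load))  # warning  (2.0 > 1.8)
print(historical_level(1.6, usual_load))  # None
```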
You can also combine this with hard coded alerts. Here is an example:
```
alerts:
  - target: servers.*.put.io.system.load.load
    name: system load
    from: -10min
    check_method: historical
    rules:
      - less than 0.01:
          warning
      - less than 3:
          nothing
      - greater than historical * 2:
          critical
      - greater than historical * 1.1:
          warning
```
If the load drops below 0.01, you are probably doing nothing with that server (maybe some services crashed on it?), so the first rule issues a warning.
The server might also normally work under very low load, say a usual load of just 1.0. You don't really want to be woken
up when it goes over 2.0: that is two times the usual load, but still normal. So you add `less than 3: nothing`.
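
A short trace of this rule list, assuming rules are checked in order and the first match wins. The function `evaluate` and the use of lambdas to represent rules are illustrative choices, not the project's internals.

```python
def evaluate(current, historical):
    """Apply the four rules from the example in order; first match wins."""
    rules = [
        (lambda v: v < 0.01, "warning"),
        (lambda v: v < 3, "nothing"),  # guard: low absolute load is never alarming
        (lambda v: v > historical * 2, "critical"),
        (lambda v: v > historical * 1.1, "warning"),
    ]
    for matches, level in rules:
        if matches(current):
            return level
    return "nothing"


print(evaluate(2.5, 1.0))  # nothing  (2.5x the usual load, but still below 3)
print(evaluate(0.0, 1.0))  # warning  (the server looks idle)
print(evaluate(9.0, 4.0))  # critical (more than twice the usual 4.0)
print(evaluate(5.0, 4.0))  # warning  (more than 1.1 times the usual 4.0)
```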
You can modify how historical data is fetched:
```
alerts:
  - target: servers.*.put.io.system.load.load
    name: system load
    from: -10min
    check_method: historical
    historical: summarize(target, "1hour", "avg") from -2days
    rules:
      - less than 0.1:
          warning
      - less than 3:
          nothing
      - greater than historical * 2:
          critical
      - greater than historical * 1.1:
          warning
```
The default is the hourly average over the last 2 days, but sometimes you might want a longer or shorter period.
Both `summarize(target, "1hour", "avg")` and `-2days` are sent directly to Graphite, so you can tweak them as much as you like.
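
One way to picture how such a `historical` value could end up as a Graphite query is sketched below, using Graphite's render endpoint with its `target`, `from`, and `format` parameters. How graphite-alerts actually parses the string is not described here, so splitting on ` from ` and substituting the word `target` are assumptions, and `historical_params` and `historical_url` are hypothetical names.

```python
from urllib.parse import urlencode


def historical_params(target, historical):
    """Split 'EXPRESSION from PERIOD' and plug the alert target into the expression."""
    expression, _, from_time = historical.rpartition(" from ")
    # Assumption: the literal word "target" is a placeholder for the alert's target.
    query = expression.replace("target", target)
    return {"target": query, "from": from_time, "format": "json"}


def historical_url(graphite_url, target, historical):
    """Build a render API URL for the historical query."""
    return graphite_url.rstrip("/") + "/render?" + urlencode(
        historical_params(target, historical)
    )


params = historical_params(
    "servers.*.put.io.system.load.load",
    'summarize(target, "1hour", "avg") from -2days',
)
print(params["target"])  # summarize(servers.*.put.io.system.load.load, "1hour", "avg")
print(params["from"])    # -2days
print(historical_url("http://localhost:8080", "servers.*.put.io.system.load.load",
                     'summarize(target, "1hour", "avg") from -2days'))
```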
In my opinion, this opens up endless possibilities for dynamic metrics. For example, for "daily signups"
you can easily add an alert based on history: you'll get notified if you are on Hacker News, and if signups drop well
below the usual level, you can get alerts and check what's going wrong (maybe there is a bug).
Here is an example:
```
alerts:
  - target: summarize(stats_counts.signups, "1hour")
    name: system load
    from: -1day
    check_method: historical
    historical: summarize(target, "1hour", "avg") from -7days
    rules:
      - less than 1:
          critical
      - less than historical / 2:
          critical
      - greater than historical * 2:
          critical
      - greater than historical * 1.5:
          warning
```
You'll get alerts if signups fall below half the usual level of the past week, and if they rise above double the usual level.
If you have no signups today, you definitely have a bug, so you need alerts.
### Ordering of Alerts
Alerts with the same name and target will only be checked once! This is useful
if you want to have a subset of metrics with different check times and/or
values.
Example:
```
- name: Load
  target: aliasByNode(servers.worker-*.loadavg01,1)
  rules:
    - greater than .5:
        warning
- name: Load
  target: aliasByNode(servers.*.loadavg01,1)
  rules:
    - greater than 1:
        warning
```
Any worker-* nodes will alert for anything above 0.5, while the catch-all
alert lets the remaining metrics be checked against a threshold of 1 without
checking the worker nodes a second time.
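
The "checked only once" behaviour can be pictured as a seen-set keyed on alert name and resolved metric alias. This is a sketch under assumptions: `assign_checks` is a hypothetical function, and the alias lists stand in for whatever `aliasByNode` would return from Graphite.

```python
def assign_checks(alerts):
    """Decide which alert (and threshold) checks each metric.

    alerts is a list of (name, threshold, aliases) in config order; a metric
    already claimed by an earlier alert with the same name is skipped.
    """
    seen = set()
    checks = []
    for name, threshold, aliases in alerts:
        for alias in aliases:
            key = (name, alias)
            if key in seen:
                continue  # an earlier alert with this name already covers it
            seen.add(key)
            checks.append((name, alias, threshold))
    return checks


alerts = [
    ("Load", 0.5, ["worker-1", "worker-2"]),       # servers.worker-*
    ("Load", 1, ["db-1", "worker-1", "worker-2"]),  # servers.* catch-all
]

for check in assign_checks(alerts):
    print(check)
# ('Load', 'worker-1', 0.5)
# ('Load', 'worker-2', 0.5)
# ('Load', 'db-1', 1)
```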
### Credits
I originally forked the project from https://github.com/philipcristiano/graphite-pager,
then changed the rules, removed environment variables, added historical alerts, etc.
### You can consider this pre-alpha, so think again if you want to use this.
### TODO
- just check every day, hour etc. (maybe a cron like syntax ?)
- save alerts, warnings somewhere