# Cerberus
Guardian of Kubernetes and OpenShift Clusters
![Cerberus logo](media/logo_assets/full_color/over_light_background/cerberus-logo_small-color-light-full-horizontal.png)
Cerberus watches Kubernetes/OpenShift clusters for dead nodes and system component failures. It exposes a go/no-go signal that workload generators and other applications in the cluster can consume to act accordingly.
### Workflow
![Cerberus workflow](media/cerberus-workflow.png)
### Installation
Instructions on how to set up, configure and run Cerberus can be found at [Installation](docs/installation.md).
### What Kubernetes/OpenShift components can Cerberus monitor?
The following Kubernetes/OpenShift components can be monitored by Cerberus today; more will be added soon.
Component | Description | Working |
----------------------------------- | ---------------------------------------------------------------------------------------------------------------- | ------------------------- |
Nodes | Watches all the nodes, including masters, workers and nodes created using custom MachineSets | :heavy_check_mark: |
Namespaces | Watches all the pods, including the containers running inside them, in the namespaces specified in the config | :heavy_check_mark: |
Cluster Operators | Watches all Cluster Operators | :heavy_check_mark: |
Masters Schedulability | Watches and warns if master nodes are marked as schedulable | :heavy_check_mark: |
Routes | Watches specified routes | :heavy_check_mark: |
CSRs | Warns if any CSRs are not approved | :heavy_check_mark: |
Critical Alerts | Warns the user on observing abnormal behavior which might affect the health of the cluster | :heavy_check_mark: |
Bring your own checks | Users can bring their own checks; Cerberus runs them and includes them in the report as well as the go/no-go signal | :heavy_check_mark: |
An explanation of every component that Cerberus can monitor is available [here](docs/config.md).
### How does Cerberus report cluster health?
Cerberus exposes the cluster health and failures through a go/no-go signal, a report and a metrics API.
#### Go or no-go signal
When Cerberus is configured to run in daemon mode, it continuously monitors the specified components, runs a lightweight HTTP server at http://0.0.0.0:8080 and publishes the signal, i.e. True or False, depending on the status of those components. Tools can consume the signal and act accordingly.
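As a rough illustration, a consumer could poll that endpoint and branch on the result. The sketch below assumes the response body is the plain string `True` or `False` and treats an unreachable server as no-go; `parse_signal`, `get_go_no_go` and `CERBERUS_URL` are hypothetical names, not part of Cerberus.

```python
import urllib.request

# Assumption: Cerberus runs in daemon mode and serves the signal here;
# adjust host/port for your deployment.
CERBERUS_URL = "http://0.0.0.0:8080"


def parse_signal(body: str) -> bool:
    """Interpret the published signal: 'True' means go, anything else no-go."""
    return body.strip() == "True"


def get_go_no_go(url: str = CERBERUS_URL, timeout: float = 10.0) -> bool:
    """Fetch the go/no-go signal, treating an unreachable server as no-go."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_signal(resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

For example, a workload generator could call `get_go_no_go()` between iterations and stop pushing the cluster as soon as it returns `False`. Treating connection errors as no-go is a conservative choice: if the health monitor itself is down, it is safer not to proceed.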
#### Report
The report is generated in the run directory and contains the status of each check/monitored component per iteration, with timestamps. It also displays information about the components in case of failure. Refer to [report](docs/example_report.md) for an example.
#### Metrics API
Cerberus exposes metrics, including the failures observed during the run, through an API. Tools consuming Cerberus can query the API to get a JSON blob with the observed failures, scrape it and act accordingly. For example, a tool can query for etcd failures between a start and end time and use them to determine pass/fail for test cases, or to report whether the cluster was healthy or unhealthy for that duration.
- The failures in the past hour can be retrieved in JSON format by visiting http://0.0.0.0:8080/history.
- The failures in a specific time window can be retrieved in JSON format by visiting http://0.0.0.0:8080/history?loopback=<interval>.
- The failures between two timestamps, the failures of specific issue types and the failures related to specific components can be retrieved in JSON format by visiting the http://0.0.0.0:8080/analyze URL. Apply the filters there to scrape the matching failures.
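A minimal sketch of querying the history endpoint with the Python standard library. `build_history_url` and `fetch_failures` are hypothetical helpers; the units of the `loopback` interval and the shape of the returned JSON are not described in this README, so the sketch simply passes the interval through and returns the decoded JSON unchanged.

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import urlencode


def build_history_url(base: str = "http://0.0.0.0:8080",
                      loopback: Optional[int] = None) -> str:
    """Build the /history URL, adding the loopback interval when given."""
    url = base.rstrip("/") + "/history"
    if loopback is not None:
        url += "?" + urlencode({"loopback": loopback})
    return url


def fetch_failures(url: str, timeout: float = 10.0) -> Any:
    """Fetch the failure report and decode it as JSON (structure not assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For instance, `fetch_failures(build_history_url(loopback=30))` would request `http://0.0.0.0:8080/history?loopback=30` from a running Cerberus daemon.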
### Slack integration
Cerberus supports reporting failures to Slack. Refer to [slack integration](docs/slack.md) for information on how to set it up.
### Node Problem Detector
Cerberus also consumes [node-problem-detector](https://github.com/kubernetes/node-problem-detector) to detect various failures in Kubernetes/OpenShift nodes. More information on setting it up can be found at [node-problem-detector](docs/node-problem-detector.md).
### Bring your own checks
Users can add checks for components that Cerberus does not monitor and have them included in the go/no-go signal. To do this, list the relative paths of the files containing the additional checks under `custom_checks` in the config file. All the checks should be placed within the `main` function of the file. If the additional checks should be considered when determining the go/no-go signal, `main` can return a boolean value. Returning a dict of the format `{'status': status, 'message': message}` sends the signal to Cerberus along with a message to be displayed in the Slack notification. Returning a value is optional, however.
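A minimal sketch of a custom check file, assuming the check should fail whenever a hypothetical HTTP endpoint (`TARGET_URL`) does not answer with HTTP 200. `check_endpoint` and `TARGET_URL` are illustrative names; only `main` and the `{'status': ..., 'message': ...}` return format come from the description above.

```python
import urllib.error
import urllib.request

# Hypothetical endpoint to probe; replace with a route in your cluster.
TARGET_URL = "http://example.com/healthz"


def check_endpoint(url: str, timeout: float = 5.0) -> dict:
    """Return a status/message dict: healthy only if the endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status == 200:
                return {"status": True, "message": f"{url} is healthy"}
            return {"status": False, "message": f"{url} returned HTTP {resp.status}"}
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError by urlopen.
        return {"status": False, "message": f"{url} returned HTTP {err.code}"}
    except OSError as err:
        # Connection refused, DNS failures and timeouts end up here.
        return {"status": False, "message": f"{url} is unreachable: {err}"}


def main():
    # Cerberus calls main(); the returned dict feeds the go/no-go signal
    # and the message is shown in the Slack notification.
    return check_endpoint(TARGET_URL)
```

The file's relative path would then be listed under `custom_checks` in the config file. A real check would more likely use the Kubernetes client, but the return contract stays the same.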
Refer to [example_check](https://github.com/openshift-scale/cerberus/blob/master/custom_checks/custom_check_sample.py) for an example custom check file.
### Alerts
Monitoring metrics and alerting on abnormal behavior is critical, as they are indicators of cluster health. Information on supported alerts can be found at [alerts](docs/alerts.md).
### Use cases
There are a number of use cases; here are some of them:
- We run tools that push the limits of Kubernetes/OpenShift to look at performance and scalability. In many instances, system components or nodes start to degrade, which invalidates the results, while the workload generator keeps pushing the cluster until it is unrecoverable.
- Chaos experiments on a Kubernetes/OpenShift cluster can break components unrelated to the targeted ones, and the chaos experiment itself won't detect it. The go/no-go signal can be used here to decide whether the cluster recovered from the failure injection, as well as whether to continue with the next chaos scenario.
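The second use case can be sketched as a loop that consults the go/no-go signal after each chaos scenario. Everything here is hypothetical: each scenario is modeled as a plain callable, `get_signal` stands in for whatever reads the Cerberus signal (for example an HTTP GET against the daemon), and `settle_seconds` is an assumed recovery window, not a Cerberus setting.

```python
import time
from typing import Callable, Iterable, List


def run_scenarios(
    scenarios: Iterable[Callable[[], None]],
    get_signal: Callable[[], bool],
    settle_seconds: float = 0.0,
) -> List[str]:
    """Run chaos scenarios in order and stop at the first no-go signal.

    Returns the names of the scenarios the cluster recovered from.
    """
    recovered = []
    for scenario in scenarios:
        scenario()  # inject the failure
        time.sleep(settle_seconds)  # give the cluster time to recover
        if not get_signal():
            # The cluster did not recover; stop instead of piling on more failures.
            break
        recovered.append(scenario.__name__)
    return recovered
```

Stopping at the first no-go keeps later scenarios from running against an already broken cluster, which would make their results meaningless.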
### Tools consuming Cerberus
- [Benchmark Operator](https://github.com/cloud-bulldozer/benchmark-operator): The intent of this Operator is to deploy common workloads to establish a performance baseline of a Kubernetes cluster on your provider. Benchmark Operator consumes Cerberus to determine whether the cluster was healthy during the benchmark run. More information can be found at [cerberus-integration](https://github.com/cloud-bulldozer/benchmark-operator#cerberus-integration).
- [Kraken](https://github.com/openshift-scale/kraken/): A tool to inject deliberate failures into Kubernetes/OpenShift clusters to check whether they are resilient. Kraken consumes Cerberus to determine whether the cluster as a whole, in addition to the targeted component, is healthy during chaos testing. More information can be found at [cerberus-integration](https://github.com/openshift-scale/kraken#kraken-scenario-passfail-criteria-and-report).
### Blogs and other useful resources
- https://www.openshift.com/blog/openshift-scale-ci-part-4-introduction-to-cerberus-guardian-of-kubernetes/openshift-clouds
- https://www.openshift.com/blog/reinforcing-cerberus-guardian-of-openshift/kubernetes-clusters
### Contributions
We are always looking for enhancements and fixes to make Cerberus better; all contributions are welcome.