TensorHive
===
![](https://img.shields.io/badge/release-v0.3.6-brightgreen.svg?style=popout-square)
![](https://img.shields.io/badge/pypi-v0.3.6-brightgreen.svg?style=popout-square)
![](https://img.shields.io/badge/Issues%20and%20PRs-welcome-yellow.svg?style=popout-square)
![](https://img.shields.io/badge/platform-Linux-blue.svg?style=popout-square)
![](https://img.shields.io/badge/hardware-Nvidia-green.svg?style=popout-square)
![](https://img.shields.io/badge/python-3.5%20|%203.6%20|%203.7%20|%203.8-blue.svg?style=popout-square)
![](https://img.shields.io/badge/license-Apache%202.0-blue.svg?style=popout-square)
<img src="https://github.com/roscisz/TensorHive/raw/master/images/logo_small.png" height="130" align="left">
TensorHive is an open source tool for monitoring and managing computing resources across multiple hosts.
It addresses the most common problems of accessing and sharing AI-oriented infrastructure among multiple, often competing, users.
It's designed with __simplicity, flexibility and configuration-friendliness__ in mind.
---------------
### Main features:
#### GPU Reservation calendar
Each column represents all reservation events for a GPU on a given day.
To make a new reservation, click and drag with your mouse, select the GPU(s), add a meaningful title, and optionally adjust the time range.
If there are many hosts and GPUs in your infrastructure, you can use the simplified, horizontal calendar to quickly identify empty time slots and filter out already reserved GPUs.
![image](https://raw.githubusercontent.com/roscisz/TensorHive/master/images/reservations_overview_screenshot.png)
From now on, **only your processes are eligible to run on the reserved GPU(s)**. TensorHive periodically checks whether another user has violated the reservation. Depending on the configuration, violators are warned on all their PTYs and emailed every once in a while, and the admin is notified as well.
Terminal warning | Email warning | Admin warning
:-------------------------:|:-------------------------:|:-------------------------:
![image](https://raw.githubusercontent.com/roscisz/TensorHive/master/images/terminal_warning_screenshot.png) | ![image](https://raw.githubusercontent.com/roscisz/TensorHive/master/images/email_warning_screenshot.png) | ![image](https://raw.githubusercontent.com/roscisz/TensorHive/master/images/admin_warning_screenshot.png)
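The periodic check described above can be pictured as comparing the currently active reservations with the processes seen on each GPU. The following is a minimal sketch under stated assumptions, not TensorHive's actual implementation: the `Reservation` and `GpuProcess` types, their fields, and `find_violations` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reservation:
    gpu_id: str      # hypothetical GPU identifier
    owner: str       # UNIX username of the reserving user
    start: datetime
    end: datetime


@dataclass(frozen=True)
class GpuProcess:
    gpu_id: str
    owner: str       # UNIX username that owns the process
    pid: int


def find_violations(reservations, processes, now):
    """Return processes running on a GPU that someone else has reserved at `now`."""
    # Map each GPU with an active reservation to the user holding it.
    active = {r.gpu_id: r.owner for r in reservations if r.start <= now < r.end}
    return [p for p in processes
            if p.gpu_id in active and p.owner != active[p.gpu_id]]
```

Each returned process identifies an intruder (`owner`) and the affected GPU, which is the information a warning by terminal or email would need.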
#### Infrastructure monitoring dashboard
Accessible infrastructure can be monitored in the Nodes overview tab.
Here you can add new watches, select metrics, and monitor ongoing GPU processes and their owners. Sample screenshot:
![image](https://raw.githubusercontent.com/roscisz/TensorHive/master/images/nodes_overview_screenshot.png)
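Since GPU support relies on the `nvidia-smi` command, metrics such as `gpu_util`, `mem_util` and `mem_used` can be gathered roughly as sketched below. This is an illustration only, not TensorHive's own code: the function names and the mapping from `nvidia-smi` query fields to metric names are assumptions.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
QUERY_FIELDS = ["index", "utilization.gpu", "utilization.memory", "memory.used"]
# Assumed mapping of those fields to the metric names shown in the dashboard.
METRIC_NAMES = ["index", "gpu_util", "mem_util", "mem_used"]


def _to_int(value):
    """Convert a CSV cell to int; unsupported readings such as "[N/A]" become None."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_metrics(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [cell.strip() for cell in line.split(",")]
        rows.append({name: _to_int(value) for name, value in zip(METRIC_NAMES, values)})
    return rows


def query_gpu_metrics():
    """Run nvidia-smi on the local machine and return the parsed metrics."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(QUERY_FIELDS),
           "--format=csv,noheader,nounits"]
    return parse_gpu_metrics(subprocess.check_output(cmd).decode())
```

Running the same command over SSH on each configured node would yield one such list per host.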
#### Task execution
Thanks to the `Task execution` module, you can define commands for tasks you want to run on any configured nodes.
You can manage them manually or set spawn/terminate dates.
Commands are run within a `screen` session, so attaching to it while they are running is a piece of cake.
It provides a simple, but flexible (**framework-agnostic**) command templating mechanism that will help you automate multi-node trainings.
Additionally, specialized templates help to conveniently set proper parameters for selected well-known frameworks:
![image](https://raw.githubusercontent.com/roscisz/TensorHive/master/examples/TF_CONFIG/img/multi_process.png)
In the [examples](https://github.com/roscisz/TensorHive/tree/master/examples)
directory, you will find sample scenarios of using the `Task execution` module for various
frameworks and computing environments.
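To give an idea of what such a template automates, the sketch below generates one command per node for a TensorFlow multi-worker training, each prefixed with a matching `TF_CONFIG` environment variable. It is a hypothetical illustration rather than TensorHive's templating code: the function name, host names, port and training command are all assumptions.

```python
import json
import shlex


def build_tf_config_commands(hosts, port, command):
    """Return one (host, shell command) pair per worker, each prefixed with TF_CONFIG."""
    workers = ["{}:{}".format(host, port) for host in hosts]
    result = []
    for index, host in enumerate(hosts):
        # Every worker sees the same cluster description but its own task index.
        tf_config = json.dumps({
            "cluster": {"worker": workers},
            "task": {"type": "worker", "index": index},
        })
        result.append((host, "TF_CONFIG={} {}".format(shlex.quote(tf_config), command)))
    return result
```

For example, `build_tf_config_commands(["node1", "node2"], 2222, "python train.py")` yields two commands that differ only in the `index` of their `task` entry.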
To use this feature, users must append TensorHive's public key to their `~/.ssh/authorized_keys` on every node they want to connect to.
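Appending the key is normally a one-line shell operation, but doing it idempotently avoids duplicate entries when the step is repeated. The helper below is a minimal sketch with a hypothetical name (`authorize_key`); where the TensorHive public key is obtained from is not covered here, so it is passed in as a string.

```python
from pathlib import Path


def authorize_key(public_key, authorized_keys_path="~/.ssh/authorized_keys"):
    """Append public_key to the file unless already present; return True if added."""
    path = Path(authorized_keys_path).expanduser()
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    key = public_key.strip()
    text = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(separator + key + "\n")
    # sshd typically ignores authorized_keys files with loose permissions.
    path.chmod(0o600)
    return True
```

The function would have to be run on each node, under the account that TensorHive should be able to log in to.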
---------------
### Use cases
Our goal is to solve the painful problems that ML engineers often struggle with when running neural network trainings on remote machines.
#### You should really consider using TensorHive if anything described in profiles below matches you:
1. You're an **admin**, who is responsible for managing a cluster (or multiple servers) with powerful GPUs installed.
- :angry: There are more users than resources, so they have to compete for them, but you don't know how to deal with that chaos
- :ocean: Other popular tools are simply overkill, have a different purpose, or require a lot of time for reading documentation, installation and configuration (Grafana, Kubernetes, Slurm)
- :penguin: People using your infrastructure expect only one interface for all the things related to training models (besides terminal): monitoring, reservation calendar and scheduling distributed jobs
- :collision: You can't risk messing up sensitive configuration by installing software on each individual machine, and prefer a centralized solution which can be managed from one place
2. You're a **standalone user** who has access to beefy GPUs scattered across multiple machines.
- :part_alternation_mark: You want to be able to determine if batch size is too small or if there's a bottleneck when moving data from memory to GPU - charts with metrics such as `gpu_util`, `mem_util`, `mem_used` are great for this purpose
- :date: Visualizing names of training experiments using calendar helps you track how you're progressing on the project
- :snake: Launching distributed trainings is essential for you, no matter what the framework is
- :dizzy_face: Managing a list of training commands for all your distributed training experiments drives you nuts
- :zzz: Remembering to manually launch the training before going to sleep is no fun anymore
#### Advantages of TensorHive
:zero: Dead-simple one-machine installation and configuration, no `sudo` requirements
:one: Users can make GPU reservations for a specific time range in advance via the **reservation mechanism**
:arrow_right: no more frustration caused by rules: **"first come, first served"** or **"the law of the jungle"**.
:two: Users can prepare and schedule custom tasks (commands) to be run on selected GPUs and hosts
:arrow_right: automate and simplify **distributed trainings** - **"one button to rule them all"**
:three: Gather all useful GPU metrics, from all configured hosts **in one dashboard**
:arrow_right: no more manual logging in to each individual machine in order to check whether a GPU is currently in use
For more details, check out the [full list of features](#features).
---------------
### Getting started
#### Prerequisites
* All nodes must be accessible via SSH, without password, using SSH Key-Based Authentication ([How to set up SSH keys](https://www.shellhacks.com/ssh-login-without-password/) - explained in [Quickstart section](#basic-usage))
* Only NVIDIA GPUs are supported (relying on ```nvidia-smi``` command)
* Currently, TensorHive assumes that all users who want to register in the system have identical UNIX usernames on all nodes configured by the TensorHive administrator (not relevant for standalone developers)
* (optional) We recommend installing TensorHive under a separate user account (for example `tensorhive`) and adding this user to the `tty` system group.
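The first two prerequisites can be verified per node before installing anything. The sketch below is a hedged example with hypothetical helper names: it builds an `ssh` invocation that uses `BatchMode=yes` so the login fails instead of prompting for a password, and asks the remote shell whether `nvidia-smi` is on the `PATH`.

```python
import subprocess


def build_check_command(host, user=None, timeout=5):
    """Build an ssh command that fails rather than asking for a password."""
    target = "{}@{}".format(user, host) if user else host
    return ["ssh",
            "-o", "BatchMode=yes",
            "-o", "ConnectTimeout={}".format(timeout),
            target, "command -v nvidia-smi"]


def node_is_ready(host, user=None):
    """True if key-based login works and nvidia-smi is found on the remote node."""
    exit_code = subprocess.call(build_check_command(host, user),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    return exit_code == 0
```

Calling `node_is_ready("node1", "alice")` for every node (host and user names here are placeholders) gives a quick overview of which machines still need SSH keys or NVIDIA drivers.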
#### Installation
##### via pip
```shell
pip install tensorhive
```
##### From source
(optional) For development purposes we encourage separation from your current python packages using e.g. virtualenv, Anaconda.
```shell
git clone https://github.com/roscisz/TensorHive.git && cd TensorHive
pip install -e .
```
TensorHive already ships with the newest web app build, but in case you modify the source, you can build it with `make app` (currently on `master` branch). For more useful commands see our [Makefile](https://github.com/roscisz/TensorHive/blob/master/tensorhive/Makefile).
Build tested with `Node v14.15.1` and `npm 6.14.8`
#### Basic usage
###### Quickst