# Backend.AI Agent
The Backend.AI Agent is a small daemon that:
* Reports the status and available resource slots of a worker to the manager
* Routes code execution requests to the designated kernel container
* Manages the lifecycle of kernel containers (create/monitor/destroy them)
## Package Structure
* `ai.backend`
- `agent`: The agent package
- `docker`: A docker-based backend implementation for the kernel lifecycle interface.
- `server`: The agent daemon which communicates with the manager and the Docker daemon
  - `watcher`: A side-by-side daemon which provides a separate HTTP endpoint for accessing the agent's
    status information and manipulating the agent's systemd service
- `helpers`: A utility package that is available as `ai.backend.helpers` *inside* Python-based containers
- `kernel`: Language-specific runtimes (mostly ipykernel client adaptor) which run *inside* containers
  - `runner`: Auxiliary components (usually self-contained binaries) mounted *inside* containers
## Installation
Please visit [the installation guides](https://github.com/lablup/backend.ai/wiki).
### Kernel/system configuration
#### Recommended kernel parameters in the bootloader (e.g., Grub):
```
cgroup_enable=memory swapaccount=1
```
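These parameters can be appended to the kernel command line via Grub's defaults file; a sketch assuming a Debian/Ubuntu-style `/etc/default/grub` (the base `quiet splash` value is just an example — keep whatever your system already has, then run `update-grub` and reboot):

```shell
# Build the new kernel command line by appending the cgroup parameters.
CMDLINE="quiet splash"
CMDLINE="$CMDLINE cgroup_enable=memory swapaccount=1"
echo "GRUB_CMDLINE_LINUX_DEFAULT=\"$CMDLINE\""
# Put the printed line into /etc/default/grub, then:
#   sudo update-grub && sudo reboot
```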
#### Recommended resource limits:
**`/etc/security/limits.conf`**
```
root hard nofile 512000
root soft nofile 512000
root hard nproc 65536
root soft nproc 65536
user hard nofile 512000
user soft nofile 512000
user hard nproc 65536
user soft nproc 65536
```
**sysctl**
```
fs.file-max=2048000
fs.inotify.max_user_watches=524288
net.core.somaxconn=1024
net.ipv4.tcp_max_syn_backlog=1024
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_early_retrans=1
net.ipv4.ip_local_port_range="40000 65000"
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 12582912 16777216
net.ipv4.tcp_wmem=4096 12582912 16777216
net.netfilter.nf_conntrack_max=10485760
net.netfilter.nf_conntrack_tcp_timeout_established=432000
net.netfilter.nf_conntrack_tcp_timeout_close_wait=10
net.netfilter.nf_conntrack_tcp_timeout_fin_wait=10
net.netfilter.nf_conntrack_tcp_timeout_time_wait=10
```
The `ip_local_port_range` should not overlap with the container port range pool
(default: 30000 to 31000).
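One way to persist these settings is a drop-in file under `/etc/sysctl.d/`; the sketch below stages the file in a temporary directory so it can be tried without root (the file name `90-backendai.conf` is an arbitrary choice, and only a few of the settings above are shown):

```shell
# Stage the sysctl drop-in; on a real host write it to /etc/sysctl.d/
# and apply all drop-ins with `sudo sysctl --system`.
conf_dir=$(mktemp -d)
cat > "$conf_dir/90-backendai.conf" <<'EOF'
fs.file-max=2048000
net.core.somaxconn=1024
net.ipv4.ip_local_port_range=40000 65000
EOF
cat "$conf_dir/90-backendai.conf"
```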
### For development
#### Prerequisites
* `libsnappy-dev` or `snappy-devel` system package depending on your distro
* Python 3.6 or higher with [pyenv](https://github.com/pyenv/pyenv)
  and [pyenv-virtualenv](https://github.com/pyenv/pyenv-virtualenv) (optional but recommended)
* Docker 18.03 or later with docker-compose (18.09 or later is recommended)
First, you need **a working manager installation**.
For detailed instructions on installing the manager, please refer to
[the manager's README](https://github.com/lablup/backend.ai-manager/blob/master/README.md)
and come back here again.
#### Preparing working copy
Install and activate [`git-lfs`](https://git-lfs.github.com/) to work with pre-built binaries in
`src/ai/backend/runner`.
```console
$ git lfs install
```
Next, prepare the source clone of the agent and install from it as follows.
`pyenv` is just a recommendation; you may use other virtualenv management tools.
```console
$ git clone https://github.com/lablup/backend.ai-agent agent
$ cd agent
$ pyenv virtualenv venv-agent
$ pyenv local venv-agent
$ pip install -U pip setuptools
$ pip install -U -r requirements/dev.txt
```
### Linting
We use `flake8` and `mypy` to statically check our code styles and type consistency.
Enable those linters in your favorite IDE or editor.
### Halfstack (single-node development & testing)
With the halfstack, you can run the agent easily on a single node for development and testing.
Note that you need a working manager running with the halfstack already!
#### Recommended directory structure
* `backend.ai-dev`
- `manager` (git clone from [the manager repo](https://github.com/lablup/backend.ai-manager))
- `agent` (git clone from here)
- `common` (git clone from [the common repo](https://github.com/lablup/backend.ai-common))
Install `backend.ai-common` as an editable package in the agent (and the manager) virtualenvs
to keep the codebase up-to-date.
```console
$ cd agent
$ pip install -U -e ../common
```
#### Steps
```console
$ mkdir -p "./scratches"
$ cp config/halfstack.toml ./agent.toml
```
Then, run it (for debugging, append a `--debug` flag):
```console
$ python -m ai.backend.agent.server
```
To run the agent-watcher:
```console
$ python -m ai.backend.agent.watcher
```
The watcher shares the same configuration TOML file with the agent.
Note that the watcher is only meaningful if the agent is installed as a systemd service
named `backendai-agent.service`.
To run tests:
```console
$ python -m flake8 src tests
$ python -m pytest -m 'not integration' tests
```
## Deployment
### Configuration
Put a TOML-formatted agent configuration (see the sample in `config/sample.toml`)
in one of the following locations:
* `agent.toml` (current working directory)
* `~/.config/backend.ai/agent.toml` (user-config directory)
* `/etc/backend.ai/agent.toml` (system-config directory)
Only the first one found is used by the daemon.
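The lookup order can be reproduced in the shell; a sketch mirroring the daemon's first-match-wins behavior:

```shell
# Probe the candidate locations in order; the first existing file wins.
found=""
for p in ./agent.toml "$HOME/.config/backend.ai/agent.toml" /etc/backend.ai/agent.toml; do
  if [ -f "$p" ]; then
    found="$p"
    break
  fi
done
echo "config: ${found:-<none found>}"
```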
The agent reads most other configurations from the etcd v3 server where the cluster
administrator or the Backend.AI manager stores all the necessary settings.
The etcd address and namespace must match those of the manager so that the agent
can be paired and activated.
By specifying distinguished namespaces, you may share a single etcd cluster with multiple
separate Backend.AI clusters.
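To sanity-check that an agent host can reach the same etcd as the manager, you can query it directly; a sketch assuming `etcdctl` v3 is installed (the endpoint and the `ns/local/` key prefix are placeholders for illustration, not the actual key schema):

```shell
export ETCDCTL_API=3
endpoint="${ETCD_ADDR:-127.0.0.1:2379}"
if command -v etcdctl >/dev/null 2>&1; then
  # List keys under your cluster's namespace prefix (placeholder shown).
  etcdctl --endpoints="$endpoint" get --prefix "ns/local/" --keys-only || \
    echo "etcd not reachable at $endpoint"
else
  echo "etcdctl not installed; skipping check"
fi
checked=done
```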
By default, the agent uses the `/var/cache/scratches` directory for creating temporary
home directories used by kernel containers (the `/home/work` volume mounted in
containers). Note that the directory must exist beforehand and the agent-running
user must have ownership of it. You can change the location with the
`scratch-root` option in `agent.toml`.
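Preparing the scratch root boils down to creating the directory and giving the agent user ownership; the sketch below uses a temporary path so it runs unprivileged (on a real host, use `/var/cache/scratches` — or your `scratch-root` value — with `sudo`, and `$AGENT_USER` is a placeholder for the agent-running account):

```shell
# Create the scratch root ahead of the agent's first start.
scratch_root="$(mktemp -d)/scratches"
mkdir -p "$scratch_root"
chmod 755 "$scratch_root"
# On a real host:
#   sudo mkdir -p /var/cache/scratches
#   sudo chown "$AGENT_USER" /var/cache/scratches
ls -ld "$scratch_root"
```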
### Running from a command line
The minimal command to execute:
```sh
python -m ai.backend.agent.server
python -m ai.backend.agent.watcher
```
For more arguments and options, run the command with `--help` option.
### Example config for systemd
`/etc/systemd/system/backendai-agent.service`:
```dosini
[Unit]
Description=Backend.AI Agent
Requires=docker.service
After=network.target remote-fs.target docker.service
[Service]
Type=simple
User=root
Group=root
Environment=HOME=/home/user
ExecStart=/home/user/backend.ai/agent/run-agent.sh
WorkingDirectory=/home/user/backend.ai/agent
KillMode=process
KillSignal=SIGTERM
PrivateTmp=false
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
```
`/home/user/backend.ai/agent/run-agent.sh`:
```sh
#! /bin/sh
if [ -z "$PYENV_ROOT" ]; then
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
fi
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
cd /home/user/backend.ai/agent
if [ "$#" -eq 0 ]; then
exec python -m ai.backend.agent.server
else
exec "$@"
fi
```
### Networking
The manager and agent should run in the same local network or different
networks reachable via VPNs, whereas the manager's API service must be exposed to
the public network or another private network that users have access to.
In the default configuration, the manager must be able to reach TCP ports 6001, 6009,
and 30000 to 31000 on the agents. You can of course change those port numbers and ranges in the configuration.
| Manager-to-Agent TCP Ports | Usage |
|:--------------------------:|-------|
| 6001 | ZeroMQ-based RPC calls from managers to agents |
| 6009 | HTTP watcher API |
| 30000-31000 | Port pool for in-container services |
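From the manager host, you can spot-check reachability of the agent's service ports; a sketch where `AGENT_HOST` is a placeholder for the agent's address (the `/dev/tcp` probe needs bash; a closed or filtered port simply reports as unreachable):

```shell
AGENT_HOST="${AGENT_HOST:-127.0.0.1}"
checked=0
for port in 6001 6009; do
  # Try opening a TCP connection in a subshell; the fd closes on exit.
  if (exec 3<>"/dev/tcp/$AGENT_HOST/$port") 2>/dev/null; then
    echo "port $port reachable"
  else
    echo "port $port unreachable"
  fi
  checked=$((checked + 1))
done
```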
The operation of the agent itself does not require inbound or outbound access to
the public Internet, but if users' computation programs need Internet access, the containers
must be able to reach the public Internet (for example, via NAT or an outbound proxy).