# JobFlow
## Introduction
Many workloads consist of several VCJobs with dependencies between them. Today these jobs have to be orchestrated manually, or by a separate job orchestration platform, to get the work done. We present a new way of orchestrating VCJobs called JobFlow. It introduces two concepts for running multiple batch jobs automatically, JobTemplate and JobFlow, so that end users can declare their jobs and run them with rich control primitives such as sequential or parallel execution, if-then-else statements, switch-case statements, and loops.
JobFlow helps migrate AI, big data, and HPC workloads to the cloud-native world. Although workflow engines already exist, they are not designed for batch job workloads, which typically have complex running dependencies and take a long time to run, for example days or weeks. JobFlow lets end users declare a job as a JobTemplate and reuse it. JobFlow then orchestrates those jobs using complex control primitives and launches them automatically, which can significantly reduce the total running time of a complex job and improve resource utilization. Finally, JobFlow is not a general-purpose workflow engine: it knows the details of VCJobs, so end users get better insight into their jobs, for example each job's running state, start and end timestamps, the next jobs to run, and the pod failure ratio.
## Scope
### In Scope
- Define the API of JobFlow
- Define the behaviour of JobFlow
- Start order of multiple jobs
- Completion state that a dependency must reach before a dependent job starts
- DAG-based job dependency startup
### Out of Scope
- Support for other job types
- Gang scheduling at the VCJob level
## Scenarios
- Some jobs depend on the completion, or another status, of a previous job. Without it, they cannot compute the correct result.
- Inter-job dependencies sometimes also need different dependency types, such as conditional dependencies, circular dependencies, and probes.
![jobflow-1.png](../images/jobflow-1.png)
## Design
![jobflow-2.png](../images/jobflow-2.png)
Blue marks components of Kubernetes itself, orange marks existing Volcano definitions, and red marks the new definitions introduced by JobFlow.
**Complete JobFlow submission process**:
1. After the request passes admission, kubectl creates the JobTemplate and JobFlow (Volcano CRD) objects in kube-apiserver.
2. Following the JobFlow's configuration, the JobFlowController uses each JobTemplate as a template and creates the corresponding VcJob according to the flow's dependency rules.
3. After a VcJob is created, the VcJobController creates the corresponding Pods and PodGroups according to the VcJob's configuration.
4. After the Pods and PodGroups are created, vc-scheduler fetches the Pod/PodGroup and node information from kube-apiserver.
5. With that information, vc-scheduler selects a suitable node for each Pod according to its configured scheduling policy.
6. After nodes are assigned to the Pods, kubelet fetches each Pod's configuration from kube-apiserver and starts the corresponding containers.
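
Step 2 can be illustrated with a minimal Go sketch of how a controller might decide which VcJobs to create next. This is not the actual JobFlowController code: the `flow` type, its fields, and `readyFlows` are hypothetical names introduced only for illustration, and "completed" stands in for whatever dependency state the flow requires.

```go
package main

import (
	"fmt"
	"sort"
)

// flow is a hypothetical, simplified view of one JobFlow entry: the name of
// the JobTemplate to instantiate and the names of the templates it depends on.
type flow struct {
	name      string
	dependsOn []string
}

// readyFlows returns the flows whose VcJob can be created now: flows that
// have not been started yet and whose dependencies have all completed.
func readyFlows(flows []flow, started, completed map[string]bool) []string {
	var ready []string
	for _, f := range flows {
		if started[f.name] {
			continue // a VcJob already exists for this flow
		}
		ok := true
		for _, dep := range f.dependsOn {
			if !completed[dep] {
				ok = false
				break
			}
		}
		if ok {
			ready = append(ready, f.name)
		}
	}
	sort.Strings(ready) // deterministic order for the caller
	return ready
}

func main() {
	flows := []flow{
		{name: "a"},
		{name: "b", dependsOn: []string{"a"}},
		{name: "c", dependsOn: []string{"a"}},
		{name: "d", dependsOn: []string{"b", "c"}},
	}
	started := map[string]bool{"a": true}
	completed := map[string]bool{"a": true}
	// Once "a" has completed, "b" and "c" can start in parallel; "d" must wait.
	fmt.Println(readyFlows(flows, started, completed))
}
```

Calling such a function on every reconcile lets independent branches of the DAG start in parallel while dependent jobs wait for their predecessors.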
**Updating a JobFlow**:
JobFlow does not currently support updates; the webhook rejects any update to a JobFlow.
**Deleting a JobFlow**:
The webhook rejects deletion of a JobFlow that has not completed. Otherwise, deleting the JobFlow also directly deletes all VcJobs that it created.
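
The update and delete rules can be summarized in a small Go sketch. It is only an illustration of the policy described here, not the real webhook: the `operation` type, its constants, and the `admit` function are hypothetical, and the JobFlow's state is reduced to a single `completed` flag.

```go
package main

import (
	"errors"
	"fmt"
)

// operation is a hypothetical stand-in for the admission request operation.
type operation string

const (
	opUpdate operation = "UPDATE"
	opDelete operation = "DELETE"
)

// admit applies the two rules from the text: updates are always rejected,
// and deletes are rejected while the JobFlow has not completed.
func admit(op operation, completed bool) error {
	switch op {
	case opUpdate:
		return errors.New("updating a JobFlow is not supported")
	case opDelete:
		if !completed {
			return errors.New("a JobFlow that has not completed cannot be deleted")
		}
	}
	return nil
}

func main() {
	fmt.Println(admit(opUpdate, true))  // rejected: updates are never allowed
	fmt.Println(admit(opDelete, false)) // rejected: the flow is still in progress
	fmt.Println(admit(opDelete, true))  // allowed: the flow has completed
}
```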
### Controller
![jobflow-3.png](../images/jobflow-3.png)
### Webhook
```
Checks when creating a JobFlow:
1. A template name cannot appear more than once in a JobFlow's dependencies.
   E.g.: A -> B -> A -> C (A appears twice)
2. A JobFlow cannot contain a closed loop.
   E.g.: A -> B -> C
              ^   /
              |  /
              <-D
Checks when creating a JobTemplate (following the vcjob parameter specification), e.g.:
- job minAvailable must be greater than or equal to zero
- job maxRetry must be greater than or equal to zero
- tasks cannot be empty, and task names must be unique
- the number of task replicas cannot be less than zero
- task minAvailable cannot be greater than task replicas
...
```
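
The two JobFlow checks can be sketched in Go as a duplicate-name test followed by a depth-first search for closed loops. This is an illustrative sketch, not the webhook's actual implementation: the `flow` type and `validateFlows` are hypothetical names, and the sketch does not verify that every dependency refers to a declared flow.

```go
package main

import "fmt"

// flow is a hypothetical, simplified JobFlow entry.
type flow struct {
	name      string
	dependsOn []string
}

// validateFlows rejects duplicate template names and closed loops.
func validateFlows(flows []flow) error {
	deps := make(map[string][]string, len(flows))
	for _, f := range flows {
		if _, dup := deps[f.name]; dup {
			return fmt.Errorf("template %q appears more than once", f.name)
		}
		deps[f.name] = f.dependsOn
	}

	// Depth-first search with three states: reaching a node that is still
	// in progress means the dependency chain loops back on itself.
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int, len(flows))
	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case inProgress:
			return fmt.Errorf("closed loop through %q", name)
		case done:
			return nil
		}
		state[name] = inProgress
		for _, dep := range deps[name] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		return nil
	}
	for _, f := range flows {
		if err := visit(f.name); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	chain := []flow{{name: "a"}, {name: "b", dependsOn: []string{"a"}}, {name: "c", dependsOn: []string{"b"}}}
	loop := []flow{{name: "b", dependsOn: []string{"d"}}, {name: "c", dependsOn: []string{"b"}}, {name: "d", dependsOn: []string{"c"}}}
	fmt.Println(validateFlows(chain)) // no error expected for a simple chain
	fmt.Println(validateFlows(loop))  // an error is expected: b, c and d form a loop
}
```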
### JobFlow
#### Introduction
JobFlow defines the running flow of a set of jobs. The fields of a JobFlow define how those jobs are orchestrated.
JobFlow is abbreviated as `jf`, so the resources can be listed with `kubectl get jf`.
JobFlow implements dependency-driven execution of vcjobs in Volcano: each vcjob is created according to the dependencies between vcjobs.
#### Key Fields
##### Top-Level Attributes
The top-level attributes of a JobFlow define its apiVersion, kind, metadata, spec and status.
| Attribute | Type | Required | Default Value | Description |
| ------------ | ----------------------- | -------- | -------------------------- | ------------------------------------------------------------ |
| `apiVersion` | `string` | Y | `flow.volcano.sh/v1alpha1` | A string that identifies the version of the schema the object should have. The core types use `flow.volcano.sh/v1alpha1` in this version of the documentation. |
| `kind` | `string` | Y | `JobFlow` | Must be `JobFlow` |
| `metadata` | [`Metadata`](#Metadata) | Y | | Information about the JobFlow resource. |
| `spec` | [`Spec`](#Spec) | Y | | A specification for the JobFlow resource attributes. |
| `status` | [`Status`](#Status) | Y | | A specification for the JobFlow status attributes. |
<a id="Metadata"></a>
##### Metadata
Metadata provides basic information about the JobFlow.
| Attribute | Type | Required | Default Value | Description |
| ------------- | ------------------- | -------- | ------------- | ------------------------------------------------------------ |
| `name` | `string` | Y | | A name for the JobFlow. `name` is subject to the restrictions listed beneath this table. |
| `namespace` | `string` | Y | | A namespace for the JobFlow. `namespace` is subject to the restrictions listed beneath this table. |
| `labels` | `map[string]string` | N | | A set of string key/value pairs used as arbitrary labels on this component. Labels follow the [Kubernetes specification](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/). |
| `annotations` | `map[string]string` | N | | A set of string key/value pairs used as arbitrary descriptive text associated with this object. Annotations follow the [Kubernetes specification](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set). |
<a id="Spec"></a>
##### Spec
The spec defines the dependencies between the vcjobs of a JobFlow and whether the generated jobs are kept after the JobFlow succeeds.
| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `flows` | [`Flow array`](#Flow) | Y | | Describes the dependencies between vcjobs. |
| `jobRetainPolicy` | `string` | Y | `retain` | With `retain`, the generated jobs are kept after the JobFlow succeeds. Otherwise, they are deleted. |
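
The attribute tables above can be mirrored by Go types, which also makes it easy to print a manifest skeleton. The sketch below is an assumption-laden illustration rather than Volcano's actual API package: all type names and the `newJobFlow` helper are hypothetical, the `Flow` element is reduced to a single assumed `name` field, and `status` is omitted because its fields are not described in this section.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata mirrors the Metadata table.
type Metadata struct {
	Name        string            `json:"name"`
	Namespace   string            `json:"namespace"`
	Labels      map[string]string `json:"labels,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// Flow is a placeholder: the single Name field is an assumption made for
// this sketch, not a field taken from the tables above.
type Flow struct {
	Name string `json:"name"`
}

// Spec mirrors the Spec table.
type Spec struct {
	Flows           []Flow `json:"flows"`
	JobRetainPolicy string `json:"jobRetainPolicy"`
}

// JobFlow mirrors the top-level attributes (status omitted).
type JobFlow struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   Metadata `json:"metadata"`
	Spec       Spec     `json:"spec"`
}

// newJobFlow fills in the fixed values and defaults listed in the tables.
func newJobFlow(name, namespace string, flows []Flow) JobFlow {
	return JobFlow{
		APIVersion: "flow.volcano.sh/v1alpha1",
		Kind:       "JobFlow",
		Metadata:   Metadata{Name: name, Namespace: namespace},
		Spec:       Spec{Flows: flows, JobRetainPolicy: "retain"},
	}
}

func main() {
	jf := newJobFlow("demo", "default", []Flow{{Name: "a"}, {Name: "b"}})
	out, err := json.MarshalIndent(jf, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // JSON form of the manifest skeleton
}
```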
<a id="Flow"></a>
##### Flow
| Attribute | Type | Required | Default Value | Desc