# gusty
[![Versions](https://img.shields.io/badge/python-3.6+-blue)](https://pypi.org/project/gusty/)
[![PyPi](https://img.shields.io/pypi/v/gusty.svg)](https://pypi.org/project/gusty/)
![build](https://github.com/chriscardillo/gusty/workflows/build/badge.svg)
[![coverage](https://codecov.io/github/chriscardillo/gusty/coverage.svg?branch=master)](https://codecov.io/github/chriscardillo/gusty?branch=master)
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
gusty allows you to control your Airflow DAGs, Task Groups, and Tasks with greater ease. gusty manages collections of tasks, represented as any number of YAML, Python, Jupyter Notebook, or R Markdown files. A directory of task files is instantly rendered into a DAG by passing a file path to gusty's `create_dag` function.
gusty also manages dependencies (within one DAG) and external dependencies (dependencies on tasks in other DAGs) for each task file you define. All you have to do is provide a list of `dependencies` or `external_dependencies` inside of a task file, and gusty will automatically set each task's dependencies and create external task sensors for any external dependencies listed.
gusty works with both Airflow 1.x and Airflow 2.x, and has even more features, all of which aim to make the creation, management, and iteration of DAGs more fluid, so that you can intuitively design your DAG and build your tasks.
## What's in gusty?
### Four Ways to Make Tasks
gusty will turn every file in a DAG directory into a task. gusty supports four different file types, which offer convenient ways to specify an operator and operator parameters for task creation.
| File Type | How It Works |
| --------- | -------------------------------------------------------------------------------------------------------------- |
| .yml | Declare an `operator` and pass in any operator parameters using YAML |
| .py | Simply define a function named `python_callable` and gusty will automatically turn it into a `PythonOperator` |
| .ipynb | Put a YAML block at the top of your notebook and specify an `operator` that renders your Jupyter Notebook |
| .Rmd | Use the YAML block at the top of your notebook and specify an `operator` that renders your R Markdown Document |
Here is a quick example of a YAML task file, which might be called something like `hello_world.yml`:
```yml
operator: airflow.operators.bash.BashOperator
bash_command: echo hello world
```
The resulting task would be a `BashOperator` with the task id `hello_world`.
Here is the same approach using a Python file instead, named `hello_world.py`, which gusty will automatically turn into a `PythonOperator`:
```py
def python_callable():
    phrase = "hello world"
    print(phrase)
```
### Easy Dependencies
#### Declarative Dependencies
Every task file type supports `dependencies` and `external_dependencies` parameters, which gusty will use to automatically assign dependencies between tasks and create external task sensors for any external dependencies listed for a given task.
For .yml, .ipynb, and .Rmd task file types, `dependencies` and `external_dependencies` are defined using YAML syntax:
```yml
operator: airflow.operators.bash.BashOperator
bash_command: echo hello world
dependencies:
  - same_dag_task
external_dependencies:
  - another_dag: another_task
  - a_whole_dag: all
```
For external dependencies, the keyword `all` can be used when the task should wait on an entire external DAG to run successfully.
For a .py task file type, we can define these dependencies simply as variables:
```py
dependencies = [
    "same_dag_task"
]
external_dependencies = [
    {"another_dag": "another_task"},
    {"a_whole_dag": "all"}
]

def python_callable():
    phrase = "hello world"
    print(phrase)
```
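To make the shape of `external_dependencies` concrete, here is a minimal sketch of how such a list could be flattened into (DAG id, task id) pairs. The helper name `expand_external_dependencies` is hypothetical and is not part of gusty's API. It only illustrates that each entry is a one-key mapping from a DAG id to a task id, and that `all` stands for the entire DAG.

```python
def expand_external_dependencies(entries):
    """Flatten a list of {dag_id: task_id} mappings into (dag_id, task_id) pairs.

    Hypothetical helper for illustration only; not gusty's implementation.
    A task id of "all" is returned as None, meaning "wait on the whole DAG".
    """
    pairs = []
    for entry in entries:
        for dag_id, task_id in entry.items():
            pairs.append((dag_id, None if task_id == "all" else task_id))
    return pairs


external_dependencies = [
    {"another_dag": "another_task"},
    {"a_whole_dag": "all"},
]

for dag_id, task_id in expand_external_dependencies(external_dependencies):
    target = "the entire DAG" if task_id is None else f"task {task_id!r}"
    print(f"wait on {target} in DAG {dag_id!r}")
```

Each resulting pair corresponds to one external task sensor that gusty would create for the task.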
#### Dynamic Dependencies
gusty can also detect and generate dependencies through a task object's `dependencies` attribute. This means you can also **dynamically** set dependencies. One popular example of this option would be if your operator runs SQL, you can parse that SQL for table names, and attach a list of those table names to the operator's `dependencies` attribute. If those table names listed in the `dependencies` attribute are also task ids in the DAG, gusty will be able to automatically set these dependencies for you!
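The SQL example above could look roughly like the following sketch. It assumes a deliberately simple regular expression that only picks up names after `FROM` and `JOIN`, and it uses a plain class, `SqlTaskSketch`, as a stand-in for a custom operator. Both names are hypothetical; in a real project the class would subclass an Airflow operator, and the SQL parsing would likely need to be more robust.

```python
import re

# Matches the identifier that follows FROM or JOIN (optionally schema-qualified).
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in_sql(sql):
    """Return table names referenced after FROM/JOIN, without schema prefixes."""
    names = []
    for match in TABLE_PATTERN.findall(sql):
        name = match.split(".")[-1]
        if name not in names:
            names.append(name)
    return names


class SqlTaskSketch:
    """Stand-in for a custom SQL operator (hypothetical, not an Airflow class)."""

    def __init__(self, task_id, sql):
        self.task_id = task_id
        self.sql = sql
        # Expose the parsed table names so they can be matched against task ids.
        self.dependencies = [t for t in tables_in_sql(sql) if t != task_id]


task = SqlTaskSketch(
    task_id="daily_summary",
    sql="SELECT * FROM staging.orders o JOIN customers c ON o.customer_id = c.id",
)
print(task.dependencies)  # ['orders', 'customers']
```

If `orders` and `customers` are also task ids in the same DAG, a `dependencies` attribute like this is what gusty would use to wire `daily_summary` downstream of them.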
### DAG and TaskGroup Control
Both DAG and TaskGroup objects are created automatically simply by being directories and subfolders, respectively. The directory path you provide to gusty's `create_dag` function will become your DAG (and DAG name), and any subfolder in that DAG by default will be turned into a TaskGroup.
gusty offers a few compatible methods for configuring DAGs and Task Groups that we'll cover below.
#### Metadata
A special file name in any directory or subfolder is `METADATA.yml`, which gusty will use to determine how to configure that DAG or TaskGroup object.
Here is an example of a `METADATA.yml` file you might place in a DAG directory:
```yml
description: "An example of a DAG created using METADATA.yml"
schedule_interval: "1 0 * * *"
default_args:
    owner: airflow
    depends_on_past: False
    start_date: !days_ago 1
    email: airflow@example.com
    email_on_failure: False
    email_on_retry: False
    retries: 1
    retry_delay: !timedelta 'minutes: 5'
```
And here is an example of a `METADATA.yml` file you might place in a TaskGroup subfolder:
```yml
tooltip: "This is a task group tooltip"
prefix_group_id: True
dependencies:
  - hello_world
```
As seen in the above example, gusty will also accept `dependencies` and `external_dependencies` in a TaskGroup's `METADATA.yml`. This means gusty can wire up your TaskGroup dependencies as well!
Note that gusty disables the TaskGroup `prefix_group_id` argument by default, as it's one of gusty's few opinions that tasks should be explicitly named unless you say otherwise. gusty also offers a `suffix_group_id` argument for Task Groups!
#### create_dag
While `METADATA.yml` will always be the primary source of truth for a DAG or TaskGroup's configuration, gusty's `create_dag` function also accepts any parameters that can be passed to Airflow's DAG class, as well as a dictionary of `task_group_defaults` to set default behavior for any TaskGroup created by gusty.
Here's an example using `create_dag`, where instead of metadata we use `create_dag` arguments:
```py
import airflow
from datetime import timedelta
from airflow.utils.dates import days_ago
from gusty import create_dag

dag = create_dag(
    '/usr/local/airflow/dags/hello_world',
    description="A dag created without any metadata",
    schedule_interval="1 0 * * *",
    default_args={
        "owner": "airflow",
        "depends_on_past": False,
        "start_date": days_ago(1),
        "email": "airflow@example.com",
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    },
    task_group_defaults={
        "tooltip": "This is a task group tooltip",
        "prefix_group_id": True
    }
)
```
You might notice that `task_group_defaults` does not include dependencies. For Task Groups, dependencies must be set using TaskGroup-specific metadata.
Default arguments in `create_dag` and a DAG or TaskGroup's `METADATA.yml` can be mixed and matched. `METADATA.yml` will always override defaults set in `create_dag`.
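The precedence rule can be pictured with a small sketch. The function `resolve_config` is hypothetical and is not gusty's code. It assumes a simple shallow merge in which any key present in `METADATA.yml` replaces the same key passed to `create_dag`; how gusty combines nested values such as `default_args` is not covered here.

```python
def resolve_config(create_dag_kwargs, metadata):
    """Combine both sources; keys from METADATA.yml win on conflict.

    Illustrative shallow merge only, not gusty's implementation.
    """
    return {**create_dag_kwargs, **metadata}


# Values passed directly to create_dag (hypothetical example values).
create_dag_kwargs = {
    "schedule_interval": "1 0 * * *",
    "description": "Set in create_dag",
}
# Values loaded from METADATA.yml (hypothetical example values).
metadata = {"description": "Set in METADATA.yml"}

config = resolve_config(create_dag_kwargs, metadata)
print(config["description"])        # Set in METADATA.yml
print(config["schedule_interval"])  # 1 0 * * *
```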
#### DAG-level Features
gusty features additional helpful arguments at the DAG-level to help you design your DAGs with ease:
- **`root_tasks`** - A list of task ids which should represent the roots of a DAG. For example, an HTTP sensor might have to succeed before any downstream tasks in the DAG run.
- **`leaf_tasks`** - A list of task ids which should represent the leaves of a DAG. For example, at the end of the DAG run, you might save a repo